
Tutorial Using Speech-Recognition

flyth

is reticulating splines
Staff member
Developer
Contributor
This is a little tutorial on how to use speech recognition. The feature is still highly experimental and will cause increased CPU & RAM usage. I've tried to make it so it only activates if it is really necessary.

Some limitations upfront:
  • you need at least version 0.13.37
  • you will need to provide the commands that should be recognized beforehand for now
  • there is no continuous recognition
  • only 3 speakers will be recognized simultaneously; additional speakers will be ignored until one of the initial 3 speakers stops speaking. In most cases that is more than enough
  • make sure your config.ini contains
    Code:
    [SpeechRecognition]
    Enable = true
So - let's get started!
  1. Download https://www.sinusbot.com/pre/speech.tar.bz2
  2. Extract that file to the same directory as your SinusBot (so you will have a new folder speech inside the SinusBot directory)
  3. Create a script that uses the speech recognition engine
  4. Restart the bot

Code:
registerPlugin({
    name: 'Speech Recognition Demo',
    version: '2.0',
    description: 'This is a simple script that will recognize the command "stop"',
    author: 'Michael Friese <[email protected]>',
    vars: [],
    voiceCommands: ['stop']
}, function(sinusbot, config) {
    var audio = require('audio');
    var event = require('event');
    var engine = require('engine');
    audio.setAudioReturnChannel(2); // this activates speech recognition
    event.on('speech', function(ev) {
        engine.log(ev.client.nick() + ' just said ' + ev.text);
    });
});

So it's pretty simple: you register some voiceCommands in the plugin's manifest. Once one of the registered commands is recognized, the speech event gets triggered with an object containing the clientId and the recognized command (a full client object will be added later on).

Important: Only words that are inside the ./speech/dict file will be recognized. It already contains many English words and their phonetic "translation". If you need words that are not in there, you have to add them (and the translation) manually before you can use them.
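Since a command silently fails when one of its words is missing from the dict file, it can help to check your commands against it before registering them. The following is only a rough sketch: it assumes each line of ./speech/dict starts with the word itself, followed by whitespace and the phonetic translation, which you should verify against your own file. The helper names dictWords and missingWords are made up for this example.

```javascript
// Assumption: every dict line looks like "<word> <phonetic translation>".
// Collects the first token of each line into a lookup table.
function dictWords(dictText) {
    var words = Object.create(null); // no inherited keys such as "constructor"
    dictText.split(/\r?\n/).forEach(function (line) {
        var word = line.trim().split(/\s+/)[0];
        if (word) {
            words[word.toLowerCase()] = true;
        }
    });
    return words;
}

// Returns every word used in the given commands that is not in the dict.
function missingWords(dictText, commands) {
    var known = dictWords(dictText);
    var missing = [];
    commands.forEach(function (command) {
        command.toLowerCase().split(/\s+/).forEach(function (word) {
            if (word && !known[word] && missing.indexOf(word) === -1) {
                missing.push(word);
            }
        });
    });
    return missing;
}
```

Outside the bot you could run it with Node, e.g. missingWords(require('fs').readFileSync('./speech/dict', 'utf8'), ['stop']), and add whatever it reports to the dict by hand.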

I'd love to get some feedback on how it worked out for you. In most simple cases it works well enough, so don't try to recognize full sentences but focus on simple commands. Also, you might want to let the commands start with a word like "bot", so that you don't trigger things by accident ;)
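To illustrate the "bot" prefix idea, here is a minimal sketch. It assumes that multi-word phrases can be registered as voiceCommands and that ev.text contains the registered phrase as recognized; the prefix, the action names and the helper functions buildVoiceCommands and parseCommand are made up for this example.

```javascript
// Hypothetical wake word and actions; adjust to your needs.
var PREFIX = 'bot';
var ACTIONS = ['stop', 'pause'];

// Builds the phrases to register, e.g. "bot stop".
function buildVoiceCommands(prefix, actions) {
    return actions.map(function (action) {
        return prefix + ' ' + action;
    });
}

// Returns the action part of a recognized phrase,
// or null if the phrase does not start with the prefix.
function parseCommand(prefix, text) {
    var words = String(text).trim().toLowerCase().split(/\s+/);
    if (words.length < 2 || words[0] !== prefix) {
        return null;
    }
    return words.slice(1).join(' ');
}

// Only runs inside SinusBot, where registerPlugin is defined.
if (typeof registerPlugin === 'function') {
    registerPlugin({
        name: 'Prefixed Speech Commands Demo',
        version: '1.0',
        description: 'Recognizes commands that start with a wake word',
        author: 'example',
        vars: [],
        voiceCommands: buildVoiceCommands(PREFIX, ACTIONS)
    }, function (sinusbot, config) {
        var audio = require('audio');
        var event = require('event');
        var engine = require('engine');
        audio.setAudioReturnChannel(2); // activates speech recognition, as in the demo above
        event.on('speech', function (ev) {
            var action = parseCommand(PREFIX, ev.text);
            if (action !== null) {
                engine.log('Recognized action: ' + action);
            }
        });
    });
}
```

Remember that every word used here, including the prefix, has to be present in the dict file.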
 

mxschmitt

Moderator
Staff member
is awesome!
V.I.P.
is uber awesome!
Contributor
Insider
Yes, the speech recognition isn't working in the current beta. Known bug o_O
 

flyth

is reticulating splines
Staff member
Developer
Contributor
Updated first post to be compatible with version 0.13+.
If you're trying this, please make sure you have read EVERYTHING in that post beforehand :)
 

likesox

Donor
is awesome!

Does Speech recognition now work? :O

I followed all the steps from the tutorial, but it still doesn't work :D

Love ur work!
 

Maxetto

Member
For me it's working quite well. It's a nice feature that I wanted to try, and I'm making good use of it. It's not perfect, but it mostly works.

Be sure to have a license, as it is disabled for unlicensed SinusBot instances.

P.S. I would really love a wildcard command (like "bot play *") that tries to understand every word before a longer pause.
 

likesox

Donor
is awesome!

Did you need to do anything besides the tutorial to get it to work?

Because I did everything mentioned in the first post :/

And yeah, I have the correct private.dat
:(
 

Maxetto

Member
No, I've done everything exactly how it is stated in the tutorial.

Just be sure to pronounce the command correctly; use Google Translate's TTS if you're not sure.
 

Diesmon

Tuetchen Dominator
is awesome!
Contributor
Insider
You need a stereo-channel mic at the moment, i.e. set the Windows recording device to 2 channels.
 

Tsunderella

Active Member
Edit: Fixed the initial two problems I had.

Is there any way to pick up words one after the next? Or, let's say, you say "bot" and then it listens for the next 5 seconds for every word said?
 

Runningcore

Well-Known Member
Contributor
Sure, just implement an activation function that stays active on a timer for about 5 seconds. I've already implemented it that way and it works like a charm.
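An activation function along those lines could look roughly like the sketch below. This is not the poster's actual implementation, just one way to do it under some assumptions: it compares timestamps instead of using a real timer, it shares a single activation window between all speakers, and the helper createActivationWindow as well as the wake word and commands are made up for the example. All of the words still have to be registered as voiceCommands.

```javascript
// Hypothetical helper: after the wake word is heard, commands are
// accepted for windowMs milliseconds; everything else is ignored.
// "now" is an optional clock function, which makes the logic easy to test.
function createActivationWindow(wakeWord, windowMs, now) {
    now = now || Date.now;
    var activeUntil = 0;
    return {
        // Feed every recognized text in here.
        // Returns the command to run, or null if nothing should happen.
        handle: function (text) {
            var t = now();
            if (text === wakeWord) {
                activeUntil = t + windowMs; // (re)open the window
                return null;
            }
            return t < activeUntil ? text : null;
        }
    };
}

// Only runs inside SinusBot, where registerPlugin is defined.
if (typeof registerPlugin === 'function') {
    registerPlugin({
        name: 'Speech Activation Window Demo',
        version: '1.0',
        description: 'Listens for commands for 5 seconds after "bot"',
        author: 'example',
        vars: [],
        voiceCommands: ['bot', 'stop', 'pause']
    }, function (sinusbot, config) {
        var audio = require('audio');
        var event = require('event');
        var engine = require('engine');
        var activation = createActivationWindow('bot', 5000);
        audio.setAudioReturnChannel(2); // activates speech recognition, as in the first post
        event.on('speech', function (ev) {
            var command = activation.handle(ev.text);
            if (command !== null) {
                engine.log(ev.client.nick() + ' triggered ' + command);
            }
        });
    });
}
```

If several people talk at once, you would probably want one window per client instead of a shared one.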
 

Tsunderella

Active Member
I have a system where you say "justin" and then anything along the lines of "stop the music", "turn the music off", "stop playing music", "shut up", "shut the music off", etc., but those are all registered voice commands. Is it possible for the bot to detect words that are not registered as voice commands?
 

Runningcore

Well-Known Member
Contributor

As far as I'm aware, that is not possible with the current implementation, but I guess someone here can confirm that.
 