Latest Update: 2009-03-02
Updates include:
IMPORTANT! 1.6 code is available again, and all code has a major socket fix. Please also note, if you are using 1.6, I suggest using the LATEST svn head, as code was checked into Asterisk svn today (2009-03-02) with a fix to make SpeechBackground() work correctly.
New say.pl that allows pregeneration of files, for proper feeding and care of SpeechBackground().
Engine (client) code now uses non-blocking sockets so Asterisk will not stutter/lag. Many bugfixes; importantly, missing options in sphinx.conf will no longer crash the system.
astsphinx, the server code, now comes with the GNU build system for easier installation.
Returns a SPEECH_SCORE with a 0-1000 range, but at this point this score is not especially helpful.
Now supports proper use of SpeechBackground; specifically, if you start talking before SpeechBackground is done playing, it will shut up and listen. Also, SpeechBackground's sound file used to stutter and slow down; this should no longer be an issue. Additionally, the timeout may or may not have worked in the old version, but should work reliably now.
This page contains some notes and some starting code for integrating the CMU Sphinx Speech Recognition System with the Digium Asterisk PBX, as a drop-in Generic Speech API engine.
Please help me to review this code. Any and all suggestions are welcome, tips and hints, too. I haven't written any C since well before the C99 standard, so I'm very rusty and wasn't good to begin with!
I can be contacted at scribblej@scribblej.com although I check the mail very rarely. If you'd like to get in touch, the fastest and best way to get my attention is as ScribbleJ on IRC, either the Freenode network or the Undernet network.
Note: This code is badly in need of cleaning up, and will be, but I figured I could put off sharing it forever if I waited 'till it was perfect, so why not release it now while it is lousy?
This is an Asterisk plugin for a client-server integration with the CMU Sphinx voice recognition system. There is a small, simple 'engine' plugin that goes into Asterisk (the client), and a small, simple Sphinx server that it communicates with. They may be on the same computer, or not. This way, the resources required for speech recognition do not necessarily need to compete with the resources required for Asterisk. In addition, this may appeal better to Digium, who could decide to include the 'client' code in Asterisk without needing to link against or distribute Sphinx.
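Because the engine and the server talk over a TCP socket and may live on different machines, a quick reachability check can save some head-scratching before you involve Asterisk at all. The following is only a minimal sketch: the function name is made up for illustration, and the probe does nothing but open and close a TCP connection. How astsphinx reacts to a client that connects and sends nothing is not documented on this page, so treat this as a diagnostic for a test setup.

```python
import socket


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it does not speak the astsphinx protocol,
    it only checks that something is listening on the port.
    """
    try:
        # create_connection resolves the host, connects, and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, server_reachable("127.0.0.1", 10069) would probe a server started on the same machine on port 10069; both values are placeholders for whatever host and port you actually use.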
You can try the sample dialplan shown on this page on my home Asterisk server at the following (Chicago, USA local) number: 312-283-0556
Also, I have not tested yet so I may have set this up incorrectly, but you may be able to reach it at sip:sphinxtest@home.scribblej.com
The engine is a simple plugin to Asterisk; it requires sphinx.conf to be in your configuration directory. Valid configuration settings all go in the [general] section, and are listed here in this example config:
[general]
;ip and port of server
serverip=127.0.0.1
serverport=10069
;silence detection is performed by Asterisk DSP, how long to wait before we consider speech finished.
silencetime=250
;noiseframes; only here for troubleshooting, leave set to 0
noiseframes=0
;threshold defines how 'quiet' silence is, try raising to higher numbers if speech is detected too early
silencethreshold=500
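Since missing options in sphinx.conf have caused trouble in the past, it can be handy to check the file before loading the plugin. Here is a rough sketch of such a check. It is not how the plugin itself reads the file (that goes through Asterisk's own config code); it uses Python's configparser, and the list of expected keys is simply copied from the example config above. The function name and the SAMPLE string are made up for illustration.

```python
import configparser

# Option names copied from the example sphinx.conf on this page.
EXPECTED_KEYS = ("serverip", "serverport", "silencetime", "noiseframes", "silencethreshold")

# A trimmed copy of the example config, used only to demonstrate the check.
SAMPLE = """\
[general]
;ip and port of server
serverip=127.0.0.1
serverport=10069
silencetime=250
noiseframes=0
silencethreshold=500
"""


def check_sphinx_conf(text):
    """Return (settings, missing) for the [general] section of a sphinx.conf string.

    settings is a dict of option name -> string value; missing lists expected
    options that were not found. Hypothetical helper, not part of the plugin.
    """
    parser = configparser.ConfigParser(comment_prefixes=(";",), interpolation=None)
    parser.read_string(text)
    if not parser.has_section("general"):
        return {}, list(EXPECTED_KEYS)
    settings = dict(parser["general"])
    missing = [key for key in EXPECTED_KEYS if key not in settings]
    return settings, missing
```

Calling check_sphinx_conf(SAMPLE) should report nothing missing; feeding it the contents of your real file tells you which of the five options you have left out.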
In theory, this should work exactly the same as any other Generic Speech Engine API plugin (e.g. LumenVox). Here is a sample dialplan I've used for testing (it also makes use of the Festival TTS system, and the AGI I use for that is available under 'Odds and ends' above). This dialplan just loops, alternately asking you to say 'yes' or 'no' and asking you to say a compass direction (i.e. 'north','south','east','west'). The compass direction test is there because the system sometimes seems to have difficulty recognizing 'south' specifically, although in my most recent tests it's more reliable (thanks to jaytee on IRC for suggesting this test).
exten => s,1,Answer()
exten => s,n,SpeechCreate(Sphinx)
exten => s,n,AGI(say.pl|'Welcome to the Sphinx and Asterisk integration test.')
exten => s,n,SpeechActivateGrammar(yesno)
exten => s,n,AGI(say.pl|'Please say yes or no.'|1|yornprompt)
exten => s,n,SpeechStart()
exten => s,n,SpeechBackground(/tmp/yornprompt|10)
exten => s,n,SpeechDeactivateGrammar(yesno)
exten => s,n,Log(NOTICE,${SPEECH_TEXT(0)})
exten => s,n,AGI(say.pl|'You said: ${SPEECH_TEXT(0)}')
exten => s,n,SpeechActivateGrammar(compass)
exten => s,n,AGI(say.pl|'Please say a compass direction.'|1|compassprompt)
exten => s,n,SpeechStart()
exten => s,n,SpeechBackground(/tmp/compassprompt,20)
exten => s,n,SpeechDeactivateGrammar(compass)
exten => s,n,AGI(say.pl|'You said: ${SPEECH_TEXT(0)}')
exten => s,n,Goto(4)
Requires a configuration file, something like the below:
-hmm /home/chris/ast/sphinx/Communicator_semi_40.cd_semi_6000 -dict dict -lm default -samprate 8000 -frate 50 -silprob 0.005
hmm points to the directory you unpacked the Communicator model into. dict points to your dictionary file. This is NOT cmudict, this is the dictionary generated by lmgen. lm needs to point to any one of your generated grammars (made using lmgen). samprate and frate are required to be set as above for operation with Asterisk.
When running the server, you need to specify on the commandline the port to listen on, the location of the above config file, and the names of all your generated grammars, including the one listed in the above file.
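The rules above (samprate 8000, frate 50, and an lm value that is also one of the grammars given on the commandline) are easy to get wrong, so here is a small sketch of a sanity check. It assumes the config file is nothing more than whitespace-separated '-option value' pairs, as in the sample; paths containing spaces would break it. Both function names are invented for this example and are not part of astsphinx.

```python
def parse_server_config(text):
    """Parse whitespace-separated '-option value' pairs into a dict (assumed format)."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected '-option value' pairs")
    options = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("-"):
            raise ValueError(f"expected an option name, got {flag!r}")
        options[flag.lstrip("-")] = value
    return options


def check_server_config(options, grammars):
    """Return a list of problems, based only on the requirements stated on this page."""
    problems = []
    if options.get("samprate") != "8000":
        problems.append("samprate must be 8000")
    if options.get("frate") != "50":
        problems.append("frate must be 50")
    if options.get("lm") not in grammars:
        problems.append("lm must name one of the grammars passed on the commandline")
    for key in ("hmm", "dict"):
        if key not in options:
            problems.append(f"{key} is missing")
    return problems
```

An empty list from check_server_config means the file satisfies the requirements listed here; it says nothing about whether the model directory or dictionary actually exist.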
Please note: the dictionary you specify on the lmgen commandline is the same one that goes in the server config file above, and you must use the same file for all your grammars. So, for instance, if I have three text files, one with a 'yesno' grammar, one with a 'compass' grammar, and one with a 'phonetree' grammar, I would need to run:
$ lmgen.pl yesno.txt mydict yesno
$ lmgen.pl phonetree.txt mydict phonetree
$ lmgen.pl compass.txt mydict compass
Then my server configfile might look like:
-hmm /home/chris/ast/sphinx/Communicator_semi_40.cd_semi_6000 -dict mydict -lm yesno -samprate 8000 -frate 50 -silprob 0.005
And my commandline to start the server might be:
$ astsphinx 10069 ./configfile yesno phonetree compass 2>/dev/null
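If you have more than a handful of grammars, typing these commands by hand gets tedious. The sketch below just builds the same command lines from a list of grammar names. It assumes each grammar's source file is called <name>.txt, as in the example above, and it leaves out the 2>/dev/null redirect. The helper names are made up; nothing here runs lmgen.pl or astsphinx, it only produces the text of the commands.

```python
import shlex


def lmgen_commands(grammars, dict_name):
    """One lmgen.pl invocation per grammar, all sharing the same dictionary file."""
    return [["lmgen.pl", f"{name}.txt", dict_name, name] for name in grammars]


def server_command(port, config_path, grammars):
    """astsphinx invocation: port, config file, then every generated grammar name."""
    return ["astsphinx", str(port), config_path, *grammars]


def as_shell_lines(grammars, dict_name, port, config_path):
    """Return the lmgen commands followed by the server command, as shell-quoted strings."""
    commands = lmgen_commands(grammars, dict_name)
    commands.append(server_command(port, config_path, grammars))
    return [shlex.join(argv) for argv in commands]
```

Calling as_shell_lines(["yesno", "phonetree", "compass"], "mydict", 10069, "./configfile") gives back the three lmgen.pl lines and the astsphinx line from the example, ready to paste into a shell or a startup script.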