Asterisk - Sphinx Speech Recognition Engine Plugin

News

Latest Update: 2009-03-02

Updates include:

IMPORTANT! The 1.6 code is available again, and all code now includes a major socket fix. If you are using 1.6, I suggest using the LATEST svn head, since a fix to make SpeechBackground() work correctly was checked into Asterisk svn today (2009-03-02).

New say.pl that allows prompt files to be pregenerated, for proper feeding and care of SpeechBackground().

The engine (client) code now uses non-blocking sockets, so Asterisk will not stutter or lag. Many bugfixes; most importantly, missing options in sphinx.conf will no longer crash the system.

astsphinx, the server code, now comes with the GNU build system for easier installation.

Returns a SPEECH_SCORE with a 0-1000 range, but at this point this score is not especially helpful.

Now supports proper use of SpeechBackground: if you start talking before SpeechBackground has finished playing, it will shut up and listen. SpeechBackground's sound file also used to stutter and slow down; this should no longer be an issue. Finally, the timeout may or may not have worked in the old version, but it should work reliably now.
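Stopping the prompt when the caller starts talking, and later deciding that they have finished, both come down to classifying each audio frame as speech or silence. Below is a minimal sketch of that idea. It assumes 8 kHz signed 16-bit audio, uses the average absolute sample value as a stand-in for energy compared against a threshold like silencethreshold, and treats a value like silencetime as milliseconds (an assumption). The names `endpointer`, `ep_init()` and `ep_feed()` are hypothetical. The real engine leaves silence detection to the Asterisk DSP and does not use this code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical endpointer states. */
enum ep_state { EP_WAITING, EP_SPEAKING, EP_DONE };

struct endpointer {
    enum ep_state state;
    int threshold;      /* like silencethreshold: mean |sample| below this is silence */
    int silence_ms;     /* like silencetime, ASSUMED to be milliseconds */
    int heard_silence;  /* ms of continuous silence since speech started */
};

void ep_init(struct endpointer *ep, int threshold, int silence_ms)
{
    ep->state = EP_WAITING;
    ep->threshold = threshold;
    ep->silence_ms = silence_ms;
    ep->heard_silence = 0;
}

/* Feed one frame of 8 kHz 16-bit samples and return the new state.
 * EP_WAITING -> EP_SPEAKING is where a prompt would be cut off (barge-in);
 * EP_SPEAKING -> EP_DONE is where the utterance is considered finished. */
enum ep_state ep_feed(struct endpointer *ep, const int16_t *samples, size_t n)
{
    long sum = 0;
    size_t i;

    if (ep->state == EP_DONE || n == 0)
        return ep->state;
    for (i = 0; i < n; i++)
        sum += labs((long)samples[i]);

    int silent = (sum / (long)n) < ep->threshold;
    int frame_ms = (int)(n / 8);        /* 8 samples per ms at 8 kHz */

    if (ep->state == EP_WAITING) {
        if (!silent)
            ep->state = EP_SPEAKING;
    } else if (silent) {
        ep->heard_silence += frame_ms;
        if (ep->heard_silence >= ep->silence_ms)
            ep->state = EP_DONE;
    } else {
        ep->heard_silence = 0;          /* speech resumed: restart the timer */
    }
    return ep->state;
}
```

The silence timer resets on every loud frame, so a pause shorter than the configured silence time does not end the utterance.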

Intro and Contact Info

This page contains some notes and some starting code for integrating the CMU Sphinx Speech Recognition System with the Digium Asterisk PBX, as a drop-in Generic Speech API engine.

Please help me review this code. Any and all suggestions are welcome, as are tips and hints. I haven't written any C since well before the C99 standard, so I'm very rusty and wasn't good to begin with!

I can be contacted at scribblej@scribblej.com, although I check that mail very rarely. If you'd like to get in touch, the fastest and best way to get my attention is to find me as ScribbleJ on IRC, on either the Freenode or the Undernet network.

Note: This code is badly in need of cleaning up, and will be, but I figured I could put off sharing it forever if I waited 'till it was perfect, so why not release it now while it is lousy?

What is here

What is not here

Overview

This is an Asterisk plugin for a client-server integration with the CMU Sphinx voice recognition system. There is a small, simple 'engine' plugin that goes into Asterisk (the client), and a small, simple Sphinx server that it communicates with. They may be on the same computer, or not. This way, the resources required for speech recognition do not necessarily have to compete with the resources required for Asterisk. This design may also be more appealing to Digium, who could decide to include the 'client' code in Asterisk without needing to link against or distribute Sphinx.
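Because the client talks to the server over a network socket, a slow or unreachable server must not be allowed to stall Asterisk's audio path. The news above notes that the engine now uses non-blocking sockets for exactly this reason. Below is a minimal POSIX sketch of one common way to do it. It assumes a TCP socket that has already been created. The helpers `set_nonblocking()` and `send_some()` are hypothetical names, and the plugin's actual wire protocol is not shown.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: switch an existing socket to non-blocking mode.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: try to send a chunk of audio without ever waiting.
 * Returns the number of bytes sent (possibly fewer than len), 0 if the
 * socket buffer is full right now (retry on a later frame), or -1 on a
 * real error.
 * Note: if the server has closed the connection, send() can raise SIGPIPE;
 * on Linux, passing MSG_NOSIGNAL instead of 0 avoids that. */
ssize_t send_some(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* would block: drop back to Asterisk instead of stalling */
    return n;
}
```

The caller keeps any unsent bytes and tries again when the next frame arrives, so a full buffer costs a retry instead of a stall.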

Try it

You can try the sample dialplan (shown in the Client section below) on my home Asterisk server at the following (Chicago, USA local) number: 312-283-0556

Also, I have not tested this yet, so I may have set it up incorrectly, but you may be able to reach it at sip:sphinxtest@home.scribblej.com

QuickStart

Just some helpful guidelines. No one should follow these commands verbatim.

Server:

Client:

Client

A simple plugin for Asterisk. It requires sphinx.conf to be in your Asterisk configuration directory. All valid configuration settings go in the [general] section, and they are listed in this example config:

[general]
;ip and port of server
serverip=127.0.0.1
serverport=10069
;silence detection is performed by the Asterisk DSP; this is how long to wait before we consider speech finished.
silencetime=250
;noiseframes: only here for troubleshooting, leave set to 0
noiseframes=0
;threshold defines how 'quiet' silence is; try raising it if speech is detected too early
silencethreshold=500
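The news at the top notes that missing options in sphinx.conf no longer crash the system. The usual way to get that behavior is to start from defaults and let the file override them. Here is a self-contained sketch of that approach for the [general] section. It is not the plugin's code: an Asterisk module would normally read its file through Asterisk's own configuration API (ast_config_load() and related calls). The default values below are simply the example values above and may differ from the plugin's real defaults. `struct sphinx_conf` and `conf_parse()` are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Settings from the [general] section of sphinx.conf. */
struct sphinx_conf {
    char serverip[64];
    int serverport;
    int silencetime;
    int noiseframes;
    int silencethreshold;
};

/* Hypothetical defaults, copied from the example config above. */
static void conf_defaults(struct sphinx_conf *c)
{
    snprintf(c->serverip, sizeof c->serverip, "%s", "127.0.0.1");
    c->serverport = 10069;
    c->silencetime = 250;
    c->noiseframes = 0;
    c->silencethreshold = 500;
}

/* Strip leading and trailing whitespace in place. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Parse key=value lines inside [general]. Comments (';'), unknown keys and
 * other sections are ignored; any key that is missing keeps its default. */
void conf_parse(struct sphinx_conf *c, const char *text)
{
    char line[256];
    int in_general = 0;

    conf_defaults(c);
    while (*text) {
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;
        char *p, *eq, *semi, *key, *val;

        memcpy(line, text, copy);
        line[copy] = '\0';
        text += len + (text[len] == '\n');   /* advance to the next line */

        if ((semi = strchr(line, ';')) != NULL)
            *semi = '\0';                     /* drop comment */
        p = trim(line);
        if (*p == '[') {
            in_general = strcmp(p, "[general]") == 0;
            continue;
        }
        if (!in_general || (eq = strchr(p, '=')) == NULL)
            continue;
        *eq = '\0';
        key = trim(p);
        val = trim(eq + 1);
        if (strcmp(key, "serverip") == 0)
            snprintf(c->serverip, sizeof c->serverip, "%s", val);
        else if (strcmp(key, "serverport") == 0)
            c->serverport = atoi(val);
        else if (strcmp(key, "silencetime") == 0)
            c->silencetime = atoi(val);
        else if (strcmp(key, "noiseframes") == 0)
            c->noiseframes = atoi(val);
        else if (strcmp(key, "silencethreshold") == 0)
            c->silencethreshold = atoi(val);
    }
}
```

With this design, a file containing only `serverip=` still produces a usable configuration, because every other field keeps its default.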

In theory, this should work exactly the same as any other Generic Speech API engine plugin (e.g. LumenVox). Here is a sample dialplan I've used for testing. It also makes use of the Festival TTS system, and the AGI I use for that is available under 'Odds and ends' above. This dialplan just loops, alternately asking you to say 'yes' or 'no' and asking you to say a compass direction (i.e. 'north', 'south', 'east', 'west'). The compass direction test is there because the system sometimes seems to have difficulty recognizing 'south' specifically, although in my most recent tests it has been more reliable (thanks to jaytee on IRC for suggesting this test).

exten => s,1,Answer()
exten => s,n,SpeechCreate(Sphinx)
exten => s,n,AGI(say.pl|'Welcome to the Sphinx and Asterisk integration test.')
exten => s,n,SpeechActivateGrammar(yesno)
exten => s,n,AGI(say.pl|'Please say yes or no.'|1|yornprompt)
exten => s,n,SpeechStart()
exten => s,n,SpeechBackground(/tmp/yornprompt|10)
exten => s,n,SpeechDeactivateGrammar(yesno)
exten => s,n,Log(NOTICE|${SPEECH_TEXT(0)})
exten => s,n,AGI(say.pl|'You said: ${SPEECH_TEXT(0)}')
exten => s,n,SpeechActivateGrammar(compass)
exten => s,n,AGI(say.pl|'Please say a compass direction.'|1|compassprompt)
exten => s,n,SpeechStart()
exten => s,n,SpeechBackground(/tmp/compassprompt|20)
exten => s,n,SpeechDeactivateGrammar(compass)
exten => s,n,AGI(say.pl|'You said: ${SPEECH_TEXT(0)}')
exten => s,n,Goto(4)

Server

This server is written using PocketSphinx 0.5.1 and SphinxBase 0.4.1, the latest release versions as of this writing. It should also compile cleanly against the latest svn. It also requires the Communicator acoustic model (linked above) and some suitable grammars, which you can generate using lmgen and cmudict (also linked above).

The server requires a configuration file, something like the following:

-hmm
/home/chris/ast/sphinx/Communicator_semi_40.cd_semi_6000
-dict
dict
-lm
default
-samprate
8000
-frate
50
-silprob
0.005

-hmm points to the directory you unpacked the Communicator model into. -dict points to your dictionary file. This is NOT cmudict; it is the dictionary generated by lmgen. -lm needs to point to any one of your generated grammars (made using lmgen). -samprate and -frate must be set as above for operation with Asterisk.
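The server config is just a sequence of flag/value pairs. As a hypothetical illustration (not astsphinx's code; `check_server_config()` is an invented name), the sketch below reads the pairs as whitespace-separated tokens and rejects a file whose -samprate or -frate differs from the values required above. For reference, -frate 50 means 50 frames per second. Each frame therefore covers 20 ms, which is 160 samples at 8000 Hz.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical check: scan "-flag value" pairs (newlines or spaces between
 * tokens both work) and verify the settings required for use with Asterisk.
 * Returns 0 if -samprate is 8000 and -frate is 50, otherwise -1. */
int check_server_config(const char *text)
{
    char flag[128], value[512];
    int samprate = -1, frate = -1, consumed;

    /* %n records how many characters this pair used, so we can advance. */
    while (sscanf(text, " %127s %511s%n", flag, value, &consumed) == 2) {
        text += consumed;
        if (strcmp(flag, "-samprate") == 0)
            samprate = atoi(value);
        else if (strcmp(flag, "-frate") == 0)
            frate = atoi(value);
    }
    if (samprate != 8000) {
        fprintf(stderr, "-samprate must be 8000 for use with Asterisk\n");
        return -1;
    }
    if (frate != 50) {
        fprintf(stderr, "-frate must be 50 for use with Asterisk\n");
        return -1;
    }
    return 0;
}
```

Paths containing spaces would confuse this simple tokenizer, so it is only a pre-flight check and not a replacement for the decoder's own option parsing.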

When running the server, you need to specify on the command line the port to listen on, the location of the above config file, and the names of all your generated grammars, including the one listed in that config file.

lmgen

lmgen (based somewhat on SimpleLM) creates grammars for use with the server. As input, it requires a copy of cmudict and a simple text file containing the phrases you'd like to recognize, one per line. It also requires that you have installed cmuclmtk, available with the other Sphinx downloads above.

Please note: the dictionary you specify on the command line is the same one that goes in the server config file above, and you must use the same dictionary file for all your grammars. So, for instance, if I have three text files, one with a 'yesno' grammar, one with a 'compass' grammar, and one with a 'phonetree' grammar, I would need to run:

$ lmgen.pl yesno.txt mydict yesno
$ lmgen.pl phonetree.txt mydict phonetree
$ lmgen.pl compass.txt mydict compass
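The server decodes every grammar against one shared dictionary, so that dictionary presumably has to cover every word in every phrase file. The sketch below is hypothetical and not part of lmgen. It collects the distinct words across several phrase lists, uppercased the way cmudict spells its entries. The result is the word list that a shared dictionary would need to contain. The fixed limits `MAX_WORDS` and `MAX_WORD_LEN` are arbitrary illustration values.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 256
#define MAX_WORD_LEN 32

/* Hypothetical vocabulary: distinct, uppercased words in first-seen order. */
struct vocab {
    char words[MAX_WORDS][MAX_WORD_LEN];
    size_t count;
};

void vocab_init(struct vocab *v)
{
    v->count = 0;
}

static void vocab_add(struct vocab *v, const char *w)
{
    size_t i;
    for (i = 0; i < v->count; i++)
        if (strcmp(v->words[i], w) == 0)
            return;                        /* already present */
    if (v->count < MAX_WORDS)
        strcpy(v->words[v->count++], w);
}

/* Add every word of a phrase file (one phrase per line) to the vocabulary.
 * Call once per grammar; the result is the union across all grammars.
 * Words longer than MAX_WORD_LEN - 1 characters are truncated. */
void vocab_add_phrases(struct vocab *v, const char *phrases)
{
    char word[MAX_WORD_LEN];
    size_t n = 0;

    for (;; phrases++) {
        unsigned char ch = (unsigned char)*phrases;
        if (ch != '\0' && !isspace(ch)) {
            if (n < MAX_WORD_LEN - 1)
                word[n++] = (char)toupper(ch);
            continue;
        }
        if (n > 0) {                       /* end of a word */
            word[n] = '\0';
            vocab_add(v, word);
            n = 0;
        }
        if (ch == '\0')
            break;
    }
}
```

Feeding it yesno.txt, compass.txt and phonetree.txt in turn yields the combined word list for all three grammars.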

Then my server config file might look like:

-hmm
/home/chris/ast/sphinx/Communicator_semi_40.cd_semi_6000
-dict
mydict
-lm
yesno
-samprate
8000
-frate
50
-silprob
0.005

And my command line to start the server might be:

$ astsphinx 10069 ./configfile yesno phonetree compass 2>/dev/null
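The server's arguments are positional: the port, the config file, then one or more grammar names, where the grammar named by -lm in the config file must be among them. A hypothetical sketch of checking that layout follows. `struct server_args`, `parse_server_args()` and `grammar_listed()` are illustrative names, not astsphinx's own.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct server_args {
    int port;
    const char *config;
    char **grammars;    /* points into argv */
    int ngrammars;
};

/* Hypothetical parser for: astsphinx <port> <configfile> <grammar> [grammar...]
 * Returns 0 on success, -1 if arguments are missing or the port is invalid.
 * On failure, *out is left untouched. */
int parse_server_args(int argc, char **argv, struct server_args *out)
{
    char *end;
    long port;

    if (argc < 4)
        return -1;      /* need port, config file, and at least one grammar */
    errno = 0;
    port = strtol(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0' || port < 1 || port > 65535)
        return -1;
    out->port = (int)port;
    out->config = argv[2];
    out->grammars = &argv[3];
    out->ngrammars = argc - 3;
    return 0;
}

/* The grammar named by -lm in the config file must also be listed here. */
int grammar_listed(const struct server_args *a, const char *name)
{
    int i;
    for (i = 0; i < a->ngrammars; i++)
        if (strcmp(a->grammars[i], name) == 0)
            return 1;
    return 0;
}
```

For the example command line, this gives port 10069, config file ./configfile, and the three grammars yesno, phonetree and compass. grammar_listed() would confirm that yesno, the -lm entry in that config, is present.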

Thanks