Saturday, May 11, 2013

More Two Way Interaction With Android Speech Recognition

Let's start with the demo first. The video shows commands and queries being spoken and recognized by Android speech recognition. Our web GUI is also in the shot - since the wife forbids me to walk around filming the insides of our house for the world to see :) - so you can at least "see" some of the status being queried and the results of some actions. There are some annotations on the video, but you can't see them on the embedded player. Click through to YouTube to see the video with annotations.



What makes all of this work is that the queries are passed to a server for processing. That allows two-way interaction: you can not only control your system but also query it. As I mentioned in my previous post on this topic, using IM as the transport mechanism lets the recognized phrase be sent to the server and the responses be sent back to the Android device. Over on our server, EVERY device and its state is logged in our MySQL database. This was done when we built our AJAX-based GUI. Also, since our system is distributed, MySQL provides a place for status to be updated and synced between the various devices. Below is a snapshot of a phpMyAdmin page showing part of one of our database tables. The table contains the device name, its type, its state, and when it was turned on and off.



Every device and its state is stored: every light, appliance, AV device, motion sensor, door, window, lock, car, phone, computer, etc. Whenever a device's state changes, a function is triggered in whichever piece of software interfaces with that device (for the most part Windows scripting, like the JScript below):

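// Record a device's new state in the MySQL status table.
// mysqlrs (presumably an ADODB.Recordset) and mysql (the open connection) are created elsewhere.
// "on duplicate key update" relies on a unique key (presumably on device), so each device keeps one row.
function setStatusOnOff(device,type,state,secs) {
    try {
        if (state=="off") {
            // Turning off: store the time in secs_off and leave the "on" time in secs untouched
            mysqlrs.Open("insert into status (device,type,state,secs_off) values ('"+device+"','"+type+"','"+state+"','"+secs+"') on duplicate key update state='"+state+"', secs_off='"+secs+"'",mysql);
        } else {
            // Turning on (or any other state): store the time in secs
            mysqlrs.Open("insert into status (device,type,state,secs) values ('"+device+"','"+type+"','"+state+"','"+secs+"') on duplicate key update state='"+state+"', secs='"+secs+"'",mysql);
        }
...
}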
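The JScript above builds its SQL by string concatenation, so a device name containing a quote would break the statement. Below is a minimal Python sketch of the same upsert with bound parameters. It assumes the `status` table has a unique key on `device` (which `on duplicate key update` needs in order to update instead of insert) and, so that it runs standalone, swaps MySQL for Python's built-in `sqlite3` (SQLite 3.24 or newer), whose equivalent clause is `ON CONFLICT ... DO UPDATE`. The name `set_status_on_off` and the `av` type are illustrative, not the system's actual code.

```python
import sqlite3
import time

def set_status_on_off(conn, device, dev_type, state, secs):
    """Insert or update one device's row in the status table.

    'off' records the time in secs_off; any other state records it in secs,
    mirroring the two branches of the JScript setStatusOnOff().
    """
    time_col = "secs_off" if state == "off" else "secs"
    # Column names can't be bound as parameters, but time_col only ever takes
    # one of the two fixed values above; every data value is bound with ?.
    conn.execute(
        f"INSERT INTO status (device, type, state, {time_col}) VALUES (?, ?, ?, ?) "
        f"ON CONFLICT(device) DO UPDATE SET state = excluded.state, "
        f"{time_col} = excluded.{time_col}",
        (device, dev_type, state, secs),
    )
    conn.commit()

# In-memory stand-in for the MySQL status table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status (device TEXT PRIMARY KEY, type TEXT, state TEXT, "
    "secs INTEGER, secs_off INTEGER)"
)
now = int(time.time())
set_status_on_off(conn, "family room tv", "av", "on", now)
set_status_on_off(conn, "family room tv", "av", "off", now + 3600)
print(conn.execute("SELECT state, secs_off - secs FROM status").fetchone())  # ('off', 3600)
```

The second call updates the existing row rather than adding one, so the on and off times sit side by side. Against MySQL the statement would keep `on duplicate key update` and use the driver's own placeholder style instead of `?`.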

Since the device name is stored as a normal non-abbreviated name ("family room tv" instead of "frtv"), it's straightforward to use the recognized speech to search for devices using MySQL queries. The next step is to figure out what type of command is being issued. For example, a command will have the phrase "turn on" or "turn off" in it. Since I use Python on the server to process the speech, I use its regular expression (regex) functions to pattern match for commands:

import re
reTurn = re.compile(r'(^|\s+)turn.*\s+o(n|(f+))($|\s+)',re.I) # recognizes "turn on", "turn this and that on", "turn this and that blah blah blah off", or just "turn off", even "turn of", anywhere in a sentence
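A quick way to check what `reTurn` accepts is to run it over a few sample phrases. This sketch repeats the pattern in raw-string form (so `\s` reaches the regex engine intact); the phrases are made up.

```python
import re

reTurn = re.compile(r'(^|\s+)turn.*\s+o(n|(f+))($|\s+)', re.I)

phrases = [
    "turn on the kitchen lights",
    "please turn the family room tv off",
    "Turn of the porch light",         # misrecognized "off" still counts
    "is the garage door open",         # no "turn", so not a command
    "return the call",                 # "turn" inside another word is ignored
    "turn the volume up on the tv",    # any later "on" counts
]
for p in phrases:
    print(f"{p!r}: {'command' if reTurn.search(p) else 'not a turn command'}")
```

Because `.*` is greedy and the pattern only needs some later "on" or "off", a phrase like "turn the volume up on the tv" also counts as an on/off command; that is worth knowing if your household phrases things that way.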

After determining whether it's a command or a query, my script strips out extraneous text to simplify extracting the device and type. What gets stripped out depends on how things are phrased in your household. Here's a snippet to do that, where msg is the recognized phrase:

msg = re.sub(r'\b(turn|the|a|can|you|please|will)\b',' ',msg,flags=re.I) # strip out unneeded whole words; \b keeps "a" from matching inside "lamp"
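Since the filler words vary from household to household, one option is to keep them in a list and build the pattern from it. This is a sketch; `FILLER_WORDS` and `strip_filler` are hypothetical names, and the list simply repeats the words from the snippet above.

```python
import re

# Hypothetical per-household list of filler words; edit to match how your family talks.
FILLER_WORDS = ["turn", "the", "a", "can", "you", "please", "will"]

# \b limits each alternative to whole words; re.escape keeps each word literal.
reFiller = re.compile(r'\b(?:' + '|'.join(map(re.escape, FILLER_WORDS)) + r')\b', re.I)

def strip_filler(msg):
    """Remove filler words, then collapse the leftover whitespace."""
    return ' '.join(reFiller.sub(' ', msg).split())

print(strip_filler("Can you please turn on the family room lamp"))  # on family room lamp
```

Adding a word for your household is then a one-line change to the list rather than an edit to the regex itself.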

I'm experimenting with natural language processing to strip out unnecessary words automatically, but it's not ready yet. Next, the script figures out the type of device involved. For lighting, it would use a regex similar to this:

reLight = re.compile(r'\b(lights?|lamps?|chandeliers?|halogens?|sconces?)\b',re.I) # whole words only, with or without a plural "s"
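If more device types are added, the type check can be driven from a table of patterns instead of a chain of `if` statements. In this sketch only the lighting words come from the post; the `av` entry and its words are made-up examples, and the keys are assumed to double as the names of the tables to search. Using `\b` rather than `\s` means the check works whether or not the spaces have already been replaced by `%`.

```python
import re

# Hypothetical mapping from device type (and table name) to the words that name it.
TYPE_PATTERNS = {
    "lighting": re.compile(r'\b(lights?|lamps?|chandeliers?|halogens?|sconces?)\b', re.I),
    "av": re.compile(r'\b(tv|television|stereo|receiver)\b', re.I),
}

def detect_type(msg):
    """Return the first device type whose words appear in msg, or None."""
    for dev_type, pattern in TYPE_PATTERNS.items():
        if pattern.search(msg):
            return dev_type
    return None

print(detect_type("guest bath lights"))   # lighting
print(detect_type("family room tv"))      # av
print(detect_type("garage door"))         # None
```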

Since all the extra words have been stripped out and the type has been determined, all that's left is to formulate a MySQL query like this to get the actual device name:

isLight = reLight.search(msg) # determine the device type while msg still has its spaces
msg = re.sub(r"\s+","%",msg) # replace spaces with wildcard character %
if isLight:
  query="select * from lighting where device like '%"+msg+"%'"
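Here is a self-contained sketch of the wildcard lookup, using an in-memory `sqlite3` table as a stand-in for the MySQL `lighting` table (the device names are invented). It passes the `LIKE` pattern as a bound parameter rather than pasting the recognized text into the SQL; `find_devices` is a hypothetical helper name.

```python
import re
import sqlite3

# In-memory stand-in for the "lighting" table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lighting (device TEXT PRIMARY KEY, state TEXT)")
conn.executemany("INSERT INTO lighting VALUES (?, ?)",
                 [("guestbath", "off"), ("kitchen", "on"), ("guest bedroom", "off")])

def find_devices(conn, msg):
    """Match a cleaned-up phrase against device names using LIKE wildcards.

    Every run of whitespace becomes %, so "guest bath" matches "guestbath".
    """
    pattern = "%" + re.sub(r"\s+", "%", msg.strip()) + "%"
    rows = conn.execute(
        "SELECT device FROM lighting WHERE device LIKE ? ORDER BY device", (pattern,))
    return [device for (device,) in rows]

print(find_devices(conn, "guest bath"))   # ['guestbath']
print(find_devices(conn, "guest"))        # ['guest bedroom', 'guestbath']
```

A phrase like "guest" matches more than one device. The post doesn't say how that case is handled, so whether to act on every match or ask the user which one they meant is a design choice left open here.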

This wildcard search is necessary to handle ambiguities in the recognition: a light may be named "guestbath" in the HA system, but Google may pass the recognized phrase as "guest bath." With the actual device name in hand, the final steps are to issue the command and send a response back to the Android device. As lights and other devices are added to the HA system, nothing else needs to be added. Contrast that with other automation systems, where you have to set up a recognition phrase for every device and possibly every state in your system. In our system, new device names are parsed out of the database, and no changes are required on the Android device. Queries follow a similar flow, except that instead of issuing a command, the server formulates a response with the status and sends it back to the user.
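The last step, branching between commands and queries, might look roughly like this. It is a hypothetical sketch: a dict stands in for the `status` table, the device name is assumed to have already been resolved by the database lookup described above, and the response wording is invented.

```python
# Hypothetical in-memory stand-in for the status table (device -> state).
status = {"guestbath": "off", "kitchen": "on"}

def handle(kind, device, state=None):
    """Apply a command or answer a query for an already-resolved device name.

    The returned string is what would be sent back to the phone over IM.
    """
    if kind == "command":
        # The real system would drive the device here (X10, Z-Wave, ...)
        # and the interface script would log the new state.
        status[device] = state
        return f"Turning {device} {state}"
    if kind == "query":
        return f"{device} is {status[device]}"
    raise ValueError(f"unknown request kind: {kind!r}")

print(handle("command", "guestbath", "on"))   # Turning guestbath on
print(handle("query", "guestbath"))           # guestbath is on
```

Because both branches work only from the resolved device name, a new device needs nothing here either; it only has to exist in the database.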

That's the backend. I'll cover the frontend in another post.

2 comments:

  1. THAT'S AWESOME!
    How do you connect your devices with the system? Raspberry Pi?

    1. Lots of different ways: X10, Z-Wave, UPB, Bluetooth, RFID, IR, game ports as contact-closure inputs, WebControl, BeagleBone Black, HP thin clients...
