
Speech Recognition in Kinect

Article posted by Vuyiswamb on 12/21/2012 | Category: Kinect | Level: Beginner



Download


 Download source code for Speech Recognition in Kinect



Introduction

 
There are a few things I have noticed while developing Kinect examples (I wouldn't call them apps, because I have not yet developed a full-fledged application). A Kinect user interface is different from a traditional user interface, where you see buttons that need to be clicked. With your voice you can do the things you used to do with a mouse, and with hand gestures you can control your application the way you used to with a mouse.
In this article I will demonstrate how you can control your application using voice commands.
 

Objective

 
The objective of this article is to demonstrate how to control your application using voice commands instead of a mouse.

Author's Preface

When I started creating the example for this article, I first created buttons and later removed them, because I no longer needed them: I could control my example app using voice commands, and the nice part is that I could do it in my mother tongue. This might be the last article I write this year; I will be taking a short break for two weeks, and after that I might publish a new one before 3 January 2013.

Namespaces

 
You will need to add a reference that you did not need when following my previous articles on Microsoft Kinect: the Microsoft.Speech assembly, which provides the Microsoft.Speech.Recognition namespace.

Figure 1.1
 
The Microsoft.Speech assembly is usually found at:
C:\Windows\assembly\GAC_MSIL\Microsoft.Speech\11.0.0.0__31bf3856ad364e35\Microsoft.Speech.dll 
 
After you have added the required reference, we must set up our grammar file. Basically, we want to create an application that plays a video, but instead of playing, pausing or stopping the video with a mouse click or a button, we want to speak commands and have the application respond to them.
 

Setup the Grammar File

 
The grammar file is just an XML file in the W3C Speech Recognition Grammar Specification (SRGS) format, built from the following elements:
 

Rule

 
A rule definition is represented by the rule element. The id attribute of the element indicates the name of the rule and must be unique within the grammar (this is enforced by XML). 
 

Item

 
An item element can surround any expansion to permit a repeat attribute or language identifier to be attached. The weight attribute of item is ignored unless the element appears within a one-of element.
 

Tag


A tag is a legal rule expansion (a tag can also be declared in the grammar header; see S4.1 of the W3C grammar specification).
A tag is an arbitrary string that may be included inline within any legal rule expansion. Any number of tags may be included inline within a rule expansion.
Tags do not affect the legal word patterns defined by the grammars or the process of recognizing speech or other input given a grammar.
Tags may contain content for semantic interpretation. The semantic interpretation processes may affect the recognition result. 
 

Grammar 

 
The grammar element is the root element of the grammar file. Its attributes declare the SRGS version (version), the language of the grammar (xml:lang), the format used to interpret tag contents (tag-format) and the W3C grammar namespace (xmlns). Every rule element is nested inside it.
With tag-format set to semantics/1.0-literals, the text of each tag (PLAY, STOP and so on) is used as-is as the semantic value of a recognized phrase.
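To see how grammar, rule, one-of, item and tag nest, here is a minimal sketch that builds the same shape of XML with LINQ to XML. The class SrgsXmlSketch and its methods are hypothetical helpers for illustration only; the article itself stores the grammar as a hand-written XML file.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: builds SRGS XML with the same nesting as the article's grammar file.
public static class SrgsXmlSketch
{
    // The W3C SRGS namespace used by the article's grammar file.
    static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    // One public rule: an outer item holding the <tag> (the semantic value)
    // and an inner <one-of> listing the alternative phrases.
    public static XElement Rule(string id, string semanticValue, params string[] phrases)
    {
        var alternatives = new XElement(Srgs + "one-of");
        foreach (var phrase in phrases)
        {
            alternatives.Add(new XElement(Srgs + "item", phrase));
        }

        return new XElement(Srgs + "rule",
            new XAttribute("id", id),
            new XAttribute("scope", "public"),
            new XElement(Srgs + "one-of",
                new XElement(Srgs + "item",
                    new XElement(Srgs + "tag", semanticValue),
                    alternatives)));
    }

    // Wraps the rules in the <grammar> root element with the article's attributes.
    public static XElement Grammar(params XElement[] rules)
    {
        return new XElement(Srgs + "grammar",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XAttribute("tag-format", "semantics/1.0-literals"),
            rules);
    }
}
```

For example, SrgsXmlSketch.Grammar(SrgsXmlSketch.Rule("PLAYrule", "PLAY", "Play", "Dlala")).ToString() produces a rule with the same structure as PLAYrule in the full grammar file.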
 

Recap

 
Wikipedia and the W3C specification (see the references) have more detail on the grammar file format. Basically, this file contains the words and phrases that can be used to control our application. When we are done, we add the file to the project resources (the code accesses it as Properties.Resources.SpeechGrammar) and read it in our application as a resource, as shown in Figure 1.2.
 

Figure 1.2
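Once the grammar is available as a string resource, you can also inspect it to check which spoken phrase leads to which semantic value. The sketch below is a hypothetical helper (GrammarPhrases is not part of the article's project) that assumes the grammar follows this article's shape: rule, then one-of, then an item holding a tag and an inner one-of of phrases. The case-insensitive lookup is a convenience choice, not a statement about how the recognizer matches speech.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: builds a phrase -> semantic value table from an SRGS grammar string.
public static class GrammarPhrases
{
    static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    public static Dictionary<string, string> Load(string grammarXml)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        XElement root = XElement.Parse(grammarXml);

        foreach (XElement tag in root.Descendants(Srgs + "tag"))
        {
            // The phrases are the items of the <one-of> that sits next to the <tag>.
            XElement alternatives = tag.Parent.Element(Srgs + "one-of");
            if (alternatives == null)
            {
                continue;
            }

            // XML comments (such as the Zulu translations) are not part of item.Value.
            foreach (XElement item in alternatives.Elements(Srgs + "item"))
            {
                map[item.Value.Trim()] = tag.Value.Trim();
            }
        }
        return map;
    }
}
```

Calling GrammarPhrases.Load(Properties.Resources.SpeechGrammar) in the example project would, under these assumptions, map "Dlala" and "Play" to PLAY and "IMA" to STOP.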

Your full grammar file for this example should look like this:
 

Full Grammar File

 
<grammar version="1.0" xml:lang="en-US" tag-format="semantics/1.0-literals" xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="PLAYrule" scope="public">
    <one-of>
      <item>
        <tag>PLAY</tag>
        <one-of>
          <item>Cala</item>          <!-- In Zulu (mother tongue) it means Start -->
          <item>Start Video</item>
          <item>Dlala</item>         <!-- In Zulu (mother tongue) it means Play -->
          <item>Play</item>
        </one-of>
      </item>
    </one-of>
  </rule>

  <rule id="Stoprule" scope="public">
    <one-of>
      <item>
        <tag>STOP</tag>
        <one-of>
          <item>IMA</item>           <!-- In Zulu (mother tongue) it means Stop -->
          <item>Stop Video</item>
          <item>Stop</item>
          <item>Misa i Video</item>  <!-- In Zulu (mother tongue) it means Stop the video -->
        </one-of>
      </item>
    </one-of>
  </rule>

  <rule id="Pauserule" scope="public">
    <one-of>
      <item>
        <tag>PAUSE</tag>
        <one-of>
          <item>WAIT</item>
          <item>Pause</item>
        </one-of>
      </item>
    </one-of>
  </rule>

  <rule id="Volumeup" scope="public">
    <one-of>
      <item>
        <tag>UP</tag>
        <one-of>
          <item>PHEZULU</item>       <!-- In Zulu (mother tongue) it means Up -->
          <item>KODIMU</item>        <!-- In Sotho slang (mother tongue) it means Up -->
        </one-of>
      </item>
    </one-of>
  </rule>

  <rule id="Volumedown" scope="public">
    <one-of>
      <item>
        <tag>DOWN</tag>
        <one-of>
          <item>PHANSI</item>        <!-- In Zulu (mother tongue) it means Down -->
          <item>KOTLASI</item>       <!-- In Sotho slang (mother tongue) it means Down -->
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
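The C# code in this article loads each grammar by rule id, and its comments point out that the names must match exactly because they are case sensitive. A small startup check can catch a mismatch early. This is a sketch under that assumption; GrammarRuleCheck is a hypothetical helper, not part of Microsoft.Speech or the Kinect SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reports rule ids the code expects but the grammar does not define.
public static class GrammarRuleCheck
{
    static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    // Ordinal comparison, so "Stoprule" and "StopRule" count as different ids.
    public static List<string> MissingRules(string grammarXml, IEnumerable<string> expectedIds)
    {
        var ids = new HashSet<string>(
            XElement.Parse(grammarXml)
                    .Elements(Srgs + "rule")
                    .Select(r => (string)r.Attribute("id")),
            StringComparer.Ordinal);

        // Keep the caller's order so the report is easy to read.
        return expectedIds.Where(id => !ids.Contains(id)).ToList();
    }
}
```

In the example project, the expected ids would be the five rule-name constants ("PLAYrule", "Stoprule", "Pauserule", "Volumedown", "Volumeup"); an empty result means every CreateGrammar call has a matching rule.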
  
As usual when creating Kinect applications, we create a WPF application and add a reference to our friend, the “Microsoft.Kinect” library. If you don’t know how to do this, I suggest you read my previous articles on Microsoft Kinect. The XAML for the main window is:

<Window x:Class="VoiceCommandsInKinect.MainWindow" 
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" 
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" 
        Title="MainWindow" Height="572.244" Width="848.43"> 
    <Grid Margin="0,0,-5.6,4.4"> 
      
        <Label x:Name="Status" HorizontalAlignment="Left" VerticalAlignment="Top" Height="60" Width="100" Margin="0,0,748,478"  > 
        </Label> 
  
        <Image Source="..\image\Logo.png"   Height="60" Width="100" HorizontalAlignment="Center" VerticalAlignment="Top"  Margin="371,2,0,453"    /> 
        <MediaElement x:Name="VideoPlayer"  LoadedBehavior="Manual" UnloadedBehavior="Stop"    Margin="0,90,0,10"  ></MediaElement> 
    </Grid> 
</Window>    
The window contains a Label named “Status” that reports the sensor state, a simple MediaElement named “VideoPlayer”, and a Microsoft Kinect logo to make our example look cool. I always try to comment my code line by line where it might not make sense to a reader who is new to the technology. The following code-behind is commented to the best of my ability; if you have any questions, write them in the comments and I will explain and add comments where needed.
using Microsoft.Kinect; 
using Microsoft.Speech.Recognition;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text; 
using System.Windows;
using System.Windows.Controls; 
using System.Windows.Media; 

namespace VoiceCommandsInKinect
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {


        #region "Constants"
   
        /// <summary>
        /// Names of the speech grammar rules. Each name must match a rule id in the grammar file exactly; it is case sensitive.
        /// </summary>
        //For the Play functionality
        private const string PLAYrule = "PLAYrule";
        //For the Stop functionality
        private const string Stoprule = "Stoprule";
        //For the Pause functionality
        private const string Pauserule = "Pauserule";
        //For the Volume down functionality
        private const string Volumedownrule = "Volumedown";
        //For the Volume up functionality
        private const string Volumeuprule = "Volumeup";
        
        /// <summary>
        /// Speech recognizer used to detect voice commands issued by application users.
        /// </summary>
        private SpeechRecognizer speechRecognizer;
 
        /// <summary>
        /// Speech grammars used by the application, one for each rule.
        /// </summary>
        private Grammar PlayGrammar; 
        private Grammar StopGrammar;
        private Grammar PauseGrammar;
        private Grammar VolumeupGrammar;
        private Grammar VolumedownGrammar;
        #endregion

        /// <summary>
        /// Initializes a new instance of the MainWindow class.
        /// </summary>
        public MainWindow()
        {
            InitializeComponent();
            //What should happen when the application is loaded
            Loaded += MainWindow_Loaded;
            //what should happen when the application is unloaded
            Unloaded += MainWindow_Unloaded;
            //What should happen when the application is closing
            Closing += MainWindow_Closing;

          
        }

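        //Stop the sensor when the application is being closed
        void MainWindow_Closing(object sender, System.ComponentModel.CancelEventArgs e)
        {
            if (sensor != null)
            {
                sensor.Stop();
            }
        }
        //Stop the sensor when the window is unloaded
        void MainWindow_Unloaded(object sender, RoutedEventArgs e)
        {
            if (sensor != null)
            {
                sensor.Stop();
            }
        }

        //The first Kinect sensor found (null when no sensor is plugged in)
        KinectSensor sensor;

        void MainWindow_Loaded(object sender, RoutedEventArgs e)
        {
            //Get the first sensor; indexing KinectSensors[0] would throw when no sensor is plugged in
            sensor = KinectSensor.KinectSensors.FirstOrDefault();
            if (sensor == null)
            {
                Status.Content = "No Kinect Sensor found";
                Status.Background = new SolidColorBrush(Colors.Red);
                Status.Foreground = new SolidColorBrush(Colors.Black);
                return;
            }

            //Check if the Sensor is Connected
            if (sensor.Status == KinectStatus.Connected)
            {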
                //Start the Sensor
                sensor.Start();
               
                //nice message with Colors to alert you if your sensor is working or not
                Status.Content = "Kinect Ready";
                Status.Background = new SolidColorBrush(Colors.Green);
                Status.Foreground = new SolidColorBrush(Colors.White);
                 
                // Create and configure speech grammars and recognizer  
                this.PlayGrammar = CreateGrammar(PLAYrule);
                this.StopGrammar = CreateGrammar(Stoprule);
                this.PauseGrammar = CreateGrammar(Pauserule);
                this.VolumedownGrammar = CreateGrammar(Volumedownrule); 
                this.VolumeupGrammar = CreateGrammar(Volumeuprule);

                //recognize the speech
                this.speechRecognizer = SpeechRecognizer.Create(new[] { PlayGrammar, StopGrammar, PauseGrammar, VolumeupGrammar, VolumedownGrammar });

                if (null != speechRecognizer)
                {
                    this.speechRecognizer.SpeechRecognized += SpeechRecognized;

                    this.speechRecognizer.Start(sensor.AudioSource);
                }
            }
            else if (sensor.Status == KinectStatus.Disconnected)
            {
                //nice message with Colors to alert you if your sensor is working or not
                Status.Content = "Kinect Sensor is not Connected";
                Status.Background = new SolidColorBrush(Colors.Orange);
                Status.Foreground = new SolidColorBrush(Colors.Black);

            }
            else if (sensor.Status == KinectStatus.NotPowered)
            {
                //nice message with Colors to alert you if your sensor is working or not
                Status.Content = "Kinect Sensor is not Powered";
                Status.Background = new SolidColorBrush(Colors.Red);
                Status.Foreground = new SolidColorBrush(Colors.Black);
            }
            else if (sensor.Status == KinectStatus.NotReady)
            {
                //nice message with Colors to alert you if your sensor is working or not
                Status.Content = "Kinect Sensor is not Ready";
                Status.Background = new SolidColorBrush(Colors.Red);
                Status.Foreground = new SolidColorBrush(Colors.Black);
            }
        }
 
 

        private void SpeechRecognized(object sender, SpeechRecognizerEventArgs e)
        { 
            //Play the Video
            const string Play = "PLAY";
            //Stop the Video 
            const string StopCommand  = "STOP";
            //Pause
            const string PauseCommand = "PAUSE";
            //Volume Down
            const string VolumedownCommand = "DOWN";
            //Volume Up
            const string VolumeupCommand = "UP";


            if (null == e.SemanticValue)
            {
                return;
            }

            // Handle the video control commands
            switch (e.SemanticValue)
            {
                 
                case Play:
                    PlayVideo();
                    return;

                case StopCommand:
                    VideoPlayer.Stop();
                    return;

                case PauseCommand:
                    VideoPlayer.Pause();
                    return;

                case VolumedownCommand:

                    VideoPlayer.Volume = 0;
                    return;


                case VolumeupCommand:
                    VideoPlayer.Volume = 1;
                    return;
            } 

        }
          
 
        /// <summary>
        /// Create a grammar from the grammar definition XML file stored in the project resources.
        /// </summary>
        /// <param name="ruleName">Rule corresponding to the grammar we want to use.</param>
        /// <returns>New grammar object corresponding to the specified rule.</returns>
        private Grammar CreateGrammar(string ruleName)
        {
            Grammar grammar;

            //Read the grammar file stored as a resource; UTF-8 keeps any non-ASCII words in the grammar intact
            using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(Properties.Resources.SpeechGrammar)))
            {
                grammar = new Grammar(memoryStream, ruleName);
            }

            return grammar;
        }
        //Play the video; change the path below to a video file on your own machine
        private void PlayVideo()
        {
            VideoPlayer.Source = new Uri(@"D:\Articles\How to use Voice Commands in Kinect\VoiceCommandsInKinect\VoiceCommandsInKinect\KinectSDK.wmv", UriKind.Absolute);
            VideoPlayer.LoadedBehavior = MediaState.Manual;
            VideoPlayer.Play();
         

        }
    }
}
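The switch in SpeechRecognized sets Volume to 0 for DOWN and to 1 for UP, so one command mutes the video and the other jumps straight to full volume. If you would rather change the volume in steps, one option is a table that maps each semantic value to an action. This is a sketch, not the article's code: FakePlayer is a hypothetical stand-in for the WPF MediaElement (so the sketch runs without WPF or a Kinect), and the 0.1 step size is an arbitrary assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the WPF MediaElement so the sketch runs without WPF or a Kinect.
public class FakePlayer
{
    public double Volume = 0.5;        // 0.5 is also the MediaElement default
    public string State = "Stopped";
}

public static class VoiceCommands
{
    // Size of one UP/DOWN step: an assumption, the article jumps straight to 0 or 1.
    const double VolumeStep = 0.1;

    // Maps each semantic value from the grammar's <tag> elements to an action on the player.
    public static Dictionary<string, Action<FakePlayer>> Build()
    {
        return new Dictionary<string, Action<FakePlayer>>(StringComparer.Ordinal)
        {
            { "PLAY",  p => p.State = "Playing" },
            { "STOP",  p => p.State = "Stopped" },
            { "PAUSE", p => p.State = "Paused" },
            // Clamp to the 0..1 range that MediaElement.Volume accepts.
            { "UP",    p => p.Volume = Math.Min(1.0, p.Volume + VolumeStep) },
            { "DOWN",  p => p.Volume = Math.Max(0.0, p.Volume - VolumeStep) },
        };
    }

    // Runs the action for a semantic value; returns false when the value is null or unknown.
    public static bool Handle(Dictionary<string, Action<FakePlayer>> table, string semanticValue, FakePlayer player)
    {
        Action<FakePlayer> action;
        if (semanticValue == null || !table.TryGetValue(semanticValue, out action))
        {
            return false;
        }
        action(player);
        return true;
    }
}
```

In MainWindow, a similar table typed over MediaElement could replace the switch; keeping the tag values and their actions in one dictionary means adding a new voice command touches only the grammar file and one table entry.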

 

Demonstration

 
When you run your application, you will see the window shown in Figure 1.3.
 

Figure 1.3

When I said “PLAY”, the video started playing.
 

Figure 1.4
 

Figure 1.5

Figure 1.6
 
I was also able to say “Pause” to pause the video, or use my mother tongue and say “IMA” to stop it. I have attached an example project that will guide you.
 

Reference

 
http://www.w3.org/TR/speech-grammar/#S2
 
http://www.dotnetfunda.com/articles/article2050-introduction-to-microsoft-kinect.aspx
 

Conclusion

 
Thank you again for visiting DotNetFunda to learn about this exciting technology; more articles will come soon.
 
Microsoft Kinect is the Future.
 
Vuyiswa Maseko

Biography: Vuyiswa Junius Maseko is a programmer and a moderator at DotNetFunda. Vuyiswa has been developing for 9 years. His major strengths are C# 1.1, 2.0, 3.0 and 3.5 and SQL, and his interests are Silverlight, WPF, C#, Kinect and Xbox game development.