Introduction
I have noticed a few things while developing Kinect examples (I would not call them apps, because I have not yet developed a full-fledged application). A Kinect user interface is different from a traditional user interface, where there are buttons that need to be clicked. Using your voice or hand gestures, you can control your application and do the things that you used to do with a mouse.
In this article I will demonstrate how you can control your application using voice commands.
Objective
The objective of this article is to demonstrate how to control your application using voice commands instead of a mouse.
Author's Preface
When I started creating the example for this article, I first created buttons, and later I removed them. I removed the buttons because I did not need them anymore: I could control my example app using voice commands, and the nice part is that I controlled the application using my mother tongue. This might be the last article I write this year. I will be taking a short break for two weeks, and after that I might give you a new one before the 3rd of January 2013.
Namespaces
There are namespaces that you will need to add which you did not need when following my previous articles on the subject of Microsoft Kinect.

Figure 1.1
The common location of the assembly that contains this namespace is
C:\Windows\assembly\GAC_MSIL\Microsoft.Speech\11.0.0.0__31bf3856ad364e35\Microsoft.Speech.dll
After you have added the required namespaces, we must set up our grammar file. Basically, we want to create an application that plays a video, but we don't want to pause, play, or stop the video using a mouse or by clicking a button. We want to speak commands, and the application must respond accordingly.
Set Up the Grammar File
The grammar file is just an XML file with the following tags:
Rule
A rule definition is represented by the rule element. The id attribute of the element indicates the name of the rule and must be unique within the grammar (this is enforced by XML).
Item
An item element can surround any expansion to permit a repeat attribute or language identifier to be attached. The weight attribute of item is ignored unless the element appears within a one-of element.
Tag
A tag is a legal rule expansion (a tag can also be declared in the grammar header; see S4.1 of the W3C specification listed under Reference). A tag is an arbitrary string that may be included inline within any legal rule expansion. Any number of tags may be included inline within a rule expansion. Tags do not affect the legal word patterns defined by the grammars or the process of recognizing speech or other input given a grammar. Tags may contain content for semantic interpretation. The semantic interpretation processes may affect the recognition result.
Recap
Wikipedia has lots of articles on grammar file specifications. Basically, this file contains the possible words that can be used to control our application. When we are done, we will add the file to our project as a resource and access it from our application, as depicted below.

Figure 1.2
Your full grammar file for this example should look like this:
Full Grammar File
<grammar version="1.0" xml:lang="en-US" tag-format="semantics/1.0-literals" xmlns="http://www.w3.org/2001/06/grammar">
    <rule id="PLAYrule" scope="public">
        <one-of>
            <item>
                <tag>PLAY</tag>
                <one-of>
                    <item>Cala</item> <!-- In Zulu (mother tongue) it means Start -->
                    <item>Start Video</item>
                    <item>Dlala</item> <!-- In Zulu (mother tongue) it means Play -->
                    <item>Play</item>
                </one-of>
            </item>
        </one-of>
    </rule>
    <rule id="Stoprule" scope="public">
        <one-of>
            <item>
                <tag>STOP</tag>
                <one-of>
                    <item>IMA</item> <!-- In Zulu (mother tongue) it means Stop -->
                    <item>Stop Video</item>
                    <item>Stop</item>
                    <item>Misa i Video</item> <!-- In Zulu (mother tongue) it means Stop the Video -->
                </one-of>
            </item>
        </one-of>
    </rule>
    <rule id="Pauserule" scope="public">
        <one-of>
            <item>
                <tag>PAUSE</tag>
                <one-of>
                    <item>WAIT</item>
                    <item>Pause</item>
                </one-of>
            </item>
        </one-of>
    </rule>
    <rule id="Volumeup" scope="public">
        <one-of>
            <item>
                <tag>UP</tag>
                <one-of>
                    <item>PHEZULU</item> <!-- In Zulu (mother tongue) it means Up -->
                    <item>KODIMU</item> <!-- In Sotho slang (mother tongue) it means Up -->
                </one-of>
            </item>
        </one-of>
    </rule>
    <rule id="Volumedown" scope="public">
        <one-of>
            <item>
                <tag>DOWN</tag>
                <one-of>
                    <item>PHANSI</item> <!-- In Zulu (mother tongue) it means Down -->
                    <item>KOTLASI</item> <!-- In Sotho slang (mother tongue) it means Down -->
                </one-of>
            </item>
        </one-of>
    </rule>
</grammar>
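If you want to double-check which spoken phrases map to which command, the grammar can be read like any other XML file. The following is only a minimal sketch and is not part of the attached project: the class name GrammarInspector, the method PhrasesByTag, and the shortened Sample grammar are names I made up for illustration. It uses only System.Xml.Linq and assumes the same layout as the file above, where an outer item carries the tag and an inner one-of lists the phrases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; it is not part of the Kinect or Speech SDKs.
public static class GrammarInspector
{
    // SRGS elements live in this XML namespace (the xmlns value of the grammar element).
    private static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    // A shortened copy of the grammar above, used only for this demonstration.
    public const string Sample = @"<grammar version=""1.0"" xml:lang=""en-US"" tag-format=""semantics/1.0-literals"" xmlns=""http://www.w3.org/2001/06/grammar"">
  <rule id=""PLAYrule"" scope=""public"">
    <one-of>
      <item>
        <tag>PLAY</tag>
        <one-of>
          <item>Dlala</item>
          <item>Play</item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>";

    // Returns a map from each semantic tag (e.g. PLAY) to the phrases that trigger it.
    public static Dictionary<string, List<string>> PhrasesByTag(string grammarXml)
    {
        XDocument doc = XDocument.Parse(grammarXml);
        var result = new Dictionary<string, List<string>>();

        foreach (XElement rule in doc.Root.Elements(Srgs + "rule"))
        {
            // The outer <item> carries the <tag>; its inner <one-of> lists the phrases.
            foreach (XElement outerItem in rule.Elements(Srgs + "one-of").Elements(Srgs + "item"))
            {
                XElement tag = outerItem.Element(Srgs + "tag");
                if (tag == null)
                {
                    continue;
                }

                List<string> phrases = outerItem.Elements(Srgs + "one-of")
                    .Elements(Srgs + "item")
                    .Select(item => item.Value.Trim())
                    .ToList();

                result[tag.Value.Trim()] = phrases;
            }
        }

        return result;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, List<string>> entry in PhrasesByTag(Sample))
        {
            // prints "PLAY: Dlala, Play"
            Console.WriteLine(entry.Key + ": " + string.Join(", ", entry.Value));
        }
    }
}
```

Running this against the full grammar file should list one line per tag, which is a quick way to spot a phrase that ended up under the wrong command.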
As usual, when creating Kinect applications we create a WPF application and add a reference to our friend, the “Microsoft.Kinect” library. If you don’t know this, I suggest you read my previous articles on the subject of Microsoft Kinect.
<Window x:Class="VoiceCommandsInKinect.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="572.244" Width="848.43">
    <Grid Margin="0,0,-5.6,4.4">
        <Label x:Name="Status" HorizontalAlignment="Left" VerticalAlignment="Top" Height="60" Width="100" Margin="0,0,748,478" >
        </Label>
        <Image Source="..\image\Logo.png" Height="60" Width="100" HorizontalAlignment="Center" VerticalAlignment="Top" Margin="371,2,0,453" />
        <MediaElement x:Name="VideoPlayer" LoadedBehavior="Manual" UnloadedBehavior="Stop" Margin="0,90,0,10" ></MediaElement>
    </Grid>
</Window>
This is just a simple media element named “VideoPlayer”, and I decorated my window with a Microsoft Kinect logo so that our example looks cool. I always try to comment my code line by line where it might not make sense to a reader who is new to the technology. The following code is commented to the best of my ability; if you have any questions, you can write a comment and I will explain and add a comment where needed.
using Microsoft.Kinect;
using Microsoft.Speech.Recognition;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Media;

namespace VoiceCommandsInKinect
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        #region "Constants"
        // Names of the speech grammar rules in the grammar file.
        // Each name must match the rule id in the file exactly; it is case sensitive.

        //For the Play functionality
        private const string PLAYrule = "PLAYrule";
        //For the Stop functionality
        private const string Stoprule = "Stoprule";
        //For the Pause functionality
        private const string Pauserule = "Pauserule";
        //For the Volume down functionality
        private const string Volumedownrule = "Volumedown";
        //For the Volume up functionality
        private const string Volumeuprule = "Volumeup";

        /// <summary>
        /// Speech recognizer used to detect voice commands issued by application users.
        /// </summary>
        // Note: SpeechRecognizer and SpeechRecognizerEventArgs are not defined in this listing;
        // they are assumed to be a helper wrapper included in the attached example project.
        private SpeechRecognizer speechRecognizer;

        /// <summary>
        /// Speech grammars used by the application.
        /// </summary>
        private Grammar PlayGrammar;
        private Grammar StopGrammar;
        private Grammar PauseGrammar;
        private Grammar VolumeupGrammar;
        private Grammar VolumedownGrammar;
        #endregion

        /// <summary>
        /// Initializes a new instance of the MainWindow class.
        /// </summary>
        public MainWindow()
        {
            InitializeComponent();
            //What should happen when the application is loaded
            Loaded += MainWindow_Loaded;
            //What should happen when the application is unloaded
            Unloaded += MainWindow_Unloaded;
            //What should happen when the application is closing
            Closing += MainWindow_Closing;
        }
        //Stop the sensor when the application is closing
        void MainWindow_Closing(object sender, System.ComponentModel.CancelEventArgs e)
        {
            if (sensor != null)
            {
                sensor.Stop();
            }
        }

        //Stop the sensor when the window is unloaded
        void MainWindow_Unloaded(object sender, RoutedEventArgs e)
        {
            if (sensor != null)
            {
                sensor.Stop();
            }
        }

        //Get the first sensor, or null when no sensor is attached
        //(indexing KinectSensors[0] throws when the collection is empty)
        KinectSensor sensor = KinectSensor.KinectSensors.FirstOrDefault();

        void MainWindow_Loaded(object sender, RoutedEventArgs e)
        {
            //No sensor attached at all
            if (sensor == null)
            {
                Status.Content = "Kinect Sensor is not Connected";
                Status.Background = new SolidColorBrush(Colors.Orange);
                Status.Foreground = new SolidColorBrush(Colors.Black);
                return;
            }

            //Check if the sensor is connected
            if (sensor.Status == KinectStatus.Connected)
            {
                //Start the sensor
                sensor.Start();
                //Nice message with colors to alert you whether your sensor is working or not
                Status.Content = "Kinect Ready";
                Status.Background = new SolidColorBrush(Colors.Green);
                Status.Foreground = new SolidColorBrush(Colors.White);
                //Create and configure the speech grammars and the recognizer
                this.PlayGrammar = CreateGrammar(PLAYrule);
                this.StopGrammar = CreateGrammar(Stoprule);
                this.PauseGrammar = CreateGrammar(Pauserule);
                this.VolumedownGrammar = CreateGrammar(Volumedownrule);
                this.VolumeupGrammar = CreateGrammar(Volumeuprule);
                //Recognize the speech
                this.speechRecognizer = SpeechRecognizer.Create(new[] { PlayGrammar, StopGrammar, PauseGrammar, VolumeupGrammar, VolumedownGrammar });
                if (null != speechRecognizer)
                {
                    this.speechRecognizer.SpeechRecognized += SpeechRecognized;
                    this.speechRecognizer.Start(sensor.AudioSource);
                }
            }
            else if (sensor.Status == KinectStatus.Disconnected)
            {
                Status.Content = "Kinect Sensor is not Connected";
                Status.Background = new SolidColorBrush(Colors.Orange);
                Status.Foreground = new SolidColorBrush(Colors.Black);
            }
            else if (sensor.Status == KinectStatus.NotPowered)
            {
                Status.Content = "Kinect Sensor is not Powered";
                Status.Background = new SolidColorBrush(Colors.Red);
                Status.Foreground = new SolidColorBrush(Colors.Black);
            }
            else if (sensor.Status == KinectStatus.NotReady)
            {
                Status.Content = "Kinect Sensor is not Ready";
                Status.Background = new SolidColorBrush(Colors.Red);
                Status.Foreground = new SolidColorBrush(Colors.Black);
            }
        }
        private void SpeechRecognized(object sender, SpeechRecognizerEventArgs e)
        {
            //These values must match the <tag> values in the grammar file
            //Play the video
            const string Play = "PLAY";
            //Stop the video
            const string StopCommand = "STOP";
            //Pause
            const string PauseCommand = "PAUSE";
            //Volume down
            const string VolumedownCommand = "DOWN";
            //Volume up
            const string VolumeupCommand = "UP";

            //Nothing to do when the recognizer returned no semantic value
            if (null == e.SemanticValue)
            {
                return;
            }

            //Handle the video control commands
            switch (e.SemanticValue)
            {
                case Play:
                    PlayVideo();
                    return;
                case StopCommand:
                    VideoPlayer.Stop();
                    return;
                case PauseCommand:
                    VideoPlayer.Pause();
                    return;
                case VolumedownCommand:
                    //0 is silent
                    VideoPlayer.Volume = 0;
                    return;
                case VolumeupCommand:
                    //1 is full volume
                    VideoPlayer.Volume = 1;
                    return;
            }
        }
        /// <summary>
        /// Create a grammar from the grammar definition XML file.
        /// </summary>
        /// <param name="ruleName">Rule corresponding to the grammar we want to use.</param>
        /// <returns>New grammar object corresponding to the specified rule.</returns>
        private Grammar CreateGrammar(string ruleName)
        {
            Grammar grammar;
            //Access the grammar file that was added to the project as a resource
            using (var memoryStream = new MemoryStream(Encoding.ASCII.GetBytes(Properties.Resources.SpeechGrammar)))
            {
                grammar = new Grammar(memoryStream, ruleName);
            }
            return grammar;
        }

        //Function to play a video
        private void PlayVideo()
        {
            VideoPlayer.Source = new Uri(@"D:\Articles\How to use Voice Commands in Kinect\VoiceCommandsInKinect\VoiceCommandsInKinect\KinectSDK.wmv", UriKind.Absolute);
            VideoPlayer.LoadedBehavior = MediaState.Manual;
            VideoPlayer.Play();
        }
    }
}
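The handler above jumps straight to silent (0) or full volume (1). If you would rather raise or lower the volume a little with each command, one option is to step the value and keep it inside the 0 to 1 range that MediaElement.Volume accepts. This is only a sketch under my own assumptions: the class name VolumeStepper, the method Step, and the step size of 0.1 are made up for illustration and are not part of the attached project.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class VolumeStepper
{
    // Assumed step size; pick whatever feels right for your video.
    public const double DefaultStep = 0.1;

    // Adds delta to the current volume and clamps the result to the range 0..1.
    public static double Step(double current, double delta)
    {
        return Math.Max(0.0, Math.Min(1.0, current + delta));
    }

    public static void Main()
    {
        double volume = 0.5;

        // Going up by 0.25 stays inside the range.
        volume = Step(volume, 0.25);
        Console.WriteLine(volume);

        // Going up by another 0.5 would overshoot, so the value is clamped to 1.
        volume = Step(volume, 0.5);
        Console.WriteLine(volume);

        // Going down by 2 would undershoot, so the value is clamped to 0.
        volume = Step(volume, -2.0);
        Console.WriteLine(volume);
    }
}
```

In the SpeechRecognized handler, the UP case could then read VideoPlayer.Volume = VolumeStepper.Step(VideoPlayer.Volume, VolumeStepper.DefaultStep); and the DOWN case would pass the negative step instead.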
Demonstration
When you run your application, you will see the following:

Figure 1.3
And when I said “PLAY”, the video started playing.

Figure 1.4

Figure 1.5

Figure 1.6
I was able to say “Pause”, and when I used my mother tongue and said “IMA”, the video stopped. I have attached an example project that will guide you.
Reference
http://www.w3.org/TR/speech-grammar/#S2
http://www.dotnetfunda.com/articles/article2050-introduction-to-microsoft-kinect.aspx
Conclusion
Thank you again for visiting Dotnetfunda to learn about this exciting technology. More articles will come soon.
Microsoft Kinect is the Future.
Vuyiswa Maseko