Kinect Toolbox 1.1: Template-based posture detector and Voice Commander

In a previous article, I introduced the Kinect Toolbox: https://blogs.msdn.com/b/eternalcoding/archive/2011/07/04/gestures-and-tools-for-kinect.aspx

Kinect Toolbox v1.1 is now out, and this new version adds some cool features:

  • Templated posture detector
  • Voice Commander
  • NuGet package

You can find the toolbox here: https://kinecttoolbox.codeplex.com or you can grab it using NuGet: https://nuget.org/List/Packages/KinectToolbox

Templated posture detector

Using the same algorithm as TemplatedGestureDetector, you can now use a learning machine and a matching system to detect postures. In the sample that ships with the toolbox, I detect the "T" posture (i.e. when your body is shaped like the letter T).

To do that, I developed a new class, TemplatedPostureDetector, which uses an internal learning machine (like the gesture detector):




    public class TemplatedPostureDetector : PostureDetector
    {
        // Matching thresholds passed to the learning machine.
        const float Epsilon = 0.02f;
        const float MinimalScore = 0.95f;
        const float MinimalSize = 0.1f;
        readonly LearningMachine learningMachine;
        readonly string postureName;

        public LearningMachine LearningMachine
        {
            get { return learningMachine; }
        }

        // kbStream is the knowledge base stream handed to the learning machine.
        public TemplatedPostureDetector(string postureName, Stream kbStream) : base(4)
        {
            this.postureName = postureName;
            learningMachine = new LearningMachine(kbStream);
        }

        // Raises PostureDetected when the skeleton's joints match a stored template.
        public override void TrackPostures(ReplaySkeletonData skeleton)
        {
            if (LearningMachine.Match(skeleton.Joints.ToListOfVector2(), Epsilon, MinimalScore, MinimalSize))
                RaisePostureDetected(postureName);
        }

        // Stores the current skeleton's joints as a new template.
        public void AddTemplate(ReplaySkeletonData skeleton)
        {
            RecordedPath recordedPath = new RecordedPath(skeleton.Joints.Count);

            recordedPath.Points.AddRange(skeleton.Joints.ToListOfVector2());

            LearningMachine.AddPath(recordedPath);
        }

        // Writes the learning machine's templates to the given stream.
        public void SaveState(Stream kbStream)
        {
            LearningMachine.Persist(kbStream);
        }
    }
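
To give an intuition for what "template matching with a minimal score" means, here is a minimal sketch. It is not the toolbox's LearningMachine algorithm, which is not shown in this article. It assumes a much simpler scheme: rescale both point sets into a unit square, average the point-to-point distances, and turn that into a score between 0 and 1. The names Point2 and PostureMatchSketch are hypothetical stand-ins, and the 0.95 threshold in the usage note only mirrors the MinimalScore constant above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a 2D joint position; not the toolbox's Vector2.
public struct Point2
{
    public readonly float X, Y;
    public Point2(float x, float y) { X = x; Y = y; }
}

// Simplified illustration only; the real LearningMachine may work differently.
public static class PostureMatchSketch
{
    // Translates and uniformly rescales the points so their bounding box
    // fits in the unit square, making the comparison position- and size-independent.
    public static List<Point2> Normalize(IList<Point2> points)
    {
        float minX = points.Min(p => p.X), maxX = points.Max(p => p.X);
        float minY = points.Min(p => p.Y), maxY = points.Max(p => p.Y);
        float size = Math.Max(maxX - minX, maxY - minY);
        if (size <= 0f) size = 1f; // degenerate case: all points identical
        return points.Select(p => new Point2((p.X - minX) / size, (p.Y - minY) / size)).ToList();
    }

    // Returns a score in [0, 1]; 1 means the normalized point sets coincide.
    public static float Score(IList<Point2> candidate, IList<Point2> template)
    {
        if (candidate.Count != template.Count || candidate.Count == 0) return 0f;
        List<Point2> a = Normalize(candidate);
        List<Point2> b = Normalize(template);
        double total = 0;
        for (int i = 0; i < a.Count; i++)
        {
            double dx = a[i].X - b[i].X, dy = a[i].Y - b[i].Y;
            total += Math.Sqrt(dx * dx + dy * dy);
        }
        double mean = total / a.Count;
        return (float)Math.Max(0.0, 1.0 - mean); // clamp: mean distance can exceed 1
    }

    // True when at least one stored template reaches the minimal score.
    public static bool Matches(IList<Point2> candidate, IEnumerable<IList<Point2>> templates, float minimalScore)
    {
        return templates.Any(t => Score(candidate, t) >= minimalScore);
    }
}
```

With such a scheme, calling PostureMatchSketch.Matches(joints, templates, 0.95f) on each frame would play the role that LearningMachine.Match plays in TrackPostures. Adding more templates makes the detector more tolerant, because only one of them has to reach the threshold.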





To use this class, we only need to instantiate it and give it some templates (using the [Capture T] button or using a previously saved file). After that, the class can track postures for each skeleton it receives:




    // Setup: load (or create) the knowledge base and subscribe to detections.
    Stream recordStream = File.Open(letterT_KBPath, FileMode.OpenOrCreate);
    templatePostureDetector = new TemplatedPostureDetector("T", recordStream);
    templatePostureDetector.PostureDetected += templatePostureDetector_PostureDetected;

    // For each skeleton received from the sensor:
    templatePostureDetector.TrackPostures(skeleton);

    // Called when the posture is recognized.
    void templatePostureDetector_PostureDetected(string posture)
    {
        MessageBox.Show("Give me a......." + posture);
    }




Voice Commander


One thing worth noting when you develop with Kinect is that you will spend your time getting up and sitting down :). In the previous article, I introduced the replay system, which is very useful for recording a Kinect session.

But when you are alone, even recording is painful, because you cannot be in front of the sensor and in front of your keyboard at the same time to start and stop the recording.

So here enters the Voice Commander (tadam!!). This class takes a list of words and raises an event when it detects one of them (using the sensor's microphone array). For example, you can use "record" and "stop" orders to start and stop a recording session while you stay in front of the sensor!


The code is really simple (thanks to Kinect for Windows SDK and Microsoft Speech Platform SDK):




    public class VoiceCommander
    {
        const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
        Thread workingThread;
        readonly Choices choices;
        // Set by Stop() from another thread, so it is marked volatile.
        volatile bool isRunning;

        public event Action<string> OrderDetected;

        public VoiceCommander(params string[] orders)
        {
            choices = new Choices();
            choices.Add(orders);
        }

        public void Start()
        {
            workingThread = new Thread(Record);
            workingThread.IsBackground = true;
            workingThread.SetApartmentState(ApartmentState.MTA);
            workingThread.Start();
        }

        void Record()
        {
            using (KinectAudioSource source = new KinectAudioSource
            {
                FeatureMode = true,
                AutomaticGainControl = false,
                SystemMode = SystemMode.OptibeamArrayOnly
            })
            {
                // Look up the Kinect recognizer; do nothing if it is not installed.
                RecognizerInfo recognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().FirstOrDefault(r => r.Id == RecognizerId);

                if (recognizerInfo == null)
                    return;

                SpeechRecognitionEngine speechRecognitionEngine = new SpeechRecognitionEngine(recognizerInfo.Id);

                // Build a grammar that accepts only the configured orders.
                var gb = new GrammarBuilder { Culture = recognizerInfo.Culture };
                gb.Append(choices);

                var grammar = new Grammar(gb);

                speechRecognitionEngine.LoadGrammar(grammar);
                using (Stream sourceStream = source.Start())
                {
                    speechRecognitionEngine.SetInputToAudioStream(sourceStream, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

                    isRunning = true;
                    while (isRunning)
                    {
                        RecognitionResult result = speechRecognitionEngine.Recognize();

                        // Copy the delegate so a concurrent unsubscribe cannot null it mid-call.
                        Action<string> handler = OrderDetected;
                        if (result != null && handler != null && result.Confidence > 0.7)
                            handler(result.Text);
                    }
                }
            }
        }

        public void Stop()
        {
            isRunning = false;
        }
    }
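
A side note on stopping: in the class above, Stop() only clears a flag, and the loop rereads that flag only after the synchronous Recognize() call returns. The flag is also set to true inside the worker thread, so a Stop() issued right after Start() could be overwritten. The following sketch shows one possible way to structure such a loop, under the assumption that the blocking call can be given a timeout. It uses a BlockingCollection<string> as a stand-in for the recognizer, and PollingWorkerSketch is a hypothetical name, not part of the toolbox.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical illustration of a stoppable background loop; not toolbox code.
public class PollingWorkerSketch
{
    readonly BlockingCollection<string> source; // stand-in for the speech recognizer
    volatile bool isRunning;
    Thread worker;

    public event Action<string> ItemReceived;

    public PollingWorkerSketch(BlockingCollection<string> source)
    {
        this.source = source;
    }

    public void Start()
    {
        // Set the flag before the thread starts so an early Stop() is not lost.
        isRunning = true;
        worker = new Thread(Loop) { IsBackground = true };
        worker.Start();
    }

    void Loop()
    {
        while (isRunning)
        {
            string item;
            // Bounded wait: the flag is rechecked at least every 200 ms.
            if (source.TryTake(out item, TimeSpan.FromMilliseconds(200)))
            {
                Action<string> handler = ItemReceived;
                if (handler != null)
                    handler(item);
            }
        }
    }

    public void Stop()
    {
        isRunning = false;
        worker.Join(); // returns once the loop has observed the flag
    }
}
```

The design choice here is simply to bound every wait, so that Stop() takes effect within a known delay instead of waiting for the next recognized word.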




Using this class is really simple:

    voiceCommander = new VoiceCommander("record", "stop");
    voiceCommander.OrderDetected += voiceCommander_OrderDetected;

    voiceCommander.Start();

    void voiceCommander_OrderDetected(string order)
    {
        Dispatcher.Invoke(new Action(() =>
        {
            if (audioControl.IsChecked == false)
                return;

            switch (order)
            {
                case "record":
                    DirectRecord(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "kinectRecord" + Guid.NewGuid() + ".replay"));
                    break;
                case "stop":
                    StopRecord();
                    break;
            }
        }));
    }
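
If the list of orders grows, the switch in the handler can become long. As a hedged alternative, here is a small sketch of a lookup table that maps each order to an action. OrderRouterSketch and its members are hypothetical names, not part of Kinect Toolbox, and the sketch does no thread marshaling, so UI code would still need to go through Dispatcher.Invoke as in the handler above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps recognized orders to actions; not toolbox code.
public class OrderRouterSketch
{
    // Case-insensitive so "Record" and "record" hit the same handler.
    readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // Registers (or replaces) the action to run for an order.
    public void Register(string order, Action handler)
    {
        handlers[order] = handler;
    }

    // Runs the action registered for the order; returns false for unknown orders.
    public bool Dispatch(string order)
    {
        Action handler;
        if (order == null || !handlers.TryGetValue(order, out handler))
            return false;
        handler();
        return true;
    }
}
```

A possible usage would be to register "record" and "stop" once at startup, then call Dispatch(order) from the OrderDetected handler. The keys of the table could also be passed to the VoiceCommander constructor so that both lists stay in sync.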

Conclusion

With Kinect Toolbox 1.1, you have a set of tools to help you develop fun and powerful applications with Kinect for Windows SDK!