Gestures and Tools for Kinect

As a regular reader of this blog, you have certainly not missed that the Kinect for Windows SDK is out!

For now, however, no gesture recognition services are available. So throughout this article we will create our own library that automatically detects simple movements such as a swipe, but also more complex movements such as drawing a circle with your hand.


[Figure: detecting a swipe and a circle drawn with the hand]

The detection of such gestures enables controlling PowerPoint the Jedi way! (similar to the Kinect Keyboard Simulator demo).

If you are not familiar with the Kinect for Windows SDK, you should read a previous post that addressed the topic: https://blogs.msdn.com/b/eternalcoding/archive/2011/06/13/unleash-the-power-of-kinect-for-windows-sdk.aspx

How to detect gestures?

There is an infinite number of solutions for detecting a gesture. In this article I will explore two of them:

  • Algorithmic search
  • Template based search

Note that these two techniques have many variants and refinements.

You can find the code used in this article right here: [https://kinecttoolbox.codeplex.com](https://kinecttoolbox.codeplex.com)


GestureDetector class

To standardize the use of our gesture system, we will define an abstract GestureDetector class inherited by all gesture classes:

[Figure: GestureDetector class diagram]

This class provides the Add method used to record the different positions of the skeleton’s joints.

It also provides the abstract method LookForGesture implemented by the children.

It stores a list of Entry objects in the Entries property, whose role is to save the position and timestamp of each recorded sample.

Drawing stored positions

The Entry class also stores a WPF ellipse that will be used to draw the stored position:

[Figure: Entry class diagram]

Via the TraceTo method of the GestureDetector class, we will indicate which canvas will be used to draw the stored positions.
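
Since the class diagrams above do not survive here in detail, here is a minimal sketch of what Entry and the skeleton of GestureDetector might look like. The member names are taken from the Add method shown below; the default window size is an assumption:

```csharp
// Minimal sketch, inferred from the Add method below; not the exact toolbox code.
public class Entry
{
    public Vector3 Position { get; set; }
    public DateTime Time { get; set; }
    public Ellipse DisplayEllipse { get; set; }
}

public abstract class GestureDetector
{
    protected Canvas displayCanvas;          // set by TraceTo
    protected Color displayColor;            // color of the feedback ellipses

    public int WindowSize { get; set; }      // maximum number of stored entries
    public List<Entry> Entries { get; private set; }

    public event Action<SupportedGesture> OnGestureDetected;

    protected GestureDetector(int windowSize = 20) // default value is an assumption
    {
        WindowSize = windowSize;
        Entries = new List<Entry>();
    }

    public void TraceTo(Canvas canvas, Color color)
    {
        displayCanvas = canvas;
        displayColor = color;
    }

    protected abstract void LookForGesture();
}
```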

In the end, all the work is done in the Add method:

```csharp
public virtual void Add(Vector position, SkeletonEngine engine)
{
    Entry newEntry = new Entry { Position = position.ToVector3(), Time = DateTime.Now };
    Entries.Add(newEntry);

    if (displayCanvas != null)
    {
        newEntry.DisplayEllipse = new Ellipse
        {
            Width = 4,
            Height = 4,
            HorizontalAlignment = HorizontalAlignment.Left,
            VerticalAlignment = VerticalAlignment.Top,
            StrokeThickness = 2.0,
            Stroke = new SolidColorBrush(displayColor),
            StrokeLineJoin = PenLineJoin.Round
        };

        float x, y;

        // Convert the skeleton-space position to a depth-image coordinate in [0, 1]
        engine.SkeletonToDepthImage(position, out x, out y);

        x = (float)(x * displayCanvas.ActualWidth);
        y = (float)(y * displayCanvas.ActualHeight);

        Canvas.SetLeft(newEntry.DisplayEllipse, x - newEntry.DisplayEllipse.Width / 2);
        Canvas.SetTop(newEntry.DisplayEllipse, y - newEntry.DisplayEllipse.Height / 2);

        displayCanvas.Children.Add(newEntry.DisplayEllipse);
    }

    // Keep only the last WindowSize entries
    if (Entries.Count > WindowSize)
    {
        Entry entryToRemove = Entries[0];

        if (displayCanvas != null)
        {
            displayCanvas.Children.Remove(entryToRemove.DisplayEllipse);
        }

        Entries.Remove(entryToRemove);
    }

    LookForGesture();
}
```

Note the use of the SkeletonToDepthImage method which converts a 3D coordinate to a 2D coordinate between 0 and 1 on each axis.

So in addition to saving the positions of the joints, the GestureDetector class can draw them to give visual feedback, which greatly simplifies the development and debugging phases:

[Figure: tracked positions drawn in red over the Kinect camera image]

As we can see above, the positions being analyzed are shown in red above the Kinect image. To activate this service, the developer just needs to put a canvas over the image that shows the stream of the Kinect camera and pass this canvas to the GestureDetector.TraceTo method:

```xml
<Viewbox Margin="5" Grid.RowSpan="5">
    <Grid Width="640" Height="480" ClipToBounds="True">
        <Image x:Name="kinectDisplay"></Image>
        <Canvas x:Name="kinectCanvas"></Canvas>
        <Canvas x:Name="gesturesCanvas"></Canvas>
        <Rectangle Stroke="Black" StrokeThickness="1"/>
    </Grid>
</Viewbox>
```

The Viewbox is used to keep the image and the canvas at the same size. The second canvas (kinectCanvas) is used to display the green skeleton (using a class available in the sample: SkeletonDisplayManager).

Event-based approach

The GestureDetector class provides one last service to its children: the RaiseGestureDetected method, which reports the detection of a new gesture via the OnGestureDetected event. The SupportedGesture argument of this event can take the following values:

  • SwipeToLeft
  • SwipeToRight
  • Circle

Obviously the solution is extensible and I encourage you to add new gestures to the system.

The RaiseGestureDetected method (together with the MinimalPeriodBetweenGestures property) also guarantees that a certain time elapses between two gestures (in order to filter out badly executed gestures).
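
A possible implementation of this filter might look like the following (a sketch only; lastGestureDate is an assumed private field, and the actual toolbox code may differ):

```csharp
// Sketch: ignore a detection if the previous gesture is too recent.
protected void RaiseGestureDetected(SupportedGesture gesture)
{
    if (DateTime.Now.Subtract(lastGestureDate).TotalMilliseconds > MinimalPeriodBetweenGestures)
    {
        if (OnGestureDetected != null)
            OnGestureDetected(gesture);

        lastGestureDate = DateTime.Now;
    }
}
```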

Now that our foundations are laid, we can develop our algorithms.

Algorithmic search

The algorithmic search walks through the list of recorded positions and checks that predefined constraints remain valid.

The SwipeGestureDetector class is responsible for this search:

[Figure: SwipeGestureDetector class diagram]

For the SwipeToRight gesture, we will use the following constraints:

  • Each new position should be to the right of the previous one
  • Each position must not deviate in height from the first one by more than a given distance (20 cm)
  • The time between the first and last position must be between 250ms and 1500ms
  • The gesture must be at least 40 cm long

The SwipeToLeft gesture is based on the same constraints, except for the direction of the movement of course.

To effectively manage these two gestures, we use a generic algorithm that checks the four constraints mentioned above:

```csharp
bool ScanPositions(Func<Vector3, Vector3, bool> heightFunction, Func<Vector3, Vector3, bool> directionFunction, Func<Vector3, Vector3, bool> lengthFunction, int minTime, int maxTime)
{
    int start = 0;

    for (int index = 1; index < Entries.Count - 1; index++)
    {
        // Restart the scan as soon as the height or direction constraint is broken
        if (!heightFunction(Entries[0].Position, Entries[index].Position) || !directionFunction(Entries[index].Position, Entries[index + 1].Position))
        {
            start = index;
        }

        // When the gesture is long enough, check its duration
        if (lengthFunction(Entries[index].Position, Entries[start].Position))
        {
            double totalMilliseconds = (Entries[index].Time - Entries[start].Time).TotalMilliseconds;
            if (totalMilliseconds >= minTime && totalMilliseconds <= maxTime)
            {
                return true;
            }
        }
    }

    return false;
}
```

To use this method, we must provide three constraint functions and a time range to check.

So to handle the two gestures, we simply call the following code:

```csharp
protected override void LookForGesture()
{
    // Swipe to right
    if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
        (p1, p2) => p2.X - p1.X > -0.01f, // Progression to right
        (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
        SwipeMininalDuration, SwipeMaximalDuration)) // Duration
    {
        RaiseGestureDetected(SupportedGesture.SwipeToRight);
        return;
    }

    // Swipe to left
    if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
        (p1, p2) => p2.X - p1.X < 0.01f, // Progression to left
        (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
        SwipeMininalDuration, SwipeMaximalDuration)) // Duration
    {
        RaiseGestureDetected(SupportedGesture.SwipeToLeft);
        return;
    }
}
```

With this class, it is really simple to add new gestures that can be described with constraints.
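
For instance, a hypothetical upward swipe could reuse ScanPositions by swapping the roles of X and Y (SwipeUp is not part of the SupportedGesture enumeration above; you would have to add it):

```csharp
// Hypothetical swipe up: X must stay stable while Y progresses upward.
if (ScanPositions((p1, p2) => Math.Abs(p2.X - p1.X) < SwipeMaximalHeight, // Lateral drift
    (p1, p2) => p2.Y - p1.Y > -0.01f, // Progression upward (Y grows upward in skeleton space)
    (p1, p2) => Math.Abs(p2.Y - p1.Y) > SwipeMinimalLength, // Length
    SwipeMininalDuration, SwipeMaximalDuration)) // Duration
{
    RaiseGestureDetected(SupportedGesture.SwipeUp); // new enumeration value to add
    return;
}
```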

Skeleton stability

To ensure that our detection works properly, we must check that the skeleton is actually static, so as not to generate false gestures (e.g. mistaking a movement of the whole body for a swipe).

To do that, we will use the BarycenterHelper class:

```csharp
public class BarycenterHelper
{
    readonly Dictionary<int, List<Vector3>> positions = new Dictionary<int, List<Vector3>>();
    readonly int windowSize;

    public float Threshold { get; set; }

    public BarycenterHelper(int windowSize = 20, float threshold = 0.05f)
    {
        this.windowSize = windowSize;
        Threshold = threshold;
    }

    public bool IsStable(int trackingID)
    {
        List<Vector3> currentPositions = positions[trackingID];
        if (currentPositions.Count != windowSize)
            return false;

        Vector3 current = currentPositions[currentPositions.Count - 1];

        for (int index = 0; index < currentPositions.Count - 2; index++)
        {
            if ((currentPositions[index] - current).Length() > Threshold)
                return false;
        }

        return true;
    }

    public void Add(Vector3 position, int trackingID)
    {
        if (!positions.ContainsKey(trackingID))
            positions.Add(trackingID, new List<Vector3>());

        positions[trackingID].Add(position);

        if (positions[trackingID].Count > windowSize)
            positions[trackingID].RemoveAt(0);
    }
}
```

By feeding this class the successive positions of the skeleton, we can ask it (via the IsStable method) whether the skeleton is moving or static.

Thus, we can use this information to send the positions of the joints to the detection systems only when the skeleton is not in motion:

```csharp
void ProcessFrame(ReplaySkeletonFrame frame)
{
    Dictionary<int, string> stabilities = new Dictionary<int, string>();
    foreach (var skeleton in frame.Skeletons)
    {
        if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
            continue;

        barycenterHelper.Add(skeleton.Position.ToVector3(), skeleton.TrackingID);

        stabilities.Add(skeleton.TrackingID, barycenterHelper.IsStable(skeleton.TrackingID) ? "Stable" : "Unstable");
        if (!barycenterHelper.IsStable(skeleton.TrackingID))
            continue;

        foreach (Joint joint in skeleton.Joints)
        {
            if (joint.Position.W < 0.8f || joint.TrackingState != JointTrackingState.Tracked)
                continue;

            if (joint.ID == JointID.HandRight)
            {
                swipeGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);
                circleGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);
            }
        }

        postureRecognizer.TrackPostures(skeleton);
    }

    skeletonDisplayManager.Draw(frame);

    stabilitiesList.ItemsSource = stabilities;

    currentPosture.Text = "Current posture: " + postureRecognizer.CurrentPosture.ToString();
}
```

Replay & Recording tools

Another important point when developing with the Kinect for Windows SDK is the ability to test efficiently. It may sound silly, but to test you must get up, move in front of the sensor and make the appropriate gesture. And unless you have an assistant, this can quickly become painful.

So we will build a record and replay service for the information sent by Kinect.

Recording

The recording part is pretty simple because we only have to take a SkeletonFrame and iterate over each skeleton to serialize its contents:

```csharp
public void Record(SkeletonFrame frame)
{
    if (writer == null)
        throw new Exception("You must call Start before calling Record");

    // Store the elapsed time since the previous frame
    TimeSpan timeSpan = DateTime.Now.Subtract(referenceTime);
    referenceTime = DateTime.Now;
    writer.Write((long)timeSpan.TotalMilliseconds);
    writer.Write(frame.FloorClipPlane);
    writer.Write((int)frame.Quality);
    writer.Write(frame.NormalToGravity);

    writer.Write(frame.Skeletons.Length);

    foreach (SkeletonData skeleton in frame.Skeletons)
    {
        writer.Write((int)skeleton.TrackingState);
        writer.Write(skeleton.Position);
        writer.Write(skeleton.TrackingID);
        writer.Write(skeleton.EnrollmentIndex);
        writer.Write(skeleton.UserIndex);
        writer.Write((int)skeleton.Quality);

        writer.Write(skeleton.Joints.Count);
        foreach (Joint joint in skeleton.Joints)
        {
            writer.Write((int)joint.ID);
            writer.Write((int)joint.TrackingState);
            writer.Write(joint.Position);
        }
    }
}
```
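
Note that BinaryWriter has no overload for the SDK's Vector type (used here for FloorClipPlane, NormalToGravity and the joint positions), so the calls above rely on a small helper. A possible sketch:

```csharp
// Sketch: serialize the four components of the SDK's Vector structure.
public static class BinaryWriterExtensions
{
    public static void Write(this BinaryWriter writer, Vector vector)
    {
        writer.Write(vector.X);
        writer.Write(vector.Y);
        writer.Write(vector.Z);
        writer.Write(vector.W);
    }
}
```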




Replay

The main problem with the replay mechanism is data reconstruction. Indeed, the Kinect classes are sealed and do not expose public constructors. To work around this, we will replicate and mimic the Kinect class hierarchy, adding implicit cast operators from the Kinect classes:

[Figure: replay class hierarchy mirroring the Kinect classes]
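
For example, the replayed frame can be built implicitly from its sealed Kinect counterpart. This is a simplified sketch: only a few members are shown, and ReplaySkeletonData is assumed to define the same kind of operator:

```csharp
// Simplified sketch: a replay class that mimics SkeletonFrame and can be
// created implicitly from it, so live and replayed frames are interchangeable.
public class ReplaySkeletonFrame
{
    public long TimeStamp { get; set; }
    public Vector FloorClipPlane { get; set; }
    public List<ReplaySkeletonData> Skeletons { get; set; }

    public static implicit operator ReplaySkeletonFrame(SkeletonFrame frame)
    {
        var result = new ReplaySkeletonFrame
        {
            FloorClipPlane = frame.FloorClipPlane,
            Skeletons = new List<ReplaySkeletonData>()
        };

        foreach (SkeletonData skeleton in frame.Skeletons)
            result.Skeletons.Add(skeleton); // implicit conversion assumed on ReplaySkeletonData

        return result;
    }
}
```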

The SkeletonReplay class is responsible for the replay through its Start method:

```csharp
public void Start()
{
    context = SynchronizationContext.Current;

    CancellationToken token = cancellationTokenSource.Token;

    Task.Factory.StartNew(() =>
    {
        foreach (ReplaySkeletonFrame frame in frames)
        {
            // Wait for the recorded delay before raising the frame
            Thread.Sleep(TimeSpan.FromMilliseconds(frame.TimeStamp));

            if (token.IsCancellationRequested)
                return;

            ReplaySkeletonFrame closure = frame;
            context.Send(state =>
            {
                if (SkeletonFrameReady != null)
                    SkeletonFrameReady(this, new ReplaySkeletonFrameReadyEventArgs { SkeletonFrame = closure });
            }, null);
        }
    }, token);
}
```

Finally, we can record and replay gestures to debug our application:

[Figure: recording and replaying a session in the sample application]

You can download a replay sample here: https://www.catuhe.com/msdn/davca.replay.zip

Know when to start

The last remaining problem to resolve is deciding when to begin the analysis of gestures. We already know that we must do so when the body is stable, but that is not enough.

Even if I stay static, I use my hands a lot when I speak and I can inadvertently trigger a gesture.

To protect ourselves from this, it is possible to complete the detection by adding a posture condition.

That’s why we are going to use the PostureDetector class:

```csharp
public class PostureDetector
{
    const float Epsilon = 0.1f;
    const float MaxRange = 0.25f;
    const int AccumulatorTarget = 10;

    Posture previousPosture = Posture.None;
    public event Action<Posture> PostureDetected;
    int accumulator;
    Posture accumulatedPosture = Posture.None;

    public Posture CurrentPosture
    {
        get { return previousPosture; }
    }

    public void TrackPostures(ReplaySkeletonData skeleton)
    {
        if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
            return;

        Vector3? headPosition = null;
        Vector3? leftHandPosition = null;
        Vector3? rightHandPosition = null;

        foreach (Joint joint in skeleton.Joints)
        {
            if (joint.Position.W < 0.8f || joint.TrackingState != JointTrackingState.Tracked)
                continue;

            switch (joint.ID)
            {
                case JointID.Head:
                    headPosition = joint.Position.ToVector3();
                    break;
                case JointID.HandLeft:
                    leftHandPosition = joint.Position.ToVector3();
                    break;
                case JointID.HandRight:
                    rightHandPosition = joint.Position.ToVector3();
                    break;
            }
        }

        // HandsJoined
        if (CheckHandsJoined(rightHandPosition, leftHandPosition))
            return;

        // LeftHandOverHead
        if (CheckHandOverHead(headPosition, leftHandPosition))
        {
            RaisePostureDetected(Posture.LeftHandOverHead);
            return;
        }

        // RightHandOverHead
        if (CheckHandOverHead(headPosition, rightHandPosition))
        {
            RaisePostureDetected(Posture.RightHandOverHead);
            return;
        }

        // LeftHello
        if (CheckHello(headPosition, leftHandPosition))
        {
            RaisePostureDetected(Posture.LeftHello);
            return;
        }

        // RightHello
        if (CheckHello(headPosition, rightHandPosition))
        {
            RaisePostureDetected(Posture.RightHello);
            return;
        }

        previousPosture = Posture.None;
        accumulator = 0;
    }

    bool CheckHandOverHead(Vector3? headPosition, Vector3? handPosition)
    {
        if (!handPosition.HasValue || !headPosition.HasValue)
            return false;

        if (handPosition.Value.Y < headPosition.Value.Y)
            return false;

        if (Math.Abs(handPosition.Value.X - headPosition.Value.X) > MaxRange)
            return false;

        if (Math.Abs(handPosition.Value.Z - headPosition.Value.Z) > MaxRange)
            return false;

        return true;
    }

    bool CheckHello(Vector3? headPosition, Vector3? handPosition)
    {
        if (!handPosition.HasValue || !headPosition.HasValue)
            return false;

        if (Math.Abs(handPosition.Value.X - headPosition.Value.X) < MaxRange)
            return false;

        if (Math.Abs(handPosition.Value.Y - headPosition.Value.Y) > MaxRange)
            return false;

        if (Math.Abs(handPosition.Value.Z - headPosition.Value.Z) > MaxRange)
            return false;

        return true;
    }

    bool CheckHandsJoined(Vector3? leftHandPosition, Vector3? rightHandPosition)
    {
        if (!leftHandPosition.HasValue || !rightHandPosition.HasValue)
            return false;

        float distance = (leftHandPosition.Value - rightHandPosition.Value).Length();

        if (distance > Epsilon)
            return false;

        RaisePostureDetected(Posture.HandsJoined);
        return true;
    }

    void RaisePostureDetected(Posture posture)
    {
        if (accumulator < AccumulatorTarget)
        {
            if (accumulatedPosture != posture)
            {
                accumulator = 0;
                accumulatedPosture = posture;
            }
            accumulator++;
            return;
        }

        if (previousPosture == posture)
            return;

        previousPosture = posture;
        if (PostureDetected != null)
            PostureDetected(posture);

        accumulator = 0;
    }
}
```

The PostureDetector class is based on a comparative analysis of the positions of each joint. For example, to trigger the Hello posture, we have to know if the hand is at the same height as the head and at least 25 cm to the side.

In addition, the system uses an accumulator to ensure that the posture is held for a given number of frames.

Once again, this class is highly extensible.
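
For example, the posture events can be used to gate the gesture analysis, in the spirit of the Jedi-style PowerPoint control mentioned at the beginning. This is hypothetical wiring; gesturesEnabled is an assumed field:

```csharp
// Hypothetical wiring: only feed the gesture detectors while a posture is held.
bool gesturesEnabled;

postureRecognizer.PostureDetected += posture =>
{
    // Arm gesture detection when the left hand is raised, disarm otherwise.
    gesturesEnabled = (posture == Posture.LeftHandOverHead);
};

// Then, in ProcessFrame, guard the calls to the detectors:
// if (gesturesEnabled && joint.ID == JointID.HandRight)
//     swipeGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);
```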

Template-based search

The main drawback of the algorithmic search is that not all gestures are easily describable with constraints. We will therefore consider another, more general approach.

We will instead record reference gestures, and the system will subsequently determine whether the current gesture matches one that is already known.

Finally, our goal will be to efficiently compare two gestures.

Compare the comparable

Before we start to write a comparison algorithm, we have to standardize our data.

Indeed, a gesture is a sequence of points (for this article we will simply compare 2D gestures such as circles). However, the coordinates of these points depend on the distance to the sensor, and we will have to bring them into a common reference frame. To do this, we proceed as follows:

  1. Generate a new gesture with a defined number of points
  2. Rotate the gesture so the first point is at 0 degrees
  3. Rescale the gesture to a 1×1 reference square
  4. Center the gesture on the origin

With these changes, we will be able to compare point arrays of the same size, centered on a common origin, with a common scale and direction.

To pack point arrays using this technique, we use the following code:

```csharp
public static List<Vector2> Pack(List<Vector2> positions, int samplesCount)
{
    // Resample the path to a fixed number of points
    List<Vector2> locals = ProjectListToDefinedCount(positions, samplesCount);

    // Rotate the path so that the first point is at 0 degrees
    float angle = GetAngleBetween(locals.Center(), positions[0]);
    locals = locals.Rotate(-angle);

    // Rescale to a 1x1 world and center on the origin
    locals.ScaleToReferenceWorld();
    locals.CenterToOrigin();

    return locals;
}
```

Methods and extension methods are available in the GoldenSectionExtensions static class:

```csharp
public static class GoldenSectionExtensions
{
    // Get length of path
    public static float Length(this List<Vector2> points)
    {
        float length = 0;

        for (int i = 1; i < points.Count; i++)
        {
            length += (points[i - 1] - points[i]).Length();
        }

        return length;
    }

    // Get center of path
    public static Vector2 Center(this List<Vector2> points)
    {
        Vector2 result = points.Aggregate(Vector2.Zero, (current, point) => current + point);

        result /= points.Count;

        return result;
    }

    // Rotate path by given angle
    public static List<Vector2> Rotate(this List<Vector2> positions, float angle)
    {
        List<Vector2> result = new List<Vector2>(positions.Count);
        Vector2 c = positions.Center();

        float cos = (float)Math.Cos(angle);
        float sin = (float)Math.Sin(angle);

        foreach (Vector2 p in positions)
        {
            float dx = p.X - c.X;
            float dy = p.Y - c.Y;

            Vector2 rotatePoint = Vector2.Zero;
            rotatePoint.X = dx * cos - dy * sin + c.X;
            rotatePoint.Y = dx * sin + dy * cos + c.Y;

            result.Add(rotatePoint);
        }
        return result;
    }

    // Average distance between paths
    public static float DistanceTo(this List<Vector2> path1, List<Vector2> path2)
    {
        return path1.Select((t, i) => (t - path2[i]).Length()).Average();
    }

    // Compute bounding rectangle
    public static Rectangle BoundingRectangle(this List<Vector2> points)
    {
        float minX = points.Min(p => p.X);
        float maxX = points.Max(p => p.X);
        float minY = points.Min(p => p.Y);
        float maxY = points.Max(p => p.Y);

        return new Rectangle(minX, minY, maxX - minX, maxY - minY);
    }

    // Check bounding rectangle size
    public static bool IsLargeEnough(this List<Vector2> positions, float minSize)
    {
        Rectangle boundingRectangle = positions.BoundingRectangle();

        return boundingRectangle.Width > minSize && boundingRectangle.Height > minSize;
    }

    // Scale path to 1x1
    public static void ScaleToReferenceWorld(this List<Vector2> positions)
    {
        Rectangle boundingRectangle = positions.BoundingRectangle();
        for (int i = 0; i < positions.Count; i++)
        {
            Vector2 position = positions[i];

            position.X *= 1.0f / boundingRectangle.Width;
            position.Y *= 1.0f / boundingRectangle.Height;

            positions[i] = position;
        }
    }

    // Translate path to origin (0, 0)
    public static void CenterToOrigin(this List<Vector2> positions)
    {
        Vector2 center = positions.Center();
        for (int i = 0; i < positions.Count; i++)
        {
            positions[i] -= center;
        }
    }
}
```

Golden Section

The comparison between our data could be done via a simple average distance function between each point. However, this solution is not accurate enough.

So I used a much more powerful algorithm called Golden Section Search. It comes from a paper available here: https://www.math.uic.edu/~jan/mcs471/Lec9/gss.pdf

A JavaScript implementation is available here: https://depts.washington.edu/aimgroup/proj/dollar/

Here is the implementation in C#:

```csharp
public static float Search(List<Vector2> current, List<Vector2> target, float a, float b, float epsilon)
{
    float x1 = ReductionFactor * a + (1 - ReductionFactor) * b;
    List<Vector2> rotatedList = current.Rotate(x1);
    float fx1 = rotatedList.DistanceTo(target);

    float x2 = (1 - ReductionFactor) * a + ReductionFactor * b;
    rotatedList = current.Rotate(x2);
    float fx2 = rotatedList.DistanceTo(target);

    // Narrow the [a, b] interval around the rotation angle that minimizes the distance
    do
    {
        if (fx1 < fx2)
        {
            b = x2;
            x2 = x1;
            fx2 = fx1;
            x1 = ReductionFactor * a + (1 - ReductionFactor) * b;
            rotatedList = current.Rotate(x1);
            fx1 = rotatedList.DistanceTo(target);
        }
        else
        {
            a = x1;
            x1 = x2;
            fx1 = fx2;
            x2 = (1 - ReductionFactor) * a + ReductionFactor * b;
            rotatedList = current.Rotate(x2);
            fx2 = rotatedList.DistanceTo(target);
        }
    }
    while (Math.Abs(b - a) > epsilon);

    float min = Math.Min(fx1, fx2);

    // Convert the minimal distance into a score between 0 and 1
    return 1.0f - 2.0f * min / Diagonal;
}
```

With this algorithm, we can simply compare a template with the current gesture and get a score (between 0 and 1).

Learning Machine

To improve our success rate, we need to have multiple templates. For this, we will work with the LearningMachine class, whose role is precisely to store our templates (i.e. to learn new models) and to compare them with the current gesture:

```csharp
public class LearningMachine
{
    readonly List<RecordedPath> paths;

    public LearningMachine(Stream kbStream)
    {
        if (kbStream == null || kbStream.Length == 0)
        {
            paths = new List<RecordedPath>();
            return;
        }

        BinaryFormatter formatter = new BinaryFormatter();

        paths = (List<RecordedPath>)formatter.Deserialize(kbStream);
    }

    public List<RecordedPath> Paths
    {
        get { return paths; }
    }

    public bool Match(List<Vector2> entries, float threshold, float minimalScore, float minSize)
    {
        return Paths.Any(path => path.Match(entries, threshold, minimalScore, minSize));
    }

    public void Persist(Stream kbStream)
    {
        BinaryFormatter formatter = new BinaryFormatter();

        formatter.Serialize(kbStream, Paths);
    }

    public void AddPath(RecordedPath path)
    {
        path.CloseAndPrepare();
        Paths.Add(path);
    }
}
```

Each RecordedPath implements the Match method, which in turn calls the golden section search:

```csharp
public bool Match(List<Vector2> positions, float threshold, float minimalScore, float minSize)
{
    if (positions.Count < samplesCount)
        return false;

    if (!positions.IsLargeEnough(minSize))
        return false;

    List<Vector2> locals = GoldenSection.Pack(positions, samplesCount);

    float score = GoldenSection.Search(locals, points, -MathHelper.PiOver4, MathHelper.PiOver4, threshold);

    return score > minimalScore;
}
```

Thus, thanks to our detection algorithm and our learning machine, we end up with a system that is both reliable and largely independent of the quality of the information supplied by Kinect.

You will find below an example of a knowledge base used to recognize a circle: https://www.catuhe.com/msdn/circleKB.zip.
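
A possible way to consume such a knowledge base is sketched below; the file name inside the archive and the exact threshold values are assumptions:

```csharp
// Sketch: load a knowledge base and match the current 2D hand path against it.
using (Stream kbStream = File.OpenRead("circleKB.save")) // file name is an assumption
{
    LearningMachine learningMachine = new LearningMachine(kbStream);

    // currentPath is the List<Vector2> of recorded hand positions.
    if (learningMachine.Match(currentPath, 0.05f, 0.80f, 0.10f)) // threshold values are assumptions
    {
        // A circle was detected.
    }
}
```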

Conclusion

So we now have at our disposal a set of tools for working with Kinect, as well as two systems to detect a large number of gestures.

It’s now your turn to use these services in your Kinect applications!

To go further