This page covers the basics of Kinect gesture recognition. While the Structure page describes the implementation of the gesture controllers in PresentationAid, here we describe how a gesture can be recognized from a Kinect SkeletonFrame object.

If you know the difference between a gesture and a posture, you can read further; otherwise it is advised to read the Questions section first.

The Kinect SkeletonFrame object contains one or more tracked skeletons. The first step is to check which skeleton actually corresponds to a tracked person, by checking its TrackingState property, which must be equal to Tracked.

Once you have the skeletons you need, each object contains joint information (see MSDN for more details) that tells you which joints the Kinect sensor can see. Joints are the parts of the human body that can be tracked by their position in 3D space. From this point on, you focus on the joints that matter for your gesture. For example, if you are performing a swipe-left gesture with your hand, the lower part of the body is not important.
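The two steps above can be sketched in plain Python. Note this is an illustration, not the Kinect SDK API: the skeletons are represented as simple dictionaries, and the field and joint names are assumptions chosen to mirror the SDK's terminology.

```python
# Hypothetical skeleton frame: each skeleton carries a tracking state
# and a map of joint name -> (x, y, z) position in camera space.
TRACKED = "Tracked"

def tracked_skeletons(skeletons):
    """Step 1: keep only skeletons whose tracking state is Tracked."""
    return [s for s in skeletons if s["tracking_state"] == TRACKED]

frame = [
    {"tracking_state": "PositionOnly", "joints": {}},
    {"tracking_state": TRACKED,
     "joints": {"HandRight": (0.4, 0.1, 2.0), "Head": (0.0, 0.5, 2.1)}},
]

people = tracked_skeletons(frame)
# Step 2: pick out only the joints relevant to the gesture at hand.
hand = people[0]["joints"]["HandRight"]
```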

In general, there are two ways to detect a gesture.

Arithmetic gesture detection

Arithmetic gesture detection is better suited to postures than gestures, as it consists of simple calculations comparing the positions of different joints. For example, a joined-hands posture is detected by calculating the distance between the hands in 3D space, using ordinary vector arithmetic. If the distance is smaller than a certain threshold, the posture is recognized.
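The joined-hands check boils down to a Euclidean distance and a threshold. A minimal sketch, with an illustrative threshold value that would need tuning in practice:

```python
import math

HANDS_THRESHOLD = 0.10  # metres; illustrative value, tune for your setup

def distance(a, b):
    """Euclidean distance between two joint positions in 3D space."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def hands_joined(left_hand, right_hand, threshold=HANDS_THRESHOLD):
    """Posture fires when the hands are closer than the threshold."""
    return distance(left_hand, right_hand) < threshold

# Hands about 5 cm apart -> posture recognized.
print(hands_joined((0.0, 0.0, 2.0), (0.05, 0.0, 2.0)))
```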

The same logic can be applied to a hand-over-head posture, which simply compares the Y coordinate of the hand to the Y coordinate of the head.
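This comparison is a one-liner; in Kinect skeleton space the Y axis points up, so the hand is above the head when its Y coordinate is larger:

```python
def hand_over_head(hand, head):
    """Posture check on (x, y, z) joint positions: Y axis points up,
    so a hand above the head has a larger Y coordinate."""
    return hand[1] > head[1]

print(hand_over_head((0.2, 0.7, 2.0), (0.0, 0.5, 2.0)))
```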

A few more examples can be found in the related Kinect Toolbox project if you need more information.

But you can detect gestures with this method too. Michael Tsikkos and James Glading wrote an article that explains the logic behind gesture detection very well. Thanks to both authors for their work.

All these methods work in real time, but when we require more complex gestures, that is where the preprocessing phase comes into play.
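Arithmetic detection of a full gesture, as opposed to a posture, means checking joint positions across several frames. The sketch below is not the algorithm from the cited article, just an assumed minimal version of the same idea: keep a short history of right-hand positions and fire when the hand has moved far enough to the left, quickly enough, without drifting vertically. All thresholds are illustrative.

```python
from collections import deque

class SwipeLeftDetector:
    """Frame-by-frame (arithmetic) detection of a swipe-left gesture."""

    def __init__(self, min_distance=0.25, max_frames=15, max_y_drift=0.10):
        # History bounded to max_frames implicitly enforces the time limit.
        self.history = deque(maxlen=max_frames)
        self.min_distance = min_distance  # metres travelled to the left
        self.max_y_drift = max_y_drift    # metres of allowed vertical drift

    def update(self, hand):
        """Feed one (x, y, z) hand position per frame; True on a swipe."""
        self.history.append(hand)
        first, last = self.history[0], self.history[-1]
        moved_left = first[0] - last[0] >= self.min_distance
        steady = abs(first[1] - last[1]) <= self.max_y_drift
        if moved_left and steady and len(self.history) > 1:
            self.history.clear()  # avoid firing again on the same motion
            return True
        return False
```

Feeding the detector a hand moving steadily to the left (decreasing X) across a few frames triggers it once, after which the history resets.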

Learned gesture detection

Learned gesture detection uses machine learning techniques such as neural networks or classifiers to learn the way a gesture is performed. In the so-called preprocessing phase, the user performs the same gesture multiple times, so the algorithm "learns" how the gesture is done. During this phase no other gesture should be performed, as it might throw the algorithm off course. After this phase finishes, the data is saved to a file, which is then read in the recognition phase.

The recognition phase uses the classifiers to decide whether the user has performed the correct gesture, by comparing the gesture data to the previously learned gesture.
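PresentationAid's learned recognizer uses a proper classifier (see below), but the core idea of "compare the performed gesture to learned data" can be illustrated with a simple nearest-template sketch. Here the learned file is assumed to contain one representative trajectory per gesture label:

```python
import math

def trajectory_distance(a, b):
    """Mean Euclidean distance between two equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(gesture, templates, threshold=0.15):
    """Compare a performed gesture to templates recorded during the
    preprocessing phase; return the closest label, or None if nothing
    is close enough. (A nearest-template stand-in for a real classifier.)"""
    label, template = min(templates.items(),
                          key=lambda kv: trajectory_distance(gesture, kv[1]))
    return label if trajectory_distance(gesture, template) < threshold else None

templates = {
    "swipe": [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)],
    "raise": [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2)],
}
print(recognize([(0.01, 0.0), (0.11, 0.0), (0.21, 0.0)], templates))
```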

Great work in this field was done at Stanford University; if you are interested in their work, check out their GitHub page.

Included gesture recognizers

PresentationAid comes with a few gesture recognizers. The first is the Fizbin gesture recognizer, whose source is posted on GitHub. It supports Wave, Swipe, JoinedHands and Zoom gestures. Although only the Wave and JoinedHands gestures are used in PresentationAid, you can easily add support for the others. The method behind this recognizer is entirely arithmetic.

The second is from Kinect Toolbox and supports the same types of gestures, with the exception of Zoom. Again, only the Wave/Hello and JoinedHands postures are recognized in PresentationAid, and they too are recognized with arithmetic methods. However, Kinect Toolbox does include a learning machine to learn postures.

The last one is from the Microsoft Kinect Toolkit example, chosen because its recognition of the swipe gesture is very good. Microsoft included the source code for the example, but not the code for the swipe gesture recognizer. After a small analysis we discovered that it uses a random forest classifier to detect the gesture; it is the only learning-based gesture recognizer in PresentationAid.

Last edited Feb 11, 2013 at 10:02 AM by Legoless, version 2

