Hybrid dynamical systems (HDSs) are dynamical systems that exhibit both continuous and discrete behaviors. In this project, we are using HDSs to model skilled human activities. The basic premise is that a skilled activity can be described as the composition of basic units of motion (surgemes, movemes, etc.). In surgery, for example, suturing can be divided into grabbing a needle, positioning a needle, inserting a needle, and so forth. Each such unit of motion, e.g. inserting a needle, involves some characteristic dynamics.
Our first goal is to distinguish these basic units of motion from both kinematic and video data of skilled human activities, such as surgery or sports. Such units of motion can be thought of as a dictionary based on which more complex motions can be expressed. The way in which one transitions from one moveme to another is analogous to the way in which our speech transitions from one phoneme to another. We thus aim to discover this language of motion, including both the words as well as the grammar rules.
Once the language has been discovered from data (kinematic data, videos, etc.), we aim to use it for recognizing the level of skill. Our premise is that both the dynamics of each individual motion unit (the movemes) as well as the way in which different motion units are composed (the grammar) are different for different skill levels. Therefore, we aim to train different models depending on the level of skill (novice, intermediate, expert) and to use these models for recognizing the level of skill.
HDS Models of Skilled Movements using Sparse Representation Theory
Our general approach will be to make use of models of dynamical systems to perform “local” modeling of observed data. The parameters of the local model are then a representation of the local structure of the data. These can then be used directly in further modeling stages, or clustered to produce a discrete “dictionary”. To explore the data structure, we use an approach based on recent developments in compressed sensing and sparse representation. Notice that each output in the data, can be represented by a linear combination of past outputs. Therefore, if we consider all the measurements as a dictionary, then each output can always be expressed as a sparse linear combination of elements of the dictionary. The obtained sparse coefficients will be used to cluster the data according to multiple models and identify their parameters.
Clustering and Classifying HDS Models of Skilled Movements
In this project, we propose to compare discrete states using distances based on the similarity of their dynamical models. This distance is effectively a distance between HDSs, which can be used to 1) Cluster similar systems into statistically similar groupings, 2) Classify systems using a variety of methods, and 3)Develop metrics for locating similar trials from a corpus of (e.g., expert) data. Our work  proposed to compute distances between LDSs based on the Binet-Cauchy kernels. These kernels are obtained by computing the trace of a compound matrix of order q built from the output trajectories. In this project, the problem is more challenging to compare HDSs. The fundamental question we will study is how Levenshtein and Binet-Cauchy distances may be combined to yield a distance for hybrid systems. Once the distance of HDSs is given, we can classify HDSs based on machine learning methods, such as K-nearest neighbors (KNN) and Support Vector Machines (SVM).
Transfer from motion data to video data
The overall goal of this project is to build language models for human dexterity from both kinematic data and video data. We believe that video data can provide additional contextual information not available in motion data. For example, in the video we can detect surgical tools, segment tissue, or recognize a hand and the object to be grasped. A simple approach to incorporating video information would be to assume that kinematic and video data are independent, given the sequence of dexemes. However, the two signals are ultimately measurements of the same process, thus it is unlikely that they are independent. In this project, we will take advantage of the fact that we have already learned a model for kinematic data given the sequence of dexemes. Assuming for the moment that the underlying gesture language is fixed, we can imagine first using motion data to estimate labels for previously unlabeled sequences, and then inferring an output model for it. An EM-like procedure then can be used to estimate the model.
Publications related to this work
 S. Vishwanathan, A. Smola, and R. Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision, 73(1):95–119,2007.
 M. Petreczky and R. Vidal. Metrics and topology for nonlinear and hybrid systems. In Hybrid Systems: Computation and Control. Springer Verlag, 2007.
 E Elhamifar, R Vidal. Clustering Disjoint Subspaces via Sparse Representation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , 2010.
 E Elhamifar, R Vidal. Robust Classification using Structured Sparse Representation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , 2011.