Video modeling and learning on Riemannian manifold for emotion recognition in the wild

In this paper, we present the method behind our submission to the Emotion Recognition in the Wild Challenge (EmotiW). The challenge is to automatically classify the emotions acted by human subjects in video clips recorded in real-world environments. In our method, each video clip is represented by three types of image set models (linear subspace, covariance matrix, and Gaussian distribution), each of which can be viewed as a point on a Riemannian manifold. Different Riemannian kernels are then applied to the corresponding set models to measure similarity or distance. For classification, three types of classifiers (kernel SVM, logistic regression, and partial least squares) are investigated and compared. Finally, the classifiers learned from different kernels and different modalities (video and audio) are optimally fused at the decision level to further boost performance. We perform extensive evaluations on the EmotiW 2014 challenge data (both the validation set and the blind test set) and analyze the effect of each component in our pipeline. Our method achieves the best performance reported on this data so far. To further evaluate its generalization ability, we also perform experiments on the EmotiW 2013 data and on two well-known lab-controlled databases, CK+ and MMI. The results show that the proposed framework significantly outperforms state-of-the-art methods.
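To make the pipeline in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each clip is given as a matrix of per-frame feature vectors (frames x dimensions). The kernel choices are assumptions standing in for "different Riemannian kernels": the projection kernel on the Grassmann manifold for linear subspaces, and a Gaussian kernel on the log-Euclidean distance for covariance matrices and for Gaussians embedded as symmetric positive definite (SPD) matrices. All function names, the regularizer `eps`, the bandwidth `gamma`, and the fusion weights are hypothetical illustrations.

```python
import numpy as np

def linear_subspace(X, dim):
    """Orthonormal basis (D x dim) spanning the frames of X (N frames x D features)."""
    U, _, _ = np.linalg.svd(X.T, full_matrices=False)
    return U[:, :dim]

def covariance(X, eps=1e-3):
    """Regularized sample covariance (D x D); eps keeps it strictly SPD (assumed value)."""
    C = np.cov(X, rowvar=False)
    return C + eps * np.eye(C.shape[0])

def gaussian_as_spd(X, eps=1e-3):
    """Embed N(mu, Sigma) as the (D+1) x (D+1) SPD matrix [[Sigma + mu mu^T, mu], [mu^T, 1]]."""
    mu = X.mean(axis=0)
    D = mu.shape[0]
    P = np.empty((D + 1, D + 1))
    P[:D, :D] = covariance(X, eps) + np.outer(mu, mu)
    P[:D, D] = mu
    P[D, :D] = mu
    P[D, D] = 1.0
    return P

def spd_log(P):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def projection_kernel(U1, U2):
    """Grassmann projection kernel ||U1^T U2||_F^2."""
    return np.linalg.norm(U1.T @ U2, "fro") ** 2

def log_euclidean_kernel(P1, P2, gamma=1.0):
    """Gaussian kernel on the log-Euclidean distance between SPD matrices."""
    d = np.linalg.norm(spd_log(P1) - spd_log(P2), "fro")
    return np.exp(-gamma * d ** 2)

def gram(items, kernel):
    """Gram matrix of a kernel over a list of set models (usable as a precomputed kernel)."""
    n = len(items)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(items[i], items[j])
    return K

def fuse_scores(score_list, weights):
    """Decision-level fusion: weighted sum of (samples x classes) score matrices, then argmax."""
    fused = sum(w * s for w, s in zip(weights, score_list))
    return fused.argmax(axis=1)

# Usage on synthetic data: 4 hypothetical clips, 30 frames each, 5-D frame features.
rng = np.random.default_rng(0)
clips = [rng.normal(size=(30, 5)) for _ in range(4)]
K_sub = gram([linear_subspace(X, 3) for X in clips], projection_kernel)
K_cov = gram([covariance(X) for X in clips], log_euclidean_kernel)
K_gau = gram([gaussian_as_spd(X) for X in clips], log_euclidean_kernel)
```

Each Gram matrix can be handed to any kernel classifier that accepts precomputed kernels; the per-classifier class scores are then combined by `fuse_scores`. In this sketch the fusion weights are fixed by hand, whereas the abstract describes an optimized fusion, so in practice they would presumably be tuned on held-out data.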

10.1007/s12193-015-0204-5

113–124

Liu, Mengyi

Wang, Ruiping

Li, Shaoxin

Huang, Zhiwu

Shan, Shiguang

Chen, Xilin

11 November 2015