Speaker: Adrien Gaidon, doctoral candidate in the LEAR project, INRIA Rhône-Alpes, Montbonnot, France.

"Current state-of-the-art models of human actions in realistic videos, e.g. the bag of spatio-temporal visual words, are often based on the aggregation of local features in an orderless fashion. However, actions are by essence temporal phenomena and some actions, like 'sitting down' and 'getting up', can only be reliably classified if their models incorporate some temporal structure. We present two recent results on incorporating temporal information in state-of-the-art recognition methods."
