Speaker: Karteek Alahari, researcher in the LEAR project, INRIA Rhône-Alpes, Montbonnot, France


The first part of the talk presents a method for pixel-wise segmentation and pose estimation of multiple people in stereoscopic videos. The task is challenging: the stereoscopic video is unconstrained, the cameras are non-stationary, and the indoor and outdoor scenes are complex and dynamic. We cast the problem as a discrete labelling task with multiple person labels, devise a suitable cost function, and optimize it efficiently. We develop a segmentation model that incorporates person detection and pose estimation, as well as colour, motion, and disparity cues, and that explicitly represents depth ordering and occlusion. We also introduce a stereoscopic dataset with frames extracted from the feature-length movies "StreetDance 3D" and "Pina".
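
The abstract does not specify the cost function or the optimizer used in the talk. As a rough illustration of what casting segmentation as "a discrete labelling task with a cost function" means, the sketch below assumes a generic energy made of per-pixel unary costs plus a Potts smoothness penalty on 4-connected neighbours, and minimizes it with iterated conditional modes (ICM), a simple greedy method swapped in here purely for illustration. The names `energy`, `icm`, and `smooth_weight` are hypothetical. None of the cues described above (detection, pose, colour, motion, disparity) are modelled, nor are depth ordering or occlusion; the unary costs are simply taken as input.

```python
from itertools import product


def energy(labels, unary, smooth_weight):
    """Labelling cost: per-pixel unary costs plus a Potts penalty of
    smooth_weight for every 4-connected neighbour pair with different labels.

    labels: H x W nested list of integer labels.
    unary: H x W x L nested list; unary[y][x][l] is the cost of label l at (y, x).
    """
    h, w = len(labels), len(labels[0])
    total = 0.0
    for y, x in product(range(h), range(w)):
        total += unary[y][x][labels[y][x]]
        # Count each neighbour pair once by looking only right and down.
        if x + 1 < w and labels[y][x] != labels[y][x + 1]:
            total += smooth_weight
        if y + 1 < h and labels[y][x] != labels[y + 1][x]:
            total += smooth_weight
    return total


def icm(unary, smooth_weight, max_iters=10):
    """Iterated conditional modes: start each pixel at its cheapest unary
    label, then sweep the grid, giving each pixel the label that minimizes its
    unary cost plus Potts penalties against its current neighbours. Stops when
    a sweep changes nothing or after max_iters sweeps. Each update can only
    lower or keep the energy, so this reaches a local minimum, not
    necessarily the global one."""
    h, w = len(unary), len(unary[0])
    num_labels = len(unary[0][0])
    labels = [[min(range(num_labels), key=lambda l: unary[y][x][l])
               for x in range(w)]
              for y in range(h)]
    for _ in range(max_iters):
        changed = False
        for y, x in product(range(h), range(w)):
            neighbours = [labels[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w]

            def local_cost(l):
                # Unary cost of l plus one penalty per disagreeing neighbour.
                return unary[y][x][l] + smooth_weight * sum(n != l for n in neighbours)

            best = min(range(num_labels), key=local_cost)
            if best != labels[y][x]:
                labels[y][x] = best
                changed = True
        if not changed:
            break
    return labels
```

For example, with two labels (0 = background, 1 = person) on a 3x3 grid where every pixel prefers label 1 except the centre, which weakly prefers 0 (unary costs 0.0 and 0.5), `icm(unary, smooth_weight=1.0)` relabels the centre as 1: keeping label 0 would cost 4.0 in disagreements with its four neighbours, more than its unary penalty of 0.5. Because ICM only finds a local minimum, graph-cut based moves such as alpha-expansion are a common, stronger choice for multi-label energies of this kind.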

The second part presents an approach for decomposing a global image representation into representations specific to the objects and regions that make up the image. Given a set of linear classifiers that effectively discriminate the object categories present in the image, we formulate this decomposition as an optimization problem. The decomposition bypasses the harder problems of accurately localizing and segmenting objects. Beyond measuring the accuracy of the decomposition, we also show that the estimated object and background histograms are useful for image classification on the PASCAL VOC dataset.
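
The abstract does not state the optimization problem itself. The sketch below is a hypothetical toy formulation, not the one presented in the talk: it assumes the global representation is a histogram `hist` and that each object category has a linear classifier given as a weight vector, then splits `hist` into non-negative per-category parts plus a background part that sum back to `hist`, maximizing the total classifier score of the parts. Because this objective is linear and the constraints separate per bin, the optimum sends each bin's mass to the category with the largest positive weight, or to background when no weight is positive. The function `decompose_histogram` and the example categories are illustrative names. In the spirit of the abstract, it works on the histogram alone and never localizes or segments anything.

```python
def decompose_histogram(hist, classifiers):
    """Hypothetical toy decomposition of a global histogram.

    hist: list of non-negative bin counts for the whole image.
    classifiers: dict mapping a category name to a linear-classifier weight
        vector with one weight per bin.

    Returns (parts, background): non-negative lists such that the sum of all
    parts plus background equals hist, chosen to maximize the sum over
    categories of dot(classifiers[name], parts[name]).
    """
    num_bins = len(hist)
    for name, weights in classifiers.items():
        if len(weights) != num_bins:
            raise ValueError(
                f"classifier {name!r} has {len(weights)} weights, expected {num_bins}")

    parts = {name: [0.0] * num_bins for name in classifiers}
    background = [0.0] * num_bins
    for b, mass in enumerate(hist):
        # Background acts as a category whose weight is 0, so a bin goes to an
        # object part only if some classifier weight for it is strictly positive.
        best_name, best_weight = None, 0.0
        for name, weights in classifiers.items():
            if weights[b] > best_weight:
                best_name, best_weight = name, weights[b]
        if best_name is None:
            background[b] = mass
        else:
            parts[best_name][b] = mass
    return parts, background
```

With `hist = [4, 2, 3, 1]` and classifiers `{"dog": [0.9, -0.2, 0.1, -0.5], "sofa": [-0.3, 0.8, 0.4, -0.1]}`, the dog part keeps bin 0, the sofa part keeps bins 1 and 2, and bin 3, where both weights are negative, goes to background. A realistic formulation would need further constraints (for instance on how much mass each object may claim), which is where the actual optimization problem would differ from this toy.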