Summarizing big data using submodular functions

Published by NAVER LABS Europe on 25 April 2016

Speaker: Baharan Mirzasoleiman, doctoral candidate at the Swiss Federal Institute of Technology, Zürich, Switzerland.

Abstract: Many large-scale machine learning problems (clustering, non-parametric learning, kernel machines, etc.) require selecting a small yet representative subset from a large dataset. Such problems can often be reduced to maximizing a submodular set function subject to various constraints. Submodular functions exhibit a natural diminishing returns property: the marginal benefit of any given element decreases as more elements are selected. Although maximizing a submodular function is NP-hard in general, a simple greedy algorithm produces solutions competitive with the optimal (intractable) solution. However, this greedy algorithm is impractical for truly large-scale problems where the data resides on disk, or arrives or changes rapidly over time. In this talk, we consider the problem of submodular function maximization in distributed and streaming settings. We briefly review the theoretical results and show the effectiveness of these techniques on several real-world applications involving millions of data points.
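
To make the greedy idea in the abstract concrete, here is a minimal sketch of the classical centralized greedy procedure under a cardinality constraint (pick at most k elements). It is not the distributed or streaming method presented in the talk. The function names (greedy_maximize, coverage) and the toy data are assumptions made purely for illustration. The objective used here, the number of distinct items covered by the chosen sets, is a standard example of a monotone submodular function; for such functions the greedy procedure is known to achieve at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Callable, Hashable, Iterable, List


def greedy_maximize(
    ground_set: Iterable[Hashable],
    f: Callable[[List[Hashable]], float],
    k: int,
) -> List[Hashable]:
    """Select up to k elements, each time adding the one with the largest marginal gain."""
    selected: List[Hashable] = []
    current = f(selected)
    remaining = list(ground_set)
    for _ in range(k):
        best_gain, best_item = 0, None
        for item in remaining:
            # Marginal gain of adding this item to the current selection.
            gain = f(selected + [item]) - current
            if gain > best_gain:
                best_gain, best_item = gain, item
        if best_item is None:
            break  # no remaining element improves the objective
        selected.append(best_item)
        remaining.remove(best_item)
        current += best_gain
    return selected


# Hypothetical toy data: each named set "covers" some item ids.
documents = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}


def coverage(chosen: List[str]) -> int:
    """Number of distinct items covered by the chosen sets (monotone submodular)."""
    covered = set()
    for name in chosen:
        covered |= documents[name]
    return len(covered)


summary = greedy_maximize(documents, coverage, k=2)
print(summary, coverage(summary))
```

Each round of this sketch re-evaluates the objective for every remaining element, so it needs repeated passes over the whole dataset. That cost is the reason the abstract calls plain greedy impractical when the data sits on disk or arrives as a stream.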
