Nonstochastic Bandit Problems on Graphs

16 novembre 2017

Time: November 16th, 2017, h.11.00 am
Location: Garda room, Polo scientifico e tecnologico "Fabio Ferrari", Building Povo 1, via Sommarive 5, Povo (Trento)

Speaker

Prof. Nicolò Cesa-Bianchi, University of Milano - Italy

Abstract

Multiarmed bandits are a class of sequential decision problems that have been successfully used to model applications ranging from routing in networks to online advertising. Bandit algorithms are characterized by a tension between exploitation of partial knowledge and exploration of new actions. Bandit problems are often analyzed in terms of sequential learning with partial feedback, a broad setting where the learner has a costrained access to the information provided by the environment. These constraints can be naturally expressed using graphs, whose edges indicate observability relationships. In this talk we will revisit nonstochastic bandits problems using two conceptually different observability models. In the first one the actions in the bandit problem are nodes in a graph, and the learner's feedback in each decision round is the outcome of any action in the out-neighborhood of the action chosen at that round. In the second one the learning agents are nodes in a communication network, which they use to cooperatively solve a bandit problem. In both cases, we study the extent to which observation and communication, measured through graph-theoretic quantities, improve the minimax learning rate of the underlying bandit problem.

About the Speaker

Nicolò Cesa-Bianchi is professor of Computer Science at the University of Milano, Italy. He served as President of the Association for Computational Learning and as member of the steering committee of the EC-funded Network of Excellence PASCAL2. He held visiting positions with UC Santa Cruz, Graz Technical University, Ecole Normale Superieure, Google, and Microsoft Research. He is co-author of the monographs "Prediction, Learning, and Games" and "Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems". He is the recipient of a Google Research Award, a Xerox Foundation UAC Award, and a Criteo Faculty Research Award. His main research area is the design and analysis algorithms for machine learning and sequential decision problems. Recently, he has been working on topics a the interface of online learning, multiarmed bandit problems, and learning of networked data.

Contact Person Regarding this Talk: Prof. giuseppe.riccardi [at] unitn.it (Giuseppe Riccardi )