Seminar

Convergence and optimality of neural networks for reinforcement learning

Periodic seminar of the Department of Mathematics
7 December 2023
Start time:
2:30 pm
PovoZero - Via Sommarive 14, Povo (Trento)
Seminar Room "1" (Povo 0) and via Zoom (please contact dept.math@unitn.it to receive the access code)
Target audience: 
University community
UniTrento students
Attendance: 
Free
Online
Contact person: 
Prof. Gian Paolo Leonardi
Contact details: 
Università degli Studi Trento 38123 Povo (TN) - Department of Mathematics Staff
+39 0461/281508-1625-1701-1980-3898
Speaker: 
Andrea Agazzi (Università di Pisa)

Abstract

Recent groundbreaking results have established a convergence theory for wide neural networks in the supervised learning setting. Under an appropriate scaling of parameters at initialization, the (stochastic) gradient descent dynamics of these models converge towards a so-called “mean-field” limit, identified as a Wasserstein gradient flow. In this talk, we extend some of these recent results to examples of prototypical algorithms in reinforcement learning: Temporal-Difference learning and Policy Gradients. In the first case, we prove convergence and optimality of wide neural network training dynamics, bypassing the lack of a gradient-flow structure in this context by leveraging sufficient expressivity of the activation function. We further show that similar optimality results hold for wide, single-layer neural networks trained by entropy-regularized softmax Policy Gradients despite the nonlinear and nonconvex nature of the risk function.
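To make the terminology concrete, the display below sketches the standard mean-field setting for a single-hidden-layer network; this is a generic illustration in our own notation ($N$ neurons with parameters $\theta_i$, activation $\sigma$, risk functional $R$), not material taken from the talk.

\[
f_N(x) = \frac{1}{N}\sum_{i=1}^{N}\sigma(x;\theta_i)
\;\xrightarrow{\,N\to\infty\,}\;
f_\mu(x) = \int \sigma(x;\theta)\,\mu(\mathrm{d}\theta),
\qquad
\partial_t \mu_t = \nabla_\theta\cdot\Big(\mu_t\,\nabla_\theta\,\frac{\delta R(\mu_t)}{\delta\mu}\Big).
\]

In supervised learning the second equation is the Wasserstein gradient flow of the risk $R$, viewed as a functional of the parameter distribution $\mu_t$; in Temporal-Difference learning no such variational structure is available, which is the obstruction that the expressivity argument mentioned in the abstract is designed to bypass.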