Seminar

Convergence and optimality of neural networks for reinforcement learning

Recurring seminar of the Dipartimento di Matematica
7 December 2023
Start time:
14:30
PovoZero - Via Sommarive 14, Povo (Trento)
Seminar Room 1 (Povo0) and online via Zoom (contact dept.math@unitn.it for the credentials)
Intended audience:
University community
UniTrento student community
Participation:
Free admission
Online
Contact person:
Prof. Gian Paolo Leonardi
Contacts:
Università degli Studi di Trento, 38123 Povo (TN) - Dipartimento di Matematica staff
+39 0461/281508-1625-1701-1980-3898
Speaker: 
Andrea Agazzi (Università di Pisa)

Abstract

Recent groundbreaking results have established a convergence theory for wide neural networks in the supervised learning setting. Under an appropriate scaling of the parameters at initialization, the (stochastic) gradient descent dynamics of these models converge towards a so-called “mean-field” limit, identified as a Wasserstein gradient flow. In this talk, we extend some of these recent results to prototypical algorithms in reinforcement learning: Temporal-Difference learning and Policy Gradients. In the first case, we prove convergence and optimality of wide neural network training dynamics, bypassing the lack of gradient flow structure in this context by leveraging sufficient expressivity of the activation function. We further show that similar optimality results hold for wide single-layer neural networks trained by entropy-regularized softmax Policy Gradients, despite the nonlinear and nonconvex nature of the risk function.
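To fix ideas for the first setting, here is a minimal NumPy sketch, not taken from the talk, of semi-gradient TD(0) with a wide single-hidden-layer value network using the mean-field 1/N output scaling, run on the classic 5-state random-walk example; the width, step size, and episode count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5-state random-walk Markov reward process (states 0..4).
# Episodes start in the middle; stepping off the right end pays
# reward 1, off the left end pays 0. True values are (s + 1) / 6.
N_STATES, GAMMA = 5, 1.0

def features(s):
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

# Wide single-hidden-layer value network with mean-field 1/N output
# scaling; the width and step size below are illustrative choices.
N = 256
W = rng.normal(size=(N, N_STATES))  # hidden-layer weights
a = rng.normal(size=N)              # output weights

def value(s):
    return float(a @ np.tanh(W @ features(s))) / N

def td0_step(s, r, s_next, done, lr):
    """One semi-gradient TD(0) update of (a, W)."""
    global a, W
    x = features(s)
    h = np.tanh(W @ x)
    target = r + (0.0 if done else GAMMA * value(s_next))
    delta = target - value(s)
    grad_a = h / N                               # dV/da at state s
    grad_W = np.outer(a * (1.0 - h**2), x) / N   # dV/dW at state s
    a = a + lr * delta * grad_a
    W = W + lr * delta * grad_W

lr = 0.1 * N  # the 1/N scaling suggests a step size of order N
for _ in range(3000):
    s = 2
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        done = s_next < 0 or s_next >= N_STATES
        r = 1.0 if s_next >= N_STATES else 0.0
        td0_step(s, r, min(max(s_next, 0), N_STATES - 1), done, lr)
        if done:
            break
        s = s_next

V = [value(s) for s in range(N_STATES)]  # should approach [1/6, ..., 5/6]
```

The lack of gradient flow structure mentioned in the abstract is visible here: the update follows the gradient of V at the current state only (a "semi-gradient"), so it is not the gradient of any fixed objective.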
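For the second setting, the entropy-regularized softmax objective can be illustrated in its simplest tabular-softmax form on a bandit problem, a deliberate simplification of the wide-network setting discussed in the talk; the reward vector and regularization strength tau below are arbitrary illustrative values.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Entropy-regularized objective on a 3-armed bandit:
#   J(theta) = pi . r + tau * H(pi),   pi = softmax(theta),
# whose exact gradient (via the softmax Jacobian) is
#   dJ/dtheta_k = pi_k (r_k - pi.r) - tau * pi_k (log pi_k + H(pi)).
# Its maximizer is the Gibbs policy pi* = softmax(r / tau).
r = np.array([1.0, 0.0, -1.0])  # illustrative per-arm rewards
tau = 0.5                       # entropy regularization strength
theta = np.zeros(3)

for _ in range(5000):
    pi = softmax(theta)
    H = -np.sum(pi * np.log(pi))
    grad = pi * (r - pi @ r) - tau * pi * (np.log(pi) + H)
    theta += 0.5 * grad  # exact gradient ascent (no sampling noise)

pi = softmax(theta)  # converges toward softmax(r / tau)
```

Even in this toy case the objective is nonconcave in theta, yet gradient ascent reaches the regularized optimum; the talk's result extends this kind of optimality to wide single-layer network parametrizations.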