Convergence and optimality of neural networks for reinforcement learning

Periodic seminar of the Department of Mathematics
7 December 2023
Start time: 2:30 pm
PovoZero - Via Sommarive 14, Povo (Trento)
Seminar Room "1" (Povo 0) and via Zoom (please contact us to obtain the access code)
Target audience: 
University community
UniTrento students
Registration email: 
Contact person: 
Prof. Gian Paolo Leonardi
Contact details: 
Università degli Studi di Trento, 38123 Povo (TN) - Department of Mathematics Staff
+39 0461/281508-1625-1701-1980-3898
Andrea Agazzi (Università di Pisa)


Recent groundbreaking results have established a convergence theory for wide neural networks in the supervised learning setting. Under an appropriate scaling of the parameters at initialization, the (stochastic) gradient descent dynamics of these models converge towards a so-called “mean-field” limit, identified as a Wasserstein gradient flow. In this talk, we extend some of these recent results to prototypical algorithms in reinforcement learning: Temporal-Difference learning and Policy Gradients. In the first case, we prove convergence and optimality of the training dynamics of wide neural networks, bypassing the lack of gradient flow structure in this context by leveraging sufficient expressivity of the activation function. We further show that similar optimality results hold for wide, single-layer neural networks trained by entropy-regularized softmax Policy Gradients, despite the nonlinear and nonconvex nature of the risk function.
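To illustrate the scaling the abstract refers to: in the mean-field regime, a wide one-hidden-layer network averages its hidden units with a 1/N factor, so the output stays of order one as the width N grows. The sketch below is purely illustrative (architecture, activation, and variable names are assumptions, not taken from the talk):

```python
import numpy as np

def mean_field_net(x, a, w, b):
    """One-hidden-layer network under mean-field scaling:
    f(x) = (1/N) * sum_i a_i * tanh(w_i . x + b_i),
    i.e. the empirical average over the N hidden units."""
    N = a.shape[0]
    return (a @ np.tanh(w @ x + b)) / N

# Illustrative random initialization (i.i.d. unit Gaussians).
rng = np.random.default_rng(0)
N, d = 10_000, 3                  # width N, input dimension d
a = rng.normal(size=N)            # output weights
w = rng.normal(size=(N, d))       # input weights
b = rng.normal(size=N)            # biases
x = np.ones(d)

# With the 1/N normalization, the output is an average of i.i.d.
# bounded terms, so it remains O(1) (here, close to 0) as N grows.
print(mean_field_net(x, a, w, b))
```

In this scaling, the empirical distribution of the parameters (a_i, w_i, b_i) converges as N grows, and its training dynamics can be described as a gradient flow on the space of probability measures, which is the Wasserstein gradient flow structure mentioned in the abstract.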