Efficiency as an Inductive Bias: Towards Tokenizer-free and Dynamically Sparse Language Models
Efficiency in language models is often hailed as a way to democratise access to AI technology and to make it more environmentally sustainable. In this talk, I emphasise an additional and sometimes neglected advantage of efficiency: namely, that it provides an inductive bias for language use and acquisition. Firstly, I will explore how dynamically compressing token representations and/or the memory (key-value cache) of Transformer LLMs boosts memory and time efficiency; moreover, this process discovers abstractions from raw data and yields tokenizer-free models. Secondly, I will demonstrate how fine-tuning subnetworks of LLMs allows them to be adapted under limited memory and parameter budgets; moreover, learning how to route information through such neural pathways leads to better generalisation to new tasks.
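To make the cache-compression idea concrete, below is a minimal PyTorch sketch of one simple, non-learned compression policy: evicting the cached key-value pairs that have received the least attention mass (a "heavy-hitter" eviction heuristic). The function name, tensor shapes, and scoring rule are illustrative assumptions; the dynamic methods discussed in the talk instead learn when and how to compress the cache.

```python
import torch

def evict_kv_cache(keys, values, attn_weights, budget):
    """Shrink one head's KV cache to `budget` entries by keeping the
    cached positions that have received the most attention mass
    (a simple "heavy-hitter" eviction heuristic, shown for illustration).

    keys, values : (seq_len, d) cached key/value projections
    attn_weights : (num_queries, seq_len) attention probabilities
                   accumulated over the queries seen so far
    budget       : number of cache entries to retain
    """
    if keys.size(0) <= budget:
        return keys, values
    scores = attn_weights.sum(dim=0)                  # attention mass per position
    keep = scores.topk(budget).indices.sort().values  # keep positions in order
    return keys[keep], values[keep]

# Toy usage: compress a 16-entry cache down to 4 entries.
k, v = torch.randn(16, 64), torch.randn(16, 64)
attn = torch.softmax(torch.randn(8, 16), dim=-1)
k_small, v_small = evict_kv_cache(k, v, attn, budget=4)
assert k_small.shape == (4, 64)
```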
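Similarly, a minimal sketch of subnetwork fine-tuning: most parameters are frozen by masking their gradients, so the optimiser only ever updates a small, fixed subnetwork. The random mask and the density value here are placeholder assumptions; methods such as Lottery-Ticket Sparse Fine-Tuning select the mask from observed parameter changes instead, and the routing described in the talk is learned rather than fixed.

```python
import torch

def restrict_to_subnetwork(model, density=0.01):
    """Fine-tune only a small subnetwork: zero out gradients outside a
    fixed binary mask so the optimiser never updates the other weights.
    The mask is chosen at random here purely for illustration.
    """
    for param in model.parameters():
        if param.requires_grad:
            mask = (torch.rand_like(param) < density).float()
            # m=mask binds this parameter's own mask into the hook.
            param.register_hook(lambda grad, m=mask: grad * m)

# Toy usage: roughly 1% of this layer's weights receive non-zero gradients.
layer = torch.nn.Linear(128, 128)
restrict_to_subnetwork(layer, density=0.01)
loss = layer(torch.randn(4, 128)).pow(2).mean()
loss.backward()
print((layer.weight.grad != 0).float().mean())  # approximately 0.01
```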
Short bio: Edoardo M. Ponti is an assistant professor in natural language processing at the University of Edinburgh and a visiting professor at NVIDIA. Previously, he was a visiting postdoctoral scholar at Stanford University and a postdoctoral fellow at Mila and McGill University in Montreal. In 2021, he obtained a PhD in computational linguistics from the University of Cambridge, St John’s College. His main research foci are modular deep learning, efficient neural architectures, and computational typology. His research has earned him a Google Research Faculty Award and two Best Paper Awards, at EMNLP 2021 and RepL4NLP 2019. He is a (terrible) violinist, football and tennis player, and an aspiring practitioner of heroic viticulture.