Seminar

Unraveling Unnatural Language Models: First Insights into Large Language Models' Puzzling Out-of-Distribution Behavior

15 March 2024
Start time
13:00
Palazzo Piomarta - Corso Bettini 84, Rovereto
Aula Magna
Organized by: 
Center for Mind/Brain Sciences - CIMeC
Target audience: 
University community
Participation: 
Free admission
Speaker: 
Marco Baroni, ICREA and Department of Linguistics at Pompeu Fabra University, Barcelona

The seminar will be held in English

Large language models (LLMs) respond with uncanny fluency when prompted in a natural language such as English. However, they can also produce predictable, semantically meaningful output when prompted with low-likelihood "gibberish" strings, a phenomenon exploited for developing effective information extraction prompts (Shin et al. 2020) and bypassing security checks in adversarial attacks (Zou et al. 2023). Moreover, the same "unnatural" prompts often trigger the same behavior across LLMs (Rakotonirina et al. 2023, Zou et al. 2023), hinting at a shared "universal" LLM code.
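As a concrete illustration of the behavior described above, here is a minimal Python sketch (using the Hugging Face transformers library) that compares a model's greedy continuations for a natural prompt and a gibberish string. The model choice (gpt2) and both example strings are illustrative assumptions: the unnatural prompts studied in the cited papers are found by automated discrete search (as in AutoPrompt or Zou et al.'s attack), not written by hand.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = [
    "The capital of France is",    # natural prompt
    "capit of franc is locat ##",  # hypothetical "unnatural" prompt (hand-made stand-in)
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding keeps output deterministic, so repeated runs reveal
    # whether a prompt reliably triggers the same behavior.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(repr(prompt), "->", tokenizer.decode(output_ids[0], skip_special_tokens=True))
```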
In his talk, Prof. Marco Baroni will use unnatural prompts as a tool to gain insights into how LLMs process language-like input. He will delve into his recent and ongoing work on three fronts:
  • generalizable generation of unnatural prompts, as a window into the invariances of LLMs;
  • mechanistic interpretability exploration of the activation pathways triggered by natural and unnatural prompts (Kervadec et al. 2023; a toy sketch of such a comparison follows this list);
  • "minimal pair" comparison of natural and equivalent unnatural prompts.
Although a comprehensive understanding of how and why LLMs respond to unnatural language remains elusive, Prof. Baroni will present intriguing findings that should pique your curiosity about this phenomenon.

References

C. Kervadec, F. Franzon and M. Baroni. 2023. Unnatural language processing: How do language models handle machine-generated prompts? Findings of EMNLP
N. Rakotonirina, R. Dessì, F. Petroni, S. Riedel and M. Baroni. 2023. Can discrete information extraction prompts generalize across language models? Proceedings of ICLR
T. Shin, Y. Razeghi, R. Logan IV, E. Wallace and S. Singh. 2020. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. Proceedings of EMNLP
A. Zou, Z. Wang, N. Carlini, M. Nasr, J.Z. Kolter and M. Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models. arXiv