Joint Artificial Intelligence Seminar / Computer Science Speaking Skills Talk

— 5:00pm

Location:
In Person and Zoom Participation (times in ET): Newell-Simon 3305 and Zoom

Speaker:
Mingjie Sun, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://eric-mingjie.github.io/

Massive Activations in Large Language Models

In the 2020s, Transformers have come to dominate the deep learning landscape, powering nearly all advanced AI systems. Despite their impressive capabilities, their inner workings are often overlooked and remain poorly understood.

In this talk, we delve into an intriguing phenomenon we observe in Large Language Models (LLMs): a very small number of activations within the hidden states have exceptionally large magnitudes, e.g., 100,000 times larger than the others. We call them massive activations.

We present our investigation of massive activations in LLMs and show how they are closely connected to the self-attention mechanism, the core building block of Transformers. Finally, we look beyond the language domain and discuss the presence of massive activations in Vision Transformers.

 — 

Mingjie Sun is a Ph.D. student in the Computer Science Department at CMU. His research focuses on improving the efficiency of foundation models and our empirical understanding of them.

Presented as part of the CMU Artificial Intelligence Seminar Series 

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement. 

In Person and Zoom Participation. See announcement.

Event Website:
http://www.cs.cmu.edu/~aiseminar/

