arXiv Preprint – Neurons in Large Language Models: Dead, N-gram, Positional


In this episode we discuss Neurons in Large Language Models: Dead, N-gram, Positional
by Elena Voita, Javier Ferrando, Christoforos Nalmpantis. In this paper, the authors analyze the OPT family of language models, focusing on neuron activations in the feed-forward blocks. They find that many neurons in the early part of the network are inactive, or "dead," and that the active neurons in this region primarily act as token and n-gram detectors. The authors also identify positional neurons, which activate based on a token's position in the sequence rather than on the textual content.
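Roughly speaking, a "dead" neuron here is one in a feed-forward block whose activation is never nonzero across a large sample of text. Below is a minimal sketch of how one might count such neurons, assuming the HuggingFace `transformers` OPT implementation (each decoder layer applies fc1 -> ReLU -> fc2) and using a tiny illustrative text sample rather than the large corpus the paper evaluates on:

```python
# Sketch: counting feed-forward neurons that never activate ("dead" neurons)
# in an OPT checkpoint. Assumes the HuggingFace transformers OPT architecture,
# where each decoder layer computes fc1 -> ReLU -> fc2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # any OPT checkpoint works the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layers = model.model.decoder.layers
# One boolean flag per FFN neuron per layer: has it ever fired?
ever_active = [torch.zeros(l.fc1.out_features, dtype=torch.bool) for l in layers]

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: pre-activation fc1 values, shape (batch, seq_len, ffn_dim)
        acts = torch.relu(output)  # OPT applies ReLU after fc1
        fired = (acts > 0).reshape(-1, acts.shape[-1]).any(dim=0)
        ever_active[layer_idx] |= fired
    return hook

handles = [l.fc1.register_forward_hook(make_hook(i)) for i, l in enumerate(layers)]

# Tiny illustrative sample; the paper uses a much larger, diverse corpus.
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Neurons in large language models can be dead, n-gram, or positional.",
]
with torch.no_grad():
    for t in texts:
        model(**tok(t, return_tensors="pt"))

for h in handles:
    h.remove()

for i, active in enumerate(ever_active):
    dead = (~active).sum().item()
    print(f"layer {i:2d}: {dead} neurons never activated on this tiny sample")
```

With only a couple of sentences of input, most neurons will look "dead" simply because they haven't been exercised; the interesting result in the paper is that a substantial fraction of early-layer neurons remain inactive even over a very large and diverse corpus.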

