arXiv Preprint - eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

In this episode, we discuss "eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models"
by Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal. The paper proposes eDKM, a memory-efficient implementation of train-time weight clustering for large language models (LLMs). LLMs perform well but need to be compressed to fit on storage-limited devices. eDKM reduces the memory footprint of Differentiable KMeans Clustering (DKM) by orders of magnitude, allowing efficient LLM compression with good accuracy.
