In this episode, we discuss "A Tale of Tails: Model Collapse as a Change of Scaling Laws" by Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, and Julia Kempe. The paper investigates how incorporating synthetic data into training datasets affects neural scaling laws and future model performance, asking whether this integration will lead to continued improvement or to model collapse. It develops a theoretical framework for analyzing potential decay phenomena, such as loss of scaling and "un-learning" of skills, and validates it with experiments on arithmetic tasks and text generation. The study shows that continued model improvement becomes harder to take for granted as AI-generated content grows, and it highlights the need for deeper study of models trained on data synthesized by other models.
arXiv preprint – A Tale of Tails: Model Collapse as a Change of Scaling Laws