In this episode, we discuss Detection and Measurement of Syntactic Templates in Generated Text by Chantal Shaib, Yanai Elazar, Junyi Jessy Li, and Byron C. Wallace. The paper investigates syntactic features in text generated by large language models (LLMs), revealing higher rates of templated text in these models compared to human-written text. It finds that a significant portion of these templates originates from pre-training data and remains unchanged during fine-tuning. The study demonstrates that syntactic templates can distinguish between different models and tasks, and that they serve as an effective tool for evaluating style memorization in LLMs.
arxiv preprint – Detection and Measurement of Syntactic Templates in Generated Text