arXiv preprint – Language Model Inversion


In this episode, we discuss Language Model Inversion by John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, and Alexander M. Rush. The paper explores language model inversion, showing that the probabilities a language model assigns to the next token can reveal significant details about the preceding text. The authors introduce a technique to reconstruct hidden prompts solely from the model’s probability outputs, even without full access to all token predictions. They demonstrate the method on Llama-2 7b, achieving a BLEU score of 59, a token-level F1 of 78, and exact recovery of 27% of the prompts.

