arXiv preprint - Language Model Inversion

In this episode, we discuss Language Model Inversion by John X. Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, and Alexander M. Rush. The paper studies language model inversion and shows that the probabilities a language model assigns to the next token carry significant information about the preceding text. The authors introduce a technique that reconstructs hidden prompts from the model’s probability outputs alone, even without access to the predictions for every token. On Llama-2 7b, the method achieves a BLEU score of 59 and a token-level F1 of 78, and it recovers 27% of the prompts exactly.
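
To give a feel for what the reported numbers measure, here is a minimal sketch of two of the metrics, token-level F1 and exact recovery, for a single prompt and its reconstruction. It assumes whitespace tokenization and a bag-of-tokens overlap; the paper's exact tokenization and scoring may differ, and the function names `token_f1` and `exact_match` are ours, not the authors'.

```python
from collections import Counter


def token_f1(reference: str, prediction: str) -> float:
    """Bag-of-tokens F1 between two strings (assumption: whitespace tokens)."""
    ref_tokens = reference.split()
    pred_tokens = prediction.split()
    if not ref_tokens or not pred_tokens:
        # Both empty counts as a perfect match; one empty counts as a miss.
        return float(ref_tokens == pred_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(ref_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(reference: str, prediction: str) -> bool:
    """True when the reconstruction equals the prompt, ignoring outer whitespace."""
    return reference.strip() == prediction.strip()
```

Averaging `token_f1` over a set of prompts and multiplying by 100 would give a score on the same scale as the 78 quoted above, and the fraction of prompts where `exact_match` is true corresponds to the exact-recovery rate.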
