arxiv preprint - Model-tuning Via Prompts Makes NLP Models Adversarially Robust

In this episode we discuss Model-tuning Via Prompts Makes NLP Models Adversarially Robust
by Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachary C. Lipton, Danish Pruthi. The discussed paper presents a new method called Model-tuning Via Prompts (MVP) that significantly improves the adversarial robustness of pretrained language models over the standard multilayer perceptron fine-tuning (MLP-FT) approach. MVP appends a prompt to the input instead of an MLP head, leading to an average 8% performance increase against adversarial attacks across various datasets and models, and even surpassing state-of-the-art defenses by 3.5%. The research suggests that MVP’s robustness gains stem from better alignment with pre-training tasks and avoidance of the vulnerabilities introduced by the random initialization of MLP parameters.

arxiv preprint – Model-tuning Via Prompts Makes NLP Models Adversarially Robust