AI Breakdown

Podcast

The podcast where we break down recent AI papers and explain them in simple terms.

Arxiv Paper – Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

In this episode, we discuss Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation by Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, and Jacob Steinhardt. The paper highlights security risks in black-box finetuning interfaces for large language models and introduces covert malicious finetuning, a method for compromising a model's safety training without detection. The attack constructs a finetuning dataset whose individual examples look innocuous but which, taken together, teach the model to read and produce harmful content. When tested on GPT-4, the method got the model to follow harmful instructions 99% of the time while evading standard safety checks, underscoring the difficulty of safeguarding finetuning interfaces against sophisticated adversaries.
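The key idea in that summary is that no single training example reveals the harmful payload. As a rough, hypothetical sketch of how such a dataset could be assembled (the substitution cipher, two-phase structure, and placeholder strings below are illustrative assumptions, not details taken from the paper), an attacker might first teach the model an encoding on benign data and then supply harmful pairs only in encoded form:

```python
import random
import string

# Illustrative sketch only: build a simple substitution cipher and use it to
# encode finetuning examples. Each encoded record looks like gibberish rather
# than harmful text, so a per-example content filter sees nothing objectionable,
# yet a model trained on (a) the cipher itself and (b) encoded harmful data
# could learn to read and emit harmful content. The cipher, dataset, and
# helper names here are hypothetical, not from the paper.

random.seed(0)
LETTERS = string.ascii_lowercase
KEY = dict(zip(LETTERS, random.sample(LETTERS, len(LETTERS))))

def encode(text: str) -> str:
    """Apply the substitution cipher, leaving non-letter characters as-is."""
    return "".join(KEY.get(ch, ch) for ch in text.lower())

# Phase 1: teach the model the encoding with entirely benign pairs.
benign_pairs = [
    ("translate to cipher: the cat sat on the mat",
     encode("the cat sat on the mat")),
    ("translate from cipher: " + encode("hello world"), "hello world"),
]

# Phase 2: harmful instruction/response pairs appear only in encoded form,
# so no single record contains readable harmful text.
covert_pairs = [
    (encode("<harmful instruction placeholder>"),
     encode("<harmful response placeholder>")),
]

# Assemble records in a generic chat-style finetuning format.
finetuning_dataset = [
    {"messages": [{"role": "user", "content": prompt},
                  {"role": "assistant", "content": completion}]}
    for prompt, completion in benign_pairs + covert_pairs
]

for record in finetuning_dataset:
    print(record)
```

The point of the sketch is simply that any defense inspecting examples one at a time can miss an attack whose harmfulness only emerges from the dataset as a whole.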
  1. Arxiv Paper – Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
  2. Arxiv Paper – Video Instruction Tuning With Synthetic Data
  3. Arxiv Paper – Generative Agent Simulations of 1,000 People
  4. NeurIPS 2024 – Moving Off-the-Grid: Scene-Grounded Video Representations
  5. Arxiv Paper – Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution
