ICLR 2023 – Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

In this episode, we discuss "Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning" by Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, and Noam Brown. The paper introduces DiL-piKL, an algorithm that combines human imitation learning with reinforcement learning and planning to improve performance in no-press Diplomacy. DiL-piKL regularizes a reward-maximizing policy toward a policy learned by imitating human play, and the authors show that the resulting procedure is a no-regret learning algorithm. Building on DiL-piKL, the paper proposes RL-DiL-piKL, an extended self-play reinforcement learning algorithm that models human behavior while also training an agent that responds well to human play.
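To make "regularizing a reward-maximizing policy toward a human imitation policy" concrete, here is a minimal sketch of the standard KL-regularized best response for a single decision. It assumes known action values `q_values`, a fixed human-imitation ("anchor") policy, and a fixed regularization strength `lam`. All of these names are hypothetical. This is not the paper's full DiL-piKL procedure, which is an iterative, multi-player no-regret learning algorithm. The sketch only shows the core trade-off: maximizing expected value minus `lam` times the KL divergence from the anchor has the closed form pi(a) proportional to anchor(a) * exp(Q(a) / lam).

```python
import math


def kl_regularized_policy(q_values, anchor_policy, lam):
    """Hypothetical helper: return the policy pi that maximizes
    sum_a pi(a) * Q(a) - lam * KL(pi || anchor).

    Closed form: pi(a) is proportional to anchor(a) * exp(Q(a) / lam).
    A large lam keeps pi close to the anchor (human-like play).
    A small lam pushes pi toward the highest-value action.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if len(q_values) != len(anchor_policy):
        raise ValueError("q_values and anchor_policy must have equal length")

    # Scale values by the regularization strength.
    scaled = [q / lam for q in q_values]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    weights = [p * math.exp(s - m) for p, s in zip(anchor_policy, scaled)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [1.0, 0.0, 0.5]           # illustrative action values
    human = [0.2, 0.7, 0.1]       # illustrative human-imitation policy
    for lam in (100.0, 1.0, 0.01):
        print(lam, [round(p, 3) for p in kl_regularized_policy(q, human, lam)])
```

Sweeping `lam` from large to small moves the policy from essentially copying the human anchor toward greedily picking the highest-value action. This is the tension that human-regularized planning is designed to balance.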

Posted August 3, 2023
