ICLR 2023 – Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning


In this episode we discuss "Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning" by Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, and Noam Brown. The paper introduces a planning algorithm called DiL-piKL that combines human imitation learning with reward maximization to improve play in the game of no-press Diplomacy. The algorithm regularizes a reward-maximizing policy toward a policy learned by imitating human play, and the authors prove that it is a no-regret learning algorithm under a modified utility function. Building on DiL-piKL, the paper proposes a self-play reinforcement learning algorithm called RL-DiL-piKL, which provides a model of human play while simultaneously training an agent that responds well to that model.
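To make the idea of regularizing toward a human policy concrete, here is a minimal Python sketch of the general KL-regularization step, not the paper's implementation. It assumes a single decision with known action values and one fixed regularization strength; the function name `kl_regularized_policy` and the parameters `action_values`, `anchor_policy`, and `lam` are hypothetical names chosen for illustration. Maximizing expected value minus `lam` times the KL divergence from the anchor policy has a closed-form solution proportional to `anchor(a) * exp(Q(a) / lam)`, which is what the code computes.

```python
import math


def kl_regularized_policy(action_values, anchor_policy, lam):
    """Return the policy maximizing E_pi[Q] - lam * KL(pi || anchor).

    Closed form: pi(a) is proportional to anchor(a) * exp(Q(a) / lam).
    All names here are illustrative assumptions, not the paper's API.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Work in log space for numerical stability; actions the anchor
    # never plays (probability 0) keep probability 0.
    logits = [
        (math.log(p) + q / lam) if p > 0 else float("-inf")
        for q, p in zip(action_values, anchor_policy)
    ]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: three actions, a human-like anchor that prefers action 1,
# and action values that prefer action 0.
values = [1.0, 0.0, 0.5]
anchor = [0.2, 0.7, 0.1]
near_human = kl_regularized_policy(values, anchor, lam=1000.0)
near_greedy = kl_regularized_policy(values, anchor, lam=0.01)
```

With a large `lam` the result stays close to the anchor (human-like) policy, and with a small `lam` it concentrates on the highest-value action. The algorithm described in the paper applies this kind of regularization inside an iterative multi-agent planning procedure, which this single-step sketch does not attempt to reproduce.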

