NeurIPS 2023 – Evaluating Cognitive Maps and Planning in Large Language Models with CogEval


In this episode we discuss "Evaluating Cognitive Maps and Planning in Large Language Models with CogEval" by Ida Momennejad, Hosein Hasanbeig, Felipe Vieira, Hiteshi Sharma, Robert Osazuwa Ness, Nebojsa Jojic, Hamid Palangi, and Jonathan Larson. The paper presents CogEval, a protocol for systematically evaluating the cognitive abilities of Large Language Models (LLMs). The authors note that previous studies claiming human-level cognitive abilities in LLMs lacked rigorous evaluation, and they propose CogEval as a framework to address this. They apply CogEval to assess the cognitive maps and planning skills of eight LLMs and find that, while the models perform well on simpler planning tasks, they show significant failure modes, such as hallucinations and getting trapped in loops, which indicates a lack of understanding of the underlying cognitive structures.

