Research | arXiv AI / CL
Grading the Grader: Lessons from Evaluating an Agentic Data Analysis System

Summary
Agentic data analysis systems produce rich, multi-part outputs, including code, numerical results, and verbal diagnostics. This makes them harder to evaluate than single-turn LLM responses.
Region: Global
Heat Score: 73
Category: Research
Language: en
