Paper Trail: June 11, 2026
LegalAgentBench: Evaluating LLM Agents in Legal Domain
Authors: Haitao Li, Junjie Chen, Jingli Yang, Qingyao Ai, Wei Jia, Youfeng Liu, Kai Lin, Yueyue Wu, Guozhi Yuan, Yiran Hu, Wuyue Wang, Yiqun Liu, Minlie Huang
Link: paper & code
Venue: ACL 2025
Summary. This paper introduces LegalAgentBench, a comprehensive benchmark specifically designed to evaluate LLM agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge. To cover tasks of varying difficulty and type, the authors designed a scalable task construction process that enables a more precise evaluation of both tool utilization and reasoning. Performance was assessed in two ways: the success rate of final outcomes, and progress rates computed by keyword analysis of the intermediate steps. Eight popular LLMs were evaluated.
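To make the two metrics concrete, here is a minimal sketch of how a keyword-based progress rate and an outcome-based success rate could be computed. It is not the authors' implementation: the function names, the plain substring matching, and the sample trajectory are all assumptions made for illustration.

```python
from typing import Iterable


def progress_rate(trajectory: str, keywords: Iterable[str]) -> float:
    """Fraction of expected intermediate keywords found in the agent's trajectory.

    Assumption: a keyword counts as a hit if it appears as a substring.
    """
    expected = list(keywords)
    if not expected:
        return 0.0
    hits = sum(1 for kw in expected if kw in trajectory)
    return hits / len(expected)


def success_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of tasks whose final answer was judged correct."""
    results = list(outcomes)
    if not results:
        return 0.0
    return sum(results) / len(results)


# Hypothetical example: the agent surfaced two of three expected facts.
trajectory = "Looked up company A; retrieved case 123 from the registry."
rate = progress_rate(trajectory, ["company A", "case 123", "court X"])
overall = success_rate([True, False, True, True])
```

A progress rate of this kind gives partial credit to an agent that retrieves the right intermediate facts but still gets the final answer wrong, which the success rate alone would not show.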
Legal Agent Benchmark (LAB): An open-source benchmark for evaluating agents on real legal work
Authors: Harvey AI
Link: code & data & announcement
Summary. An open-source benchmark built to evaluate and improve agent capabilities for supporting legal work. Each task consists of an instruction, a client matter containing relevant materials, and a requirement that the agent produce a work product for review, mirroring how work is assigned, performed, and reviewed at large law firms.
Reflection. An interesting agent benchmark for long-horizon legal work. Tasks are built around client matters. Evaluation uses expert rubrics that specify what a correct work product must contain in terms of format, facts, and analysis, emulating the scrutiny work product undergoes when handed off to partners and clients.
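As a rough illustration of rubric-based grading, the sketch below scores a work product by the share of rubric criteria it satisfies in each category. This is a guess at the general shape, not LAB's actual schema or scoring rule: the `Criterion` class, the pass/fail judgments, and the per-category averaging are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Criterion:
    category: str      # assumed categories: "format", "facts", "analysis"
    description: str   # what the reviewer checks for
    passed: bool       # reviewer's (or grader model's) judgment


def rubric_scores(criteria: List[Criterion]) -> Dict[str, float]:
    """Return the fraction of criteria passed within each category."""
    totals: Dict[str, List[int]] = {}
    for c in criteria:
        passed_count, seen = totals.get(c.category, [0, 0])
        totals[c.category] = [passed_count + int(c.passed), seen + 1]
    return {cat: p / n for cat, (p, n) in totals.items()}


# Hypothetical rubric for a single memo.
rubric = [
    Criterion("format", "Delivered as a memo with headings", True),
    Criterion("facts", "States the correct filing deadline", True),
    Criterion("facts", "Identifies all contracting parties", False),
    Criterion("analysis", "Applies the governing standard", True),
]
scores = rubric_scores(rubric)
```

Reporting scores by category keeps a well-formatted but factually wrong work product from looking as good as one that gets the substance right.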
