We introduce the Cambridge Law Corpus (CLC), a corpus for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases dating back to the 16th century. The corpus contains the raw text and metadata of each case. Together with the corpus, we provide case-outcome annotations for 638 cases, produced by legal experts. Reflecting legal and ethical considerations, we are releasing the corpus for research purposes only, subject to restrictions.

Our paper introducing the corpus, “The Cambridge Law Corpus: A Dataset for Legal AI Research”, has been accepted at the NeurIPS 2023 Datasets and Benchmarks Track. The DOI of the corpus is 10.17863/CAM.100221. To download a sample of the dataset consisting of 15 court cases, please visit our project website.