BNE Hemeroteca OCR

19th-Century Spanish OCR Dataset

A dataset of over 40,000 PDF documents, comprising more than 800,000 pages and over 800 million text tokens, drawn from 19th-century Spanish publications in the Hemeroteca Digital of the Biblioteca Nacional de España (BNE).