The Gutenberg-HathiTrust Parallel Corpus: A Real-World Dataset for Noise Investigation in Uncorrected OCR Texts

Ming Jiang, Yuerong Hu, Glen Cameron Layne-Worthey, Ryan Dubnicek, Boris Capitanu, Deren E. Kudeki, J. Stephen Downie

Research output: Contribution to conference › Paper › peer-review
