Lecture and workshop: "Shift Happens: Unlocking Old Corpora with Transkribus" with Quinn Dombrowski (Stanford University)

Submitted by Isabelle Schlegel on

Thursday, May 23, 2024, 4:30 – 6:30 p.m. in Denny Hall 159.

Link to event details on the Slavic Department Calendar

From corpora available for purchase to natural language processing models to out-of-the-box tools for text analysis, scholars working with English enjoy many advantages. The comparatively limited market for digitized corpora with reasonable metadata in languages other than English means that many scholars are left to fend for themselves, or collaborate within their linguistic communities, to create the materials that form the starting point for their research. The combination of this DIY spirit with resource constraints -- such as difficulties finding volunteers for crowdsourced manual transcription -- has led to innovations in shared infrastructure for AI-powered transcription.

Transcription example


This talk will introduce Transkribus, a platform for training and sharing machine-learning models on handwritten and historic printed text, as well as the READ Co-Op that maintains the software and shapes its development. Drawing on multilingual examples from Stanford University Library Special Collections, researchers' own materials, and several ongoing proofs-of-concept ranging from Arabic newspapers to medieval Slavic manuscripts to the bilingual (Spanish/Nahuatl) Florentine Codex, the talk will look towards the unprecedented opportunities for asking new kinds of research questions, using material long written off as inaccessible to digital methods.

Quinn Dombrowski is Academic Technology Specialist in the Division of Literatures, Cultures, and Languages at Stanford University. Quinn holds an MA/BA in Slavic Linguistics from the University of Chicago and an MLIS from the University of Illinois, Urbana-Champaign. Quinn supports many non-English DH projects and regularly teaches on non-English DH. Quinn is currently co-VP of the Association for Computers and the Humanities along with Roopika Risam, and advocates for better support for DH in languages other than English.