Historical Life Course Studies https://hlcs.nl/
<p><em>Historical Life Course Studies</em> is the electronic journal of the European Historical Population Samples Network (EHPS-Net) and is published by the International Institute of Social History (IISH). The journal is the primary publishing outlet for research on converting existing large European and non-European historical demographic databases into a common format, the Intermediate Data Structure, and for studies based on these databases. The journal publishes both methodological and substantive research articles.</p>
ehps-journal@iisg.nl (Marja Koster) info@openjournals.nl (Editorial Support) Thu, 06 Jan 2022 08:49:18 +0100
Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes https://hlcs.nl/article/view/11331
<p>Machine learning approaches achieve high accuracy for text recognition and are therefore increasingly used for the transcription of handwritten historical sources. However, using machine learning in production requires a streamlined end-to-end pipeline that scales to the dataset size and a model that achieves high accuracy with few manual transcriptions. The correctness of the model results must also be verified. This paper describes the lessons we learned while developing, tuning, and using the <em>Occode</em> end-to-end machine learning pipeline for transcribing 2.3 million handwritten occupation codes from the Norwegian 1950 population census. We achieve an accuracy of 97% for the automatically transcribed codes, and we send 3% of the codes for manual verification. We verify that the occupation code distribution found in our results matches the distribution found in our training data, which should be representative of the census as a whole.
We believe our approach and lessons learned may be useful for other transcription projects that plan to use machine learning in production. The source code is available at <a href="https://github.com/uit-hdl/rhd-codes">https://github.com/uit-hdl/rhd-codes</a>.</p>
Bjørn-Richard Pedersen, Einar Holsbø, Trygve Andersen, Nikita Shvetsov, Johan Ravn, Hilde Leikny Sommerseth, Lars Ailo Bongo
Copyright (c) 2022 Bjørn-Richard Pedersen, Einar Holsbø, Trygve Andersen, Nikita Shvetsov, Johan Ravn, Hilde Leikny Sommerseth, Lars Ailo Bongo https://creativecommons.org/licenses/by/4.0
Thu, 06 Jan 2022 00:00:00 +0100
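
The two verification ideas mentioned in the abstract (sending a share of codes to manual review, and comparing the distribution of transcribed codes against the training data) can be illustrated with a small sketch. The abstract does not say how codes are selected for manual verification or how distributions are compared, so everything here is an assumption: the confidence threshold, the use of total variation distance, the function names, and the toy data are all hypothetical and are not taken from the Occode source code.

```python
from collections import Counter


def route_predictions(predictions, threshold=0.9):
    """Split (code, confidence) pairs into auto-accepted and manual-review codes.

    Assumption: the model emits a confidence score per code; the threshold
    value is illustrative only.
    """
    accepted, manual = [], []
    for code, confidence in predictions:
        if confidence >= threshold:
            accepted.append(code)
        else:
            manual.append(code)
    return accepted, manual


def relative_frequencies(codes):
    """Return the share of each occupation code in a list of codes."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: n / total for code, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data (hypothetical codes and confidences, not census values).
training_codes = ["111", "111", "212", "999"]
predictions = [
    ("111", 0.99),
    ("111", 0.97),
    ("212", 0.95),
    ("999", 0.55),  # low confidence: routed to manual verification
    ("999", 0.93),
]

accepted, manual = route_predictions(predictions, threshold=0.9)
distance = total_variation_distance(
    relative_frequencies(training_codes),
    relative_frequencies(accepted),
)
print(f"accepted={len(accepted)} manual={len(manual)} distance={distance:.3f}")
```

A distance near 0 would suggest that the automatically transcribed codes follow the same distribution as the training sample, while a large distance would be a signal to inspect the results more closely. Any cut-off for "large" would have to be chosen per project.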