Historical Life Course Studies https://hlcs.nl/ <p><em>Historical Life Course Studies</em> is the electronic journal of the European Historical Population Samples Network (EHPS-Net) and is published by the International Institute of Social History (IISH). The journal is the primary publishing outlet for research involved in the conversion of existing European and non-European large historical demographic databases into a common format, the Intermediate Data Structure, and for studies based on these databases. The journal publishes both methodological and substantive research articles.</p> European Historical Population Samples Network en-US Historical Life Course Studies 2352-6343 Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes https://hlcs.nl/article/view/11331 <p>Machine learning approaches achieve high accuracy for text recognition and are therefore increasingly used for the transcription of handwritten historical sources. However, using machine learning in production requires a streamlined end-to-end pipeline that scales to the dataset size and a model that achieves high accuracy with few manual transcriptions. The correctness of the model results must also be verified. This paper describes our lessons learned developing, tuning and using the <em>Occode</em> end-to-end machine learning pipeline for transcribing 2.3 million handwritten occupation codes from the Norwegian 1950 population census. We achieve an accuracy of 97% for the automatically transcribed codes, and we send 3% of the codes for manual verification. We verify that the occupation code distribution found in our results matches the distribution found in our training data, which should be representative for the census as a whole. We believe our approach and lessons learned may be useful for other transcription projects that plan to use machine learning in production. 
The source code is available at <a href="https://github.com/uit-hdl/rhd-codes">https://github.com/uit-hdl/rhd-codes</a>.</p> Bjørn-Richard Pedersen Einar Holsbø Trygve Andersen Nikita Shvetsov Johan Ravn Hilde Leikny Sommerseth Lars Ailo Bongo Copyright (c) 2022 Bjørn-Richard Pedersen, Einar Holsbø, Trygve Andersen, Nikita Shvetsov, Johan Ravn, Hilde Leikny Sommerseth, Lars Ailo Bongo https://creativecommons.org/licenses/by/4.0 2022-01-06 2022-01-06 12 1 17 10.51964/hlcs11331 The Impact of Microdata in Norwegian Historiography 1970 to 2020 https://hlcs.nl/article/view/11675 <p>The establishment of the Norwegian Historical Data Centre, the 1801 project at the University of Bergen and the data transcriptions and scanned versions of the sources in the National Archives made Norwegian microdata much more available. A more detailed description of the digital techniques applied to the wealth of censuses, church records and other types of nominative data from the 18th century onwards, will be presented in a separate article. Our main focus here is to summarize the impact of the research that has been produced based on the Norwegian historical microdata. These studies span a wide range of fields within social history and historical demography: Emigration, immigration, internal migration, fertility, nuptiality, family history and last but not least mortality studies with a priority given to infant mortality. A recent development is the building of a national historical population register covering the 19th and 20th centuries.</p> Hilde Leikny Sommerseth Gunnar Thorvaldsen Copyright (c) 2022 Hilde L. Sommerseth, Gunnar Thorvaldsen https://creativecommons.org/licenses/by/4.0 2022-03-01 2022-03-01 12 18 41 10.51964/hlcs11675 Building an Archival Database for Visualizing Historical Networks. 
A Case for Pre-Modern Korea https://hlcs.nl/article/view/11718 <p>In this paper, we share the experience of collecting and organizing pre-modern Korean historical materials into a searchable digital archive. The Ajou Interdisciplinary Research Group (AIRG) has continuously collected historical data of pre-modern Korea for the past 10 years to assist the study of family history, historical demographics, and social mobility. This paper describes the rich data sources for historical studies of Korea, such as household registers, genealogies, and state examination registers, and we summarize contributions to the study of historical demography and related fields.</p> Seungmin Paek Jong Hee Park Sangkuk Lee Copyright (c) 2022 Seungmin Paek, Jong Hee Park, Sangkuk Lee https://creativecommons.org/licenses/by/4.0 2022-04-21 2022-04-21 12 42 57 10.51964/hlcs11718 The Utah Population Database. A Model for Linking Medical and Genealogical Records for Population Health Research https://hlcs.nl/article/view/11681 <p>Improving our understanding of the socio-environmental and genetic bases of disease and health outcomes among individuals, families, and populations over time requires extensive longitudinal data on multiple attributes for entire communities, states or nations. This requirement can be difficult to achieve. In this paper we describe a successful example of a database that meets these needs. The Utah Population Database (UPDB) is a unique and powerful database rarely found in the world that has been addressing these data requirements for over 40 years. The UPDB at the University of Utah is one of the world’s richest sources of in-depth information that supports research on genetics, epidemiology, demography, history, and public health. Genetic researchers have used UPDB to identify and study individuals and families that have higher than normal incidence of diseases or other traits, to analyze patterns of genetic inheritance, and to identify specific genetic mutations. 
Demographers and other social scientists are increasingly using the UPDB to study issues such as trends in fertility transitions and shifts in mortality patterns for both infants and adults. A central component of the UPDB is an extensive set of Utah family histories, in which family members are linked to demographic and medical information. The UPDB includes medical information about cancer, causes of death, and medical details associated with births. It also includes diagnostic records from statewide insurance claims data and healthcare facilities (hospital discharge, ambulatory surgery, emergency department encounters). UPDB is also linked to Medicare claims data, a federal health insurance program generally for persons age 65 or older. The UPDB provides access to information on more than 11 million individuals and supports nearly 400 research projects. We describe in detail the data components of the UPDB, how it can be accessed, issues related to its development, record linkage, governance and privacy protections, as well as plans for future developments.</p> Ken R. Smith Alison Fraser Diana Lane Reed Jahn Barlow Heidi A. Hanson Jennifer West Stacey Knight Navina Forsythe Geraldine P. Mineau Copyright (c) 2022 Ken R. Smith, Alison Fraser, Diana Lane Reed, Jahn Barlow, Heidi A. Hanson, Jennifer West, Stacey Knight, Navina Forsythe, Geraldine P. Mineau https://creativecommons.org/licenses/by/4.0 2022-05-03 2022-05-03 12 58 77 10.51964/hlcs11681 The Groningen Integral History Cohort Database. Development, Design and Output https://hlcs.nl/article/view/12033 <p>The Groningen Integral History project launched in 1987 aimed to sketch the lives of people from various social classes in the Dutch province of Groningen in the 19th and early 20th century. One part was the creation of the Groningen Integral History Cohort Database (GIHCD), reconstructing complete individual life courses of 5,280 persons (RPs) born between 1811 and 1872. 
Despite the lengthy and difficult process of shaping it over 35 years, the quality of the database is now very high. More than 98% of the RPs (and for some parts of the database even more than 99%) could be followed until their death or until a migration abroad. Even for those who moved abroad, life-course information is available for most RPs. In this article, we primarily focus on the rural part of the database (n = 4,320), which has the highest quality and has had the most tangible research impact. Building on information from the Dutch civil registration system (from 1811) and the population registers (from 1850), the database includes multiple individual-level variables. In the technical part of the article, we provide an extensive overview of the available variables and summarize the transformation of the rural part of the database into an Intermediate Data Structure (IDS). Since the early 1990s, historians from the University of Groningen have used the GIHCD in numerous publications. At the end of this article, we provide a summary of the main outcomes of these publications.</p> Richard Paping Dinos Sevdalakis Copyright (c) 2022 Richard Paping, Dinos Sevdalakis https://creativecommons.org/licenses/by/4.0 2022-06-20 2022-06-20 12 78 98 10.51964/hlcs12033 The Barcelona Historical Marriage Database and the Baix Llobregat Demographic Database. 
From Algorithms for Handwriting Recognition to Individual-Level Demographic and Socioeconomic Data https://hlcs.nl/article/view/11971 <p>The Barcelona Historical Marriage Database (BHMD) gathers records of the more than 600,000 marriages celebrated in the Diocese of Barcelona and their taxation registered in Barcelona Cathedral's so-called Marriage Licenses Books for the long period 1451–1905. The BALL Demographic Database brings together the individual information recorded in the population registers, censuses and fiscal censuses of the main municipalities of the county of Baix Llobregat (Barcelona). By December 2020, this ongoing collection had assembled 263,786 individual observations dating from the period between 1828 and 1965. The two databases started as part of different interdisciplinary research projects at the crossroads of Historical Demography and Computer Vision. Their construction uses artificial intelligence and computer vision methods such as handwriting recognition to reduce execution time. However, their current state still requires some human intervention, which explains the crowdsourcing and game-sourcing experiences that were implemented. Moreover, knowledge graph techniques have allowed the application of advanced record linkage to link the same individuals and families across time and space. Finally, we discuss the main lines of research in historical demography developed so far using both databases.</p> <p>This paper was awarded the <a href="https://population-europe.eu/network/news-network/eshd-announces-years-eshd-award-winners">Louis Henry award</a> from the European Society of Historical Demography. The Louis Henry Award has been established by the European Society of Historical Demography to recognize methodological innovations in data collection, visualization or analysis. 
</p> Joana Maria Pujadas-Mora Alícia Fornés Oriol Ramos Terrades Josep Lladós Jialuo Chen Miquel Valls-Fígols Anna Cabré Copyright (c) 2022 Joana Maria Pujadas-Mora, Alícia Fornés, Oriol Ramos Terrades, Josep Lladós, Jialuo Chen, Miquel Valls-Fígols, Anna Cabré https://creativecommons.org/licenses/by/4.0 2022-06-23 2022-06-23 12 99 132 10.51964/hlcs11971 Historical Population Database of Transylvania. Sources, Particularities, Challenges, and Early Findings https://hlcs.nl/article/view/12038 <p>The Historical Population Database of Transylvania (HPDT) is a research tool for population studies developed since 2014 at the Centre for Population Studies in Cluj-Napoca, financed by an SEE-Norway Grant. HPDT employs a source-oriented approach for recording data from the parish registers kept by the Transylvanian churches, focusing primarily on the main vital events such as births, marriages, and deaths. The data entry process was followed by the standardization of various information, such as names, occupations, locations and causes of death, thus allowing the initiation of a linkage process. The database has already been employed in a wide-ranging series of analyses conducted on datasets extracted from HPDT, which include infant and adult mortality, nuptiality and age at first marriage, social mobility, and the medicalization of childbirth. The wealth of information it includes will enable many more scientific investigations.</p> Luminița Dumănescu Mihaela Hărăguș Angela Lumezeanu Elena Crinela Holom Nicoleta Hegedűs Daniela Mârza Diana Covaci Ioan Bolovan Copyright (c) 2022 Luminița Dumănescu, Mihaela Hărăguș, Angela Lumezeanu, Elena Crinela Holom, Nicoleta Hegedűs, Daniela Mârza, Diana Covaci, Ioan Bolovan https://creativecommons.org/licenses/by/4.0 2022-06-28 2022-06-28 12 133 150 10.51964/hlcs12038 The Ural Population Project. 
Demography and Culture From Microdata in a European-Asian Border Region https://hlcs.nl/article/view/12320 <p>The Ural Population Project (URAPP) is built from individual-level transcriptions of 19th- to early 20th-century parish records and mid-19th-century census-like tax revision manuscripts. This article discusses the source material, the contents, the history of creation and the strategy of the URAPP database and the outcome of the main research topics so far, including historical demography, Jewish studies, indigenous studies and studies of religious minorities in the Urals and Siberia. Our studies of the ethno-religious cultural landscape of the Urals and northwestern Siberia, as well as our participation in population history projects, were a more vital background than the traditional focus on aggregates. The over 65,000 vital events transcribed from parish records of Russian Orthodox Churches and minority religions in and around Ekaterinburg have been the basis for studies of mortality, nuptiality, religion and other characteristics. We found that the Jewish population kept their traditions and connections with relatives in the Pale of Settlement. Prisoners of WWI usually married within their own religious group. Infant mortality in Ekaterinburg was lower among Jews and Catholics, minorities with higher education and western background, while the Orthodox majority exposed their newborns to extremely tough baptism. The burial records show cases of the Spanish flu in 1918–1919, but at a lower level than in the West, supporting recent theories that estimates of flu mortality may be too high. The tax revisions show that polygyny was officially recognized among the indigenous Siberian people. 
The strategy of the URAPP project has evolved from transcribing microdata about minorities towards covering the whole population.</p> Elena Glavatskaya Julia Borovik Gunnar Thorvaldsen Copyright (c) 2022 Elena Glavatskaya, Julia Borovik, Gunnar Thorvaldsen https://creativecommons.org/licenses/by/4.0 2022-07-07 2022-07-07 12 151 172 10.51964/hlcs12320 What was Killing Babies in Ipswich Between 1872 and 1909? https://hlcs.nl/article/view/11592 <p>This paper examines the causes of infant mortality for the port town of Ipswich between 1872 and 1909. Ipswich is the only town in England for which a complete run of computer-readable, individual-level causes of death are available in the late 19th and early 20th century. Our work makes use of the ICD10h coding system being developed to contribute to two projects: Digitising Scotland (University of Edinburgh) and SHiP — Studying the history of Health in Port Cities (Radboud University, Nijmegen). We consider annual and quinquennial mortality rates amongst Ipswich's youngest residents by age, sex, seasonality and cause. The individual causes of death not only offer insight into conditions in the town, but also highlight questions concerning how best to interpret the information provided when both medical terminology and registration practices were changing over the decades of the study. Ipswich infant mortality rates very closely mirrored those of England as a whole, rather than the most unhealthy large cities, such as Liverpool or Manchester. It becomes clear that birth itself was a major cause of neonatal, even some post-neonatal, deaths. While water-food borne diseases killed large numbers in the summer months, it was the ever-present airborne diseases which carried off a greater number of small victims. 
Although the records offer a rich vein of data to explore, some causes of death, such as convulsions and teething, remain enigmatic and require further research.</p> Eilidh Garrett Alice Reid Copyright (c) 2022 Eilidh Garrett, Alice Reid https://creativecommons.org/licenses/by/4.0 2022-07-14 2022-07-14 12 173 204 10.51964/hlcs11592 What was Killing Babies in Hermoupolis, Greece? An Investigation of Infant Mortality Using Individual Level Causes of Death, 1861–1930 https://hlcs.nl/article/view/11601 <p>This paper employs individual level cause of death data from the port city of Hermoupolis on the Greek island of Syros, in order to test the newly-constructed ICD10h coding system. By constructing cause specific death rates for infants from the late 19th century to early 20th century, the paper contributes to a comparative approach, which aims to show how causes of death differ across several locations within Europe and how they develop over time. Given the scarcity of cause of death data both at the individual and aggregate level in Greece roughly prior to the 1920s, the availability of such data in the draft death registers (for sporadic runs of years in the second half of the 19th and early 20th century) and the civil registration (from 1916 onwards) in Hermoupolis provides a deeper understanding of the history of cause-of-death reporting in the country. Infant mortality in Hermoupolis was relatively high throughout the study period, with water-food borne diseases accounting for the highest number of infant deaths, especially during the hot and dry summer months. The prominent winter peak in neonatal mortality, and also in congenital-birth disorders, could be partially associated with birth seasonality and/or low temperatures over the winter months. 
Finally, certain vague terms such as 'atrophy' and 'athrepsy', but especially 'drakos' require further investigation until they are firmly understood.</p> Michail Raftakis Copyright (c) 2022 Michail Raftakis https://creativecommons.org/licenses/by/4.0 2022-07-21 2022-07-21 12 205 232 10.51964/hlcs11601 Nominative Linkage of Records of Officials in the China Government Employee Dataset-Qing (CGED-Q) https://hlcs.nl/article/view/11902 <p>We introduce our approach to the nominative linkage of records of Qing officials who were included in the China Government Employee Datasets-Qing (CGED-Q) Jinshenlu (JSL) and Examination Records (ER). We constructed these datasets by transcription of quarterly rosters of civil and military officials produced by the government and by commercial presses, and records of examination degree holders. We assess each of the primary attributes available in the original sources in terms of their usefulness for disambiguation, focusing on their diversity and potential for inconsistent recording. For officials who were not affiliated with the Eight Banners, these primary attributes include surname, given name, and province and county of origin. For the small subset of officials who were affiliated with the Bannermen, we assess the available data separately. We also assess secondary attributes available in the data that may be useful for adjudicating candidate matches. We then describe the approach that we developed that addresses the issues we identified with the primary and secondary attributes. The issues we have identified and the approach that we have developed will be of interest to researchers engaged in similar efforts to construct and link datasets based on elite males in historical China.</p> Cameron Campbell Bijia Chen Copyright (c) 2022 Cameron Campbell, Bijia Chen https://creativecommons.org/licenses/by/4.0 2022-09-08 2022-09-08 12 233 259 10.51964/hlcs11902 Historical Life Courses and Family Reconstitutions. 
The Scientific Impact of the Antwerp COR*-Database https://hlcs.nl/article/view/12914 <p>The Antwerp COR*-database is a longitudinal micro-level database, which covers all entries from individuals whose last names started with the letters COR (and individuals who shared at some moment in time a household with a COR*-person) from the population registers and the vital registration of births, marriages and deaths for the 19th- and early-20th-century Antwerp district in Flanders, the northern Dutch-speaking part of Belgium. As such the database allows the reconstruction of historical life courses and families, and the analysis of key demographic characteristics and developments regarding marriage, fertility, migration, social mobility, health, mortality and longevity, as well as their interplay within and across households, families and generations. After a short description of the source material and the construction of the database, a review of the literature based on the database is presented in order to provide the reader with an encompassing overview of the research that has been carried out with this database and the knowledge and insights it has generated since its first release in 2010. The article ends with a discussion of potential pathways for future research, including new topics, and future extension of the database through citizen science projects.</p> Paul Puschmann Hideko Matsuo Koen Matthijs Copyright (c) 2022 Paul Puschmann, Hideko Matsuo, Koen Matthijs https://creativecommons.org/licenses/by/4.0 2022-10-07 2022-10-07 12 260 278 10.51964/hlcs12914