Historical Life Course Studies

Unravelling Mortality in Enslavement. Patterns and Determinants of Mortality Among the Enslaved Population of Curaçao, 1839–1863

2024-09-16T08:07:06+02:00

Examining demographic patterns of enslaved populations can shed light on the living conditions of people in slavery. During the 19th century, enslaved populations in Latin America and the Caribbean were characterized by an excess of deaths, indicating the harsh conditions they were forced to live in. The island of Curaçao, at that time a Dutch colony, formed an exception to this trend, as its enslaved population was characterized by natural increase. Today, newly crowdsourced data from the slave registers of Curaçao enable the examination of the entire enslaved population, which facilitates the study of (changes in) the demographic development on the island. This paper aims to gain insight into patterns of mortality between 1839 and 1863 in order to shed light on the determinants of the exceptional demographic development of the enslaved population of Curaçao. It examines the mortality rates of the population and explores possible determinants (seasonality, sex and age) of this unique mortality pattern (n = 12,793). Previous research stressed the importance of the fertility pattern of Curaçao as the determinant of the natural increase of the population; our results, however, show that mortality might have played a more decisive role. The relatively low mortality rates allowed fertility rates to exceed them, and this positive difference between births and deaths resulted in a unique pattern of natural increase of an enslaved population in the Caribbean.

Persons in Context. A Model to Represent Observations and Reconstructions of Historical Persons in Linked Data

2024-10-29T10:06:31+01:00

Reconstruction of historical persons and family ties is the bread and butter of many researchers and genealogists. With the increasing digital availability of historical person records, the scope and depth of person reconstructions capture the imagination of researchers and genealogists. Yet, the lack of standardisation in the description of historical person data has hurt the interoperability and sustainability of both small and large databases. Persons in Context, or PiCo, presents a data model for historical person records within the Resource Description Framework (RDF). RDF, or Linked Data, is specifically designed for clear, unambiguous information exchange between multiple parties over the internet. We show how reuse of existing ontologies and concentric description are the building blocks of a flexible, straightforward, and stringent data model that emphasises provenance.

Editorial

2024-10-07T11:10:03+02:00

In this new editorial of Historical Life Course Studies key shifts are discussed. We welcome Joana Maria Pujadas-Mora as new co-editor in chief and Eva van der Heijden as associate editor. We thank Luciana Quaranta for her contributions to the development of the journal. Over the past years, Historical Life Course Studies has experienced substantial growth in terms of output and impact. The journal got indexed in Scopus and an application for admission to the Web of Science is pending. Social media engagement and upcoming video releases further enhance the journal's outreach. Gratitude extends to sponsors for Diamond Open Access support, as well as to all authors, reviewers, and readers who jointly contribute to the success of Historical Life Course Studies. The editors look with confidence to the future and hope to welcome many submissions over the next years.

What was Killing Babies in Rostock? An Investigation of Infant Mortality Using Individual-Level Cause-of-Death Data, 1800–1904

2024-07-09T08:00:20+02:00

This paper examines the causes of infant mortality for the Hanseatic city of Rostock, Germany, between 1800 and 1904. Based on unique individual-level church records from Rostock's largest inner-city parish, St. Jakobi, we apply the novel ICD10h coding system for the first time to the German context. Using this coding system, we analyse cause-specific patterns of infant, neonatal and post-neonatal mortality in an internationally comparable way and bring new insights into the determinants of 19th-century infant mortality, which was shaped by increase and stagnation in wide parts of Germany. Our results show that Rostock experienced a stagnating infant mortality rate at a low level in international comparison during the first 40 years of the 19th century, followed by severe increases during the next 20 years and a stage of slight decline and stagnation towards the end of the study period. This suboptimal development from 1840 was strongly related to post-neonatal mortality and causes of death that are related to unfavourable sanitary conditions and/or poor nutrition, which possibly hints at worsening housing and living conditions following accelerated population growth. Our analyses also reveal that water-food borne diseases were underestimated in Rostock, even though symptomatic disease terms such as convulsions and teething, which were frequently recorded over much of the 19th century, had deviating seasonality patterns and thus cannot entirely refer to this disease group but rather to a wide field of different diseases. The applied coding scheme is a significant step forward in fostering comparative international research on historical cause-specific mortality.

The Life Course of 20th-Century Lyon Silk Workers. A Pilot Study

2024-07-23T08:18:34+02:00

At the time of her death in October 2002, Dr. Tamara Hareven was in the process of completing a large cross-cultural examination of the global declines in the silk and textile industries. A small sample of her interview transcripts from canuts in Lyon has, more than 20 years after her death, been translated into English and coded for themes as a pilot study of a larger data set. Six themes emerged from the participants' data. Participants sensed that the industry was disappearing, that it was looked at as a historical artifact to be studied rather than as a profession, and that not enough was being done to encourage young people to enter the industry. Gender disparities within the industry continued to a lesser extent than before the 20th century began, but still seemed profound, especially as girls who were recruited for apprenticeships were often minors when they were moved away from their families. The apprenticeship conditions continued to be less than desirable well into the 20th century. Economically, the silk industry was often poorly paid and vulnerable to economic crises as fashion and world economics changed. Large social changes often had impacts on the family life of silk worker families. Finally, just as economics tended to ebb and flow for the silk industry, so did the labor conditions.

More Efficient Manual Review of Automatically Transcribed Tabular Data

2024-10-09T09:00:11+02:00

Any machine learning method for transcribing historical text requires manual verification and correction, which is often time-consuming and expensive. Our aim is to make it more efficient. Previously, we developed a machine learning model to transcribe 2.3 million handwritten occupation codes from the Norwegian 1950 census. Here, we manually review the 90,000 codes (3%) for which our model had the lowest confidence scores. We allocated these codes to human reviewers, who used our custom annotation tool to review them. The reviewers agreed with the model's labels 31.9% of the time. They corrected 62.8% of the labels, and 5.1% of the images were uncertain or assigned invalid labels. 9,000 images were reviewed by multiple reviewers, resulting in an agreement of 86.4% and a disagreement of 9%. The results suggest that one reviewer per image is sufficient. We recommend that reviewers indicate any uncertainty about the label they assign to an image by adding a flag to their label. Our interviews show that the reviewers performed internal quality control and found our custom tool to be useful and easy to operate. We provide guidelines for efficient and accurate transcription of historical text by combining machine learning and manual review. We have open-sourced our custom annotation tool and made the reviewed images open access.
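The review workflow summarised above (send the lowest-confidence predictions to human reviewers, then measure how often labels agree) can be illustrated with a minimal sketch. This is not the authors' tool: the record layout, the function names `select_for_review` and `agreement_rate`, and the toy data are all assumptions made for illustration.

```python
# Hypothetical sketch: pick the least confident model predictions for manual
# review and compute a simple agreement rate between two sets of labels.
# Field names and data are invented for illustration only.

def select_for_review(predictions, fraction):
    """Return the given fraction of predictions with the lowest confidence."""
    k = round(len(predictions) * fraction)
    return sorted(predictions, key=lambda p: p["confidence"])[:k]


def agreement_rate(labels_a, labels_b):
    """Share of positions where two equally long label lists are identical."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


predictions = [
    {"image_id": "img1", "label": "111", "confidence": 0.99},
    {"image_id": "img2", "label": "222", "confidence": 0.42},
    {"image_id": "img3", "label": "333", "confidence": 0.87},
    {"image_id": "img4", "label": "444", "confidence": 0.15},
]

# Review the least confident half of this toy set.
review_queue = select_for_review(predictions, 0.5)
review_ids = [p["image_id"] for p in review_queue]

# Compare the model's labels with (invented) reviewer labels.
model_labels = ["111", "222", "333", "444"]
reviewer_labels = ["111", "222", "999", "444"]
rate = agreement_rate(model_labels, reviewer_labels)
```

The same `agreement_rate` helper could be applied to two reviewers' labels for the images that were reviewed more than once.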

Introduction: Content, Design and Structure of Major Databases with Historical Longitudinal Population Data

2024-10-14T11:57:59+02:00

In recent years the development of historical databases reconstructing the lives of large populations has accelerated. These considerable investments of time and money have greatly expanded possibilities for new research in history, demography, sociology, economics, and other disciplines. This special issue describes the content and design of 23 important historical databases. Authors were given the freedom to discuss a range of practical and technical decisions, from evaluating archival sources to crowdsourcing data entry. The most common issue is nominative record linkage, but we find different choices between semi-automatic and fully automatic linkage techniques and various approaches for connecting diverse sources. Some articles describe special problems, like linking Chinese names, handwritten text recognition or the construction of a release in IDS-format. Others offer detailed descriptions of sources or discuss prospects for including new datasets.

Geneva. An Urban Sociodemographic Database

2023-07-11T14:28:53+02:00

The Geneva databases are a data resource covering the period 1800–1880 for the city of Geneva, and occasionally the canton of Geneva. The research team adopted an alphabetical sampling approach, collecting data on individuals whose surname begins with the letter B. The individuals and households belonging to this sample in six population censuses between 1816 and 1843 were digitised and linked. A second database collected marriage and divorce records for the period 1800–1880. A third collection of data included residence permits. All these sources were used for a massive reconstitution of families. This article presents the sources, the linking methods, and the typologies used to code places and occupations and to study household structures and forms of solitude. Combined with qualitative information extracted from the archives of public administrations and the National Protestant Church, as well as from newspapers, these databases were used to study the transformation of a medium-sized European city, sociopolitical tensions embedded in demographic and social structures, and the impact of the immigrants who made the 'Calvinist Rome' a religiously mixed city.
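The alphabetical sampling approach mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the function name `alphabetical_sample`, and the example surnames are hypothetical and do not reproduce the project's actual data or procedure.

```python
# Hypothetical sketch of alphabetical sampling: keep only the records whose
# surname starts with a chosen letter. Records and names are invented.

def alphabetical_sample(records, letter="B"):
    """Return records whose surname begins with `letter` (case-insensitive)."""
    wanted = letter.upper()
    return [
        r for r in records
        if r["surname"].strip().upper().startswith(wanted)
    ]


census = [
    {"surname": "Bertrand", "year": 1816},
    {"surname": "Dupont", "year": 1816},
    {"surname": " bovet", "year": 1843},
    {"surname": "Martin", "year": 1843},
]

sample = alphabetical_sample(census, "B")
sample_names = [r["surname"].strip() for r in sample]
```

Applying the same rule to every source keeps each person in the sample across censuses, provided the recorded surname does not change.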

Slavery in Suriname. A Reconstruction of Life Courses, 1830–1863

2024-10-10T10:36:54+02:00

The slavenregisters, or slave registers, of Suriname offer a unique perspective on the social and demographic history of a people in bondage. Thanks to a citizen science project, the archival sources were transcribed in 2017 by hundreds of volunteers. The transcriptions were used to create a longitudinal database of more than 90,000 enslaved persons. This paper describes the sources, data entry, and cleaning to create a standardized database as well as the matching needed to construct life courses. We discuss the best practices we have learned along the way. Finally, we offer prospects for research and expansion of the database to other population sources and areas.

Introduction: Major Databases with Historical Longitudinal Population Data: Development, Impact and Results

2023-06-01T11:06:52+02:00

Over the last 60 years several major historical databases with reconstructed life courses of large populations spanning decades have been launched. The development of these databases is indicative of considerable investments that have greatly expanded the possibilities for new research within the fields of history, demography, sociology, as well as other disciplines. This volume of seven articles covers eight databases that have had a wide impact on research in various disciplines. Each database had its own unique genesis that is well described in the articles assembled in this volume. They inform readers about how these databases have changed the course of research in historical demography and related disciplines, how settled findings were challenged or confirmed, and how innovative investigations were launched and implemented. Finally, we explore how research with these kinds of databases will develop in the future.

LINKS. A System for Historical Family Reconstruction in the Netherlands

2023-06-01T11:06:52+02:00

LINKS stands for 'LINKing System for historical family reconstruction' and is a software system to link nominal data from the Dutch archives and ultimately reconstruct historical individuals and families. We present the background and philosophy of this matching system and explain its data structure and functioning. Currently the core data of the LINKS system consists of indexed civil certificates. These certificates are available from 1812, the start of the Dutch Vital Registration, until the year from which they are confidential under privacy laws. For more than 20 years, thousands of volunteers have been working to build this index, which contains not only the names of newborn, married and deceased persons, but also the names of their parents, places of birth, ages and sometimes their occupational titles. The software system LINKS includes the standardization of all input before linking, nominal record linkage procedures and identification of all unique persons involved in the system. All processes are repeatable and a strict distinction is maintained between source data; standardized, linked and enriched data; and released data. Moreover, LINKS also informs archives about all kinds of errors and inconsistencies found during the cleaning and matching process. We discuss two matching systems: the first is the original querying system that runs within a MySQL database environment, and the second is a newly developed system, called burgerLinker, which is based on knowledge graphs, is designed to be used independently from LINKS, and is made available as open source software. Finally, we present the most important releases of LINKS data so far: two national releases that link birth and parental marriage certificates, creating families and pedigrees, and an integrated dataset of persons, families and family trees in four provinces.

The Development of Microhistorical Databases in Norway. A Historiography

2023-05-11T12:21:49+02:00

Norwegian work on microdata started out with the full count 1801 census and census and vital records from around the capital. Today, most census and ministerial records from 1801 until the mid-20th century have been scanned, transcriptions are being completed, and much has been encoded and made available via the websites of the Digital National Archives and UiT The Arctic University of Norway. This article complements a previous publication on empirical results from historical microdata. It is primarily organized by technical issues: digitization of source materials, encoding and standardization, building of the Historical Population Register for the period since 1800, record linkage and source criticism as well as GIS. Presently, partner institutions are building the Historical Population Register with prolonged support from the Norwegian Research Council. This will contain longitudinal records of the nine million persons who have lived in Norway since 1800. The register increasingly makes it possible to follow the entire population. Unique personal IDs with corresponding URLs to the person page providing links to many sources introduce a new level of historical documentation. Cross-sectional and vital records are being interlinked with automatic and manual record linkage software. Longitudinal data is available for searching as timelines and in Intermediate Data Structure format from UiT The Arctic University and for searching at Histreg.no, which also caters for manual editing. We are well on the way to creating a database that can fill the void in the two centuries before the Central Population Register starts in 1964.

PRDH and IMPQ 1800–1849 Quebec Historical Family Reconstitution. Content, Design and Biographical Completeness

2024-10-09T09:04:46+02:00

Since 1966, the Programme de recherche en démographie historique (PRDH) has worked to create comprehensive genealogical data of the Quebec population. The PRDH longitudinal database, the Registre de la population du Québec ancien (RPQA), draws upon the French Catholic parish registers of the St. Lawrence Valley as its main source material. This family reconstitution covers the French Catholic population of Quebec up to 1799, along with deaths after 1800 of persons born before 1750. Subsequent partnerships with l’Institut Généalogique Drouin, FamilySearch and Ancestry as well as collaboration on the 2011–2017 Infrastructure intégrée des microdonnées historiques de la population du Québec (1621–1965) (IMPQ) project enabled the PRDH to continue efforts to reconstitute the French Catholic population up to 1849. Despite these advances, pushing family reconstitution forward to the mid-19th century has forced the PRDH team to reckon with the increasingly mixed and geographically mobile Quebec population of the 19th and early 20th centuries. This article describes the content and design of the RPQA database, detailing the structure of the RPQA relational database and the breadth of variables available for data management and analysis. It then describes features of the IMPQ extension of family reconstitution from 1800 to 1849, including observational protocols necessary to use these data and consideration of data completeness after 1800. At the same time, the article addresses the fundamental question, "what is my population?" as part of a broader reflection upon the target population encompassed by these data.

What was Killing Babies in Amsterdam? A Study of Infant Mortality Patterns Using Individual-Level Cause of Death Data, 1856–1904

2023-08-21T10:37:52+02:00

Based on unique individual-level cause of death data, this article presents an analysis of the development of infant mortality and the underlying cause of death pattern in the city of Amsterdam in the period 1856–1904. We contribute to the discussion on the development of infant mortality and its determinants and test the newly-constructed ICD10h coding system. First, our results demonstrate that the ICD10h and groupings of causes worked quite well for our period and city data. Second, Amsterdam moved from being one of the most lethal cities in the country to one of the healthiest for infants. These improvements in the fate of infants were brought about despite faltering progress in the provision of piped water, and an absence of modern sewerage throughout the period. For the entire period air-borne diseases were a prominent cause of death category, peaking in the 1880s and still making up the major group of diseases by 1904. Water- and food-related ailments also dominated the epidemiological pattern after the 1870s. Vague or ill-defined disease terms were frequent at the start of the study period. These observations suggest that physicians were increasingly able and prepared to formulate more precise disease terms by the 1900s. The seasonality analysis of the different disease groups demonstrates strong summer effects on the group of water- and food-related causes of death. It testifies to the shortcomings in the city’s hygienic situation and limited breastfeeding.

Genetic and Shared-Environment Effects on Stature and Lifespan. A Study of Dutch Birth Cohorts (1785–1920) Based on Genealogies

2023-08-28T08:36:22+02:00

Historical demography is generally concerned with the changing economic, social and normative contexts of human behaviour and health outcomes. To most historical demographers, the 'genetic' component of behaviour and health is either unknown or assumed to be constant. However, several studies point to a shift over time in the relative importance of environment and genes: in periods and social groups with strong normative or economic constraints on behaviour, the 'genetic potential' is often not realized. Therefore, to some extent, the waning of environmental constraints on heritability plays a role in changes in demographic outcomes over time. Determining the relative importance of heritability versus shared environment in historical populations for which only genealogies are available poses a challenge. Kin may live in different periods, and in different cultural and social settings. This explorative paper analyses the association between the heights of conscripted relatives, as well as between their life spans. I estimate how the associations are affected by genetic relatedness, shared historical period, and shared social and geographical environment, respectively. Furthermore, I make a distinction between kin related via the mother versus kin related via the father. All kinds of kin are involved in the analysis: (half, full and twin) brothers, fathers, grandfathers, uncles and cousins. The data consist of about 3,000 men culled from Texel island genealogies, which also include descendants of families who had left the island. Life span has a weak, but still discernible, genetic element. The heritability of height is much stronger, especially at age 19/20. The correlations of mother’s kin with her son's heights are stronger than those of her husband's kin. The analysis does not yield a consistent effect of a protective environment on kin correlations in either height or life span.

Construction of the Finnish Army in World War II Database

2023-01-23T09:03:54+01:00

This article introduces the Finnish Army in World War II Database (FA2W), currently under construction, which is being built to study the effects of World War II on Finnish society. The database is a stratified sample of 4,253 men, representative of those who served in the Finnish Army in World War II. The data have been gathered from the military service record collection of the Finnish Army, which holds files on practically all draft-age Finnish men of the birth cohorts 1903–1926 and around 70% of the birth cohorts 1897–1902. The amount of data is extensive, containing over 60 different variables. The main part of the database consists of men's military careers, comprising longitudinal data on their positions in society and in the army (e.g., civilian/conscript/frontline service), military unit, military branch, task, rank, and service class. Other information includes socio-economic information from the draft and wartime, as well as war experiences such as wounds, illnesses, medical treatments, death, and honors. In the future the database will be expanded with men’s postwar life trajectories to study the long-term effects of the war.

Collaborations Between IPUMS and Genealogical Organizations, 1999–2022

2023-01-17T13:40:46+01:00

From 1999 to 2019, IPUMS collaborated with genealogical organizations to develop massive individual-level census datasets spanning the 1790 through 1940 period, and we are currently working on the 1950 census. This research note describes how our genealogical collaborations came about. We focus on our collaborations with the Church of Jesus Christ of Latter-Day Saints Family and Church History Department (later known as FamilySearch) and the private genealogical companies HeritageQuest and Ancestry.com.

Historical Life Courses and Family Reconstitutions. The Scientific Impact of the Antwerp COR*-Database

2022-10-07T10:08:25+02:00

The Antwerp COR*-database is a longitudinal micro-level database, which covers all entries from individuals whose last names started with the letters COR (and individuals who shared at some moment in time a household with a COR*-person) from the population registers and the vital registration of births, marriages and deaths for the 19th- and early-20th-century Antwerp district in Flanders, the northern Dutch-speaking part of Belgium. As such the database allows the reconstruction of historical life courses and families, and the analysis of key demographic characteristics and developments regarding marriage, fertility, migration, social mobility, health, mortality and longevity, as well as their interplay within and across households, families and generations. After a short description of the source material and the construction of the database, a review of the literature based on the database is presented in order to provide the reader with an encompassing overview of the research that has been carried out with this database and the knowledge and insights it has generated since its first release in 2010. The article ends with a discussion of potential pathways for future research, including new topics, and future extension of the database through citizen science projects.

What was Killing Babies in Trondheim? An Investigation of Infant Mortality Using Individual Level Cause of Death Data, 1830–1907

2023-03-02T12:45:32+01:00

This paper examines infant mortality in Trondheim city, 1830–1907, working specifically with individual level cause of death data. Findings show that infant mortality in the city started to drop from 1895, primarily as a result of a decline in post-neonatal mortality. At the start of the decline air-borne diseases accounted for nearly half of the deaths, and water-food borne for around one third. The drop was predominantly driven by a decline in these two causal groups, and seasonal fluctuations became less pronounced. Because of the fall in post-neonatal mortality, the relative risk of dying amongst neonates rose towards the end of the period. Although 'convulsions' accounted for 50–70% of all infant deaths between 1830 and 1860, this cause had faded away to near insignificance by the beginning of the 1900s. Here we aim to assess the extent to which this particular aspect of decline can be explained by alterations to official instructions regarding registration and in registration practice itself. This article proposes that the decline in deaths from 'convulsions' can be explained by a relabelling of such deaths into 'congenital and birth disorders' amongst neonates, and a mix of 'water-food borne' and 'air-borne diseases' amongst post-neonates. This argument is supported by the fact that the timing of the decline corresponds with the introduction of cause of death certificates issued by medical practitioners, which most likely resulted in fewer causes of death being reported by lay informants who could only offer vague symptoms rather than informed diagnoses.

The Ural Population Project. Demography and Culture From Microdata in a European-Asian Border Region

2022-07-07T11:01:47+02:00

The Ural Population Project (URAPP) is built from individual level data transcriptions of 19th- to early 20th-century parish records and mid-19th-century census-like tax revision manuscripts. This article discusses the source material, the contents, the history of creation and the strategy of the URAPP database and the outcome of the main research topics so far, including historical demography, Jewish studies, indigenous studies and studies of religious minorities in the Urals and Siberia. Our studies of the ethno-religious cultural landscape of the Urals and northwestern Siberia, as well as participation in population history projects, were a more vital background than the traditional focus on aggregates. The over 65,000 vital events transcribed from parish records of Russian Orthodox Churches and minority religions in and around Ekaterinburg have been the basis for studies of mortality, nuptiality, religion and other characteristics. We found that the Jewish population kept their traditions and connections with relatives in the Pale of Settlement. Prisoners of WWI usually married within their own religious group. Infant mortality in Ekaterinburg was lower among Jews and Catholics, minorities with higher education and western background, while the Orthodox majority exposed their newborns to extremely tough baptism. The burial records show cases of the Spanish flu in 1918–1919, but on a lower level than in the West, supporting recent theories that estimates of flu mortality may be too high. Based on the tax revisions, polygyny was officially recognized among the indigenous Siberian people. The strategy of the URAPP project has evolved from transcribing microdata about minorities towards covering the whole population.

Cause-Specific Infant Mortality in Copenhagen 1861–1911 Explored Using Individual-Level Data

2024-10-10T10:35:57+02:00

This study explores cause-specific infant mortality in Copenhagen between 1861 and 1911, using newly available individual-level data from The Copenhagen Burial Register, as part of a larger comparative project within the SHiP network (Studying the history of Health in Port Cities). The aim is to determine the dominant cause of death patterns for infants and to explore how the ICD10h coding system performs with the Danish individual-level historical causes of death. The results show that in Copenhagen, infant mortality began a distinct decline during the period of study (1861–1911), but the city experienced only very few changes in the cause of death pattern. While a transition from symptomatic to more specific causes of death took place over time, the largest killers overall were the water-food borne and airborne diseases, with summer and winter peaks, respectively. The airborne and water-food borne diseases were mainly dominant amongst the post-neonates, whose mortality made up an increasingly larger share of infant deaths. Finally, the results show that although coding the Danish causes of death to the ICD10h has proven successful, more attention needs to be paid to different uses of the same cause of death by different nations, such as the case of atrophy.

The Demographic Database — History of Technical and Methodological Achievements

2023-03-30T08:54:31+02:00

The Demographic Data Base (DDB) at the Centre for Demographic and Ageing Research (CEDAR) at Umeå University has since the 1970s been building longitudinal population databases and disseminating data for research. The databases were built to serve as national research infrastructures, useful for addressing an indefinite number of research questions within a broad range of scientific fields, and open to all academic researchers who wanted to use the data. A countless number of customized datasets have been prepared and distributed to researchers in Sweden and abroad and to date, the research has resulted in more than a thousand published scientific reports, books, and articles within a broad range of academic fields. This article will focus on the development of techniques and methods used to store and structure the data at DDB from the beginning in 1973 until today. This includes digitization methods, database design and methods for linkage. The different systems developed for implementing these methods are also described and to some extent, the hardware used.

Historical Population Database of Transylvania. Sources, Particularities, Challenges, and Early Findings

2022-06-28T10:54:04+02:00

The Historical Population Database of Transylvania (HPDT) is a research tool for population studies developed since 2014 at the Centre for Population Studies in Cluj-Napoca, financed by an SEE-Norway Grant. HPDT employs a source-oriented approach for recording data from the parish registers kept by the Transylvanian churches, focusing primarily on the main vital events such as births, marriages, and deaths. The data entry process was followed by the standardization of various information, such as names, occupations, locations and causes of death, thus allowing the initiation of a linkage process. The database has already been employed in a wide-ranging series of analyses conducted on datasets extracted from HPDT, which include infant and adult mortality, nuptiality and age at first marriage, social mobility, and the medicalization of childbirth. The wealth of information it includes will enable many more scientific investigations.

The Groningen Integral History Cohort Database. Development, Design and Output

2022-06-28T09:06:57+02:00

The Groningen Integral History project, launched in 1987, aimed to sketch the lives of people from various social classes in the Dutch province of Groningen in the 19th and early 20th century. One part was the creation of the Groningen Integral History Cohort Database (GIHCD), reconstructing complete individual life courses of 5,280 research persons (RPs) born between 1811 and 1872. After a lengthy and difficult process of shaping it over 35 years, the quality of the database is now very high. More than 98% of the RPs (and for some parts of the database even more than 99%) could be followed until their death or until a migration abroad. Even for those who moved abroad, life course information is available for most RPs. In this article, we primarily focus on the rural part of the database (n = 4,320), which is of the highest quality and has had the most significant tangible research impact. Building on information from the Dutch civil registration system (from 1811) and the population registers (from 1850), the database includes multiple individual-level variables. In the technical part of the article, we provide an extensive overview of the available variables and summarize the transformation of the rural part of the database into an Intermediate Data Structure (IDS). Since the early 1990s, historians from the University of Groningen have used GIHCD in numerous publications. At the end of this article, we provide a summary of the main outcomes of these publications.

The Barcelona Historical Marriage Database and the Baix Llobregat Demographic Database. From Algorithms for Handwriting Recognition to Individual-Level Demographic and Socioeconomic Data

2024-10-07T11:11:33+02:00

The Barcelona Historical Marriage Database (BHMD) gathers records of the more than 600,000 marriages celebrated in the Diocese of Barcelona, and their taxation, registered in Barcelona Cathedral's so-called Marriage Licenses Books for the long period 1451–1905. The BALL Demographic Database brings together the individual information recorded in the population registers, censuses and fiscal censuses of the main municipalities of the county of Baix Llobregat (Barcelona). By December 2020, 263,786 individual observations dating from 1828 to 1965 had been assembled in this ongoing collection. The two databases started as part of different interdisciplinary research projects at the crossroads of Historical Demography and Computer Vision. Their construction uses artificial intelligence and computer vision methods, such as handwriting recognition, to reduce execution time. However, their current state still requires some human intervention, which explains the crowdsourcing and game sourcing experiences that were implemented. Moreover, knowledge graph techniques have allowed the application of advanced record linkage to link the same individuals and families across time and space. Finally, we discuss the main research lines in historical demography developed so far using both databases.

This paper was awarded the Louis Henry award from the European Society of Historical Demography. The Louis Henry Award has been established by the European Society of Historical Demography to recognize methodological innovations in data collection, visualization or analysis.

Nominative Linkage of Records of Officials in the China Government Employee Dataset-Qing (CGED-Q)

2022-09-09T10:24:18+02:00

We introduce our approach to the nominative linkage of records of Qing officials who were included in the China Government Employee Datasets-Qing (CGED-Q) Jinshenlu (JSL) and Examination Records (ER). We constructed these datasets by transcribing quarterly rosters of civil and military officials produced by the government and by commercial presses, and records of examination degree holders. We assess each of the primary attributes available in the original sources in terms of their usefulness for disambiguation, focusing on their diversity and potential for inconsistent recording. For officials who were not affiliated with the Eight Banners, these primary attributes include surname, given name, and province and county of origin. For the small subset of officials who were affiliated with the Eight Banners, we assess the available data separately. We also assess secondary attributes available in the data that may be useful for adjudicating candidate matches. We then describe the approach we developed to address the issues we identified with the primary and secondary attributes. These issues and our approach will be of interest to researchers engaged in similar efforts to construct and link datasets based on elite males in historical China.

Building an Archival Database for Visualizing Historical Networks. A Case for Pre-Modern Korea

2022-06-28T09:06:57+02:00

In this paper, we share the experience of collecting and organizing pre-modern Korean historical materials into a searchable digital archive. The Ajou Interdisciplinary Research Group (AIRG) has continuously collected historical data of pre-modern Korea for the past 10 years to assist the study of family history, historical demographics, and social mobility. This paper describes the rich data sources for historical studies of Korea, such as household registers, genealogies, and state examination registers, and we summarize contributions to the study of historical demography and related fields.

What was Killing Babies in Hermoupolis, Greece? An Investigation of Infant Mortality Using Individual Level Causes of Death, 1861–1930

2023-02-20T12:45:49+01:00

This paper employs individual-level cause-of-death data from the port city of Hermoupolis on the Greek island of Syros in order to test the newly constructed ICD10h coding system. By constructing cause-specific death rates for infants from the late 19th century to the early 20th century, the paper contributes to a comparative approach, which aims to show how causes of death differ across several locations within Europe and how they develop over time. Given the scarcity of cause-of-death data at both the individual and aggregate level in Greece roughly prior to the 1920s, the availability of such data in the draft death registers (for sporadic runs of years in the second half of the 19th and early 20th century) and the civil registration (from 1916 onwards) in Hermoupolis provides a deeper understanding of the history of cause-of-death reporting in the country. Infant mortality in Hermoupolis was relatively high throughout the study period, with water- and food-borne diseases accounting for the highest number of infant deaths, especially during the hot and dry summer months. The prominent winter peak of neonatal mortality, and also of congenital and birth disorders, could be partially associated with birth seasonality and/or low temperatures over the winter months. Finally, certain vague terms such as 'atrophy' and 'athrepsy', but especially 'drakos', require further investigation before they can be firmly understood.

What was Killing Babies in Palma, Spain? Analysing Infant Mortality Patterns Using Individual-Level Cause of Death Data, 1836–1930

2024-10-07T10:03:39+02:00

This paper explores infant mortality patterns to determine the epidemiological profile of the port city of Palma, Spain between 1836 and 1930, using individual-level cause-of-death data and testing the newly constructed ICD10h coding system. Throughout the 19th century, infant mortality was well below 150 per 1,000 live births, possibly related to the practice of extended breastfeeding and frequent vaccination campaigns. However, between 1840 and 1860, as in other Spanish and European cities, the situation deteriorated. From the 1890s to 1930, the rate was almost always below 100. Post-neonatal mortality was higher than neonatal mortality, and the two rates began to fall at different times: the former in the 1870s and the latter in the 1920s. The main causes of neonatal mortality were congenital and birth disorders, while for post-neonatal mortality they were infectious diseases, mainly airborne, followed by waterborne and foodborne diseases. The decline in these rates was influenced by several factors, including improvements to public hygiene and nutrition and the quantity and quality of water sources. With regard to sex, a more pronounced female advantage was observed in post-neonatal mortality than in neonatal mortality. The seasonality of neonatal mortality in the 19th century was characterised by two peaks in autumn and winter, possibly related to the seasonality of births. In the 20th century, a summer peak was also observed. Post-neonatal mortality showed a sharp peak in summer, which receded and gave way to a winter peak by the late 1880s.

The Utah Population Database. A Model for Linking Medical and Genealogical Records for Population Health Research

2022-06-28T09:06:57+02:00

Improving our understanding of the socio-environmental and genetic bases of disease and health outcomes among individuals, families, and populations over time requires extensive longitudinal data on multiple attributes for entire communities, states or nations. This requirement can be difficult to achieve. In this paper we describe a successful example of a database that meets these needs. The Utah Population Database (UPDB) is a powerful database of a kind rarely found elsewhere in the world, and it has been addressing these data requirements for over 40 years. The UPDB at the University of Utah is one of the world’s richest sources of in-depth information that supports research on genetics, epidemiology, demography, history, and public health. Genetic researchers have used UPDB to identify and study individuals and families that have a higher than normal incidence of diseases or other traits, to analyze patterns of genetic inheritance, and to identify specific genetic mutations. Demographers and other social scientists are increasingly using the UPDB to study issues such as trends in fertility transitions and shifts in mortality patterns for both infants and adults. A central component of the UPDB is an extensive set of Utah family histories, in which family members are linked to demographic and medical information. The UPDB includes medical information about cancer, causes of death, and medical details associated with births. It also includes diagnostic records from statewide insurance claims data and healthcare facilities (hospital discharge, ambulatory surgery, emergency department encounters). UPDB is also linked to claims data from Medicare, a federal health insurance program generally for persons aged 65 or older. The UPDB provides access to information on more than 11 million individuals and supports nearly 400 research projects. We describe in detail the data components of the UPDB, how it can be accessed, issues related to its development, record linkage, governance and privacy protections, as well as plans for future developments.