From Matched Certificates to Related Persons

e-ISSN: 2352-6343
PID article: https://hdl.handle.net/10622/23526343-2020-0006
PID R-scripts: https://hdl.handle.net/10622/23526343-2020-0007
© 2020, Mourits, van Dijk, Mandemakers. This open-access work is licensed under a Creative Commons Attribution 4.0 International License, which permits use, reproduction & distribution in any medium for non-commercial purposes, provided the original author(s) and source are given credit. See http://creativecommons.org/licenses/.

In historical research, large-scale demographic databases employing automated matching procedures are becoming increasingly common (Kaplanis et al., 2017; Song & Campbell, 2017). Traditionally, these databases have been used to reconstruct families and their life courses, in so-called family reconstitutions, for the study of past demographic behaviour (Fauve-Chamoux, Bolovan, & Sogner, 2016). More recently, historical databases have also been used to study the intergenerational transmission of fertility, mortality, social status, etc. (see e.g. Bras, Van Bavel, & Mandemakers, 2013; Jennings, Sullivan, & Hacker, 2012; Knigge, 2016; Reher, Ortega, & Sanz-Gimeno, 2008; Quaranta & Sommerseth, 2018; van den Berg et al., 2019). Yet, historical databases on full populations of provinces or states are only available for a limited number of places in the world, such as Québec, Utah, and parts of Sweden, and are generally based on church records (Song & Campbell, 2017).
For the Netherlands, a rich new data source has become available: LINKS-Zeeland (Mandemakers & Laan, 2017). This dataset contains indexes of digitised birth, marriage, and death certificates from the Dutch civil registry for the full population of the province of Zeeland,¹ and relational information between individuals identified on the certificates based on name-matching algorithms.² The Dutch civil registry is a high-quality historical demographic source: from the introduction of the system onwards, the willingness to register was high, information on the certificates was highly standardised by national laws, uniformity in registration was enforced by strong procedural checks, and civil records were made in duplicate to ensure safekeeping (Vulsma, 1988). In the near future, the LINKS dataset will be expanded to include the life courses and family relations from more provinces of the Netherlands. About 700,000 birth, 200,000 marriage, and 650,000 death certificates from Zeeland are matched through indexed names and event dates. This allows for the reconstruction of (parts of) over 600,000 life courses and up to seven generations of families, starting in 1812. About 150 years of certificates have been included in the relational database LINKS-Zeeland. To study life courses or families, LINKS-Zeeland needs to be transformed from the LINKS output structure into a database suitable for statistical analysis: LINKS-gen.
We present how LINKS can be transformed into a 'pedigree format', the standard data format for intergenerational studies in the biomedical sciences (Lange et al., 2013; Purcell et al., 2007), which contains identifiers for individuals and their mothers and fathers, as well as other demographic and relational information. Transforming LINKS-Zeeland into one rectangular file, containing relational information and life-course observations, is a complicated endeavour, and users must invest considerable time in it before they can perform their statistical analyses. For example, users need to decide how they deal with conflicting matches, unmatched certificates, and incomplete life course or family reconstructions. This is not only inconvenient for users of LINKS-Zeeland, but it also opens the door to variations in how users select cases and reconstruct families. Every new user will operationalise LINKS-Zeeland in a different fashion, which eventually hampers the comparability of scientific results (Ashkpour, 2019, p. 181; Garijo et al., 2014). Moreover, limitations in LINKS-Zeeland are not always immediately clear. Data in LINKS-Zeeland are of high quality, but only if the right selections are made on the data (van den Berg et al., 2020). Yet, flags for likely incomplete or complete family and life course reconstructions cannot be directly retrieved from the database and have to be implemented as separate variables. To make LINKS-Zeeland more readily available for research, we present LINKS-gen: one rectangular data frame that is ready for statistical analyses and contains basic demographic indicators and flags for quality control. This standardisation makes LINKS-Zeeland clearer and more straightforward to use, and thus more attractive to researchers.
In this paper, we briefly describe the construction of LINKS-Zeeland and present how LINKS-Zeeland can be transformed into LINKS-gen, which contains life course and family reconstructions. First, we give a short overview of the civil registry and its digitisation. Second, we briefly discuss the matching procedure that links birth, marriage, and death certificates, including an explanation of how relationships between persons were established within and between generations. Third, we introduce a script that uses the established matches between certificates to [...]

¹ For more information about the history of Zeeland, see Bras (2002), Mourits and Janssens (2020, forthcoming), Priester (1998), and Wintle (1985).

[...], and 650,000 death certificates. The number of available certificates will increase over time, as new certificates become public every year. LINKS-Zeeland is part of a larger endeavour to reconstruct the population of the entire Netherlands. The LINKing System for historical family reconstruction (LINKS) is an algorithm to match civil certificates from the Dutch civil registry.

The Dutch civil registry was founded in the French period and is one of the oldest in the world. It was first introduced on July 12th 1796 in the areas annexed by France: the southernmost part of Zeeland and Limburg.⁵ In the rest of the country, the civil registry was introduced in 1811. However, it took until January 1st 1812 before the civil registry covered the entire country, including the unannexed part of Zeeland (Vulsma, 1988).

An overview of the steps from the original documents to the LINKS-gen database is shown in Figure 1. Certificates from the civil registries are indexed by volunteers from the regional archives. These indexes contain at least the names and ages of newborns, newly-weds, and the deceased, as well as the names of their parents and, if available, their spouse.
The names and ages of index persons, their spouses, and their parents on civil certificates are matched, so that families can be reconstructed (Mandemakers, Bloothooft, & Laan, forthcoming; Mandemakers & Laan, 2017; Raad et al., 2020; Schraagen, 2014). This provides the user with a relational database in which the relations between an individual's birth, marriage, and death certificates are used to reconstruct life courses. This very basic life course is further developed by relating an individual's marriage certificate to the birth, marriage, and death certificates of his/her children, and by adding occupational and location data (see the documentation by Mandemakers and Laan (2017) for a detailed description). In this paper, we show how this information is restructured into LINKS-gen, a data set that is ready for analysis.

³ The script to build the database is also available on GitHub or the EHPS-network repository and uses R, open-source software that is freely available. LINKS-gen can be exported to any other software package.
⁴ LINKS is available to researchers via the IISG. Please contact Kees Mandemakers (kma@iisg.nl), Auke Rijpma (a.rijpma@uu.nl), or Richard Zijdeman (r.zijdeman@iisg.nl) for access to the database, or with questions or feedback regarding LINKS-Zeeland, LINKS-gen, or the LINKS project.
⁵ Most of Limburg was annexed on July 12th 1796 in the Département de la Meuse-Inférieure; other parts of Limburg were incorporated in the Département de la Roër on May 1st 1798 (Vulsma, 1988).

Figure 1
Steps in building LINKS datasets for research from the civil certificates

[...] & Mandemakers, 2014). In this paper, we briefly summarise the pipeline and discuss in more detail the restructuring of the final dataset, which is made available as a csv file in pedigree format.
Ever since its implementation, the Dutch civil registry has been of high quality, as standards for the municipal bureaucratic procedure were very precise. Birth, marriage, and death certificates were kept in separate books, made in duplicate, controlled by local judiciaries, and stored at separate locations (see https://iisg.amsterdam/en/hsn/data/sources for a short description of the administrative systems in the Netherlands or Vulsma (1988) for a more extensive overview). Initially, forms were handwritten, but between 1815 and 1850 most municipalities switched to pre-printed forms. Due to the precise administrative system, the recorded vital events are likely to represent population vital events well. However, the civil registry only includes information on family ties that (1) left a paper trail and (2) were in accordance with the law. As a result, unmarried cohabitation does not appear in the archives, as the partners were not legally married. Furthermore, pre- and extramarital children were not linked to the father, unless the father was legally able to marry the mother (i.e. was not married himself) and acknowledged the child. These acknowledgements were added as side notes in the margins of the civil certificates. Previous studies on samples of the Zeeland and Noord-Holland populations showed that this can affect about 5% of all parents (Kok, 1991; van den Berg et al., 2020), as 2% of all parents conceived children before they were married and another 3% had children with a partner whom they never married (van den Berg et al., 2020). To complicate matters further, acknowledgement of a child does not necessarily mean that the registered father is the biological father. Sometimes, upon marriage, a husband would acknowledge his wife's existing children to ensure their inheritance rights, even though there was no biological relationship.
Thus, although most information contained in the civil certificates reflects the relations between individuals quite well, imprecisions and oversights are part of the civil registry system. Table 1 gives an overview of the information that civil servants were legally obliged to include on each civil record. Birth certificates contain the date, time, and place of birth; the sex and full name of the newborn; the full name, occupation, and place of residence of the parents; and the full name, age, occupation, and place of residence of the witnesses. Marriage certificates contain the full name, age, place of birth, occupation, place of residence, and full names of former partners of both spouses; the full name, age, occupation, and place of birth of the parents; the full name, age, occupation, and place of residence of the witnesses; and whether children were acknowledged. Death certificates contain the full name, age, occupation, place of residence, place of birth, and time of death of the deceased; the full name of the (deceased) spouse; the full name, occupation, and place of residence of the parents; and the full name, age, occupation, place of residence, and family relation of the witnesses.
Indexes of the civil certificates include at least the names of the subject (ego), the names of the subject's parents, the subject's age, the event date, and the municipality of registration. Besides this standard information, other variables are included depending on the province and type of certificate. In Zeeland, all occupational titles of subjects and parents were also included, except on birth certificates. Parents without an occupation are not always explicitly marked as having no occupation. The index of the marriage certificates also contains the bride's and groom's places of birth, whereas the index of the death certificates contains both the municipality of residence and the place of death.

Figure 2
Pre-printed forms of death certificates from Nieuwvliet, 1843
Notes: Although the exact wording may vary by municipality or province, the texts on birth, marriage, and death certificates are standardised to a high degree. Remarks on the acknowledgement of children, foundlings, multiple births, name changes, divorce dates, and other less common events are sometimes included as side notes in the margins. Side notes are generally not standardised and are difficult to read due to a lack of space.
The process of matching the civil certificates of birth, marriage, and death in LINKS is based on the names of subjects, spouses, and parents on the civil certificates, as well as on time constraints based on the dates of the events, age at marriage, and age at death (Mandemakers & Laan, 2017). An overview of the matching procedure is shown in Figure 3. The number of candidate matches was limited to the time range in which each vital event might have occurred, dramatically reducing the number of potential mismatches. Possible matches were then identified by the names of ego, father, mother, and/or spouse on the certificates, while allowing for small spelling errors in the source or introduced during data entry by means of a Levenshtein algorithm (Mandemakers & Laan, 2017; Wagner & Fischer, 1974). The LINKS algorithm matched records conservatively, limiting the number of false positives, as indicated by the number of non-unique matches, which signal 'overmatching'. As such, the procedure focused on retrieving correct matches rather than every possible one. Combined with time constraints, 85% of all established matches between birth, marriage, and death certificates and about 75% of all established matches between parents and children resulted from exact string comparisons. The remaining matches (15% and 25%, respectively) were retrieved using the Levenshtein algorithm with a maximum distance of 1 or 2, depending on the length of the string, for each part of the names. In LINKS-Zeeland, certificates were only matched within the province of Zeeland, so that at least part of the certificates of individuals who migrated to Zeeland or, more commonly, left Zeeland for another province of the Netherlands or abroad could not be matched. Nevertheless, a sizable share of the certificates could be matched to another certificate, as shown in Table 2.
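The matching logic described above (a time window to restrict candidates, per-name-part comparison with a small edit-distance tolerance, and rejection of non-unique candidates) can be illustrated with a short Python sketch; the LINKS scripts themselves are not reproduced here. The length cut-off `LONG_NAME`, the age bounds in `in_time_window`, and the record layout (dicts with `date`, `ego`, `father`, and `mother`) are hypothetical choices for illustration, not LINKS parameters.

```python
from collections import Counter
from datetime import date


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the Wagner-Fischer dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical cut-off: the text only says "1 or 2, depending on the length".
LONG_NAME = 8


def max_distance(name_part: str) -> int:
    return 1 if len(name_part) < LONG_NAME else 2


def names_match(name_a: str, name_b: str) -> bool:
    """Compare each part of a name (first names, surname) separately."""
    parts_a, parts_b = name_a.lower().split(), name_b.lower().split()
    if len(parts_a) != len(parts_b):
        return False
    return all(levenshtein(pa, pb) <= max_distance(min(pa, pb, key=len))
               for pa, pb in zip(parts_a, parts_b))


def in_time_window(birth: date, marriage: date,
                   min_age: int = 14, max_age: int = 80) -> bool:
    """Time constraint; the age bounds are illustrative assumptions."""
    return min_age <= marriage.year - birth.year <= max_age


def is_candidate(birth: dict, marriage: dict) -> bool:
    """Own birth vs. own marriage: time window first, then all three names."""
    return (in_time_window(birth["date"], marriage["date"])
            and all(names_match(birth[role], marriage[role])
                    for role in ("ego", "father", "mother")))


def unique_matches(births: list, marriages: list) -> list:
    """Keep a pair only if neither record has another candidate (conservative)."""
    pairs = [(i, j) for i, b in enumerate(births)
             for j, m in enumerate(marriages) if is_candidate(b, m)]
    per_birth = Counter(i for i, _ in pairs)
    per_marriage = Counter(j for _, j in pairs)
    return [(i, j) for i, j in pairs
            if per_birth[i] == 1 and per_marriage[j] == 1]
```

Rejecting any pair in which either record has a second candidate trades recall for precision, which mirrors the conservative stance against 'overmatching' described above.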

MATCHING PROCEDURE
The share of matched certificates depends on the type of certificates involved. In Table 2, we show the matching rates between certificates for historical cohorts. Birth and marriage certificates had the highest matching rates, both within generations (own birth to own marriage) and between generations (birth of a child to the marriage of the parents). Within generations, marriage certificates were matched to a birth certificate in 85% of all cases, whereas a marriage certificate was available for 32% of all newborns. Birth and marriage certificates were matched to the marriage certificate of the parents in 83% and 79% of all cases, respectively. Matching birth to death certificates was more difficult: a death certificate was available for only 71% of all births. In part, this relatively low matching rate was caused by out-migration. Matching in the opposite direction was more successful: 80% of all Zeeland death certificates could be traced back to a birth certificate. The difference between these two indicators suggests that more than 10% of matches were probably missed due to out-migration, as a sizable share of the population left the province in the 19th century (Priester, 1998; van den Berg et al., 2020). Missing death certificates due to migration affected those who died at older ages more strongly than those who died early in life, as young children were less likely to leave the province (van den Berg et al., 2020).⁶ Of all marriage certificates, 66% were matched to a death certificate, and 60% of all death certificates were matched to the marriage certificate of the parents, which indicates that matching was more problematic for individuals who lived longer, and may have had more opportunities to leave the province, than for individuals who died as children.⁷

⁶ Young children could migrate with their parents.
However, married couples generally migrated upon marriage, and migration was much less common when married couples had small children (Kok, 1997). Therefore, information on child mortality is relatively complete, but becomes less reliable for older children.
⁷ Censored observations mainly occur between ages 15 and 50, as persons may migrate out of the region of observation. After age 50, migration is low, so that the number of censored cases is low (van den Berg et al., 2020).

Table 2
Percentage of successful matches between certificates, LINKS-Zeeland

Match (A-B)   Match (B-A)   Cohort
Intragenerational matches

FROM MATCHED CERTIFICATES TO INDIVIDUALS AND FAMILIES
[...] (2017) used the matched certificates in the LINKS database to make life course and family reconstructions. Persons are defined based on the intragenerational matches between birth, marriage, and death certificates. A person identifier is assigned to each person in a cluster of matched certificates, so that records belonging to the same individual share a common person identifier. As parents also receive common person identifiers, individual lives can be ordered into families over different generations. However, this procedure is not as straightforward as it might appear. Panel C shows how a family reconstitution, that is, the reconstruction of life courses and families, works with limited information. Compared with Panel B, four relations are missing: the death certificate of child 1 is not matched to any other certificate, the death certificate of child 2 matches only with the parental marriage, and the death certificate of child 3 only matches with his own marriage certificate. When certificates can still be related to another certificate of the same person, this has no effect on the quality of the reconstructed life course or family. The death certificate of child 3 is matched through his own marriage certificate, which matches with his own birth and parental marriage certificates. As a result, his life course is not censored, nor are observations on child 3 scattered. However, life course and family reconstructions are more problematic when a certificate cannot be related to another certificate of the same person. The death certificate of child 2 was not matched to her birth certificate. As a consequence, her life course is censored, and another person with only a known age at death is created.
Child 2 now appears twice in the reconstructed family, so that one might think that the family consists of four rather than three children. In LINKS-gen, we restructure the data to deal with this problem. Finally, the death certificate of child 1 can be matched neither to his own marriage or birth certificate nor to his parents' marriage certificate. Therefore, the life course of child 1 is censored. Moreover, a new family is created that consists of one child with known mortality information, unknown parents, and no birth information. This loose certificate can only be excluded from analyses by making selections on the data.
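Assigning person identifiers to clusters of matched certificates amounts to finding connected components in a graph whose nodes are individuals-on-certificates and whose edges are intragenerational matches. The minimal union-find sketch below assumes a hypothetical string key per individual per certificate; it reproduces the child 2 and child 3 cases described above, where child 3's three certificates collapse into one person while child 2's unmatched birth and death certificates yield two persons.

```python
class DisjointSet:
    """Union-find over certificate keys, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def assign_person_ids(certificates, matches):
    """Give every cluster of matched certificates one person identifier.

    certificates: keys for one individual on one certificate (hypothetical format).
    matches: pairs of keys judged to refer to the same individual.
    Returns {key: person_id}, numbering clusters 1, 2, ... in input order.
    """
    certificates = list(certificates)
    ds = DisjointSet()
    for key in certificates:
        ds.find(key)  # register unmatched certificates as singleton clusters
    for a, b in matches:
        ds.union(a, b)
    cluster_ids, person_of = {}, {}
    for key in certificates:
        root = ds.find(key)
        cluster_ids.setdefault(root, len(cluster_ids) + 1)
        person_of[key] = cluster_ids[root]
    return person_of


if __name__ == "__main__":
    certs = ["birth_child3", "marriage_child3", "death_child3",
             "birth_child2", "death_child2"]
    links = [("birth_child3", "marriage_child3"),
             ("marriage_child3", "death_child3")]
    print(assign_person_ids(certs, links))
    # {'birth_child3': 1, 'marriage_child3': 1, 'death_child3': 1, 'birth_child2': 2, 'death_child2': 3}
```

Because clustering is transitive, a death certificate linked only to a marriage certificate (child 3) still inherits the identifier of the matching birth certificate; a certificate with no intragenerational link (child 2's death) necessarily becomes a separate person.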

Figure 4
Family relations in a pedigree format and in LINKS

The output from the life course and family reconstructions in LINKS-Zeeland 2017.01 is produced from the data in the LINKS database. All established matches between certificates are stored with extra variables that indicate the quality of the match, for example, the number of person names on which a match is based. Currently, the release consists of three tables: BIRTH_DEATH, MARRIAGE_LINES, and PERSONS. BIRTH_DEATH contains the established matches between birth and death certificates as well as matches to the marriage certificates of the parents. A record consists of either a matched birth and death, or an unlinked birth or death certificate. The table MARRIAGE_LINES contains matches to the bride's and groom's certificates in BIRTH_DEATH as well as matches to their parents' marriage certificates. Multiple occurrences of the same person were assigned the same identifier, so that each record in the two tables refers to a unique 'person', who has been assigned a unique identifier and is listed in the table PERSONS.
At this point, the information delivered to a user of LINKS-Zeeland needs to undergo several transformations before it is ready for data analysis. Selections on the data, post-hoc matching, and knowledge of the data are required before the reconstructed life courses and families can be analysed (van den Berg et al., 2020). Information in LINKS-gen is structured according to the pedigree format (Lange et al., 2013; Purcell et al., 2007), an example of which is shown in Table 3. In a pedigree file, each row contains information on a person, including relations to their father, mother, and pedigree, meaning all related individuals. This structure makes it straightforward to study inheritance patterns, familial clustering, and the intergenerational transmission of fertility, mortality, social status, etc. In earlier steps, relations between parents and children were defined. However, the data on individual characteristics need to be restructured to make the database ready for analysis. Several issues need to be solved: the information on life courses and relatives is divided over several tables; incomplete birth information needs to be supplemented with estimates derived from marriage or death dates and ages at marriage or death; missed matches within families need to be restored; and variables need to be constructed that flag from which certificates information is derived. The steps that we present below provide the user with LINKS-gen.
Birth dates were entered as available in the database. When persons had no matched birth certificate, however, the earliest and latest possible birth dates were estimated using the age at marriage and/or the age at death. In this case, the birth date was logically deduced as the midpoint of the estimated birth range. If estimates from both marriage and death certificates are available, information from both certificates is used to narrow down the period in which a person was born. To the birth information, we added marital information. Since persons may marry more than once, we reserved five series of fields for marriage information, which is the maximum number of marriages in the dataset, and constructed a variable on the number of marriages by counting the number of marriage certificates. Thereupon, marriages were put in chronological order and the dates of marriage were included.
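The birth-range logic can be sketched as follows, assuming ages are recorded in completed years: a person who is `age` years old on `event` was born after the same calendar day `age + 1` years earlier and no later than the same day `age` years earlier. Ranges from a marriage and a death certificate are intersected, and the midpoint serves as the point estimate. The function names and the choice to return `None` when the ranges do not overlap are illustrative assumptions; the text does not say how conflicting ages were resolved.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

Range = Tuple[date, date]


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birth_range(event: date, age: int) -> Range:
    """Birth dates consistent with being `age` completed years old on `event`."""
    earliest = years_before(event, age + 1) + timedelta(days=1)
    latest = years_before(event, age)
    return earliest, latest


def estimate_birth(ranges: Sequence[Range]) -> Optional[Tuple[date, date, date]]:
    """Intersect ranges from marriage and/or death certificates.

    Returns (earliest, latest, midpoint), or None if the ranges do not overlap.
    """
    earliest = max(r[0] for r in ranges)
    latest = min(r[1] for r in ranges)
    if earliest > latest:
        return None  # conflicting ages; resolution is left open in this sketch
    midpoint = earliest + (latest - earliest) // 2
    return earliest, latest, midpoint
```

For example, a marriage on 1 May 1875 at age 25 and a death on 10 March 1900 at age 50 narrow the birth date to 2 May 1849 through 10 March 1850, with a midpoint of 5 October 1849; the combined range is shorter than either single range.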
Death certificates also include newborns who were dead upon registration and for whom no separate birth certificates were registered. These children were flagged as 'dead before registration'. In the current database, we use the date of death as the date of birth, as these were not registered separately.
The logically deduced date of birth is a reasonable estimate for children who died before they could be registered with a birth certificate. However, the correct birth date may differ by up to three days from the registered death date for children who were born over the weekend. This problem is exacerbated for children born just before a holiday: the longer delay between birth and registration increased the likelihood that a child recorded as dead upon registration was not stillborn but died before being registered with the municipality. Earlier work has estimated that most of the Dutch children who were dead upon registration were stillborn, but that about one in three died before their registration with the municipality (van Poppel, 2018; van Poppel, Bijwaard, Ekamper, & Mandemakers, 2012).
Persons recorded on marriage or death certificates could not always be matched to a birth certificate. The person could have been born in another province, before the start of birth registration, or after the period for which birth certificates are available. However, in specific cases there were also other reasons for missed matches. In over 10,000 cases, parental names and time ranges matched, but the names of the individuals did not. Matches were missed for twin siblings with very similar names, for example, 'Gerritje' and 'Gerrit' or 'Johan' and 'Joran'. The same problem can occur for children who received the same first name as an earlier deceased sibling.
In both cases, observations on the same individual were falsely recorded as observations on different individuals within the same family. To solve this problem, we applied a post-hoc matching procedure. Within families, we added unmatched death certificates to the birth and marriage certificates of children without a death certificate, provided that the birth range estimated from the death certificate overlapped with the actual date on the birth certificate, and that the unmatched death certificate was relatable to only one child in the family. This provided us with 9,580 (2.9%) extra matched death certificates out of 331,298 unmatched death certificates in the case of LINKS-Zeeland.
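A minimal sketch of this within-family rule, assuming each loose death certificate carries a birth range estimated from the age at death. The data classes and field names are hypothetical. The extra check that skips a child claimed by two different loose certificates is an assumption added for safety; the text only requires that a certificate be relatable to exactly one child.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Child:
    id_person: int
    b_date: date                      # date on the child's birth certificate
    death_cert: Optional[str] = None  # None: no death certificate matched yet


@dataclass
class LooseDeath:
    cert_id: str
    b_min: date  # birth range estimated from the age at death
    b_max: date


def post_hoc_match(children: List[Child],
                   loose_deaths: List[LooseDeath]) -> Dict[str, int]:
    """Attach unmatched death certificates to children within one family."""
    claims: Dict[int, List[str]] = {}
    for d in loose_deaths:
        fits = [c for c in children
                if c.death_cert is None and d.b_min <= c.b_date <= d.b_max]
        if len(fits) == 1:  # relatable to only one child in the family
            claims.setdefault(fits[0].id_person, []).append(d.cert_id)
    matched: Dict[str, int] = {}
    for c in children:
        certs = claims.get(c.id_person, [])
        if len(certs) == 1:  # assumption: skip children claimed by two certificates
            c.death_cert = certs[0]
            matched[certs[0]] = c.id_person
    return matched
```

A death certificate whose birth range covers two children without a death certificate (for instance, twins or a child named after a deceased sibling with close birth dates) stays unmatched, which keeps the procedure as conservative as the original matching.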
Finally, we constructed variables on whether individuals had a twin sibling and on the date of last observation. If no death certificate was matched, dates of marriage and children's dates of birth were used to indicate the last date at which an individual was observed alive. For women, when applicable, the date of the last childbirth was used as the last observation date. For men, we used the date of the last childbirth minus nine months, as not all fathers may have lived until the birth of their last child. In the data set, we report the date of and age at last observation, as well as whether the observation was a birth, first marriage, second marriage, the birth/conception of a child, or a death.
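The last-observation rule can be sketched as below. The sex code `"m"`, the argument layout, and the label `"marriage n"` for third and later marriages are hypothetical; nine months is subtracted from the last child's birth for men only, as described above.

```python
import calendar
from datetime import date
from typing import List, Optional, Tuple


def months_before(d: date, months: int) -> date:
    """Subtract calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def last_observation(sex: str, birth: Optional[date], marriages: List[date],
                     child_births: List[date],
                     death: Optional[date] = None) -> Optional[Tuple[date, str]]:
    """Date and type of the last event at which a person was known to be alive."""
    if death is not None:
        return death, "death"
    events = [(birth, "birth")] if birth is not None else []
    for i, m in enumerate(sorted(marriages)):
        label = ("first marriage", "second marriage")[i] if i < 2 else f"marriage {i + 1}"
        events.append((m, label))
    if child_births:
        last_child = max(child_births)
        if sex == "m":
            # A father was certainly alive at conception, not necessarily at birth.
            events.append((months_before(last_child, 9), "conception of a child"))
        else:
            events.append((last_child, "birth of a child"))
    return max(events, key=lambda e: e[0]) if events else None
```

For a man married on 1 May 1855 whose last child was born on 15 October 1860, the sketch returns 15 January 1860 as a conception; for a woman with the same history it returns the birth date itself.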
The restructured and recoded family reconstitutions result in LINKS-gen, which structures the life course and family reconstructions in a pedigree format (Lange et al., 2013; Purcell et al., 2007). In LINKS-gen, multiple generations of intertwined families are recorded by using the matches between the individual birth, marriage, and death certificates and the marriage certificates of one's parents. Each row contains information on an individual with a unique identifier: Id_person. This person can be directly related to his/her father, mother, and spouse(s) through the corresponding identifiers: Id_father, Id_mother, and Id_partner_1 to Id_partner_5. Up to five marriage partners are available, as no individual had more than five known partners. Relations to siblings or children are identified indirectly. Sibling sets can be reconstructed by selecting individuals with a common father and mother to retrieve full siblings, or with only one common father or mother to retrieve half siblings. Children are related to their parents through the identifiers Id_father and Id_mother. It should be noted that these sibship and child sets can be quite large, as Zeeland was a high-fertility region; families with over 15 children are not unheard of. Table 4 and Figure 5 show which information is available. For each person, we include the sex as recorded on the original certificate, as well as constructed variables on whether or not the person had a twin sibling, the date of and age at last observation, and the total number of known marriages. We also flag whether birth, marriage, and death certificates were available for the subject, his/her father, and his/her mother. Birth information includes the birth date, municipality of birth, and age of the parents at birth. If no birth certificate was known, the values of B_date, B_min_date, and B_max_date were logically deduced as explained in section 5.
Death information includes age at death, date of death, municipality of death, and flags on whether a newborn died before he could be administered a birth certificate. Marital information shows the number of marriages as well as the age at marriage, municipality of marriage, and the identifier of the spouse for each marriage.
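Because LINKS-gen stores only parent identifiers, sibships are derived rather than stored. The sketch below works on rows represented as dicts with the `Id_person`, `Id_father`, `Id_mother`, and `B_date` columns named above, with `None` for an unknown parent, and shows how full and half siblings follow from those columns. The twin rule in `twin_flags` (same mother, same birth date) is an assumed operationalisation; the text does not describe how the twin variable was constructed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def full_sibships(persons: List[dict]) -> Dict[Tuple[int, int], List[int]]:
    """Group Id_person by the (Id_father, Id_mother) pair when both are known."""
    groups = defaultdict(list)
    for p in persons:
        if p["Id_father"] is not None and p["Id_mother"] is not None:
            groups[(p["Id_father"], p["Id_mother"])].append(p["Id_person"])
    return dict(groups)


def half_siblings(persons: List[dict], ego_id: int) -> List[int]:
    """Persons who share exactly one known parent with ego."""
    ego = next(p for p in persons if p["Id_person"] == ego_id)
    result = []
    for p in persons:
        if p["Id_person"] == ego_id:
            continue
        same_father = ego["Id_father"] is not None and p["Id_father"] == ego["Id_father"]
        same_mother = ego["Id_mother"] is not None and p["Id_mother"] == ego["Id_mother"]
        if same_father != same_mother:  # exactly one common parent
            result.append(p["Id_person"])
    return result


def twin_flags(persons: List[dict]) -> Dict[int, bool]:
    """Hypothetical rule: same mother and same B_date marks a twin (or multiple) birth."""
    counts = defaultdict(int)
    for p in persons:
        if p["Id_mother"] is not None and p["B_date"] is not None:
            counts[(p["Id_mother"], p["B_date"])] += 1
    return {p["Id_person"]: counts.get((p["Id_mother"], p["B_date"]), 0) > 1
            for p in persons}
```

In a data frame library the same logic is a group-by on the parent identifiers; the plain-Python version only makes the selection rules explicit.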
LINKS-gen does not contain explicit pedigree identifiers. As we use general population data, many men and women appear in more than one pedigree. Standard practice dictates that these pedigrees ought to be merged (see e.g. Sinnwell, Therneau, & Schaid, 2014). However, such a strategy is not advisable here, because LINKS would collapse into a few large, non-informative pedigrees. Rather, pedigrees should be based on the selection of families with certain characteristics. Thereupon, a random sample of families ought to be drawn to prevent overrepresentation of large pedigrees or of individuals who reproduced with multiple spouses (see e.g. van den Berg et al., 2019). The current format allows for efficient sampling of individuals and families, so that users can select cases with relevant information and study tailor-made pedigrees.
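One way to draw such a sample is sketched below under explicit assumptions: families are defined as parental couples (`Id_father`, `Id_mother`), each couple has the same probability of being drawn regardless of its number of children, and a parent who has already entered the sample through one spouse is not drawn again through another. The optional `keep` filter stands in for the user's selection of families with certain characteristics. The function is hypothetical and is not part of the LINKS-gen scripts.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Family = Tuple[int, int]


def sample_families(persons: List[dict], n: int, seed: Optional[int] = None,
                    keep: Optional[Callable[[List[dict]], bool]] = None
                    ) -> Dict[Family, List[int]]:
    """Draw up to n parental couples at random, each with equal probability.

    A parent already drawn through one spouse is not drawn again through
    another, so remarried individuals are represented at most once.
    """
    families: Dict[Family, List[dict]] = defaultdict(list)
    for p in persons:
        if p["Id_father"] is not None and p["Id_mother"] is not None:
            families[(p["Id_father"], p["Id_mother"])].append(p)
    # Sort before shuffling so a fixed seed gives a reproducible sample.
    keys = sorted(k for k in families if keep is None or keep(families[k]))
    rng = random.Random(seed)
    rng.shuffle(keys)
    chosen: Dict[Family, List[int]] = {}
    used_parents = set()
    for father, mother in keys:
        if len(chosen) == n:
            break
        if father in used_parents or mother in used_parents:
            continue  # parent already represented through another spouse
        chosen[(father, mother)] = [c["Id_person"] for c in families[(father, mother)]]
        used_parents.update((father, mother))
    return chosen
```

Sampling couples rather than individuals keeps a family of fifteen children from being fifteen times as likely to enter the sample as a family with one child.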
Occupational information is stored in a separate file, the contents of which are shown in Table 5 and Figure 6. In the civil registry, occupations were only recorded in conjunction with vital events. In the case of LINKS-Zeeland, occupational information was generally available on an individual's own marriage and death certificates. Marriage certificates seem to be the best source for female occupations (Boter & Woltjer, 2020; Walhout & van Poppel, 2003). Furthermore, occupational titles of the parents are also available on the marriage and death certificates of their children. On birth certificates, parental occupations are only available for the island of Walcheren. The occupational file gathers all these occupations and lists them by the combination of Id_person and Date. The file further lists the calculated age at observation, as well as the certificate containing the occupational information. Occupational information is enriched by recoding occupations into HISCO (van Leeuwen, Maas & Miles, 2002), HISCLASS (van Leeuwen & Maas, 2011), HISCAM (Lambert, Zijdeman, van Leeuwen, Maas, & Prandy, 2013), and SOCPO (Van De Putte & Miles, 2005) by way of the HSN-HISCO occupational title release. Further, the occupational codes are converted from HISCO into OCC1950 (U.S. Bureau of the Census, 1950) using the HISCO-OCC1950 conversion table (Mourits, 2017), and from OCC1950 into Nam-Powers-Boss (Nam & Boyd, 2004; Nam & Powers, 1983), OCCSCORE (IPUMS, 2017b), PRESGL (Siegel, 1971), and SEI (Duncan, 1961) using conversion tables from IPUMS (2017). The latter systems are useful for comparisons with US data. OCC1950 and HISCLASS are almost identical (ρ: 0.97), whereas the correlation between HISCAM and Nam-Powers-Boss, OCCSCORE, PRESGL, or SEI is moderate to high (ρ ≈ 0.65) (Mourits, 2019, p. 83).
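The age at observation in the occupational file can be derived by joining each observation to the person's birth date. In the sketch below, only `Id_person` and `Date` are column names taken from the text; `Age`, `Certificate`, and `Title` are hypothetical, as is `closest_to_age`, which illustrates one common way of choosing a single occupation per person (the observation nearest a reference age).

```python
from datetime import date
from typing import Dict, List, Optional


def age_at(birth: date, observed: date) -> int:
    """Age in completed years on the observation date."""
    before_birthday = (observed.month, observed.day) < (birth.month, birth.day)
    return observed.year - birth.year - int(before_birthday)


def attach_ages(occupations: List[dict], birth_dates: Dict[int, date]) -> List[dict]:
    """Add an Age column (None if the birth date is unknown) and sort by Id_person, Date."""
    rows = []
    for occ in occupations:
        birth = birth_dates.get(occ["Id_person"])
        age = age_at(birth, occ["Date"]) if birth is not None else None
        rows.append({**occ, "Age": age})
    return sorted(rows, key=lambda r: (r["Id_person"], r["Date"]))


def closest_to_age(rows: List[dict], person: int, target_age: int) -> Optional[dict]:
    """Pick the occupational observation nearest to a reference age."""
    observed = [r for r in rows if r["Id_person"] == person and r["Age"] is not None]
    return min(observed, key=lambda r: abs(r["Age"] - target_age), default=None)
```

Keeping one row per person and date, rather than one column per certificate, lets the same person contribute occupations from his own marriage, his children's marriages, and his death.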

Figure 5
Example of the LINKS-gen pedigree table  , OCC1950 codings from Mourits (2017) and OCCSCORE, PRESGL, SEI, and NPBOSS from IPUMS (2017).  (van den Berg et al., 2020). Therefore, the database is only suited for research if certain selections on the data are made to prevent biased descriptive or inferential statistics, such as Cox proportional hazard models, ordinary least squares regressions, and maximum likelihood logistic regressions (Alter, Devos, & Kvatko, 2009;Gill, 1997). LINKS-gen has included flags that indicate whether birth, marriage, and death certificates were available for an individual or his/her parents.
LINKS-Zeeland can only produce descriptive statistics for specific subgroups of the population. Unlike census or population register data, the civil registry does not indicate how many people were living somewhere at any point in time. Individuals were only included in the civil registry when they experienced a vital event, so that observations on places of residence are scattered and migration is only measured when individuals had children or died in another town. As a result, it is next to impossible to reliably estimate populations at risk, even when the full civil registry for the Netherlands is available. However, such estimates can be given for subgroups for which the population at risk is known or can logically be deduced. For example, life expectancies cannot be calculated, as we do not know the population at risk between ages 0-100. However, mortality before age 5 and after age 50 can be studied, as parents with small children and older individuals were unlikely to migrate (Kok, 1997). Similarly, fertility rates cannot be estimated for the entire population, but many related statistics can be calculated for parental couples that married and died in Zeeland, as these likely resided in the same province for their entire life.
Flags in LINKS-gen indicate the quality of reconstructed life courses. For research on fertility behaviour, migration histories, and other individual characteristics, individuals need to be fully observed, as censored observations produce informative missing data. The best indicator of full observation is a flagged death certificate, as it indicates that an individual died in Zeeland, which makes it more likely that he or she also experienced other vital events in the region. However, some return migration is observed at higher ages (van den Berg et al., 2020). Furthermore, the availability of a death certificate does not guarantee that information on children is available. For women, information on their children is nearly always available, as a mother's name automatically appears on a birth certificate. However, the father's name only appeared when he was married to the mother or not married to another woman. Therefore, a flagged marriage certificate needs to be available to study male fertility behaviour. In other words, researchers should use the B, M, and D flags to check whether reconstructed life courses are censored.
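The selection rules for individual life courses can be written as a small predicate. The sketch assumes that each person is available as a record with boolean B, M, and D flags and a sex indicator. These field names and codings are hypothetical and may differ from the actual LINKS-gen columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical person record; field names are illustrative only."""
    id_person: int
    sex: str      # "f" or "m" (assumed coding)
    B: bool       # birth certificate available
    M: bool       # own marriage certificate available
    D: bool       # death certificate available


def uncensored_for_fertility(p: Person) -> bool:
    """Apply the selection rules described in the text for fertility research.

    - A death certificate (D) indicates that the person died in Zeeland and
      was probably observed throughout adult life.
    - Men additionally need a marriage certificate (M), because a father's
      name did not always appear on his children's birth certificates.
    """
    if not p.D:
        return False
    if p.sex == "m":
        return p.M
    return True


if __name__ == "__main__":
    people = [
        Person(1, "f", True, False, True),
        Person(2, "m", True, False, True),
        Person(3, "m", True, True, True),
    ]
    print([p.id_person for p in people if uncensored_for_fertility(p)])
```

The B flag is carried along but not tested here. Whether to also require a birth certificate (for example, to know the exact age at first birth) depends on the research question.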
Similarly, flags can be used to indicate the quality of family reconstructions. For family reconstructions, the marriage certificate of the parents is the most important certificate, as children are matched to their parents' marriage certificate. For an individual, the availability of a parental marriage certificate signals that information on the preceding generations and the parental life course might be available. Provided at least one parental death certificate is flagged, intergenerational effects of parental fertility behaviour, migration, or other parental characteristics on children can be studied. Furthermore, the availability of a parental marriage and death certificate usually indicates that the reconstruction of a sibship is of sound quality (van den Berg et al., 2020). This means that inferences on parental birth intervals and the total number of siblings are likely based on complete observations. Moreover, if a sibling has a flagged death certificate, life course information for this sibling will be relatively complete as well, allowing for the study of familial clustering of events such as mortality and fertility, as well as of socioeconomic status. Hence, the flags M_parents, D_father, and D_mother indicate whether family reconstructions are uncensored, while D flags indicate whether a child's life course can be used to study intergenerational effects and sibling similarity.
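The family-level rules above can be sketched in the same way. The example assumes one row per child, holding a hypothetical family identifier (`id_family`), the child's own D flag, and the parental flags M_parents, D_father, and D_mother named in the text. The requirement of at least two D-flagged siblings for sibling comparisons is an illustrative choice, not a rule from the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Child:
    """Hypothetical child record; field names mirror the flags in the text."""
    id_person: int
    id_family: int   # hypothetical key linking siblings to one parental marriage
    D: bool          # child's own death certificate available
    M_parents: bool  # parents' marriage certificate available
    D_father: bool   # father's death certificate available
    D_mother: bool   # mother's death certificate available


def family_uncensored(c: Child) -> bool:
    """Parental marriage plus at least one parental death certificate."""
    return c.M_parents and (c.D_father or c.D_mother)


def sibling_sets(children: List[Child], min_siblings: int = 2) -> Dict[int, List[int]]:
    """Per family, the D-flagged children of uncensored family reconstructions.

    Families with fewer than `min_siblings` such children are dropped
    (an assumed threshold for studying sibling similarity).
    """
    families: Dict[int, List[int]] = defaultdict(list)
    for c in children:
        if family_uncensored(c) and c.D:
            families[c.id_family].append(c.id_person)
    return {fam: ids for fam, ids in families.items() if len(ids) >= min_siblings}
```

Applying `family_uncensored` alone yields a sample for intergenerational analyses. `sibling_sets` narrows this further to families where several siblings' life courses are likely complete.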
We further advise users of LINKS-gen to consider the window of observation carefully. For the entire country, birth, marriage, and death certificates are of sound quality from 1812. More recent certificates are protected by privacy laws; in the current dataset, certificates are available up to 1912, 1937, and 1962 for births, marriages, and deaths, respectively. Between 1796 and 1811, certificates are also available for regions within Zeeland and Limburg, but the quality of the administration appears to be lower. Furthermore, although the civil administration started in 1811, it seems to be of better quality from 1812 onwards. To give a few examples of how this influences research on LINKS-Zeeland: families can only be reconstructed if they were started after 1811, as otherwise one or more siblings may be missing. Infant mortality can only be studied between 1812 and 1912, as birth certificates are not available after 1912. Two-generation reconstructions of life courses are only available for the 1812-1862 marriage cohorts, as their children were born between 1812 and 1892, and their grandchildren between approximately 1830 and 1912, after which birth certificates are no longer available. The Lexis diagram in Figure 7 gives a systematic overview of the availability of certificates over time.
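The window arithmetic in these examples can be made explicit. The sketch encodes the start year (1812) and the last available year per certificate type given in the text. The 30-year childbearing span and the 50-year lag to the last grandchild are assumptions back-calculated from the text's example (1812-1862 marriage cohorts, children born up to 1892, grandchildren up to about 1912). They are not rules stated in the source.

```python
# Years taken from the text: sound-quality certificates from 1812, and
# certificates in the current release up to these years (privacy limits).
FIRST_YEAR = 1812
LAST_YEAR = {"birth": 1912, "marriage": 1937, "death": 1962}


def infant_mortality_observable(birth_year: int) -> bool:
    """A birth certificate is needed; infant deaths fall well before 1962."""
    return FIRST_YEAR <= birth_year <= LAST_YEAR["birth"]


def sibship_complete(marriage_year: int, child_span: int = 30) -> bool:
    """Family started from 1812 on and all children born by the last birth year.

    `child_span` (years from marriage to last birth) is an assumption.
    """
    return (marriage_year >= FIRST_YEAR
            and marriage_year + child_span <= LAST_YEAR["birth"])


def two_generations_observable(marriage_year: int, grandchild_lag: int = 50) -> bool:
    """Grandchildren (assumed born up to `grandchild_lag` years after the
    parents' marriage) must still fall within the birth window."""
    return (marriage_year >= FIRST_YEAR
            and marriage_year + grandchild_lag <= LAST_YEAR["birth"])


if __name__ == "__main__":
    for year in (1811, 1812, 1862, 1863, 1885):
        print(year, sibship_complete(year), two_generations_observable(year))
```

With these assumed spans, the 1862 marriage cohort is the last one for which two generations are observable. This matches the 1812-1862 range given in the text.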

Figure 7 Lexis diagram of available information in LINKS-gen
CONCLUSION
To clear the way for social research on historical life courses, we presented a way to convert LINKS into one comprehensible dataset that can readily be analysed. The script that transforms LINKS-Zeeland into LINKS-gen is published together with this article. Information from birth, marriage, and death certificates is ordered into one logical data frame. This structure will be kept in place for future releases of LINKS, and is designed to simplify and standardise the use of the database. The restructured dataset provides users with three advantages. First, users can more efficiently invest their time in understanding and operationalizing the dataset. Second, researchers can use a standardised version of LINKS, which increases the comparability and replicability of future research. Finally, users can now easily select 'good cases' for their research, as we flag cases that are likely to be relatively complete. Instructions for the selections for different types of demographic research can be found in this paper. Together, these factors make LINKS-gen attractive to researchers who wish to engage in social, economic and demographic historical research on the 19th-century Netherlands.
The LINKS database contains an enormous amount of detailed information on life courses and family relations in the 19th-century Netherlands. It gives a rare insight into family relations in the 19th century, as historical databases on full populations of provinces or states are only available for a limited number of places in the world, such as Québec and Utah (Song & Campbell, 2017). In the upcoming years, LINKS will be developed further until it spans all provinces of the Netherlands. The resulting database will be an impetus to future historical, demographic, social, and biomedical research, as it provides new opportunities for research on kin networks, intergenerational similarities in demographic behaviour, and spatial disparities (see e.g. Knigge, 2016). Another strong benefit of a digitised civil registry is that it can be matched with other sources and serve as a basis for further historical reconstructions. Currently, the CLARIAH project is developing an infrastructure to reconstruct the socioeconomic context of the 19th century, at both the individual and the contextual level (Hoekstra et al., 2018). LINKS serves as the backbone for this project, as it clearly structures the (partial) life courses of everyone who lived in the 19th-century Netherlands. Demographic information from LINKS-Netherlands can be matched with other sources, such as inheritance tax registers (Peeters, de Vicq de Cumptich, & Gelderblom, 2019) to get a grasp of wealth and possessions, or cause-of-death registers (Janssens, 2019) to model changes in the disease environment, for instance to study human stature. Moreover, once information from the population registers becomes structurally available, information on coresidence and migration can be added. Hence, information on individual wealth, mortality, stature, cohabitation, and migration behaviour will be contextualised within intergenerational, familial networks.
LINKS is available to researchers via the IISG. Please contact Kees Mandemakers (kma@iisg.nl), Auke Rijpma (a.rijpma@uu.nl) or Richard Zijdeman (r.zijdeman@iisg.nl) for access to the database, or with questions or feedback regarding LINKS-Zeeland, LINKS-gen, or the LINKS project in general.