The Groningen Integral History Cohort Database. Development, Design and Output
Richard Paping & Dinos Sevdalakis

e-ISSN: 2352-6343 DOI article: https://doi.org/10.51964/hlcs12033 © 2022, Paping, Sevdalakis This open-access work is licensed under a Creative Commons Attribution 4.0 International License, which permits use, reproduction & distribution in any medium for non-commercial purposes, provided the original author(s) and source are given credit. See http://creativecommons.org/licenses/.


INTRODUCTION
The Groningen Integral History Cohort Database (GIHCD) consists of a regional sample of more than 5,000 life courses of persons born between 1811 and 1872 in the northern Dutch province of Groningen. This province consists of a large city (Groningen) and its surrounding countryside with numerous smaller and larger settlements organised in about 60 municipalities in the 19th century. Most of the individuals involved were followed during their life throughout the Netherlands, and largely also abroad until well into the first half of the 20th century. The database's main strength is its excellent quality, as a very high percentage of the included persons could indeed be followed throughout their entire life course.
This article first presents an overview of the history of the Integral History Project out of which the GIHCD developed from 1987 onwards. Secondly, we describe the rather unsystematic way in which the database was constructed over the years, partly in response to changing research questions. In the next section, we discuss what the database looks like at the moment, presenting its content and structure. In the last part, we present an overview of the publications that made use of the data stored in the database. In general, the article also illustrates how technological developments over more than thirty years, in addition to a continuous lack of time and money, can shape a database like the GIHCD into its present form.
In 1987, researchers from both the University of Groningen (Pim Kooij and Marten Buist) and the University of Utrecht (Gerard Trienekens and Theo van Tijn), with support from the organisation for historical research Stichting voor Historisch Onderzoek (SHO), one of the predecessors of the government-funded Dutch organisation for scientific research NWO, launched the Integral History Project (Kooij, 1993a, p. 2). The primary goal of the project, which led to the creation of the GIHCD, was to write an integral history of different regions in the north and the south of the Netherlands. Through this approach the initiators sought to overcome the ongoing fragmentation of historical research by operationalising the concept of "quality of life" (Kooij & Sleebe, 1991; Trienekens, 1987, 1993). In short, the theoretical idea was to compare the aims of ordinary people regarding various aspects of their lives with their perception of what was happening in "reality", thus enabling historians to research the quality of life of people in the past. According to this reasoning, major differences between the ambitions and perceptions of people would have led to social tensions that would clearly show up in the sources.
As general developments (for instance technological and economic progress, modernisation in other respects and democratisation) influenced these aims and perceptions, this would also create opportunities to connect data on the micro and macro level. Although at first sight events were not the core of the Integral History Project, the main way to trace indications of changes in ambitions and perceptions was by doing extensive and detailed archival research on specific individuals and communities representing different societies. Four research pillars were defined, each of which was to result in a large database:
1. Cohort analysis: research on the life course of the inhabitants of selected municipalities;
2. Structural analysis: research based on snapshots of the household structure of the inhabitants in these selected municipalities taken every 20 years;
3. Financial analysis: research on the developments in the provincial and municipal expenditures using the annual accounts (Duijvendak & Blijham, 1994) to study the political priorities of the local and provincial elites;
4. Opinion analysis: research based on the local newspapers to find indications of tensions in society.
For this article, we concentrate on pillar 1 and to some extent on pillar 2, because they at least have had some long-lasting research output.
The Groningen clay region was largely defined as those municipalities whose surface area was covered by clay for more than 50% (Paping, 1995, pp. 18-21). The selected region covers more than half the province, including 36 out of the 57 19th-century municipalities (see figure 1). The nine municipalities were chosen in such a way as to ensure an even geographical spread over the region. They were largely rural, though the selection also deliberately included the small old town of Appingedam with its adjoining rural villages and the somewhat larger semi-urbanised settlement of Winschoten. However, for the sake of convenience, we will call these nine municipalities rural in opposition to the large city of Groningen.
The Dutch scientific fund SHO, in addition to financing PhDs, provided the Integral History Project in 1987 with some funding to develop the four databases. In the Groningen part of the project, this money was used partly to finance a full-time research assistant, who constructed a digital database of government accounts (pillar 3). However, the amount of funds supplied by SHO was by no means sufficient to finance the construction of the four large historical databases just mentioned. The University of Groningen also funded additional PhDs and student-assistants to participate in the project.

Figure 1
The municipalities of the clay region in the Groningen province. Source: Paping (1995). Explanation: The so-called 'clay area' is the area surrounded by the thick line and the named municipalities. The municipalities in yellow and blue are the ones that are included in the GIHCD database.
The aim of the structural analysis was to take snapshots of the household situation every 20 years from 1815 onwards as benchmarks to study changes over time. The transcription and summarising of the population registers of the nine selected rural municipalities was done by several persons working on PhD-projects, some supported by student-assistants. Originally, a similar endeavour was planned for the city of Groningen for the period before 1870, as the years 1870-1910 had already been covered in the thesis of Kooij (1987). However, the urban part of the project was never completed.
The structural analysis was confronted with three major problems. Firstly, there was a lack of sources. The intended timespan of the project was 1770-1920; however, population registers or micro census data were missing for the older period. Therefore, it was decided to restrict the data collection to the census years 1815, 1829/1830, 1850, 1870, 1890 and 1910, reconstructing the household situation around January 1st of those years. Unfortunately, for most of the selected municipalities the (census) registers were missing for 1815 and 1830. From 1850 onwards, the dynamic population registers introduced nationwide were used, except for two municipalities for which these registers were missing for the first decade. Secondly, it often proved difficult to reconstruct the exact household situation in 1870, 1890 and 1910, as in the countryside many of these dynamic population registers cover periods of twenty years or more, for instance from 1860 to 1900, while the original ten-yearly census forms had been destroyed. This made it very difficult to ascertain which persons (including live-in servants) lived within the households at the measurement points from 1870 onwards. Thirdly, the reliance on only a few PhD candidates, due to the lack of funding, meant that the years 1870, 1890 and 1910 were only covered to a limited extent. Consequently, a digital database on household structures is available for a substantial number of municipalities only for 1830 and 1850 (see table 1).
As laptops and the like were still rare at the end of the 1980s, all households were registered on a paper form. As a standard, all data on the head of the household and any partners were transcribed, including the names. For the other members of the household some data were transcribed as well, though in much less detail than available in the registers, in an attempt to reduce the time needed to collect the data. After filling out all the forms, the information was translated into numbers using an extensive codebook (with numbers indicating, for instance, household types, occupations, and so on). These numbers were digitalised on tape and analysed by way of SPSS using a mainframe computer. Around 1995, a LOTUS 1-2-3 spreadsheet database was created (later on converted into EXCEL) with a household on every row. The numerical codes were reconverted into the corresponding meaning in text. Next, some of the not yet digitalised information from the still preserved original forms was added to the database. An important feature of this dataset is that for more than half of the municipalities in 1830 and 1850 information on the amount paid in the local taxes could be added (the so-called 'hoofdelijke omslag', a tax based partly on income and partly on wealth: Kooij, 1993b, pp. 145-155; Paping, 2010; Voerman, 2001, pp. 267-272).

THE STRUCTURAL ANALYSIS DATABASE
The structural analysis databases were used for several publications: Voerman (2001) made an in-depth analysis for Winschoten in combination with extra data on the mixed rural-urban municipalities of Hoogezand and Veendam in the eastern Groningen peat districts neighbouring the Groningen clay soil region. Paping and Collenteur (1998) used it for the long-term development of the occupational structure of household heads until 1910. Paping (2010) researched the relation between occupation, social position and tax performance within households in the period 1830-1850; and lastly Paping (2018) analysed the household structure using the same data.
This article focuses in particular on the most prominent database constructed within the Groningen Integral History Project: the cohort database with individual life courses from ten municipalities (including the city of Groningen, see figure 1), selecting 120 births from cohorts starting with a 20-yearly interval (1811, 1830, 1850, 1870 and 1910). Although the original aim included the birth cohorts of 1770 and 1790, it soon became clear that this was not feasible. For the three villages later constituting the municipality of Hoogkerk, the first 120 baptisms for the cohorts of 1770 and 1790 were collected by Vincent Sleebe (1993). However, it proved extremely difficult to find any other information about these persons. In Groningen, the large majority of the families did not have a surname before 1811, making it difficult to trace persons outside the village without extensive genealogical research. Initially, it was hoped that existing registrations of Protestant church members (lidmaten) would indicate migrations, but these registers proved scarce or incomplete and covered only a minority of the population. This was all the more detrimental, as the naïve initial research assumption that migration was very rare in the countryside proved to be completely unrealistic. How to deal with migrating cohort members became a persistent issue in the project, as we will show later on.
The project originally planned a series of books on all 10 municipalities involved starting with Hoogkerk, a tiny municipality right next to the city of Groningen. This characterised the optimistic atmosphere in the first years of the project: ultimately only the book on Hoogkerk (Kooij, 1993) was published. The contribution to this book that makes extensive use of the GIHCD clearly shows the huge problems with which the construction of the cohort analysis was confronted (Clement, 1993). Birth cohorts were constructed for 1811, 1830, 1850 and 1870. As the population of Hoogkerk was very limited, it sometimes took even six years to achieve a cohort of 120 consecutive births. The cohort of 1811 started in August, with the beginning of the official Dutch civil registration. The birth certificates offered much more information (occupations and ages of people involved) than the baptism registers which had to be used before August 1811. At first, children were only traced within the municipality of birth and in the city of Groningen. As most of the Hoogkerk children disappeared without a trace, this search was expanded to neighbouring rural municipalities like Aduard, where indeed a few of the Hoogkerk cohort members were found.
In the 1980s, it was still difficult to track down a specific person in the civil registration. Only 10-yearly indices of births, deaths and marriages existed for each municipality, supplying merely the registration date of the certificates. As it was not uncommon to have several people with the same name, all hits had to be checked one by one. Fortunately, this could be done at the provincial archive, where an increasing number of films could be found containing the civil registration of the provincial municipalities before 1900. A notable exception was the records of the city of Groningen, which were kept in the city archive. In 2002 the two archives merged (Groninger Archieven) and the work could be concentrated in one building.
Originally, the core of the database was formed by standardised paper forms for each Research Person (RP). These forms had room for information about: 1. The parents (occupation, birth date, birth place, ability to sign birth or marriage certificate); 2. Marriage (date and location) and characteristics of the partner (occupation, birth date, birth place, ability to sign marriage certificate); 3. Children (birth date, occupation and moment they left the parental household, including stillbirths); 4. The religion, the household situation and exact place where the RP lived according to the census (1815, 1830 and 1840) or the population register (from 1850 onwards); 5. Migration dates and destinations. Extra forms could be added when necessary. Consequently, the original database contains information on three generations, not only on the RP, but also on his or her parents, on marriage partners and on the place and date of birth of children.
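The layout of the paper form described above can be pictured as a nested record. The following is only an illustrative sketch in Python: all class and field names are hypothetical and do not reproduce the actual form or any later database schema, and dates are kept as free text because historical dates are often partial.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    """A parent or partner of the RP (hypothetical field names)."""
    occupation: Optional[str] = None
    birth_date: Optional[str] = None
    birth_place: Optional[str] = None
    could_sign: Optional[bool] = None  # signed the birth or marriage certificate


@dataclass
class Marriage:
    date: Optional[str] = None
    location: Optional[str] = None
    partner: Person = field(default_factory=Person)


@dataclass
class Child:
    birth_date: Optional[str] = None
    occupation: Optional[str] = None
    left_household: Optional[str] = None  # moment of leaving the parental home
    stillborn: bool = False


@dataclass
class Residence:
    year: int  # census year or population register entry
    place: str
    religion: Optional[str] = None
    household_situation: Optional[str] = None


@dataclass
class Migration:
    date: Optional[str]
    destination: str


@dataclass
class ResearchPersonForm:
    """One form per Research Person, covering three generations."""
    rp_id: str
    father: Person = field(default_factory=Person)
    mother: Person = field(default_factory=Person)
    marriages: List[Marriage] = field(default_factory=list)
    children: List[Child] = field(default_factory=list)
    residences: List[Residence] = field(default_factory=list)
    migrations: List[Migration] = field(default_factory=list)


# Hypothetical example: an RP with one recorded move and nothing else known.
form = ResearchPersonForm(rp_id="example-001")
form.migrations.append(Migration(date=None, destination="Aduard"))
```

Making every field optional mirrors the situation described in the text, where forms were often filled in only partially and had to be completed later.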
The advantage of the paper system was its flexibility, as it was easy to include interesting additional information found in archives. The disadvantages were, firstly, that it was often unclear what the source of the transcribed information was and, secondly, that forms were filled in very inconsistently. The form gave the individual researcher a lot of decision power over which information to include. Consequently, a lot of data was not transcribed, often making it necessary to go back to the original sources and replenish the data on the forms. Most of the forms were filled in by student assistants and research assistants, though sometimes also by history students as part of a paper they wrote during a research course under the supervision of project leader Pim Kooij. Kooij himself did most of the cohorts of the city of Groningen. The handwriting, and also the colour of the pen, pencil or marker provide indications of who originally transcribed the data. Over the course of time, many different researchers worked on one individual RP.
Before 1995, the tracing process in the civil registration proved extremely difficult and time-consuming. In particular, the high number of persons in the province of Groningen sharing the same name made it time-consuming to establish whether a certificate belonged to the RP in question without reading the certificate itself. In 1993/1994 all funds of the Integral History Project had been used. All life courses from the cohorts from 1811 until 1870 had been constructed, except migration routes outside the birthplace. A few student-assistants, financed by the Groningen Faculty of Economics, had tried to improve this; however, it was too much work. In this phase, RPs were followed in the municipalities in the clay parts of the province, and not only in the city of Groningen. During this period, project leader Kooij created a REFLEX-database with a structured summary of the assembled data of every individual based on the paper forms. In Figure 2, we sketch the technical transformations the database has undergone since 1987.
The Integral History Project received some fresh money from the Dutch research foundation (NWO) to stimulate research cooperation with historians from the former USSR. Since the end of the Communist regime, there was a rising interest in Western research methods among Russian historians, together with a change of focus from political history (mainly the history of communism) to the history of common people. This Dutch-Russian cooperation resulted in several workshops in both Russia and the Netherlands and in two volumes: Where the Twain meet (Kooij, 1998) and Where the Twain Meet Again (Kooij & Paping, 2004). Several Russian contributions applied the Groningen method of cohort analysis (Akolzina et al., 2004; Dyatschkov et al., 1998; Golubeva, 1998; Shustrova & Sinitsyna, 2004; Sinitsyna, 1998). However, source problems (serious gaps in birth, marriage and death records and difficulties with record linkage) were even larger for 19th-century Russia, resulting in databases containing only a limited number of individuals with substantial information.
In the meantime, NWO funded in the years 1995-1996 a Dutch-Flemish research project concentrating on family strategies. Paping's part of the project used the Groningen Integral History Cohort Database (GIHCD) to study strategies regarding migration in the countryside. He had to conclude that, despite all the efforts, the cohort analysis database was still not very consistent and had significant gaps. The quality of the 1811 cohort was especially poor, partly due to the difficulty of detecting the migration history of the cohort members, as the Dutch population registers keeping systematic track of migration only started in 1850.
In preparation for the family strategy project, the original REFLEX-database was converted into a LOTUS 1-2-3 spreadsheet, making the data analysis and the input of extra data easier. However, one major weakness of the program used was that it had difficulty accepting dates before 1880. To solve this problem, 100 years were added to all the dates. In the end, dates in the LOTUS database were changed into relative parts of the year (for instance, the 30th April 1840 became 1840.33), making it easy to calculate time spans. This implied that the exact dates got lost, although these are still available on the original forms. Next, all the data of all RPs from the nine rural municipalities and the cohorts 1830, 1850 and 1870 were checked by Paping, focussing on RPs with an incomplete life course. It turned out that previous investigators had missed a lot of information, which sometimes could even be found at the place of birth. For the still numerous missing persons, it was checked whether they had gone to the city of Groningen or whether they appeared on American migration lists. Also, many lost cohort members were found by using newly available indices and films of civil certificates and by systematically scrutinising the whole collection of published genealogies in the provincial Groningen Archive. In general, cohort members were followed through the whole province of Groningen at least until 1900, but preferably until after 1920.
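The conversion of calendar dates into fractional years can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the LOTUS spreadsheet: the function name `decimal_year` and the convention of counting elapsed days from 1 January are assumptions.

```python
import calendar
from datetime import date


def decimal_year(d: date) -> float:
    """Express a calendar date as a year plus the elapsed fraction of that year.

    Assumption: 1 January maps to .00; the original spreadsheet may have
    used a slightly different convention.
    """
    days_in_year = 366 if calendar.isleap(d.year) else 365
    elapsed_days = d.timetuple().tm_yday - 1
    return d.year + elapsed_days / days_in_year


# 30 April 1840 falls about a third of the way through the (leap) year.
print(round(decimal_year(date(1840, 4, 30)), 2))  # 1840.33

# Time spans follow from simple subtraction (hypothetical example dates).
span = decimal_year(date(1865, 5, 1)) - decimal_year(date(1840, 4, 30))
print(round(span, 2))
```

The rounding to two decimals shows why exact dates cannot be recovered from such values: several consecutive days map to the same rounded number.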
Consequently, by 1999 the quality of the 1830, 1850 and 1870 rural cohorts had been greatly improved. Of the 1830 and 1850 cohort members only 8% and 10% were lost somewhere during their life in the province of Groningen. The situation was worse for the 1870 cohort, where about a quarter had not (yet) been traced during their life span; however, only 4% disappeared before 1900 (Paping, 1999, p. 79). These first overviews of the cohorts showed a massive and rising emigration of the cohort members out of the province of Groningen: 11% of the 1830 cohort members, 18% for 1850 and even 26% for 1870. These observed migration shares are even more impressive taking into account the high child and juvenile mortality (as RPs dying early obviously did not have much time to move out of the province) and the percentages of lost persons just mentioned.

Figure 2
Transformations of the GIHCD since its conception

In 2002, in line with these interesting migration patterns, the Faculty of Arts of the University of Groningen financed a research project following the migration all over the Netherlands. The research period was extended until 1940 and in the first instance extra information on migrations within the province of Groningen was gathered. Subsequently, RPs were followed outside the province, by visiting the municipal archives of main destinations like Amsterdam and writing letters to the smaller archives. This research effort again greatly improved the coverage and reliability of the GIHCD, which by then had been converted from LOTUS into EXCEL. However, the extremely time-consuming gathering of data on all migrated individuals was one of the major reasons why this project did not result in a final publication.
Since the turn of the millennium, digitisation of the civil registration has proceeded rapidly. Thanks to the efforts of the Groningen archives, indices with the names, ages and occupations mentioned in all birth, death and marriage records became increasingly available via the website AlleGroningers (see also Mandemakers, Bloothooft, & Laan, forthcoming). Numerous volunteers made the transcripts and improved the Groningen Archives database, adding extra years when allowed by Dutch privacy regulations (public after 100 years for births, 75 years for marriages and 50 years for deaths). A more recent development was the addition of the scans of the original sources. Since the early 2000s, a sophisticated online search engine has made it possible to trace individual persons from behind the computer! It became much easier to establish whether a certificate really was about the person one was looking for, and this without the time-consuming travel to any archive. One can also search for a combination of names, which makes it easy to find specific couples. Also, for other parts of the Netherlands search engines became available to tackle huge databases (especially WieWasWie covering the whole of the Netherlands, though also search engines from separate archives), making it increasingly easy to track down lost cohort members.
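The disclosure periods mentioned above can be expressed as a simple lookup. The sketch below is a simplified illustration in Python: the names `EMBARGO_YEARS` and `is_public` are hypothetical, and the comparison uses whole calendar years only, whereas the actual archival rules may be applied differently.

```python
from datetime import date

# Disclosure periods in years, as stated in the text above.
EMBARGO_YEARS = {"birth": 100, "marriage": 75, "death": 50}


def is_public(record_type: str, record_year: int, today: date) -> bool:
    """Return True once the disclosure period for this record type has passed.

    Simplification: only whole calendar years are compared.
    """
    return today.year - record_year >= EMBARGO_YEARS[record_type]


# Hypothetical checks for a reference date in 2022.
reference = date(2022, 1, 1)
print(is_public("birth", 1921, reference))   # 101 years old: public
print(is_public("death", 1980, reference))   # 42 years old: not yet public
```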
In conclusion, digital tools have made it much easier from about 2005 onwards to develop the GIHCD. These digital improvements have been especially useful for improving the quality of the 1811 rural cohorts, which had been largely neglected since 1995, as well as the 1811, 1830, 1850 and 1870 cohorts of the city of Groningen. However, due to a lack of research funds the Integral History Project was still unable to do a general update of all cohorts.
Several history students wrote their master's or bachelor's thesis using the GIHCD: Leendert Klokkenburg (2009) discussed differences between orthodox and non-orthodox Calvinist rural RPs, Bart Hoogenboom (2013) investigated migration from the Groningen countryside to the city of Groningen and Piet-Jan Koning (2019) researched the huge migration from the Groningen countryside to the United States of America (USA). Koning showed that it is possible to trace at least three quarters of the migrating rural RPs born in 1850 and 1870 in America using, among others, the digitalised US census and passenger list data available online. The relatively small sample size allowed Koning to search for these lost RPs by hand and ultimately led to the retracing of 152 of the 219 RPs studied that moved to the US. This seems better than the software-based approaches used in recent initiatives to match and validate almost 500 persons of the HSN who migrated to the USA using American censuses (see Paiva, Anguita, & Mandemakers, 2020). So, retracing lost RPs in the USA manually might be a fairly efficient approach when conducted on a small scale.
Another stimulus for the GIHCD was its participation in the European Historical Population Sample Network (EHPS-Net) in the period 2011-2016. Jacek Pawlowski was hired in 2014-2015 to put the data of the 1830, 1850 and 1870 RPs in the Intermediate Data Structure (IDS; Alter & Mandemakers, 2014). In this period a general update of the database took place by digitally tracing part of the remaining lost cohort members using newly available search engines.
As has been explained in the previous section, a major part of the Integral History Project focused on sampling microlevel data for the province of Groningen. The GIHCD sampled data on RPs born between 1811 and 1870 based on birth certificates. However, its history, scope, data collection and sampling strategy were quite different from those of the Historical Sample of the Netherlands (HSN, see Mandemakers, 2000).
The life course data is mainly, but not exclusively, drawn from civil registration documents (birth, death and marriage records) that describe individuals' life events. For the period after 1850, when the Dutch state required municipalities to systematically keep track of their population, population registers have also been used. Ultimately, of a total of 5,280 RPs involved, information on 3,240 of them, derived from the nine rural municipalities (see figure 1), has been converted into an IDS database (Alter & Mandemakers, 2014). The dynamic nature of the Dutch population registers allowed documenting a number of characteristics of the RPs, like place of residence and occupation, throughout their lives.
As the database does not contain identifiable personal information relating to RPs younger than 100 years and the data of other persons such as children of the RPs are only anonymously processed, there are no impediments regarding privacy. At the moment, researchers can only get the database on request, though it is the aim to make it downloadable in the near future starting with the parts converted into the IDS format. This would allow researchers to use the sample for various studies on demographic and socioeconomic history, as the database includes precise and complete information on the RPs' occupations and migration history as well as some information on their family members. In section 3.2, we will describe in detail the sample design and content of the database and in section 3.3 we focus on the IDS component, which is also the most complete part. When relevant we also reflect on those parts of the database that are still compiled in an EXCEL file. An earlier description of the GIHCD can be found online. 3 This description provides a detailed introduction to the data included in the database.
Besides the GIHCD, a related and partly supplementary database is also available. The Database Roman-Catholics Groningen consists of a family reconstitution of all Roman-Catholics (about 5% of the population) roughly living in the 'GIHCD-part' of the Groningen countryside in the 18th century (Paping, 2009; Paping & Schansker, 2013). This database, the first version of which was constructed for the master's thesis of the first author of this article, offers systematic information on the life courses of about 5,000 persons born between 1721 and 1810.
The GIHCD focuses on the large city of Groningen and on nine of the 56 rural municipalities that made up the province of Groningen in the 19th century. The nine municipalities are Zuidhorn, Hoogkerk, Leens, Uithuizen, Bedum, Appingedam, Stedum, Beerta and Winschoten. Together they comprise nine of 36 rural municipalities in the clay parts of the province. The selection method of the 10 research municipalities has already been discussed in section 2.1, where a map is also presented (figure 1). Table 2 shows some characteristics of these municipalities.
Using the 1862 agricultural statistics (Bijdragen, 1870, see Table 2), the share of members of farm labourers' families in the total population of the clay parts of Groningen can be estimated as 41%, while its unweighted average share in the selected nine rural municipalities is only 38%. This figure is a little bit lower than the total average for the clay region, because we have included in our rural sample two relatively urbanised municipalities, Appingedam and Winschoten. For members of farmers' families, the shares in total population are 19% and 20% respectively. The remaining 40% of the population consisted mainly of the families of artisans, millers, shopkeepers, innkeepers, merchants, shippers, reverends, schoolmasters and others active in industry and services. This high non-agricultural share denotes the large extent of specialisation of tasks within the regional economy, which was combined with a very market-oriented agriculture. Before the end of the 19th century, large-scale factories were rare in the clay parts of the province of Groningen (Paping, 1999).

3 See "Integral History Project Groningen", EHPS-Net: https://ehps-net.eu/databases/integral-historyproject-groningen, accessed on 20 April, 2022.

SAMPLE DESIGN, SOURCES AND CONTENT
The initial sampling of RPs was based on the birth certificates, being available from August 1811. The sample was drawn with intervals of about 20 years, which led to four cohorts of individuals born in 1811, 1830, 1850 and 1870, though some cohorts include some extra years, due to the small size of some of the municipalities involved. For every municipality, the first 120 registered births in each cohort were used. For the large city of Groningen 240 certificates were selected, using the first 20 of every month.
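The sample sizes implied by this design can be checked with a few lines of arithmetic. The sketch below assumes that every cohort reached its full target size; the variable names are illustrative only. The resulting totals agree with the figures of 4,320 rural RPs, 5,280 RPs overall and 3,240 RPs in the IDS part reported elsewhere in this article.

```python
RURAL_MUNICIPALITIES = 9
BIRTHS_PER_RURAL_COHORT = 120
CITY_BIRTHS_PER_MONTH = 20
COHORTS = (1811, 1830, 1850, 1870)
IDS_COHORTS = (1830, 1850, 1870)  # rural cohorts converted into IDS format

# City of Groningen: the first 20 births of every month of the cohort year.
city_per_cohort = CITY_BIRTHS_PER_MONTH * 12

rural_total = RURAL_MUNICIPALITIES * BIRTHS_PER_RURAL_COHORT * len(COHORTS)
city_total = city_per_cohort * len(COHORTS)
ids_total = RURAL_MUNICIPALITIES * BIRTHS_PER_RURAL_COHORT * len(IDS_COHORTS)

print(city_per_cohort)           # 240
print(rural_total)               # 4320
print(rural_total + city_total)  # 5280
print(ids_total)                 # 3240
```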
The selected RPs were subsequently traced along specific moments during their life course, such as their wedding, death and birth of their children. The sources which were systematically checked were the birth, marriage and death certificates and the population registers starting in 1850. Other sources have been used to corroborate existing information or for tracking a person who could not be found in the mentioned ones. Examples of these sources include the census lists of 1815, 1829/1830 and 1839/1840, genealogies, military draft records, migration lists and tax records.
Unfortunately, due to the lack of systematic household information on the forms (partly also due to the limited availability of this information in the population registers before 1862), the database does not contain data on the precise household composition over time. It only provides information on the parental or marital relationships between individuals. This allows researchers to analyse families, but not households. However, this might be less of a problem, as the dominant household form in the province of Groningen was the nuclear family, and estimates are available on the date when the RP left the parental home (Paping, 2004b, pp. 278-279; Paping, 2018).
For the cohorts that were drawn from the city of Groningen more linkages are missing (see Appendix A1-4), although recent efforts have improved them to a considerable extent. Tables 3 and 4 provide an overview of how far RPs could be followed during their life course. As the information regarding the urban cohorts has been digitalised only partly, these cohorts have been excluded from tables 3 and 4. The lower result for the 1811 cohort is partly caused by the relatively high number of inaccuracies in the 1811 birth registration, as these were the first official birth records in the new Dutch civil registration.
Table 3 shows that only a few RPs were lost without a trace. Even the cohort of 1811 seems relatively good. So, for all cohorts most of the life course of the RPs is known. The reported high death rate of juveniles in 1870 can be attributed to a smallpox epidemic, which substantially increased infant mortality, making the choice of this specific sample year rather unfortunate. Both the increasing child mortality and the increase in emigration (not all the marriages abroad have been tracked yet) resulted in a slight decline in the number of known marriages in the database between 1830 and 1870.
As explained in section 2.3, from 1995 onwards the research effort was primarily focused on the rural 1830, 1850 and 1870 cohorts. This effort concentrated on the manual linking of the RPs with their relatives (parents, spouse and children). Through intensive searches in online databases and other sources for missing information on parents, marriage, children, death and migration, nearly all RPs have been successfully linked with their relatives. Table 3 shows that for the 1811-1870 cohorts, we have no death certificate or emigration data for on average 1.4% of the 4,320 RPs, ranging from 2.1% for the 1811 cohort to 0.6% for the 1870 one. For the 1830-1870 cohorts this share is only 1.2% (table 4). Including parents, spouses and children, these last three cohorts contain information on 19,045 individuals.
Recently collected information on the fate of most of the RPs who emigrated to the USA (especially marriage dates, characteristics of partners and death dates) has been added to the EXCEL database, but not yet to the IDS version. The USA was a popular destination among lower-income groups in the second half of the 19th century, especially during the agrarian depression in the 1880s and early 1890s. In the future, more information on the life courses abroad will be added to the database. This addition will open up avenues of research that have not been explored yet, such as the socioeconomic success or failure of RPs who moved abroad. Some first results were presented by Koning and Paping (2019), showing a relatively strong upward social mobility of these American migrants.

STRUCTURE OF THE DATASET
Three cohorts of the rural part of the database (1830, 1850 and 1870) have been converted into the so-called Intermediate Data Structure (IDS; see Alter & Mandemakers, 2014). The IDS has become an increasingly popular format for life course databases (Dribe & Quaranta, 2020; Edvinsson & Engberg, 2020; Jenkinson, Anguita, Paiva, Matsuo, & Matthijs, 2020). Databases in IDS format allow researchers to conduct international and interregional studies with more ease, as the format standardises data across databases. The IDS is designed as six tables whose key structure can be used in a database management system to connect individuals with each other and with the contexts they are part of at specific moments in time. Until now five out of these six tables have been used for the GIHCD in a Microsoft Access database: INDIVIDUAL, INDIV_INDIV, CONTEXT, CONTEXT_CONTEXT and METADATA, excluding the table INDIV_CONTEXT. The information included in the five tables is presented below.
The INDIVIDUAL table includes information on personal attributes (e.g., name, occupation) and events (e.g., birth date, marriage date). The basic structure of the table includes a database identifier (Id_D) and an individual identifier (Id_I) for all included individuals, in our case each RP and the direct relatives that we found and linked to the RPs. Table 5 shows the attributes and events recorded in the GIHCD.
The INDIVIDUAL table contains 90,592 records, unevenly distributed over 19,045 unique persons (RPs, parents, spouses and children). Each record consists of a value for a specific type of attribute. Examples of attributes are the RP's first name, the last name and the location of birth. The Type Birth_Location can take the Value 'Hoogkerk', one of the municipalities. The relatives on which the database provides information are limited to the RPs' parents, any children and marriage partner(s). The information available on these relatives is limited but includes sex, birth location, birth date and occupation (only for parents and partners). All occupations are provided with a code number from the Historical International Classification of Occupations (HISCO; van Leeuwen, Maas, & Miles, 2002). Occupational information was collected from civil certificates. For example, the occupations of the parents of an RP are collected from the birth certificate and the occupation of the RP's partner is taken from the marriage certificate. Because other databases also draw on the HISCO codes to make claims about the social position of individuals and their parents, the addition of these codes will allow researchers to easily use the IDS release of this database for comparative research (compare Edvinsson & Engberg, 2020; Mandemakers & Kok, 2020; Vézina & Bournival, 2020).
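The long format of the INDIVIDUAL table, with one record per attribute value, can be illustrated with a minimal Python sketch. It assumes that each record has been reduced to the fields Id_D, Id_I, Type and Value. Apart from the Birth_Location value 'Hoogkerk', the identifiers, the Type name "Sex" and all values are hypothetical and not taken from the GIHCD. With 90,592 records for 19,045 persons, such a grouping would yield on average just under five records per person.

```python
from collections import defaultdict

# Hypothetical INDIVIDUAL records: (Id_D, Id_I, Type, Value).
# Only Birth_Location = 'Hoogkerk' comes from the text; the identifiers,
# the Type "Sex" and all other values are invented for illustration.
records = [
    ("GIHCD", 1, "Birth_Location", "Hoogkerk"),
    ("GIHCD", 1, "Sex", "Male"),
    ("GIHCD", 1, "Occupation1", "farm labourer"),
    ("GIHCD", 2, "Birth_Location", "Hoogkerk"),
    ("GIHCD", 2, "Sex", "Female"),
]


def attributes_by_person(rows):
    """Group long-format (Type, Value) records per individual (Id_I)."""
    persons = defaultdict(dict)
    for _id_d, id_i, type_, value in rows:
        # A person may have several records of one Type, so keep a list.
        persons[id_i].setdefault(type_, []).append(value)
    return dict(persons)


persons = attributes_by_person(records)
print(persons[1]["Birth_Location"])
```

Keeping a list per Type, rather than a single value, reflects that the number of records differs between persons and that one Type may occur more than once for the same individual.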
Finally, the INDIVIDUAL table provides a context identifier (Value_Id_C) to link records to the CONTEXT table. Table 5 outlines some of the aforementioned information that can be found in the INDIVIDUAL table, excluding the time stamps. Three minor departures from the IDS guidelines stand out. Firstly, in case of the Type "Departure to" we have filled in the name of the location to which RPs moved in addition to a context identifier. Secondly, we used the value "IntGron_form" for the field Source (see Table 6). This value stands for the forms that were used in collecting the data. As mentioned in section 2.3, researchers frequently failed to write down all information on the RPs in the formative years of the database. Source specification was one of the fields that researchers sometimes left empty, which resulted in source specifications with the Value "IntGron_form" (3,350 records out of 90,592). Thirdly, in the IDS format, we could not specify the source in case of the Type "End_Observation". For "Start_Observation" we could automatically fill in the birth certificate as the only possible source value. The source specification for the Type "End_Observation", by contrast, was sometimes problematic, since the observable life course can end in different ways. Observations may end when someone passes away (Value: "Death"), when the RP departs from the register (Value: "Departure") to a location in which she or he is not found again, or when the RP is no longer found in registers after being present at the closing of previous registers (Value: "End source"). So, for the Type "End_Observation" the values for Source are empty. See Table 6 for examples of records in the table INDIVIDUAL.
One major departure from the IDS format is the way in which changes in occupation are sometimes denoted. When time stamps were unavailable, the GIHCD indicates changes in occupations by a new occupation Type ("Occupation1", "Occupation2", "Occupation3", etc.). Here, "Occupation1" denotes the RP's occupation before and at marriage and, wherever applicable, "Occupation2" and higher denote changes in occupations after marriage. This choice was made because several of the paper forms on which the database is based failed to report the dates on which a new occupation was assigned or reported after marriage. For some occupations these dates exist on the paper forms but have not been digitised yet. Therefore, the timing of occupation changes could not be indicated, and the newly developed occupation Types had to be used instead. This is something that needs to be adjusted in the future to ensure that the GIHCD data can be combined with data from other IDS databases.
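Because the numbered occupation Types encode the order of occupations but not their timing, users will typically have to rebuild the occupational sequence from the Type names themselves. The following Python sketch shows one way of doing so, assuming the records of one RP are available as (Type, Value) pairs; the occupation titles are hypothetical.

```python
import re

# Hypothetical (Type, Value) records of one RP; the occupation titles
# are invented and the records are deliberately out of order.
rp_records = [
    ("Occupation2", "farmer"),
    ("Birth_Location", "Hoogkerk"),
    ("Occupation1", "farm servant"),
    ("Occupation3", "merchant"),
]

# Matches the GIHCD-specific Types "Occupation1", "Occupation2", ...
OCCUPATION_TYPE = re.compile(r"^Occupation(\d+)$")


def occupation_sequence(rows):
    """Return occupations ordered by the number in their Type name."""
    numbered = []
    for type_, value in rows:
        match = OCCUPATION_TYPE.match(type_)
        if match:
            numbered.append((int(match.group(1)), value))
    return [value for _, value in sorted(numbered)]


sequence = occupation_sequence(rp_records)
print(sequence)  # ['farm servant', 'farmer', 'merchant']
```

The result only gives the order of the occupations: the first entry refers to the period before and at marriage, the later ones to changes after marriage, without dates.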
The INDIV_INDIV table records the relationships between the individuals in a database, which in our case were restricted to the relations between the RPs and their parents, spouse(s) and children. This table uses a second individual identifier (Id_I_2) which links the second individual to the first individual (Id_I_1) in each row. The field Relation contains a value that describes the relationship between both individuals (Alter & Mandemakers, 2014). In Table 7 the structure of the relationships of the first RP of our database is presented, excluding time stamps.
In the CONTEXT table, descriptive information on the selected municipalities, the city of Groningen and all the localities appearing in the sources as locations of RPs' residence is stored. The attributes (Type) are limited to the name, the longitudinal centroid, the latitudinal centroid and the type of locality (hamlet, village, town, city, or municipality), in addition to a context identifier (Id_C). The spatiotemporal data is drawn from the Dutch Toponyms Spatio-Temporal 1812-2012 database of the HSN (Huijsmans, 2013). Neither households nor families have been added, even though the data from the INDIV_INDIV table allows for some basic family reconstitutions, which could include the parents and children of RPs but no other kin, such as RPs' siblings.
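The basic family reconstitution that the INDIV_INDIV table allows can be sketched as follows in Python. The rows use the fields Id_I_1, Id_I_2 and Relation named above, but the identifiers, the Relation labels and the reading direction (Id_I_1 is the Relation of Id_I_2) are assumptions made for this illustration and should be checked against the actual IDS release.

```python
# Hypothetical INDIV_INDIV rows: (Id_I_1, Id_I_2, Relation).
# Assumption: each row reads "Id_I_1 is the <Relation> of Id_I_2".
links = [
    (101, 1, "Father"),
    (102, 1, "Mother"),
    (201, 1, "Wife"),
    (301, 1, "Child"),
    (302, 1, "Child"),
    (103, 2, "Father"),
]

# Assumed Relation labels, grouped into the three kinds of kin
# that the database links to an RP.
GROUPS = {
    "Father": "parents",
    "Mother": "parents",
    "Husband": "spouses",
    "Wife": "spouses",
    "Child": "children",
}


def family_of(rp_id, rows):
    """Collect the parents, spouses and children linked to one RP."""
    family = {"parents": [], "spouses": [], "children": []}
    for id_1, id_2, relation in rows:
        if id_2 == rp_id and relation in GROUPS:
            family[GROUPS[relation]].append(id_1)
    return family


family = family_of(1, links)
print(family)
```

Siblings of an RP do not appear in such a reconstitution, because only the RP's own parents, spouse(s) and children are linked.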
The CONTEXT_CONTEXT table embeds the villages, hamlets and towns in the municipalities they are part of. The assignment of these localities to their corresponding municipality is also based on the Dutch Toponyms Spatio-Temporal 1812-2012 database. Similar to the INDIV_INDIV table, this table links localities to their municipality by linking two identifiers (Id_C_1 and Id_C_2) and defining the relationship (field Relation) between the context layers. The METADATA table corresponds to the fourth version of the IDS (4.01), with the addition of six new occupation Types ("Occupation1"-"Occupation6") which are employed by the GIHCD. When time stamps for all available occupations are traced, these additional Types will be replaced by time-stamped occupations.
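How the key structure of CONTEXT and CONTEXT_CONTEXT can be used in a database management system is sketched below with Python's built-in sqlite3 module (the GIHCD itself is stored in Microsoft Access). The table and identifier field names follow the text; the Type names "Name" and "Level", the Relation label "Part of", the identifier values and the rows themselves are illustrative assumptions and are not drawn from the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CONTEXT (Id_C INTEGER, Type TEXT, Value TEXT);
    CREATE TABLE CONTEXT_CONTEXT (Id_C_1 INTEGER, Id_C_2 INTEGER, Relation TEXT);
    """
)

# Illustrative rows only: one municipality (Id_C 1) and two localities.
conn.executemany(
    "INSERT INTO CONTEXT VALUES (?, ?, ?)",
    [
        (1, "Name", "Hoogkerk"),
        (1, "Level", "municipality"),
        (2, "Name", "Hoogkerk"),
        (2, "Level", "village"),
        (3, "Name", "Leegkerk"),
        (3, "Level", "hamlet"),
    ],
)
# Each row links a locality (Id_C_1) to its municipality (Id_C_2).
conn.executemany(
    "INSERT INTO CONTEXT_CONTEXT VALUES (?, ?, ?)",
    [(2, 1, "Part of"), (3, 1, "Part of")],
)

QUERY = """
    SELECT loc.Value, mun.Value
    FROM CONTEXT_CONTEXT AS cc
    JOIN CONTEXT AS loc ON loc.Id_C = cc.Id_C_1 AND loc.Type = 'Name'
    JOIN CONTEXT AS mun ON mun.Id_C = cc.Id_C_2 AND mun.Type = 'Name'
    ORDER BY loc.Value
"""
localities = conn.execute(QUERY).fetchall()
print(localities)  # [('Hoogkerk', 'Hoogkerk'), ('Leegkerk', 'Hoogkerk')]
```

The same join could in principle be extended with the Value_Id_C field of the INDIVIDUAL table to place a person's events in a locality and its municipality.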
Finally, time stamps follow the latest IDS guidelines. Date_Type indicates whether the date was recorded at the event itself, like a birth date on a birth certificate, whether it was reported at a later date, or whether it was declared at a point at which an attribute is valid or assigned. Estimation indicates whether a date is exact, a middling year between two possible dates, a year, or a period with a Start_Year and an End_Year.
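As a rough illustration of how such time stamps might be handled in analysis, the Python sketch below models a time stamp with the fields Date_Type, Estimation, Start_Year and End_Year mentioned above. The category labels ("Event", "Declared", "Exact", "Period") and the single field year are assumptions for this example; the labels actually used should be taken from the IDS documentation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TimeStamp:
    date_type: str                    # assumed labels, e.g. "Event", "Reported", "Declared"
    estimation: str                   # assumed labels, e.g. "Exact", "Middling", "Year", "Period"
    year: Optional[int] = None        # hypothetical field for single-year dates
    start_year: Optional[int] = None  # Start_Year, used for periods
    end_year: Optional[int] = None    # End_Year, used for periods


def year_bounds(ts: TimeStamp) -> Tuple[int, int]:
    """Return the first and last year a time stamp can refer to."""
    if ts.estimation == "Period":
        if ts.start_year is None or ts.end_year is None:
            raise ValueError("a period needs Start_Year and End_Year")
        return (ts.start_year, ts.end_year)
    if ts.year is None:
        raise ValueError("a single-year time stamp needs a year")
    return (ts.year, ts.year)


birth = TimeStamp(date_type="Event", estimation="Exact", year=1830)
residence = TimeStamp(
    date_type="Declared", estimation="Period", start_year=1850, end_year=1862
)
print(year_bounds(birth), year_bounds(residence))
```

Treating every time stamp as an interval of years makes exact dates and periods comparable, at the cost of ignoring the day and month information of exact dates.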

STRENGTHS AND WEAKNESSES OF THE GIHCD
At present, no death date or emigration from the Netherlands has been found for only 39 of the 3,240 rural cohort members of 1830, 1850 and 1870. This means that the fate of slightly more than 1% of them is still unknown (see tables 3, 4 and the Appendix). The most recent tracings of these RPs suggest that a large part of them probably left the country or died at sea. This makes the GIHCD of excellent quality, at least when it comes to the coverage of the life courses of the persons involved. The extensive research process in the last decades has shown that in particular those RPs with a diverging life course were difficult to follow, for instance those migrating over larger distances and/or remaining unmarried, those born outside marriage and those changing names.
Weak points of the GIHCD are: 1. Part of the information is still not digitalised and is available on paper only; 2. Due to the changes in the collecting method, some information has not been systematically recorded, for instance the changes in the precise household structure of the cohort members; 3. Although a lot of new information has been collected on the urban cohorts (4 * 240 = 960 RPs) and the 1811 rural cohort (1,080 RPs), this has not yet been systematically included in the digital database; 4. The limited embedding in an organisational structure over the last 25 years makes the database the sole responsibility of one person.

OUTPUT
As previously stated, the impact of the GIHCD has been limited, mainly because it has only been used by economic and social historians of the University of Groningen. Consequently, the studies that have been conducted so far refer to debates in historical demography and economic and social history. We will briefly address them in this section.
The first publication that made use of GIHCD data was the volume on the village of Hoogkerk between 1770 and 1914 (Kooij, 1993). Hoogkerk was selected for a pilot study in which various relevant questions and methods could be tested, especially because of its spectacular transformation from an agrarian village to an industrialised area around 1900. The book ambitiously aimed at integrating the various domains that explained social developments in the village through a lens that combined economic, political, social, cultural, religious and demographic elements. With the help of the GIHCD, the demographic and socioeconomic developments could be disentangled to some extent. Although the RP sample for the four cohorts from Hoogkerk was small (n = 480), it was used by Marcel Clement (1993) to analyse demographic developments in Hoogkerk on a micro level. He studied (child) mortality, nuptiality, migration and social mobility. Due to high levels of child mortality and migration, only a small number of the RPs yielded substantial data for Hoogkerk. But the study showed how mobile 19th-century Dutch people were, as almost every RP left Hoogkerk at least once.
Within the field of historical demography, the interest in individual agency grew in the last decades of the 20th century: how and why did families and individuals make choices, or were they forced to do so, in order to improve their socioeconomic position (Engelen, Knotter, Kok, & Paping, 2004)? This turn towards agency stimulated the analysis of short- and long-term decisions of households, families and persons, taking into account the wider historical context. However, quantitative data that only present static information on people's location or occupation, and do not show how these changed over time, are not well suited to assessing the motives behind the choices. As a possible solution, Paping (1999, pp. 18-19) used the GIHCD to explain how and why different socioeconomic classes made different short- and long-term choices regarding their employment, the employment of their children and their migration patterns. The study showed that married couples often migrated within the first years after marriage. Afterwards their inclination to migrate diminished rapidly, suggesting a rise in local social embeddedness over time. Furthermore, the information on moments of leaving home and on juvenile occupations in the GIHCD showed that relatively many lower-income (unskilled) families opted for short-term strategies, like sending children away as live-in servants, which proved not very beneficial in the long run (Paping, 2004a, p. 188). For lower-income families, the employment of children as servants from the age of about 14 was one of the few options available to avoid the costs of having largely unemployed older children at home. Higher-income groups, such as large farmers and the self-employed in industry and services, could afford to keep their children at home. For every social group, those children remaining at home proved to acquire on average a much better position in the long run than those becoming servants.
That unskilled labourers responded to real wage increases at the end of the 19th century by increasingly keeping their children at home suggests that they indeed saw this choice as a positive long-term strategy (see also Paping, 2017). However, changing views on education in the late 19th century might also have played a role (Kooij, 2004, p. 196).
In another publication, Paping (2004b) used part of the cohort database to examine the strategies families employed with regard to the labour of their family members. With the addition of financial microdata from the bookkeeping of various Nieuw Scheemda farmers, Paping focused on the group of unskilled farm labourers in Groningen. The financial situation of unskilled worker families proved to be very volatile in the second half of the 19th century. With the exception of the male household heads, the family members (both wives and children) depended heavily on casual or seasonal labour. Adolescent sons and daughters usually left the household rather soon to become live-in servants. Information on the dates of marriage and the birth of the first child in the database showed that forced marriages due to pregnancies were common among unskilled worker couples. Such a situation restricted the opportunity for unskilled labourer families to make long-term decisions, as they would be forced to make short-term choices in response to an unplanned pregnancy.
About a decade after the pilot study on Hoogkerk, Kooij (2004) repeated some of the principles of the Integral History Project in a theoretical essay. One of these principles, the necessity to integrate the various domains that influenced people's lives, had not changed. Kooij (2004, p. 194) distinguished the economic, demographic, social, cultural, political and religious domains as central in understanding the pressures on individual and household decisions.
To analyse the relationship between two of these domains, the cultural/religious domain and the socioeconomic domain, Collenteur and Paping (2004) used rural cohort data from the GIHCD in a comparative contribution and tried to disentangle cultural and economic effects on people's decision to marry. They compared differences in marriage patterns between three Russian regions and two Dutch regions. The Russian regions were Tambov, the farthest to the east (located about 410 kilometers southeast of Moscow), Olonets, the farthest to the west (located about 180 kilometers northeast of St. Petersburg), and Yaroslavl (located about 250 kilometers northeast of Moscow). The Dutch regions were Groningen and North-Brabant. The data showed that, on average, rural Russians married at an earlier age compared to people from the Groningen urban and rural regions, but that differences between the Russian regions were huge. Furthermore, men and women in Groningen showed large variations in marriage age: it was not uncommon to marry around the age of 20, but neither was it rare for couples to wait until their 30s. Collenteur and Paping (2004) concluded that marriage patterns in Groningen were likely driven largely by socioeconomic factors, and that RPs had a lot of agency with respect to their marriage age, at least in the absence of unplanned pregnancies. This was not at all the case in the Tambov region, the farthest east of the Russian regions studied. In Tambov, young adults usually married before their 20s with very small variation in ages at marriage, indicating an important role for cultural factors like traditions and norms in marriage decisions, leaving very limited room for individual agency.
Although the differences between the Russian regions studied in this research proved to be large, the data suggest that the further east someone went, the lower the average age at marriage and the smaller the variation in age at marriage would be. In this way, Collenteur and Paping's research provides some support for Hajnal's (1983) hypothesis on marriage patterns and the so-called Hajnal line.
Kooij (2011) used the 1830 and 1870 cohorts of Beerta and Winschoten to analyse similarities and differences between a more urban-oriented region (Winschoten) and a completely rural region (Beerta) in the province of Groningen. He briefly compared various social groups on the basis of their life chances and mobility, and did not find major differences in life expectancy and geographic mobility between urban and rural environments. However, he concluded that for the 1870 cohort long-distance migration became more common, both within the Netherlands and outside of it. Furthermore, he showed that the agrarian depression of the late 19th century gave rise to chain migration towards the USA.
Finally, Paping and Pawlowski (2018) compared the intergenerational occupational social mobility of rural-to-urban migrants with that of other groups, such as rural stayers, in the Groningen province. Their study used two databases. First, they employed the huge AlleGroningers database with summaries of all 234,000 marriages concluded in the province of Groningen between 1811 and 1934, out of which those of persons born in the Groningen clay area (see figure 1) were selected (N = 121,000). Second, the much smaller GIHCD was used, which allowed the relation between social mobility and rural-urban migration to be studied in much more depth, taking into account migration to more distant Dutch cities as well as, for instance, migration taking place after marriage.
Rural males moving to an urban environment showed both much more upward and much more downward social mobility than rural stayers or rural-to-rural migrants, who showed high social immobility. Furthermore, the findings reveal that urban pull factors, especially the better employment opportunities in the cities, were primarily responsible for rural-to-urban migration. This view contradicts a common explanation that rural-urban migration was mainly stimulated by bad rural circumstances forcing the rural poor to move. Just the opposite took place: especially those descending from the somewhat higher social classes were much more inclined to move to the cities, as they had more useful skills for an urban environment than unskilled labourers. Although they also experienced relatively higher chances of downward mobility, migrants moving to cities showed, on average, even more upward social mobility than rural stayers in almost every social group. The GIHCD also made it possible to look at short stayers. According to some research (see Puschmann, 2015), positive results for rural-to-urban migration are biased upward because they do not account for migrants who return after a short stay in the city. In this view, a return to the rural environment likely represented a failed move to the city. Paping and Pawlowski (2018), however, show that even a short stay in an urban environment led on average to lasting positive effects on the social position for most socioeconomic groups.
The success of the so-called cohort analysis in the Integral History Project was restricted by the combination of ambitions that were too high, some naïve decisions and the lack of structural funding, in addition to the consequences of the enormous and rapid technical changes taking place since the 1980s. Until 1995, the quality of the database was too low, and as a consequence the research output was not very impressive. Later on, the quality improved and several meaningful scientific publications were based on it. However, the use of the database remained difficult due to the struggle to regularly update it technically, and to incorporate new digitally available sources, without proper funding.
Despite all these drawbacks, in one respect the quality of the GIHCD is extremely high. In contrast with other databases, fewer than 2% of the persons involved could not be traced during their whole life course, making it very representative, as it also includes the most extreme and rare cases that are often missing in other databases. This is a major reason to try to make the database more widely available in the future, both in an IDS structure and as an EXCEL spreadsheet.
The relatively small size remains problematic, especially in this time of digitalisation. It seems rather odd to study only a sample of the population when, at least for part of the less complex questions, one can just as well use the enormous databases now available in the Netherlands. After all, as the quality of software-based record linkages is still improving, the future might not be for samples such as the GIHCD, but for enormous databases connecting events of all persons in a geographical region, at least for the 19th and 20th century (see for northwest Groningen: Paping & Schansker, 2013), and maybe even for an earlier period. However, as long as not all the relevant sources have been digitalised in a way that makes linking possible, and as long as digitalised linkage procedures are still not perfect, there is a need for high-quality specific databases like the Groningen Integral History Cohort Database.