Reconstructing a Longitudinal Dataset for Tasmania

e-ISSN: 2352-6343 DOI article: https://doi.org/10.51964/hlcs10912 © 2021, Cowley, Frost, Inwood, Kippen, Maxwell-Stewart, Schwarz, Shepherd, Tuffin, Williams, Wilson, Wilson This open-access work is licensed under a Creative Commons Attribution 4.0 International License, which permits use, reproduction & distribution in any medium for non-commercial purposes, provided the original author(s) and source are given credit. See http://creativecommons.org/licenses/.

The Tasmanian Historical Dataset is a longitudinal data resource spanning the 19th and early 20th centuries. It includes information on all recorded Tasmanian births and marriages in the period 1803-1899 and all deaths to 1928. It also contains data describing many other life course events including records of arrival and departure, court appearances, military enlistment, property valuations for taxation purposes, details of bank accounts, census and muster returns, street directories and hospital and pauper admissions. As well as individual level data, the collection contains tabulated data from census returns and statistical reports, and digital images of many of the original records from which transcripts have been taken.
The dataset has resulted from collective research endeavours dating back to the 1990s and has been put together by researchers working at multiple institutions. Within Australia these include the Universities of Tasmania, Melbourne, Flinders, Monash, South Australia, Newcastle, New England and Griffith. In addition, researchers at Guelph (Canada) and Liverpool and Oxford (UK) have supplied both data and expertise. The research collection has also benefitted from productive partnerships, especially with the Female Convict Research Centre, the Tasmanian State Library and Archive, the Port Arthur Historic Site Management Authority and other collecting institutions. Data is continually added as a result of a volunteer transcription program organised through DIGIVOL, the crowd sourcing platform of the Atlas of Living Australia.
The Tasmanian Historical Dataset differs from other initiatives in a number of important respects. While historical life course and intergenerational record sets are now relatively common, it is unusual for these to contain data harvested from multiple sources (in this case drawn from more than 60 different archival series). The cosmopolitan nature of the collection reflects the diverse interests of the team of researchers who assembled it. They primarily work in the fields of economic and social history and historical archaeology, criminology and demography. The diversity of the collection presents some distinct advantages. The ability to interrogate multiple total count datasets in parallel is particularly powerful, providing opportunities to understand record keeping processes and other selection bias issues in greater detail than may be possible with more restrictive data collections.
Other differences reflect the peculiar nature of Tasmania's colonial past. A notable feature, for example, is the inclusion of life course data for 64,819 male and 13,673 female transported convicts. This includes a detailed record of all punishments inflicted on these individuals down to each day spent in a dark cell and each stroke of the lash. The public nature of the data is also unusual. As the dataset has been assembled through collaboration with local and family history researchers as well as archives and heritage sites, it has long had a life outside the walls of academic institutions. It feeds information, for example, into the Tasmanian Archive search portal as well as contributing to a number of educational and site interpretation tools.
This article starts with a brief description of Tasmania and its history. We then provide a more detailed description of the source materials, our approaches to record linkage and the structure of the datasets and the variables they contain. We conclude with some observations on the problematics of digital record reconstruction using criminal and colonial archives, as well as briefly outlining some of the public aspects of the data and its potential research uses.
Tasmania, which used to be known as Van Diemen's Land, is the smallest state in Australia. According to the 2016 Australian census the island had a total of just 509,965 inhabitants, two-fifths of whom were resident in the capital, Hobart. Despite its small size, Tasmania has a number of characteristics that make it of particular interest to researchers who wish to explore the extent to which the life experiences of individuals impact upon the health and socio-economic status of their descendants. Tasmania's relatively small population and defined geographical boundaries provide distinct record linkage opportunities. It is also a place with a long history of organised record collection.
The island was first occupied around 42,000 years BP. It was subsequently cut off from the Australian mainland by rising sea levels following the end of the last glacial maximum around 12,000 years ago. The next contact between Tasmanian Aboriginal peoples and the outside world occurred in 1642 when the Dutch East India Company commander, Abel Tasman, briefly landed on the east coast, naming the island Van Diemen's Land (it was renamed Tasmania in 1856). The first European settlement dates from the 1790s when sealing crews working out of Sydney occupied some of the offshore islands. Official settlement followed in 1803 when the British government sent landing parties to the island in order to secure the main anchorages in the north and south.
For the first fifty years following its colonisation Van Diemen's Land served as one of the principal penal colonies of the British Empire. During that time, it received at least 73,500 convicts, about 45% of all of those despatched to the Australian colonies. European unfree labour catalysed the process of colonisation. In effect the British used the labour of thieves to steal the island, an act of dispossession that culminated in frontier conflict and the enforced removal of the surviving Tasmanian Aboriginal population to offshore mission stations. As a result, the population of the island rapidly declined following first settlement (see Figure 1).
The convicts sent to Van Diemen's Land in the years 1803-1853 were predominantly tried in British and Irish courts, although small numbers arrived who were convicted in other British colonies including Mauritius, India, the Cape, New Zealand and the West Indies. They constituted the single most important source of colonial labour for the first five decades of the colony's existence. Rather than being kept locked behind forbidding institutional walls, the majority of serving prisoners were loaned or hired out to private sector masters. While some masters were former convicts, or the colonially born descendants of transported prisoners, many arrived free. A relatively small number of free migrants received a disproportionate share of grants of land as well as access to cheap convict labour. As the costs of maintaining a prisoner amounted to 59% of a free wage, this settler elite benefitted substantially from the offshoring of Britain's criminal justice system (Panza & Williamson, 2019).
The partnership with the private sector ensured that convict Van Diemen's Land shared much in common with colonial plantation economies. While private sector masters could not punish their unfree servants, they could bring a charge against them in a magistrate's court. These institutions were empowered to sentence a convict to undergo further punishment in a house of correction, a road or chain gang or a penal station. This ensured that two unfree labour systems, one run by the private sector and the other by the state, operated in parallel.
From the 1830s a series of colonial initiatives attempted to attract alternative sources of free labour through assisted migration programs. These became particularly common after the cessation of convict transportation to the colony in 1853. These initiatives were particularly aimed at recruiting female migrants. The settler colonial population had a marked male skew as well as an age structure significantly different from that characteristic of Old-World populations (see Figures 1 and 2). The colony also experienced a marked temporary decrease in population (especially amongst males) as a result of the discovery of gold in the neighbouring colony of Victoria in 1851.
Perhaps because such a large proportion of 19th-century European migrants to Tasmania arrived as convicts, the colony developed a number of record-keeping systems which were unusual in both coverage and the detailed nature of their content. Although plans to introduce an annual census did not eventuate, 29 censuses were conducted in the period 1837-2016, an average of one every six years. While it has long been the practice in Australia to destroy individual returns after the publication of each census report, the one exception is colonial Tasmania. Complete or partial returns are available for the six censuses conducted between 1837 and 1857. In addition, digitised tabulated data exists for all other censuses. These data were supplemented in the years before Tasmania joined the Australian Commonwealth in 1901 by annual statistical reports forwarded to London as part of the trans-imperial Blue Book system of colonial reporting.
Tasmania is also blessed with the second longest run of birth, death and marriage certificates in the Anglophone world: civil registration was introduced in 1838 (one year after England and Wales). These sources are available in digitised form for 1838-1899 and from 1970 to the present. Plans are currently in train to digitise the remaining hard copy death certificates from 1900-1969. Numerous digitised parish records also exist. These are especially important for the years prior to the introduction of civil registration, but can also be useful after 1838. Some provide additional information, such as the ship of arrival in the colony. Other series can be used to check the extent of under-registration by particular religious denominations. Catholics, for example, did not always comply with civil registration in the belief that registration with the Church of Rome was sufficient.

Figure 2
The men and women transported as convicts to Australia are of interest to historians because of the detailed way in which they were described. This process started with arrest and prosecution in Britain and Ireland and other trans-imperial courts. Links between records held in the Tasmanian Historical Dataset and British criminal justice system records are contained within the Digital Panopticon website (https://www.digitalpanopticon.org). This is especially the case for those tried in London's Old Bailey.
After conviction, convicts were held in British and Irish institutions before they were embarked on transport vessels bound for Australia. Archival series have been transcribed for several of these holding institutions including Grangegorman female penitentiary, Dublin, 1840-1852; Millbank convict prison, London, 1837-1846; Pentonville penitentiary, London, 1842-1847; and all British hulks (prison ships used to accommodate prisoners sentenced to transportation), 1837-1845. Many entries in these series are for individuals who were sentenced to transportation, but instead served out their sentences in Britain or Ireland or were pardoned. Others are for convicts who were sent to penal colonies other than Van Diemen's Land. These data are useful in that they can be used to explore the manner in which convicts were selected into various transportation streams. They also enable a detailed comparison of institutional death rates. Finally, many of the inmates in penitentiaries and hulks were interviewed and described in ways that were similar to the processes that accompanied disembarkation in the Australian colonies.
The ability to gather multiple observations for the same individual is useful as it enables an analysis of different data collection processes, as well as differences in response to similar questions over time. An example of this is provided in Figure 3 which explores differences in answer to questions about literacy levels provided by 8,227 male convicts on admission to the hulks in England and on arrival in Van Diemen's Land approximately eight months later. A notable feature of this Sankey chart is that more convicts claimed improvements in literacy levels than those reporting a decline.
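The comparison of two literacy observations for the same person can be sketched as a simple transition count, which is the kind of table a Sankey chart is drawn from. This is an illustrative sketch only: the three literacy labels, their ordering and the sample observations are assumptions made for the example and are not taken from the dataset.

```python
from collections import Counter

# Ordered literacy levels; these labels are illustrative assumptions.
LEVELS = ["neither", "read only", "read and write"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def literacy_flows(pairs):
    """Count transitions between two literacy observations per person.

    `pairs` is an iterable of (level_at_hulk, level_on_arrival) tuples.
    Returns the per-flow counts and totals for improved/unchanged/declined.
    """
    flows = Counter(pairs)
    summary = Counter()
    for (before, after), n in flows.items():
        if RANK[after] > RANK[before]:
            summary["improved"] += n
        elif RANK[after] < RANK[before]:
            summary["declined"] += n
        else:
            summary["unchanged"] += n
    return flows, summary


# Invented observations, for illustration only.
observations = [
    ("neither", "read only"),
    ("read only", "read and write"),
    ("read only", "read only"),
    ("read and write", "read only"),
    ("neither", "neither"),
]
flows, summary = literacy_flows(observations)
```

Each key in `flows` corresponds to one band in a Sankey chart, while `summary` gives the totals needed to compare claimed improvements with reported declines.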
The surgeon superintendent appointed to maintain health and discipline on each convict vessel was instructed to keep a record of all treatments administered during the four-month voyage to Australia. The journals for 289 voyages to Van Diemen's Land sailing in the period 1817-1853 were imaged in the National Archives, Kew, London, and the treatment lists contained in each were transcribed.
On landing each convict was provided with a police number, an early example of the use of identifiers to aid the tracking of individuals. This number, together with the details of the ship of arrival, the length of their sentence and place and date of trial, was used to index all records subsequently generated in the colony. Every convict was also interviewed prior to disembarkation. Together with a record of their past interactions with the court system forwarded on the same vessel that carried them into exile, these testimonies provide details of place of birth, age, next of kin, religion, occupation, level of literacy and a statement of previous life circumstances including the number of times each man and woman had been convicted. Each new initiate into the penal colony was also measured to the nearest quarter inch and described. This process included the documentation of scars, injuries and other deformities, as well as eye and hair colour and any tattoos.
Many additional archival series were generated in order to assist with the operation of this complex unfree labour system. The most important of these were the conduct records (also known as the black books). This elaborate series of registers contains summaries of every colonial court encounter by a convict still under sentence, as well as many charges brought against former convicts long after they had become free. These include a detailed enumeration of every punishment down to each stroke of the lash applied to a convict's back and each day spent in solitary confinement or at hard labour.

Figure 3
Differences in response to questions about literacy provided by male convicts on admission to British hulks and subsequently on arrival in Van Diemen's Land (n=8,227)

Other digitised records include notices of appointments to the colonial police (a force largely staffed by serving convicts), details of prisoners transferred to different private sector employers, descriptions of prisoners who had absconded, applications for permission to marry and information about the receipt of tickets of leave (an early form of probation) and the issue of pardons and certificates of freedom. Before 1840 convicts in private sector employment were not supposed to be paid a wage. After that date a system of payments was introduced in order to distance penal transportation from any association with slavery. Such payments were tracked through a series of registers that specified the duration of each passholder contract and the amount the convict would be paid. Finally, the deaths of convicts who were still under sentence were not entered into the civil registration system. Instead, these were recorded in a separate series of convict death registers.

COLONIAL COURTS AND CRIMINAL JUSTICE RECORDS
Three levels of courts operated in Tasmania: magistrates' benches and police courts (also known as lower courts or Petty Sessions); quarter sessions; and the supreme court. Records for all defendants, charges and verdicts in Tasmanian
After 1865 details of both convicted and discharged prisoners were routinely published in the Tasmanian Police Gazette. Data has been collected from 50,387 of these notices covering both male and female prisoners in the period to 1924. Available information includes date and place of conviction, charge and sentence, place of birth, ship of arrival to the colony in the case of those not born in Tasmania, age, occupation and physical description including height to the nearest quarter inch.
Although the convict population was mustered annually, returns survive only for the years 1822, 1823, 1825, 1830, 1832, 1833, 1835, 1841, 1846 and 1849. The four musters conducted between 1830 and 1835 have been transcribed. Data includes information on the current place of employment of each convict as well as their police number and ship of arrival to the colony.
Returns for 14,870 households censused between 1837 and 1857 survive in manuscript form (see Table 1). Although no return for an individual year is complete, the surviving returns are organised by parish. Even in years where only a small fraction of the original returns survive, these represent complete parish returns. These records have been digitised, providing information on the name of the head of household, the age, sex, religion and civil status (convict or free) of all occupants, and the address, size and nature of the dwelling. All tabulated census returns contained in the original 29 census reports published between 1837 and 2016 are also included in the Tasmanian Historical Dataset alongside a series of shape files which map changes in census collection districts over time.
Information about free arrivals was sourced from the Tasmanian Archive index to free arrivals to Van Diemen's Land in the years to 1856. The file contains details of 24,232 arriving passengers. A companion file contains information on 114,452 departures in the period 1817-1858. Usefully, this provides information on the ship of arrival in the colony (a great help in identifying former convicts). In addition, more detailed records are available for 10,631 assisted migrants arriving in the period 1852-1858 (Tasmanian Archive, CB7-1-13-20). These records contain information about age, sex, marital status, religion, native place, literacy, occupation, employer's name and agreed wage rate.

BIRTHS, DEATHS AND MARRIAGES
A total of 195,000 births and 51,000 marriages were registered in Tasmania between 1838 and 1899, all of which have been digitally transcribed. A longer run of 155,000 digitised civil registered deaths for the years 1838 to 1928 is also available. Information includes age at death and cause of death. There are slight variations in the information included in birth, death and marriage certificates over time.
Usefully, details of place of birth were included on Hobart death certificates from 1857, Launceston from 1886 and all death certificates from 1895. Information relating to surviving children is also included on 20th-century certificates as well as details of marriages and spouses. The transcriptions for all three series have been linked to digital images of the original records.
In addition to civil registered births, deaths and marriages, the collection also contains transcripts taken from ecclesiastical registers. These contain information on 19,723 baptisms and 8,828 burials, many of which predate the introduction of civil registration in 1838.
Many convicts arrived in Australia with goods and cash. These were held in trust by the colonial state while the convicts served their sentence. From 1829 on, cash sums were entered into a Convict Savings Bank managed by the directors of the Derwent Bank. Between 1845 and 1863 the Hobart Savings Bank kept a record of all customers who had opened bank accounts, both free and unfree. This dataset consists of 12,240 records and includes information on age, sex, occupation, place of residence and civil status (convict or free) alongside a physical description, including height. This information was committed to file in an attempt to stop fraudulent access to accounts (Tasmanian Archive, TAHO NS1167).
Several other record series in the dataset include information about place of residence. Thirty-five trade and street directories for Hobart and Launceston were published between 1825 and 1854. These contain details of 26,000 addresses, many relating to shops and businesses.
Data for 15,234 Tasmanian-born soldiers and nurses who enlisted in WWI has been harvested from attestation papers held in the National Archives of Australia. This information includes date of birth, name and address of next of kin, height to the nearest quarter inch, weight in stone and pounds and both expanded and unexpanded chest measurement.
Tasmania has a rich resource of cartographic and planimetric material. The collections are predominantly held in two main institutions: the Tasmanian Archives and Land Tasmania. The former holds over 180,000 records, of which 26,000 are digitised. Its collection encompasses exploration charts, road and town charts, architectural plans and elevations, drawn from a range of government, institutional and private sources. Land Tasmania retains title and deed plans relating to the administration and sale of land back to the early 19th century. Links to both collections are contained within many digitised records held within the Tasmanian Historical Dataset.
Commencing in 1825, daily weather measurements for Hobart were routinely published in the press. Information varies in detail from year to year but always includes minimum and maximum daily temperatures and barometer readings as well as wind direction. There are some gaps in the series, notably between February 1827 and April 1838. Weekly averages are available for some years where daily data is missing.

DATA CODING
Our approach to data coding has been guided by the Intermediate Data System (Alter & Mandemakers, 2014). All data has been separated into two types of entity: persons and contexts. Each of these may be assigned attributes such as a person's sex, age, occupation and civil status at any given point in time, or specific events such as court appearances, marriages, admissions and arrivals, departures and discharges. Contexts are usually locations and can be nested one within the other. Thus, Spring Grove is a property in the district of Patterson's Plains that lies within the parish of Selby which is itself contained within the Tasmanian county of Dorset. Individuals may be linked together by familial or social relations. They might also be linked to particular contexts at any point in time. Thus, several individuals might reside at the same property or be barracked in the same building. Individuals can also share the same place of employment. All individuals and contexts are assigned unique identifiers. All attributes and relationships are assigned codes managed through coding dictionaries. These list each occurrence of every variable and the codes that have been assigned to it. Wherever possible international coding systems have been used to populate these dictionaries. Inevitably, however, the processes of analysis have involved some adaptation or the creation of new codes. These variations have been documented within each dictionary.
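The nesting of contexts described above can be sketched as a set of records that each point to a parent container. This is a minimal illustration, not the dataset's actual schema: the identifier format, the field names and the person identifiers are all hypothetical, and only the Spring Grove example is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    context_id: str                  # hypothetical identifier format
    name: str
    level: str                       # e.g. "property", "district", "parish"
    parent_id: Optional[str] = None  # None for the outermost container


def ancestry(index, context_id):
    """Return context names from the given context up to the outermost one."""
    chain, seen = [], set()
    current = index.get(context_id)
    while current is not None and current.context_id not in seen:
        seen.add(current.context_id)   # guard against accidental cycles
        chain.append(current.name)
        current = index.get(current.parent_id) if current.parent_id else None
    return chain


def co_residents(residences, context_id):
    """List persons linked to the same context (e.g. the same property)."""
    return sorted(p for p, c in residences.items() if c == context_id)


contexts = [
    Context("C1", "Dorset", "county"),
    Context("C2", "Selby", "parish", "C1"),
    Context("C3", "Patterson's Plains", "district", "C2"),
    Context("C4", "Spring Grove", "property", "C3"),
]
index = {c.context_id: c for c in contexts}

# Hypothetical person-to-context links.
residences = {"P1": "C4", "P2": "C4", "P3": "C2"}
```

Because each context stores only its immediate parent, a person linked to Spring Grove is implicitly located in the district, parish and county that contain it.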
All occupational information has been coded according to HISCO, the Historical International Standard Classification of Occupations (van Leeuwen, Maas, & Miles, 2002). This fine-grained system of organising occupational descriptions according to the nature of each task has been mapped onto HISCLASS, a related coding that uses information about occupation to proxy social stratification (van Leeuwen & Maas, 2011). The HISCLASS handling of some occupational groups sits oddly with what we know about New-World social structures. This is particularly the case with agricultural and pastoral landholders. In order to take account of these differences, as well as to compare results with previous studies, other industrial and social stratification codes have been added. These include an industrial classification based on the British 19th-century census first applied to convict data by Lloyd Robson (1965), Armstrong's social classification system and the Nicholas and Shergold variant of this (Nicholas, 1988). As new occupational data is collected, the dictionary is used to automatically code all descriptions of job titles and work processes previously encountered, ensuring consistency of classification across different datasets.
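A coding dictionary of this kind can be sketched as a lookup keyed on a normalised job title, so that descriptions already encountered are coded automatically and new ones are set aside for manual coding. The class name, the normalisation rules and the codes below are hypothetical placeholders for illustration; they are not real HISCO codes.

```python
import re


def normalise(title):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", " ", title.lower())
    return " ".join(cleaned.split())


class CodingDictionary:
    """Maps previously encountered descriptions to their assigned codes."""

    def __init__(self):
        self._codes = {}

    def add(self, title, code):
        self._codes[normalise(title)] = code

    def code_all(self, titles):
        """Split titles into those coded automatically and those still uncoded."""
        coded, uncoded = {}, []
        for title in titles:
            code = self._codes.get(normalise(title))
            if code is None:
                uncoded.append(title)
            else:
                coded[title] = code
        return coded, uncoded


occupations = CodingDictionary()
occupations.add("Ploughman", "OCC-0001")       # placeholder code
occupations.add("Farm labourer", "OCC-0002")   # placeholder code

coded, uncoded = occupations.code_all(["ploughman.", "FARM  LABOURER", "Nailer"])
```

Titles returned in `uncoded` would be reviewed by a researcher and then added to the dictionary, so each new description only needs to be classified once.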
Identifying the means by which individuals arrived in the colony can be important for record linkage. It was also an important determinant of social status. Identifying a particular voyage of arrival is often complicated by vessel naming practices. Multiple ships named the Asia sailed to Tasmania, for example. In addition, many vessels made more than one trip to the colony. The assigning of unique voyage codes for convict vessels has proved particularly important in record linkage. It is also important in that the same coded attribute can be used to identify individuals who share a relationship as 'shipmates'. At the time of writing, work is progressing on a related project to identify free migrant voyages.
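One way to picture voyage coding is to number each distinct arrival of a ship name in date order and then group people by the resulting code. The sketch below assumes that a voyage is identified by ship name plus arrival date; the code format, person identifiers and dates are invented for illustration and do not describe actual sailings or the project's own coding scheme.

```python
from collections import defaultdict


def assign_voyage_codes(voyages):
    """Give each (ship, arrival_date) pair a unique code.

    `voyages` is an iterable of (ship_name, arrival_date) tuples, with dates
    as ISO strings (YYYY-MM-DD) so that they sort chronologically.
    """
    by_ship = defaultdict(set)
    for ship, arrival in voyages:
        by_ship[ship.strip().title()].add(arrival)
    codes = {}
    for ship, arrivals in by_ship.items():
        for n, arrival in enumerate(sorted(arrivals), start=1):
            codes[(ship, arrival)] = f"{ship.upper().replace(' ', '_')}-{n:02d}"
    return codes


def shipmates(passengers, codes):
    """Group person identifiers by the voyage code they share."""
    groups = defaultdict(list)
    for person_id, ship, arrival in passengers:
        groups[codes[(ship.strip().title(), arrival)]].append(person_id)
    return dict(groups)


# Invented records: two voyages by a ship of the same name.
passengers = [
    ("P1", "Asia", "1820-01-01"),
    ("P2", "asia", "1820-01-01"),
    ("P3", "Asia", "1830-01-01"),
]
codes = assign_voyage_codes((ship, arrival) for _, ship, arrival in passengers)
groups = shipmates(passengers, codes)
```

Distinguishing different vessels that carried the same name would need an extra attribute (such as a vessel identifier), which this sketch deliberately leaves out.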

CRIMINAL JUSTICE DATA
There is no agreed international coding system for classifying historical information about crime, although several contemporary classification systems exist which can be mapped onto one another. These include the International Classification of Crime for Statistical Purposes (United Nations Office on Drugs and Crime, 2015). A particular problem with matching 19th-century Tasmanian criminal justice data to these schemas is that many of the charges brought against serving convicts are not commonly encountered in contemporary judicial systems. This reflects the manner in which the labour of prisoners was outsourced to the private sector. As a result, magistrates' courts often heard offences that revolved around the non-performance of work or the perceived degree of effort or diligence that convict workers were said to have displayed by masters or others charged with supervising them. Many were prosecuted for 'refusing to work' and malingering. 'Concealing a pregnancy' was even regarded as an offence: convict women who fell pregnant were routinely sent to the house of correction for punishment. Others were charged with 'insolence' or infractions of the rules governing the management of different institutions. To provide an illustration of the extent of the problem, while 67,606 magistrates' bench charges are recorded in the conduct records for 13,415 convict women, only 1% of these involve offences against the person and only 5% were for offences against property. By contrast, 61% were for breaches of the rules and regulations governing the conduct of prisoners under sentence (see Figure 6). As convict administrators had to account for the distribution of such charges, they developed a classification system which we have utilised to code this data. As with information about occupation, we have created coding dictionaries to ensure standardisation of coding across information retrieved from multiple archival series.
Sources: Tasmanian Archive, Con 40 and Con 41 series.
Nineteenth-century causes of death and diagnoses can be difficult to map onto contemporary classification systems. To address this we adopted the 32-category system created by Rebecca Kippen, which usefully combines aspects of William Farr's 19th-century nosology with the contemporary international classification of diseases. This system is sufficiently broad to be analytically meaningful while at the same time specific enough to enable particular mortality and diagnostic trends to be plotted over time (Kippen, 2011). While Kippen's original schema was developed to code causes of death, our datasets also include information about other episodes of ill-health. In order to capture data about some events that were commonly diagnosed but rarely resulted in death, we have included some additional categories (see Table 2).
Source: Kippen (2011) and Maxwell-Stewart and Kippen (2015).

GEOLOCATING DATA
Many contextual variables have been geolocated including place of birth, conviction, incarceration and work. A Geographic Information System (GIS) has been used to geolocate historic maps and plans to modern survey and archaeological information. From this we have been able to digitise data at multiple scales: historic parishes; hundreds; counties; local government areas; townships; streets; buildings and even rooms. Nested spatial 'containers' or shape files are created as part of this process. Non-spatial information can be linked to these digital contexts, effectively populating spaces with people, processes and products.
Granularity is an issue commonly encountered in historical research. This problem arises when information about place is recorded in ways that enable a more precise identification of location in some cases compared to others. Thus, while some convicts provided the name of the street they were born in, it was more common to report a parish of birth. Others still only provided information about their county of birth, a particularly common occurrence with convicts from Ireland. In these instances, we used the code for the county town but created an additional variable to inform users that this was a proxy location and that the precise place of birth was unknown.
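The county-town proxy used for birthplaces recorded only at county level can be sketched as a lookup that returns both a place code and a flag recording whether the location is a stand-in. The place codes, table names and field names here are hypothetical, and the fallback for a named place that is missing from the gazetteer is an assumption of this sketch rather than a documented rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeocodedPlace:
    place_code: str
    is_proxy: bool   # True when the county town stands in for an unknown place


# Hypothetical gazetteer keyed by (county, place), with placeholder codes.
PLACE_CODES = {
    ("Cork", "Cork"): "PL-0101",
    ("Cork", "Kinsale"): "PL-0102",
}
COUNTY_TOWNS = {"Cork": "Cork"}


def geocode_birthplace(county: str, place: Optional[str] = None) -> Optional[GeocodedPlace]:
    """Return a place code, falling back to the county town as a flagged proxy."""
    if place is not None and (county, place) in PLACE_CODES:
        return GeocodedPlace(PLACE_CODES[(county, place)], is_proxy=False)
    town = COUNTY_TOWNS.get(county)
    if town is None:
        return None  # county not in the gazetteer; leave ungeocoded
    return GeocodedPlace(PLACE_CODES[(county, town)], is_proxy=True)
```

Keeping the flag alongside the code lets an analyst exclude proxy locations from any study that depends on the precise place of birth.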
A related issue is the difficulty of locating names shared by more than one place. There are several places named Newcastle in the United Kingdom, for example. While these are commonly distinguished by reference to local geographical features, such as Newcastle-under-Lyme and Newcastle upon Tyne, this is not always specified in the original record. Where a convict's recorded place of birth might refer to multiple locations, we adopted the practice of geolocating to the location closest to the court in which the convict was sentenced to transportation.
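The nearest-to-court rule can be illustrated with a great-circle distance comparison between each candidate place and the court of trial. This is a sketch under stated assumptions: the coordinates are approximate values supplied for the example, the gazetteer structure is hypothetical, and the use of haversine distance is a choice made here, since the text does not specify how closeness was measured.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def resolve_place(name, court_coords, gazetteer):
    """Pick the candidate place of this name that lies closest to the court."""
    candidates = gazetteer.get(name, [])
    if not candidates:
        return None
    return min(candidates, key=lambda c: haversine_km(c[1], court_coords))[0]


# Approximate coordinates, for illustration only.
GAZETTEER = {
    "Newcastle": [
        ("Newcastle upon Tyne", (54.98, -1.61)),
        ("Newcastle-under-Lyme", (53.01, -2.23)),
    ]
}
STAFFORD_COURT = (52.81, -2.12)
DURHAM_COURT = (54.78, -1.58)
```

With these values, a convict tried at Stafford who gave 'Newcastle' as a birthplace resolves to Newcastle-under-Lyme, while one tried at Durham resolves to Newcastle upon Tyne.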
For places within Tasmania we have adopted similar measures to cope with granularity by adding a resolution variable using three values. For the geocoding of places of incarceration or work, those which could be precisely mapped were marked as being geolocated with a 'high' level of resolution. Places which were located from poorly-geolocated maps, or were locatable to an area only (such as a precinct or parish), were marked as 'medium'. Other locations which could not be pinpointed to a specific location or area were mapped to townships or districts and accorded a 'low' level of resolution. The list of geolocations, along with their site-specific codes, has been archived within the Australian Gazetteer of Historical Place Names.
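The three-value resolution variable can be sketched as a mapping from the kind of locational evidence to a resolution level, which then allows records to be filtered by a minimum level. The evidence labels and site names below are hypothetical; only the three levels and the criteria for each come from the description above.

```python
from enum import Enum


class Resolution(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical evidence labels mapped to the three levels described in the text.
EVIDENCE_TO_RESOLUTION = {
    "precise_site": Resolution.HIGH,
    "poorly_geolocated_map": Resolution.MEDIUM,
    "area_only": Resolution.MEDIUM,      # e.g. a precinct or parish
    "township": Resolution.LOW,
    "district": Resolution.LOW,
}
_RANK = {Resolution.LOW: 0, Resolution.MEDIUM: 1, Resolution.HIGH: 2}


def assign_resolution(evidence):
    """Return the resolution level for a given kind of locational evidence."""
    return EVIDENCE_TO_RESOLUTION[evidence]


def filter_by_resolution(places, minimum):
    """Keep the names of places geolocated at or above the minimum level."""
    return [
        name
        for name, evidence in places
        if _RANK[assign_resolution(evidence)] >= _RANK[minimum]
    ]


places = [("Site A", "precise_site"), ("Site B", "area_only"), ("Site C", "district")]
```

An analysis of movement between individual buildings might keep only `Resolution.HIGH` records, while a regional study could accept all three levels.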
We have used geolocated historic maps and plans, survey and archaeological data to facilitate the spatio-temporal reconstruction of some built landscapes. Thus, a former penal station, Port Arthur (1830-1877), has been dynamically mapped across its 47-year period of operation. This has enabled the digital reconstruction of buildings, individual spaces, walls, fences, roads and workplaces. Where these have been named in the charges brought against convicts serving at this penal station, it has proved possible to link this data to each digitised context, allowing offences to be mapped in time and space.

MAPPING CHANGES IN CENSUS DISTRICT
Shape files can play a particularly important role in linking individual level data to tabulated census returns. While each shape file can be tied to a census table, in order to analyse regional change between censuses it is necessary to identify differences in regional collection boundaries between censuses. The last census, conducted in 2016, was organised around a system of mesh blocks. Mesh blocks are the smallest geographical areas defined by the Australian Bureau of Statistics and are primarily organised around land use. It is thus unusual for a mesh block that contains primary production land to include areas zoned as commercial, residential or parks. Each mesh block has been designed to be large enough to protect against accidental disclosure of confidential information. To this end, the majority of populated mesh blocks contain between 30 and 60 dwellings. Many 2016 mesh blocks, however, are completely unpopulated.
We have retrospectively mapped the 2016 mesh blocks onto previous census collection districts. Over the course of this exercise we have identified three types of alignment issue. The first of these can be attributed to differences in mapping standards. Nineteenth and twentieth century maps that depict the boundaries of parishes and local government areas were not surveyed to current standards. As a result, it often appears that boundaries do not align, although examination of underlying topographical features reveals that this is entirely due to mapping inconsistencies. Where we have encountered such 'cadastral noise', we have used the 2016 mesh blocks to redraw historic boundaries.
The second type of misalignment is caused by land use patterns. On occasions a current mesh block includes areas on both sides of an historic boundary, but on closer examination one part of the area bisected by the historical division contains settlement and the other does not. The unpopulated section of the mesh block typically consists of pastoral land. Where this has occurred, we have again used the 2016 boundaries to redraw the historic boundaries.
The third type of misalignment arises from genuine boundary change. While in most cases it has proved possible to match mesh block boundaries to historic parishes, local government areas and subsequent census districts, there are occasions where boundaries have been redrawn in ways that problematise comparisons between successive tabulated census returns. In these cases, we have used the 2016 mesh block boundaries to highlight where these major realignments have occurred and have provided sufficient documentation to alert subsequent users.
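One way to picture the retrospective mapping of mesh blocks onto earlier collection districts is as a concordance table keyed by mesh block. The sketch below is illustrative only: the mesh block identifiers, district names and the choice of census years are hypothetical, and the structure is an assumption rather than a description of how the dataset stores its boundaries.

```python
def district_changes(concordance: dict, year_a: int, year_b: int) -> dict:
    """Return mesh blocks assigned to different districts in two census years.

    `concordance` maps a mesh block identifier to a {census_year: district}
    dictionary. Blocks with no assignment in either year are skipped.
    """
    changed = {}
    for block, by_year in concordance.items():
        before, after = by_year.get(year_a), by_year.get(year_b)
        if before is not None and after is not None and before != after:
            changed[block] = (before, after)
    return changed


def blocks_in_district(concordance: dict, year: int, district: str) -> list:
    """List the mesh blocks that made up a district in a given census year."""
    return sorted(
        block for block, by_year in concordance.items()
        if by_year.get(year) == district
    )


# Hypothetical concordance: identifiers and district names are placeholders.
EXAMPLE_CONCORDANCE = {
    "MB001": {1881: "District A", 1891: "District A"},
    "MB002": {1881: "District A", 1891: "District A"},
    "MB003": {1881: "District A", 1891: "District B"},  # genuine realignment
}
```

Calling `district_changes(EXAMPLE_CONCORDANCE, 1881, 1891)` flags only `MB003`, which is the kind of realignment that would need to be documented before tabulated returns for the two censuses are compared.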

LINKAGE BETWEEN SOURCES
Our general approach is to internally link each record series before linking across series. Thus, the valuation rolls are internally linked so that successive annual records which record the same occupier at the same address are formed into chains. Likewise, births are first linked to parental marriages in order to link the same individual in both records and define familial relationships. Both lists are then matched in a subsequent exercise aimed at locating individual and family records within households.
Linkage is an iterative process. We first generate standardised lists of key variables, especially first name, surname and ship of arrival. Automated record linkage queries are then run using these cleansed variables. All matches are then checked and cleaned using a duplicate query. Soundex codes are used in subsequent iterations. Linkage weights are employed to evaluate each step of the process and these are retained within datasets in order to assist with subsequent iterations. Finally, remaining unmatched records are examined by hand.
Other record linkage processes are used to match owners to businesses, businesses to places of residence and magistrates to police districts. Some record series also contain attributes that ease the process of record linkage. WWI attestation papers, for example, provide detailed information on next of kin as well as recording age in years and months. This considerably facilitates linkage with birth certificates (Inwood, Kippen, Maxwell-Stewart, & Steckel, 2020). Similarly, criminal justice series routinely contain attributes that lend them to linkage. This includes information about former dates of conviction and sentences. In the case of convicts transported to the Australian colonies, other identifiers were used to help administrators retrieve records pertaining to the same individual filed in different registers or correspondence series. These include police numbers and the name of the ship that transported each convict into exile. As a result, it is possible to link the records for serving convicts with a great deal of certainty (Maxwell-Stewart, 2016).
Record linkage for time-served convicts is more problematic. After all, the men and women 'lagged' to Australia had a vested interest in escaping their past. Name changes were commonplace as former convicts tried to reinvent themselves. The task of tracking emancipated prisoners post-release is more challenging for men than women, a reversal of the normal paradigm. Any convict who wished to marry while still under sentence had to apply for state permission. A much higher proportion of convict women compared to men are named in the registers that governed these processes, a reflection of the colonial sex imbalance. Since it is relatively straightforward to locate a marriage certificate if both the bride's and the groom's names are known, the rate of linkage for convict women to marriage certificates is high. Post-sentence migration rates for former convict women are also lower than for men: a smaller proportion left for Victoria following the discovery of gold in 1851, for example (see Figure 1). As they were less mobile and more often embedded in family structures, they are more visible than might be expected, although many women in de facto relationships changed their name to their partner's name, adding a further level of complication.
Despite these difficulties, many colonial record-keeping systems contain clues to identity. One of the reasons convicts sought to hide the details of their former lives is that the manner of arrival in the colony was regarded as a marker of status. Both state- and church-administered record systems sought to include identifiers that could help track legal status. It was common for individuals to be described as 'native' (that is, colonially born) or 'came free', an indication that they had arrived in the colony as a migrant and not a prisoner. Former convicts on the other hand often had their records marked C.P., standing for conditional pardon, F.S., free by servitude, or F.C., free certificate, all indicators of former servile status. The ship that a person arrived in the colony on could also reveal much about former legal status. While such annotations are particularly common in criminal justice systems, they were also included on other records including church burial records, hospital admissions and registers of departures from the colony.
Some records also include descriptions of individuals. While this is more common with criminal justice records, some bank account registers also contain physical descriptions. Eye and hair colour, height, and descriptions of scars, physical deformities and tattoos can provide useful identity pointers. While it is difficult to use this information to assist automated matching, it can provide a useful check for evaluating problematic matches. Physical descriptions can be particularly useful aids for linking records that could not otherwise be matched as a result of the use of aliases or other name changes. While the heavy reliance the convict system and subsequent colonial record-keeping systems placed on descriptions as a means of checking an individual's status eases the issues of record linkage, it is dangerous to assume that those described in these records are representative of all former convicts. Just as convicts who marry are easier to trace in subsequent records, so are those who continued to have interactions with the criminal justice system.
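The iterative linkage steps described in this section (standardising key variables, an automated exact pass, a duplicate check, a Soundex pass, retained weights and manual review of the remainder) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's own code: the record layout, the function names and the two weights (1.0 and 0.8) are hypothetical, and the Soundex routine follows the standard American Soundex rules.

```python
import unicodedata

# Standard Soundex digit groups.
_SOUNDEX = {ch: digit
            for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                   ("L", "4"), ("MN", "5"), ("R", "6"))
            for ch in letters}


def standardise(value: str) -> str:
    """Strip accents and punctuation, upper-case, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    kept = "".join(c for c in decomposed if c.isalpha() or c.isspace())
    return " ".join(kept.upper().split())


def soundex(name: str) -> str:
    """Four-character Soundex code; H and W do not separate repeated codes."""
    letters = [c for c in standardise(name) if "A" <= c <= "Z"]
    if not letters:
        return ""
    code, previous = letters[0], _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":
            previous = digit
    return (code + "000")[:4]


def exact_key(rec: dict) -> tuple:
    return tuple(standardise(rec[f]) for f in ("first_name", "surname", "ship"))


def sound_key(rec: dict) -> tuple:
    return (standardise(rec["first_name"]), soundex(rec["surname"]),
            standardise(rec["ship"]))


def link(source: list, target: list) -> tuple:
    """Return (links, review): weighted links plus ids left for checking by hand."""
    exact_index, sound_index = {}, {}
    for t in target:
        exact_index.setdefault(exact_key(t), []).append(t["id"])
        sound_index.setdefault(sound_key(t), []).append(t["id"])
    links, review = [], []
    for s in source:
        # Exact pass first, then the Soundex pass; the weights are illustrative.
        for index, key, weight in ((exact_index, exact_key(s), 1.0),
                                   (sound_index, sound_key(s), 0.8)):
            candidates = index.get(key, [])
            if len(candidates) == 1:
                links.append((s["id"], candidates[0], weight))
                break
            if len(candidates) > 1:  # duplicate check: ambiguous match
                review.append(s["id"])
                break
        else:
            review.append(s["id"])   # unmatched after both passes
    return links, review
```

Retaining the weight alongside each link means a later iteration can revisit only the lower-scoring matches, while the `review` list corresponds to the records that would be examined by hand.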
Selection bias presents an ever-present challenge for historians. Since archival records were created in the past, researchers must make assumptions about the extent to which the resulting data is representative of the particular issues they wish to study. There are two aspects to this. First, the degree to which the information they utilise is representative of the wider record collections from which that material has been drawn; and second, the degree to which those collections reflect the historical realities that the researcher wishes to shed fresh light on. The use of total-count data can reduce the risk associated with the first of these processes, although surviving records may not be representative of the full range of information originally collected. Nevertheless, any attempt at digital reconstruction is likely to highlight gaps in a series and provide a means of estimating the extent of undercounting.
The digitisation of multiple series can help to explore the second type of selection issue. Comparisons of the ways in which individuals are described in multiple series can throw considerable light on original data collection processes. It can also identify individuals absent in one record but present in another. Both processes can yield information about the ways in which men and women were selected into different record-collection exercises. They can also be useful in reconstructing human agency. Most records are the product of an encounter between a person or household and the state or church. The way in which similar questions are answered in different contexts can reveal much about individual circumstances at the point in time when each record was formed (Maxwell-Stewart, 2016).
As others have argued, all archives were developed as administrative tools and as such are neither passive observers of the past nor static entities. Because they were created with a particular purpose in mind, the way in which individuals are represented within archival collections reflects the concerns and prejudices of successive administrations. This is perhaps particularly the case with criminal justice and colonial records, series that were created in order to aid the policing and control of particular subpopulations. It is for this reason that Ann Laura Stoler argues that archival series need to be first read along the grain before an attempt is made to co-opt them for other research purposes (Stoler, 2009).
A key rationale behind the Tasmanian Historical Dataset is that the assembly of digital record series in parallel will aid the kind of deep read advocated by Stoler, helping shed light not only on selection processes, but also on the underlying rationale that caused some individuals to be omitted from some series and described in particular ways in others.
Reflecting its collaborative origins, the Tasmanian Historical Dataset has many different potential uses. In the following section we summarise some of the ways in which the information it contains has been employed to date as well as outlining future work.

LIFE COURSE AND INTERGENERATIONAL ANALYSIS
Multiple mechanisms are known to shape the health outcomes of both parents and their children (Kuzawa & Eisenberg, 2014). Maternal literacy is powerfully associated with improved intergenerational outcomes, for example, while elevated levels of alcohol consumption during pregnancy can stunt offspring growth and retard cognitive development (Riley, 2001; Rose & Cherpitel, 2011). The thrifty phenotype hypothesis posits that a history of maternal undernutrition may trigger foetal responses that favour the development of critical organs at the expense of others, leading to increased risk of chronic illness later in life (especially cardio-vascular disease and type-2 diabetes). Recent work also links thriftier metabolism mechanisms in early life to poorer cognitive, immune system and reproductive development (Pike, 2016). Genetic factors play a role in the determinants of intergenerational health, although the task of distinguishing genetic and environmental pathways is complicated by interactions between the two. There is evidence that effects associated with trauma exposure, for example, can be transmitted across generations via the chemical coating of chromosomes (Kellerman, 2013).
Intergenerational datasets composed of many linked life-course events are needed to explore these underlying causes of familial inequality. The ideal study population would consist of those who experienced a set of well-defined adverse circumstances and a suitable control group which escaped these particular experiences, but otherwise shared many characteristics. Such populations are rare. Gavrilova and Gavrilov (1999) find that most available data for the intergenerational study of longevity in Europe, North America and East Asia have significant shortcomings. The data often target highly specific or localised populations and lack one or more important features, such as information on women, background information on socioeconomic conditions, the timing of stressful events for individuals, causes of death, individual characteristics, or family background. Their study highlights the need for new data to fill the evidence gaps and support a more convincing exploration of the transmission of inequality across generations. Tasmania is perhaps an exception to this general rule.
A particularly important feature of the Tasmanian Historical Dataset is that it includes life-course information for men and women who arrived as convicts and assisted migrants. Although these two populations migrated under different circumstances, similar detailed information is available for both. This includes details of place of birth, age, occupation and literacy. Thereafter the experience of the two groups differed markedly. While the convicts were subjected to punishments, including solitary confinement and hard labour, the assisted migrants were not. The abundance of civil registration data in Tasmania enables not only the tracing of convict and assisted migrant cohorts to death, but also the identification of their Tasmanian-born children and grandchildren. An additional important attribute is that those children and grandchildren were raised in New World environments radically different from those that shaped the early life experiences of their parents.
Work to date suggests that some punishments, particularly solitary confinement exposure, cut short the life expectancy of convicts (Kippen & McCalman, 2015; McCalman & Kippen, 2020). While further work needs to be undertaken to determine whether these effects impacted upon the lives of subsequent generations, preliminary evidence provides an indication that the children of transported convicts experienced some benefits as a result of the misfortune of their parents. They were tall compared to Old World populations, for example. Moreover, colonially born prisoners born to mothers who had been transported as convicts were taller than those whose mothers had arrived free. This surprise finding probably reflects the smaller number of children born to emancipist mothers and hence the greater availability of resources per head in ex-convict households (Maxwell-Stewart, Inwood, & Stankovich, 2015).
As with the intergenerational transmission of health inequalities, there are multiple mechanisms that might explain the perpetuation of prosecution histories across generations. These include poor parenting, poverty of expectation and opportunity and the genetic transmission of conditions likely to heighten the risk of arrest (for example, schizophrenia). Such risks are often heightened by policing strategies. The children of 'known suspects' are likely to be more severely policed than others. An island originally populated by transported convicts which is also blessed with long runs of digitised criminal justice and civil registration data, Tasmania provides an ideal opportunity to explore the different pathways by which prosecution risk might be transmitted from one generation to another (Godfrey, Inwood, & Maxwell-Stewart, 2018).

ENVIRONMENTAL DETERMINANTS OF WELL-BEING
Much of the literature pertaining to the mechanics of disadvantage, and its persistence through generations, has been formulated around social determinants. For example, there is a rich literature examining the links between education and future earnings and productivity, suggesting that these relationships may be intergenerational (Goldin & Katz, 2001). These pathways are complex. Thus, higher material well-being may lead to better educational outcomes not solely through the availability of financial resources, but through better health and higher life expectancy, which in turn generate greater incentives for investment in education. Far less attention has been given to how factors such as housing and the development of urban infrastructure feed into this process. This is somewhat surprising given that the dwellings in which people live provide them with the most elemental of protections through shelter from environmental conditions and constitute a basic human need. It follows that dysfunction at this level will potentially have a profound impact over the development of human capital and subsequent life-course outcomes.
A related issue is the extent to which social mobility might exacerbate or mitigate early childhood disadvantage. Most measures of social mobility are based on occupational data (or in the case of 19th-century women, the occupations of their husbands as recorded on a marriage certificate). These measures are sensitive to age effects and changes in wage rates over time. They provide at best a snapshot at a particular point in life. The availability of annual housing valuation data provides an opportunity to create a more robust set of measures to explore these issues. It is certainly unusual to have a continuous measure which can be used to analyse the impacts of changing home ownership or variations in the relative value of an individual's place of residence over the life cycle. The ability we have to examine the effects of place and value of residence across generations is particularly rare.
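As an illustration of how annual valuation data might yield a continuous measure of relative standing, the sketch below expresses each occupier's valuation as a ratio to the median valuation recorded in the same year. The choice of the median as the reference point, the tuple layout and the function name are assumptions made for this example; the text above does not specify a particular measure.

```python
from statistics import median


def relative_valuations(rolls: list) -> dict:
    """Express each annual valuation relative to that year's median.

    `rolls` is a list of (year, occupier_id, annual_value) tuples, one per
    linked valuation roll entry. Returns {(occupier_id, year): ratio}, where
    a ratio above 1.0 means the residence was valued above the median.
    """
    values_by_year = {}
    for year, _, value in rolls:
        values_by_year.setdefault(year, []).append(value)
    medians = {year: median(values) for year, values in values_by_year.items()}
    return {(occupier, year): value / medians[year]
            for year, occupier, value in rolls}
```

Because the ratio is recomputed for every year, it tracks movement up or down the local housing distribution over the life cycle rather than capturing a single snapshot, and it is insensitive to general changes in valuation levels over time.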
Many archival records contain references to locations that can be converted to multi-scalar and multitemporal objects through the linking of historic and contemporary geographical data. Thus, notices of convict transfers between properties and the individual labour contracts signed by convicts in the period after 1844 can be geolocated in order to reconstruct the seasonal flow of labour between different sectors of the economy. Such analysis is useful in that it can provide an indication of the ways in which human capital (as measured by age, sex, skill and literacy rates) influenced contract length and individual levels of remuneration (Meredith & Oxley, 2005).
The charges recorded in a conduct record contain multiple references to place. These include the location of employment, the court where the case was heard and the site the convict was sent to undergo punishment. The specifics of each charge summary often contain additional geolocatable information such as the name of an inn or a particular building where an offence was said to have occurred. Other records can be used to determine the number of convicts at each employment location over time, enabling a reconstruction of prosecution risk across different types of urban and rural, and public and private workplaces. Analysis of these record collections confirms that those convicts with particularly valued skills were more likely to be given the benefit of the doubt than was the case for those who were easier to replace. Punishment strategies also reflect the costs of maintaining a convict compared to hiring a free worker. When convict labour was relatively expensive, magistrates' benches were more likely to sentence prisoners serving in the private sector to stints of hard labour in a road party where they would be maintained by government (Maxwell-Stewart, 2015).
Similar forms of analysis can be used to explore the ways in which the deployment of convict labour impacted upon the landscape. Archaeological survey combined with the analysis of LiDAR (Light Detection and Ranging) remote sensing has been successfully employed to reveal the physical traces of convict labour (Tuffin, Roe, Gibbs, Clark, & Clark, 2020). Such sites include quarries, clay pits, sawpits, roadways, tramlines and buildings. Such contemporary landscape imaging can be aligned with historical maps and administrative records in order to answer questions about construction methodology, as well as provide greater insight into the extent and impacts of labour on both the environment and the individual (see Figure 7). Reconstituted punishment records can then be used to explore the extent to which levels of coercion varied across different work landscapes.
The ability to place convicts in individual workplaces can also aid analysis of worker agency and resistance. Using runaway notices placed in the Hobart Town Gazette, for example, it is possible to plot absconding patterns across different locations (see Figure 8). Such point-cluster maps illuminate regional variations in absconding and prosecution patterns over time. While many other convicts were punished for 'refusing to work', analysis of collective action has been hampered by the way in which this information was recorded on individual conduct records. The extent of collective punishment is only revealed when whole series are transcribed and linked by date and site of employment. This can also shed light on the ways in which different forms of action followed one upon the other. Convicts would often attempt to petition higher authority, before striking and then finally absconding when other attempts to address grievances had failed (Dunning & Maxwell-Stewart, 2002; Tuffin, Maxwell-Stewart, & Quinlan, 2020).
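Linking charges by date and site of employment, as described above, amounts to grouping individual conduct record entries and looking for groups that contain more than one convict. The following is a minimal sketch under stated assumptions: the field names, the offence wording and the threshold of two convicts are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def collective_incidents(charges: list, offence: str, minimum: int = 2) -> dict:
    """Group charges for one offence by (date, site) and keep shared incidents.

    `charges` is a list of dicts with the keys 'convict_id', 'date', 'site'
    and 'offence', one per transcribed conduct record entry. Returns
    {(date, site): sorted list of convict ids} for groups with at least
    `minimum` distinct convicts.
    """
    groups = defaultdict(set)
    for charge in charges:
        if charge["offence"] == offence:
            groups[(charge["date"], charge["site"])].add(charge["convict_id"])
    return {key: sorted(ids) for key, ids in groups.items()
            if len(ids) >= minimum}
```

A charge that looks like an isolated act on a single conduct record may appear in the output alongside several others from the same site and day, which is the pattern that only emerges once a whole series has been transcribed.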

Figure 7 LiDAR scan showing site of quarry, spoil heap and tramlines, Tasman Peninsula

SUMMARY AND CLOSING OBSERVATIONS
This article has described the diverse set of records that make up the Tasmanian Historical Dataset. In many ways the collection reflects a 19th-century Antipodean fascination with classification. Tasmania's convict past ensured that it developed as a society where much information was recorded, a necessary part of keeping the unfree in line. Yet, concerns about the long-term impact of penal transportation ensured high levels of record keeping long after the last convict vessel had arrived. It was also a place, however, that excited a great deal of scientific interest, a product of the way its fauna and flora seemed unusual to European eyes. The Royal Society of Tasmania, founded in 1843, was a particularly enthusiastic promoter of the systematic collection of information. Such classification exercises were rapidly extended to include social data. The early introduction of civil registration and attempts to hold a census on a regular basis reflect this legacy.
The overlapping nature of the resultant record systems and relatively small population sizes have encouraged more recent attempts to use digital techniques to reconstitute and reanalyse these series. This has been a somewhat unusual enterprise in that it has involved a collaboration between archivists, researchers from a variety of different academic backgrounds as well as many family historians and local history groups. It has also differed from some other longitudinal historical exercises in the diversity of the record collections that have been assembled and linked.
This has some disadvantages. Navigating the collection can be confusing. Different records were generated under different circumstances, all of which require careful documentation. It also has strengths, however, in that it enables users to see how individuals were represented in multiple different datasets. In this kind of complex data environment, even an absence can be informative. The ability to bring multiple series into alignment will help users to better understand the selection processes that shaped the development of each record series. This in turn should lead to improvements in archival catalogues, finding aids and interfaces. There is a technical difference between a record and a document, in that a record has a known context and transactional history (Sternfield, 2011). The creation of archival networks can transform a loose assembly of documents into a collection of records. In a digital research environment, this is not a static process. Rather than disassociating information from its archival context, each systematic interrogation of a digital archive can add to a record's transactional history. Thus, cross-interrogation of similar series can assist in reconstructing the sequence of individual record production and in turn aid an understanding of both content and context.
Just as data creation has shaped so much of Tasmania's settler past, so the digitisation of that data is helping to shape its future. Once a mark of shame, Tasmania's convict era is now seen as something of a drawcard. Former penal stations and houses of correction have been repurposed as heritage sites and 19th-century houses built by convict labour have been turned into boutique accommodation. The ability to visualise the ways in which tens of thousands of life courses interacted, plot the route that convict voyages took, chart the impacts of solitary confinement and animate absconding patterns has a growing commercial utility. It is useful, for example, that an archival search can now recover more than links to individual records. Thus, users can also be provided with information about each place the subject of their search was sent to toil. The creation of contextualised finding aids may not be the reason why datasets were originally created, but they might provide a reason for investing in them in the future.