Report on the National Archives of Ireland Census Online Project: Part 2, 1901 Census
In or around 2005 the National Archives of Ireland embarked on a project to make the 1901 and 1911 censuses available online without any charge to the user (http://www.census.nationalarchives.ie). The project is of particular value to genealogists and historians and was originally planned to take three years to complete (Irish Times, 7 December 2005, page 3). The Archives secured partnership assistance from Library and Archives Canada and special funding was provided by the Department of Arts, Sport and Tourism, which to date totals €4 million. In a manner which has not been sufficiently documented, Library and Archives Canada subcontracted much of the census project work to AEL Data of India, and there are no details available to show how much each body charged for its services (FOI request only partially completed and now subject to appeal).
In December 2007 the 1911 Census for Dublin City and County was made available online; other counties followed in subsequent months, and by August 2009 all 32 counties of the 1911 Census were online (see the writer's Part 1 Report). On 3 June 2010 Culture, Sport and Tourism (formerly Arts, Sport and Tourism) Minister Mary Hanafin presided at the online launch of the 1901 Census returns for the 32 counties in the National Archives. As the writer is not on the Archives's invitation list, he picked up particulars from media reports, which were almost uniformly positive. While a party is not the time to be negative, a piece in the Irish Times of 11 June by Karlin Lillington was particularly effusive, observing that if the National Archives building 'had been struck by a meteor at the time of the launch, I'd say half of Ireland's historians would have been wiped out in a single blow'. Lillington claimed that 'many national archives around the world make census information available but only to a limited audience', and one 'historian friend' was quoted as stating that 'this is the case with the British census' (http://www.irishtimes.com/newspaper/finance/2010/0611/1224272270141.html). This account is a travesty of the mode in which census returns are made available online in Britain and the United States, commercially via sites such as Ancestry.com and FindMyPast.com, but also freely via sites such as the Mormon FamilySearch.org. Perhaps archival staff in other jurisdictions consider it wiser to concentrate on their core duties, such as ensuring that archives are properly stored, catalogued and made available for public consultation, rather than embarking on costly 'special projects' which even with resources much greater than those of Ireland they consider they can ill afford. Here is how the United States National Archives recently prefaced an announcement of substantial digitisation projects completed by its partners:
Our digitization partners, Ancestry.com and Footnote.com, have digitized selected NARA microfilm publications and original records and made them available on their web sites for a fee. Each partner allows free searches of some or all index terms for each title. Access to Ancestry.com and Footnote.com is available free of charge in all NARA Research Rooms, including those in our regional archives and Presidential libraries (http://www.archives.gov/digitization/digitized-by-partners.html, accessed 28 June 2010).
While most media reports have not questioned why we have not followed this model in Ireland, a recent RTE report has indicated that the National Archives of Ireland's approach to digitisation actually conceals a problem:
But the popularity of online archives is hiding a problem - storage. While the Taoiseach said he is satisfied with the housing of such records, there have been warnings elsewhere of a crisis for Ireland's archives. Countless records are being stored in a warehouse and away from public view because of a lack of suitable space at the National Archives. Some of the records in the facility in Bishop Street in Dublin are hundreds of years old and others are more recent. All of them though provide a glimpse into the past and an insight into how the country we now live in was shaped. But the National Archives has run out of suitable space to properly stores its records. This means it cannot take in all the Government documents its obliged to under the 30-year rule. Experts say proper archive storage should be in good sized rooms where conditions can be easily controlled. Many of the records are wrapped in waterproofing because of concerns over the roof. Minister Mary Hanafin, who is responsible for the issue, says that the Office of Public Works is in the process of converting the big warehouse into smaller compartments as there is no possibility of a new building due to financial constraints. ('Taoiseach says archives are safe', RTE news report 1 June 2010, http://www.rte.ie/news/2010/0601/archives.html, accessed 6 June 2010).
Of course the costly census digitisation project does more than merely hide storage and other problems in the Archives, as it has effectively siphoned off a large amount of now scarce funds which might have been better allocated to alleviating said problems. Some details of the strain the census project is causing within the National Archives are illustrated in the partial FOI releases received to date.
For example, in June 2008 Archives Director David Craig stated in an e-mail that Catriona Crowe, the senior staff member in charge, was 'seeking additional staff time to assist in the work of the Census Project', particularly to deal with 'a large number of corrections' in the database, and that a staff member could be taken off a less important Soldiers' Wills project (note the phrase 'a large number of corrections', which contradicts assurances of accuracy given to the present writer). Another senior staff member replied to the Director that there was 'no spare capacity among our core clerical/administrative staff to offer any assistance to the Census project', going on to state that he was not being advised of project expenditure, 'to whom or when payable', which he needed to know, 'particularly in the current chilly climate of cutbacks' (NAI FOI release).
How indeed could spending millions in public funds on digitising census records be justified when money has not been available to store original records securely in the National Archives, and particularly when other agencies are willing to undertake digitisation at no cost to the state? A draft agreement between the Genealogical Society of Utah (operated by the Church of Jesus Christ of Latter-Day Saints or Mormons) and the National Archives contains the remarkable information that the former was 'indexing the 1901 census, and expect to have it completed by the end of 2007' (NAI FOI release). Why then did the National Archives insist on duplicating this work and did it block or delay the release of the GSU work? It is admittedly hard to explain to those who are enthusing about the National Archives's census project that as well as being expensive, behind schedule and with a high indexing error rate, it has actually had the effect of slowing the progress of digitisation of Irish records. Grandiose and expensive projects were a feature of the Celtic Tiger years which have brought the country close to bankruptcy, and it would appear that the National Archives was not immune to this tendency. Rather late in the day, the National Archives may have recognised the folly of insisting on managing digitisation projects itself, as it is understood that it has now signed agreements with Ancestry.com, the Latter-Day Saints, Eneclann and possibly other agencies permitting them to undertake digitisation projects for which permission to proceed was formerly delayed or withheld. This information is subject to confirmation, as the Director of the National Archives has not issued a statutory report since 2006 and the repository is failing to complete the writer's FOI requests.
Given the indexing errors which were a feature of the 1911 project and which have been analysed in the Part 1 Report, the writer has carried out further sampling of the 1901 Census online to establish if the standard of accuracy has improved. There follow the results of analysis of samples of the 1901 Census returns for Dublin, Kerry and Wicklow, which again is confined to surnames only, specifying location, number of returns misread compared to total returns, and incorrect and correct versions of surnames. Multiple mistakes in a return for a family are counted as one error only. While O and Mc surnames are still presented irregularly in the database, with and without inverted commas and/or spaces after the prefixes, a new element on the National Archives census site enables most of these kinds of entries to be picked up, and so although a highly unsatisfactory feature they were not counted as errors when encountered in the samples, as they were in the case of the report on the 1911 Census. While the new search element is welcome, it is not a full soundex or variant search facility, the lack of which in the National Archives project remains very difficult to understand given its presence in most online genealogical websites such as Ancestry.com, FamilySearch.org and Origins.com.
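By way of illustration of what a soundex search does, the following is a minimal Python sketch of the standard American Soundex coding, under which names that sound alike receive the same four-character code. It is an assumption-laden example written for this report and is not the search code used by the National Archives or by any of the websites named; the function name `soundex` is hypothetical.

```python
# Minimal sketch of American Soundex (illustrative only, not any site's code).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as written
    prev = CODES.get(letters[0], "")    # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = CODES.get(ch, "")        # vowels and Y carry no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets the previous code
    return (result + "000")[:4]         # pad with zeros, truncate to four


# Pairs of incorrect/correct readings taken from the samples in this report.
for wrong, right in (("Whilan", "Whelan"), ("Haydin", "Hayden"),
                     ("Sarage", "Savage")):
    print(wrong, soundex(wrong), right, soundex(right))
```

On this sketch a vowel misreading such as Whilan/Whelan yields the same code for both forms, so a soundex search for the correct surname would still find the entry, whereas a consonant misreading such as Sarage/Savage yields different codes and would remain hidden. A soundex facility would therefore mitigate only some of the errors listed below.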
It should be noted that the errors listed for Lower Dominick Street below were identified with the assistance of a portion of the 1901 Census on the subscription site IrishOrigins.com, which resulted in some additional errors being discovered, including duplication of entries for two families (Ball and Norris) and incorrect numbering of houses for some families (Audibert, Litter, Smith, Byrne, Merlin), none of which were counted as errors here. It may be of interest to provide fuller particulars of the two Number 18 Lower Dominick Street families omitted: Anne McElroy 46, daughter Kate Jennings 21; Michael Carbery 50, wife Mary 44, children Mary 16, Josephine 4, with relatives and boarders. In fairness, it should be stated that some transcription errors were also discovered in IrishOrigins.com's work which are being brought to the attention of the appropriate parties.
(A) 1901 Census Dublin: Location, returns misread/ approximate total returns: incorrect/correct versions of surnames.
Abbatoir 0/1: -
Aberdeen Street 3/43: Sarage/Savage, Qveres/Iveres, Mussell/Russell.
Anna Villa 2/28: Berd/Bird, Hannor/Hannon.
Arbour Hill 4/53: Linchan/Linehan, McCuley/McAuley, Danne/Dunne, O' Reill/O'Neill.
Ardgillan Demesne, Balrothery, Blackhall, Bohill 0/41: -
Bridge Lane, Bridge Street, Chapel Lane, Clonard Street (Balbriggan) 0/111: -
Ballyboghill, Barnanstown, Clonswords 0/8: -
Bellinstown 1/5: Carly/Early.
Ballinclea, Ballyogan 0/4: -
Brennanstown 3/28: Weniman/Wiseman, I Timoney/Timoney, Bim/Pim.
Cabinteely Town 0/28: -
Balgaddy 0/3: -
Balscadden 2/28: Dardes/Dardis, Largan/Langan.
Dermotstown, Grange 0/14: -
Abbey Avenue, Alma Place (Blackrock) 0/17: -
Alma Road (Blackrock) 1/25: Piere/Pim.
Anglesea Avenue (Blackrock) 1/18: Mauley/Manley.
Ballycoolan, Bay, Belgree 0/17: -
Blanchardstown 3/47: Fannelly/Fennelly, McAler/Meaher, Carcoran/Corcoran.
Annfield 0/1: -
Ashtown (part) 1/36: Mannian/Mannion.
Astagob North 1/37: Seully/Scully.
Carpenterstown 0/15: -
Lower Dominick Street 6/156: Carrall/Carroll, Frances/O'Connor, Kevins/Kerins, Garnon/Gannon, -/McElroy, -/Carbery (last two families omitted entirely).
Ardilaun Terrace 0/8: -
Bath Lane 1/5: McAuley McGally/McAuley.
Belvedere Avenue 1/45: Inkerton/Torkenton.
Belvedere Lane 0/5: -
Belvedere Place 4/53: Joyer/Joyce, Kannan/Hannon + O'Kara/O'Hara, Calclough/Colclough + Kogan/Hogan + Lyans/Lyons, Daney/Darcy.
Belvedere Road 1/45: McKeanon/McKeown.
Alexandra Terrace 0/5: -
Arbutus Place 1/22: Glasset/Glasser.
Arnott Street 2/46: Relly/Kelly, Burgees/Burgess.
Arthurs Lane 1/36: I Eane/Keane.
Finding: A total of 39 out of a sample of 1,006 returns of the 1901 Census for Dublin (total individual entries 439,915) were found to have errors and omissions in the online database, which represents a sample error rate of 3.88%.
(B) 1901 Census Kerry: Location, returns misread/ approximate total returns: incorrect/correct versions of surnames.
Abbeydorney 1/34: Caffey/Coffey.
Aulanebane, Aulaneduff, Ballysheen 0/15: -
Barleumount (recte Barleymount) East, Barleymount Middle, Barleymount West 0/22: -
Crohane 1/9: Butten/Butler.
Aglish 0/7: -
Ballytrasna 1/5: Leynes/Lyne.
Coolbane, Coolree East 0/14: -
Ahane 1/6: Doneen/Dineen.
Arabela 0/2: -
Ash-hill 1/12: Prindble/Prenderville.
Ballyaukeen 0/5: -
Ardabawn, Caherfealane, Cloonearagh, Corkaboy 0/44: -
Ardagh 0/26: -
Ardoughter 1/62: Mellihan/Rellihan.
Clasmelcon 2/40: Hallison/Halleron, Hallovan/Halloran.
Cloghane 1/43: McCorthy/McCarthy.
Abbey Street Lower (Tralee) 3/49: Guerine/Guerin, Lucia/Lucid, Denson/Dineen.
Abbey Street Upper 1/41: Rounlan/Quinlan.
Back Lane 0/2: -
Ballyard (part) 0/1: -
Ballymullen 1/41: Toggin/Goggin.
Finding: A total of 14 out of a sample of 506 returns of the 1901 Census for Kerry (total individual entries 165,940) were found to have errors and omissions in the online database, which represents a sample error rate of 2.77%.
(C) 1901 Census Wicklow: Location, returns misread/ approximate total returns: incorrect/correct versions of surnames.
Aghowle Lower 2/20: Shyhes/Hughes, Whilan/Whelan.
Aghowle Upper 0/12: -
Barnacastle 0/1: -
Boley 1/14: Fyllan/Tynan + Hauly/Hanly.
Altidore 0/3: -
Ballinasottia 1/1: Willaghley/Willoughby + Willoughley/Willoughby.
Carrigower 1/13: Mc Cane/McCaul.
Drumbawn 1/4: Garoletto/Gavoletto.
Askintinny, Ballinabanoge, Ballinagore 0/18: -
Ballinasilloge 1/10: Buttler/Butler (deleted).
Abbey Lands, Abbey Lane (Arklow) 0/13: -
Abbey Street 1/19: Ivery/Ivory.
Back Street 2/44: Raearon/Kearon, Rearon/Kearon.
Aughrim Lower 1/18: Merragh/Mernagh.
Aughrim Town, Aughrim Upper, Balleeshall 0/59: -
Bahana (King), Bahana (Whaley), Ballard, Ballinaclash 0/30: -
Aghavannah (Ram) 1/9: Turtin/Turpin.
Ballinacor, Ballinagappoge, Ballinanty 0/22: -
Abbeyview Terrace (Bray) 0/1: -
Ardee Street 1/21: Haydin/Hayden.
Back Street 6/61: Bunnan/Brennan + Mc Evay/McEvoy, Copper/Cooper, Darey/Darcy, Dayle/Doyle, William/Keenan, Micheel/Douse.
Bray Commons, Bray Commons Road, Bruces Place, Bruces Terrace, Carrs Cottages 0/32: -
Castle Street 1/57: Cliften/Clifton.
Finding: A total of 21 out of a sample of 501 returns of the 1901 Census for Wicklow (total individual entries 59,906) were found to have errors and omissions in the online database, which represents a sample error rate of 4.19%.
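The three sample error rates stated in the findings follow from dividing the number of returns with errors by the number of returns sampled. A minimal Python sketch of that arithmetic is given here, using only the counts stated in the findings above; the names `SAMPLES` and `error_rate` are illustrative and form no part of the original analysis.

```python
# (returns with errors, returns sampled), as stated in findings (A), (B), (C).
SAMPLES = {
    "Dublin": (39, 1006),
    "Kerry": (14, 506),
    "Wicklow": (21, 501),
}


def error_rate(errors: int, returns: int) -> float:
    """Return the sample error rate as a percentage, to two decimal places."""
    return round(100 * errors / returns, 2)


for county, (errors, returns) in SAMPLES.items():
    print(f"{county}: {errors}/{returns} = {error_rate(errors, returns)}%")
```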
As noted above, much of the work of preparing the 1901 and 1911 Census returns has been subcontracted by Library and Archives Canada to AEL Data of India, which firm has a substantial number of Irish clients. On its website AEL Data commits itself to a remarkably high standard of indexing accuracy, namely, at least 99.9% for transcription as opposed to OCR work, defining an error as 'inclusion or omission of a space, inclusion or omission of a character or a comma shown as a full stop, etc', so that a 'five character word omitted by the keyboard process counts as five errors' (http://www.aeldata.com/digitization/indexing_data_capture.html, accessed 14 June 2010). If this superhuman standard of accuracy were applied in evaluating the samples above, which were confined to mistakes in transcribing surnames, the error rates of course would be a multiple of those listed by the writer. However, it must be stressed that the National Archives has not provided the writer with enough information to state definitively whether AEL Data as opposed to Library and Archives Canada is responsible for the particular work analysed above (the latter body but not the former is still listed among the project partners).
The present writer has made a not entirely unserious comparison between the claimed accuracy rates for the National Archives census project and the official adult literacy rate in Kazakhstan, which is said to stand at 99.8% (http://www.unicef.org/infobycountry/kazakhstan_background.html, accessed 14 June 2010). With respect, a plain and simple listing of incorrect and correct sample entries as above is a transparent and reliable way to determine error rates. While the sample sizes are admittedly small, reflecting the fact that the writer is working in an unpaid and voluntary capacity, the fact that the error rates for the three counties chosen, Dublin, Kerry and Wicklow, are in the range of approximately 3-4% would indicate that they are statistically plausible, and they certainly give the lie to any attempt to put forward an error rate of less than 1%. To be fair, even allowing for the exclusion of 'O' and 'Mac' problems, the sample error rate in the 1901 Census online appears to be somewhat lower than in the 1911 Census, so that perhaps something has been learned and transcription and indexing methods improved. However, it must be stated again that in digitisation projects of this kind an error rate in excess of 1% is unacceptable, and the sample error rates of approximately 3-4% found in our survey here give cause for concern. It may be of interest to note that a highly critical report by the Comptroller and Auditor General found in 1996 that 3.7% of a sample of the Irish Genealogical Project's database records contained mistakes, 'over eleven times the rate permitted under the quality control procedures set for the project' (http://audgen.irlgov.ie/viewdoc.asp?DocID=546&StartDate=1+January+2007, accessed 28 June 2010).
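The question of whether samples of this size can support the statement that the true error rate exceeds 1% can be examined with a confidence interval. What follows is a minimal Python sketch using the Wilson score interval at the 95% level, a technique which is not used elsewhere in this report and is introduced here purely for illustration. It assumes that the sampled returns behave like a simple random sample, which the selection of locations for this survey does not guarantee, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(errors: int, returns: int, z: float = 1.96):
    """Return (lower, upper) Wilson score bounds for a proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    p = errors / returns
    denom = 1 + z ** 2 / returns
    centre = (p + z ** 2 / (2 * returns)) / denom
    half = z * math.sqrt(p * (1 - p) / returns
                         + z ** 2 / (4 * returns ** 2)) / denom
    return centre - half, centre + half


# Counts as stated in the findings for each county.
for county, errors, returns in (("Dublin", 39, 1006),
                                ("Kerry", 14, 506),
                                ("Wicklow", 21, 501)):
    lower, upper = wilson_interval(errors, returns)
    print(f"{county}: {100 * lower:.2f}% to {100 * upper:.2f}%")
```

Under the stated assumption, the lower bound for each of the three counties remains above 1%, which is consistent with the contention that an error rate of less than 1% cannot be sustained on these samples.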
While he stands over the quality of his work, which has been voluntary and has occupied hundreds of hours of his time over the past few years, the writer admits that by himself he cannot undertake a rigorous analysis of the quality and value for money of the National Archives online census project. As the National Archives now ignores correspondence and is apparently incapable of completing FOI requests, the present report is being copied to the Comptroller and Auditor General and the Public Accounts Committee for attention, and any responses will be noted here in due course.
In summary, the conclusions of the present report are as follows:
(1) The respective roles of and the sums of money paid to Library and Archives Canada and AEL Data in respect of the National Archives online census project should be clarified.
(2) The planning, budget and management of the National Archives online census project should be examined, with particular reference to the fact that it took five rather than three years to complete as first announced.
(3) There is a need for the National Archives to explain why it insisted on proceeding with its own 1901 Census digitisation project when the Genealogical Society of Utah revealed that it was completing this work at no cost to the state.
(4) There is also a need to explain why a costly €4 million digitisation project should have been undertaken when the National Archives was suffering from space and resource problems as described above.
(5) The National Archives should not itself manage major digitisation projects in the future but should license them with specific conditions to commercial firms and voluntary groups, as is the general practice in countries such as the United States and Great Britain.
(6) The true error rate of the National Archives census digitisation project - certainly well in excess of 1% as has been demonstrated above - should be established and procedures put in train to correct misreadings of records.
Sean J Murphy MA
Centre for Irish Genealogical and Historical Studies
28 June 2010