Here today, gone tomorrow: A case study on the necessity for a more rigorous approach to the preservation of online Irish cultural and political heritage.

Presented at the ‘Making Ireland’ Research Theme: 2016 Conference Series –

Institutions and Ireland: Public Cultures

at Trinity College Dublin – Long Room Hub, on 27 October 2016.

Abstract:

Following on the heels of other western societies in pursuing a radical reform of copyright for the digital age, Richard Bruton, who was the Minister for Jobs, Enterprise and Innovation at the time, established the Copyright Review Committee (CRC) in May 2011. While there were several tasks set for the CRC, one of its main functions was to examine the current state of national copyright legislation and to “identify any areas that are perceived to create barriers to innovation” (Modernising Copyright, 2013: 8). The CRC subsequently produced the report Modernising Copyright in October 2013, which offered modern solutions to Ireland’s outdated copyright laws. Yet, to date, the Irish government has failed to introduce up-to-date legislation based on the recommendations of the CRC report. This paper is concerned with the CRC recommendations for the introduction of digital legal deposit to current legal deposit institutions, and further to this, that such institutions should be permitted to “make copies of our online digital heritage by reproducing any work that is made available in the State through the internet” (Modernising Copyright, 2013: 14). It is our duty to ensure that future generations of Irish society have access to accurate and unimpeded accounts of their historical past. By means of a comprehensive analysis of link rot in current Irish government departmental websites, this paper presents a case study to demonstrate the necessity for a more rigorous approach to the preservation of online cultural and political heritage.

Conference Slides @ SlideShare

Here today, gone tomorrow: A case study on the necessity for a more rigorous approach to the preservation of online Irish cultural and political heritage. By S.C. Healy

Conference Talk:

This presentation is concerned with the concept of legal deposit in the digital age, and the need for up-to-date legislation in Ireland to ensure the current and future preservation of electronic publications and online cultural and political heritage. It will present two case studies on link rot analysis to support this claim.

slide2

The concept of Legal Deposit for print publications originated in France in 1537, with similar schemes adopted in the UK (1610) and Austria (1624); by 1702, the concept was in operation in Poland, Sweden, Denmark and Finland (IPA, 2014). While the collection and safekeeping of a country’s publications was a core aim, in the earlier centuries legal deposit also played a role in enforcing censorship and in curbing subversive publications against a government or monarchy (IPA, 2014).

Today, many countries in the world operate a legal deposit scheme, which in general mandates that those who produce print publications are legally required to deposit a copy in a nominated institution, often designated as a national library. In terms of national cultural heritage, legal deposit serves as a system to compile, maintain and provide access to a comprehensive collection and bibliographic record of a country’s output of print publications.

slide3

Due to developments in computer technology, information communication has changed dramatically over the past fifty years, and as a consequence, the concept of electronic publishing was realised. In the 1960s, electronic publishing referred to the use of computers to produce print publications using word processing, typesetting, or mark-up tools. In the 1970s, the first example of an electronic journal was distributed as “a computer readable archival file” and “in the form of computer-output microfiche” (Lancaster, 1995: 520). In the 1980s, experiments with internet journals emerged, and email technology allowed for the distribution of e-publications via mailing lists, though only in plain text format (Lancaster, 1995: 520; Pettenati, 2001: 1). The development of the CD-ROM offered an effective, low-cost solution for e-publishing and allowed for good quality graphics and images (Pettenati, 2001: 1), but it was slowly replaced as a medium following the development of the World Wide Web.

Today, an electronic or digital publication can refer to:

  • an online version of a printed publication, a digitised version of a written or printed document, a CD-ROM version of a printed publication;
  • an original electronic or online publication where there is no print parallel such as webpages, blogs, e-zines, e-papers, e-newsletters, interactive CD-ROM;
  • digital scholarly editions, podcasts, interactive databases containing bibliographies, statistics, spatial data etc. (UNESCO, 2000; Taylor, 2013)

Since the 1990s, many western societies have undertaken a review of their copyright, heritage, and legal deposit laws in response to developments in electronic and online publishing, and many have implemented reform to account for legal deposit in the digital age. Many of these countries have also initiated national web archiving programmes covering election campaigns, thematic collections of online social and cultural heritage, and even the web archiving of their national domain.

slide5

Legislation that directs the model for legal deposit in Ireland can be found in the Copyright and Related Rights Act, 2000 (CRRA), and by association in the National Cultural Institutions Act, 1997 (NCIA), and the Heritage Fund Act, 2001 (HFA).

In Section 198 (11) of the Copyright and Related Rights Act, 2000, there is a mention of electronic publications which allows for legal deposit institutions to receive electronic copies of books. However, the current Copyright Act contains no specific provision for the legal deposit of born-digital publications or for the archiving of born-digital documents such as government websites.

slide6

Recognising the need for such measures, both Trinity College Dublin Library (TCD Library) and the National Library of Ireland (NLI), as nominated legal deposit libraries in Ireland, have instigated different schemes to accommodate the collection of electronic and born-digital publications.

TCD Library operates a voluntary electronic deposit scheme in Ireland through its resource edepositIreland. TCD Library is also a nominated legal deposit institution for the UK, and following the enactment of the Legal Deposit Libraries (Non-Print Works) Regulations 2013 (UK), the library is entitled to request and collect UK-published e-journals, e-books, and other types of non-print publications such as websites and blogs. While the library does not “harvest or store non-print works directly”, it “provides access to all digital content via a technical infrastructure established by the UK national libraries” (TCD Library, 2014: 4; 7).

The NLI began a web archiving initiative in 2011 which coincided with the 2011 General Election. Since then, the NLI Web Archive has taken great strides to secure a voluntary thematic web archiving programme for the capture of Irish online social, cultural and political heritage, with recent efforts to capture websites for the 2016 commemorations.

slide7

In May 2011, Richard Bruton, who was the Minister for Jobs, Enterprise, and Innovation at the time, established the Copyright Review Committee (CRC) with a view to following the example set by other countries in Europe in updating their Copyright, Legal Deposit, and Heritage legislation to reflect the digital age we live in. The CRC subsequently produced the report Modernising Copyright in October 2013 which offered modern solutions to Ireland’s outdated copyright laws. Of particular interest are the CRC recommendations for the introduction of digital legal deposit to current legal deposit institutions, and further to this, that such institutions should be permitted to “make copies of our online digital heritage by reproducing any work that is made available in the State through the internet” (Modernising Copyright, 2013: 14). Yet, to date, the Irish government has failed to introduce up-to-date legislation based on the recommendations of the CRC report.

slide8

The rationale for this research came about by accident while reading the CRC Report, Modernising Copyright. I had an interest in a citation on page 80, which was referenced in the footnotes with a web link, associated with the website of the Department of Jobs, Enterprise and Innovation.

However, upon clicking on the link in Footnote 98, I was redirected to a different URL: http://www.djei.ie/press/2012/20121023.htm goes to https://www.djei.ie/en/News-And-Events/

slide9

The redirect goes to the News and Events page on the website of the Department of Jobs, Enterprise and Innovation. Fortunately, the link indicates that the reference was related to a press item, and had a date of 23 October 2012.

As the current Department News page on the DJEI website provides a search feature and a category box with hyperlinks for years and months, it seemed relatively straightforward to find the item, and thus, the source of the citation and reference could be checked.

slide10

However, clicking on the hyperlink for October 2012 reveals that there are no press items to display, and thus no record of the referenced press release on page 80 of the CRC Report.

slide11

Furthermore, there are no press releases available from May to December 2012 either, yet there are from January to April.

Was there no DJEI department news for 8 months in 2012?

This brings me on to the research questions for the case studies.

slide12

For the first case study, I was interested in establishing the occurrence of link rot in the CRC Report, Modernising Copyright (2013), through manual link rot analysis, and then assessing any findings that the analysis might reveal.

I chose this document due to its relevance in guiding forthcoming Irish legislation for digital legal deposit, and also in terms of its recommendations that heritage institutions should be able to make copies of our online digital heritage. Indeed, my interest was to find out if the document was a victim of its own recommendations.

For the second case study, I was curious to see the occurrence of link rot through computational link rot analysis for the websites of the Department of Jobs, Enterprise and Innovation (DJEI) and the Department of Arts, Heritage, Regional, Rural and Gaeltacht Affairs (DAHRRGA).

slide13

Link rot is a term used to indicate that a URL reference no longer provides direct access to the file or web page originally indicated. The HTTP ‘404 Not Found’ and ‘410 Gone’ errors are typical examples of this. Several studies on link rot have been conducted over the past 15 years, across different disciplines, from law to medicine. A seminal study by Lawrence et al. in 2001 proved to be foundational in the debate on whether formal research should include URL referencing due to issues with link rot. Indeed, link rot was one of the main reasons for the development of web archiving tools and techniques.
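As a rough illustration of how a link checker might turn HTTP status codes into link-health categories, the following Python sketch maps a code to a coarse label. The labels and the function name `classify_status` are assumptions made for this example; they are not taken from any tool used in this research.

```python
from http import HTTPStatus


def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse link-health label (labels are illustrative)."""
    if 200 <= code < 300:
        return "stable"
    if 300 <= code < 400:
        return "redirect"
    # 404 Not Found and 410 Gone are the typical signs of link rot.
    if code in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        return "broken"
    return "other"


for code in (200, 301, 404, 410, 500):
    print(code, HTTPStatus(code).phrase, "->", classify_status(code))
```

A "redirect" label on its own says nothing about whether the destination still holds the item being sought, which is why redirects need a second look.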

Link rot analysis can be conducted through the use of proprietary and open-source software and online tools, as well as by manually checking for broken links.

Pros and Cons: Software vs Manual link rot analysis

Commercial programmes are often expensive and computationally intensive, with convoluted interfaces. With computational methods, while a link may be deemed stable, there is no guarantee that the link directs to the actual page being sought.

Some free tools or trial versions of software:

  • set limits to the number of links that can be checked;
  • set limits to the number of pages that can be checked;
  • will only handle a check on a page by page basis.

Manual checking is:

  • more reliable for small websites, and for confirming that a link directs to the page being sought;
  • extremely time-consuming and open to human error in recording information when analysing large websites.
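One way to narrow the gap between the two approaches is to flag redirects that land on a different page from the one requested, so that only those need a manual check. The sketch below is a simple heuristic under stated assumptions: it compares host and path only, ignores the scheme and a leading `www.`, and the function name `redirect_loses_target` is hypothetical. The example URLs are the two from Footnote 98 discussed earlier.

```python
from urllib.parse import urlsplit


def _page_key(url: str) -> tuple:
    """Reduce a URL to (host, path), ignoring scheme, 'www.' and a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/").lower()


def redirect_loses_target(requested: str, final: str) -> bool:
    """Return True when the final URL is a different page from the one requested."""
    return _page_key(requested) != _page_key(final)


requested = "http://www.djei.ie/press/2012/20121023.htm"
final = "https://www.djei.ie/en/News-And-Events/"
print(redirect_loses_target(requested, final))
```

A flagged redirect is not necessarily broken, since a site may legitimately move a page to a new path; the flag only marks the link for human inspection.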

 

CASE STUDY 1

slide15

For the first case study, I created an Excel database to record, for each link contained in Modernising Copyright, the footnote, the reference link, whether the link was stable or broken, the type of error, any redirect link, and the country of origin of each broken link. Each link was then manually tested in two browsers, Microsoft Edge and Google Chrome, in September 2016 and re-tested four weeks later in October (testing period: 19/09/2016 to 20/10/2016).
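For anyone wishing to repeat this kind of manual survey without a spreadsheet programme, the record structure described above could be sketched in Python and saved as CSV. The field names below are assumptions chosen for this example, not the column headings of the original database, and the single sample row uses only the Footnote 98 details given earlier in this talk.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LinkRecord:
    """One row per referenced link; field names are illustrative."""
    footnote: int
    url: str
    status: str        # "stable" or "broken"
    error_type: str    # e.g. "404" or "redirect"
    redirect_url: str  # empty when there is no redirect
    country: str       # country of origin of the linked page


def to_csv(records) -> str:
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LinkRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


sample = LinkRecord(
    footnote=98,
    url="http://www.djei.ie/press/2012/20121023.htm",
    status="broken",
    error_type="redirect",
    redirect_url="https://www.djei.ie/en/News-And-Events/",
    country="Ireland",
)
print(to_csv([sample]))
```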

slide16

In the Modernising Copyright document, there are 100 footnotes with 101 referenced links. From the analysis, the document was found to have 19 broken links, amounting to roughly 20% link rot (19 of 101, or 18.8%), which is considerably high for a three-year-old government publication.
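The percentage is simply the number of broken links divided by the number of referenced links. As a minimal worked example in Python, using the counts reported above (the function name `link_rot_rate` is an assumption for illustration):

```python
def link_rot_rate(broken: int, total: int) -> float:
    """Return broken links as a percentage of all referenced links."""
    if total <= 0:
        raise ValueError("total must be a positive number of links")
    return 100 * broken / total


rate = link_rot_rate(19, 101)
print(f"{rate:.1f}% link rot")  # prints "18.8% link rot"
```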

In a further breakdown of the results, the origin of each web page with link rot was assessed.

slide17-1

Of interest here is the high number of broken links originating from Irish and UK domain web pages, which is of concern given the relevance of the document in guiding forthcoming Irish legislation.

slide18

On a deeper breakdown of the broken links originating from a UK domain, I found that 8 of those links belong to the website of the Gov.UK Intellectual Property Office, with one belonging to the UK Copyright Licensing Steering Group. While the one link for the Copyright Licensing Steering Group redirects to a different site altogether, the 8 links belonging to the Intellectual Property Office redirect to a page which informs the user that the website has changed, but that the original web information is available in the UK Government Web Archive.

slide19

The UK Government Web Archive has been archiving government websites and born-digital documents since 2005. So, if a government agency decides to update, or re-design their website, the older version is obtainable from the archive, and the redirects provide information on where to find the item being sought.

slide20

On the other hand, on a deeper breakdown of the broken links originating from an Irish domain, I found that the broken links were from the following destinations:

  • seven broken links from the DJEI website (one of these links redirects to a page where the item cannot be found);
  • one broken link from the Irish Statute Book online; and
  • one broken link from the Royal Irish Academy website.

slide21

Clicking on the broken links from the Irish domains returns a typical 404 ‘not found’ error. One link from the Department of Jobs, Enterprise and Innovation redirects instead; this is the press item from October 2012 which I discussed earlier, and it cannot be found on their website.

Finally, as a matter of curiosity, I assessed how many of the broken links from the Irish domain could be found in the NLI Web Archive.

slide22

The broken links from the Irish Statute Book and the Royal Irish Academy are not archived in the NLI Web Archive. However, 5 of the 7 broken DJEI links are archived there; these belong to the Copyright Review Committee web page on the old DJEI website. Someone at the NLI had the foresight to archive these pages.

slide23

Conclusion: Link Rot Analysis of Modernising Copyright, 2013

  • 19 broken links, roughly 20% link rot overall;
  • 9 broken links originate from an Irish domain and 9 from a UK domain;
  • 8 of the Irish domain broken links have no redirects, and 1 redirects to a page where the item is not available;
  • 8 of the UK domain broken links redirect to a page which indicates that the item has been archived and offers a link for access;
  • 5 of the Irish domain broken links are archived in the NLI Web Archive, though anyone unaware of this resource would not know to look there for the items being sought.

This research demonstrates that legal deposit institutions should be permitted to “make copies of our online digital heritage by reproducing any work that is made available in the State through the internet” (Modernising Copyright, 2013: 14). I would add that there should be a mandate for government departments to archive a web page when they decide to delete or move it, and indeed that government websites should be archived in their entirety when a government reshuffles departments and then needs to create a new website.

CASE STUDY 2

slide25

Xenu’s Link Sleuth is a free software programme created by Tilman Hausherr, a programmer in Berlin. It has a simple one-screen user interface with a ribbon for all its options, it can crawl over 50,000 links, and it takes up less space than commercial programmes. It also offers an option to open a broken link in the Wayback Machine, which is hugely beneficial for researchers. However, it will not check links that are dynamically generated with JavaScript, although for the purpose of this research this is not an issue.
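A Wayback Machine lookup of this kind can also be scripted. The Python sketch below uses the Internet Archive's public Wayback availability endpoint, which is a separate service from both Xenu and the NLI Web Archive and was not part of this study. The function names are assumptions for illustration, and `lookup` needs a network connection, so it is defined here but not called.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Internet Archive availability endpoint (not the NLI Web Archive).
API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str = "") -> str:
    """Build the query URL; timestamp is optional, e.g. 'YYYYMMDD'."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest archived snapshot URL from a response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timestamp: str = "") -> Optional[str]:
    """Ask the Wayback Machine whether a snapshot exists (requires network)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


print(availability_query("http://www.djei.ie/press/2012/20121023.htm", "20121023"))
```

A result of `None` only means that the Internet Archive holds no capture; the page may still exist in a national collection such as the NLI Web Archive.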

slide26

Xenu recommends a crawl of 70 threads deep for a site of this size; however, this requires a very strong internet connection and is computationally intensive. As this was an independent (unfinanced) research study by an intern, resources were limited. Thus, I opted for a 25 thread depth internal and external crawl of the DJEI website to ensure that the crawl would complete and produce a result.

slide27

From this crawl, Xenu reveals that there were 186 + 83 broken URLs (269 total) on the DJEI website from 24-25 October 2016.

slide28

Similarly, due to a lack of resources, I opted for a 25-30 thread depth crawl of the DAHRRGA website, but this time for internal links only, for a varied look at link rot within an Irish government departmental site.

This result shows 25% link rot, from 24-25 October 2016.

For this researcher, 25% link rot in an Irish government departmental website in the “Digital Age” is disconcerting, and indicative of the necessity for a more rigorous approach to the preservation of online political heritage. Moreover, this further reinforces the CRC recommendation that Irish legal deposit institutions should be permitted to “make copies of our online digital heritage by reproducing any work that is made available in the State through the internet” (Modernising Copyright, 2013: 14).

Final note on Link Rot in Government websites

It has been widely documented that government websites are prone to change, and even deletion, following an election, due to a change of government and/or a reshuffling of government departments. As a result, many web archiving initiatives began their endeavours with election campaigns and proceeded to capture government websites before and after an election. In Ireland, over the past 20 years, there has been a considerable amount of reshuffling of government departments concerned with the arts, heritage, and culture, and each reshuffle created a new website.

It is our duty to ensure that future generations of Irish society have access to accurate and unimpeded accounts of their historical past, and thus, move on from the notion of “Here Today – Gone Tomorrow” as is currently the case with some Irish government departmental websites.

Bibliography:

Software Credits:

Xenu’s Link Sleuth – Tilman Hausherr, Berlin, Germany  http://home.snafu.de/tilman/xenulink.html

Image Credits:

Royal Irish Academy “404” – Screenshot; NLI Web Archive – Screenshots; TCD Library E-deposit – Screenshot; Xenu’s Link Sleuth – Screenshots, DJEI website – Screenshots
