{"id":39253,"date":"2020-02-11T10:49:06","date_gmt":"2020-02-11T14:49:06","guid":{"rendered":"http:\/\/libapps.libraries.uc.edu\/liblog\/?p=39253"},"modified":"2020-02-11T10:53:19","modified_gmt":"2020-02-11T14:53:19","slug":"preserving-university-websites-through-web-archiving","status":"publish","type":"post","link":"https:\/\/libapps.libraries.uc.edu\/liblog\/2020\/02\/preserving-university-websites-through-web-archiving\/","title":{"rendered":"Preserving University Websites through Web Archiving"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-39263 aligncenter\" src=\"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-content\/uploads\/2020\/02\/LiBlog_WebArchiving_2020-02-11-850x478.jpg\" alt=\"Preserving University Websites through Web Archiving\" width=\"850\" height=\"478\" \/>All of us have clicked on a link only to receive an error, such as a <a href=\"https:\/\/en.wikipedia.org\/wiki\/HTTP_404\">404 notice<\/a>. Web pages are notoriously fragile documents, and many of the web resources we take for granted are at risk of disappearing. In one case study, archivists collecting tweets with hashtags related to the Charlie Hebdo attacks in Paris found that just a few months later, <a href=\"https:\/\/inkdroid.org\/2015\/04\/14\/tweets-and-deletes\/\">between 7 and 10% of those tweets<\/a> had been deleted. The average life span of a web page is between <a href=\"https:\/\/blog.archive.org\/2013\/10\/25\/fixing-broken-links\/\">44 and 100 days<\/a>. 
And even if you think we won\u2019t lose much in the long run by failing to preserve every website of interest, \u201clink rot\u201d is a serious problem: <a href=\"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/15A59548BF9882B06D3064DA7E290859\/S1472669614000255a.pdf\/div-class-title-perma-scoping-and-addressing-the-problem-of-link-and-reference-rot-in-legal-citations-div.pdf\">half of all URLs cited in Supreme Court opinions<\/a> are now dead.<\/p>\n<p>Web archiving counters this obsolescence through thoughtful planning and curation of organizationally and historically valuable web content. Most web archiving today takes place through third-party services such as Webrecorder or Archive-It. To archive a website, you supply its URL and tell the web archiving service what you want to capture and how many links beyond the main page you want it to \u201ccrawl\u201d. Web archiving is not simply \u201csaving as a PDF\u201d or taking a screenshot of a website. Because most modern websites are dynamic, change rapidly, and include embedded media and interactive features, capturing a site so that users can still interact with the archived copy requires creating a WARC (Web ARChive) file. WARC is a standardized file format for storing archived web content, and it is the format that web archiving services produce. 
Implementing a web archiving service addresses several critical archives and records management use cases.<\/p>\n<p>A web archiving subscription service such as Archive-It combines a preservation tool and a collection development tool: the archivist can use the service to \u201ccrawl\u201d a website and create a WARC file, and can then make those captures available to the public for research through the Internet Archive\u2019s Archive-It website. Archive-It is currently used by more than 60 ARL research libraries.<\/p>\n<p>At the Archives and Rare Books Library, we have begun using Archive-It to preserve <a href=\"https:\/\/archive-it.org\/collections\/13197\">important university websites<\/a>. We\u2019re just getting started, but so far we are prioritizing the websites of the Board of Trustees, the President, and the Provost. All of these sites host minutes, reports, and other documents that are important to retain in the university archives. We are also capturing copies of \u201cendangered\u201d websites on the uc.edu domain: sites that may go offline in the near future but that contain important university history (you can see an <a href=\"https:\/\/wayback.archive-it.org\/13197\/20191217171952\/http:\/\/homepages.uc.edu\/~huffwd\/Department_History\/History.html\">example here<\/a>).<\/p>\n<p>Down the road, we\u2019ll be expanding last year\u2019s pilot project to collect websites from student organizations, filling some of the gaps in our archival record of the student experience. You can see our <a href=\"https:\/\/webrecorder.io\/uc_arb\">pilot project collection here<\/a>.<\/p>\n<p>Do you have any ideas about important university websites that ought to be crawled? 
If so, contact Eira Tansey, ARB\u2019s Digital Archivist, by email at eira.tansey@uc.edu or by phone at 513-556-1958.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>All of us have experienced clicking on a link and receiving an error, or 404 notice. Web pages are notoriously fragile documents, and many of the web resources we take for granted are at risk of disappearing. In one case &hellip; <a href=\"https:\/\/libapps.libraries.uc.edu\/liblog\/2020\/02\/preserving-university-websites-through-web-archiving\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[1180],"class_list":["post-39253","post","type-post","status-publish","format-standard","hentry","category-arb","tag-digital-archives"],"_links":{"self":[{"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/posts\/39253","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/comments?post=39253"}],"version-history":[{"count":0,"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/posts\/39253\/revisions"}],"wp:attachment":[{"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/media?parent=39253"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/categories?post=39253"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/libapps.libraries.uc.edu\/liblog\/wp-json\/wp\/v2\/tags?post=39253"}],"curies":
[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}