Documenting COVID-19 in University Archives

At the Archives and Rare Books Library, we recently began using Archive-It to preserve important university websites. The average life span of a webpage is between 44 and 100 days. Web pages are notoriously fragile documents, and many of the web resources we take for granted are at risk of disappearing.

As the COVID-19 pandemic unfolds, we are using Archive-It to capture various UC domain webpages dedicated to the pandemic’s impact on the university community. This kind of rapid-response web archiving will ensure we preserve a historical record of this monumental event at UC for future researchers. You can currently view the UC COVID-19 website archive, which is updated daily.

So far, we have collected several gigabytes of data and more than 20 websites, including each college’s COVID-19 page. Since some pages update more frequently than others, we schedule crawls (i.e., the process of archiving a webpage) of pages like https://www.uc.edu/publichealth.html more often in order to capture all of the changes.

The Archives and Rare Books Library is not the only archival repository documenting the experience of COVID-19. Dozens of other institutions, including many other Ohio college and university archives, are also collecting and preserving records of this fast-moving event. One of the largest COVID-19 collections so far is a collaboration between the International Internet Preservation Consortium and Archive-It, which has now collected more than 2,763 websites in 30 languages about the worldwide response to the pandemic.

Over the last several years, there has been growing interest within the archives profession in developing ethical frameworks for documenting crises. In response, the Society of American Archivists created a Tragedy Response Initiative Task Force that has developed a comprehensive set of guidelines based on archivists’ professional ethics and values. Previous online archiving projects documenting crises and traumatic events include the September 11 Digital Archive, Hurricane Katrina Digital Memory Bank, and Documenting Ferguson. Given the global reach of COVID-19 and the advances in web archiving and digital projects, the pandemic is likely to become one of the most well-documented global events in recent history.

Would you like to suggest a website that we should include in our COVID-19 UC web archive? Please email us with new UC sites to preserve. Please note that at this time, we are only crawling public-facing webpages directly related to the UC community of students, faculty, staff, and alumni.

Preserving University Websites through Web Archiving

All of us have experienced clicking on a link and receiving an error, or 404 notice. Web pages are notoriously fragile documents, and many of the web resources we take for granted are at risk of disappearing. In one case study, archivists who were preserving the hashtags related to the Charlie Hebdo attacks in Paris found that just a few months later, between 7% and 10% of tweets had been deleted. The average life span of a webpage is between 44 and 100 days. Even if losing a few websites of interest seems harmless in the long run, “link rot” is a serious problem: half of all URLs in Supreme Court opinion citations are now dead.

Web archiving overcomes these issues of obsolescence through thoughtful planning and curation of organizationally and historically valuable web content. Most web archiving today takes place through third-party services such as WebRecorder or Archive-It. To archive a website, you supply its URL and give the web archiving service instructions about what you want to capture, including how many links below the main page you want the service to “crawl”. Web archiving is not simply “saving as a PDF” or taking a screenshot of a website. Because most modern websites are dynamic, with embedded media, interactive options, and rapidly changing content, capturing a website so that a user can interact with the archived copy requires creating a WARC file. WARC is a standardized file format for storing archival web content, and web archiving services create these files as they crawl. Implementing web archiving services addresses several critical archives and records use cases.
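To make the idea of crawling “a certain number of links below the main page” concrete, here is a minimal sketch in Python. It is not how Archive-It or WebRecorder work internally. The site map, the example.edu URLs, and the function name `pages_in_scope` are all hypothetical, and a real crawler would fetch each page over the network and write what it finds to a WARC file rather than read from a hard-coded dictionary.

```python
from collections import deque

# Hypothetical site map: each page lists the pages it links to.
# A real crawler would fetch each page and parse its links instead.
SITE_LINKS = {
    "https://example.edu/": ["https://example.edu/about", "https://example.edu/news"],
    "https://example.edu/about": ["https://example.edu/about/history"],
    "https://example.edu/news": ["https://example.edu/news/2020"],
    "https://example.edu/about/history": [],
    "https://example.edu/news/2020": ["https://example.edu/news/2020/march"],
    "https://example.edu/news/2020/march": [],
}


def pages_in_scope(seed_url, max_depth, links=SITE_LINKS):
    """Return the pages reachable from seed_url within max_depth link hops."""
    seen = {seed_url}
    queue = deque([(seed_url, 0)])
    in_scope = []
    while queue:
        url, depth = queue.popleft()
        in_scope.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the requested depth
        for linked_url in links.get(url, []):
            if linked_url not in seen:
                seen.add(linked_url)
                queue.append((linked_url, depth + 1))
    return in_scope


# Depth 0 captures only the seed page; depth 1 adds the pages it links to.
for page in pages_in_scope("https://example.edu/", max_depth=1):
    print(page)
```

Raising `max_depth` widens the capture, which is why the depth setting has such a large effect on how much data a crawl collects.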

A web archiving subscription service such as Archive-It offers both a preservation tool and collection development tool in one: the archivist can use the service to “crawl” a website in order to create a WARC file, and then the service also allows the archivist to present these resources to the public for research through the Internet Archive’s Archive-It website, which is currently used by over 60 ARL research libraries.

At the Archives and Rare Books Library, we have started using Archive-It to preserve important university websites. We’re just getting started, but so far we are prioritizing the websites for the Board of Trustees, President, and Provost. All of these websites host minutes, reports, documents, and other information that is important to retain for university archives. We are also capturing copies of “endangered” websites on the uc.edu domain: websites that may be going offline in the near future, but which have important university history embedded in them (you can see an example here).

Down the road, we’ll be expanding last year’s pilot project to collect websites from student organizations in order to fill in some of our archival gaps reflecting the student experience. You can see our pilot project collection here.

Do you have any ideas about important university websites that ought to be crawled? If so, contact Eira Tansey, ARB’s Digital Archivist, via email at eira.tansey@uc.edu or by phone at 513-556-1958.

 

Behind the Scenes with UC’s Digital Archivist: Finding the Needle in the Haystack

By Eira Tansey, Digital Archivist/Records Manager

A constant challenge for digital archivists is identifying potentially sensitive material within born-digital archives. This content may be information that fits a known pattern (for example, a number grouped as 3-2-4 digits, which likely indicates a Social Security number), or sensitive keywords that point to a larger body of sensitive information (for example, the keywords “evaluation” and “candidate” in close proximity to each other may indicate an evaluation form for a job applicant).
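As a rough illustration of these two kinds of screening, the Python sketch below looks for 3-2-4 digit patterns and for two keywords appearing near each other. It is a simplified example, not one of the screening tools digital archivists use in practice. The function names, the 10-word window, and the sample text are assumptions made for illustration, and a pattern match only flags text for human review.

```python
import re

# Assumed pattern: three digits, two digits, four digits, separated by hyphens.
# A match is only a candidate; many 3-2-4 numbers are not Social Security numbers.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text):
    """Return every substring of text that fits the 3-2-4 digit pattern."""
    return SSN_LIKE.findall(text)


def keywords_near(text, first, second, window=10):
    """Return True if both keywords occur within `window` words of each other."""
    words = re.findall(r"[a-z']+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(
        abs(i - j) <= window
        for i in first_positions
        for j in second_positions
    )


# Made-up sample text for demonstration only.
sample = "Evaluation of the candidate: SSN 123-45-6789 on file."
print(find_ssn_like(sample))
print(keywords_near(sample, "evaluation", "candidate"))
```

The proximity window is a trade-off: a small window misses keywords that sit in different parts of a form, while a large one flags unrelated documents.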

Digital archivists use a number of tools to screen for potentially sensitive information. When this information is found, depending on the type of information, institutional policy, legal restrictions, and ethical issues, archivists may redact the information, destroy it, or limit access to it (either by user, or according to a certain period of time).

Behind the Scenes with UC’s Digital Archivist: Making Sense of It All

By Eira Tansey, Digital Archivist/Records Manager

When archivists first make contact with a large group of records, they often perform some form of appraisal. You might think of appraisal as being the calling card of the much-loved PBS television show Antiques Roadshow, in which average people realize that Great Aunt Milly’s painting is a valued masterpiece – or a total dud.

Unlike appraisers, when archivists appraise something they generally aren’t assigning a monetary value, but seeking to articulate the value of the records and the information they contain. The Society of American Archivists defines (http://www2.archivists.org/glossary/terms/a/appraisal#.V2hA1jXERmM) appraisal as:

  1. The process of identifying materials offered to an archives that have sufficient value to be accessioned.
  2. The process of determining the length of time records should be retained, based on legal requirements and on their current and potential usefulness.
  3. The process of determining the market value of an item; monetary appraisal.

Behind the Scenes with UC’s Digital Archivist: Much Ado About Digital

By Eira Tansey, Digital Archivist/Records Manager

Within the archives profession, “Digital Archivist” is one of the fastest-growing job titles (http://digitalcommons.kennesaw.edu/provenance/vol31/iss2/5/). The Society of American Archivists offers a Digital Archives Specialist curriculum and certificate (www2.archivists.org/prof-education/das). Library and archives conferences on electronic and digital topics abound, such as Saving The Web (https://www.loc.gov/loc/kluge/news/save-web-2016.html), the Digital Library Federation (https://www.diglib.org/), and the Software Preservation Network Forum (http://www.softwarepreservationnetwork.org/spn-forum/).

So what does a digital archivist do? Every digital archivist’s responsibilities will look slightly different depending on institutional mission, priorities, and resources. As the first link indicates, there isn’t even professional consensus on whether a digital archivist is someone who works with digitization of analog material (like paper documents and manuscripts, rare books, maps, etc.), or someone who works with “born-digital” materials. In many institutions, both of those responsibilities may be within the Digital Archivist’s charge. As UC’s Digital Archivist/Records Manager, my responsibilities center on working with born-digital archives, digital preservation, and overseeing UC’s Records Management program. I also work closely with my colleagues in Digital Collections on digitization projects (http://digital.libraries.uc.edu/).
