Documenting COVID-19 in University Archives

At the Archives and Rare Books Library, we recently began using Archive-It to preserve important university websites. The average life span of a webpage is between 44 and 100 days. Web pages are notoriously fragile documents, and many of the web resources we take for granted are at risk of disappearing.

As the COVID-19 pandemic unfolds, we are using Archive-It to capture various UC domain webpages dedicated to the pandemic’s impact on the university community. This kind of rapid response web archiving will ensure we preserve a historical record of this monumental event at UC for future researchers. You can currently view the UC COVID-19 website archive, which is being updated on a daily basis.

So far, we have collected several gigabytes of data from over 20 websites, including each college’s COVID-19 page. Since some pages update more frequently than others, we schedule crawls (i.e., the process of archiving a webpage) of pages like https://www.uc.edu/publichealth.html more often in order to capture all of the changes.

The Archives and Rare Books Library is not the only archival repository documenting the experience of COVID-19. Dozens of other institutions, including many other Ohio college and university archives, are also collecting and preserving this fast-moving event. One of the largest COVID-19 collections so far is a collaboration between the International Internet Preservation Consortium and Archive-It, which has now collected more than 2,763 websites in 30 languages about the worldwide response to the pandemic.

There has been growing interest over the last several years in developing ethical frameworks around documenting crises within the archives profession. In response, the Society of American Archivists created a Tragedy Response Initiative Task Force that has developed a comprehensive set of guidelines based on archivists’ professional ethics and values. Previous examples of online archiving projects of crises and traumatic events include the September 11 Digital Archive, Hurricane Katrina Digital Memory Bank, and Documenting Ferguson. Given the global reach of COVID-19 and the advances in web archiving and digital projects, the pandemic is likely to become one of the most well-documented global events in recent history.

Would you like to suggest a website that we should include in our COVID-19 UC web archive? Please email us with new UC sites to preserve. Please note that we are currently crawling only public-facing webpages directly related to the UC community of students, faculty, staff, and alumni.

Preserving University Websites through Web Archiving

All of us have experienced clicking on a link and receiving an error or 404 notice. Web pages are notoriously fragile documents, and many of the web resources we take for granted are at risk of disappearing. In one case study, archivists who were preserving the hashtags related to the Charlie Hebdo attacks in Paris found that just a few months later, between 7 and 10% of tweets had been deleted. The average life span of a webpage is between 44 and 100 days. And even if you think that we won’t lose much in the long run if some websites of interest go unpreserved, “link rot” is a big deal: half of all URLs in Supreme Court opinion citations are now dead.

Web archiving overcomes these issues of obsolescence through thoughtful planning and curation of organizationally and historically valuable web content. Most web archiving today takes place through third-party services such as WebRecorder or Archive-It. To archive a website, you supply the URL of the website and give the web archiving service instructions about what you want to capture, and how many links below the main page you want the service to “crawl”. Web archiving is not simply “saving as a PDF” or taking a screenshot of a website. Because most modern websites are dynamic, with embedded media, interactive options, and rapidly changing content, adequately capturing a website so that a user can interact with the archival copy requires creating a WARC file. WARC is a standardized file format for archival web content, and it is the format that web archiving services create. Implementing web archiving services addresses several critical archives and records use cases.
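
To make the idea of crawling “how many links below the main page” more concrete, here is a minimal sketch in Python. It is not how Archive-It or WebRecorder work internally; it simply walks a made-up, in-memory map of links (the example.edu URLs and the pages_in_scope function are hypothetical) and lists which pages fall within a given number of link hops from the starting URL.

```python
from collections import deque

# Hypothetical link map standing in for a live website: page -> pages it links to.
SITE = {
    "https://example.edu/": ["https://example.edu/news", "https://example.edu/about"],
    "https://example.edu/news": ["https://example.edu/news/story-1"],
    "https://example.edu/about": [],
    "https://example.edu/news/story-1": ["https://example.edu/news/story-1/photos"],
    "https://example.edu/news/story-1/photos": [],
}


def pages_in_scope(seed, max_depth, links=SITE):
    """Return the pages reachable from the seed URL within max_depth link hops."""
    seen = {seed}
    queue = deque([(seed, 0)])
    in_scope = []
    while queue:
        url, depth = queue.popleft()
        in_scope.append(url)
        if depth == max_depth:
            continue  # do not follow links any further below this page
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return in_scope


# One hop below the main page: the main page plus the pages it links to directly.
print(pages_in_scope("https://example.edu/", 1))
```

A real crawl also has to fetch each page, capture its embedded media, and write the results to a WARC file, all of which this sketch leaves out.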

A web archiving subscription service such as Archive-It offers both a preservation tool and collection development tool in one: the archivist can use the service to “crawl” a website in order to create a WARC file, and then the service also allows the archivist to present these resources to the public for research through the Internet Archive’s Archive-It website, which is currently used by over 60 ARL research libraries.

At the Archives and Rare Books Library, we have started using Archive-It to preserve important university websites. We’re just getting started, but so far we are prioritizing the websites for the Board of Trustees, President, and Provost. All of these websites host important minutes, reports, documents, and other information that is important to retain for university archives. We are also capturing copies of “endangered” websites on the uc.edu domain – websites that may be going offline in the near future, but which have important university history embedded in them (you can see an example here).

Down the road, we’ll be expanding last year’s pilot project to collect websites from student organizations in order to fill in some of our archival gaps reflecting the student experience. You can see our pilot project collection here.

Do you have any ideas about important university websites that ought to be crawled? If so, contact Eira Tansey, ARB’s Digital Archivist via email at eira.tansey@uc.edu or 513-556-1958.

 

A needle in a PDF haystack

One of my research areas is examining the role of recordkeeping and documentation in environmental regulations. A research tactic I frequently use is to sift through hundreds of PDFs at once. Large numbers of PDFs are posted on many environmental regulatory websites, but there isn’t a lot of information about what’s in them or how big they are, or where the juicy stuff is. If this sounds daunting – well, it is! But I’ve come up with a few tricks to help me sort out what is useful and what isn’t.

Step 1: Download

The first thing is figuring out how to download a zillion PDFs at once. For this, I recommend getting a bulk downloader add-on for your browser (here is an example). This will scan a page for every URL that ends with .pdf, which indicates that the URL is likely a downloadable PDF. A bulk downloader removes the need to click on every single link, which can be a lot when you’re on a page with dozens of PDFs like this one from the Mine Safety and Health Administration.
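
If you are curious what that scanning step looks like, here is a rough sketch in Python using only the standard library. It is not the code of any particular browser add-on; the sample HTML, the example.gov address, and the find_pdf_links function are all made up for illustration. It pulls out every link on a page whose path ends with .pdf and turns it into a full URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose URL path ends with .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the absolute URLs of likely PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Hypothetical page content; a real page would be fetched from the web first.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/inspection-2017.pdf">2017 inspection</a></li>
  <li><a href="reports/summary.PDF?download=1">Summary</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

for url in find_pdf_links(SAMPLE_HTML, "https://example.gov/mines/index.html"):
    print(url)
```

Like the add-on, this only guesses from the URL. A link that ends in .pdf is likely, but not guaranteed, to be a PDF.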

Step 2: Text Recognition

Once I have downloaded all the files I want, I like to place them into a dedicated folder so that I can run text recognition on the whole batch. Even though theoretically most government agency PDFs should have optical character recognition (OCR), the actual practice is very inconsistent. OCRed text is critical for PDF searching because it allows you to do a keyword search within a single file or across multiple files at once. Currently, there is no widely available OCR functionality for cursive or handwriting, just typed text.

Adobe Acrobat has a useful function (under Tools > Text Recognition > In Multiple Files) where you can run the OCR function across everything in a specific folder. This can take a while, considering that many government environmental regulation records run to a hundred pages per file, but at least there is a progress bar to show you how far along it is. Using the Adobe Acrobat tool also allows you to keep or modify original file names. I like to downsample the files to 600 dpi – it takes longer than the lower dpi settings, but I think it enables better keyword searches later on.

Step 3: Dig into the PDF files!

There is a free software program called PDF-XChange Viewer, which you can use to do keyword searches over large amounts of PDFs. You can also run a similar search within Adobe Acrobat, but I find that the searching takes far longer and the results are presented less tidily than with PDF-XChange Viewer. Supposedly you can also run batch OCR with this program, but I haven’t tried it.

The example demonstrates how I wanted to find PDFs from coal mine inspection safety reports that mention the word “map.” The results show me that of the dozens of documents I searched in my dedicated folder, there were 187 hits for the word “map” across 14 of the documents. I can get an idea from the left-hand preview pane what the keyword is like in context, and then click on that to see the actual PDF in the right-hand window.

This step helps me pull out the PDFs that I need to analyze more deeply, thus saving me the headache of opening up every single PDF in case it might have something of interest.
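
For readers who prefer a script to a desktop program, the same kind of search can be sketched in a few lines of Python. This is not how PDF-XChange Viewer works; it assumes the text has already been extracted from each OCRed PDF, and the file names, sample text, and keyword_hits function are hypothetical. It counts whole-word, case-insensitive matches, which is my choice here and may differ from how a given PDF viewer matches keywords.

```python
import re


def keyword_hits(texts, keyword):
    """Count whole-word, case-insensitive matches of keyword in each document.

    texts maps a file name to that file's already-extracted text.
    Only documents with at least one hit are returned.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = {}
    for name, text in texts.items():
        count = len(pattern.findall(text))
        if count:
            hits[name] = count
    return hits


# Hypothetical extracted text standing in for a folder of OCRed PDFs.
SAMPLE_TEXTS = {
    "report-a.pdf": "The mine map was updated. Inspectors reviewed the map on site.",
    "report-b.pdf": "No ventilation issues were found.",
    "report-c.pdf": "Map revisions are due quarterly; mapping staff were notified.",
}

hits = keyword_hits(SAMPLE_TEXTS, "map")
total = sum(hits.values())
print(f"{total} hits across {len(hits)} of {len(SAMPLE_TEXTS)} documents")
```

The result gives the same kind of summary as the search pane: how many hits there were, and which files are worth opening.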

The New Deal in the Archives

Survey of Federal Archives project (Ohio History Connection)

With the Green New Deal in the news, there is renewed interest in President Franklin Roosevelt’s original New Deal. The New Deal was a series of federal programs meant to address the major problems of the Depression, including combating soil loss, stabilizing the banking system, and providing jobs and relief for the unemployed. One of the famous programs of the New Deal was the Works Progress Administration, popularly known as the WPA. The WPA was responsible for constructing many of the significant buildings, roads, and infrastructure that the American public still uses. But the WPA also employed many writers, musicians, playwrights, artists, historians, clerks, and other unemployed white-collar professionals.

One of the most comprehensive but least known programs to come out of the WPA was the Historical Records Survey. Originally envisioned by American archivist T. R. Schellenberg, it was expanded into a workable program by Luther Evans, who would go on to become the Librarian of Congress and Director-General of UNESCO. The Historical Records Survey had two major programs: a survey of federal records located in offices outside of the Washington DC area, and a survey of state and local records. During its initial operation, the Historical Records Survey was part of the Federal Writers Project, which was known for producing travel guides for 48 states and many large cities, as well as compiling narratives of ex-slaves.

The Historical Records Survey lasted from 1935 to 1943. Its largest achievement was surveying county records – of the 3,066 counties in existence at the time of the survey, fieldwork was completed for 90% of them. WPA workers carried out the fieldwork by going to county courts and administrative agencies to determine what kinds of records existed and where they were located, and to write a short description of the records. The fieldwork also generated significant information about the history of the states and their counties. In some areas, municipal records surveys were also completed, such as for the city of Cleveland. Although there had been some attempts to survey America’s local and state records before (mainly through the efforts of the American Historical Association’s Public Archives Commission), the WPA Historical Records Survey was a significant advance in trying to establish a comprehensive picture of the overall condition of America’s public records.

Unfortunately, like many worthy archival projects, the Historical Records Survey had some significant setbacks. Only 20% of county inventories were published. At least 27 county inventories were published in the state of Ohio, including for major counties such as Cuyahoga, Franklin, Hamilton, Lucas, and Summit. In the guide to the Hamilton County records, the introduction stated there would be a guide issued for every county. Perhaps the suspension of the Historical Records Survey in 1943 ended the publication of these guides. The remaining records from the Historical Records Survey of Ohio can be found at the Ohio History Connection and the Western Reserve Historical Society.

In fact, the fate of the multitude of records generated by the field workers of the Historical Records Survey across the country varied wildly. Many of the records ended up in universities and state and local historical societies. But some met a fairly tragic fate – when archivist and National Archives employee Leonard Rapport went searching for Maine’s Historical Records Survey field records in the 1970s, he eventually found that they had been dumped into a bay.

If you would like to learn more about the Historical Records Survey, I recommend Loretta Hefner’s 1980 guide to the unpublished records of the Historical Records Survey, and Sargent Child and Dorothy Holmes’ inventory of publications associated with the Survey (Z1223.Z7 C52, ARB Reference). For additional study, Marguerite Bloxom’s guide Pickaxe and Pencil contains an extensive bibliography arranged by WPA program.

Blockchain and Ohio law

 

Blockchain by Frühstück from the Noun Project

In my capacity as the University’s Records Manager, I’m on a statewide group called the Ohio Electronic Records Committee (Ohio ERC). Ohio ERC consists of professionals from Ohio’s public entities (including archivists, records managers, IT professionals, and lawyers) who have an interest in electronic records. We meet quarterly, and produce resources of interest to other public employees, such as best practices tip sheets based on Ohio-specific concerns and annual workshops. It’s a great way to stay up to date with what’s happening within state government, since what is decided in Columbus can impact records management at UC.

At our last meeting, the topic of blockchain in state government came up. It turns out that there was legislation in the last General Assembly concerning blockchain. You can see information about the bill here. Blockchain is a distributed digital ledger system that is protected through cryptographic measures, and which records all changes, transactions, and modifications to the file or object in question. Blockchain’s most famous implementation is the cryptocurrency Bitcoin. While there is a lot of tech futurist excitement about blockchain, many others caution that blockchain suffers from a lack of uniform standards, and others criticize the technology’s voracious energy usage. Blockchain is associated with high levels of energy use because significant computing resources are used to generate its cryptographic verification. As a result, “bitcoin mining” tends to take place in areas with the cheapest electricity. For some time, this included places with extremely cheap coal-generated electricity like China, but this may be changing as renewable sources of cheap power come online.
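
For the curious, the “ledger protected through cryptographic measures” idea can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not Bitcoin’s design or anything described in the bill: it leaves out the distributed copies, the consensus rules, and the energy-hungry “mining” step, and the function names are made up. Each entry stores a hash of the entry before it, so quietly changing an old entry breaks the chain.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the hash of the previous block."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a new entry to the ledger, linked to the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Check that every block links to its predecessor and has not been altered."""
    prev_hash = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


# A hypothetical three-entry ledger for a single record.
ledger = []
append_block(ledger, "record created")
append_block(ledger, "record amended")
append_block(ledger, "record signed")
print(is_valid(ledger))
```

If someone later edits the data in the second entry without recomputing every later hash, is_valid returns False. In a real blockchain, recomputing those hashes is deliberately made expensive, which is where the energy use comes from.
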
During the meeting, we took a look at the full text of the bill (SB 300). Something that jumped out to many of us in the room was the definition of blockchain. The bill defined it in the following manner:

“Blockchain technology” means distributed ledger technology that uses a distributed, decentralized, shared, and replicated ledger, which may be public or private, permissioned or permissionless, or driven by tokenized crypto economics or tokenless. The data on the ledger is protected with cryptography, is immutable and auditable, and provides an uncensored truth.

As I read this, something seemed a little off – the language seemed a little too bombastic to be written by state legislators, which made me think it was likely a form of model legislation. So I did some searching, and found that indeed, the phrase “uncensored truth” was part of similar legislation introduced in at least two other states, Arizona and Tennessee. In other words, SB 300 was model legislation, though it still isn’t clear who is shopping this around to state legislators. In 2018, eighteen states had some kind of legislative activity related to blockchain.

As it turns out, SB 300 was not passed; however, language pertaining to blockchain (minus some of the colorful descriptions like “uncensored truth”) was part of another bill and is now part of the Ohio Revised Code (i.e., state law). It is in the section pertaining to commercial code and electronic transactions:

“(G) “Electronic record” means a record created, generated, sent, communicated, received, or stored by electronic means. A record or contract that is secured through blockchain technology is considered to be in an electronic form and to be an electronic record.

(H) “Electronic signature” means an electronic sound, symbol, or process attached to or logically associated with a record and executed or adopted by a person with the intent to sign the record. A signature that is secured through blockchain technology is considered to be in an electronic form and to be an electronic signature.”

Incidentally, earlier this month, a top aide of Governor Kasich (who recently left office due to term limits) reportedly left state government “to work for a Cleveland tech company that’s developing ways to use blockchain to store and record government records.” It seems likely that we’re going to start hearing a lot more about blockchain in Ohio soon.

Archival Futurism

Staff Checking Motion Picture Film in Temporary Storage (National Archives and Records Administration)

For as long as archivists have been preserving the past, they have also considered what the future holds. The future has meant many things to archivists: the role of technology, the changing faces of our users, and the reasons why we preserve records in the first place. As part of a book review I am writing, I recently delved into the writings of archivists who have reckoned with the future by doing a quick literature search across several major journals (American Archivist, Journal of the Society of Archivists/Archives and Records (UK and Ireland), Archives and Manuscripts (Australia), and Archivaria (Canada)). This pulled up around 60 citations for articles with “future” in their titles, published in these four journals.

The four archival journals I pulled citations from were founded between the 1930s and 1970s. Although there are scattered “future” articles during the mid-twentieth century, there were only five before 1974. Nearly half of all future-oriented articles have been published since 2000. Clearly, the millennium triggered significant introspection on the direction of our profession.

The history of archival futurism in the last century has often concerned the role of rapidly-changing technology in the creation and preservation of records under archivists’ care. In fact, the very first article published in The American Archivist with the word “future” in its title (1939) concerned the preservation and reliability of motion picture technology.[1] Twenty years later (1958) brought “The Future of Microfilming,”[2] and two decades later (1977) we weren’t quite yet at cloud storage, but we were considering “Optical Memories: Archival Storage System of the Future, or More Pie in the Sky?”[3]

By the 80s, a note of anxiety began to creep into our archival futurism. Perhaps reflecting cultural anxiety over the collapse of organized labor, the rise of austerity measures, technological dystopias, and the late stages of the Cold War, archivists warned about “Instant Professionalism: To the Shiny New Men of the Future,”[4] and asked themselves “Is There a Future in the Use of Archives?”[5]

By the 2000s, the anxiety gave way to even more ominous warnings, many invoking worries of a digital dark age in which all of our bytes bite the dust. Archivists wrote about “Diaries, On-line Diaries, and the Future Loss to Archives; or, Blogs and the Blogging Bloggers Who Blog Them”[6] and “Saving-Over, Over-Saving, and the Future Mess of Writers’ Digital Archives: A Survey Report on the Personal Digital Archiving Practices of Emerging Writers.”[7] To give electronic records the materiality of their analog cousins, archivists used metaphors of manmade infrastructure (“Digital archives and metadata as critical infrastructure to keep community memory safe for the future – lessons from Japanese activities”[8]) and natural phenomena (“On the crest of a wave: transforming the archival future”[9]).

Archivists often summarize their work by saying “we preserve the past for the future.” This sentiment is visible in the titles, as nearly a quarter of the articles also contain a reference to the past, such as “What’s Past Was Future,”[10] “Seeing the Past as a Guidepost to Our Future,”[11] or “Metrics and Matrices: Surveying the Past to Create a Better Future.”[12] That anchoring of archival work in the work of the past, not just for its own sake today, but also for the benefit of users we may never meet, is perhaps what gives archivists such a unique sense of perspective within the GLAM (Galleries, Libraries, Archives, and Museums) sector.

There is no doubt that as long as archivists still exist, we’ll keep writing about the future. But sometimes looking at our own history of archival futurism tells us more about where our profession has been than where we’re headed next.

 

[1] Dorothy Arbaugh, “Motion Pictures and The Future Historian,” The American Archivist 2, no. 2 (April 1, 1939): 106–14, https://doi.org/10.17723/aarc.2.2.7kv56p4206183040.

[2] Ernest Taubes, “The Future of Microfilming,” The American Archivist 21, no. 2 (April 1, 1958): 153–58, https://doi.org/10.17723/aarc.21.2.26114m62333099k3.

[3] Sam Kula, “Optical Memories: Archival Storage System of the Future, or More Pie in the Sky?,” Archivaria 4 (1977): 43–48.

[4] George Bolotenko, “Instant Professionalism: To the Shiny New Men of the Future,” Archivaria 20 (1985): 149–157.

[5] David B. Gracy II, “Is There a Future in the Use of Archives?,” Archivaria 24 (1987): 3–9.

[6] Catherine O’Sullivan, “Diaries, On-Line Diaries, and the Future Loss to Archives; or, Blogs and the Blogging Bloggers Who Blog Them,” The American Archivist 68, no. 1 (January 1, 2005): 53–73, https://doi.org/10.17723/aarc.68.1.7k7712167p6035vt.

[7] Devin Becker and Collier Nogues, “Saving-Over, Over-Saving, and the Future Mess of Writers’ Digital Archives: A Survey Report on the Personal Digital Archiving Practices of Emerging Writers,” The American Archivist 75, no. 2 (October 1, 2012): 482–513, https://doi.org/10.17723/aarc.75.2.t024180533382067.

[8] Shigeo Sugimoto, “Digital Archives and Metadata as Critical Infrastructure to Keep Community Memory Safe for the Future – Lessons from Japanese Activities,” Archives and Manuscripts 42, no. 1 (January 2, 2014): 61–72, https://doi.org/10.1080/01576895.2014.893833.

[9] Laura Millar, “On the Crest of a Wave: Transforming the Archival Future,” Archives and Manuscripts 45, no. 2 (May 4, 2017): 59–76, https://doi.org/10.1080/01576895.2017.1328696.

[10] Maynard Brichford, “What’s Past Was Future,” The American Archivist 43, no. 3 (July 1, 1980): 431–32, https://doi.org/10.17723/aarc.43.3.631227106ux2q512.

[11] Brenda Banks, “Seeing the Past as a Guidepost to Our Future,” The American Archivist 59, no. 4 (September 1, 1996): 392–99, https://doi.org/10.17723/aarc.59.4.92486pp6w6p20575.

[12] Libby Coyner and Jonathan Pringle, “Metrics and Matrices: Surveying the Past to Create a Better Future,” The American Archivist 77, no. 2 (October 1, 2014): 459–88, https://doi.org/10.17723/aarc.77.2.l870t2671m734116.

Recordkeeping and climate change

The science and history of climate change is intrinsically tied up with the practice of recordkeeping. In climate change conversations we often talk about “records” as in, “the hottest summer on record” or “a record level of flooding.” But those notable milestone records do not reveal themselves – they are revealed because of bits of data, created through the practice of observational recordkeeping, constituting a baseline that makes notable records possible.

One of the longest-running data sets related to climate change is the set of Mauna Loa observatory records from Hawaii. These measurements, begun by Charles Keeling in 1958 and still carried out today, record atmospheric carbon dioxide concentrations. The resulting diagram of the measurements, popularly known as the Keeling curve, shows an unmistakable rise in atmospheric concentration of carbon dioxide since the measurements began. The Keeling curve is one of the most important pieces of documentary evidence in demonstrating the phenomenon of climate change, and human contribution towards creating climate change through increased greenhouse gas emissions.

https://scripps.ucsd.edu/programs/keelingcurve/wp-content/plugins/sio-bluemoon/graphs/mlo_full_record.png

The phenomenon of climate change is all around us today. But it took decades of maintaining meticulous observational data records, and unearthing other forms of data through historical climate surrogate or proxy records, for climate change to enter mainstream scientific consciousness, let alone popular culture. Scientists had been discussing climate change in disciplinary publications and conferences since the 1960s. But climate change did not begin to enter mainstream conversation or awareness until the mid-1980s. After 1985, various phrases like climate change, global warming, and greenhouse gases began to appear in popular culture. An example of this can be seen in the Google Books Ngram viewer, which registers phrases that appear in its corpus of 5 million digitized books. Around 1986, a major rise begins to take shape, and it gains major steam through the late 1980s.

So what happened in the mid-80s? Several landmark events. One was the Montreal Protocol, which developed international cooperation in reducing the use of chlorofluorocarbons (CFCs), which had contributed to the ozone hole. Awareness of the ozone hole, and the international response to it, helped the public understand the degree to which human activity could adversely impact the environment. Then in 1988, a NASA scientist named James Hansen appeared before Congress during unusually hot summer weather to give testimony that global climate change was happening due to the use of fossil fuels, and that the United States needed to prepare for it.

The New York Times reported that “Dr. Hansen, who records temperatures from readings at monitoring stations around the world, had previously reported that four of the hottest years on record occurred in the 1980’s. Compared with a 30-year base period from 1950 to 1980, when the global temperature averaged 59 degrees Fahrenheit, the temperature was one-third of a degree higher last year. In the entire century before 1880, global temperature had risen by half a degree, rising in the late 1800’s and early 20th century, then roughly stabilizing for unknown reasons for several decades in the middle of the century.” Until this time climate scientists had been hesitant to call for major public policy changes towards averting climate disaster. Hansen’s examination of over 100 years of weather station records led to his determination that global temperature rise was on enough of an upward trajectory that the alarm bell had to be rung.

How many archives are in the United States?

Archives located in the Gulf South

In recent years, there’s been growing awareness within the United States cultural heritage community about exposure to climate change. Many communities of concerned cultural heritage professionals have emerged. There is now a Coalition of Museums for Climate Justice, the American Library Association Sustainability Roundtable, and ProjectARCC (Archivists Responding to Climate Change). Climate change is showing up at conferences and in journals in the field.

One of the major challenges in assessing the risk of climate change to cultural heritage is that data on cultural heritage is inconsistent. How can you predict how many museums, libraries, or archives will be exposed to climate change without adequate data on how many institutions exist in a given vulnerable location? There is a lot of great data on museums thanks to the Museum Universe Data File. There is a semi-representative directory of American libraries (but it leaves out school libraries). But as an archivist working on climate change issues, I was pretty dismayed to discover a few years ago that there really isn’t a comprehensive data set of archives in the United States.

Last year, my research collaborator Ben Goldman (Penn State University) and I received a Society of American Archivists (SAA) Foundation grant to attempt to compile the first comprehensive data set of all US archives. The roots of this project began with an article we co-authored with geospatial experts from Penn State, in which we took a very limited data set of around 1,200 archives (furnished by OCLC’s ArchiveGrid) and examined their exposure to climate change risks, like sea-level rise, storm surge, and changes in temperature/humidity. However, we knew if there was ever going to be a comprehensive risk assessment of US archives, someone had to bring together a comprehensive data set of how many US archives exist and where they are located. This is what we set out to do with our SAA Foundation grant.

Over the course of the grant, we worked with a fantastic assistant, Whitney Ray, who did an incredible amount of heavy lifting with contacting over 150 archival organizations for any data they had. Essentially, we reached out to anyone we thought may have maintained lists of archives in their region or area of interest. What we received was over 30,000 raw data points! The data came to us in every way you can imagine – from very tidy spreadsheets to webpages with broken links to PDFs of archives. You can read much more about our workflow and process on our RepoData project blog.

We’ve now made data available for 30 states plus Washington DC on our public GitHub repository. We plan to work through the remainder of the states through 2018. While we originally created this data because we know it’s critical for future climate assessment work, we know there will be a lot of potential reuse for it – and we’re excited to see how people use it!

What does records management have to do with maintenance?

Coast Guard and Agencies Response to Deepwater Horizon Oil Spill

In April 2016, Andrew Russell and Lee Vinsel published an article in Aeon titled “Hail the Maintainers.” Russell and Vinsel called for a closer examination of how our culture venerates technological innovation. We elevate innovation and innovators, while overlooking the important role of maintenance in keeping society going. The concept took off, and there has been a subsequent conference known as The Maintainers and many academic articles on maintenance, particularly on the history of technology.

Archivist Hillel Arnold has applied the idea of maintenance theory to the work of archivists, noting that archivists “do the hard and invisible work of maintaining records. Not only do we perpetuate the physical existence of records through preservation activities, we also manage ongoing access to records, in part by maintaining the context of record creation and maintenance through arrangement and description processes.” Hillel and I collaborated last fall on a paper tracing the connections between recordkeeping, maintenance, and environmental regulation. In recent months, I’ve started to examine how the maintenance of regulatory recordkeeping breaks down during fossil fuel industrial accidents and disasters – with significant consequences both for workers and the environment.

Fossil fuel energy production is a highly regulated industry – at least on paper. However, despite the thousands of regulations that govern the extraction of coal, oil, and natural gas, and subsequent downstream production and transmission activities, these regulations have failed to protect the health of workers, nearby communities, and the environment due to several factors that include regulatory capture and lack of enforcement capabilities. Recordkeeping violations are also an explanation for regulatory failures. Industry failure to maintain authentic records – whether by manipulating existing records, or by destroying incriminating records – can accelerate dangerous situations.

Examples of these failures of recordkeeping can be found in two deadly energy industry accidents that happened just two weeks apart in April 2010. On April 5, an underground mine explosion at the Upper Big Branch mine in West Virginia killed twenty-nine miners. On April 20, an explosion occurred at the offshore drilling platform known as Deepwater Horizon, located 40 miles off the Louisiana coast in the Gulf of Mexico. Eleven workers were killed on Deepwater Horizon, and oil leaked from the site for close to 6 months, resulting in the worst domestic oil spill in history.

Investigations of the Upper Big Branch disaster found that Massey Energy, the parent company, routinely underreported safety violations in the records they shared with regulators. In other words, Massey Energy manipulated the very records that could have demonstrated to regulators that the mine needed to make necessary safety improvements.

In the wake of the Deepwater Horizon explosion, many of the recordkeeping concerns that surfaced were over questions of responsibility and accountability for the months-long oil spill in the Gulf. One BP executive was accused of manipulating oil spill estimates. Others were accused of destroying evidence associated with the post-disaster investigations.

We are currently in a period of increasing deregulation of environmental protections. When it comes to American fossil fuel companies, there is a clear role that recordkeeping – or rather, attacks on recordkeeping – plays in deregulation. Effective regulation – whether over fossil fuel production and emissions, or workplace safety rules – requires comprehensive and accurate recordkeeping. In contrast, American politicians who support expansion of fossil fuel energy production in the United States routinely deride regulatory oversight as limiting economic progress and domestic energy independence. One of the primary tools of deregulation has been to cut back the amount of information that industry is required to share with regulators, or the amount of recordkeeping it must maintain internally for safety and accountability.

Recordkeeping alone cannot produce environmental health and workplace safety. But achieving either is impossible without baseline records that provide accountability and information to affected communities.

Above Board, Below the Ground

December 12, 2013, Youngstown, Ohio. Truck crash and spill. EPA incident review conducted. Truck was a contractor hauling fracking wastewater from ChemTron in Avon, OH. Liquids went into storm sewers and Crab Creek – a tributary of Mahoning River. Image courtesy of FracTracker. Photos from Lynn Anderson, Frack Free Mahoning, & Jean Engle, Youngstown Community Bill of Rights Committee

In my recent explorations of how recordkeeping practices inform environmental policy and knowledge, an interesting trend has revealed itself in the context of state regulation of fracking in the Marcellus/Utica shale region (i.e., Ohio, Pennsylvania, and West Virginia). State agencies in these areas are far more likely to proactively disclose records concerning permits, well locations, and production volume than any other records.

This means that there is significant data on the expansion of fracking – from its geographical extent to the volume of extractive activity. What is far more difficult to obtain is information on the effects of fracking. In other words, the records that contextualize fracking’s impact on the communities where it takes place – complaints, routine inspections, and investigations – are largely absent from the available data on state oil and gas websites. Instead, citizens must file records requests to obtain this information. Pennsylvania is a notable exception in comparison to Ohio and West Virginia, as it discloses records specifically pertaining to inspections and waste production and handling. It also partially discloses complaint and investigation records (primarily related to water contamination issues).

Ohio’s inspection records are highly obscured, requiring one to go through a very confusing process to obtain records from the Department of Natural Resources website. There is no obvious way to search for specific inspection, complaint, or investigation records through either Ohio’s or West Virginia’s website. Ohio law requires that a database concerning major violations by oil and gas operators be made available to the public on the Division of Oil and Gas Resources website. Some of this information may be available through Ohio’s RBDMS application, but due to installation difficulties, I was unable to confirm this. When I asked an agency official whether this web database was available, I was told the agency was “in the process” of creating it. The law calling for such a database was passed in 2010 and amended in 2011.

According to an issue paper authored by the Natural Resources Defense Council and FracTracker, West Virginia once had an active Oil and Gas spills database that was updated at least through 2013. The database is still hosted online, but does not appear to contain any records from the last several years, or from the time period covered in the issue paper.

When agencies have leeway to determine the scope of their proactive information disclosure, what is shared likely reflects how the agency views its mission. It appears that Ohio and West Virginia’s regulatory agencies prioritize disclosing information about the growth of fracking far more than its potential ramifications for the environment.