A needle in a PDF haystack

One of my research areas is examining the role of recordkeeping and documentation in environmental regulations. A research tactic I frequently use is to sift through hundreds of PDFs at once. Large numbers of PDFs are posted on many environmental regulatory websites, but there isn’t a lot of information about what’s in them or how big they are, or where the juicy stuff is. If this sounds daunting – well, it is! But I’ve come up with a few tricks to help me sort out what is useful and what isn’t.

Step 1: Download

The first thing is figuring out how to download a zillion PDFs at once. For this, I recommend getting a bulk downloader add-on for your browser (here is an example). The add-on scans a page for every URL that ends with .pdf, which indicates that the URL is likely a downloadable PDF. A bulk downloader saves you from clicking on every single link, which can be a lot when you’re on a page with dozens of PDFs like this one from the Mine Safety and Health Administration.
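For readers who would rather script this step, here is a minimal Python sketch of what such an add-on does: it looks at every link on a page and keeps the ones whose path ends in .pdf. It is an illustration only, not how any particular add-on works. The function name `find_pdf_links`, the sample HTML, and the `example.gov` address are all made up for the example, and the sketch only lists the links rather than downloading them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect links from <a> tags whose URL path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                absolute = urljoin(self.base_url, value)
                # Check only the path, so a query string does not hide the extension.
                is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
                if is_pdf and absolute not in self.pdf_links:
                    self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Hypothetical helper: return the likely-PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


# Made-up page content standing in for a real agency web page.
sample_html = """
<a href="/reports/inspection-01.pdf">Report 1</a>
<a href="reports/inspection-02.PDF?download=1">Report 2</a>
<a href="/about.html">About</a>
"""

links = find_pdf_links(sample_html, "https://example.gov/mines/")
for link in links:
    print(link)
```

As with the add-on, a URL ending in .pdf is only a hint that the link is a PDF, so it is worth spot-checking a few of the results before downloading everything.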

Step 2: Text Recognition

Once I have downloaded all the files I want, I like to place them into a dedicated folder so that I can run text recognition on all of them at once. I do this because, even though most government agency PDFs should in theory already have optical character recognition (OCR), the actual practice is very inconsistent. OCRed text is critical for PDF searching because it allows you to do a keyword search within a single file or across multiple files at once. Currently, there is no widely available OCR functionality for cursive or handwriting, just typeface.

Adobe Acrobat has a useful function (under Tools > Text Recognition > In Multiple Files) that runs OCR across everything in a specific folder. This can take a while, considering that many government environmental regulation records run to a hundred pages per file, but at least there is a progress bar to show you how far along it is. The Adobe Acrobat tool also allows you to keep or modify the original file names. I like to downsample the files to 600 dpi; it takes longer than the lower dpi settings, but I think it enables better keyword searches later on.

Step 3: Dig into the PDF files!

There is a free software program called PDF-XChange Viewer, which you can use to do keyword searches over large numbers of PDFs. You can also run a similar search within Adobe Acrobat, but I find that the searching takes far longer and the results are presented less tidily than with PDF-XChange Viewer. Supposedly you can also run batch OCR with this program, but I haven’t tried it.

In this example, I wanted to find coal mine safety inspection reports that mention the word “map.” The results show that, of the dozens of documents I searched in my dedicated folder, 14 contained the word “map,” for a total of 187 hits. The left-hand preview pane gives me an idea of how the keyword is used in context, and I can click on any result to see the actual PDF in the right-hand window.

This step helps me pull out the PDFs that I need to analyze more deeply, thus saving me the headache of opening up every single PDF in case it might have something of interest.
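The same kind of tally can be sketched in a few lines of Python. This is only an illustration of the idea, not how PDF-XChange Viewer works internally. It assumes the text of each OCRed PDF has already been extracted by some other tool and is available as a plain string. The file names, sample sentences, and the helper names `search_keyword` and `summarize` are invented for the example.

```python
import re


def search_keyword(texts, keyword, context=30):
    """Return {filename: [snippet, ...]} for case-insensitive whole-word matches.

    `texts` maps a file name to its already-extracted text (an assumption:
    the extraction step itself is not shown here).
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    results = {}
    for filename, text in texts.items():
        snippets = []
        for match in pattern.finditer(text):
            # Keep a little text on either side so the hit can be read in context.
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(text))
            snippets.append(" ".join(text[start:end].split()))
        if snippets:
            results[filename] = snippets
    return results


def summarize(results):
    """Return (total number of hits, number of documents with at least one hit)."""
    total_hits = sum(len(snippets) for snippets in results.values())
    return total_hits, len(results)


# Made-up stand-ins for text extracted from inspection reports.
texts = {
    "report_a.txt": "The mine map was missing. An updated map is required.",
    "report_b.txt": "No ventilation issues were found.",
    "report_c.txt": "Inspector reviewed the escapeway MAP on site.",
}

results = search_keyword(texts, "map")
hits, docs = summarize(results)
print(f"{hits} hits across {docs} of {len(texts)} documents")
for filename, snippets in results.items():
    for snippet in snippets:
        print(f"{filename}: ...{snippet}...")
```

Note that this sketch matches whole words only, so “maps” or “mapping” would not count as hits; a PDF viewer's search may behave differently depending on its settings.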

Welcome Back!

HSL Updates

September 2019, Vol. 1

On behalf of the faculty, staff and students of the Donald C. Harrison Health Sciences Library (HSL), welcome back for the 2019-2020 school year. I would like to share with you some library updates and links that you can easily access for information about classes being offered in the HSL, hours of operation, new technology, upcoming lectures and collaborative spaces for study and research.

HSL upgrades:

  • Over the summer, the electrical engineers finalized the lighting upgrades on the R, E and G levels of the library.
  • New power outlets were installed on the R-Level of the library.
  • At the request of HSGA representatives from last year, the HSL purchased four (4) new ergonomic stand-up tables that are located in the Computer Lab.
  • In mid-September, the HSL Circulation Desk students will begin offering scheduled 20-minute walking tours of the HSL and Winkler Center. The tour will include, but is not limited to, information on equipment that can be checked out, HSL IT Help Desk services, how to navigate the Computer Lab and information on the HSL’s Electronic Classroom. Look for more information about the tours in October.

HSL Updates will be posted on a monthly basis to share the HSL Workshop calendar and any other information related to things happening in the HSL.  If you have any questions or concerns, please feel free to contact me directly.  As always, we are happy to welcome you into the HSL.

Regards,

Lori E. Harris, Interim Director
Donald C. Harrison Health Sciences Library and
the Henry R. Winkler Center for the History of the Health Professions
Tel:  (513) 558-0315
Fax: (513) 558-2682

Email: harri2li@ucmail.uc.edu
Web:  www.libraries.uc.edu/hsl

New Library Website Tutorial

By Kellie Tilton.

You may have noticed that things have changed a bit around the library’s website! Following the redesign of UC Libraries’ 13 websites into a more streamlined, central website, some of the tools, services and links you previously used may have moved.

To help with this, we’ve created a quick video:

website tutorial slide

The Archives and Rare Books Library presents the 2nd annual German-Americana Lecture, which will feature Frederic Krome

August, Hans and Adelaide Schiller

The second annual German-Americana Lecture, scheduled for Thursday, Sept. 19, 3-4:30 p.m., in Annie Laws (407 Teachers/Dyer), will feature Frederic Krome, professor of history at UC Clermont College, who will speak about his recent research on the unpublished memoirs of a World War I German soldier. “In Times of War: Hans Schiller’s Recovered Memoirs” will provide a fascinating window into the motivations and experiences of Hans Schiller during tumultuous times, without the extensive postwar rewriting so common to what historians are starting to refer to as “Ego Documents.”

Enlisting in the German Army in the fall of 1914, Hans Schiller fought on the Eastern, Italian and Western fronts during the war, and with the Freikorps in the Baltic from 1919 to 1921. Sometime in 1922, as he was recovering from scarlet fever, Hans Schiller collated his notes and wrote a memoir of his military service. The handwritten memoir was then placed in a box, where it lay as Schiller married, had a family, and in 1939 was recalled to active duty as an occupation administrator in Eastern Europe. In January 1945 he committed suicide, and his manuscript, still in the box, came to his younger daughter, who immigrated to the U.S. in the 1950s. It was rediscovered by Karin Wagner, Schiller’s granddaughter, shortly before her mother’s death.

Frederic Krome

Frederic Krome (Ph.D., University of Cincinnati, 1992) taught at Northern Kentucky University (1992-98) before becoming the managing editor of the American Jewish Archives Journal at the Jacob Rader Marcus Center of the American Jewish Archives, Hebrew Union College-Jewish Institute of Religion. Since 2007 he has taught at the University of Cincinnati. His publications include The Jews of Cincinnati (with John Fine), The Jewish Hospital and Cincinnati Jews in Medicine (2015), and Fighting the Future War: An Anthology of Science Fiction War Stories, 1914-45 (2012), along with numerous articles and book reviews.

Organized by the Archives and Rare Books Library, the German-Americana Lecture is free and open to all, but reservations are requested by email to jennifer.mackiewicz@uc.edu or by phone at (513) 556-1394.

The German-Americana Lecture is generously supported by The Charlotte and Edward Unnewehr Fund for the German-Americana Collection made possible by the Marge and Charles J. Schott Foundation.

New Faces in the Clermont College Library

Over the summer, two new staff members joined us in the Clermont College Library.

Emily Wages, Operations Manager

Photo of Emily Wages.

Emily Wages

Emily has worked at various libraries and library-related organizations over the years, including King Library at Miami University, Lane Libraries, MidPointe Libraries, and SWON Library Consortium. Emily graduated from Miami University in 2011 with a BS in English Education and a minor in British Literature. In 2014, she graduated from Kent State University with her MLIS. At Clermont College Library, Emily will be managing our public services and student employees.  

Emily loves gardening with native plants, reading, and true crime. She is also a distance runner, currently training to run the Indy Monumental half in November. In the last few years, she has run three full marathons (two Flying Pigs and Columbus) and she plans to run the full Flying Pig again this coming May. She lives in Colerain Township with her husband Quinn, dog Shay, and cat Chili.

Nicole Stamat, Library Specialist

Photo of Nicole Stamat

Nicole Stamat

For the last six years, Nicole has worked for the Clermont County Public Library as a library assistant and library assistant specialist. She previously worked for the National Park Service at the Dayton Aviation Heritage National Historical Park, as well as the BSA Philmont Scout Ranch in Cimarron, NM. She obtained a BFA in Art Therapy at Millikin University in 2009 and worked at the Staley academic library. Nicole will also be working in public services and helping with some aspects of our technical services.

She lives in the area with her husband, young son and a hound mix. Her hobbies include fiber arts (most especially knitting), table-top gaming and, of course, reading. Her favorite authors include Terry Pratchett, Neil Gaiman, Rainbow Rowell, Laini Taylor and Margaret Atwood.

Please join me in welcoming Emily and Nicole to Clermont College!

Heather Mitchell-Botts
Instruction Librarian

Research Labs @ GMP Library News update – Zhiyuan Yao Attends the AAG-UIUC Summer School

Zhiyuan Yao is one of two GIS support students working in the Research & Data Service research labs at the Geology Math and Physics Library.  The Data & GIS collab is open to students, staff and faculty seeking help with their geospatial data needs, and the Visualization lab is open for data visualization consultations and collaborative work.  Email us at ASKData@ucmail.uc.edu for more information.  

Great learning and collaboration experience in AAG-UIUC Summer School

This July, I was honored to be offered the opportunity to attend the AAG-UIUC 2019 Summer School, which focused on Reproducible Problem Solving with CyberGIS and Geospatial Data Science. During the one-week summer camp, I met many scholars, got access to the supercomputer Virtual Roger through CyberGIS-Jupyter, learned about cutting-edge advances in geospatial data science, and gained a deeper understanding of reproducibility and replicability. I had an absolutely wonderful time there, and the experience prompted me to think more about how we could develop novel solutions to complex problems.

 

Participants in the AAG-UIUC Summer School with mentor Diana Sinton (former director of UCGIS, in the green shirt) in the middle.

Libraries Moving to Mediated Service Model for Kanopy Streaming Videos

Kanopy, the University of Cincinnati Libraries’ on-demand streaming video service available at http://uc.kanopystreaming.com, is a content-rich, much-used resource. However, the Libraries’ materials budget cannot sustain the current level of spending for Kanopy. It is important to note that films obtained in Kanopy are not owned, but leased for one year at $135 per lease. When the lease has expired, the film can then be triggered for another one-year lease.

In the past, Kanopy films were automatically leased after 30 seconds of viewing by any user with access to the library, including walk-in visitors. In an effort to eliminate purchases triggered by casual users and to focus on course-related use of the rich academic material included in Kanopy, UC Libraries is moving to a mediated leasing model beginning in the fall 2019 semester.

University of Cincinnati students, faculty, and staff will continue to have immediate access to films currently leased by UC (approximately 645 titles as of June 2019) until they expire. People will still be able to search in Kanopy for films the Libraries does not already lease. When attempting to access such a film, however, users will have the option to select the “request” button next to the film and fill out the form. The Libraries will then work to fulfill the request, typically obtaining a new lease within one business day.

For additional streaming video sources, or for further information on the mediation of Kanopy films, please visit the Libraries’ Kanopy LibGuide at https://guides.libraries.uc.edu/Kanopy/FAQ.

For more information regarding Kanopy’s streaming service please read the following article from the May 2019 issue of Film Quarterly at https://filmquarterly.org/2019/05/03/kanopy-not-just-like-netflix-and-not-free/.