Open minds to open data
By Amy Koshoffer, Assistant Director, Research & Data Services, and Mark Chalmers, Science and Engineering Librarian
Throughout the spring 2024 semester, two UC librarians, Amy Koshoffer (pictured left, lecturing in the Visualization Lab in the Geo-Math-Physics Library) and Mark Chalmers, co-taught the Power and Politics of Data honors seminar. The class drew students from colleges across the UC campus, including the College-Conservatory of Music, the Lindner College of Business, the College of Medicine, the College of Education, Criminal Justice, and Human Services, and the College of Arts and Sciences. The course was geared toward, but not exclusive to, students considering doing research.
In this interdisciplinary honors seminar, the students delved into the intricate dynamics of data in today’s digital age, with a special focus on research data produced in the academy. The students not only learned about the technical aspects of data, such as data creation, documentation, storage and governance, but also engaged in critical discussions about the ethical, social and political implications of data in research and everyday life. The lectures and discussions highlighted current factors impacting data in the academic research environment, such as federal funding and publisher data sharing requirements. The course had no textbook; instead, the lessons drew from current publications in academia and industry and from invited guest speakers. The seminar culminated with a final project that required students to interview an active researcher. A recap of the semester’s highlights makes clear that the lessons learned extend far beyond the classroom or the lab, promising to influence the thinking and careers of today’s Bearcats and tomorrow’s leaders.
Data in the Wild
A hallmark of the Honors Program was an emphasis on experiential learning and reflection as a catalyst for developing students into global citizen scholars equipped with the skills to work on the world’s complex problems. To provide experiential learning, the course had Data in the Wild sessions, where guest speakers engaged the students either in the classroom or in their lab spaces. During these sessions, students experienced the course materials in practice as they dove deep into real-world data-specific scenarios from industry and academia. The sessions were led by experts and practitioners from different fields, who shared their hands-on experiences with data in practical, often complex settings. They reinforced the course materials and exposed the students to the real-life challenges and opportunities of working with data across many environments.
84.51° – Industry Data at Kroger
Bill McMillin was the first guest for the Data in the Wild series. Bill visited the class to talk with the students about how Kroger collects, analyzes and utilizes data across its operations. Before joining the data science team at 84.51°, Bill worked at UC Libraries as a metadata librarian. In this Data in the Wild session, students interacted with Bill to learn how data shapes retail grocery operations in many forms, such as using data to inform targeted coupons, to optimize warehouse inventory, and to determine where best to build new stores to combat food deserts. Bill talked with the class about his career evolution from being a data scientist to becoming a data engineer, emphasizing the critical importance of meticulous data collection and consistent documentation across diverse teams throughout the enterprise.
Not all of Kroger’s data is closed. Bill showed the students how to start working with Kroger’s publicly available API (application programming interface) and encouraged them to pursue both projects with the API and career opportunities at 84.51°.
Looking at how data is treated in industry expanded the course to address topics of great interest to the students, such as how companies use the data they collect about consumers and their purchasing habits. It also set up a strong contrast with the academic research ecosystem and its data practices.
Biological Anthropology Data
One of the most challenging aspects of research, especially in open research, can be working with human subjects. Variables including demographics, genetic markers, identifying images and participant responses can contain sensitive and identifying information about the participants and their possible health conditions. Students in the course learned about policies, guidelines and documentation, and the challenges of handling human subject data in the biological anthropology Data in the Wild session with Heather Norton, associate professor of anthropology.
Norton studies human evolution through examination of the genes that determine skin color and skin variation. Her work impacts the basic science understanding of human evolution and has commercial applications through development of skin care products such as anti-aging products. In addition to funding from federal agencies, she has industry collaborations that support her research. Each of these types of funders has different requirements about who owns the data, how to handle and store the data, and how to share it within and outside the research team. Looking at this bigger picture of her work and understanding the complexities of data collected and the relationships developed through the work, students learned about the nuanced reality of data ownership, management and sharing in practice.
Data Equity
There is a person behind every data point. In her Data in the Wild session, Whitney Gaskins, assistant dean and assistant professor in the Office of Inclusive Excellence & Community Engagement, told two stories that highlighted how important data can be in healthcare and higher education to ensure the best outcomes for all.
She discussed how important it can be to collect and format data for best use and reuse. Gaskins told the students about her own personal struggles to get the right health care and how data collected for one purpose could lead to improved health outcomes in another situation. The challenges she faced led her to a research study and a possible AI solution to help future patients who find themselves in a similar situation. She also talked about the difficulties of collecting the right data and the challenges of combining data sets from different sources. Students in the class learned about the ideals of data collection, documentation and standards, and the Data in the Wild sessions demonstrated the challenges researchers face when actually working with data.
The Margaret H. Fulford Herbarium: Plants as Data
On Leap Day the students left the classroom to venture out to the Margaret H. Fulford Herbarium, housed on the sixth floor of Rieveschl Hall. UC is lucky to have the third-largest herbarium collection in Ohio, which houses approximately 125,000 specimens and is home to one of the best collections of bryophytes in the country.
Curated by Eric Tepe, associate professor of biological sciences, the herbarium has data challenges that echo the themes of the course in persuasive ways. Tepe has funding from the National Science Foundation to digitize the specimens in the collection and bring them online as part of a broader effort to integrate local collections into larger digital herbaria. Having this rich collection available online helps aggregate herbarium data from many sources to address challenging current questions, such as the extent of habitat loss due to human activity or changes in species locations due to climate change. Getting the data out in the open is an involved process, as it takes a long time to scan and database physical specimens to make them available through the internet. One of the challenges of herbarium data is the high variance in data quality. The collection comprises many specimens that have been brought together over the years from different collectors, such as Margaret Fulford and Curtis Gates Lloyd. These collectors were not always consistent in the data they provided about a given specimen, such as how they described the exact locality where they collected the specimen or the date it was collected.
Beyond the work of converting physical specimens to structured digital data, the herbarium also illustrated the issue of data governance. Putting all the herbarium data online has potential downsides, as some species in the collection are endangered, rare or threatened. For this reason, the systems that house the digital collections have special technical features to withhold some information from the public, such as the location of the specimen, to protect the species. Handling herbarium data requires knowledge of metadata standards, good documentation and databasing skills.
This experience helped students understand the volume of work needed and the special considerations that must be weighed to make such data available. The students enjoyed seeing and even touching the various specimens, seeing the technology used in digitization, and learning firsthand how UC’s herbarium collection is contributing to larger digital collections being used to study climate change and other contemporary questions.
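For readers curious what withholding location information might look like in practice, here is a minimal sketch. It is an illustration only, not the herbarium's actual system: the field names are borrowed from the Darwin Core vocabulary as an assumption (the article does not say which metadata standard is used), and the species names, records and `public_view` function are all made up.

```python
# Fields that could reveal where a plant grows.
# Names follow Darwin Core terms; this is an assumption, not the herbarium's schema.
LOCATION_FIELDS = ("locality", "decimalLatitude", "decimalLongitude")


def public_view(record, protected_species):
    """Return a copy of a specimen record that is safe to publish."""
    public = dict(record)  # copy, so the curator's full record is left untouched
    if record["scientificName"] in protected_species:
        for field in LOCATION_FIELDS:
            public.pop(field, None)
        # Tell the public that something was removed, and why.
        public["informationWithheld"] = "Location withheld to protect the species"
    return public


# Made-up records and species names, for illustration only.
specimens = [
    {"scientificName": "Examplea communis", "locality": "Hypothetical County, Ohio",
     "decimalLatitude": 39.0, "decimalLongitude": -84.0, "eventDate": "1931-05-02"},
    {"scientificName": "Examplea rara", "locality": "Hypothetical Ridge, Ohio",
     "decimalLatitude": 39.5, "decimalLongitude": -83.5, "eventDate": "1948-07-19"},
]
protected = {"Examplea rara"}

published = [public_view(s, protected) for s in specimens]
```

In this sketch the common species keeps its full record, while the protected species loses only its location fields and gains a note explaining the omission. The collection date and name stay public, so the record can still contribute to broader studies.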
Archeology Data
A visit to the Classics department may not inspire students to think of big data, though big data is a major component of archeology research these days. Archeologists use technologies such as remote sensing, LIDAR (laser imaging, detection, and ranging), geographic information systems (GIS), 3D printing and photogrammetry to gain an understanding of the physical environment of the archeological site, and they produce terabytes of data in the process. There are also many political implications, because working at these sites comes with requirements and restrictions from the host nations. The reality of archeology research is very different from what students may have imagined from films and stories. Like the herbarium project, these projects are very data rich, and managing the volume and complexity takes a great deal of skill in building and administering databases. These projects require good relationships with many entities, such as government officials and other researchers, as the diversity of data collected can impact many research disciplines, including history, biology, geology, chemistry and archeology. When it comes to the data, students saw firsthand how critical good documentation, databasing and controlled sharing are to the work, given the complexity of the research and the number of people involved in an archeological project.
UC’s Institutional Data
The final Data in the Wild session was with Susana Luzuriaga Voight, acting vice provost for academic analytics, from the Office of Institutional Research and Academic Analytics. In this session, the students explored the office’s dashboards, which key stakeholders and decision makers within the university’s administration use to gain insights into contributing factors for recruitment, retention, admissions and student success. Many different systems contribute data to these dashboards, and it is a feat to merge the different streams together into a data set that helps create an actionable picture of student success. Voight’s team is also responsible for aggregating, curating and validating the data that UC provides to state and federal agencies. The students learned about the tools and infrastructure used to support this work and how important it is to have a reliable system to support the needs of the research.
Conclusion
Though the course made use of multiple teaching methods and resources, the Data in the Wild sessions brought a real-world perspective to the diverse challenges of working with data and the vast landscape of research data at the university. Students loved the variety of the presentations and getting out of the classroom to explore more of what makes the University of Cincinnati distinctive.
Many thanks to all the guests, UC faculty and staff who presented to our students: Bill McMillin, Sebastian Karcher, Heather Norton, Whitney Gaskins, Eric Tepe, John Wallrodt and Susana Luzuriaga Voight. And a special thanks to Marcia Johnson, library services supervisor for the Geo-Math-Physics (GMP) Library, who captured many of the wonderful photographs included in this article.