Approaches to Library Assessment Using Multiple Data Sources

Image of 3 circles, representing a survey, a data store, and a library shelving area.

Many library assessment projects rely on single sources of data that are well-suited to answering the questions at hand. For example, the organizers of the Library’s Party for Your Mind event were interested in getting feedback on the event from employees who had staffed different event stations. A survey was sent to those who had helped; respondents rated the event on a series of scales (e.g., accessibility, diversity, engagement) and provided comments about what went well, what could be improved, etc. Such an approach to data collection -- using a single data source -- is ideal for generating findings that can guide future work on something like an annual event.

For other types of assessment projects, however, it can be incredibly useful to join multiple streams of data to answer assessment questions. In this blog post, I will share two examples of how this can be done, and will discuss some of the advantages conferred by joining data sources in each case.

Example #1: Assessing Document Delivery Services

During Fall 2019, we launched an assessment project to understand how well the Library was meeting patrons' document delivery needs. One unique aspect of this study was that data were gathered from three sources and merged into a single data set, using patrons’ email addresses as a common variable. These sources were:

  • FY19 document delivery data: The number of physical materials delivered, and the number of materials scanned and emailed, for each recent user of the service.

  • Survey data: Survey sent to recent service users, and to a well-matched comparison group of non-recent users. The survey obtained data on service awareness, use, views on delivery turnaround time, and more.

  • U-M Data Warehouse data: Demographic data obtained from the Data Warehouse. Includes factors such as disciplinary focus, track and rank for faculty, type of program for graduate students, race/ethnicity of faculty and students, etc.
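As a rough illustration of what merging on a common email variable can look like (this is not the code used in the study), here is a minimal Python sketch. All records, field names, and the helper functions `normalize`, `index_by_email`, and `merge_on_email` are hypothetical. It assumes the survey respondents are the starting point and that addresses are trimmed and lowercased before matching.

```python
# Hypothetical records; field names and values are illustrative only.
delivery = [
    {"email": "Ana@example.edu", "physical_delivered": 12, "scans_emailed": 9},
    {"email": "bo@example.edu", "physical_delivered": 0, "scans_emailed": 3},
]
survey = [
    {"email": "ana@example.edu ", "aware_of_service": True, "turnaround_rating": 4},
    {"email": "cy@example.edu", "aware_of_service": False, "turnaround_rating": None},
]
warehouse = [
    {"email": "ana@example.edu", "role": "faculty", "discipline": "History"},
    {"email": "bo@example.edu", "role": "graduate student", "discipline": "Physics"},
    {"email": "cy@example.edu", "role": "faculty", "discipline": "Nursing"},
]


def normalize(email):
    """Trim whitespace and lowercase so the same address always matches."""
    return email.strip().lower()


def index_by_email(rows):
    """Build a lookup from normalized email to the remaining fields."""
    return {
        normalize(r["email"]): {k: v for k, v in r.items() if k != "email"}
        for r in rows
    }


def merge_on_email(survey_rows, *other_sources):
    """Start from survey respondents and attach fields from each other source.

    A respondent who is absent from a source simply lacks that source's fields.
    """
    lookups = [index_by_email(src) for src in other_sources]
    merged = []
    for row in survey_rows:
        key = normalize(row["email"])
        combined = {"email": key}
        combined.update({k: v for k, v in row.items() if k != "email"})
        for lookup in lookups:
            combined.update(lookup.get(key, {}))
        merged.append(combined)
    return merged


merged = merge_on_email(survey, delivery, warehouse)
```

In this sketch, a respondent from the comparison group (no delivery records) keeps their survey and Data Warehouse fields but has no delivery counts, which makes gaps in any one source easy to spot.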

A slide deck summarizing some of the findings of this study can be viewed here.

The joining of these three streams of data conferred some of the following benefits:

  • A shorter survey: Because we paired Data Warehouse data with survey data, we did not have to ask survey respondents about the details of their campus positions and areas of scholarship. This allowed us to make the survey substantially shorter. Shorter surveys are associated with better completion rates!

  • Rich, accurate service use data: Some of our survey participants were very heavy users of the delivery services (e.g., making 20 or more requests in FY19 was not uncommon). Some used the service sparingly, and not in recent months. By incorporating the service usage data collected by our own systems, we gathered rich, accurate data on user interactions with the service. We did not have to fret about the natural memory limitations of the survey respondents.

  • Creation of a matched comparison group using propensity score matching: A powerful aspect of the study was the comparison of service awareness and expectations among recent users, non-recent users, and non-users. Our internal document delivery data allowed us to identify the FY19 users we wanted to survey. We then obtained Data Warehouse data on those students and faculty members, including discipline, faculty track/rank, student standing, and demographic information. Those data were used in a logistic regression model to find ‘matching’ individuals in the U-M Data Warehouse who were not in our document delivery records between FY14 and FY19. Those with good propensity score matches were included in the comparison group. (A propensity score is the estimated probability, given a person’s observed characteristics, that they were exposed to an experience -- in this case the use of document delivery. Non-users whose scores are close to those of users resemble them on those characteristics.) Having a well-matched comparison group makes the comparison of users and non-users much more meaningful, as different responses by the two groups are less likely to be explained by differences in the measured makeup of the groups.
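To make the propensity score matching idea more concrete, here is a small, self-contained Python sketch. It is an illustration under stated assumptions rather than the procedure used in the study: the people, the two binary characteristics (`is_faculty`, `is_humanities`), the hand-rolled gradient-descent logistic regression, the greedy one-to-one nearest-neighbor matching, and the caliper of 0.1 are all hypothetical choices. In practice a statistics package would normally handle the model fitting.

```python
import math

# Hypothetical people: x = [is_faculty, is_humanities];
# user = 1 if the person appears in the document delivery records.
people = [
    {"id": "u1", "x": [1, 1], "user": 1},
    {"id": "u2", "x": [1, 1], "user": 1},
    {"id": "u3", "x": [1, 0], "user": 1},
    {"id": "u4", "x": [0, 1], "user": 1},
    {"id": "n1", "x": [1, 1], "user": 0},
    {"id": "n2", "x": [1, 1], "user": 0},
    {"id": "n3", "x": [1, 0], "user": 0},
    {"id": "n4", "x": [1, 0], "user": 0},
    {"id": "n5", "x": [0, 1], "user": 0},
    {"id": "n6", "x": [0, 1], "user": 0},
    {"id": "n7", "x": [0, 0], "user": 0},
    {"id": "n8", "x": [0, 0], "user": 0},
    {"id": "n9", "x": [0, 0], "user": 0},
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Returns the weights as [intercept, w1, w2, ...].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity(w, x):
    """Estimated probability of being a service user, given characteristics x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def match(people, scores, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    users = [i for i, p in enumerate(people) if p["user"] == 1]
    pool = {i for i, p in enumerate(people) if p["user"] == 0}
    pairs = []
    for u in users:
        if not pool:
            break
        # Closest remaining non-user; ties are broken by list position.
        best = min(pool, key=lambda c: (abs(scores[c] - scores[u]), c))
        if abs(scores[best] - scores[u]) <= caliper:
            pairs.append((u, best))
            pool.remove(best)
    return pairs


weights = fit_logistic([p["x"] for p in people], [p["user"] for p in people])
scores = [propensity(weights, p["x"]) for p in people]
pairs = match(people, scores)
comparison_group = [people[c]["id"] for _, c in pairs]
```

Each user is paired with a not-yet-used non-user whose estimated score is within the caliper, and the matched non-users form the comparison group. Matching without replacement keeps any one non-user from standing in for several users.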

Example #2: Assessing Library-Related Needs during the COVID-19 Building Closures

In response to the COVID-19 pandemic and the resulting stay-at-home orders, the Library launched a survey in April 2020 to understand how to support the Library-related needs of faculty and students.

For this assessment project, three data sources were joined, again using email addresses as a common variable:

  • Survey data: A survey was sent to a sample of over 12,000 faculty members and students, and roughly 2,000 responded. The survey asked for feedback on library-related needs and suggestions related to the activities of teaching, conducting research, and engaging in coursework.

  • U-M Data Warehouse data: Data from the Data Warehouse were used to construct the sample, and variables from this data source such as campus role, disciplinary focus, and race were subsequently paired with survey data.

  • Learning Analytics Data Architecture (LARC) data: After survey data were collected, additional data on student respondents were obtained in a query of the LARC data set (using email addresses as the common identifier). LARC contains some data about students that are not included in the Data Warehouse, such as family income and the highest level of parental education.

You can read a summary of the COVID-19 study findings here.

Ways that the use of multiple data sources benefited this study include:

  • Creating a diverse sample: In using data from the Data Warehouse to create the sample, the goal was to ultimately have a study data set characterized by genuine diversity in terms of campus role, disciplinary focus, and race/ethnicity. In fact, to ensure that the final data set allowed for focused analysis of the needs of smaller, underrepresented racial groups on campus, we oversampled people who were listed in the Data Warehouse as Black, Hispanic, and Native American. Thus, using the Data Warehouse as a data source facilitated both careful recruitment for the study and the subsequent analysis of the variables contained therein (the same variables used to guide recruitment were also paired with survey data).

  • Assessing the role of student socioeconomic status: Economic indicators from the LARC data set, such as family income and parent education level, were joined with the other data we collected. Such a pairing facilitates analysis of whether variables such as student socioeconomic status predict students’ library-related needs, such as the need for electronic course reserve materials (after access to physical course reserves was no longer possible).
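The oversampling described above can be sketched as stratified sampling with a different sampling rate per group. The Python example below is a hedged illustration only: the sampling frame, the group labels, the rates, and the function `stratified_sample` are hypothetical, and attaching a weight (group size divided by number sampled) is a common survey practice that I am assuming here, not a step reported for this study.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_key, rates, default_rate, seed=0):
    """Draw a simple random sample within each stratum at its own rate.

    Each sampled row gets a 'weight' (stratum size / number sampled) so that
    oversampled groups can be weighted back down in campus-wide estimates.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in frame:
        by_stratum[row[stratum_key]].append(row)

    sample = []
    for stratum, rows in sorted(by_stratum.items()):
        rate = rates.get(stratum, default_rate)
        # Sample at least one person, and never more than the stratum holds.
        k = min(len(rows), max(1, round(len(rows) * rate)))
        for row in rng.sample(rows, k):
            sample.append({**row, "weight": len(rows) / k})
    return sample


# Hypothetical sampling frame of 1,000 people in three groups of unequal size.
frame = (
    [{"email": f"a{i}@example.edu", "group": "group_a"} for i in range(800)]
    + [{"email": f"b{i}@example.edu", "group": "group_b"} for i in range(150)]
    + [{"email": f"c{i}@example.edu", "group": "group_c"} for i in range(50)]
)

# Smaller groups are sampled at higher rates than the default.
sample = stratified_sample(
    frame,
    "group",
    rates={"group_b": 0.5, "group_c": 1.0},
    default_rate=0.1,
)
```

Sampling the smaller groups at higher rates yields enough respondents for focused analysis of each group, while the weights allow the combined sample to be scaled back to the makeup of the full frame when overall estimates are needed.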

Planning carefully for an assessment project can allow a study team to consider the potential benefits of retaining identifiers in data sets long enough to join multiple data sources. Of course, this type of approach to data collection and analysis necessitates careful data management practices and requires working with someone who has access to campus data sources, such as LARC and the Data Warehouse. If you want help with an assessment project that would benefit from the joining of multiple data sources, please feel free to reach out to me in my role as assessment specialist (