🔔 HathiTrust Research Center Services Temporarily Unavailable

Due to scheduled maintenance, some HTRC services will be unavailable from Friday, March 28th at 1:00pm ET to Monday, March 31st at 12:00pm ET. We apologize for any inconvenience.

HathiTrust Research Center Analytics

Supporting large-scale text analysis of the HathiTrust Digital Library for educational and non-profit research

Where to start: understanding the data


Text as data

The data you analyze in HTRC Analytics is text from the HathiTrust Digital Library, a corpus of more than 18 million digitized items contributed by more than 60 academic and research libraries in North America and other countries. This corpus opens doors for researchers to study culture and history in new ways.

What does the collection look like?

The HathiTrust Digital Library is a massive collection that includes fiction, nonfiction, and scholarly works in many different languages, spanning the history of printed text. The chart below shows a breakdown of languages found in the digital library.

How can researchers access this much data?

Researchers can access public domain and in-copyright texts in the form of worksets, derived datasets, and full text in data capsules, making HTRC an indispensable resource for research communities.

HTRC tools and services are built on a non-consumptive use policy, which allows researchers to stay within the bounds of fair use when conducting their analyses.

New to HTRC Analytics?

Read through the "New to HTRC?" page to understand the foundational concepts of using the site.

Featured services

Explore and visualize


Create a graph showing word trends over time with Bookworm+HathiTrust. No coding required!