Humanities Data Analysis: Case Studies with Python

Humanities Data Analysis: Case Studies with Python is a practical guide to data-intensive humanities research using the Python programming language. The book, written by Folgert Karsdorp, Mike Kestemont and Allen Riddell, was originally published by Princeton University Press in 2021 (for a printed version of the book, see the publisher's website), and is now available as an Open Access interactive Jupyter Book.

The book begins with an overview of the place of data science in the humanities, and proceeds to cover data carpentry: the essential techniques for gathering, cleaning, representing, and transforming textual and tabular data. Then, drawing from real-world, publicly available data sets that cover a variety of scholarly domains, the book delves into detailed case studies. Focusing on textual data analysis, the authors explore such diverse topics as network analysis, genre theory, onomastics, literacy, author attribution, mapping, stylometry, topic modeling, and time series analysis. Exercises and resources for further reading are provided at the end of each chapter.

What is the book about?

Parsing and Manipulating Data ⛏️

Learn how to effectively gather, read, store, and parse data in different formats, such as CSV, XML, HTML, PDF, and JSON.

Modeling and Data Representation 🚀

Construct Vector Space Models for texts and represent data in a tabular format. Learn how to use these and other representations (such as topics) to assess similarities and distances between texts.
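As a rough illustration of the idea (a minimal sketch, not code taken from the book), the example below turns a few toy texts into bag-of-words vectors using only Python's standard library, arranges them as a document-term table, and compares them with cosine similarity. The whitespace tokenizer, the toy texts, and the helper names `bag_of_words` and `cosine_similarity` are assumptions made for this example; the book's own chapters may rely on other libraries and more careful preprocessing.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Hypothetical helper: lowercase and split on whitespace (a deliberately naive tokenizer)
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Dot product over the words the two texts share
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    # Euclidean length of each count vector
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (invented for illustration only)
texts = {
    "a": "the cat sat on the mat",
    "b": "the dog sat on the log",
    "c": "stars shine at night",
}

vectors = {name: bag_of_words(t) for name, t in texts.items()}

# Shared vocabulary defines the columns of the document-term table
vocabulary = sorted(set().union(*vectors.values()))

# Document-term matrix: one row per text, one column per vocabulary word
matrix = [[vectors[name][w] for w in vocabulary] for name in texts]

print(vocabulary)
for name, row in zip(texts, matrix):
    print(name, row)

print(round(cosine_similarity(vectors["a"], vectors["b"]), 2))  # 0.75
print(round(cosine_similarity(vectors["a"], vectors["c"]), 2))  # 0.0
```

A cosine distance can then be taken as `1 - cosine_similarity(...)`, so texts "a" and "b" are at distance 0.25, while "a" and "c", which share no words, are at the maximum distance of 1.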

Creating Sophisticated Visualizations 📈

Emphasize visual storytelling through data visualizations of character networks, patterns of cultural change, statistical distributions, and (shifts in) geographical distributions.

Working on Real-World Case Studies 🌎

Work on real-world case studies using publicly available data sets. Dive into the world of historical cookbooks, French drama, Danish folktale collections, the Tate art gallery, mysterious medieval manuscripts, and many more.

Accompanying Data

The book features a large number of high-quality datasets. These datasets are published online under the DOI 10.5281/zenodo.891264. They can be downloaded from the address

Citing HDA

If you use Humanities Data Analysis in an academic publication, please cite the original publication:

Karsdorp, F., Kestemont, M., & Riddell, A. (2021). Humanities Data Analysis: Case Studies with Python. Princeton University Press.

@book{karsdorp2021humanities,
  author = {Folgert Karsdorp and Mike Kestemont and Allen Riddell},
  title = {Humanities Data Analysis: Case Studies with Python},
  publisher = {Princeton University Press},
  isbn = {9780691172361},
  year = {2021}
}