• Welcome to Massive Texts.

  • The DU Massive Texts Lab studies massive collections of texts through the development and application of computational methods.

    We work with texts at scales too large for a person to read, applying machine learning, text mining, and natural language processing methods to learn the characteristics, relationships, and topics within documents. At an aggregate scale, we study themes across entire collections of texts. Examples include learning about historical or cultural trends from massive digital libraries, identifying duplicate or variant relationships among books, building language models in service of creativity measurement in educational assessment, and quantifying heterogeneity and outlier passages in legislative texts.

  • SaDDL: Similarity and Duplication in Digital Libraries

    SaDDL is an IMLS-funded project (LG-86-18-0061-18) aimed at identifying and labelling duplicate work relationships in digital libraries. It currently works with the 17 million works in the HathiTrust Digital Library. A reliable way to recognize fuzzy duplicates can improve information access and retrieval, support cleaner large-scale text analysis, and aid in updating library records to modern FRBR-based cataloguing standards. In addition, the project seeks to identify the "best" copies of each work for access or analysis, and to generate recommendations for similarly themed texts that libraries can adopt.

    To learn more about the project, visit the SaDDL page.

     

  • MOODs: Identifying Outlier Passages and Texts in the Legislative Process

    MOODs is a project that analyzes legislative bills for atypical or outlier texts: Misfits, Omnibuses, and Odd Ducks. The project seeks to help people better understand the legislative process by using computational methods to spotlight novel, potentially interesting parts of bills for readers, and by creating additional quantitative measures for downstream classification and analysis. MOODs is funded through an internal University of Denver grant.

    To learn more about the project, visit the MOODs page.

  • Current Contributors:

      Dr. Peter Organisciak, Director

      Dr. Krystyna Matusiak, Affiliate Faculty

      Andy Lawder, Graduate Student Associate

      Summer Shetenhelm, Graduate Student Associate

      Grace Therrell, Graduate Student Associate

      Danielle Vasques, Graduate Student Associate

     

      To learn more about each of our contributors, visit the Members page.

     

This portfolio last updated: 09-Nov-2018 5:51 PM