Digital Humanities Workshop - Session 4

The Impact of the Digital on Japanese Studies

Session 4 - Abstracts

Jonathan Abel, Penn State University

On Collecting Data

This presentation will focus on problems with data collection in three unrelated projects. First, I will discuss some cautionary differences between received or found source data (that which is collected by others) and active data collection (through scraping), drawing on the example of Twitter. Second, I will introduce some of the problems with managing a group of data collectors through a discussion of the Japanese data portion of CINEmap, a project I am working on with colleagues to geotag scenes in world cinema in order to reconnect spaces captured on film with the world. Third, I will introduce the issue of data creation via a project that attempts to code generic categories within the Journal of Japanese Studies and Monumenta Nipponica in order to substantiate field-level speculation on changing interests, factions, and debates in Japanese Studies.

Hoyt Long, University of Chicago

On Aozora Bunko as Archive

Aozora Bunko began in 1997 as a crowd-sourced initiative to create a public digital library for literary and other works no longer under copyright. It now comprises some 13,000 texts and is arguably one of the most promising resources for data-driven studies of modern Japanese literary history. In this talk, I will consider the possibilities and limitations of Aozora Bunko as a digital archive that sits in relation to “modern Japanese literature” as a historical construct, but also transcends it in specific ways. A brief history of the archive and a statistical analysis of its composition will be a chance to raise questions about its “representativeness” as an archive and about the problem of historical sampling in general. I will also introduce some of the new forms of analysis that this archive makes possible, ranging from an advanced search interface (“Aozora Search”) that I have created with colleagues at Chicago, to more advanced methods in natural language processing and machine learning that make it possible to distant-read at the scale of hundreds or thousands of texts. Ultimately, Aozora Bunko provides a frame with which to think through the digital future of Japanese literary studies.

Toshinobu Ogiso, National Institute for Japanese Language and Linguistics

On Japanese Corpora and Tokenization