The Impact of the Digital on Japanese Studies
Session 4 - Abstracts
Jonathan Abel, Penn State University
On Collecting Data
This presentation will focus on problems with data collection in three unrelated projects. First, I will discuss some cautionary differences between received or found source data (that which is collected by others) and active data collection (through scraping), drawing on the example of Twitter. Second, I will introduce some of the problems with managing a group of data collectors through a discussion of the Japanese data portion of CINEmap, a project I am working on with colleagues to geotag scenes in world cinema in order to reconnect spaces captured on film with the world. Third, I will introduce the issue of data creation via a project that attempts to code generic categories within the Journal of Japanese Studies and Monumenta Nipponica in order to substantiate field-level speculation on changing interests, factions, and debates in Japanese Studies.
Hoyt Long, University of Chicago
On Aozora Bunko as Archive
Aozora Bunko began in 1997 as a crowd-sourced initiative to create a public digital library for literary and other works no longer under copyright. It now comprises some 13,000 texts and is arguably one of the most promising resources for data-driven studies of modern Japanese literary history. In this talk, I will consider the possibilities and limitations of Aozora Bunko as a digital archive that sits in relation to “modern Japanese literature” as a historical construct, but also transcends it in specific ways. A brief history of the archive and a statistical analysis of its composition will be a chance to raise questions about its “representativeness” as an archive and about the problem of historical sampling in general. I will also introduce some of the new forms of analysis that this archive makes possible, ranging from an advanced search interface (“Aozora Search”) that I have created with colleagues at Chicago to methods in natural language processing and machine learning that make it possible to distant-read at the scale of hundreds or thousands of texts. Ultimately, Aozora Bunko provides a frame with which to think through the digital future of Japanese literary studies.
Toshinobu Ogiso, National Institute for Japanese Language and Linguistics
On Japanese Corpora and Tokenization