This presentation is a direct continuation of my previous work on n-gram frequency analysis applied to assessing the authorship of a series of medieval Japanese Buddhist texts, published in 2018 in the Journal of the Japanese Association for Digital Humanities. It aims to develop better tools and workflows by tackling both the production of digital editions and their integration into a working database.

This series of Buddhist texts—called shōgyō (“sacred teachings”)—was long ignored by historians, but the situation is gradually changing. These texts were mostly composed in Japanese kanbun (classical Chinese read according to Japanese word order), and their writing style is extremely fragmentary, almost cryptic at times. Indeed, they were not designed to be understood by the uninitiated, and often require the oral transmission of knowledge from a master in order to be truly comprehended. To decipher their contents and use them as proper historical sources, one has to reconstruct the vast knowledge network upon which they were built. Previous work by both Japanese and Western scholars has demonstrated that computer-assisted analysis of this massive corpus, which spans an extremely long period, is a valuable tool for accomplishing this task.

Building on this work, this presentation is divided into two main parts. The first, “edition,” starts with the manuscripts (or, in some cases, printed editions), focusing on the actual process of typing the text and editing it into a workable format. I will mention a few encoding issues, especially with rare Siddham characters. Great progress has been made in the creation of a Unicode font that includes ligatures and characters created in Japan, so this section will deal both with temporary workarounds and with more permanent solutions that will hopefully become attainable in the near future.
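
To make the idea of a temporary workaround concrete, the sketch below shows one possible interim strategy, under the assumption of a plain-text transcription stage: codepoints from the Siddham block (U+11580–U+115FF) that lack glyph coverage in the available fonts are replaced with an explicit, searchable placeholder. The placeholder convention is hypothetical and purely illustrative.

```python
# Illustrative only: flag codepoints from the Siddham block (U+11580-U+115FF)
# during transcription and substitute a hypothetical, searchable placeholder
# when no font glyph is available, so the working text stays machine-readable.

SIDDHAM_BLOCK = range(0x11580, 0x11600)

def mark_siddham(text: str, placeholder: str = "[SIDDHAM:U+{cp:X}]") -> str:
    """Replace unsupported Siddham codepoints with an explicit placeholder."""
    out = []
    for ch in text:
        if ord(ch) in SIDDHAM_BLOCK:
            out.append(placeholder.format(cp=ord(ch)))
        else:
            out.append(ch)
    return "".join(out)

# Example: a codepoint from the Siddham block (U+11580) inside a transcription.
print(mark_siddham("種子\U00011580也"))  # -> 種子[SIDDHAM:U+11580]也
```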

However, the main focus will be on what can be done with the text once it is digitized. In order to achieve a better understanding of this extremely nuanced genre of literature, a slightly customized TEI framework will be applied to the texts. In this context, I will expand the system of speaking voices that I proposed in a previous article (2018) and apply it within the TEI framework. The idea is to distinguish passages written by the author, quotations from either previous masters or canonical texts, notes and commentary outside of the main text, and reading indications, in order to better isolate the most meaningful parts of a given text. By tagging parts of the texts according to their level of discourse, my aim is to refine the data and to cleanly separate quotations of previous sources from original content. This, in turn, will help better define the originality of each text as well as the intertextual relationships among the texts.
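
The minimal sketch below illustrates the kind of downstream processing such tagging makes possible, for instance before running n-gram comparisons. The element choices (generic TEI <quote> and <note>) are assumptions made for the example, not the customized voice taxonomy itself.

```python
# Illustrative sketch: once voices are tagged, quoted material, marginal notes,
# and authorial prose can be separated programmatically. Generic TEI elements
# are used here for demonstration purposes only.
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

sample = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <p>Words in the author's own voice
     <quote source="#canon">a canonical quotation</quote>
     followed by more authorial prose
     <note place="margin">a later marginal gloss</note>
  </p>
</body>"""

def split_voices(xml_text: str) -> dict:
    """Bucket text by level of discourse: authorial, quoted, or note."""
    buckets = {"authorial": [], "quoted": [], "notes": []}
    root = ET.fromstring(xml_text)
    for p in root.iter(TEI + "p"):
        if p.text and p.text.strip():
            buckets["authorial"].append(p.text.strip())
        for child in p:
            content = "".join(child.itertext()).strip()
            if child.tag == TEI + "quote":
                buckets["quoted"].append(content)
            elif child.tag == TEI + "note":
                buckets["notes"].append(content)
            # text following a child element still belongs to the author's voice
            if child.tail and child.tail.strip():
                buckets["authorial"].append(child.tail.strip())
    return buckets

print(split_voices(sample))
```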

The second part is a direct continuation of the first and will propose a few ways to integrate these texts—and their particularities—within a working database. My work is based on my ongoing collaboration with Professor Christian Wittern, especially regarding his Kanseki repository. I will present a few cases whose implementation could pose problems, such as the existence of multiple manuscripts or differences in the Japanese readings (kundoku) found in each copy of a text, and assess possible solutions, taking into account the new possibilities offered by my editing workflow.
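
To make the modelling problem concrete, here is a purely hypothetical sketch of a data model in which one work is linked to several manuscript witnesses, each carrying its own kundoku readings. It does not reflect the Kanseki repository's actual schema; all names and fields are assumptions for illustration.

```python
# Hypothetical sketch only: one abstract work, several manuscript witnesses,
# and witness-specific kundoku readings anchored to passages of the base text.
from dataclasses import dataclass, field

@dataclass
class KundokuReading:
    passage_id: str   # anchor into the base kanbun text (hypothetical ID scheme)
    reading: str      # Japanese reading recorded in this particular copy

@dataclass
class Witness:
    siglum: str                   # e.g. a shelfmark or collection name
    lineage: str | None = None    # hypothesized manuscript lineage, if known
    readings: list[KundokuReading] = field(default_factory=list)

@dataclass
class Work:
    title: str
    attributed_author: str | None = None
    witnesses: list[Witness] = field(default_factory=list)

# A generic example instance.
work = Work(
    title="Example shōgyō text",
    attributed_author="Unknown",
    witnesses=[
        Witness(siglum="MS A", lineage="lineage 1",
                readings=[KundokuReading("sec1.3", "reading found only in MS A")]),
        Witness(siglum="MS B", lineage="lineage 2"),
    ],
)
```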

In the context of both creating a digital edition and implementing a database, the form of the document itself, and especially the treatment of marginal indications (such as notes in the upper and lower margins of the page, characters written on the backside, and kundoku markers), will be crucial. These elements are closely related to the text, but they were not necessarily produced by its author and may have been added at any point during the transmission of a given manuscript. To give a concrete example, there are several extant manuscripts of the Goyuigō daiji, a text first composed by the Shingon monk Monkan (1278-1357) in 1328. Close inspection of the colophons, characters written on the backside, and reading indications shows that there are at least three different lineages of manuscripts of this text. Such variations are not limited to subtle discrepancies in content: the presence of different reading indications can lead to a distinct way of reading and understanding the text itself. This will lead me to stress the need for a way to visualize such genealogies in a database, and to question the ways we define what a text or a work actually is.
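
As a sketch of what such a genealogy might look like in structured form, the fragment below stores witness filiation as a simple directed graph and prints it as a tree. All labels are placeholders, since the source only states that at least three lineages of the Goyuigō daiji can be distinguished.

```python
# Placeholder genealogy: a directed graph from (hypothetical) ancestor copies
# to descendant copies, printed as an indented tree. Node labels are invented
# for illustration only.
from collections import defaultdict

filiation: dict[str, list[str]] = defaultdict(list)
edges = [
    ("archetype (1328)", "lineage 1 exemplar"),
    ("archetype (1328)", "lineage 2 exemplar"),
    ("archetype (1328)", "lineage 3 exemplar"),
    ("lineage 1 exemplar", "copy A"),
    ("lineage 1 exemplar", "copy B"),
]
for parent, child in edges:
    filiation[parent].append(child)

def print_tree(node: str, depth: int = 0) -> None:
    """Print the filiation graph as an indented tree, root first."""
    print("    " * depth + node)
    for child in filiation.get(node, []):
        print_tree(child, depth + 1)

print_tree("archetype (1328)")
```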

In my conclusion, I will also assess the value of my findings not only as tools, but also within a broader philological context. My focus will be on the problem of the authorship of these texts, building on the pioneering book published by Raji C. Steineck and Christian Schwermann. This presentation will thus allow me to further question the very notion of the “work” in such composite, and even cumulative, texts, building on and expanding previous research on the subject with new data.