【From the World’s Libraries】Digital Chinese Rare Book Collection of the Harvard-Yenching Library

Kentaro NARITA
U-PARL Project Research Fellow

The Harvard-Yenching Library, located in Massachusetts, USA, is home to one of the largest collections of East Asian materials in the world, and surely there are very few researchers specializing in Asia who do not know its name. The library, which supports the research activities of the Harvard-Yenching Institute, originally developed as an organization separate from Harvard University, but it is now under the university’s control and forms part of the Harvard Library, which comprises over 70 individual libraries within the university.

Recently on Facebook, the Harvard-Yenching Library announced the completion of the digitization of its full Chinese rare book collection. (See this post)

This digital collection, which is said to extend to as many as 4,200 titles, is searchable through the Harvard Library’s HOLLIS+.

An overview of the digital collection is posted under “Digitization Projects” at the Research Guide for Chinese Studies on the Harvard Library website.

Many collections are listed, but of those, the following contain Chinese rare books.

・Chinese Rare Books- Unique, Manuscripts, ect. 哈佛燕京圖書館中文善本特藏 稿,鈔,孤本(傅斯年圖書館合作項目)
・Chinese Rare Books Collection- Classics & History 哈佛燕京圖書館中文善本特藏- 經,史部(中國國家圖書館合作項目)
・Chinese Rare Books Collection- Collectaner Section 哈佛燕京圖書館中文善本特藏- 叢部
・Chinese Rare Books Collection- Collected Works 哈佛燕京圖書館中文善本特藏- 集部
・Chinese Rare Books Collection- Oversize 哈佛燕京圖書館中文善本特藏- 特大
・Chinese Rare Books Collection- Philosophy 哈佛燕京圖書館中文善本特藏- 子部

However, rather than each of these being linked to individual databases, they all link to HOLLIS+ search results. In order to conduct a more detailed search you need to be conversant in the HOLLIS+ search function, but when doing so there are a few points you need to consider.

Issues arise because these collections were not all digitized at once, and they have been created one by one as separate individual databases. Among these, as you can see from the collection names above, there are portions that have been created through collaboration with overseas libraries—such as the Fu Ssu-nien Library (傅斯年圖書館) of the Institute of History and Philology, Academia Sinica in Taipei and the National Library of China (中國國家圖書館) in Beijing—and these had already been integrated into HOLLIS+ by the Harvard-Yenching Library, with the original collaboration partner library databases still active.

When being integrated into HOLLIS+, processing must have been done to bring together metadata created under different rules for each database, and for that reason, it seems the metadata on which the searches are based contain anomalies. Let’s take a closer look at the specifics below.

First, looking at the list above, it appears as though the collection has been subdivided according to the “four groups of Chinese classics” (四部) system, but this is not necessarily the case with the “Classics & History” (經,史部) category. Because all of the books digitized under the collaborative project with the National Library of China are placed under this category, books from other categories are sometimes intermixed. For example, the book Jiao Chuang Xiaoshuo 勦闖小說 should be categorized in the four-groups system under “Collected Works” (集部) in the subdivision “Chinese Short Stories” (小說). However, because the metadata associate it with the “National Library of China–Harvard-Yenching Library Chinese rare book digitization project,” it ends up appearing in search results linking from “Classics & History.” Conversely, this book does not appear in search results for “Collected Works.”

Next, if we look at the search results for “Philosophy” (子部), there are 912 hits. This is too many to simply browse through, so if you are familiar with the four-group system, you will want to narrow it down to a subcategory. To put it bluntly, however, this is not possible.

The results page offers a “Subject” function for narrowing down results, and one of the choices is “Zi bu” (子部; “Philosophy”); but selecting this brings the results down from 912 to 35 items for some reason. That is to say, the “Zi bu” tag has not been allocated to all of the “Philosophy” category works. If you narrow the search with the “Zi bu—Lei shu lei” (子部–類書類) tag that appears under “Subject” on the next page, 15 “Lei shu lei” (類書類) books are shown. However, it is obvious this tag has not been allocated to all books that should be classified under “Lei shu lei.” Unfortunately, with this system you are not able to simply browse through all items under a given subcategory in the way that would be easy with a paper-based catalog of Chinese rare books. Further, tags such as “Zi bu” and “Zi bu—Lei shu lei,” which are derived from the four-group system, do appear in the column on the left of the results page, but even if you open the “Details” tab for an individual listing, the system is configured such that you are unable to see which tags that listing carries.

You also need to be careful when searching by either title or author.

First, you should go to the “Advanced Search” function in HOLLIS+. When you want to search only for digital images, it appears you should enter “digitization” in the first search box. If you don’t, paper-based books and essays stored in the Harvard Library will also come up as results.

If you then search “墨池編” under “Title,” two examples of “墨池編” come up as results. If you then click “View Online,” the image can be freely browsed or downloaded. When searching in Chinese characters, the system recognizes both traditional and simplified characters, but it does not recognize Japanese jōyō “regular use” characters that differ from both the traditional and simplified Chinese characters. What is more troublesome is that the jōyō character “経” is mixed into some titles for some unknown reason, as is the case with the work Dao De Zhenjing Yijie 道德真経義解. This means you will not be led to such works if you search by the traditional character “經” or by the simplified character “经.” This sort of metadata inconsistency may have a considerable effect on the accuracy of searches overall.
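One way to work around such mixed-in characters is to prepare every spelling of a title before searching. The following is only a minimal sketch of that idea in Python: the variant table and the function name title_variants are my own illustrations, not part of HOLLIS+, and the table contains only the single character discussed above, so it would need to be extended by hand for real use.

```python
from itertools import product

# Illustrative variant table (an assumption for this sketch): each set groups
# the traditional, simplified, and Japanese jōyō forms of one character.
# Only the character mentioned in the article is listed here.
VARIANT_GROUPS = [
    {"經", "经", "経"},
]


def title_variants(title: str) -> list[str]:
    """Return every spelling of `title` obtained by swapping in known variant forms."""
    choices = []
    for ch in title:
        # Find the variant group this character belongs to, if any.
        group = next((g for g in VARIANT_GROUPS if ch in g), None)
        choices.append(sorted(group) if group else [ch])
    # Combine the per-character choices into full title strings.
    return ["".join(combo) for combo in product(*choices)]


# Prints three spellings, one for each form of the variant character.
for candidate in title_variants("道德真経義解"):
    print(candidate)
```

Each spelling can then be entered under “Title” in turn. A title with no characters in the table simply comes back unchanged.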

Again, if you search for “杜工部集” under “Title,” there are no results. However, you must not jump to the conclusion that 杜工部集 is not included in Yenching. Let’s try searching again under the pinyin “du gong bu ji.” This will give you two results for 杜工部集. That is to say, depending on the book, there are cases where Chinese character title metadata is not picked up by the search system. For this reason, it is safer to search the names of works in pinyin. When doing so, note that some pinyin titles are separated by syllable and others are separated by meaning, such as “du gong bu” and “Du Gongbu,” respectively, for 杜工部 mentioned above. Because a given record may use either method, you should try both rather than assume that one search covers all results. On the other hand, because the system ignores the difference between upper and lowercase letters and the difference between regular letters such as “u” and diacritic variations such as “ü,” you needn’t worry about that.
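If you would rather not work out the different separations by hand, they can be listed mechanically. The sketch below, again in Python, produces every way of joining adjacent syllables of a pinyin title. The function names are my own, and fold is only my approximation of the case and diacritic handling described above, not the actual HOLLIS+ implementation.

```python
import unicodedata
from itertools import product


def fold(text: str) -> str:
    """Lowercase and strip diacritics, approximating the folding described above."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


def pinyin_groupings(syllables: list[str]) -> list[str]:
    """Return every way of joining adjacent syllables into words.

    For n syllables there are 2 ** (n - 1) results, from fully separated
    ("du gong bu ji") to fully joined ("dugongbuji").
    """
    if not syllables:
        return []
    results = []
    # Each gap between syllables is either a space or nothing.
    for seps in product([" ", ""], repeat=len(syllables) - 1):
        parts = [syllables[0]]
        for sep, syllable in zip(seps, syllables[1:]):
            parts.append(sep + syllable)
        results.append("".join(parts))
    return results


queries = pinyin_groupings(["du", "gong", "bu", "ji"])
for query in queries:
    print(query)
```

Most of the eight strings will not match anything, but the list includes both the syllable-by-syllable form “du gong bu ji” and the meaning-based form “du gongbu ji,” so neither is overlooked. Since case and diacritics are ignored by the system, there is no need to generate variants for them.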

Lastly, when searching by writer under “Author / creator,” there is no problem searching in either Chinese characters or in pinyin. In the case of 白居易, you can search with the pinyin “bai juyi.” However, if the name is split into its individual syllables, “bai ju yi,” it will not come up as a result, so you need to be careful. Also, the system not only responds to real names but also recognizes courtesy names and pseudonyms in many cases. However, there are definite shortcomings in searching for courtesy names and pseudonyms, as is the case with the courtesy name “樂天” (Letian) for 白居易, where the traditional characters “白樂天” provide no results and you must search with either the simplified characters “白乐天” or the pinyin “bai letian.”

Let’s sum up the points of caution from above.

・ You should not assume the collection can be browsed according to the system of four groups of Chinese classics.
・ Title/author name, etc. should be searched after entering “digitization” in the HOLLIS+ Advanced Search mode.
・ I suggest you search for titles in pinyin, while remembering to try different syllable combinations.
・ Searches for author name can also be carried out in Chinese characters (traditional or simplified). In pinyin, care must be taken with the way in which syllables are separated.

Above, I have attempted to compile a list of basic issues that have come to my attention. I hope this will allow researchers of Chinese studies to enjoy a more comfortable digital life!