Digital Resources for Sinologists 1.0


Part I: An Introduction to Chinese Electronic Dictionaries and Criteria for Their Evaluation

This article provides an overview of Chinese electronic dictionaries, followed by a detailed annotated list of the main digital lexica, resources, and reference tools currently available. As usual, when dealing with digital resources, the examples discussed below will soon become outdated, perhaps even within the next few months. We will therefore first provide a set of general guidelines on how to evaluate and compare electronic dictionaries and related reference tools before discussing the applications themselves.

In late 2012, Victor Mair gave a lecture entitled “Sinology Then and Now: Methods and Aims” at Peking University (a video of the lecture is available on YouTube). Given the vast scope one would expect from a title like this, it is remarkable that he delineated the major periods in Sinology by the availability of new and groundbreaking lexicographic works.

Traditional lexicography and lexicology are time-honored disciplines, and the basic tenets that guide the creation of dictionaries have remained relatively stable. In recent years, however, advances in digital technologies have led to new, although long anticipated, types of dictionaries (such as WordNet or the Oxford Collocations Dictionary), which are based on word frequency statistics, collocations, semantic relations, and so on. This has led to a reevaluation of the organization of lexicon entries and the structure of dictionaries, and has placed greater emphasis on understanding the roles dictionaries can play in assisting, as well as examining, the cognitive processes of vocabulary acquisition and language learning. While conventional alphabetic or radical-based organizations group lexemes that usually have no semantic relation, new clustering and syntagmatic approaches aim at presenting co-occurrences, just as a reader would find them in the real world. Dictionaries that are based on real language observation can thus also become more descriptive and less normative, hopefully leading to developments in lexicographic theory. A practical example is the annotation of entries with difficulty- or domain-specific frequency levels (similar to HSK levels), thereby facilitating the acquisition of new vocabulary.

The earliest digital Chinese-English dictionary that was freely available and widely used was CEDICT, initially released in 1997 and maintained by Paul Denisowski. In 2007, it was renamed CC-CEDICT and made publicly available under the Creative Commons Attribution-Share Alike 3.0 License. It has since been adapted by a host of dictionary projects, some of which have already gone out of use while others have become some of the most widely used Chinese lexical tools available today.

As with other types of digital content, the Internet has thus allowed for a shift of authorship from traditional professional organizations to amateurs and enthusiasts, at roughly the same time that a sea change has occurred in traditional publishing. Similarly, the potential to reach a wide audience of contemporary scholars has improved greatly with digitization. Questions of reliability and quality must therefore be considered, as well as familiar problems of discourse, authority, language policy, and censorship.

How does digitization change dictionaries?

An obvious improvement in bilingual digital lexica is that by design they can support dynamic multi-scope targeted searches and individualized organization of the data (or subsets thereof), whereas a print lexicon can be organized in only one direction. Electronic dictionaries can more easily be updated to keep up with linguistic changes such as neologisms and semantic drift. They also allow for expansion and customization by readers themselves. Furthermore, we often see a single search interface or framework application that can query several different databases, with individual dictionaries included or installed as need dictates. Advanced search tools can not only link the lemmata between two languages but also return any lexical entry that contains the search term somewhere in the definition, example sentence, or any other data field. To aid with searching, many lexicons can suggest corrections of misspelled terms, show possible continuations, and do simple word-root stemming, which can be very helpful to users who are not familiar with affixes. Some digital lexica also allow for multi-level logically connected search terms (using the operators AND, OR, NOT, and so on), and a few even permit full-fledged regular expressions, a system for matching patterns in character strings with its own grammar, conventions, and user community. Thus, the reverse dictionary has now been made superfluous, and concordances can be generated more or less automatically. In short, constraints that had been imposed by the need for conciseness in the old medium have become less forbidding.
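As a rough illustration of full-field search with logical operators and regular expressions, the following Python sketch filters a handful of toy entries. The entries, field names, and the `search` function are invented for this example and do not reflect the internals of any dictionary discussed in this article.

```python
import re

# Toy entries; the field names and glosses are illustrative assumptions.
ENTRIES = [
    {"headword": "字典", "pinyin": "zi4 dian3", "gloss": "character dictionary"},
    {"headword": "詞典", "pinyin": "ci2 dian3", "gloss": "dictionary of words and phrases"},
    {"headword": "經典", "pinyin": "jing1 dian3", "gloss": "classic text; scripture"},
    {"headword": "字母", "pinyin": "zi4 mu3", "gloss": "letter of an alphabet"},
]

def search(entries, all_of=(), none_of=(), pattern=None):
    """Return headwords of entries that contain every term in all_of (AND),
    no term in none_of (NOT), and, if given, match the regular expression."""
    regex = re.compile(pattern) if pattern else None
    hits = []
    for entry in entries:
        # Search all data fields at once, not just the headword.
        text = " ".join(entry.values())
        if not all(term in text for term in all_of):
            continue
        if any(term in text for term in none_of):
            continue
        if regex and not regex.search(text):
            continue
        hits.append(entry["headword"])
    return hits

# Words containing 典 that are not glossed as dictionaries:
print(search(ENTRIES, all_of=["典"], none_of=["dictionary"]))
# Headwords beginning with 字 (the headword is the first field in the text):
print(search(ENTRIES, pattern=r"^字"))
```

Because the search runs over every field, an English keyword works as a reverse lookup without any separate reverse dictionary.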

The calculator-size handheld devices that had their heyday over the past few decades have largely been replaced by smartphone apps. At the cutting edge of these, Word Lens (currently not available for Chinese) provides immediate character recognition and auto-translation of any text photographed with the phone’s camera. Dictionary systems and related lexical tools have thus been improving rapidly, becoming more comprehensive and faster, and including many new functions. However, when we look under the hood of today’s real-world implementations, things get significantly more complicated.

What are the special characteristics of Chinese and bilingual Chinese-English dictionaries?

Without abetting the myth of radical (linguistic) exoticism, there are nevertheless a number of attributes in which Chinese lexicography differs radically from that used for English and other languages based on phonetic scripts.

Lexica organized around a logographic, non-alphabetic script like Chinese will usually allow for searches based on the graphic form(s) of a character (graph or grapheme), or by transcribed pronunciation (in Chinese, this is commonly Mandarin putonghua pinyin 普通話拼音, though some dictionaries now also include pronunciations from other dialects), normally via Romanized input; tone indicators can often also be added to reduce the number of homonyms returned when using phonetic search criteria. While there are several robust voice recognition applications available, we are not aware of many dictionaries that have incorporated this feature yet (Pleco being one of the few examples).

Most graphs can be searched for using either simplified or traditional forms, and sometimes variant or paleographic forms, as well as graphic subcomponents. These search systems mainly use the traditional semantic determiners called “radicals” (bushou 部首), which vary in number depending upon the system used by the lexicographer and have been the most common organizational system in Chinese over the past two millennia. In terms of etymology, their importance has slowly diminished, since some follow semantic considerations while others are phonetically motivated, or are on occasion completely arbitrary. Newer electronic dictionaries thus also allow for search based on components that are not “radicals” but follow the same compositional rules: subcomponent or initial stroke type, followed by the number of additional strokes. Non-radical graphic lookup methods did not first appear with digital dictionaries; in print we can sometimes search for a graph by its full number of strokes, or use the “four corner index,” but these are less efficient since the number of characters that the reader must scan visually becomes quite large.
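The radical-plus-additional-strokes lookup described above can be sketched as a simple index. This is a minimal Python sketch over five sample characters; the data structure and function name are assumptions made for illustration, not how any particular dictionary stores its data.

```python
from collections import defaultdict

# (graph, radical, number of strokes in addition to the radical)
CHARACTERS = [
    ("江", "水", 3),
    ("河", "水", 5),
    ("海", "水", 7),
    ("林", "木", 4),
    ("松", "木", 4),
]

def build_index(characters):
    """Group graphs under a (radical, additional strokes) key."""
    index = defaultdict(list)
    for graph, radical, extra_strokes in characters:
        index[(radical, extra_strokes)].append(graph)
    return index

index = build_index(CHARACTERS)
print(index[("木", 4)])  # all sample graphs under 木 with four additional strokes
```

The same structure works for components that are not traditional radicals: only the key changes, which is why electronic dictionaries can offer several such indexes side by side.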

If we include graphic input of encoded characters, drawn, scanned, or photographed images of characters, Romanized or otherwise written pronunciation, and audio-representation in recognition or production, one can easily employ up to four or more scopes, any combination of which can be used as primary organizing feature and search lemmata.

In cases where the user is faced with a written Chinese character of which neither meaning nor pronunciation is known, s/he has the following options: a) find the primary radical or other component and count its strokes, then look up the radical and count the additional strokes, or b) draw or otherwise capture the character and use Optical Character Recognition (OCR) software. For drawn input, older solutions required that characters be drawn following the traditional stroke order, while most newer algorithms allow for any order as long as the result matches the desired form. The user may also be interested in the correct order of strokes for the character, so many dictionaries display stroke order animations. Closely related to this function are “translations” of standard graphic forms into fonts resembling handwritten script forms, intended to help the learner become familiar with these widely used scribal variants.

In cases where the user hears a Chinese word and wants to look up its meaning or graphic form, s/he has the following options: a) enter the pinyin or appropriate alphabetic transcription and search through a long list, or b) reduce the list by adding the presumed tone. In this part of the search, which is usually taken over by dictionary-independent input method editors (IMEs), the user can also, quite naturally as in spoken conversation, c) add more context to reduce the ambiguity of the input string, which most IMEs facilitate natively based on the usage frequency of character sequences (known as n-grams). As a review of the technology employed by IMEs would require an article of its own, let us say here only that, aside from alphabet-based input of transcribed phonemes, there exist several common methods which map PC or mobile phone keyboards to graphic components without any connection to phonetic values whatsoever. These methods are generally more difficult to learn but eventually allow for significantly faster input. Another obvious advantage of stroke-based input is that users are then less prone to losing the ability to handwrite characters, a problem which is increasingly being perceived as affecting literacy in general, and not just by opponents of computerization.
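Options a) and b) can be sketched in a few lines of Python. The lexicon below is a toy sample and its frequency counts are made up for illustration; real IMEs additionally use the context of neighboring syllables (n-grams), which this sketch does not model.

```python
# Toy lexicon: (graph, toned pinyin, invented frequency count)
LEXICON = [
    ("是", "shi4", 900),
    ("十", "shi2", 400),
    ("時", "shi2", 350),
    ("事", "shi4", 300),
    ("詩", "shi1", 50),
]

def strip_tones(pinyin):
    """Remove tone digits, e.g. 'shi2' -> 'shi'."""
    return "".join(ch for ch in pinyin if not ch.isdigit())

def candidates(query):
    """Return graphs matching a toneless ('shi') or toned ('shi2') query,
    most frequent first."""
    toned = any(ch.isdigit() for ch in query)
    matches = [
        (graph, freq)
        for graph, pinyin, freq in LEXICON
        if (pinyin == query if toned else strip_tones(pinyin) == query)
    ]
    matches.sort(key=lambda match: -match[1])
    return [graph for graph, _ in matches]

print(candidates("shi"))   # the long list of option a)
print(candidates("shi2"))  # the reduced list of option b)
```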

If we combine the factors of the user’s native tongue, the need to resolve written or spoken language, the desire to receive or produce speech, the intention to learn the language or translate efficiently, and the direction of translation, we quickly find that the number of ways to configure an adequate search result grows combinatorially. At first, digital dictionaries simply reproduced the structure of their paper precursors, and only later did they begin to make better use of the new ranges of function. In theory, one and the same digital dictionary could be customizable to best meet the needs of any type of user, and even language domain, stratum, period of usage, gender, or geographical variety could be incorporated simply by using corresponding techniques and intelligently ordering the search results. As yet, none of the electronic solutions we have reviewed makes adequate use of these possibilities, so in search of real variety we must at this point still rely upon print works.

Criteria for Evaluating an Electronic Dictionary

We can summarize the above-mentioned aspects to focus on specific user groups, use cases, and search methods. Most importantly, however, dictionaries can be primarily characterized by the types of lexicographic information they provide. These may include translations, definitions, parts of speech, explanations of grammatical particles, measure words or noun classifiers, treatments of polyphonic characters, example sentences (which can be retrieved from real world language corpora or contrived by the editors), pragmatic suggestions, and verb valencies. More fundamentally, dictionaries can distinguish between characters 字, compound words 詞, and phrases 辭. They can pay attention to neologisms, proper names (people or places), idioms, common errors, slang, semantic relations (like hyponyms, hyperonyms, meronyms, synonyms, or antonyms), level or register of language, colloquial vs. written styles, etymology, dialect usage and periodization, and specialized lexical domains (for example, the trilingual French-English-Chinese International Dictionary of Refrigeration) that one would find on the bookshelves of a well-stocked bookshop in China. Some digital lexica give citations for printed reference works, distinguish traditional and simplified characters, and/or offer results regardless of form. Others focus on quotations, insults, rhetorical devices, or rhymes; the list is virtually endless.

As further criteria to evaluate electronic dictionaries, we should look at the licensing and price policy. Does an open collaborative project ensure that it will always be free of charge? Will updates or subsequent versions be free for those who have already bought a license? Will the license purchased for one operating system extend to versions for another OS or for one’s mobile phone?

Usability can be evaluated judging by the number of necessary mouse-clicks, the degree of internal and external hyperlinking, the use and customizability of colors, font sizes, arrangement and types of results, the language of the framework application, and many more criteria. Some dictionary applications offer a wealth of extra features that we can only hint at here. Does the application make use of the statistics users generate, and does it self-customize? How easily can one sync personal data like user-created entries, frequent compounds, flashcard stats, and so on, and do they work with other installations of the same application? Does the application keep a record of the search terms to make memorization more efficient? How well does the application integrate with other programs? (This criterion is especially important for pop-up dictionaries that work inside browsers, word-processing and spreadsheet applications, and sometimes the operating system itself.)

Ultimately, each user must judge an electronic dictionary or digital lexical tool based upon his/her individual needs, goals, and usage patterns. Thankfully, many recent lexica and digital tools allow for a wide range of customizations, designed to better serve the increasingly disparate needs of our rapidly growing and ever more diverse communities of users.


Part II: An Annotated List of Common Digital Dictionaries / Lexical Tools / Learning and Translation Tools / Encyclopædia

Sources: David Hull and the “Sinologists” Facebook Group, digital resource bibliographies at Harvard, Princeton, Chicago, Dartmouth, Heidelberg, Clavis Sinica and

Key:    Resource Name(s)
Online URL
Author / Editor / Source
Brief description of main attributes
Additional notes


General Dictionaries/Lexical Tools

Hanyu da cidian 漢語大詞典
No online version. CD-ROM available from the publisher. Also included in Lingoes and Pleco, see below.
Luo Zhufeng 羅竹風, ed. The Commercial Press (H.K.). Version 3.0, © 2010

The Hanyu da cidian is by far the most comprehensive dictionary available for the Chinese language, including over 300,000 entries, with full definitions and historical citations for each entry. Compilation of the dictionary was undertaken from 1979 to 1993 by a staff of over 300 scholars and lexicographers; the digital version has its own native search functions, or users can enter search terms directly.
In Chinese only, with separate editions for traditional and simplified characters.

Zhongwen da cidian 中文大辭典
Zhang Qiyun 張其昀, ed. © 2006, Chinese Culture University 中國文化大學 (Taiwan).

Based originally on Tetsuji Morohashi’s 1960 Dai Kan-Wa Jiten 大漢和辞典, the Zhongwen da cidian was the preeminent Chinese dictionary until the publication of the Hanyu da cidian. The online version of the dictionary contains over 370,000 entries, and is fully searchable by single graph or phrase; a native radical-stroke index is also provided.
All entries and search functions are in traditional Chinese characters only.

Wenlin 文林
No online version. Digital installer or CD-ROM available from the publisher at
Wenlin Institute, Inc. 文林研究所 © 2006, version 4.1 released 2013.

Wenlin is a Chinese-English standalone application featuring a bilingual dictionary (originally based on John DeFrancis’ ABC Chinese-English Dictionary), text editor and flashcard system. It contains over 10,000 characters and approximately 200,000 words and phrases. Variant and paleographic forms, stroke order, subcomponents, and a variety of lexical references are also listed for each individual graph, any of which can be used as search lemmata. Unicode, GB, Big5, and UTF-8 encodings supported.
Both traditional and simplified characters supported, with a built-in converter.

Sou wen jie zi 搜文解字 (incl. Hanyu da zidian 漢語大字典)
Huang Juren 黃居仁, ed. Institute of Linguistics, Academia Sinica 中央研究院語言學研究所.

The single-character search function of the Sou wen jie zi dictionary contains the entries for the 54,678 characters in the Hanyu da zidian in a multi-scope searchable digital interface. The “phrase search” function returns standard phrases in which a graph is located, configurable by location within the phrase, length of the phrase, or reduplicated binomes. The “text search” function searches for occurrences of the search term within the following classical texts: the Analects 論語, the Mengzi 孟子, the Great Learning 大學, the Doctrine of the Mean 中庸, the Laozi 老子, the Zhuangzi 莊子 and the Three Hundred Tang Poems 唐詩三百首.
Entries and search functions are in traditional Chinese characters only.

Zdic 漢典
© 2004–2013

Zdic is the most comprehensive free online Chinese-Chinese dictionary, containing detailed definitions and a wide variety of lexical data (character stroke animations, radicals and subcomponents, Mandarin putonghua pronunciations with sound files, variant graphic forms, encoding data, and input sequences) for virtually every Chinese graph in the Unicode CJK character set. Entries from the classical Kangxi zi dian, Shuo wen jie zi and Song ben guang yun dictionaries are provided, as well as sample paleographic forms, pronunciations in various dialects, and single-word English translations for each graph.
Entries are in simplified Chinese, though the search function accepts a wide variety of graphic forms.

Source: CC-CEDICT

MDBG is a site based in the Netherlands, directly linked to and supported by the CC-CEDICT project, the largest collaborative public Chinese-English dictionary (wherein users provide entries and corrections to the lexicon). Along with extensive lexical data, MDBG is one of few dictionaries to provide the HSK level of each word. The main search function, the “Word dictionary,” can search by traditional and simplified graphic forms, Mandarin pinyin, and by English keyword. One particularly useful function is that users can input a short text in Chinese characters, which is then parsed programmatically, returning the entries for each graph, word, or phrase in turn. The “Character dictionary” additionally accepts Yale or jyutping Cantonese, cangjie and four-corner codes as search lemmata. Along with encoding converters, MDBG now offers integrated lexical tools for Chrome and Firefox browsers.
Search lemmata include traditional and simplified Chinese, English keywords, and text strings.
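
To give a sense of how a short text might be parsed against CC-CEDICT data, here is a minimal Python sketch. The sample lines follow the CC-CEDICT file layout (traditional form, simplified form, pinyin in brackets, glosses between slashes), but the glosses are abbreviated for illustration. Greedy longest-match segmentation is an assumption made for this sketch, not a description of how MDBG itself parses text.

```python
import re

SAMPLE = """\
# comment lines in the file start with '#'
中國 中国 [Zhong1 guo2] /China/
中 中 [zhong1] /within/among/middle/
國 国 [guo2] /country/nation/
人 人 [ren2] /person/people/
中國人 中国人 [Zhong1 guo2 ren2] /Chinese person/
"""

# traditional simplified [pinyin] /gloss 1/gloss 2/
LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

def parse(text):
    """Map each traditional headword to (pinyin, list of glosses).
    A later line with the same headword overwrites an earlier one."""
    entries = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        match = LINE.match(line)
        if match:
            trad, _simp, pinyin, glosses = match.groups()
            entries[trad] = (pinyin, glosses.split("/"))
    return entries

def segment(text, entries):
    """Greedy longest-match segmentation; unknown graphs pass through singly."""
    max_len = max(map(len, entries))
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in entries or size == 1:
                words.append(chunk)
                i += size
                break
    return words

entries = parse(SAMPLE)
for word in segment("人中國", entries):
    print(word, entries.get(word))
```

Longest-match is the simplest workable strategy; it can mis-segment ambiguous strings, which is one reason a tool may show the entries for the individual graphs alongside those for the compound.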

Thesaurus Linguae Sinicae 新編漢文典
Christoph Harbsmeier 何莫邪, ed., Jiang Shaoyu 蔣紹愚, asst. ed.

The Thesaurus Linguae Sinicae (TLS) is a very large lexicon uniquely designed to show correlations among Chinese words with “synonymous” meanings within syntactic categories. Most remarkable about TLS is its attempt to systematize lexical and grammatical knowledge and use it as the foundation for a database of non-contemporary Chinese. Synonym groups are in English, each including a detailed English definition. The English terms which form the basis of the database are taken from many highly regarded translations of a wide variety of early Chinese texts, both transmitted and excavated, including much of the early Buddhist, Confucian and Taoist canons. Data for each graph includes definitions taken from Karlgren’s Grammata Serica Recensa, early Chinese reconstructed pronunciations and phonological data, variant character tables, and the lexeme representations and lexemes that inform the synonym groups.
Searches can be performed in English or traditional Chinese, within the following categories: Headword, Word, Synonym Group, Word in a Text, Word Attributed to a Text, Syntactic category (English only, based on Chinese categories), Rhetorical device (English only), and Character definition.

Chinese Text Project Dictionary 中國哲學書電子化計劃字典
Donald Sturgeon, ed. © 2006-2013

The CTP Dictionary features a good amount of lexical information, including pronunciations in Mandarin pinyin, Cantonese jyutping, and Stimson’s Tang reconstructions, detailed graph composition information (and other graphs related by component), numeric references to other dictionaries, and full entries from the Shuowen jiezi and Guangyun dictionaries. This dictionary has three additional unique strengths: 1) Full integration with the extensive textual corpora on the site (see its listing in the Encyclopedia section below), featuring a wide range of examples drawn from the texts provided for each graph and many common compounds; 2) The dictionary search function accepts strings of characters, or one can simply use any of the texts on the site, and then provides basic lexical information (pronunciation, meanings, variant graphic forms and links to other dictionaries) for each graph in the order entered; and 3) The Advanced Search function “Text/concept search: Dictionary match” permits searches limited by specific gloss of many graphs, as generated from the English translations.
Searches can be performed in simplified or traditional Chinese, including many variant forms; the ability to parse whole texts and use the individual graphs as dictionary input is a powerful feature.

Kingsoft Corp., © 2012

The online version of Kingsoft’s offline Powerword 金山詞霸 translation software, the bilingual iCIBA dictionary features numerous detailed Chinese-English definitions and a wealth of contextual example sentences for both Chinese and English terms. Compounds, variations and synonyms are also provided for each entry, with encoding and grapheme information also available for Chinese. There is also a browser add-on which provides pop-up definitions and lexical data, for Chinese or English.
Searches can be performed in English, traditional or simplified Chinese, though the lexicon is in simplified characters only.


The 5156edu website contains several types of lexica, mainly intended for young Chinese students studying Chinese. The dictionary provides most of the main definitions for a character, each accompanied by source citations from classic texts (similar to those in the Hanyu da cidian), followed by compounds and their short definitions. Related idioms and homonyms are listed on the left side.
The 5156edu sites are all in simplified Chinese characters only, including search functions.

Lin Yutang’s Chinese-English Dictionary of Modern Usage 林語堂《當代漢英詞典》電子版
Chinese University Press, Chinese University of Hong Kong 香港中文大學出版社.

The digital version of Lin Yutang’s 1972 dictionary provides his extensive Chinese and English glosses for over 40,000 Chinese graphs and compounds, divided by parts of speech, accessible via a well-designed search system that allows the user to search any parts of the dictionary using a variety of criteria. A full English index and user’s guide is also provided.
The entries for the dictionary are all in English and traditional Chinese; there is no support for simplified characters.

中文.com
Rick Harbaugh, ed. © 1996-2011

This site is designed for the learner of modern Chinese and includes a number of study aids, sample readings, and links to other sites. The dictionary features simple glosses for Chinese characters and compounds, explanations for graphic composition, and parts of speech (unfortunately all entries are images, so they cannot be cut-and-pasted into other applications). The dictionary’s main strength is that it is arranged by common component, which allows the user to quickly jump to other graphically similar entries.
Searches can be made in English, Mandarin pinyin, or by using the site’s own search functions.

Wiktionary 維基詞典 Chinese; other languages available via
Wikimedia Foundation, Inc. Open content: all entries written and edited by volunteers.

The Chinese Wiktionary itself is extremely extensive, containing over 800,000 entries, but its main strength is that any search can return results from all the Wiktionaries in more than 175 languages. The lexical entries are generally quite simple, with standard dictionary references, orthography (including variants), pronunciations (including Cantonese and reconstructions), simple English definitions, and links to related entries, followed by entries for the graph/compound found in other Wiktionaries (in other languages).
Searches can be made in many languages/scripts, but will sometimes fail to return Chinese language data, even for a Chinese character.

Revised Chinese Dictionary (Taiwan Ministry of Education) 重編國語辭典修訂本 (漢漢)
Taiwan Ministry of Education 中華民國教育部, © 1994

This dictionary website run by the Taiwan MOE is in traditional Chinese characters only; basic character information and definitions by part of speech are provided for individual graphs and a large number of compounds, including the locus classicus for each.
Entries and search functions are in traditional Chinese characters only.

nciku (“next-词酷”)
Beijing DFHL Co. Ltd. 北京东方慧灵科技发展有限公司

According to the site, the nciku system is designed to be “an illustrated and interactive learning resource.” As of this date, it is not much beyond a basic bilingual dictionary site, with entries taken from the Foreign Language Teaching and Research Press Chinese-English dictionaries, the Collins Chinese-English dictionaries, the English-only Macmillan Dictionary, and the Chinese-only Xiandai hanyu guifan cidian 现代汉语规范词典. Basic glosses are provided for each word/graph/compound, followed by a number of example sentences and collocations. Stroke order animations and character composition are also provided for Chinese graphs.
The site is in simplified characters, but searches can be performed using traditional characters and English as well. Mobile apps for iPhone and Android are available.

KangXiZiDian 康熙字典網上版, © 2009

This website is very much what one might expect: images of pages from the 1716 Kangxi Dictionary are presented alongside essentially the same content in digital form. The dictionary also contains a basic search function.
Entries and search functions are in traditional Chinese characters only.

Le Grand Dictionnaire Ricci de la langue chinoise 利氏漢法辭典
Koninklijke Brill NV, © 2012

With nearly 300,000 entries, this dictionary describes itself as “the most comprehensive up-to-date dictionary of Chinese into a modern Western language.” Detailed definitions for each entry are provided, and all entries are in French only.
Single-graph searches in traditional Chinese characters only.


Specialized Dictionaries/Lexical Tools

Unicode, Inc. © 1991–2013

The Unihan Database is the result of the effort to create one unified character set for all Chinese, Japanese, and Korean graphs; it currently supports tens of thousands of unique graphs, and is now the standard character set for most systems. The website features grid display and two search functions: radical and stroke count, or text search for English gloss or pronunciation in Mandarin pinyin, Cantonese jyutping, Stimson’s Tang pronunciation from T’ang Poetic Vocabulary, Korean hangul, Japanese on, or Japanese kun. On the webpage for each graph, full encoding data, stroke count, dictionary codes, and direct links to the online MDBG, CantoDict, and WWWJDIC dictionaries are provided.
One should be aware that only a subset of all Chinese graphs has been included in the Unicode standard (thereby becoming searchable and independent of modalities such as writing style and format), and only a subset of this is represented in most Chinese fonts. For more complete coverage it may be necessary to install fonts covering the CJK Extensions, such as the PMingLiU-ExtB font package. (See also
The database is also freely available for download as code charts in PDF or data text file format.
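
To check whether a given graph lies outside the basic CJK block, and is therefore more likely to need an extension font, one can inspect its code point. The following Python sketch lists only three blocks, with ranges as given in the Unicode code charts; later extensions are omitted, and the function names are hypothetical.

```python
# (block name, first code point, last code point); later extensions omitted.
BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]

def cjk_block(graph):
    """Return the name of the listed block containing the graph, or None."""
    code_point = ord(graph)
    for name, first, last in BLOCKS:
        if first <= code_point <= last:
            return name
    return None

def outside_bmp(graph):
    """True for graphs beyond the Basic Multilingual Plane, such as those in
    Extension B, which many fonts and older applications do not cover."""
    return ord(graph) > 0xFFFF

for graph in ["漢", "\u3400", "\U00020000"]:
    print(f"U+{ord(graph):04X}", cjk_block(graph), outside_bmp(graph))
```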

Dictionary of Chinese Character Variants 異體字字典
Taiwan Ministry of Education 中華民國教育部, © 2000

This well-designed dictionary from the Taiwan MOE is the best online resource for checking the variant forms of a graph. The dictionary features basic definitions and citations in digital text for each variant, and the right side panel contains linked images of citations taken from the pages of a number of standard reference works for graphic forms, including: 說文解字, 古文字類編,漢語古文字字形表, 漢簡文字類篇, 漢隸字源, 隸辨, 金石文字辨異, 偏類碑別字, 碑別字新編, 六朝別字記新編, 敦煌俗字譜, 干祿字書, 龍龕手鏡, 龍龕手鑑, 玉篇, 廣韻, 集韻, 集韻考正, 類篇, 四聲篇海, 宋元以來俗字譜, 字學三正, 字彙, 正字通, 字彙補, 康熙字典, 彙音寶鑑, 異體字手冊, 大陸簡化字表, 角川漢和辭典, 中日朝漢字字形對照, 中文大辭典, 漢語大字典, 中國書法大字典, 草書大字典, 學生簡體字字典, 簡體字表, 佛教難字字典, 中華字海, 古文四聲韻, 宋體母稿異體字, 書法字彙, 重訂直音篇, and 補充資料.
Searches must use the site’s two interfaces (radical and stroke-count), and the site is in traditional Chinese characters. For offline use, a 10-CD set is also available for purchase.

Gg-art China Museum Calligraphy Dictionary 中华博物汉语字典书法字典
Guangzhou Huan Hui Cultural Development Company, Ltd. 中华博物广州环汇文化发展有限公司

This Chinese dictionary is primarily useful for the many calligraphic and variant graphic forms provided for each character, including links to pages in many standard reference works on character forms, but it also contains an array of definitions in Chinese and simple English glosses (divided by parts of speech), followed by compounds from the site’s Dictionary of Common Usage 常用词典.
Entries and search functions are in simplified Chinese characters, though traditional characters can also be used to return stroke order and compositionally related graphs.

Chinese Etymology
Richard Sears 汉字叔叔, © 2003, 2008, 2011

This site contains images of graphic forms taken from the contents of four well-established paleographic dictionaries: 甲骨文编 (oracle bone forms), 金文编 (bronze inscription forms), 六書通 (seal forms), and 說文解字 (seal forms). Unicode data, English glosses, component forms, etymological explanations, and direct links to other dictionaries are provided for each graph in the right-side panel.
Searches can be made using simplified or traditional graphs. iPhone and Android mobile apps for the site are also available.

Bamboo & Silk Manuscripts Paleographic Contextual Database
Center of Bamboo and Silk Manuscripts of Wuhan University 武漢大學簡帛研究中心. © 2005

The Center has created a database of single-graph images taken from the official publications of the excavated bamboo and silk manuscripts from the ancient states of Chu, Qin, and Han. Users can enter a character or component (pianpang 偏旁) into the main search box to view graphic forms or use the contextual search to see all phrases containing the character; both searches can be further narrowed by selecting specific manuscripts or corpora.
Entries and search functions are in traditional Chinese characters only.

Tomohiko Morioka, © 2010

This extensive Japanese-language website, designed for graph component analyses, also features images of graphic forms from a huge number of stone carvings (拓本文字), including both single graphs and compounds. Character data, full graph component analyses (including variant forms), and component maps are also provided, as well as an array of links to other graphs and graphic forms.
Searches can be made in traditional or simplified Chinese characters (though they return different results), or Japanese kanji.

Yoshio Yoshida, ed. © 1998-2013

This Japanese-language website is dedicated to photos of unique and interesting graphic forms of Chinese characters and Japanese kanji as found in public monuments and signs throughout East Asia. Graphic composition is discussed in Japanese next to each photo, along with location and similarities to other graphic forms.
There is a Google search bar on the site, or users can browse the numerous categories of graphic forms.

The Digital Etymological Dictionary of Old Chinese 古漢語詞源字典(網絡版)
Jeffrey Tharsen, ed. © 2012

As the homepage states, “This site contains a range of new tools designed to facilitate extensive analyses of the phonology and phonological structures of early Chinese texts.” Two interfaces allow characters or entire texts to be cut-and-pasted into the Text Entry box, and then the graphs are parsed and phonetic data returned for each character from the following dictionaries: Qieyun 切韻 and Guangyun 廣韻 (including Baxter’s reconstructed pronunciations), Baxter-Sagart 2011 (Old Chinese and Middle Chinese reconstructions), and Schuessler’s ABC Etymological Dictionary of Old Chinese (OC, MC and Late Han reconstructions, including etymological notes on sister languages and dialect forms). A text file upload function is also available for longer texts, and a multi-level search function returns sets of graphs based on the search criteria.
There are English and traditional Chinese versions of the website; all entries must be in traditional Chinese.

上古音查询 东方语言学, Shanghai Normal University Phonetics Lab 上海师范大学语音实验室.

This Chinese-only website provides an array of Middle Chinese phonological data for graphs, including reconstructions by Bernhard Karlgren, Li Fang-kuei 李方桂, Wang Li 王力, William Baxter, Zhengzhang Shangfang 郑张尚芳, and Pan Wuyun 潘悟云 (though there seem to be some errors in the data, so use with caution).
Entries are in traditional Chinese, and the search function accepts simplified or traditional graphs.

Phonemica 乡音苑
Kellen Parker and Steve Hansen, Phonemica users (crowd-sourced), © 2012

This well-designed website aims “to record spoken stories in every one of the thousands of varieties of Chinese,” transcribe the recordings into putonghua and Mandarin pinyin (or occasionally IPA), translate them into English, and use them to analyze dialects and dialect patterns throughout China. Users’ comments on each recording are also listed.
A map or full list of recordings is available, and the site is also on Weibo.

Chinese Classics Concordances
Sergey Zinine, © 2009

This statistics-heavy site allows users to see which graphs were used and with what frequency in the following classical Chinese texts: 春秋左傳, 穀梁傳, 公羊傳, 詩經, 毛詩, 書經, 易經, 周禮, 儀禮, 孝經, 郭店出土文獻, 禮記, 孟子, 論語, and 莊子. Custom searches, text comparisons and concordances can be performed using graphs, compounds, phrases, Mandarin pinyin and English. Baxter-Sagart 2011 and Sergei Starostin reconstructed pronunciations are also available via a separate search function.
The site is in English; texts and search functions are in traditional Chinese characters only.

Digital Dictionary of Buddhism 電子佛教辭典
A. Charles Muller, ed. © 1995

With over 60,000 entries, the DDB is an extremely extensive database of Buddhist terms and terminology, with searches allowed in Chinese characters, English, Sanskrit, Pali, and Tibetan. Entries are taken from a large multilingual catalogue of reference works and include pronunciations, basic meanings, translations and extensive contextual definitions listed under “Senses.” Search functions for the CJKV-English and SAT Daizōkyō Text Database are also provided.
Subscription is recommended (and can be gained free of charge by contributing to the DDB), but limited guest access is also permitted. Extensive topical indexes are freely available on the home page.

Popular Chinese Internet Memes, Slang, Expressions, Acronyms
ChinaSMACK, © 2008–2013

Despite the sensationalist nature of this website, the glossary is the best quick reference guide to the evolving slang used by Chinese “netizens” on the internet. Head words are in Chinese (more or less), with part of speech and pronunciation in Mandarin pinyin provided, followed by translations and definitions in English.
This is a simple glossary, with no additional search functions or links.


Learning and Translation Tools/Resources

Pleco Chinese Dictionary
Pleco Software Inc. © 2013

Available for purchase for iOS and Android phones, Pleco’s mobile Chinese dictionary application is the best on the market at this point. Featuring a number of excellent built-in dictionaries (including the recently added Hanyu da cidian, Grand Ricci, ABC Chinese-English Comprehensive Dictionary, ABC English-Chinese Dictionary, Xiandai hanyu guifan cidian 现代汉语规范词典, 21st Century English-Chinese Dictionary and Tuttle Learner’s Chinese-English Dictionary) with many others available for free download, this full-featured app also includes a handwriting recognizer for input, a flashcard system, a text file interface, audio pronunciation in Mandarin (with both search and output functions), stroke order animations, and even an OCR function designed to work with the phone’s camera (so one can take a picture of a graph and the dictionary entry will appear).
The dictionary can be set to traditional or simplified characters, and searches can be performed using simplified or traditional graphs, Mandarin pinyin, handwriting input, or any combination thereof.

Dr. Eye 譯典通
Inventec Corp. © 2009

The Dr. Eye Chinese-English-Japanese dictionary and translation software is the flagship product sold by this Taiwan-based company and includes an extensive array of built-in dictionaries. With versions for Windows computers, iOS and Windows Mobile, most of the standard features one would need are included, including mouse-over dictionary and translation apps, full document translators, multilingual and dialect pronunciations, and user interfaces in a variety of languages.
The software suite can be configured to display results in traditional or simplified Chinese and includes several different native input methods.

Clavis Sinica 释文解字
David Porter, ed. © 2013

Designed for students learning Chinese, the Clavis Sinica software suite contains a text reader linked to a Chinese-English dictionary with over 37,000 entries (single graphs and compounds), as well as graph composition and shared component functions, flashcard and vocabulary-building apps, and a small library of Chinese texts with audio (in Mandarin putonghua). There are versions of the software suite for Windows, Mac OS, Linux and most mobile devices.
Simplified and traditional Chinese are both supported, and the software can use most input methods as well as its own native interface for pinyin, stroke order, and English.

Perapera Chinese Popup Dictionary
Justin Kovalchuk, ed. © 2011, Perapera Language Tools

Widely regarded as the best free Japanese-Chinese-Korean plugin for Firefox and Chrome (Chinese only), Perapera features a pop-up interface driven by the CC-CEDICT dictionary, with glosses in English, French and/or German. Both single graphs and compounds are recognized, with many proper nouns and technical terms, and additional dictionaries can be added by the user.
There is neither a direct input nor a search function, as this app is solely a mouse-over popup dictionary.

Lingoes Project, © 2006–2013

As the home page states, this free Windows-only application features “lookup dictionaries, full text translation, capture word on screen, translate selected text, and mouseover and pronunciation of words in over 80 languages.” Its main strength at present is the vast array of dictionaries that can be added to the interface, including 214 in simplified Chinese and 24 in traditional Chinese (including the Hanyu da cidian). The interface is highly configurable and can be set for any of 39 different languages, and users can also create their own dictionaries or use other freely available dictionaries (such as any in the Global WordNet Association, or JuKuu). Lingoes also provides audio for pronunciation, and includes a search history.
Lingoes is a standalone dictionary tool which contains no native interface for entry of Chinese characters, but most normal input methods (including cut-and-paste) and a “capture word on screen” function are supported.

Google Translate
Google Inc.

This web-based translation app currently features 71 languages, but its real strength lies in the way Google is harnessing libraries of translated works to drive the results, and then users’ preferences are recorded to rank the potential translations for any given word or phrase. All dictionary content is maintained by Google within the WordNet lexical database system; the Chinese WordNet 中文詞彙網路 is primarily supported by National Taiwan University.
The main version of the app is deceptively simple, with just a text-entry box and then an interactive translation provided next to it. Websites and documents can also be run through the application using the native interface or the Google Translator Toolkit.

JuKuu 句酷
Guo Yongsheng, ed. © 2008

This Chinese-English-Japanese online application contains a simple dictionary but is also useful for providing a large number of bilingual example sentences in which the search term appears, along with a “definition distribution” 释义分布 pie chart so users can see which translations are more or less common.
The website is in simplified Chinese, as are all the tools, including an offline version, a plugin for Microsoft Office, and a Windows Mobile app.

NJStar 南極星
NJStar Software Pty Ltd. © 2013

This company has offered a range of Windows-based Chinese-Japanese-Korean-English software for sale since 1991, including their own word processors with built-in dictionaries, instant translation (including website translation), specialized input methods, a lunar calendar application, and a full-featured suite for learners of Chinese.
The software is designed to interface with all applications, including word processors, web browsers, and even iTunes.

Babylon Ltd., © 1997–2013

While Babylon’s Desktop Translator translation software features 35 dictionaries for Chinese (plus the Britannica Concise Encyclopedia 大英簡明百科), it also serves as the main interface for their fee-based professional human translation services, which can accept a range of file types, including audio recordings, in any of 77 different languages. The company also offers a series of tools for language learners, including some designed specifically for young children, and the software runs on Mac or Windows computers and most smartphones.
Like NJStar and Lingoes, Babylon’s software suite is primarily composed of standalone applications that can interface with both offline and online resources.

Zotero Reference Manager
Roy Rosenzweig Center for History and New Media, George Mason University.

Zotero is a free, web-based or offline bibliographic reference manager (like Endnote), and has become a popular tool for many users working with Chinese and other Unicode character sets. When working with bilingual data, the information can be kept separate within the database yet combined when generating bibliographies, footnotes, and endnotes.
Zotero runs in Firefox, Chrome, or Safari under Windows, Mac or Linux OS. Zotero offers little direct support for bilingual Chinese users, but there is an active Chinese-Japanese-Korean forum on the website.

Bookends Reference Manager (Mac OS only)
Sonny Software, © 2013

Bookends is the preferred reference manager for many using Chinese, as it fully supports Unicode, has a well-designed interface and features its own online search engine for importing bibliographic data.
Bookends runs on Mac OS only and can be purchased from the website.

Brent Hou Ieong Ho, Hilde De Weerdt, Shih-Pei Chen and the European Research Council.

MARKUS is a new type of markup platform developed by Brent Hou Ieong Ho, Hilde De Weerdt and Shih-Pei Chen. It was first designed to automate the markup of different kinds of named entities (personal names, place names, temporal references and official titles) within Chinese texts, and has developed into a multi-faceted tool that gives users access to a range of online reference tools while reading texts in classical Chinese as well as the ability to tag and extract any kind of information of interest to them. In addition to names already present in the China Biographical Database and the China Historical GIS, users can tag words or phrases by uploading their own lists or by using built-in criteria. (MARKUS currently contains a small selection of built-in tags and search criteria, such as book titles, that could be useful in the analysis of quotations and citations.) Users can also design their own criteria or regular expression searches and tag the results. All markup can be edited while consulting the reference sources integrated in the platform, and the final results can be exported for further analysis in the on-site visualization platform or other software.

At present, the MARKUS platform functions only in the Google Chrome browser. The project is currently in beta, and the authors continue to improve the accuracy of the automated markup and add additional modules. If you are a researcher with a project that could benefit from the system and you would like to become a beta-tester, please contact the project managers.
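
To give a sense of what list-driven tagging with regular expressions involves, here is a minimal Python sketch. It makes no claim about how MARKUS is implemented: the function `tag_entities`, the angle-bracket tag format, the name lists and the sample sentence are all assumptions made up for illustration.

```python
import re


def tag_entities(text, names, tag):
    """Wrap every occurrence of a listed name in <tag>...</tag> (hypothetical format)."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    ordered = sorted(set(names), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


# Invented sample sentence and user-supplied lists, for illustration only.
text = "蘇軾與蘇轍同遊杭州。"
tagged = tag_entities(text, ["蘇軾", "蘇轍"], "person")
tagged = tag_entities(tagged, ["杭州"], "place")
print(tagged)
```

Sorting the list by length before building the pattern is a design choice that matters for Chinese texts, where one name is often a substring of another.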



Siku quanshu 四庫全書 (subscription required for online access; intranet, hard drive and CD-ROM versions available for purchase)
Digital Heritage Publishing, Ltd.

The Siku quanshu is the largest single collection of books compiled in Chinese history (3,461 total works with over eight hundred million characters) and is currently also the largest digitized corpus of Chinese works available. The native tools feature well-designed Basic, Progressive, Correlative, and Advanced searches as well as bookmarking, annotation, and highlighting functions. Images from the original works have been preserved in high-resolution digital copies. As of version 3.0, the software is fully compliant with Unicode, so users can copy-and-paste from any text into virtually any other application. A series of classical dictionaries has recently been added to the collection, including the 說文解字, 重修玉篇, 康熙字典, 重修廣韻, and 集韻.
The Siku quanshu contents and interface are in traditional Chinese characters only. Online and offline versions are only accessible via the specialized interface provided by the publisher.

Scripta Sinica 漢籍電子文獻
Academia Sinica, © 2000

At roughly half the size of the Siku quanshu, the online Scripta Sinica database is another excellent resource, as the editors at Academia Sinica have spent decades editing, proofing, and categorizing their vast digital resources into a fully searchable online system. The basic search can include variant graphs and even synonyms (mainly for proper names), as well as customizations by time period; the advanced search provides an open-ended four-level search of full text, title, content, or sub-commentaries.
The Scripta Sinica contents and interface are in traditional Chinese characters only. Registration (which is free) is required to access the corpus and use the advanced search functions.

CHinese ANcient Texts Database Project (CHANT) 漢達文庫
D.C. Lau Research Centre for Chinese Ancient Texts 劉殿爵中國古籍研究中心, Chinese University of Hong Kong 香港中文大學. © 2013

The CHANT database is one of the best resources for digitized transmitted and excavated ancient Chinese texts. Corpora include oracle bone inscriptions, bronze inscriptions, bamboo and silk manuscripts, pre-Han and Han-period transmitted texts, Six Dynasties-period transmitted texts, and a leishu section containing early Chinese encyclopædias, such as the 群書治要, 太平御覽, and 冊府元龜. One of the more useful features of the CHANT interface is that variant graphs and notes from other editions are included via pop-up windows, usually with full citations. Basic search functions are available for all corpora.
Entries and search functions are in traditional Chinese characters only. Individual or institutional registration is required, though there is a 30-day free trial registration.

Chinese Text Project 中國哲學書電子化計劃
Donald Sturgeon, ed. © 2006

The Chinese Text Project is the most user-friendly of the large textual databases currently available, containing over 6,000 texts and twenty-six million characters. Along with the dictionary (see above), the site boasts a range of other tools, such as a parallel passages interface (with both transmitted and excavated editions, featuring color-coded concordances of parallel or similar passages in other texts), images of source manuscripts, concordance and index data, the most common sub-commentaries for most texts, publication data for various editions, user-driven metadata entry, a discussion forum, and even an internal wiki with an active community. In addition to the powerful basic text search functions, the advanced search interface includes selection by time period and use of the extensive metadata tags. The site also provides English translations for most texts and an extensive Bibliography section.
The Chinese Text Project website can be accessed in English, traditional or simplified Chinese, or a combination thereof.

Wikisource 維基文庫 (for Chinese; other languages available via
Wikimedia Foundation, Inc.

With over a hundred thousand pages all freely available online (all works are in the public domain), the Chinese-language Wikisource database can be extremely useful, as the home page allows the user to search or simply browse by type of work, title, author, or subject. Both transmitted and excavated texts can be found in the database, and links to translations of texts are available from the left-hand sidebar. Full-text searches can be performed in any language.
Like all Wikimedia pages, Wikisource can be used in any of the available interface languages; both traditional and simplified Chinese are fully supported.

Digital Database of Buddhist Tripitaka Catalogues 佛教藏經目錄數位資料庫
Chinese Buddhist Electronic Text Association (CBETA) 中華電子佛典協會. © 2013

This Taiwanese organization has created a proofed, digitized, fully searchable version of the Taishō shinshū dai zōkyō volumes 1–55 and 85, along with many other scriptural texts. Along with their customized reader application, CBETA provides a series of well-designed search tools, allowing for proximity searches for keywords within user-defined ranges, and an automatic citation creator.
In traditional Chinese only; all texts and tools can be downloaded from the website, or a DVD-ROM is available for purchase.
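
A proximity search of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration of the general idea, not CBETA's implementation: the function `proximity_hits` and its `max_gap` parameter are hypothetical, and the sketch only looks for the second keyword after the first, in reading order.

```python
def proximity_hits(text, first, second, max_gap):
    """Return (i, j) offsets where `second` starts at most `max_gap`
    characters after the end of an occurrence of `first`."""
    hits = []
    start = text.find(first)
    while start != -1:
        window_start = start + len(first)
        # The slice must be long enough to hold `second` at the largest gap.
        window_end = window_start + max_gap + len(second)
        j = text.find(second, window_start, window_end)
        if j != -1:
            hits.append((start, j))
        start = text.find(first, start + 1)
    return hits


# A well-known line from the Heart Sutra, used here as sample input.
line = "色即是空，空即是色"
print(proximity_hits(line, "色", "空", 3))
print(proximity_hits(line, "色", "空", 1))
```

Widening or narrowing `max_gap` corresponds to the user-defined range: with a gap of 3 the first 色 is paired with the 空 that follows it, while a gap of 1 returns no hits.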

Fo Guang Shan Buddhist Electronic Etexts and Dictionary 佛光大藏經/佛光大辭典
Master Hsing Yun 星雲大師, Fo Guang Shan Monastery 佛光山寺

First established in 1997, this was the first large database of Buddhist texts and now includes tens of thousands of works in traditional Chinese, many with English translations. The simple search interface on the Fo Guang Shan website allows full-text, work title, and content searches, or users can browse from the homepage.
Entries and search functions are in traditional Chinese characters only. CD-ROM versions of the dictionary (including a mobile version) are available for purchase from the website.

Traditions of Exemplary Women 列女傳
Anne Kinney, ed. © 2003

While the focus of this site is the six editions of the 列女傳 and related works on women in early China, it also has bilingual digital editions of many of the major classics and histories from the pre-Qin and Han periods, in traditional Chinese on the left side with English on the right side, often including notes.
There are no search functions on the site, but the texts are laid out clearly, generally one chapter per page.

Gugong Hanquan Ancient Text Search Database 故宮【寒泉】古典文獻全文檢索資料庫
Chen Yufu 陳郁夫, ed. © 1999

The simple interface on this site is designed to retrieve all citations containing the search term(s) from a number of standard Chinese corpora, including the 十三經, 先秦諸子, 全唐詩, 宋元學案, 明儒學案, 四庫總目, 朱子語類, 紅樓夢, 白沙全集, 資治通鑑, and 續通鑑.
All content and search functions are in traditional Chinese characters only.

China the Beautiful: Chinese Literature Classics 锦绣中华之一页
Ming L. Pei, ed. © 1995–2009

Among its pages on Chinese culture, this site has a surprisingly large and freely available textual archive featuring classic works of philosophy and poetry, literature and history, ancient to modern prose works, and even a few English works translated into Chinese. A Google search is provided for the site, or users can browse by category.
Pages are in either traditional (Big5 encoding) or simplified (GB2312 encoding) characters.
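
Text saved from pages in these legacy encodings usually has to be decoded with the matching codec before it can be used in Unicode-based tools. The following Python sketch assumes the encoding of each page is already known (for example from the page's own declaration); the function name `legacy_to_utf8` is hypothetical, and the byte strings are generated in the example rather than taken from the site.

```python
def legacy_to_utf8(raw, encoding):
    """Decode bytes in a declared legacy encoding and re-encode them as UTF-8."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError on a mismatch
    return text.encode("utf-8")


# Sample bytes built here for illustration: the same title in each script.
big5_bytes = "錦繡中華".encode("big5")
gb_bytes = "锦绣中华".encode("gb2312")

print(legacy_to_utf8(big5_bytes, "big5").decode("utf-8"))
print(legacy_to_utf8(gb_bytes, "gb2312").decode("utf-8"))
```

Guessing the encoding by trial and error is unreliable, since bytes written in one of these encodings can sometimes be decoded without error, but as nonsense, in the other.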

Google Books
Google, Inc. © 2012

While not designed expressly for sinologists, the full-text search functions in Google Books deserve mention as the number of volumes now stretches into the tens of millions. Users can search within volumes to quickly return all mentions of the search term in a specific work, much like a custom index. The main drawbacks to the system are that Chinese is rendered incorrectly at times by Google’s OCR filter, and pages from the works are displayed as images rather than digitized text, even for works in the public domain, making it difficult to transfer the information into other applications.
Searches can be in any language, including simplified and traditional Chinese, though English works seem to dominate the holdings. N-gram frequencies from Google’s Chinese archives can also be accessed and exported for use in other systems.

iask 爱问共享资料, © 2012

This China-based website provides a web interface for searching or browsing and downloading digitized materials. The number of volumes in Chinese is substantial, and the interface can return a wide variety of file types. Drawbacks include blatant copyright violations and the absence of quality control, and scans of published volumes in PDF, PDG or DJV format are often of relatively poor quality. Not all files are free to download; users must spend site-specific points to retrieve some files.
Searches can be made in traditional or simplified Chinese, or other languages.

Holger Schneider
Institut für Außereuropäische Sprachen und Kulturen
Lehrstuhl für Sinologie
Artilleriestraße 70
91052 Erlangen, GERMANY

Jeffrey Tharsen
East Asian Languages and Civilizations
University of Chicago

Image: Screenshot by Holger Schneider.

The views, perspectives, and opinions expressed here and by those providing comments are those of the author(s) and commentator(s) alone, and do not reflect the opinions of Dissertation Reviews, its members, editors, or advisory board members.


  1. From Brian Landor on Facebook’s Sinologists page:
    “It would be worth mentioning that Pleco has the Hanyu da cidian, and one can click on unknown words in the definition and see their definitions in other dictionaries, which makes it much more useful than the PC version. Pleco’s version of the Ricci (which really is better than any Classical Chinese to English dictionary) is also much better than the PC version, which is pretty amateur, though it works fine. Both the CD and Pleco versions of the Ricci support multiple-character searches; the former can only be searched in Chinese, but the Pleco version can also be searched in French.”
