Chinese Phrase Tagging and Automated Annotation Based on CSSCI Corpus
Xie Jing¹, Su Xinning², Shen Si²
1. School of Economics and Management, Nanjing University of Chinese Medicine, Nanjing 210046, China; 2. School of Information Management, Nanjing University, Nanjing 210093, China
Abstract: This paper proposes a new syntactic method for identifying term phrases in the CSSCI corpus. Working from phrase components such as words, parts of speech, and grammatical functions, it captures the interrelationships among terms in academic literature from a linguistic perspective. These linguistic features are combined with phrase features extracted from the Tsinghua Treebank to improve the accuracy of automatic phrase identification in academic corpora.
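The abstract does not say which model performs the identification. The following is a minimal sketch that assumes phrase identification is cast as BIO sequence labeling, a common chunking formulation that the abstract does not confirm. It shows how per-token linguistic features (word, part of speech, grammatical function) might be combined with a phrase feature derived from treebank phrases, and how predicted tags map back to phrase spans. The function names, the tag set, the `mod`/`head`/`aux` function labels, and the POS-bigram "phrase feature" are all hypothetical. The tab-separated rows follow the column layout read by CRF++-style toolkits (one token per line, answer tag in the last column, a blank line between sentences).

```python
from typing import List, Optional, Set, Tuple

# Hypothetical token representation: (word, part-of-speech, grammatical function)
Token = Tuple[str, str, str]


def pos_bigram_patterns(treebank_phrases: List[List[str]]) -> Set[Tuple[str, str]]:
    """Collect POS bigrams seen inside treebank phrases (a simplified, hypothetical phrase feature)."""
    patterns = set()
    for pos_seq in treebank_phrases:
        for a, b in zip(pos_seq, pos_seq[1:]):
            patterns.add((a, b))
    return patterns


def to_crf_rows(tokens: List[Token], tags: List[str],
                patterns: Set[Tuple[str, str]]) -> List[str]:
    """One tab-separated row per token: word, POS, function, phrase feature, gold tag."""
    rows = []
    for i, (word, pos, func) in enumerate(tokens):
        prev_pos = tokens[i - 1][1] if i > 0 else None
        next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else None
        # Phrase feature: "1" if the token links to a neighbour under a treebank POS bigram
        linked = (prev_pos, pos) in patterns or (pos, next_pos) in patterns
        rows.append("\t".join([word, pos, func, "1" if linked else "0", tags[i]]))
    rows.append("")  # empty line marks the sentence boundary
    return rows


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags (B-NP, I-NP, O, ...) into (start, end_exclusive, label) spans."""
    spans = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # Close the open span on B-, O, or an I- tag whose label differs from the open one
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Tolerate an I- tag with no preceding B-: treat it as a phrase start
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    tokens = [("信息", "n", "mod"), ("检索", "v", "head"), ("系统", "n", "head"),
              ("的", "u", "aux"), ("设计", "v", "head")]
    tags = ["B-NP", "I-NP", "I-NP", "O", "B-VP"]
    patterns = pos_bigram_patterns([["n", "v"], ["v", "n"], ["n", "n"]])
    print("\n".join(to_crf_rows(tokens, tags, patterns)))
    for s, e, lab in bio_to_spans(tags):
        print(lab, "".join(w for w, _, _ in tokens[s:e]))
```

In a CRF-style setup, a labeler would be trained on rows like these and its predicted tags decoded with `bio_to_spans`. The bigram lookup is only a stand-in for the richer treebank-derived phrase features the abstract describes.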
Xie Jing, Su Xinning, Shen Si. Chinese Phrase Tagging and Automated Annotation Based on CSSCI Corpus[J]. New Technology of Library and Information Service, 2012, (12): 32-38.