Chinese Phrase Tagging and Automated Annotation Based on CSSCI Corpus

Xie Jing¹, Su Xinning², Shen Si²

1. School of Economics and Management, Nanjing University of Chinese Medicine, Nanjing 210046, China; 2. School of Information Management, Nanjing University, Nanjing 210093, China
Abstract This paper introduces a new syntax-based method for identifying term phrases in the CSSCI corpus. Taking a linguistic approach, it derives the relationships among terms in academic literature from phrase components such as words, parts of speech, and grammatical functions. These linguistic features are combined with phrase features extracted from the Tsinghua Treebank to improve the accuracy of automatic phrase identification in academic corpora.
Received: 14 November 2012
Published: 12 March 2013