New Technology of Library and Information Service  2012, Vol. Issue (12): 32-38    DOI: 10.11925/infotech.1003-3513.2012.12.07
Chinese Phrase Tagging and Automated Annotation Based on CSSCI Corpus
Xie Jing¹, Su Xinning², Shen Si²
1. School of Economics and Management, Nanjing University of Chinese Medicine, Nanjing 210046, China;
2. School of Information Management, Nanjing University, Nanjing 210093, China
Abstract  This paper introduces a new syntactic method for identifying term phrases in the CSSCI corpus. Working from the linguistic features of phrase components, such as words, parts of speech, and grammatical functions, it derives the relationships among terms in academic literature. These linguistic features are combined with phrase features extracted from the Tsinghua Treebank to improve the accuracy of automatic phrase identification in academic corpora.
Key words: Phrase annotation; CSSCI corpus; Multi-feature; Auto-identification
Received: 14 November 2012      Published: 12 March 2013
CLC Number: TP391
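
To make the multi-feature idea in the abstract concrete, the following is a minimal sketch of how word and part-of-speech features might be prepared for a phrase chunker that assigns one label per token. It is not the authors' implementation: the function names, the feature keys, the BIO label scheme, the part-of-speech tags, and the sample sentence are all assumptions chosen for illustration. A grammatical-function feature could be added as a further entry in the same way.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]          # (word, part-of-speech tag); tags here are illustrative
Span = Tuple[int, int, str]      # (start, end_exclusive, phrase_type)


def token_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Build a feature dict for token i from its own word/POS and its neighbours'."""
    word, pos = sent[i]
    feats = {"word": word, "pos": pos}
    if i > 0:
        feats["prev_word"], feats["prev_pos"] = sent[i - 1]
    else:
        feats["BOS"] = "1"       # marks the beginning of the sentence
    if i < len(sent) - 1:
        feats["next_word"], feats["next_pos"] = sent[i + 1]
    else:
        feats["EOS"] = "1"       # marks the end of the sentence
    return feats


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert annotated phrase spans into one B-/I-/O label per token."""
    labels = ["O"] * n_tokens
    for start, end, phrase_type in spans:
        labels[start] = "B-" + phrase_type
        for j in range(start + 1, end):
            labels[j] = "I-" + phrase_type
    return labels


if __name__ == "__main__":
    # Hypothetical segmented and POS-tagged sentence with one noun phrase over tokens 0..2.
    sentence = [("信息", "n"), ("检索", "vn"), ("系统", "n"), ("的", "u"), ("评价", "vn")]
    labels = spans_to_bio(len(sentence), [(0, 3, "np")])
    for i, label in enumerate(labels):
        print(token_features(sentence, i), label)
```

Feature dictionaries and label sequences of this shape are the usual input to a sequence labeller; which model consumes them, and which features actually help, is left open here.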

Cite this article:

Xie Jing, Su Xinning, Shen Si. Chinese Phrase Tagging and Automated Annotation Based on CSSCI Corpus. New Technology of Library and Information Service, 2012, (12): 32-38.

