Study on Term Extraction on the Basis of Chinese Domain Texts
Gu Jun1,2, Wang Hao1
1. Department of Information Management, Nanjing University, Nanjing 210093, China;
2. Baoshan Iron and Steel Company Ltd., Shanghai 201900, China
Abstract Building on ICTCLAS dictionary-based word segmentation, this paper proposes a method for extracting domain concept terms from Chinese patent texts. Candidate terms are identified by maximum matching and frequency statistics, their weights are then computed with TF-IDF, and the final set of concept terms is selected from the weighted candidates. The method is evaluated through extraction experiments on sample data.
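The frequency-then-TF-IDF part of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the texts have already been segmented (for example by ICTCLAS, which is not called here), it stands in for the maximum-matching step by simply joining adjacent segments into n-grams, and it uses one common TF-IDF variant (corpus frequency times log(N / document frequency)). The function names and the `min_freq` and `max_len` parameters are hypothetical.

```python
import math
from collections import Counter


def candidate_terms(tokens, max_len=2):
    """Yield every run of 1..max_len adjacent segments joined into one string."""
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n <= len(tokens):
                yield "".join(tokens[i:i + n])


def tfidf_terms(docs, min_freq=2, max_len=2):
    """Rank candidate terms from pre-segmented documents by a TF-IDF score.

    docs is a list of token lists. Candidates occurring fewer than min_freq
    times in the whole corpus are dropped (the frequency filter).
    """
    total = Counter()     # corpus-wide frequency of each candidate
    doc_freq = Counter()  # number of documents containing each candidate
    for tokens in docs:
        counts = Counter(candidate_terms(tokens, max_len))
        total.update(counts)
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    scores = {}
    for term, freq in total.items():
        if freq < min_freq:
            continue
        # Unsmoothed IDF: a term present in every document scores 0.
        scores[term] = freq * math.log(n_docs / doc_freq[term])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, hand-segmented input; real input would come from a segmenter.
    sample = [
        ["热轧", "钢板", "表面"],
        ["热轧", "钢板", "厚度"],
        ["数据", "表面"],
    ]
    for term, score in tfidf_terms(sample):
        print(term, round(score, 3))
```

In this toy corpus the multi-segment candidate 热轧钢板 survives the frequency filter alongside its parts. A fuller implementation would need a rule for preferring the longest match, which this sketch leaves out.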
Received: 10 February 2011
Published: 11 June 2011