New Technology of Library and Information Service  2011, Vol. 27 Issue (4): 29-34    DOI: 10.11925/infotech.1003-3513.2011.04.05
Study on Term Extraction on the Basis of Chinese Domain Texts
Gu Jun1,2, Wang Hao1
1. Department of Information Management, Nanjing University, Nanjing 210093, China;
2. Baoshan Iron and Steel Company Ltd., Shanghai 201900, China
Abstract  Building on ICTCLAS dictionary-based word segmentation, this paper proposes a method that extracts candidate concept terms from Chinese patent texts using maximum matching and frequency statistics, then weights the candidates with TF-IDF to select the final concept terms. The method is evaluated through extraction experiments on sample data.
Key words: Ontology; Concept extraction; Maximum matching and frequency statistics; TF-IDF; Chinese word segmentation
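
The abstract names the building blocks of the method but not their details. As a minimal sketch, and not the authors' implementation, the following Python shows forward maximum matching against a term lexicon followed by TF-IDF weighting of the resulting tokens. The function names, the toy lexicon and documents, the `max_len` parameter, and the specific TF-IDF variant (relative term frequency times log(N/df)) are all assumptions made for illustration; in the paper the initial segmentation is performed by ICTCLAS, which is not called here.

```python
import math
from collections import Counter


def forward_max_match(text, lexicon, max_len=6):
    """Greedy forward maximum matching (illustrative helper).

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; fall back to a single character otherwise.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy data (hypothetical, not taken from the paper's patent corpus).
lexicon = {"钢板", "热处理", "装置"}
docs = ["钢板热处理装置", "钢板装置"]
segmented = [forward_max_match(doc, lexicon) for doc in docs]
weights = tf_idf(segmented)
```

In this toy run, a term that occurs in every document receives weight zero, while a term confined to one document receives a positive weight, which is the property that makes TF-IDF useful for filtering candidate terms.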
Received: 10 February 2011      Published: 11 June 2011
CLC Number: TP391
Cite this article:

Gu Jun, Wang Hao. Study on Term Extraction on the Basis of Chinese Domain Texts. New Technology of Library and Information Service, 2011, 27(4): 29-34.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.04.05     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I4/29
