Please wait a minute...
New Technology of Library and Information Service  2011, Vol. 27 Issue (4): 29-34    DOI: 10.11925/infotech.1003-3513.2011.04.05
Current Issue | Archive | Adv Search |
Study on Term Extraction on the Basis of Chinese Domain Texts
Gu Jun1,2, Wang Hao1
1. Department of Information Management, Nanjing University, Nanjing 210093,China;
2. Baoshan Iron and Steel Company Ltd., Shanghai 201900,China
Download: PDF(541 KB)   HTML  
Export: BibTeX | EndNote (RIS)      
Abstract  Based on the ICTCLAS dictionary segmentation, this paper proposes a method that extracts relevant concept terminology from the Chinese patent texts by maximum matching and frequency statistics, then computes the weights of the items by TF-IDF and gets the final concept terminology. Finally, it analyzes the results with the sample data extraction experiments.
Key wordsOntology      Concept extraction      Maximum matching and frequency statistics      TF-IDF      Chinese word segmentation     
Received: 10 February 2011      Published: 11 June 2011
: 

TP391

 

Cite this article:

Gu Jun, Wang Hao. Study on Term Extraction on the Basis of Chinese Domain Texts. New Technology of Library and Information Service, 2011, 27(4): 29-34.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.04.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I4/29

[1] Berners-Lee T, Hendler J, Lassila O. The Semantic Web[J]. Scientific American, 2001,284(5): 28-37.

[2] Ying D, Schubea F. Ontology Research and Development. Part I: A Review of Ontology Generation [J]. Journal of Information Science, 2002, 28(2):123-136.

[3] Turney P D. Learning to Extract Key Phrases from Text[R]. National Research Council, Canada, NRC Technical Report ERB21057, 1999.

[4] Witten I H, Paynter G W, Frank E,et al. KEA: Practical Automatic Keyphrase Extraction[C]. In: Proceedings of the 4th ACM Conference on Digital Libraries, Berkeley, California, US.1999: 254-256.

[5] 姜韶华, 党延忠. 基于长度递减与串频统计的文本切分算法[J]. 情报学报,2006, 25(1): 74-79.

[6] 刘桃, 刘秉权, 徐志明,等. 领域术语自动抽取及其在文本分类中的应用[J]. 电子学报,2007, 35(2): 328-332.

[7] 何婷婷, 张小鹏. 特定领域本体自动构造方法[J]. 计算机工程,2007, 33(22): 235-237.

[8] 王昊,邓三鸿. HMM和CRFs在信息抽取应用中的比较研究[J]. 现代图书情报技术,2007(12): 57-63.

[9] 刘豹,张桂平,蔡东风. 基于统计和规则相结合的科技术语自动抽取研究[J]. 计算机工程与应用, 2008, 44(23): 147-150.

[10] 岑咏华, 韩哲, 季培培. 基于隐马尔科夫模型的中文术语识别研究[J]. 现代图书情报技术,2008(12):54-58.

[11] 温春, 王晓斌, 石昭祥. 中文领域本体学习中术语的自动抽取[J]. 计算机应用研究,2009,27(7): 2652-2655.

[12] 高文利. 基于本体的军备情报抽取系统的设计与实现[J]. 现代图书情报技术,2010(1): 83-87.

[13] 周浪,史树敏,冯冲,等. 基于多策略融合的中文术语抽取方法[J]. 情报学报,2010,29(3): 460-467.

[14] 国内外三种专利申请受理状况总累计表[EB/OL]. [2010-12-22].http://www.sipo.gov.cn/sipo2008/ghfzs/zltj/zljb/201101/t20110110_562647.html.

[15] ICTCLAS特色[EB/OL]. [2011-01-10]. http://ictclas.org/ictclas_feature.html.
[1] Shiqi Deng,Liang Hong. Constructing Domain Ontology for Intelligent Applications: Case Study of Anti Tele-Fraud[J]. 数据分析与知识发现, 2019, 3(7): 73-84.
[2] Zhu Fu,Yuefen Wang,Xuhui Ding. Semantic Representation of Design Process Knowledge Reuse[J]. 数据分析与知识发现, 2019, 3(6): 21-29.
[3] Guangshang Gao. A Survey of User Profiles Methods[J]. 数据分析与知识发现, 2019, 3(3): 25-35.
[4] Ying Wang,Li Qian,Jing Xie,Zhijun Chang,Beibei Kong. Building Knowledge Graph with Sci-Tech Big Data[J]. 数据分析与知识发现, 2019, 3(1): 15-26.
[5] Youshi He,Shufang He. Sentiment Mining of Online Product Reviews Based on Domain Ontology[J]. 数据分析与知识发现, 2018, 2(8): 60-68.
[6] Huihui Tang,Hao Wang,Zixuan Zhang,Xueying Wang. Extracting Names of Historical Events Based on Chinese Character Tags[J]. 数据分析与知识发现, 2018, 2(7): 89-100.
[7] Beibei Pang,Juanqiong Gou,Wenxin Mu. Extracting Topics and Their Relationship from College Student Mentoring[J]. 数据分析与知识发现, 2018, 2(6): 92-101.
[8] Guoming Feng,Xiaodong Zhang,Suhui Liu. DBLC Model for Word Segmentation Based on Autonomous Learning[J]. 数据分析与知识发现, 2018, 2(5): 40-47.
[9] Shengchun Ding,Menglu Liu,Zhu Fu. Unified Multidimensional Model Based on Knowledge Flow in Conceptual Design[J]. 数据分析与知识发现, 2018, 2(2): 11-19.
[10] Weijian Ni,Haohao Sun,Tong Liu,Qingtian Zeng. An Unsupervised Approach to Optimize Chinese Word Segmentation on Domain Literature[J]. 数据分析与知识发现, 2018, 2(2): 96-104.
[11] Cong Yin,Liyi Zhang. Recommendation Algorithm for Post-Context Filtering Based on TF-IDF: Case Study of Catering O2O[J]. 数据分析与知识发现, 2018, 2(11): 28-36.
[12] Haili Tu,Xiaobo Tang. Building Product Recommendation Model Based on Tags[J]. 数据分析与知识发现, 2017, 1(9): 28-39.
[13] Changbing Li,Chongpeng Pang,Meiping Li. Extracting Product Features with Weight-based Apriori Algorithm[J]. 数据分析与知识发现, 2017, 1(9): 83-89.
[14] Erjing Chen,Enbo Jiang. Review of Studies on Text Similarity Measures[J]. 数据分析与知识发现, 2017, 1(6): 1-11.
[15] Rujiang Bai,Fuhai Leng,Junhua Liao. An Improved Cosine Text Similarity Computing Method Based on Semantic Chunk Feature[J]. 数据分析与知识发现, 2017, 1(6): 56-64.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn