New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 54-59    DOI: 10.11925/infotech.1003-3513.2013.09.09
Knowledge Organization and Knowledge Management
An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency
Xiong Liyan, Tan Long, Zhong Maosheng
School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
Abstract: Existing methods for automatic Chinese term extraction focus on the high-frequency characteristics and unithood of terms, while low-frequency terms and termhood measures lack effective treatment. To address these problems, this paper introduces a background corpus into the C-value method and proposes the concepts of word domain distribution degree and effective word frequency. Terms are extracted automatically by computing the EC-value (Effective C-value) of each candidate term, and this step is combined with term cluster recognition and mining to improve the extraction of low-frequency terms. A term extraction experiment in the computer domain shows that the proposed method (the EC-value method) measures the termhood of terms more effectively and improves the extraction of low-frequency terms.
Key words: Automatic term extraction; EC-value; Effective word frequency; Term cluster
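For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard C-value score of Frantzi et al. for multi-word candidates, not the paper's EC-value: the abstract does not give the formulas for domain distribution degree or effective word frequency, so they are not reproduced here. The names `Term`, `is_nested`, and `c_value` are hypothetical, and the sketch assumes candidate terms and their raw frequencies have already been collected from a domain corpus.

```python
import math
from typing import Dict, Tuple

# A candidate term is represented as a tuple of word tokens (an assumption
# of this sketch; any tokenizer or candidate filter could produce it).
Term = Tuple[str, ...]


def is_nested(short: Term, long: Term) -> bool:
    """Return True if `short` is a contiguous sub-sequence of the strictly longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Baseline C-value for each candidate term.

    C-value(a) = log2|a| * f(a)                          if a is not nested
    C-value(a) = log2|a| * (f(a) - mean of f(b), b in T_a) otherwise,
    where |a| is the length of a in words and T_a is the set of longer
    candidates that contain a. Single-word candidates score 0 because
    log2(1) = 0, so this baseline only ranks multi-word terms.
    """
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        # Frequencies of all longer candidates in which `a` is nested.
        containers = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containers:
            adjusted = f_a - sum(containers) / len(containers)
        else:
            adjusted = float(f_a)
        scores[a] = math.log2(len(a)) * adjusted
    return scores
```

With made-up counts, a candidate `("hash", "table")` seen 8 times and nested in `("distributed", "hash", "table")` seen 5 times gets log2(2) × (8 − 5) = 3, which shows why nested terms are discounted. Because the score is driven by raw frequency, rare terms receive low values under this baseline, which is the weakness the abstract says the EC-value method targets.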
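Received: 2013-06-17
CLC number: TP391.1
Funding: This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Models and Algorithms for Collaborative Cross-Analysis of 'Structure and Semantics' in Argumentative Discourse" (No. 61240036), the Ministry of Education Humanities and Social Sciences Foundation project "Research on Collaborative Analysis Methods for 'Structure and Semantics' in Argumentative Discourse" (No. 11YJC740157), and the Natural Science Foundation of Jiangxi Province project "Research on a Collaborative Cross-Analysis Model of 'Structure and Semantics' for Web Page Text Oriented to Semantic Understanding" (No. 20114BAB201027).
Corresponding author: Tan Long     E-mail: tanlonga109@163.com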
Cite this article:
Xiong Liyan, Tan Long, Zhong Maosheng. An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency[J]. New Technology of Library and Information Service, 2013, 29(9): 54-59. DOI: 10.11925/infotech.1003-3513.2013.09.09.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.09.09
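Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn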