Automatic Term Recognition: An Important Method for Text Mining on Scientific Literature*

doi:10.11925/infotech.1003-3513.2008.08.02

New Technology of Library and Information Service

2008, Vol. 24, Issue (8): 12-17 https://doi.org/10.11925/infotech.1003-3513.2008.08.02

Special Topic

Liu Jianhua¹,², Zhang Zhixiong¹, Xu Jian¹,²,³, Xu Yandong¹

¹(National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
²(Graduate University of the Chinese Academy of Sciences, Beijing 100049, China)
³(Department of Information Management, Sun Yat-Sen University, Guangzhou 510275, China)

Abstract

Automatic Term Recognition (ATR) is a key step in knowledge technologies such as knowledge extraction and text mining. This paper reviews the main existing approaches to ATR and identifies the key problems in the process. It also examines the term recognition methods used in related projects and systems and analyzes several existing term resources. The aim is to enrich the theories and methods of text mining based on term recognition and to provide a useful reference for building related experimental systems.

Key words: Automatic term recognition, Term variation, Term ambiguity

Received: 2008-06-16 Published: 2008-08-25

CLC number: G250.73

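Funding:

*This work is one of the research outcomes of the National Social Science Foundation of China project "Research on Theories and Methods for Knowledge Extraction from Digital Information Resources" (Grant No. 05BTQ006).

Corresponding author: Liu Jianhua E-mail: liujh@mail.las.ac.cn

About the authors: Liu Jianhua, Zhang Zhixiong, Xu Jian, Xu Yandong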

Cite this article:

Liu Jianhua, Zhang Zhixiong, Xu Jian, Xu Yandong. Automatic Term Recognition: An Important Method for Text Mining on Scientific Literature[J]. New Technology of Library and Information Service, 2008, 24(8): 12-17.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2008.08.02 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2008/V24/I8/12
