Automatic Domain-specific Term Extraction in Administrative-domain Ontology*

doi:10.11925/infotech.1003-3513.2010.04.10

New Technology of Library and Information Service

2010, Vol. 26

Issue (4): 59-65 https://doi.org/10.11925/infotech.1003-3513.2010.04.10

Information Analysis and Research

Zhai Dufeng (翟笃风)¹, Liu Baisong (刘柏嵩)²

¹ (School of Business, Ningbo University, Ningbo 315211, China)
² (Ningbo University Network Center, Ningbo 315211, China)

Abstract

This paper proposes a new method for automatically extracting ontology terms in the administrative (government affairs) domain. First, Chinese word segmentation and a single-character merging method are used to extract words from government texts as candidate terms. The candidate terms are then filtered with the C-value method and the TF-IDF algorithm, which yields the automatic extraction of administrative-domain terms. Comparative experiments show that the method improves the precision of the extracted terms without reducing recall.
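The candidate-generation step can be pictured with a small Python sketch. It assumes that the word segmenter leaves unknown words as runs of one-character tokens and that joining each run gives a candidate word; the abstract does not spell out the merging rule, so the function name `merge_single_chars`, the `stop_chars` list, and the sample tokens are all hypothetical.

```python
def merge_single_chars(tokens, stop_chars=frozenset("的了和与在是")):
    """Merge runs of adjacent one-character tokens into candidate words.

    Assumption (not from the paper): a run is broken by any
    multi-character token or by a character listed in `stop_chars`,
    a hypothetical set of function characters.
    """
    merged, run = [], []

    def flush():
        # Emit the pending run of single characters as one token.
        if run:
            merged.append("".join(run))
            run.clear()

    for tok in tokens:
        if len(tok) == 1 and tok not in stop_chars:
            run.append(tok)
        else:
            flush()
            merged.append(tok)
    flush()
    return merged


# Hypothetical segmenter output in which two words were split into characters.
segmented = ["电子", "政", "务", "的", "公", "开", "系统"]
print(merge_single_chars(segmented))
# ['电子', '政务', '的', '公开', '系统']
```

The stop-character check is one plausible way to keep function characters such as "的" from being glued onto neighbouring fragments; a real system would likely need a richer rule.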

Keywords: administrative-domain ontology, terms, single-character merging method, C-value, TF-IDF algorithm
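The two filtering measures named above can be sketched as follows. The C-value function follows the published formula of Frantzi et al., with two assumptions made for Chinese text: candidate length is counted in characters, and nesting is tested by substring containment. The TF-IDF function is one common variant (raw count times log(N/df)). The abstract does not say how the two scores are combined, so requiring both to pass a threshold, the threshold values, and the toy frequencies and documents are illustrative assumptions only.

```python
import math


def c_value(freq):
    """C-value per candidate; `freq` maps candidate string -> frequency.

    Assumptions: |a| is the length in characters, and a candidate is
    "nested" when it is a substring of a longer candidate.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a`.
        longer = [b for b in freq if len(b) > len(a) and a in b]
        if longer:
            # Discount occurrences explained by the longer candidates.
            f_a -= sum(freq[b] for b in longer) / len(longer)
        scores[a] = math.log2(len(a)) * f_a
    return scores


def tf_idf(term, doc, corpus):
    """Raw-count TF times log(N / df); documents are plain strings."""
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    return doc.count(term) * math.log(len(corpus) / df)


def filter_terms(freq, doc, corpus, min_c=1.0, min_tfidf=0.1):
    """Keep candidates passing both (hypothetical) thresholds."""
    cv = c_value(freq)
    return [t for t in freq
            if cv[t] >= min_c and tf_idf(t, doc, corpus) >= min_tfidf]


# Toy data, invented for illustration.
freq = {"电子政务": 8, "电子政务系统": 3, "政务": 10, "的": 50}
doc = "电子政务系统的建设与电子政务的公开"
corpus = [doc, "图书馆的服务", "今天的天气"]

print(filter_terms(freq, doc, corpus))
# ['电子政务', '电子政务系统', '政务']
```

In this toy run the frequent character "的" is rejected by both measures: its length of one gives a C-value of zero, and it occurs in every document, so its IDF is zero.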

Received: 2010-03-22 Published: 2010-04-25

TP391

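Funding:

*This paper is one of the research outcomes of the National Social Science Fund of China project "Research on the Automatic Construction and Application of Domain Ontologies" (Project No. 08CTQ014).

Corresponding author: Zhai Dufeng (翟笃风) E-mail: zhaidufeng@126.com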

Cite this article:

Zhai Dufeng, Liu Baisong. Automatic Domain-specific Term Extraction in Administrative-domain Ontology[J]. New Technology of Library and Information Service, 2010, 26(4): 59-65.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2010.04.10 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2010/V26/I4/59
