New Technology of Library and Information Service  2016, Vol. 32 Issue (1): 73-80     https://doi.org/10.11925/infotech.1003-3513.2016.01.11
Research Paper
Study on Construction of Domain Terminology Taxonomic Relation*
Hui Zhu, Jianlin Yang, Hao Wang
School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Services, Nanjing 210023, China
Abstract

[Objective] To explore how to obtain hierarchical (taxonomic) relations among terms from unstructured Chinese text. [Methods] Literature in the Digital Library domain was collected from CNKI, and a semantic hierarchy of terms was built in four steps: term extraction, construction of a term vector space model, clustering with the BIRCH algorithm, and determination of cluster labels. [Results] A term hierarchy for the Digital Library domain was constructed and validated: clustering accuracy reached 80.88%, and cluster label extraction accuracy reached 89.71%. [Limitations] The construction was validated by random sampling, and it was compared empirically with only one other construction method. [Conclusions] Compared with K-means clustering, building the term hierarchy with BIRCH clustering shows a clear advantage, with higher execution efficiency and clustering validity.

Keywords: Terminology; Taxonomic relation; Ontology; Ontology learning; Clustering
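The pipeline named in the abstract (term vectors, BIRCH clustering, cluster labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the 0.3 threshold, and the helper names `term_vectors`, `cf_cluster` and `pick_label` are all assumptions made for illustration. The sketch keeps only the clustering-feature (CF) absorption step that BIRCH is built on and omits the CF-tree and the global clustering phase. The labeling rule (most frequent member term) is a hypothetical heuristic, since the abstract does not state how labels were chosen.

```python
import math


class ClusteringFeature:
    """BIRCH clustering feature (CF): count N, linear sum LS, squared sum SS."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0
        self.members = []  # term names, kept only for this illustration

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius_if_added(self, vec):
        """Sub-cluster radius if vec were absorbed: sqrt(SS/N - ||LS/N||^2)."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, vec)]
        ss = self.ss + sum(x * x for x in vec)
        r2 = ss / n - sum((x / n) ** 2 for x in ls)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors

    def add(self, name, vec):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, vec)]
        self.ss += sum(x * x for x in vec)
        self.members.append(name)


def term_vectors(docs):
    """Represent each term by its unit-length frequency vector over documents."""
    terms = sorted({t for d in docs for t in d})
    vectors = {}
    for t in terms:
        raw = [float(d.count(t)) for d in docs]
        norm = math.sqrt(sum(x * x for x in raw))  # > 0: every term occurs somewhere
        vectors[t] = [x / norm for x in raw]
    return vectors


def cf_cluster(vectors, threshold):
    """Absorb each term into the nearest CF if its radius stays within threshold."""
    entries = []
    for name, vec in vectors.items():
        nearest = min(entries, key=lambda e: math.dist(e.centroid(), vec), default=None)
        if nearest is not None and nearest.radius_if_added(vec) <= threshold:
            nearest.add(name, vec)
        else:
            entry = ClusteringFeature(len(vec))
            entry.add(name, vec)
            entries.append(entry)
    return entries


def pick_label(entry, docs):
    """Hypothetical labeling rule: the member term with the highest corpus frequency."""
    return max(entry.members, key=lambda t: sum(d.count(t) for d in docs))


# Toy corpus (assumed): each document is a list of already-extracted terms.
docs = [
    ["metadata", "cataloging", "metadata"],
    ["metadata", "cataloging"],
    ["retrieval", "indexing"],
    ["retrieval", "indexing", "retrieval"],
]
vectors = term_vectors(docs)
clusters = cf_cluster(vectors, threshold=0.3)
labels = [pick_label(c, docs) for c in clusters]
for label, c in zip(labels, clusters):
    print(label, "->", sorted(c.members))
```

Because each CF stores only N, LS and SS, a point can be tested and absorbed without revisiting earlier points, which is one reason a single-pass method of this kind can run faster than iterative K-means. For real data, a library implementation such as scikit-learn's `Birch` class would be the more likely choice.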
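Received: 2015-06-19      Published: 2016-02-04
Funding: *This work is one of the research outcomes of the Natural Science Foundation of Jiangsu Province project "Research on Chinese Ontology Learning for Patent Early Warning" (Grant No. BK20130587) and the Fundamental Research Funds for the Central Universities project "Research on the Knowledge Structure and Evolutionary Dynamics of Library and Information Science in China" (Grant No. 20620140645).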
Cite this article:
Hui Zhu, Jianlin Yang, Hao Wang. Study on Construction of Domain Terminology Taxonomic Relation[J]. New Technology of Library and Information Service, 2016, 32(1): 73-80.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.01.11      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I1/73