New Technology of Library and Information Service (现代图书情报技术), 2016, Vol. 32, Issue 9: 34-41. DOI: 10.11925/infotech.1003-3513.2016.09.04
Research Paper
Using Semantic Model to Build Lexical Chains
(Chinese title: 一种分布式语义增强的词汇链文本表示模型构建方法; literally, "A Method for Constructing a Lexical Chain Text Representation Model Enhanced by Distributional Semantics")
Qu Yunpeng (曲云鹏)1,2,3, Wang Wenling (王文玲)3
1 University of Chinese Academy of Sciences, Beijing 100049, China
2 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
3 National Library of China, Beijing 100081, China
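Abstract (translated from the Chinese)

[Objective] To compute lexical cohesion relations from distributional semantic associations, addressing problems in current lexical chain construction such as the insufficient depth at which relations between words are detected, and thereby to improve the quality of the resulting chains. [Methods] We summarize the technical approaches to lexical chain construction, use WordNet dictionary relations to compute the semantic associations among language units in a text, use the Distributional Memory model to compute the latent semantic relations among those units, and combine the two kinds of semantic relation to build a lexical chain text representation model. Building on this theoretical work, we run comparative experiments on scientific papers from the medical domain. [Results] For describing the topics of a text, the lexical chains built by the proposed method are better than those of the non-greedy algorithm, and its running time is comparable to that of the non-greedy algorithm. [Limitations] The algorithm is time-consuming; lexical cohesion relations are not fully considered; and the method's effectiveness has been verified only for topic identification in medical scientific literature and still needs to be demonstrated in more domains. [Conclusions] Distributional semantic association can identify latent semantics, is also of considerable help when building lexical chains from multi-word phrases, and can effectively improve lexical chain construction.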

Keywords: WordNet; Distributional Memory; Lexical Chain; Distributional Semantics
Abstract

[Objective] This paper uses distributional semantics to build high-quality lexical chains. [Methods] First, we developed an algorithm that uses the WordNet thesaurus to compute the semantic relations among the language units of a text. Second, we adopted the Distributional Memory model to compute their latent semantic relations. Finally, we combined the two kinds of relation to build lexical chains, which we evaluated on papers from medical science. [Results] The proposed algorithm described the papers' topics better than the non-greedy method. [Limitations] The efficiency of the algorithm needs to be improved, and the method should also be tested on papers from other fields. [Conclusions] The proposed model can detect latent semantic relations and thereby improves the quality of lexical chains built from phrases.

Keywords: WordNet; Distributional Memory; Lexical Chain; Distributional Semantics
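
The idea summarized in the abstract, combining a thesaurus-based relation with a distributional (latent) relation when linking words into chains, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the two relations are weighted or how chains are searched, so the sketch assumes a simple weighted sum (`alpha`) and a greedy single-pass chainer. The dictionaries `THESAURUS_LINKS` and `VECTORS`, their values, and the function names are all hypothetical toy stand-ins for WordNet and for a Distributional Memory model.

```python
from math import sqrt

# Hypothetical stand-in for WordNet relations: word pair -> relation strength.
THESAURUS_LINKS = {
    frozenset(("tumor", "neoplasm")): 1.0,  # synonym-like link
    frozenset(("tumor", "cancer")): 0.7,    # weaker, hypernym-like link
}

# Hypothetical stand-in for a distributional model: word -> toy context vector.
VECTORS = {
    "tumor": [0.9, 0.1, 0.0],
    "neoplasm": [0.8, 0.2, 0.0],
    "cancer": [0.85, 0.15, 0.1],
    "chemotherapy": [0.7, 0.3, 0.1],
    "library": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relatedness(w1, w2, alpha=0.5):
    """Weighted sum of the thesaurus score and the distributional score."""
    if w1 == w2:
        return 1.0
    thesaurus = THESAURUS_LINKS.get(frozenset((w1, w2)), 0.0)
    if w1 in VECTORS and w2 in VECTORS:
        distributional = cosine(VECTORS[w1], VECTORS[w2])
    else:
        distributional = 0.0
    return alpha * thesaurus + (1 - alpha) * distributional


def build_chains(words, threshold=0.4, alpha=0.5):
    """Greedy chaining: add each word to the best-matching chain, or start a new one."""
    chains = []
    for word in words:
        best_chain, best_score = None, threshold
        for chain in chains:
            score = max(relatedness(word, member, alpha) for member in chain)
            if score >= best_score:
                best_chain, best_score = chain, score
        if best_chain is None:
            chains.append([word])
        else:
            best_chain.append(word)
    return chains


words = ["tumor", "library", "neoplasm", "chemotherapy", "cancer"]
combined = build_chains(words)                   # thesaurus + distributional
thesaurus_only = build_chains(words, alpha=1.0)  # thesaurus relations alone
print(combined)
print(thesaurus_only)
```

With these toy values, the thesaurus-only run leaves "chemotherapy" in a chain of its own because no dictionary relation links it to the other words, while the combined score attaches it to the "tumor" chain through the distributional term. That is the kind of latent relation the abstract refers to, shown here only on invented data.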
Received: 2016-04-08
Cite this article:
Qu Yunpeng, Wang Wenling. Using Semantic Model to Build Lexical Chains[J]. New Technology of Library and Information Service, 2016, 32(9): 34-41. DOI: 10.11925/infotech.1003-3513.2016.09.04.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.09.04