New Technology of Library and Information Service  2014, Vol. 30 Issue (3): 73-79     https://doi.org/10.11925/infotech.1003-3513.2014.03.11
Knowledge Organization and Knowledge Management
Research on Keyphrase Extraction from Scholarly Article Outline
He Yuanbiao1,2, Le Xiaoqiu1, Zhang Fan1,2
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China

Abstract

[Objective] Scholarly article outlines are concise and hierarchical. This paper studies a method for extracting important, substantive keyphrases from them. [Methods] Candidate phrases are first identified in the headings at each level of the outline by combining linguistic rules with terminology dictionaries. The method then calculates tf-idf based on the syntactic dependencies between phrases and quantifies a hierarchical feature for each phrase from the outline structure. Finally, it combines tf-idf with the hierarchical feature to rank the candidate phrases and select the keyphrases. [Results] Experiments show that candidate phrase identification reaches an F-score of 89.57% and candidate phrase selection reaches an F-score of 36.89%. [Limitations] The phrase extraction rules are incomplete, and the weights used in the tf-idf calculation are set only from empirical values, so the results are not optimal. [Conclusions] The method can effectively extract keyphrases from outlines and is suitable for keyphrase extraction from hierarchical structures.

Key words: Candidate phrase identification; Candidate phrase selection; Syntactic dependencies; Hierarchical feature
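The ranking step summarized in the abstract (combine a tf-idf score with a feature derived from heading depth, then keep the top-ranked candidates) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name rank_outline_phrases, the mixing weight alpha, the 1/level depth score, and the plain smoothed tf-idf. In particular, the paper's dependency-based weighting of tf-idf is not reproduced.

```python
import math
from collections import defaultdict


def rank_outline_phrases(outline, corpus, alpha=0.5, top_k=3):
    """Rank candidate phrases of one outline (illustrative sketch only).

    outline: list of (level, [candidate phrases]); level 1 is a top-level heading.
    corpus:  list of outlines in the same format, used for document frequency.
    alpha:   hypothetical mixing weight between tf-idf and the depth score.
    """
    # Count each phrase and remember the shallowest heading level it occurs at.
    tf = defaultdict(int)
    min_level = {}
    for level, phrases in outline:
        for phrase in phrases:
            tf[phrase] += 1
            min_level[phrase] = min(level, min_level.get(phrase, level))

    def doc_freq(phrase):
        # Number of outlines in the corpus that contain the phrase.
        return sum(
            any(phrase in phrases for _, phrases in doc) for doc in corpus
        )

    n_docs = len(corpus)
    total = sum(tf.values())
    scores = {}
    for phrase, count in tf.items():
        # Smoothed idf, so phrases unseen in the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq(phrase))) + 1
        tfidf = (count / total) * idf
        # Assumed depth score: phrases in shallower headings score higher.
        level_score = 1.0 / min_level[phrase]
        scores[phrase] = alpha * tfidf + (1 - alpha) * level_score

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    outline = [
        (1, ["keyphrase extraction"]),
        (2, ["candidate phrase", "tf-idf"]),
        (2, ["hierarchical feature", "candidate phrase"]),
        (3, ["stop word"]),
    ]
    corpus = [outline, [(1, ["tf-idf"]), (2, ["stop word"])]]
    for phrase, score in rank_outline_phrases(outline, corpus):
        print(f"{phrase}\t{score:.3f}")
```

A linear mix keeps the two signals easy to inspect separately. According to the abstract, the paper's own weights were set from empirical values, so alpha here should be treated as a placeholder to tune.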
Received: 2013-09-26      Published: 2014-04-15
CLC number: TP393
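Funding:

This work is one of the research outcomes of the sub-project "Research and Demonstration of Domain Academic Relationships Based on Literature Knowledge Networks" (project No. 2011BAH10B06-04) of the National Key Technology R&D Program of China.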

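Corresponding author: He Yuanbiao     E-mail: bill_ho@foxmail.com
Author contributions: He Yuanbiao: conducted the literature survey, refined the research direction and technical approach, and designed the experimental plan; carried out the experiments, including data collection, cleaning, and structuring, programming, and analysis of the experimental results; wrote the paper and revised the final version. Le Xiaoqiu: proposed the research direction and the topic of the paper, and provided guidance on the research approach, experimental plan, and technical route. Zhang Fan: annotated data, carried out part of the programming and data analysis, and took part in revising the paper.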
Cite this article:
He Yuanbiao, Le Xiaoqiu, Zhang Fan. Research on Keyphrase Extraction from Scholarly Article Outline. New Technology of Library and Information Service, 2014, 30(3): 73-79.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.03.11      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I3/73

