New Technology of Library and Information Service  2014, Vol. 30 Issue (3): 73-79    DOI: 10.11925/infotech.1003-3513.2014.03.11
Research on Keyphrase Extraction from Scholarly Article Outline
He Yuanbiao1,2, Le Xiaoqiu1, Zhang Fan1,2
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China

[Objective] Scholarly article outlines are succinct and hierarchical. This paper exploits these characteristics to develop a method for extracting important and meaningful phrases from outlines. [Methods] The method first identifies candidate phrases by combining linguistic rules with terminology dictionaries. It then calculates tf-idf based on syntactic dependencies between phrases and quantifies a hierarchical feature from the hierarchical structure of the outline. Finally, it combines the tf-idf and the hierarchical feature to rank the candidate phrases and select the keyphrases. [Results] Experiments show that the F-score of candidate phrase identification reaches 89.57% and the F-score of candidate phrase selection reaches 36.89%. [Limitations] The phrase extraction rules are incomplete and the weights used in the tf-idf calculation are set empirically, so the results are not optimal. [Conclusions] The method can effectively extract keyphrases from outlines and is suitable for keyphrase extraction from hierarchical structures.
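
The ranking step described in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so the smoothed idf, the 1/depth hierarchical score, the mixing weight `alpha`, and the function name `rank_outline_phrases` are all assumptions made for illustration. Plain phrase counts also stand in for the dependency-based tf-idf of the paper.

```python
import math
from collections import Counter


def rank_outline_phrases(outlines, target, alpha=0.5):
    """Rank candidate phrases of outlines[target] (hypothetical helper).

    outlines: list of outlines; each outline is a list of
              (depth, [candidate phrases]) headings, depth 1 = top level.
    alpha:    assumed mixing weight between tf-idf and the hierarchical score.
    """
    n_docs = len(outlines)

    # Document frequency: number of outlines that contain each phrase.
    df = Counter()
    for outline in outlines:
        df.update({p for _, phrases in outline for p in phrases})

    tf = Counter()
    hier = {}
    for depth, phrases in outlines[target]:
        for p in phrases:
            tf[p] += 1
            # Assumed hierarchical score: shallower headings weigh more.
            hier[p] = max(hier.get(p, 0.0), 1.0 / depth)

    total = sum(tf.values())
    # Smoothed idf, chosen here only to avoid zero weights.
    tfidf = {
        p: (count / total) * (math.log((1 + n_docs) / (1 + df[p])) + 1)
        for p, count in tf.items()
    }
    top = max(tfidf.values())

    # Linear combination of normalised tf-idf and the hierarchical score.
    scores = {
        p: alpha * (tfidf[p] / top) + (1 - alpha) * hier[p] for p in tfidf
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for this example.
outlines = [
    [(1, ["keyphrase extraction"]),
     (2, ["candidate phrase", "tf-idf"]),
     (2, ["hierarchical feature"])],
    [(1, ["introduction"]), (2, ["tf-idf"])],
]
ranked = rank_outline_phrases(outlines, target=0)
for phrase, score in ranked:
    print(f"{score:.3f}  {phrase}")
```

In this toy input, the phrase from the top-level heading ranks first, and "tf-idf", which also occurs in the second outline, ranks last because its idf is lower. Selecting keyphrases would then amount to keeping the top-k entries of `ranked`, with k chosen empirically.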

Key words: Candidate phrases identification; Candidate phrases selection; Syntactic dependencies; Hierarchical feature
Received: 26 September 2013      Published: 15 April 2014
CLC number: TP393

Cite this article:

He Yuanbiao, Le Xiaoqiu, Zhang Fan. Research on Keyphrase Extraction from Scholarly Article Outline. New Technology of Library and Information Service, 2014, 30(3): 73-79.


