[Objective] Scholarly article outlines are concise and hierarchically structured. This paper seeks a method that exploits these properties to extract important, meaningful phrases from outlines. [Methods] The method first combines linguistic rules with terminology dictionaries to identify candidate phrases. It then calculates tf-idf based on the syntactic dependencies between phrases and quantifies a hierarchical feature from the structure of the outline. Finally, it combines tf-idf with the hierarchical feature to rank the candidate phrases and select the keyphrases. [Results] In the experiments, candidate phrase identification reaches an F-score of 89.57%, and keyphrase selection reaches an F-score of 36.89%. [Limitations] The phrase extraction rules are incomplete, and the weights used in the tf-idf calculation are set empirically, so the results are not optimal. [Conclusions] The method can effectively extract keyphrases from outlines and is suitable for keyphrase extraction from hierarchically structured text.
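The ranking step described in [Methods] can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the function name `rank_phrases`, the smoothed idf, the hierarchical weight of 1/depth, and the multiplicative combination are all hypothetical. The sketch also uses plain phrase counts, whereas the paper computes tf-idf from syntactic dependencies between phrases.

```python
import math
from collections import Counter

def rank_phrases(outlines, target, top_k=3):
    """Rank candidate phrases of one outline (hypothetical scoring scheme).

    outlines: list of outlines; each outline is a list of
              (heading_depth, [candidate phrases]) pairs, depth 1 = top level.
    target:   index of the outline whose phrases are ranked.
    """
    n_docs = len(outlines)

    # Document frequency: in how many outlines does each phrase occur?
    df = Counter()
    for outline in outlines:
        df.update({p for _, phrases in outline for p in phrases})

    # Term frequency and shallowest heading depth within the target outline.
    tf = Counter()
    shallowest = {}
    for depth, phrases in outlines[target]:
        for p in phrases:
            tf[p] += 1
            shallowest[p] = min(depth, shallowest.get(p, depth))

    scores = {}
    for p, count in tf.items():
        idf = math.log((1 + n_docs) / (1 + df[p])) + 1  # smoothed idf (assumed)
        hier = 1.0 / shallowest[p]                      # assumed: higher headings weigh more
        scores[p] = count * idf * hier                  # assumed: multiplicative combination

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

Under these assumptions, a phrase that recurs in the target outline, appears in a top-level heading, and is rare in other outlines ranks highest. A weighted sum of the two features would be an equally plausible combination.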
何远标, 乐小虬, 张帆. 学术论文大纲中关键术语抽取方法研究[J]. 现代图书情报技术, 2014, 30(3): 73-79.
He Yuanbiao, Le Xiaoqiu, Zhang Fan. Research on Keyphrase Extraction from Scholarly Article Outline. New Technology of Library and Information Service, 2014, 30(3): 73-79.