Using Intelligent System to Extract Search Terms for Sci-Tech Novelty Retrieval
Wang Peixia1,2, Yu Hai1,2, Chen Li1,2, Wang Yongji1
1 Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
Abstract [Objective] This paper aims to identify search terms more effectively in sci-tech novelty retrieval, reducing the subjectivity, heavy workload, lack of standardization, and time cost of manual methods. [Context] We used the corpus generated by sci-tech novelty retrieval as the source of domain knowledge for extracting search terms, and discussed the relationship between the corpus and keyword extraction. [Methods] We proposed an incremental iterative method that extracts keywords from sci-tech novelty retrieval projects with the help of domain feature expansion. [Results] Compared with the search terms used in real-world sci-tech novelty retrieval, the recall of the 10 search terms extracted by the new method reached 80%. [Conclusions] The proposed method can identify most keywords and thereby improve the efficiency and effectiveness of novelty retrieval tasks.
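The recall figure in the results can be illustrated with a minimal sketch. It assumes recall means the share of the manually chosen (real-world) search terms that also appear among the top 10 extracted terms; the abstract does not spell out the exact definition. The function name `recall_at_k`, the case-insensitive matching, and the sample terms are all hypothetical and are not taken from the paper.

```python
def recall_at_k(extracted, reference, k=10):
    """Share of reference search terms found among the top-k extracted terms.

    Assumption: terms match when equal after trimming and lowercasing.
    """
    top_k = {term.strip().lower() for term in extracted[:k]}
    ref = {term.strip().lower() for term in reference}
    if not ref:
        raise ValueError("reference term set is empty")
    return len(top_k & ref) / len(ref)


# Hypothetical terms, for illustration only (not data from the paper).
reference_terms = ["keyword extraction", "TextRank", "domain knowledge",
                   "novelty retrieval", "thesaurus"]
extracted_terms = ["keyword extraction", "textrank", "corpus",
                   "domain knowledge", "iteration", "novelty retrieval",
                   "feature expansion", "recall", "search term", "ranking"]

print(recall_at_k(extracted_terms, reference_terms))  # 4 of 5 matched -> 0.8
```

Under this reading, a recall of 80% means that four out of every five search terms picked by human searchers were recovered among the 10 extracted terms.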
Received: 28 July 2016
Published: 20 December 2016