[Objective] This paper aims to identify search terms more effectively in sci-tech novelty retrieval, reducing the subjectivity, heavy workload, lack of standardization, and time cost of manual selection. [Context] We used the corpus generated by sci-tech novelty retrieval as the source of domain knowledge for extracting search terms, and discussed the relationship between the corpus and keyword extraction. [Methods] We proposed an incremental, iterative method that extracts keywords from sci-tech novelty retrieval projects with the help of domain feature expansion. [Results] Compared with the search terms used in real-world sci-tech novelty retrieval, the 10 search terms extracted by the new method reached a recall of 80%. [Conclusions] The proposed method identifies most keywords and can improve the efficiency and effectiveness of novelty retrieval tasks.
Wang Peixia, Yu Hai, Chen Li, Wang Yongji. Using Intelligent System to Extract Search Terms for Sci-Tech Novelty Retrieval[J]. New Technology of Library and Information Service, 2016, 32(11): 82-93.