New Technology of Library and Information Service  2016, Vol. 32 Issue (11): 82-93    DOI: 10.11925/infotech.1003-3513.2016.11.10
Original Article
Using Intelligent System to Extract Search Terms for Sci-Tech Novelty Retrieval
Wang Peixia1,2, Yu Hai1,2, Chen Li1,2, Wang Yongji1
1Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
2University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This paper aims to identify search terms more effectively in sci-tech novelty retrieval, reducing the subjectivity, heavy workload, lack of standardization, and time cost of manual methods. [Context] We used the corpus generated by sci-tech novelty retrieval as the source of domain knowledge for extracting search terms, and then discussed the relationship between the corpus and keyword extraction. [Methods] We proposed an incremental, iterative method that extracts keywords from sci-tech novelty retrieval projects with the help of domain feature expansion. [Results] Measured against the search terms used in real-world sci-tech novelty retrieval, the recall of the 10 search terms extracted by the new method reached 80%. [Conclusions] The proposed method can identify most keywords and thereby improve the efficiency and effectiveness of novelty retrieval tasks.
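
The recall figure reported in the abstract can be illustrated with a minimal sketch. The abstract does not describe the exact evaluation protocol, so the matching rule (case-insensitive exact match), the function name `recall_at_k`, and the term lists below are all assumptions made for illustration, not the authors' implementation or data.

```python
def recall_at_k(extracted, reference, k=10):
    """Share of reference search terms that appear among the top-k extracted terms.

    Matching is case-insensitive exact match, which is an assumption;
    the paper may use a different matching rule.
    """
    if not reference:
        raise ValueError("reference term list must not be empty")
    top_k = {term.strip().lower() for term in extracted[:k]}
    ref = {term.strip().lower() for term in reference}
    return len(top_k & ref) / len(ref)


# Hypothetical terms, invented for this example only.
extracted = [
    "keyword extraction", "novelty retrieval", "web crawler", "domain corpus",
    "textrank", "topic model", "search term", "recall", "thesaurus",
    "term frequency",
]
reference = [
    "Keyword Extraction", "novelty retrieval", "web crawler",
    "search term", "patent analysis",
]

print(recall_at_k(extracted, reference))  # 4 of 5 reference terms matched
```

In this toy case four of the five reference terms appear among the ten extracted terms, so the recall is 4/5.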

Key words: Sci-Tech novelty retrieval; Search terms; Extraction; Online crawler
Received: 28 July 2016      Published: 20 December 2016

Cite this article:

Wang Peixia, Yu Hai, Chen Li, Wang Yongji. Using Intelligent System to Extract Search Terms for Sci-Tech Novelty Retrieval. New Technology of Library and Information Service, 2016, 32(11): 82-93.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.11.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I11/82

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn