Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (6): 50-56    DOI: 10.11925/infotech.2096-3467.2018.1390
Semantic Matching for Sci-Tech Novelty Retrieval
Junliang Yao, Xiaoqiu Le
(National Science Library, Chinese Academy of Sciences, Beijing 100190, China); (Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China)
Abstract  

[Objective] This paper identifies records in preliminary search results that are semantically similar to the novelty points of a sci-tech novelty search, with the aim of retrieving the needed journal articles and patents automatically. [Methods] First, we designed a deep multi-task hierarchical classification model based on Bi-GRU-ATT. Second, we trained several hierarchical classification models on International Patent Classification (IPC) categories and patent data. Third, we fine-tuned the model on a small amount of paper data so that it could handle both papers and patents. Finally, we compared the semantic categories of the novelty points and the candidate records to collect the matching ones. [Results] For two-level classification of patents under IPC E21B, the new model's precision was 82.37% and 73.55% at the two levels respectively, better than the benchmark models. On real novelty search data, the precision of semantic matching was 88.13%, which was 15.16% higher than that of TF-IDF. [Limitations] The model was evaluated on only a small number of IPC categories. [Conclusions] The proposed method improves the semantic matching of novelty search points.
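The final matching step described in the abstract, comparing the semantic categories of a novelty point and the candidate records, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_depth` parameter, the record ids, and the label strings are all hypothetical, and the predicted labels are hard-coded where a trained hierarchical classifier would normally supply them.

```python
from typing import Dict, List, Tuple

# A hierarchical label is a (level-1, level-2) pair of category strings.
# The strings used below are illustrative, not taken from the paper.
Label = Tuple[str, str]


def shared_depth(a: Label, b: Label) -> int:
    """Count how many leading levels two hierarchical labels share."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def match_candidates(novelty_label: Label,
                     candidate_labels: Dict[str, Label],
                     min_depth: int = 2) -> List[str]:
    """Return ids of candidates whose predicted label agrees with the
    novelty point's label on at least `min_depth` leading levels."""
    return [cid for cid, label in candidate_labels.items()
            if shared_depth(novelty_label, label) >= min_depth]


# Hypothetical predictions standing in for the output of a trained classifier.
novelty = ("E21B 43", "E21B 43/16")
candidates = {
    "rec-1": ("E21B 43", "E21B 43/16"),
    "rec-2": ("E21B 43", "E21B 43/25"),
    "rec-3": ("E21B 33", "E21B 33/13"),
}

strict = match_candidates(novelty, candidates)                # both levels agree
relaxed = match_candidates(novelty, candidates, min_depth=1)  # level 1 only
```

Requiring agreement on both levels keeps only the closest records, while relaxing `min_depth` to 1 trades precision for recall. Which setting suits a real novelty search is an open choice that this sketch does not settle.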

Key words: Sci-tech Novelty Retrieval; Semantic Matching; Multitask Learning; Bi-GRU-ATT
Received: 10 December 2018      Published: 15 August 2019

Cite this article:

Junliang Yao, Xiaoqiu Le. Semantic Matching for Sci-Tech Novelty Retrieval. Data Analysis and Knowledge Discovery, 2019, 3(6): 50-56.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.1390     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I6/50
