Semantic Matching for Sci-Tech Novelty Retrieval
Junliang Yao, Xiaoqiu Le
(National Science Library, Chinese Academy of Sciences, Beijing 100190, China); (Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China)

Abstract [Objective] This paper identifies records in preliminary search results that are semantically similar to the novelty points of a sci-tech novelty search, aiming to retrieve the relevant journal articles or patents automatically. [Methods] First, we designed a deep multi-task hierarchical classification model based on Bi-GRU-ATT. Second, we trained several hierarchical classification models using International Patent Classification (IPC) categories and patent data. Third, we fine-tuned the model with a small amount of paper data so that it handles both papers and patents. Finally, we compared the semantic categories of the novelty points and the candidate records to collect the matching ones. [Results] For two-level classification of patents under IPC subclass E21B, the new model’s precision was 82.37% and 73.55% at the two levels, respectively, better than the benchmark models. On real novelty search point data, the precision of semantic matching was 88.13%, which was 15.16% higher than that of TF-IDF. [Limitations] We examined the model on only a small number of IPC categories. [Conclusions] The proposed method improves the semantic matching of novelty search points.
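The final matching step in [Methods] can be illustrated with a minimal sketch. It assumes a classifier has already assigned a two-level category to the novelty point and to each candidate record. The `Label` type, the `match_candidates` function, the `require_level2` switch, the record ids, and the example labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, NamedTuple


class Label(NamedTuple):
    """Hypothetical two-level category label predicted by a classifier."""
    level1: str  # coarse category, e.g. an IPC subclass
    level2: str  # finer category under level1


def match_candidates(
    novelty_label: Label,
    candidate_labels: Dict[str, Label],
    require_level2: bool = True,
) -> List[str]:
    """Return ids of candidates whose predicted categories match the novelty point.

    Assumption: a record matches when its level-1 category equals that of the
    novelty point and, if require_level2 is True, its level-2 category does too.
    """
    matched = []
    for record_id, label in candidate_labels.items():
        if label.level1 != novelty_label.level1:
            continue
        if require_level2 and label.level2 != novelty_label.level2:
            continue
        matched.append(record_id)
    return matched


# Illustrative labels only; in practice a trained model would predict them.
novelty = Label("E21B", "E21B43")
candidates = {
    "rec-1": Label("E21B", "E21B43"),
    "rec-2": Label("E21B", "E21B33"),
    "rec-3": Label("G06F", "G06F17"),
}

strict = match_candidates(novelty, candidates)
loose = match_candidates(novelty, candidates, require_level2=False)
print(strict)  # ['rec-1']
print(loose)   # ['rec-1', 'rec-2']
```

Requiring agreement at both levels favors precision, while matching on the first level alone admits more candidates. Which setting the authors used is not stated in the abstract.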
Received: 10 December 2018
Published: 15 August 2019