Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 6: 50-56. DOI: 10.11925/infotech.2096-3467.2018.1390
Research Paper
Semantic Matching for Sci-Tech Novelty Retrieval
Junliang Yao, Xiaoqiu Le
(National Science Library, Chinese Academy of Sciences, Beijing 100190, China); (Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China)
Abstract

[Objective] To automatically select, from the candidate search results of a sci-tech novelty search, the documents (journal articles and patents) that are semantically close to the novelty points. [Methods] We designed a deep multi-task hierarchical classification model based on Bi-GRU-ATT. Using International Patent Classification (IPC) categories and patent data, we trained several classification models for different levels of the hierarchy, then fine-tuned them on a small amount of paper data so that they apply to both papers and patents. The semantic categories of the novelty points and of the candidate records are identified in parent-first, then-child order, and the degree of semantic matching between the two is determined from these categories. [Results] In the two-level classification models under IPC E21B, accuracy reached 82.37% and 73.55% respectively, better than the other baseline models. In the semantic matching experiment on real novelty-point data, matching precision reached 88.13%, which is 15.16% higher than the baseline retrieval model (TF-IDF). [Limitations] The models were trained on only a small number of categories and have not yet been extended to all IPC classes. [Conclusions] Preliminary experiments indicate that the method can improve the semantic matching of novelty points to some extent.

Key words: Sci-Tech Novelty Retrieval; Semantic Matching; Multi-task Learning; Bi-GRU-ATT
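
The parent-first, then-child comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each novelty point and each candidate record has already been assigned a label path by some classifier, and the scoring rule (the fraction of levels that agree, counted from the parent down), the function names, and the label strings are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# A label path is ordered parent-first, e.g. (level-1 label, level-2 label).
# The label strings used below are illustrative placeholders only; in the
# paper the labels would be predicted by the trained classification models.
LabelPath = Tuple[str, ...]


def match_depth(query: LabelPath, candidate: LabelPath) -> int:
    """Count how many consecutive levels agree, walking from parent to child.

    A child-level agreement is ignored when its parent level disagrees.
    """
    depth = 0
    for q_label, c_label in zip(query, candidate):
        if q_label != c_label:
            break
        depth += 1
    return depth


def match_score(query: LabelPath, candidate: LabelPath) -> float:
    """Assumed scoring rule: matched depth divided by the query path length."""
    if not query:
        return 0.0
    return match_depth(query, candidate) / len(query)


def filter_candidates(
    query: LabelPath,
    candidates: Sequence[Tuple[str, LabelPath]],
    min_score: float = 1.0,
) -> List[str]:
    """Return the ids of candidates whose score reaches the threshold."""
    return [
        doc_id
        for doc_id, path in candidates
        if match_score(query, path) >= min_score
    ]


# Hypothetical example data: one novelty point and three candidate records.
novelty_point = ("E21B", "E21B-child-1")
records = [
    ("rec-1", ("E21B", "E21B-child-1")),   # agrees at both levels
    ("rec-2", ("E21B", "E21B-child-2")),   # agrees at the parent level only
    ("rec-3", ("OTHER", "E21B-child-1")),  # parent differs, so child is ignored
]

strict = filter_candidates(novelty_point, records)        # full-path match
relaxed = filter_candidates(novelty_point, records, 0.5)  # parent match suffices
```

Lowering `min_score` trades precision for recall: with the default threshold only records that agree at every level are kept, while a threshold of 0.5 on a two-level path also keeps records that share just the parent category.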
Received: 2018-12-10
Cite this article:
Junliang Yao, Xiaoqiu Le. Semantic Matching for Sci-Tech Novelty Retrieval[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 50-56. DOI: 10.11925/infotech.2096-3467.2018.1390.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1390