|
|
GKTR Retrieval Model for Engineering Consulting Reports with Graph Convolution Topological and Keyword Features |
Lyu Xueqiang,Du Yifan,Zhang Le(),Pan Huiping,Tian Chi |
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China |
|
|
Abstract [Objective] This paper proposes a text retrieval model for engineering consulting reports that combines graph convolution topological and keyword features. It addresses the insufficient semantic feature extraction issues in existing retrieval methods. [Methods] First, we built a text retrieval corpus of engineering consulting reports. Then, we fed the corpus into a BERT model to obtain contextual vectors. Third, we obtained the first matching score through a graph convolutional network and a deep interactive matching model. We also mapped the paragraph keywords to vectors using a Word2Vec model and calculated their similarity scores with the titles to obtain the second matching score. Finally, we got their final matching score by averaging the two matching scores. [Results] Compared with the joint ranking model CEDR, our new model was up to 3.06% higher in the P@20 metric. [Limitations] The data was mainly from engineering consulting reports of a large state-owned company, which needs to be expanded. [Conclusions] The GKTR model could effectively improve text retrieval for engineering consulting reports.
|
Received: 21 October 2022
Published: 16 May 2023
|
|
Fund:National Natural Science Foundation of China(62171043);Key Program of the National Language Commission of China(ZDI145-10) |
Corresponding Authors:
Zhang Le,ORCID:0000-0002-9620-511X,E-mail:zhangle@bistu.edu.cn。
|
[1] |
谢红生. 工程咨询报告校对常见问题研究[J]. 中国工程咨询, 2015(11): 46-47.
|
[1] |
(Xie Hongsheng. Research on Common Problems in Proofreading Engineering Consulting Report[J]. Chinese Consulting Engineers, 2015(11): 46-47.)
|
[2] |
丁志均, 杨青, 张会兵, 等. 基于非结构化文本检索模型综述[J]. 计算机应用研究, 2017, 34(6): 1601-1608,1612.
|
[2] |
(Ding Zhijun, Yang Qing, Zhang Huibing, et al. Review of Retrieval Models Based on Unstructured Text[J]. Application Research of Computers, 2017, 34(6): 1601-1608,1612.)
|
[3] |
Dierk S F. The SMART Retrieval System: Experiments in Automatic Document Processing—Gerard Salton, Ed. (Englewood Cliffs, N.J.: Prentice-Hall, 1971, 556 PP., $15.00)[J]. IEEE Transactions on Professional Communication, 1972, PC-15(1): 17.
|
[4] |
Robertson S E, Jones K S. Relevance Weighting of Search Terms[J]. Journal of the American Society for Information Science, 1976, 27(3): 129-146.
doi: 10.1002/asi.v27:3
|
[5] |
戚园园. 基于特征表示学习的文本检索研究[D]. 北京: 北京邮电大学, 2021.
|
[5] |
(Qi Yuanyuan. Research on Text Retrieval Based on Feature Representations Learning[D]. Beijing: Beijing University of Posts and Telecommunications, 2021.)
|
[6] |
Huang P S, He X D, Gao J F, et al. Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data[C]// Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 2013: 2333-2338.
|
[7] |
邹傲, 郝文宁, 靳大尉, 等. 基于预训练和深度哈希的大规模文本检索研究[J]. 计算机科学, 2021, 48(11): 300-306.
doi: 10.11896/jsjkx.210300266
|
[7] |
(Zou Ao, Hao Wenning, Jin Dawei, et al. Study on Text Retrieval Based on Pre-Training and Deep Hash[J]. Computer Science, 2021, 48(11): 300-306.)
doi: 10.11896/jsjkx.210300266
|
[8] |
Dai Z Y, Callan J. Deeper Text Understanding for IR with Contextual Neural Language Modeling[C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2019: 985-988.
|
[9] |
陈丽萍, 任俊超. 基于对抗式数据增强的深度文本检索重排序[J]. 计算机系统应用, 2021, 30(7): 204-209.
|
[9] |
(Chen Liping, Ren Junchao. Deep Text Retrieval Re-Ranking Based on Adversarial Data Augmentation[J]. Computer Systems & Applications, 2021, 30(7): 204-209.)
|
[10] |
Schopf T, Braun D, Matthes F. Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics[C]// Proceedings of the 17th International Conference on Web Information Systems and Technologies. 2021: 124-132.
|
[11] |
Gadamshetti S, Deepak G, Santhanavijayan A, et al. RDRLLJ:Integrating Deep Learning Approach with Latent Semantic Analysis for Document Retrieval[A]//Shetty N R, Patnaik L M, Nagaraj H C, et al. Emerging Research in Computing, Information, Communication and Applications[M]. Singapore: Springer, 2022: 999-1007.
|
[12] |
Abolghasemi A, Verberne S, Azzopardi L. Improving BERT-Based Query-by-Document Retrieval with Multi-Task Optimization[OL]. arXiv Preprint, arXiv: 2202.00373.
|
[13] |
张永伟, 刘婷, 刘畅, 等. 融合句法信息的文本语料库检索方法研究[J]. 数据分析与知识发现, 2022, 6(11): 25-37.
|
[13] |
(Zhang Yongwei, Liu Ting, Liu Chang, et al. Text Retrieval Based on Syntactic Information[J]. Data Analysis and Knowledge Discovery, 2022, 6(11): 25-37.)
|
[14] |
Qi Y Y, Zhang J Y, Liu Y S, et al. CGTR: Convolution Graph Topology Representation for Document Ranking[C]// Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM, 2020: 2173-2176.
|
[15] |
曹帅. 基于深度学习的文本匹配研究综述[J]. 现代计算机, 2021(16): 74-78.
|
[15] |
(Cao Shuai. Survey of Research on Text Matching Based on Deep Learning[J]. Modern Computer, 2021(16): 74-78.)
|
[16] |
Fang H, Tao T, Zhai C X. Diagnostic Evaluation of Information Retrieval Models[J]. ACM Transactions on Information Systems, 2011, 29(2): Article No.7.
|
[17] |
Fang H, Zhai C X. Semantic Term Matching in Axiomatic Approaches to Information Retrieval[C]// Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006: 115-122.
|
[18] |
Guo J F, Fan Y X, Ai Q Y, et al. A Deep Relevance Matching Model for Ad-Hoc Retrieval[C]// Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 2016: 55-64.
|
[19] |
Xiong C Y, Dai Z Y, Callan J, et al. End-to-End Neural Ad-Hoc Ranking with Kernel Pooling[C]// Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2017: 55-64.
|
[20] |
Hui K, Yates A, Berberich K, et al. PACRR: A Position-Aware Neural IR Model for Relevance Matching[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017: 1049-1058.
|
[21] |
Hui K, Yates A, Berberich K, et al. Co-PACRR: A Context-Aware Neural IR Model for Ad-Hoc Retrieval[C]// Proceedings of the 11th ACM International Conference on Web Search and Data Mining. ACM, 2018: 279-287.
|
[22] |
Ahmad W U, Chang K W, Wang H N. Context Attentive Document Ranking and Query Suggestion[C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2019: 385-394.
|
[23] |
朱路, 邓芳, 刘坤, 等. 基于语义自编码哈希学习的跨模态检索方法[J]. 数据分析与知识发现, 2021, 5(12): 110-122.
|
[23] |
(Zhu Lu, Deng Fang, Liu Kun, et al. Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning[J]. Data Analysis and Knowledge Discovery, 2021, 5(12): 110-122.)
|
[24] |
Cui H J, Lu J Y, Ge Y, et al. How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation[OL]. arXiv Preprint, arXiv: 2201. 04672.
|
[25] |
MacAvaney S, Yates A, Cohan A, et al. CEDR: Contextualized Embeddings for Document Ranking[C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2019: 1101-1104.
|
[26] |
Humeau S, Shuster K, Lachaux M A, et al. Poly-Encoders: Transformer Architectures and Pre-Training Strategies for Fast and Accurate Multi-Sentence Scoring[OL]. arXiv Preprint, arXiv: 1905.01969.
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|