GKTR Retrieval Model for Engineering Consulting Reports with Graph Convolution Topological and Keyword Features
Lyu Xueqiang, Du Yifan, Zhang Le, Pan Huiping, Tian Chi
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
[Objective] This paper proposes GKTR, a text retrieval model for engineering consulting reports that combines graph convolution topological features with keyword features, to address the insufficient extraction of semantic features in existing retrieval methods. [Methods] First, we built a text retrieval corpus of engineering consulting reports. Second, we fed the corpus into a BERT model to obtain contextual vectors. Third, we obtained the first matching score through a graph convolutional network and a deep interactive matching model. We also mapped the paragraph keywords to vectors using a Word2Vec model and calculated their similarity with the titles to obtain the second matching score. Finally, we averaged the two matching scores to obtain the final matching score. [Results] Compared with the joint ranking model CEDR, the new model scored up to 3.06% higher on the P@20 metric. [Limitations] The data came mainly from the engineering consulting reports of a large state-owned company, so the dataset needs to be expanded. [Conclusions] The GKTR model can effectively improve text retrieval for engineering consulting reports.
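The score fusion described in the abstract, together with the P@20 metric used in the results, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how keyword vectors are aggregated or which similarity function is used, so mean pooling and cosine similarity are assumptions here, as are all function names. The first matching score, which the paper obtains from the graph convolutional network and the deep interactive matching model, is treated as a given number.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def keyword_score(keyword_vectors: Sequence[Vector], title_vector: Vector) -> float:
    """Second matching score (assumed form): cosine similarity between the
    mean of the paragraph keyword vectors and the title vector."""
    return cosine_similarity(mean_vector(keyword_vectors), title_vector)


def final_score(first_score: float, second_score: float) -> float:
    """Final matching score: the arithmetic mean of the two matching scores."""
    return (first_score + second_score) / 2.0


def precision_at_k(ranked_relevance: Sequence[int], k: int) -> float:
    """P@k: fraction of the top-k ranked items that are relevant (1) rather than not (0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for Word2Vec embeddings (hypothetical values).
    keywords = [[1.0, 0.0], [0.0, 1.0]]
    title = [1.0, 1.0]
    second = keyword_score(keywords, title)
    first = 0.6  # placeholder for the GCN + interactive matching score
    print(final_score(first, second))
    print(precision_at_k([1, 0, 1, 1], k=2))
```

A plain average gives both scores equal weight. A weighted combination would be a natural variant, but the abstract only mentions averaging.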
Lyu Xueqiang, Du Yifan, Zhang Le, Pan Huiping, Tian Chi. GKTR Retrieval Model for Engineering Consulting Reports with Graph Convolution Topological and Keyword Features[J]. Data Analysis and Knowledge Discovery, 2023, 7(12): 155-163.