GKTR Retrieval Model for Engineering Consulting Reports with Graph Convolution Topological and Keyword Features

doi:10.11925/infotech.2096-3467.2022.1099

Data Analysis and Knowledge Discovery, 2023, Vol. 7, Issue (12): 155-163


Lyu Xueqiang, Du Yifan, Zhang Le, Pan Huiping, Tian Chi

Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China


Abstract

[Objective] This paper proposes a text retrieval model for engineering consulting reports that combines graph convolution topological features with keyword features. It addresses the insufficient extraction of semantic features in existing retrieval methods. [Methods] First, we built a text retrieval corpus of engineering consulting reports. Second, we fed the corpus into a BERT model to obtain contextual vectors. Third, we obtained the first matching score through a graph convolutional network and a deep interactive matching model. Fourth, we mapped the paragraph keywords to vectors using a Word2Vec model and calculated their similarity to the titles to obtain the second matching score. Finally, we averaged the two matching scores to obtain the final matching score. [Results] Compared with the joint ranking model CEDR, the new model scored up to 3.06% higher on the P@20 metric. [Limitations] The data came mainly from the engineering consulting reports of a large state-owned company, so the dataset needs to be expanded. [Conclusions] The GKTR model can effectively improve text retrieval for engineering consulting reports.
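The score fusion and the P@20 metric described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mean pooling of keyword and title vectors, and the use of cosine similarity are all assumptions made for illustration, and the first matching score (from the graph convolutional network and the interactive matching model) is simply taken as a given number.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def keyword_score(keyword_vecs, title_vecs):
    """Second matching score (assumed form): cosine similarity between the
    mean-pooled keyword vectors and the mean-pooled title vectors.
    In the paper the vectors come from Word2Vec; here they are plain lists."""
    return cosine(mean_vector(keyword_vecs), mean_vector(title_vecs))


def fuse(first_score, second_score):
    """Final matching score: arithmetic mean of the two scores, as the abstract states."""
    return (first_score + second_score) / 2.0


def precision_at_k(ranked_ids, relevant_ids, k=20):
    """P@k: fraction of the top-k ranked items that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k
```

A plain average weights both scores equally, so it implicitly assumes they lie on comparable scales; how the two scores are scaled before averaging is not stated in the abstract.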

Key words: Text Retrieval; Graph Convolution Network; Keywords; BERT; Joint Ranking

Received: 21 October 2022 Published: 16 May 2023

ZTFLH: TP391; G35

Fund: National Natural Science Foundation of China (62171043); Key Program of the National Language Commission of China (ZDI145-10)

Corresponding Author: Zhang Le, ORCID: 0000-0002-9620-511X, E-mail: zhangle@bistu.edu.cn


Cite this article:

Lyu Xueqiang, Du Yifan, Zhang Le, Pan Huiping, Tian Chi. GKTR Retrieval Model for Engineering Consulting Reports with Graph Convolution Topological and Keyword Features. Data Analysis and Knowledge Discovery, 2023, 7(12): 155-163.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1099 OR https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I12/155

Figures and tables (captions only): Retrieval Model Integrating Graph Convolution Topological Features and Keyword Features; Keyword Annotation Sample; Sample of Training Data Tag; Example of Subject Heading Dictionary; Experimental Environment; Experimental Results

