Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (12): 155-163    DOI: 10.11925/infotech.2096-3467.2022.1099
GKTR Retrieval Model for Engineering Consulting Reports with Graph Convolution Topological and Keyword Features
Lyu Xueqiang, Du Yifan, Zhang Le, Pan Huiping, Tian Chi
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
Abstract  

[Objective] This paper proposes GKTR, a text retrieval model for engineering consulting reports that combines graph convolution topological features with keyword features, to address the insufficient extraction of semantic features in existing retrieval methods. [Methods] First, we built a text retrieval corpus of engineering consulting reports. Second, we fed the corpus into a BERT model to obtain contextual vectors. Third, we obtained the first matching score through a graph convolutional network and a deep interactive matching model. Fourth, we mapped the paragraph keywords to vectors with a Word2Vec model and calculated their similarity to the titles to obtain the second matching score. Finally, we averaged the two scores to obtain the final matching score. [Results] Compared with the joint ranking model CEDR, the new model scored up to 3.06 percentage points higher on the P@20 metric. [Limitations] The data came mainly from the engineering consulting reports of a large state-owned company, so the dataset needs to be expanded. [Conclusions] The GKTR model can effectively improve text retrieval for engineering consulting reports.
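The last two steps of the method (keyword similarity and score averaging) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how keyword vectors are pooled, which similarity measure is used, or whether the two scores are normalized before averaging. The sketch assumes mean pooling and cosine similarity, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (assumed pooling choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def keyword_score(keyword_vectors: Sequence[Sequence[float]],
                  title_vector: Sequence[float]) -> float:
    """Second matching score: similarity between pooled keywords and the title."""
    return cosine(mean_vector(keyword_vectors), title_vector)


def fuse(first_score: float, second_score: float) -> float:
    """Final matching score: plain average of the two scores, as in the abstract."""
    return (first_score + second_score) / 2.0


# Toy 2-dimensional vectors standing in for Word2Vec embeddings (hypothetical).
paragraph_keywords = [[1.0, 0.0], [0.0, 1.0]]
title = [1.0, 1.0]

second = keyword_score(paragraph_keywords, title)
final = fuse(0.8, second)  # 0.8 is a placeholder for the first matching score
```

If the first score is not on the same scale as a cosine similarity, some rescaling would be needed before averaging; the abstract does not say whether the authors do this.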

Key words: Text Retrieval; Graph Convolution Network; Keywords; BERT; Joint Ranking
Received: 21 October 2022      Published: 16 May 2023
ZTFLH: TP391; G35
Fund: National Natural Science Foundation of China (62171043); Key Program of the National Language Commission of China (ZDI145-10)
Corresponding Author: Zhang Le, ORCID: 0000-0002-9620-511X, E-mail: zhangle@bistu.edu.cn

Cite this article:

Lyu Xueqiang, Du Yifan, Zhang Le, Pan Huiping, Tian Chi. GKTR Retrieval Model for Engineering Consulting Reports with Graph Convolution Topological and Keyword Features. Data Analysis and Knowledge Discovery, 2023, 7(12): 155-163.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1099     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I12/155

Retrieval Model Integrating Graph Convolution Topological Features and Keyword Features
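Paragraph text | Annotated keywords
The First Central Primary School of Jiugong Town, Daxing District, Beijing, was founded in 1948, relocated and designated a central primary school in 1978, and renamed the First Central Primary School in 2001. To implement Daxing District's education plan and the requirement to move schools into multi-story buildings, and to integrate Jiugong Town's educational resources… | primary school, educational resources, layout, education quality
Over the past two decades, China has made great progress in all aspects of society and the economy. Vocational education, as an important part of the lifelong education system, has trained a large number of high-quality workers and practical talents for China's modernization… | vocational education, teaching quality, skills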
Keyword Annotation Sample
Title ID | Paragraph ID | Match rank | Similarity score
q1 | d18 | 1 | 25.20
q1 | d38 | 2 | 21.00
q1 | d35 | 3 | 20.74
q1 | d36 | 4 | 18.98
q1 | d657 | 5 | 18.98
Sample of Training Data Tag
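Subject heading | Keywords
Educational resources | educational resources, supporting facilities, resource allocation, high-quality resources…
Education development | education development, high-quality development, diversified development, all-round development…
Education layout | education layout, resource layout, education structure layout, coordinated layout…
School places | school places, enrollment rate, school-place gap, shortage of school places, enrollment pressure…
Balance | balance, balanced allocation, balanced development, high-quality balance, balanced resource allocation…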
Example of Subject Heading Dictionary
Operating system | Linux
CPU | Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
GPU | Tesla P4
Python | 3.6.9
PyTorch | 1.10.0
Experimental Environment
Ranking method | Model | P@20 (%)
Vanilla BERT | CEDR | 73.33
Vanilla BERT | GKTR | 76.39
DRMM | CEDR | 76.11
DRMM | GKTR | 78.24
KNRM | CEDR | 73.89
KNRM | GKTR | 75.34
PACRR | CEDR | 74.44
PACRR | GKTR | 75.97
Experimental Results
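For readers unfamiliar with the metric, P@k is the fraction of the top k ranked results that are relevant. The sketch below shows one common way to compute it, then derives the GKTR-over-CEDR gains from the P@20 values reported above. The ranked list and relevance labels are hypothetical; only the P@20 figures come from the results table.

```python
from typing import Dict, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked items that are relevant.

    Divides by k even when fewer than k items are returned, which is the
    usual convention for P@k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranking for one title and hypothetical relevance labels.
ranked = ["d18", "d38", "d35", "d36", "d657"]
relevant = {"d18", "d35", "d657"}
p_at_5 = precision_at_k(ranked, relevant, k=5)  # 3 hits in the top 5

# P@20 (%) from the results table: ranking method -> (CEDR, GKTR).
reported: Dict[str, Tuple[float, float]] = {
    "Vanilla BERT": (73.33, 76.39),
    "DRMM": (76.11, 78.24),
    "KNRM": (73.89, 75.34),
    "PACRR": (74.44, 75.97),
}

# Absolute gain of GKTR over CEDR in percentage points, per ranking method.
gains = {method: round(gktr - cedr, 2) for method, (cedr, gktr) in reported.items()}
largest_gain = max(gains.values())
```

The largest gain is the Vanilla BERT row, which matches the "up to 3.06" figure quoted in the abstract.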