Knowledge Modeling and Association Q&A for Policy Texts

doi:10.11925/infotech.2096-3467.2022.0185

Data Analysis and Knowledge Discovery

2022, Vol. 6, Issue 11: 79-92. https://doi.org/10.11925/infotech.2096-3467.2022.0185

Research Article



Knowledge Modeling and Association Q&A for Policy Texts

Hua Bin¹,², Kang Yue¹, Fan Linhao²

¹School of Management Science and Engineering, Tianjin University of Finance and Economics, Tianjin 300222, China
²School of Science and Technology, Tianjin University of Finance and Economics, Tianjin 300222, China


Abstract

[Objective] This paper proposes an intelligent question-answering method for associated policies that is driven by cognitive-level semantic understanding, aiming to improve the efficiency and capability of government public services. [Methods] First, we built a knowledge model from the content of policy texts to express policy knowledge. Second, we introduced a question-word attention mechanism and combined it with an improved ERNIE+CNN model to classify policy questions. Third, we used a semantic role labeling IDCNN+CRF model that incorporates syntactic analysis, together with cognitive computing methods, to acquire knowledge at the semantic and pragmatic levels of each question. Finally, on the basis of knowledge fusion and semantic retrieval, we used knowledge aggregation to generate associated answers, and evaluated answer quality in two ways: BERT-based semantic similarity and knowledge unit measurement. [Results] Question classification accuracy reached 90.76%, which is 18.81 and 5.05 percentage points higher than the original BERT and ERNIE models, respectively. The precision of question knowledge acquisition reached 95.88%, and the accuracy in the answer quality test reached 93.75%. The semantic similarity of the answers was 0.88, and the knowledge consistency was 0.96. [Limitations] The performance of question knowledge acquisition is limited by the completeness of the domain knowledge system, and the quality of associated answers depends on the accuracy of policy knowledge extraction. [Conclusions] By deconstructing policy texts and representing their knowledge, the proposed method can synthesize answers that draw on different policy contents, and it performs well in knowledge verification.

Keywords: Intelligent Question Answering, Text Mining, E-Government, Policy Knowledge Modeling, Knowledge Graph, Knowledge Aggregation

Received: 2022-03-07 Published: 2023-01-13

Chinese Library Classification (ZTFLH): TP391

Corresponding author: Kang Yue E-mail: 18502612743@163.com

Cite this article:

Hua Bin, Kang Yue, Fan Linhao. Knowledge Modeling and Association Q&A for Policy Texts[J]. Data Analysis and Knowledge Discovery, 2022, 6(11): 79-92.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0185 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I11/79

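Fig.1 Research framework for policy association Q&A

Fig.2 Construction of the policy knowledge model

Fig.3 Structure of the QAM-ERNIE-CNN model

Table 1 Description of semantic relations between concepts

Fig.4 Knowledge fusion experiment results for the optimal threshold

Fig.5 Knowledge graph of a single policy

Table 2 Statistics of the question classification data

Table 3 Question classification results

Table 4 Analysis of question processing efficiency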

Question type (count)	Mean $qsim$	Mean $kr$	Mean $kr'$	Mean $kdif$	Mean $qcon$
Factual (29)	0.958	1.770	1.753	0.034	0.991
Method (25)	0.920	1.657	1.467	0.600	1.020
Reason (13)	0.768	2.769	2.231	0.846	0.859

Table 5 Mean results of the answer quality evaluation

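Fig.6 Example of a two-layer policy retrieval query

Fig.7 Knowledge retrieval for related supporting policies

Table 6 Analysis of knowledge retrieval efficiency

Table 7 Examples of generated policy association answers (partial)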

