Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (6): 26-37    DOI: 10.11925/infotech.2096-3467.2022.1000
Construction and Verification of Type-Controllable Question Generation Model Based on Deep Learning and Knowledge Graphs
Wang Xiaofeng1, Sun Yujie2, Wang Huazhen2 (corresponding author), Zhang Hengzhang2
1The Academy of Chinese Language and Culture Education, Huaqiao University, Xiamen 361021, China
2College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
Abstract  

[Objective] This research aims to generate questions automatically, reducing the workload of manual question construction. It also addresses the uncontrollable difficulty and limited dimensions of questions produced by collaborative question generation, and encourages learners to engage in deep reading comprehension through intelligent questioning. [Methods] We propose TCQG (Type-Controllable Question Generation), a question generation model based on the Transformer and knowledge graphs, to automatically generate type-controllable questions. First, we feed the knowledge graph into the Graph Transformer module of the TCQG model for graph representation learning and obtain a subgraph vector. Then, we retrieve a matching external question for each subgraph using similarity measures. Next, we feed the 4MAT question-type parameters and the external questions into a BiLSTM network to obtain externally enhanced vectors. Finally, we pass the subgraph vector and the externally enhanced vector to the Pointer-Generator Network of the TCQG model to generate questions. [Results] Through the Graph Transformer, TCQG learns better representations of the knowledge graph: its BLEU-1 score reaches 39.62 on the one-hop triple dataset and 38.63 on "what-is" questions, surpassing all baseline models in both settings. [Limitations] The model covers a fixed set of question types and cannot produce every question type found in human language. In addition, this research does not generate answers for the questions, which limits real-world applications. [Conclusions] The model generates the diverse, semantically rich, and naturally expressed questions needed in educational scenarios, enabling learners to benefit from the generated questions and engage in deeper reading comprehension.
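For orientation, the four-step pipeline in the abstract can be summarized in code. The following is a minimal sketch in PyTorch; the layer sizes, the stock TransformerEncoder standing in for the Graph Transformer module, and the single crude decoding step are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the TCQG data flow, with illustrative sizes and a
# stock TransformerEncoder as a stand-in for the paper's Graph Transformer.
import torch
import torch.nn as nn

VOCAB, HID = 1000, 128
embed = nn.Embedding(VOCAB, HID)

# (1) Graph representation learning -> subgraph vectors.
graph_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=HID, nhead=4, batch_first=True),
    num_layers=2,
)
# (2)+(3) BiLSTM over [4MAT type token; matched external question tokens].
external_encoder = nn.LSTM(HID, HID // 2, bidirectional=True, batch_first=True)

gen_gate = nn.Linear(HID, 1)       # pointer-generator: generate-vs-copy gate
vocab_proj = nn.Linear(HID, VOCAB)

subgraph_tokens = torch.randint(0, VOCAB, (1, 6))    # subgraph entities/relations
external_tokens = torch.randint(0, VOCAB, (1, 10))   # type id + external question

subgraph_vec = graph_encoder(embed(subgraph_tokens))        # (1, 6, HID)
external_vec, _ = external_encoder(embed(external_tokens))  # (1, 10, HID)

# (4) One crude pointer-generator decoding step: mix a vocabulary
# distribution with a copy distribution over the subgraph tokens.
context = torch.cat([subgraph_vec, external_vec], dim=1).mean(dim=1)  # (1, HID)
p_gen = torch.sigmoid(gen_gate(context))                              # (1, 1)
p_vocab = torch.softmax(vocab_proj(context), dim=-1)                  # (1, VOCAB)
attn = torch.softmax(subgraph_vec @ context.unsqueeze(-1), dim=1).squeeze(-1)

p_final = p_gen * p_vocab
p_final.scatter_add_(1, subgraph_tokens, (1 - p_gen) * attn)  # add copy mass
print(p_final.shape)  # (1, VOCAB): distribution over the next output token
```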

Key words: Deep Learning; Knowledge Graph; Type Controllable; Intelligent Question Generation
Received: 21 September 2022      Published: 22 March 2023
ZTFLH: TP391; G350
Fund: Fundamental Research Funds for the Central Universities (17SKGC-QG13)
Corresponding Author: Wang Huazhen, ORCID: 0000-0002-6548-9957, E-mail: wanghuazhen@hqu.edu.cn.

Cite this article:

Wang Xiaofeng, Sun Yujie, Wang Huazhen, Zhang Hengzhang. Construction and Verification of Type-Controllable Question Generation Model Based on Deep Learning and Knowledge Graphs. Data Analysis and Knowledge Discovery, 2023, 7(6): 26-37.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1000     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I6/26

Type-Controllable Question Generation Model Integrating Deep Learning and Knowledge Graphs
Question type | Subtype | Example question | Source | Multi-hop triples in the knowledge graph
是何 (what) | — | Who was the male lead in the 2004 representative work of the producer of 不要和陌生人说话? | 思知 | 不要和陌生人说话 -> producer -> 朱质冰 -> representative work -> 中国式离婚 (2004) -> male lead -> 陈道明
是何 (what) | 哪里 (where) | Where is the high-incidence region for the differential diagnosis of the complications of 矽肺 (silicosis)? | CMeKG | 矽肺 (silicosis) -> complication -> chronic obstructive pulmonary disease -> differential diagnosis -> 支气管肺 -> high-incidence region -> North America
是何 (what) | 什么时候 (when) | During what hours does 北京机场大巴1线 (Beijing Airport Bus Line 1) run each day? | 思知 | 北京机场大巴1线 -> operating hours -> 7:00-24:00
是何 (what) | 多少 (how much) | What is the mass of the exploration target of 信使号 (the MESSENGER probe)? | 思知 | 信使号 -> exploration target -> 水星 (Mercury) -> mass -> 3.3022×10²³ kg
是何 (what) | 是否 (whether) | In 西游记 (Journey to the West), is "唐三藏路阻火焰山,孙行者三调芭蕉扇" episode 10? | 思知 | 《西游记》 -> story title -> 唐三藏路阻火焰山,孙行者三调芭蕉扇 -> episode number -> 17
为何 (why) | 为什么 (why) | Why is 新疆火焰山 (the Flaming Mountains in Xinjiang) so hot? | 思知 | 新疆火焰山 -> location -> 吐鲁番盆地 (Turpan Basin) -> highest summer temperature -> 49.6°C -> belongs to -> high temperature -> cause -> little rainfall, abundant sunshine, low-lying terrain
如何 (how) | 怎么样/怎么做 (how / how to) | How can the common symptoms of delayed hypersensitivity be prevented? | CMeKG | delayed hypersensitivity -> common symptom -> eczema -> preventive measures -> adjust diet, balance nutrition, maintain personal hygiene, and guard against allergic dermatitis
若何 (what if) | 如果……,应该怎么做/怎么办 (if ..., what should be done) | If drinking tea leaves you tea-drunk, what should you do? | NLPCC-MH | drinking tea -> symptom -> 醉茶 (tea drunkenness) -> remedy -> eat candy
Dataset Examples
The Format of Encyclopedia Question and Answer Data
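The multi-hop paths in the table above chain entities and relations with "->". A small helper (the function name and the alternating entity/relation assumption are ours) that unpacks such a path into (head, relation, tail) triples:

```python
# Unpack a multi-hop path such as those above into (head, relation, tail)
# triples; entities and relations alternate along the "->" separator.
def path_to_triples(path: str, sep: str = "->") -> list:
    parts = [p.strip() for p in path.split(sep)]
    # Entities sit at even indices, relations at odd indices.
    return [(parts[i], parts[i + 1], parts[i + 2])
            for i in range(0, len(parts) - 2, 2)]

print(path_to_triples("信使号->探测目标->水星->质量->3.3022×10²³kg"))
# [('信使号', '探测目标', '水星'), ('水星', '质量', '3.3022×10²³kg')]
```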
Baseline model | Description
BiLSTM | LSTM is a variant of RNN, and a BiLSTM combines a forward and a backward LSTM. Both are widely used to model contextual information in NLP tasks and are common models for text generation.
RNN + Attention | Each word in a text sequence contributes differently to the generation task, and some words carry no useful information, so an attention mechanism is introduced to weigh each word's contribution.
RNN + Attention + Pointer-Generator Network | Extends the RNN + Attention model with the Pointer-Generator Network copy mechanism, which decides whether each output token is copied from the input sequence or selected from the vocabulary.
TCQG (ours) | The proposed model: a Graph Transformer produces the vector representation of the knowledge graph, and a BiLSTM encoder represents the external enhancement data. A Pointer-Generator Network and an RAML reward function are incorporated to control the generation of questions tied to the target knowledge points.
TCQG(A) (ours) | Ablation variant that replaces the Graph Transformer with a Graph Attention model, which introduces an attention mechanism into the graph neural network for better neighbor aggregation.
TCQG(E) (ours) | Ablation variant that removes the graph-representation module from the TCQG encoder and feeds entities and relations directly into the BiLSTM for vector representation.
Baseline Model Settings
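The TCQG(A) row above swaps in graph-attention neighbor aggregation. A minimal single-head sketch of that mechanism follows; the layer sizes, variable names, and random adjacency are illustrative assumptions, not the paper's configuration.

```python
# Single-head graph-attention aggregation, the mechanism the TCQG(A)
# ablation substitutes for the Graph Transformer.
import torch
import torch.nn as nn
import torch.nn.functional as F

N, F_IN, F_OUT = 5, 16, 8
W = nn.Linear(F_IN, F_OUT, bias=False)   # shared node projection
a = nn.Linear(2 * F_OUT, 1, bias=False)  # attention scorer over node pairs

h = torch.randn(N, F_IN)                                       # node features
adj = torch.eye(N) + torch.bernoulli(torch.full((N, N), 0.4))  # self-loops + edges
adj = (adj > 0).float()

Wh = W(h)                                                   # (N, F_OUT)
pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, F_OUT),
                   Wh.unsqueeze(0).expand(N, N, F_OUT)], dim=-1)
e = F.leaky_relu(a(pairs).squeeze(-1), negative_slope=0.2)  # pairwise logits
e = e.masked_fill(adj == 0, float("-inf"))                  # restrict to neighbors
alpha = torch.softmax(e, dim=1)                             # normalize per node
h_new = alpha @ Wh                                          # aggregated features
print(h_new.shape)  # torch.Size([5, 8])
```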
Model | BLEU-1 (%) | BLEU-2 (%) | BLEU-3 (%) | BLEU-4 (%) | ROUGE-L (%)
BiLSTM | 31.87 | 19.24 | 13.48 | 9.25 | 30.95
RNN + Attention | 33.79 | 20.12 | 14.17 | 9.71 | 31.43
RNN + Attention + Pointer-Generator Network | 34.48 | 21.15 | 14.61 | 10.44 | 32.68
TCQG(A) | 35.58 | 21.35 | 13.74 | 10.26 | 33.58
TCQG(E) | 36.63 | 20.97 | 14.47 | 10.67 | 33.75
TCQG | 39.62 | 24.12 | 16.17 | 12.34 | 34.21
One-Hop Triple Dataset Evaluation Results
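For reference, scores like those above come from n-gram overlap between generated and reference questions. A sketch using NLTK's sentence-level BLEU; the paper's exact evaluation script is not given, and the character-level tokenization of Chinese is our assumption.

```python
# Compute BLEU-1..4 for one generated question against its reference.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [list("柳江的主要支流的发源地在哪里?")]    # gold question, char tokens
candidate = list("柳江的主要支流的发源地在什么地方?")  # generated question

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n)) + (0.0,) * (4 - n)
    score = sentence_bleu(reference, candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {100 * score:.2f}")
```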
Model | BLEU-1 (%) | BLEU-2 (%) | BLEU-3 (%) | BLEU-4 (%) | ROUGE-L (%)
BiLSTM | 29.94 | 17.74 | 11.51 | 8.45 | 28.25
RNN + Attention | 31.84 | 18.23 | 12.21 | 8.84 | 29.63
RNN + Attention + Pointer-Generator Network | 32.54 | 19.35 | 12.81 | 9.69 | 30.18
TCQG(A) | 33.72 | 19.85 | 12.91 | 9.48 | 30.48
TCQG(E) | 34.41 | 19.48 | 12.97 | 9.75 | 30.85
TCQG | 35.97 | 20.12 | 13.26 | 10.03 | 31.39
Two-Hop Triple Dataset Evaluation Results
Model | BLEU-1 (%) | BLEU-2 (%) | BLEU-3 (%) | BLEU-4 (%) | ROUGE-L (%)
BiLSTM | 27.84 | 16.41 | 10.17 | 7.85 | 25.19
RNN + Attention | 29.76 | 17.23 | 10.91 | 7.97 | 26.47
RNN + Attention + Pointer-Generator Network | 30.65 | 17.95 | 11.17 | 8.53 | 26.61
TCQG(A) | 30.82 | 18.15 | 10.91 | 8.58 | 26.46
TCQG(E) | 31.31 | 18.21 | 11.27 | 8.74 | 27.46
TCQG | 32.04 | 18.87 | 12.13 | 8.93 | 28.45
Three-Hop Triple Dataset Evaluation Results
Model | BLEU-1 (%) | BLEU-2 (%) | BLEU-3 (%) | BLEU-4 (%) | ROUGE-L (%)
BiLSTM | 32.52 | 20.52 | 14.48 | 10.62 | 32.21
RNN + Attention | 34.82 | 21.39 | 15.17 | 10.89 | 32.74
RNN + Attention + Pointer-Generator Network | 35.61 | 22.31 | 15.61 | 11.58 | 33.95
TCQG(A) | 36.79 | 22.61 | 14.74 | 11.63 | 34.86
TCQG(E) | 37.84 | 22.24 | 15.47 | 11.97 | 34.91
TCQG | 38.63 | 23.87 | 16.32 | 12.69 | 35.87
The Evaluation Results of WHAT Questions
Model | BLEU-1 (%) | BLEU-2 (%) | BLEU-3 (%) | BLEU-4 (%) | ROUGE-L (%)
BiLSTM | 30.17 | 17.91 | 11.93 | 8.65 | 28.95
RNN + Attention | 31.94 | 18.82 | 12.56 | 8.97 | 29.91
RNN + Attention + Pointer-Generator Network | 32.69 | 19.67 | 13.18 | 9.82 | 30.72
TCQG(A) | 33.81 | 20.54 | 13.72 | 9.72 | 31.07
TCQG(E) | 34.91 | 20.68 | 13.82 | 9.91 | 31.26
TCQG | 36.31 | 20.93 | 14.25 | 10.14 | 31.94
The Evaluation Results of WHY Questions
Model | BLEU-1 (%) | BLEU-2 (%) | BLEU-3 (%) | BLEU-4 (%) | ROUGE-L (%)
BiLSTM | 31.33 | 18.97 | 13.12 | 9.14 | 30.12
RNN + Attention | 33.21 | 19.95 | 13.85 | 9.59 | 30.84
RNN + Attention + Pointer-Generator Network | 34.11 | 20.71 | 14.42 | 10.31 | 31.68
TCQG(A) | 34.91 | 21.04 | 13.61 | 10.17 | 33.17
TCQG(E) | 35.15 | 20.71 | 14.09 | 10.59 | 33.21
TCQG | 37.32 | 21.94 | 14.85 | 11.14 | 34.13
The Evaluation Results of HOW Questions
Model | BLEU-1 (%) | BLEU-2 (%) | BLEU-3 (%) | BLEU-4 (%) | ROUGE-L (%)
BiLSTM | 26.71 | 15.72 | 9.42 | 7.07 | 24.10
RNN + Attention | 28.59 | 16.09 | 10.13 | 7.19 | 25.58
RNN + Attention + Pointer-Generator Network | 29.42 | 16.74 | 10.48 | 7.83 | 25.61
TCQG(A) | 29.42 | 17.06 | 10.21 | 8.01 | 25.51
TCQG(E) | 30.18 | 17.48 | 10.37 | 8.34 | 26.37
TCQG | 31.11 | 18.19 | 11.95 | 8.61 | 27.90
The Evaluation Results of IF Questions
BLEU-1 Evaluation Results of the Models Under Different Question Types
All six samples share the same input; it is listed once below, followed by each model's output.

Input graph: 柳江 ||| main tributary (主要支流) ||| 寨蒿河 ||| originates in (发源于) ||| 贵州省黎平县高洋乡
External question: 晋江的发源地在什么地方? (Where does the 晋江 originate?)
Question type: 是何 (what)
Reference question: 柳江的主要支流的发源地在哪里? (Where is the source of the 柳江's main tributary?)

Sample | Model | Generated question
Sample 1 | BiLSTM | 柳江的寨蒿河的什么? (The 柳江's 寨蒿河's what?)
Sample 2 | RNN + Attention | 柳江的主要支流寨蒿河发源于? (The 柳江's main tributary 寨蒿河 originates in?)
Sample 3 | RNN + Attention + Pointer-Generator Network | 柳江的寨蒿河在哪里? (Where is the 柳江's 寨蒿河?)
Sample 4 | TCQG(A) | 柳江的主要支流的发源于什么地方? (Where does the 柳江's main tributary originate from?)
Sample 5 | TCQG(E) | 柳江的主要支流寨蒿河的发源于什么地方? (Where does the 柳江's main tributary 寨蒿河 originate from?)
Sample 6 | TCQG | 柳江的主要支流的发源地在什么地方? (Where is the source of the 柳江's main tributary?)
Sample Questions Generated by the Model
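The "external question" field in the samples above comes from the similarity-based matching step described in the abstract. The abstract only says "similarity measures"; the averaged word-vector representation below (GloVe-style, cf. [16]), the toy vocabulary, and the random vectors are our assumptions for illustration.

```python
# Sketch of similarity-based retrieval of an external question:
# embed the candidates, then pick the nearest by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
words = ["柳江", "晋江", "发源地", "主要", "支流", "在", "什么", "地方", "哪里"]
emb = {w: rng.standard_normal(50) for w in words}  # stand-in word vectors

def sent_vec(tokens):
    """Average the vectors of known tokens."""
    return np.mean([emb[t] for t in tokens if t in emb], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

query = sent_vec(["柳江", "主要", "支流", "发源地"])  # subgraph keywords
candidates = {
    "晋江的发源地在什么地方?": sent_vec(["晋江", "发源地", "在", "什么", "地方"]),
    "柳江在哪里?": sent_vec(["柳江", "在", "哪里"]),
}
best = max(candidates, key=lambda q: cosine(query, candidates[q]))
print("matched external question:", best)
```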
[1] 刘明, 张津旭, 吴忠明. 智能提问技术及其教育应用[J]. 人工智能, 2022(2): 30-38.
[1] (Liu Ming, Zhang Jinxu, Wu Zhongming. Intelligent Questioning Techniques and Educational Applications[J]. Artificial Intelligence, 2022(2): 30-38.)
[2] Serban I V, Sordoni A, Lowe R, et al. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. AAAI Press, 2017: 3295-3301.
[3] Du X, Shao J, Cardie C. Learning to Ask: Neural Question Generation for Reading Comprehension[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 1342-1352.
[4] Wang Y S, Liu C Y, Huang M L, et al. Learning to Ask Questions in Open-Domain Conversational Systems with Typed Decoders[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. ACL, 2018: 2193-2203.
[5] 鲍军威. 基于知识的自动问答与问题生成的研究[D]. 哈尔滨: 哈尔滨工业大学, 2019.
[5] (Bao Junwei. Research on Knowledge-Based Question Answering and Question Generation[D]. Harbin: Harbin Institute of Technology, 2019.)
[6] Chan Y H, Fan Y C. A Recurrent BERT-Based Model for Question Generation[C]// Proceedings of the 2nd Workshop on Machine Reading for Question Answering. 2019: 154-162.
[7] Indurthi S R, Raghu D, Khapra M M, et al. Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. 2017: 376-385.
[8] Koncel-Kedziorski R, Bekal D, Luan Y, et al. Text Generation from Knowledge Graphs with Graph Transformers[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 2284-2293.
[9] Kumar V, Hua Y C, Ramakrishnan G, et al. Difficulty-Controllable Multi-Hop Question Generation from Knowledge Graphs[C]// Proceedings of the 18th International Semantic Web Conference. 2019: 382-398.
[10] McCarthy B. About Teaching: 4MAT in the Classroom[M]. About Learning Inc., 2000.
[11] See A, Liu P J, Manning C D. Get to the Point: Summarization with Pointer-Generator Networks[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 1073-1083.
[12] Papineni K, Roukos S, Ward T, et al. BLEU: A Method for Automatic Evaluation of Machine Translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002: 311-318.
[13] Lin C Y. ROUGE: A Package for Automatic Evaluation of Summaries[C]// Proceedings of the Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL. 2004.
[14] de Boer P T, Kroese D P, Mannor S, et al. A Tutorial on the Cross-Entropy Method[J]. Annals of Operations Research, 2005, 134(1): 19-67. DOI: 10.1007/s10479-005-5724-z.
[15] Chen X Y, Wu Z S, Hong M Y. Understanding Gradient Clipping in Private SGD: A Geometric Perspective[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 13773-13782.
[16] Pennington J, Socher R, Manning C D. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014: 1532-1543.
[17] Sha L, Mou L L, Liu T Y, et al. Order-Planning Neural Text Generation from Structured Data[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018: 5414-5421.
Related articles:
[1] Wu Jialun, Zhang Ruonan, Kang Wulin, Yuan Puwei. Deep Learning Model of Drug Recommendation Based on Patient Similarity Analysis[J]. Data Analysis and Knowledge Discovery, 2023, 7(6): 148-160.
[2] Wang Nan, Wang Qi. Evaluating Student Engagement with Deep Learning[J]. Data Analysis and Knowledge Discovery, 2023, 7(6): 123-133.
[3] Liu Yang, Zhang Wen, Hu Yi, Mao Jin, Huang Fei. Hotel Stock Prediction Based on Multimodal Deep Learning[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 21-32.
[4] Huang Xuejian, Ma Tinghuai, Wang Gensheng. Detecting Weibo Rumors Based on Hierarchical Semantic Feature Learning Model[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 81-91.
[5] Li Kaijun, Niu Zhendong, Shi Kaize, Qiu Ping. Paper Recommendation Based on Academic Knowledge Graph and Subject Feature Embedding[J]. Data Analysis and Knowledge Discovery, 2023, 7(5): 48-59.
[6] Wang Yinqiu, Yu Wei, Chen Junpeng. Automatic Question-Answering in Chinese Medical Q&A Community with Knowledge Graph[J]. Data Analysis and Knowledge Discovery, 2023, 7(3): 97-109.
[7] Du Yue, Chang Zhijun, Dong Mei, Qian Li, Wang Ying. Constructing Large-scale Knowledge Graph for Massive Sci-Tech Literature[J]. Data Analysis and Knowledge Discovery, 2023, 7(2): 141-150.
[8] Zhang Zhengang, Yu Chuanming. Knowledge Graph Completion Model Based on Entity and Relation Fusion[J]. Data Analysis and Knowledge Discovery, 2023, 7(2): 15-25.
[9] Shen Lining, Yang Jiayi, Pei Jiaxuan, Cao Guang, Chen Gongzheng. A Fine-Grained Sentiment Recognition Method Based on OCC Model and Triggering Events[J]. Data Analysis and Knowledge Discovery, 2023, 7(2): 72-85.
[10] Wang Weijun, Ning Zhiyuan, Du Yi, Zhou Yuanchun. Identifying Interdisciplinary Sci-Tech Literature Based on Multi-Label Classification[J]. Data Analysis and Knowledge Discovery, 2023, 7(1): 102-112.
[11] Peng Cheng, Zhang Chunxia, Zhang Xin, Guo Jingtao, Niu Zhendong. Reasoning Model for Temporal Knowledge Graph Based on Entity Multiple Unit Coding[J]. Data Analysis and Knowledge Discovery, 2023, 7(1): 138-149.
[12] Xiao Yuhan, Lin Huiping. Mining Differentiated Demands with Aspect Word Extraction: Case Study of Smartphone Reviews[J]. Data Analysis and Knowledge Discovery, 2023, 7(1): 63-75.
[13] Cheng Quan, She Dexin. Drug Recommendation Based on Graph Neural Network with Patient Signs and Medication Data[J]. Data Analysis and Knowledge Discovery, 2022, 6(9): 113-124.
[14] Zhang Han, An Xinyu, Liu Chunhe. Building Multi-Source Semantic Knowledge Graph for Drug Repositioning[J]. Data Analysis and Knowledge Discovery, 2022, 6(7): 87-98.
[15] Liu Chunjiang, Li Shuying, Hu Hanlin, Fang Shu. Graph Databases for Complex Network Analysis[J]. Data Analysis and Knowledge Discovery, 2022, 6(7): 1-11.