Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (5): 20-33    DOI: 10.11925/infotech.2096-3467.2021.0606
Mining Policy Text Relevance with Syntactic Structure and Semantic Information
Wu Kaibiao, Lang Yuxiang, Dong Yu
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper proposes a new method for analyzing policy text relevance that aims to capture deeper semantic information. [Methods] First, we built a new algorithm that combines dependency parsing with a word embedding model. Then, we analyzed the semantic relevance of policy texts using both sentence-level and word-level meaning. The method exploits the linguistic characteristics of policy texts to establish extraction rules over the dependency syntax. [Results] On a test dataset with a relatively low degree of policy text association, the new algorithm's F1 value reached 0.857, which was 22.78% higher than that of an algorithm combining TF-IDF and cosine similarity. We also characterized policy text relevance in terms of subtle differences in wording. [Limitations] For semantic information mining, more research is needed to train word vector models for specific policy domains and further improve their accuracy. For sentence information mining, the accuracy of existing dependency parsing tools could be improved. [Conclusions] The proposed algorithm can effectively reveal associations between policy texts and offers new perspectives and tools for quantitative research on policy texts.

Key words: Policy Text Relevance; Dependency Parsing; Word Embedding
Received: 20 June 2021      Published: 21 June 2022
ZTFLH:  D630  
Fund: Project of Literature and Information Capacity Building, Chinese Academy of Sciences (Y9290002)
Corresponding Author: Dong Yu, ORCID: 0000-0001-9006-5462

Cite this article:

Wu Kaibiao, Lang Yuxiang, Dong Yu. Mining Policy Text Relevance with Syntactic Structure and Semantic Information. Data Analysis and Knowledge Discovery, 2022, 6(5): 20-33.

Experimental Design
Schematic Diagram of Dependency Parsing Structure
Classification of Test Sets for Policy Text Relevance Verification
Schematic Diagram of Policy Text Relevance Mining Process
Heat Map of Test Dataset Similarity
Algorithm Performance Trend with Similarity Threshold
Relevance calculation method                     Optimal similarity value   P       R       F1 value
Proposed method                                  0.345                      0.875   0.840   0.857
Method based on TF-IDF and cosine similarity     0.265                      0.633   0.780   0.698
Comparison of Experimental Results Between the Proposed Method and the Conventional Topic Model
Schematic Diagram of the Test Dataset Related to the Word “Talent”
Schematic Diagram of Policy Text Association Mining Based on Comparison of Word Forms
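
The F1 values in the results table and the 22.78% improvement quoted in the abstract can be checked from the reported precision (P) and recall (R). The sketch below is only an arithmetic check, not the authors' evaluation code. It assumes F1 is the usual harmonic mean of P and R and that the improvement is measured relative to the baseline F1; the function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    return 2 * precision * recall / (precision + recall)


def relative_gain_percent(new: float, base: float) -> float:
    """Improvement of `new` over `base`, as a percentage of `base`."""
    return (new - base) / base * 100


# P and R as reported in the results table.
proposed_f1 = f1_score(0.875, 0.840)   # about 0.857
baseline_f1 = f1_score(0.633, 0.780)   # about 0.699, reported as 0.698

# Gain computed from the reported (rounded) F1 values.
reported_gain = relative_gain_percent(0.857, 0.698)  # about 22.78

print(f"proposed F1  = {proposed_f1:.3f}")
print(f"baseline F1  = {baseline_f1:.3f}")
print(f"relative gain = {reported_gain:.2f}%")
```

The baseline F1 recomputed from the rounded P and R differs from the reported 0.698 in the third decimal place, which is consistent with the table's P and R having been rounded before publication.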
