Mining Policy Text Relevance with Syntactic Structure and Semantic Information
Wu Kaibiao, Lang Yuxiang, Dong Yu
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper proposes a new method for analyzing the relevance between policy texts, aiming to capture deeper semantic information. [Methods] First, we built a new algorithm that combines dependency parsing with a word embedding model. Then, we analyzed the semantic relevance of policy texts at both the sentence level and the word-meaning level. The method exploits the linguistic characteristics of policy texts to establish extraction rules for dependency syntax. [Results] On a test dataset with a relatively low degree of policy text association, the new algorithm’s F1 value reached 0.857, which was 22.78% higher than that of an algorithm combining TF-IDF and cosine similarity. The method also describes policy text relevance through subtle differences in wording. [Limitations] For semantic information mining, more research is needed to train word vector models for specific policy domains to further improve accuracy. For sentence information mining, the accuracy of existing dependency parsing tools could be improved. [Conclusions] The proposed algorithm effectively reveals associations between policy texts and offers new perspectives and tools for quantitative research on policy texts.
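For orientation, the baseline family named in the results (TF-IDF weighting followed by cosine similarity) and the F1 measure can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the smoothed IDF formula, and the toy token lists are assumptions, and real policy texts would first need Chinese word segmentation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant), so terms found in every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({t: (f / len(doc)) * idf[t] for t, f in term_freq.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical, already-tokenized policy sentences.
docs = [
    ["promote", "ai", "industry"],
    ["promote", "ai", "industry"],
    ["protect", "rural", "land"],
]
vecs = tfidf_vectors(docs)
sim_same = cosine(vecs[0], vecs[1])      # identical token lists
sim_disjoint = cosine(vecs[0], vecs[2])  # no shared terms
```

Because this baseline scores only shared surface terms, two sentences with no words in common get a similarity of zero even when their meanings are close, which is the gap the dependency-parsing and word-embedding approach is meant to address.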
( Zhang Ruhao. Empirical Study of a Semantic and Proximity-Based Author Co-citation Analysis Method[J]. Library and Information Service, 2020, 64(8): 111-124.)
( Ma Feicheng, Li Xiaoyu, Zhang Bin. Analysis on the Structure, Function and Evolution of China’s Internet Content Regulation Regime[J]. Journal of the China Society for Scientific and Technical Information, 2013, 32(11): 1124-1137.)
( Huang Cui, Zhao Peiqiang, Li Jiang. Research on China’s Science and Technology Policy Changes Based on Co-word Cluster Analysis[J]. Chinese Public Administration, 2015(9): 115-122.)
( Lang Mei. The Matching Degree Between Function of Local Government and Central Government under Big Data Perspective: A Research Based on the LDA Model of Gansu Province[J]. Journal of Intelligence, 2018, 37(9): 78-85.)
( Zhang Tao, Ma Haiqun. Comparative Study on Artificial Intelligence Policies in China Based on Text Similarity Computation[J]. Journal of Intelligence, 2021, 40(1): 39-47, 24.)
( Liu Heqing, Liang Yucheng. The Influence Mechanism of Policy Reproduction in China—A Study Based on Rural Policy Documents[J]. Sociological Studies, 2021, 36(1): 115-136.)
( Liu Gang, Fu Weiping, Ma Yingge. Research on the Evolution Mechanism of Policy Blood Network Based on Semantic[J]. Journal of Chinese Information Processing, 2018, 32(5): 114-127.)
[13]
Ma Yingge. Research on the Evolution Mechanism of Policy Blood Network Based on Semantic[D]. Harbin: Harbin Engineering University, 2015.
( Wu Zuoyan, Wang Yu. New Measure of Sentences Similarity Based on Hierarchical Network of Concepts Theory and Dependency Parsing[J]. Computer Engineering and Applications, 2014, 50(3): 97-102.)
( Li Bin, Liu Ting, Qin Bing, et al. Chinese Sentence Similarity Computing Based on Semantic Dependency Relationship Analysis[J]. Application Research of Computers, 2003, 20(12): 15-17.)
( Deng Han, Zhu Xinhua, Li Qi, et al. Sentence Similarity Calculation Based on Syntactic Structure and Modifier[J]. Computer Engineering, 2017, 43(9): 240-244.)
( Shao Wei, Hua Bolin. Unsupervised Construction of Thesaurus in the Science and Technology Policy Based on Dependency Syntax Analysis[J]. Technology Intelligence Engineering, 2020, 6(6): 33-44.)
[19]
Mihalcea R, Corley C, Strapparava C. Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity[C]// Proceedings of the 21st National Conference on Artificial Intelligence. 2006: 775-780.
[20]
Lai Siwei. Word and Document Embeddings Based on Neural Network Approaches[D]. Beijing: University of Chinese Academy of Sciences, 2016.
[21]
Levenshtein V. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals[J]. Soviet Physics Doklady, 1965, 10: 707-710.
[22]
Melamed I D. Automatic Evaluation and Uniform Filter Cascades for Inducing n-Best Translation Lexicons[OL]. arXiv Preprint, arXiv: cmp-lg/9505044.
[23]
Kondrak G. N-gram Similarity and Distance[C]// Proceedings of International Symposium on String Processing and Information Retrieval. Springer, 2005: 115-126.
[24]
Smith T F, Waterman M S. Identification of Common Molecular Subsequences[J]. Journal of Molecular Biology, 1981, 147(1): 195-197.
pmid: 7265238
[25]
Wilkerson J, Smith D, Stramp N. Tracing the Flow of Policy Ideas in Legislatures: A Text Reuse Approach[J]. American Journal of Political Science, 2015, 59(4): 943-956.
doi: 10.1111/ajps.12175
[26]
Linder F, Desmarais B, Burgess M, et al. Text as Policy: Measuring Policy Similarity Through Bill Text Reuse[J]. Policy Studies Journal, 2020, 48(2): 546-574.
doi: 10.1111/psj.12257
[27]
Li S, Zhao Z, Hu R F, et al. Analogical Reasoning on Chinese Morphological and Semantic Relations[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018: 138-143.
( Zhu Xinhua, Ma Runcong, Sun Liu, et al. Word Semantic Similarity Computation Based on HowNet and CiLin[J]. Journal of Chinese Information Processing, 2016, 30(4): 29-36.)
( Xinhua News Agency. Proposals of the Central Committee of the Communist Party of China on Formulating the Fourteenth Five-Year Plan for National Economic and Social Development and the Long-term Goals for 2035[EB/OL]. (2020-11-03). [2021-04-20]. http://www.gov.cn/zhengce/2020-11/03/content_5556991.htm.)
( Notice of the People’s Government of Guangdong Province on Issuing the Development Plan for the New Generation of Artificial Intelligence in Guangdong Province[EB/OL]. (2018-08-10). [2021-04-20]. http://www.gd.gov.cn/gkmlpt/content/0/147/post_147108.html#7.)
( Notice of the General Office of the Shanghai Municipal People’s Government on Issuing the “Implementation Opinions on Promoting the Development of New Generation Artificial Intelligence”[EB/OL]. (2017-10-26). [2021-04-20]. https://www.shanghai.gov.cn/nw42639/20200823/0001-42639_54242.html.)