Constructing Degree Lexicon for STI Policy Texts
Zheng Xinman, Dong Yu
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This paper constructs a sentiment lexicon for STI policy texts to identify and quantify the attitudes of policy makers embedded in them. It addresses a gap in existing studies, which ignore the semantic intensity of words. [Methods] First, we summarized the characteristics of policy texts and proposed a method for constructing a degree lexicon. The method selects seed words based on expert knowledge, expands the set of domain degree words with the pointwise mutual information (PMI) algorithm, and screens the candidates with Tongyici Cilin. Finally, we combined the TextRank algorithm with the new lexicon and validated the approach experimentally. [Results] The constructed degree lexicon yielded better results in policy text analysis than the traditional single text mining algorithm. [Limitations] The weights in our lexicon need to be refined. [Conclusions] Degree words in STI policy texts are abundant, standardized, and stable. The new lexicon makes effective use of degree words and captures more semantic features of policy texts.
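The PMI-based expansion step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pmi_scores`, the toy English corpus, the single seed word, the use of sentence-level co-occurrence, and the choice to keep each candidate's maximum PMI over all seeds are all assumptions made for illustration. The abstract does not specify the co-occurrence window, the scoring rule, or any threshold.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences, seeds):
    """Score non-seed words by their highest sentence-level PMI with any seed.

    PMI(w, s) = log2( p(w, s) / (p(w) * p(s)) ), with probabilities estimated
    as the fraction of sentences that contain the word (or both words).
    Words that never co-occur with a seed are left out of the result.
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing both words of a pair
    for tokens in sentences:
        vocab = set(tokens)
        word_df.update(vocab)
        for a, b in combinations(sorted(vocab), 2):
            pair_df[(a, b)] += 1

    scores = {}
    for word in word_df:
        if word in seeds:
            continue
        best = None
        for seed in seeds:
            joint = pair_df.get(tuple(sorted((word, seed))), 0)
            if joint == 0:
                continue  # PMI is undefined (minus infinity) without co-occurrence
            pmi = math.log2(joint * n / (word_df[word] * word_df[seed]))
            best = pmi if best is None else max(best, pmi)
        if best is not None:
            scores[word] = best
    return scores


# Hypothetical, pre-tokenized toy corpus; real policy texts would first need
# Chinese word segmentation, which is outside the scope of this sketch.
sentences = [
    ["vigorously", "promote", "innovation"],
    ["actively", "promote", "innovation"],
    ["vigorously", "support", "research"],
    ["publish", "annual", "report"],
]
seeds = {"vigorously"}

scores = pmi_scores(sentences, seeds)
for word, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{word}\t{score:.2f}")
```

In a pipeline like the one the abstract describes, words scoring above some cutoff would become candidate degree words and then be screened against a thesaurus such as Tongyici Cilin. Both the cutoff and the screening rule would have to be taken from the full paper.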
Received: 18 February 2021
Published: 30 June 2021
Fund: Project of Literature and Information Capacity Building, Chinese Academy of Sciences (Y9290002)
Corresponding Author: Dong Yu, ORCID: 0000-0001-9006-5462
E-mail: dongy@mail.las.ac.cn