Domain Ambiguous Collocation Dictionary for Real-Time Financial Sentimental Analysis
Zhao Youlin1,2, Xu Jingnan1, Lu Yingjun3
1 Business School, Hohai University, Nanjing 211100, China
2 School of Information Management, Nanjing University, Nanjing 210023, China
3 School of Information Management, Wuhan University, Wuhan 430072, China

Abstract [Objective] This study addresses the inaccuracy that arises in sentiment analysis when the dynamic polarity of ambiguous words is ignored. It aims to identify sentiment-ambiguous words with economic characteristics, together with their collocations. [Methods] The study takes dynamic financial news as its research object. First, we calculated the positive and negative sentiment scores of words in phrases to extract ambiguous seed words. Second, we retrieved the collocations strongly related to these seed words using algorithms such as association rules and pointwise mutual information (PMI). Third, we labeled the sentiment polarity of each collocation pair to build an ambiguous collocation lexicon. Finally, we evaluated sentiment mining performance on news texts updated in real time. [Results] The accuracy, recall, and F-value of sentiment analysis on financial news text were 89.62%, 87.52%, and 88.57%, respectively, which were 5.79%, 15.89%, and 10.84% higher than those of traditional models. [Limitations] Some collocates cannot be identified because they lie too far from their seed words. [Conclusions] The ambiguous collocation dictionary constructed in this paper expands the sentiment lexicon for economics, refines it in granularity and depth, and significantly improves the accuracy of sentiment analysis.
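
The collocation retrieval step named in the Methods can be illustrated with a minimal PMI sketch. This is not the authors' implementation: the function name `pmi_collocations`, the `window` and `min_count` parameters, the sentence-level probability estimates, and the toy English tokens (standing in for segmented financial news text) are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_collocations(sentences, seeds, window=3, min_count=2):
    """Score (seed, collocate) pairs by pointwise mutual information.

    sentences: list of token lists (hypothetical pre-segmented news sentences).
    seeds: set of ambiguous seed words.
    A pair is counted once per sentence when the collocate lies within
    `window` tokens of the seed. Probabilities are sentence frequencies.
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences where the pair co-occurs in the window
    for tokens in sentences:
        word_df.update(set(tokens))
        pairs = set()
        for i, tok in enumerate(tokens):
            if tok not in seeds:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.add((tok, tokens[j]))
        pair_df.update(pairs)

    scores = {}
    for (seed, other), count in pair_df.items():
        if count < min_count:
            continue  # drop rare pairs, whose PMI estimates are unreliable
        p_xy = count / n
        p_x = word_df[seed] / n
        p_y = word_df[other] / n
        scores[(seed, other)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus: "rise" is treated as an ambiguous seed word (assumption).
corpus = [
    ["profit", "rise"], ["cost", "rise"], ["profit", "rise"],
    ["cost", "rise"], ["price", "fall"], ["profit", "fall"],
]
for pair, score in sorted(pmi_collocations(corpus, {"rise"}).items()):
    print(pair, round(score, 3))
```

In a sketch like this, the `window` parameter makes the stated limitation concrete: a collocate that sits farther from the seed word than the window allows is never counted, so it cannot be retrieved. Pairs that survive the threshold would still need their sentiment polarity labeled before entering the lexicon.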
Received: 07 July 2022
Published: 21 March 2023
Fund: National Social Science Fund of China (21BTQ055)
Corresponding Author: Zhao Youlin, ORCID: 0000-0002-3028-437X, E-mail: sobzyl@hhu.edu.cn