Domain Ambiguous Collocation Dictionary for Real-Time Financial Sentimental Analysis
Zhao Youlin1,2, Xu Jingnan1, Lu Yingjun3
1 Business School, Hohai University, Nanjing 211100, China; 2 School of Information Management, Nanjing University, Nanjing 210023, China; 3 School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] This study addresses inaccurate sentiment analysis caused by ignoring the context-dependent polarity of ambiguous words. It aims to identify sentiment-ambiguous words with economic characteristics, together with their collocations. [Methods] The study takes dynamic financial news as its research object. First, we calculated the positive and negative sentiment scores of words in phrases to extract ambiguous seed words. Second, we retrieved their strongly related collocations with algorithms such as association rules and pointwise mutual information (PMI). Third, we labeled the sentiment polarity of the collocation pairs to build an ambiguous collocation lexicon. Finally, we measured sentiment mining performance on news texts updated in real time. [Results] The precision, recall, and F1 score of sentiment analysis on the financial news text were 89.62%, 87.52%, and 88.57%, respectively, which are 5.79, 15.89, and 10.84 percentage points higher than those of the traditional models. [Limitations] Some collocates cannot be identified because they lie too far from the seed words in the sentence. [Conclusions] The ambiguous collocation dictionary constructed in this paper expands the sentiment lexicon for economics. It refines the lexicon in granularity and depth and improves the accuracy of sentiment analysis.
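The collocation retrieval step can be illustrated with a minimal sketch of PMI scoring, assuming sentence-level co-occurrence counts. The function name `pmi_scores`, the toy corpus, and the choice of sentence-level probabilities are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(sentences, seed):
    """Return {word: PMI(seed, word)} using sentence-level co-occurrence.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), where each probability is
    estimated as the fraction of sentences containing the word(s).
    """
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for tokens in sentences:
        unique = set(tokens)
        word_count.update(unique)
        if seed in unique:
            for w in unique - {seed}:
                pair_count[w] += 1
    scores = {}
    for w, joint in pair_count.items():
        p_xy = joint / n
        p_x = word_count[seed] / n
        p_y = word_count[w] / n
        scores[w] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy, hypothetical corpus (already tokenized and lower-cased).
corpus = [
    ["strong", "growth", "reported"],
    ["strong", "growth", "expected"],
    ["strong", "decline", "reported"],
    ["weak", "demand", "reported"],
]
scores = pmi_scores(corpus, "strong")
```

In this toy corpus, "growth" receives a positive PMI with the seed "strong", while "reported", which appears almost everywhere, receives a negative one. A threshold on the score would then select candidate collocates for manual polarity labeling.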
| Sentence | Misjudgment type | Positive expressions | Negative expressions |
| --- | --- | --- | --- |
| Despite the losses, Morgan Stanley managed to report record revenue and profit during the quarter. | Positive misjudged as negative | record revenue and profit | the losses |
| But that income is still down from the million it earned a year earlier as revenue plunged. | Negative misjudged as neutral | / | income is still down; revenue plunged |
| For some time now many in the salvage industry have warned that container ships are getting too big for situations like this to be resolved efficiently and economically he added. | Negative misjudged as positive | / | too big for situations to be solved |
| While the successful refloating of the vessel on Monday was met with relief the backlog will take days to clear according to major shipping lines. | Neutral misjudged as positive | / | / |
| Last week the Chicago suburb of Evanston Illinois approved the nation's first reparations program for black residents. | Neutral misjudged as negative | / | / |
| The fed could start removing stimulus next year Bostic who is a voting member on the fed's policy-setting committee this year expressed optimism about the economic recovery from the pandemic predicting robust job growth. | Positive misjudged as neutral | expressed optimism; robust job growth; economic recovery from the pandemic | / |

Table 1 Examples of corpus data used for seed word mining
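| Positive seed word | SPS score | Negative seed word | SNS score |
| --- | --- | --- | --- |
| satisfied | 0.974 | sacrifice | 0.877 |
| strength | 0.917 | scandal | 0.968 |
| strong | 0.598 | strong | 0.402 |
| stable | 0.845 | shut | 0.789 |
| succeed | 0.989 | stress | 0.942 |

Table 2 Sentiment word list based on SPS and SNS scores (excerpt)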
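The word "strong" appears in both columns of Table 2 with scores that sum to one, which is the kind of word the method treats as an ambiguous seed. The sketch below assumes that SPS and SNS are the shares of a word's positive and negative occurrences; this section does not give the exact formula, so the definition, the counts, and the `floor` threshold are all hypothetical.

```python
def sentiment_scores(pos_count, neg_count):
    """Return (SPS, SNS) as the shares of positive and negative occurrences.

    Assumed definition: SPS = pos / (pos + neg) and SNS = 1 - SPS.
    """
    total = pos_count + neg_count
    if total == 0:
        raise ValueError("word never occurs in a labeled sentence")
    sps = pos_count / total
    return sps, 1.0 - sps


def is_ambiguous(sps, sns, floor=0.3):
    """Flag a word as an ambiguous seed when neither polarity dominates.

    `floor` is a hypothetical threshold chosen for illustration only.
    """
    return min(sps, sns) >= floor


# Hypothetical counts chosen so that "strong" reproduces its Table 2 scores.
sps, sns = sentiment_scores(pos_count=598, neg_count=402)
strong_is_seed = is_ambiguous(sps, sns)
```

Under these assumptions "strong" is flagged as ambiguous, whereas a word such as "satisfied" (SPS 0.974) is not, because its negative share falls well below the threshold.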
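| Scheme | Sentiment dictionary | Negative sentiment words | Positive sentiment words | Total |
| --- | --- | --- | --- | --- |
| Scheme 1 (without the ambiguous collocation dictionary) | McDonald Financial Dictionary | 2,341 | 354 | 2,703 |
| Scheme 2 (with the ambiguous collocation dictionary) | McDonald Financial Dictionary & ambiguous collocation dictionary | 2,610 | 696 | 3,306 |

Table 3 Validation dictionaries and sentiment word statistics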
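The difference between the two schemes can be sketched as a lexicon scorer in which collocation pairs decide the polarity of an ambiguous word. The miniature lexicons, the look-ahead `window`, and the function names below are hypothetical and only illustrate the idea; they are not the paper's implementation.

```python
# Hypothetical miniature lexicons; real entries would come from the base
# financial dictionary and the constructed collocation dictionary.
BASE_LEXICON = {"profit": 1, "losses": -1, "plunged": -1}
COLLOCATIONS = {("strong", "growth"): 1, ("strong", "decline"): -1}


def score_sentence(tokens, window=3):
    """Sum word polarities, letting collocation pairs decide ambiguous words."""
    score = 0
    for i, tok in enumerate(tokens):
        score += BASE_LEXICON.get(tok, 0)
        # Look ahead up to `window` tokens for a known collocate of `tok`.
        for other in tokens[i + 1 : i + 1 + window]:
            score += COLLOCATIONS.get((tok, other), 0)
    return score


def label(score):
    """Map a numeric score to a sentiment label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


near = label(score_sentence("strong decline in demand".split()))
far = label(score_sentence("strong and by all accounts unexpected decline".split()))
```

With these toy entries `near` is labeled negative, while `far` stays neutral because "decline" falls outside the look-ahead window. This mirrors the limitation noted in the abstract that collocates far from the seed word can be missed.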
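| Machine annotation | Manual annotation: positive comment | Manual annotation: negative comment |
| --- | --- | --- |
| Positive comment | TP | FP |
| Negative comment | FN | TN |

Table 4 Confusion matrix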
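The four cells of the confusion matrix feed the three metrics reported in the result tables. Assuming the standard definitions, with which the reported F1 values are consistent, the metrics can be written as:

```latex
\begin{align}
P   &= \frac{TP}{TP + FP}, \\
R   &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2PR}{P + R}.
\end{align}
```

Here $P$ is precision, $R$ is recall, and $F_1$ is their harmonic mean.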
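| Dictionary scheme | Precision (positive) | Recall (positive) | F1 (positive) | Precision (negative) | Recall (negative) | F1 (negative) |
| --- | --- | --- | --- | --- | --- | --- |
| Scheme 1 | 0.6637 | 0.5229 | 0.5849 | 0.7929 | 0.8240 | 0.8082 |
| Scheme 2 | 0.7636 | 0.8277 | 0.7944 | 0.8240 | 0.9028 | 0.8616 |
| Improvement | 0.0999 | 0.3048 | 0.2095 | 0.0311 | 0.0788 | 0.0534 |

Table 5 Comparative results for positive and negative sentiment sentences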
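As a quick consistency check, the F1 values and the improvement row for positive sentences can be recomputed from the reported precision and recall. The sketch assumes F1 is the harmonic mean of the two; the variable names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for positive sentences, taken from Table 5.
scheme_1 = (0.6637, 0.5229, 0.5849)
scheme_2 = (0.7636, 0.8277, 0.7944)

for p, r, reported in (scheme_1, scheme_2):
    # Reported values are rounded to four decimals, so allow a small tolerance.
    assert abs(f1(p, r) - reported) < 5e-4

# Column-wise gain of Scheme 2 over Scheme 1 (precision, recall, F1).
gains = [round(b - a, 4) for a, b in zip(scheme_1, scheme_2)]
```

The recomputed gains match the improvement row of the table, and the largest gain is in recall on positive sentences.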
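| Setting | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Without the ambiguous collocation dictionary | 0.8383 | 0.7163 | 0.7773 |
| With the ambiguous collocation dictionary | 0.8962 | 0.8752 | 0.8857 |
| Improvement | 0.0579 | 0.1589 | 0.1084 |

Table 6 Sentiment analysis results on the corpus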