Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (7): 100-110     https://doi.org/10.11925/infotech.2096-3467.2022.0696
Research Paper
Domain Ambiguous Collocation Dictionary for Real-Time Financial Sentimental Analysis*
Zhao Youlin1,2, Xu Jingnan1, Lu Yingjun3
1Business School, Hohai University, Nanjing 211100, China
2School of Information Management, Nanjing University, Nanjing 210023, China
3School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] Sentiment analysis goes wrong when it ignores the fact that the polarity of an ambiguous word changes with context. This study identifies sentiment-ambiguous words with economic characteristics and extracts their collocates, so that ambiguous words can be handled appropriately in the financial domain. [Methods] Taking dynamic financial news as the research object, we first calculated the positive and negative sentiment scores of words in phrases to identify and extract ambiguous seed words. We then mined their strongly related collocates with association rules, pointwise mutual information (PMI), and related algorithms. Next, we labeled the sentiment polarity of each collocation pair to build the ambiguous collocation dictionary. Finally, we evaluated sentiment mining on news texts that are updated in real time. [Results] With the ambiguous collocation dictionary, sentiment analysis of financial news text reached a precision of 89.62%, a recall of 87.52%, and an F1 score of 88.57%, which are 5.79, 15.89, and 10.84 percentage points higher, respectively, than without the dictionary. [Limitations] When a seed word and its collocate are far apart in the text, the pair may fall outside the search interval and go unrecognized. [Conclusions] The ambiguous collocation dictionary effectively expands the sentiment lexicon for economics, refines the domain lexicon in granularity and depth, and significantly improves the accuracy of sentiment mining on domain texts.

Key words: Financial Information; Ambiguous Collocation Dictionary; Association Rules; Emotional Lexicon
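The abstract names pointwise mutual information (PMI) as one of the measures used to find collocates that are strongly related to a seed word. The following is a minimal sketch of sentence-level PMI, assuming that co-occurrence is counted once per sentence and that probabilities are plain relative frequencies. The function name `pmi_table`, the counting scheme, and the toy corpus are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(sentences):
    """Return {(w1, w2): PMI} for word pairs that co-occur in a sentence.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), where each probability is
    the fraction of sentences containing the word or the word pair.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # count each word once per sentence
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    scores = {}
    for (x, y), c_xy in pair_counts.items():
        p_xy = c_xy / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy corpus, invented for illustration only.
corpus = [
    ["strong", "growth", "reported"],
    ["strong", "growth", "expected"],
    ["strong", "inflation", "reported"],
    ["weak", "demand", "reported"],
]
scores = pmi_table(corpus)
# Pairs are keyed in alphabetical order, e.g. ("growth", "strong").
print(scores[("growth", "strong")] > scores[("reported", "strong")])  # True
```

In this toy corpus "growth" scores higher than "reported" as a collocate of "strong", because "reported" is frequent everywhere and so carries little information about the seed word.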
Received: 2022-07-07      Published: 2023-03-21
Chinese Library Classification (ZTFLH): G350
Funding: *This work is one of the research outputs of a General Project of the National Social Science Fund of China (21BTQ055).
Corresponding author: Zhao Youlin, ORCID: 0000-0002-3028-437X, E-mail: sobzyl@hhu.edu.cn
Cite this article:
Zhao Youlin, Xu Jingnan, Lu Yingjun. Domain Ambiguous Collocation Dictionary for Real-Time Financial Sentimental Analysis. Data Analysis and Knowledge Discovery, 2023, 7(7): 100-110.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0696      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I7/100
Fig.1  Framework for constructing the ambiguous collocation dictionary
Sentence | Machine misjudgment | Positive phrases | Negative phrases
Despite the losses, Morgan Stanley managed to report record revenue and profit during the quarter. | Positive misjudged as negative | record revenue and profit | the losses
But that income is still down from the million it earned a year earlier as revenue plunged. | Negative misjudged as neutral | / | income is still down; revenue plunged
For some time now many in the salvage industry have warned that container ships are getting too big for situations like this to be resolved efficiently and economically he added. | Negative misjudged as positive | / | too big for situations to be solved
While the successful refloating of the vessel on Monday was met with relief the backlog will take days to clear according to major shipping lines. | Neutral misjudged as positive | / | /
Last week the Chicago suburb of Evanston Illinois approved the nation's first reparations program for black residents. | Neutral misjudged as negative | / | /
The fed could start removing stimulus next year Bostic who is a voting member on the fed's policy-setting committee this year expressed optimism about the economic recovery from the pandemic predicting robust job growth. | Positive misjudged as neutral | expressed optimism; robust job growth; economic recovery from the pandemic | /
Table 1  Examples of corpus data used for seed word mining
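Positive seed word | SPS score | Negative seed word | SNS score
satisfied | 0.974 | sacrifice | 0.877
strength | 0.917 | scandal | 0.968
strong | 0.598 | strong | 0.402
stable | 0.845 | shut | 0.789
succeed | 0.989 | stress | 0.942
Table 2  Sentiment words listed by SPS and SNS scores (partial)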
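In Table 2, "strong" is the only word that appears in both columns (SPS 0.598, SNS 0.402), which is the kind of word the abstract calls an ambiguous seed word. The sketch below flags such words from the listed scores. The rule used here (a word is ambiguous when both of its scores reach a cutoff) and the cutoff value `min_score=0.3` are assumptions made for illustration, since this page does not state the authors' selection criterion.

```python
# Scores copied from Table 2 (partial list).
sps = {"satisfied": 0.974, "strength": 0.917, "strong": 0.598,
       "stable": 0.845, "succeed": 0.989}
sns = {"sacrifice": 0.877, "scandal": 0.968, "strong": 0.402,
       "shut": 0.789, "stress": 0.942}

def ambiguous_seeds(sps, sns, min_score=0.3):
    """Return words whose positive AND negative scores both reach min_score.

    min_score is a hypothetical cutoff chosen for this sketch.
    """
    return sorted(
        w for w in sps.keys() & sns.keys()
        if sps[w] >= min_score and sns[w] >= min_score
    )

print(ambiguous_seeds(sps, sns))  # ['strong']
```

Raising the cutoff makes the rule stricter: with `min_score=0.5` the word "strong" is no longer flagged, because its SNS score is 0.402.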
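Scheme | Sentiment dictionary | Negative sentiment words | Positive sentiment words | Total
Scheme 1 (without the sentiment-ambiguous collocation dictionary) | McDonald Financial Dictionary | 2,341 | 354 | 2,703
Scheme 2 (with the sentiment-ambiguous collocation dictionary) | McDonald Financial Dictionary & sentiment-ambiguous collocation dictionary | 2,610 | 696 | 3,306
Table 3  Validation dictionaries and sentiment word counts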
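The second scheme in Table 3 combines the base financial lexicon with the ambiguous collocation dictionary. One plausible way to apply such a dictionary is to look for a known collocate within a fixed token window around each ambiguous seed word and use the labeled polarity of the pair. The sketch below follows that idea; the dictionary entries, the window size, and the function name `collocation_polarity` are all hypothetical and do not come from the paper. It also illustrates the limitation stated in the abstract: a collocate that lies too far from the seed word is not found.

```python
# Hypothetical collocation entries: (seed word, collocate) -> polarity.
COLLOCATIONS = {
    ("strong", "growth"): 1,
    ("strong", "inflation"): -1,
}

def collocation_polarity(tokens, collocations, window=3):
    """Sum the polarity of seed/collocate pairs found within `window` tokens."""
    seeds = {seed for seed, _ in collocations}
    total = 0
    for i, tok in enumerate(tokens):
        if tok not in seeds:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                total += collocations.get((tok, tokens[j]), 0)
    return total

near = "analysts expect strong job growth".split()
far = "strong signs that the economy will return to growth".split()
print(collocation_polarity(near, COLLOCATIONS))  # 1
print(collocation_polarity(far, COLLOCATIONS))   # 0
```

In the second sentence "growth" is eight tokens away from "strong", so a window of 3 misses the pair. A wider window recovers it at the cost of more spurious matches.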
Machine label \ Manual label | Positive review | Negative review
Positive review | TP | FP
Negative review | FN | TN
Table 4  Confusion matrix
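The cells TP, FP, FN and TN in Table 4 are the inputs to the three metrics reported in the result tables. The formulas are not printed on this page, so the sketch below assumes the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two. The counts in the example are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive class of a confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts, not taken from the paper.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

The zero checks keep the function defined when a class is never predicted or never occurs, which can happen on small evaluation sets.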
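Dictionary scheme | Positive sentences: Precision | Positive sentences: Recall | Positive sentences: F1 | Negative sentences: Precision | Negative sentences: Recall | Negative sentences: F1
Scheme 1 | 0.6637 | 0.5229 | 0.5849 | 0.7929 | 0.8240 | 0.8082
Scheme 2 | 0.7636 | 0.8277 | 0.7944 | 0.8240 | 0.9028 | 0.8616
Improvement | 0.0999 | 0.3048 | 0.2095 | 0.0311 | 0.0788 | 0.0534
Table 5  Comparison results for positive and negative sentiment sentences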
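Setting | Precision | Recall | F1
Without the ambiguous collocation dictionary | 0.8383 | 0.7163 | 0.7773
With the ambiguous collocation dictionary | 0.8962 | 0.8752 | 0.8857
Improvement | 0.0579 | 0.1589 | 0.1084
Table 6  Sentiment analysis results on the corpus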
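The improvement row in Table 6 is an absolute difference between the two settings, that is, a gain in percentage points and not a relative percentage increase. The short check below reproduces the reported gains, taking the with-dictionary figures as the ones stated in the abstract (89.62%, 87.52%, 88.57%). The function name `improvement` and the variable names are illustrative.

```python
# Values from Table 6; the with-dictionary figures match the abstract.
WITH_DICT = {"precision": 0.8962, "recall": 0.8752, "f1": 0.8857}
WITHOUT_DICT = {"precision": 0.8383, "recall": 0.7163, "f1": 0.7773}

def improvement(with_dict, without_dict):
    """Return {metric: (percentage-point gain, relative gain in %)}."""
    return {
        m: ((with_dict[m] - without_dict[m]) * 100,
            (with_dict[m] / without_dict[m] - 1) * 100)
        for m in with_dict
    }

gains = improvement(WITH_DICT, WITHOUT_DICT)
for metric, (points, relative) in gains.items():
    print(f"{metric}: +{points:.2f} percentage points ({relative:.1f}% relative)")
```

The percentage-point gains come out as 5.79, 15.89 and 10.84, matching the abstract, while the relative gains are larger because they are measured against the lower baseline.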