[Objective] This paper compares how different Chinese word segmenters affect the degree of matching between a corpus and sentiment lexicons. [Methods] We used six Chinese word segmenters to process a self-built corpus of book reviews and filtered the segmented text with four sentiment lexicons. We then calculated the coverage and the number of matches between the corpus and each sentiment lexicon, a negation word list, and a degree word list. Finally, we computed the proportion of the corpus left neutral and the proportion of low-frequency words in each lexicon. [Results] Across the sentiment lexicons, the segmenters produced different results for corpus-lexicon matching, for the proportion of low-frequency words in the lexicons, and for the proportion of the corpus left neutral. [Limitations] The corpus needs to be expanded, and sentence-level and rule-based tests need to be added. [Conclusions] The choice of word segmenter has a significant impact on the matching between the corpus and sentiment lexicons.
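The matching statistics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `matching_stats`, the four metric definitions, the low-frequency threshold, and the toy lexicon and segmenter outputs are all assumptions chosen only to show how one review, segmented two ways, can match a lexicon differently.

```python
from collections import Counter


def matching_stats(segmented_reviews, lexicon, low_freq_threshold=2):
    """Compare a segmented corpus with a sentiment lexicon.

    Every metric definition below is an illustrative assumption,
    not the definition used in the paper.
    """
    lexicon = set(lexicon)
    # Count occurrences of tokens that are lexicon entries.
    hits = Counter(
        tok for review in segmented_reviews for tok in review if tok in lexicon
    )
    total_tokens = sum(len(review) for review in segmented_reviews)
    matched_entries = len(hits)
    # A review counts as "neutral" here if none of its tokens is in the lexicon.
    neutral_reviews = sum(
        1 for review in segmented_reviews
        if not any(tok in lexicon for tok in review)
    )
    # Matched lexicon entries seen fewer than low_freq_threshold times.
    low_freq = sum(1 for count in hits.values() if count < low_freq_threshold)
    return {
        "lexicon_coverage": matched_entries / len(lexicon) if lexicon else 0.0,
        "token_match_rate": sum(hits.values()) / total_tokens if total_tokens else 0.0,
        "neutral_review_ratio": (
            neutral_reviews / len(segmented_reviews) if segmented_reviews else 0.0
        ),
        "low_freq_ratio": low_freq / matched_entries if matched_entries else 0.0,
    }


# Toy lexicon (hypothetical): "enjoyable", "not good", "wonderful".
lexicon = {"好看", "不好", "精彩"}

# Hypothetical outputs of two segmenters on the same two reviews.
seg_a = [["这", "本", "书", "很", "好看"], ["情节", "不好"]]
seg_b = [["这", "本书", "很", "好", "看"], ["情节", "不", "好"]]

stats_a = matching_stats(seg_a, lexicon)
stats_b = matching_stats(seg_b, lexicon)
```

In this toy case the first segmentation keeps "好看" and "不好" as single tokens, so two of the three lexicon entries are matched, while the second splits them into characters and leaves both reviews without any lexicon match.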
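You Zhongxi, Hua Weina, Pan Xuelian. The Impact of Chinese Word Segmenters on the Degree of Matching Between Book Reviews and Sentiment Lexicons[J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 23-33. (in Chinese)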
Zhongxi You, Weina Hua, Xuelian Pan. Matching Book Reviews and Essential Sentiment Lexicons with Chinese Word Segmenters. Data Analysis and Knowledge Discovery, 2019, 3(7): 23-33.