Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (1): 99-110    DOI: 10.11925/infotech.2096-3467.2019.0702
Identifying Implicit Features with Word Embedding
Hui Nie, Huan He
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  

[Objective] This paper extracts implicit features from online reviews in order to obtain complete product-specific information and users' evaluations. [Methods] We compared two leading approaches to implicit feature extraction: relationship-based inference and classification. We then used a word embedding model, trained on an online review corpus, to introduce semantically related words and improve each algorithm's effectiveness. Finally, we examined how dataset balance affects the algorithms. [Results] On the imbalanced dataset, the classification-based methods identified implicit features better than those based on relation inference. Word embedding significantly improved the quality of the sentence model, raising recall and F1 by 5.91 and 2.48 percentage points respectively. On the balanced dataset, the relation-inference methods performed better, with a best F1-score of 0.7503 (with word embedding). [Limitations] The corpus used to train the word embedding and the balanced dataset both need to be expanded. [Conclusions] Choosing a modeling scheme suited to the target dataset, and using a balanced dataset, yields better results. Word embedding helps optimize the classification methods.

Key words: Implicit Feature; Word Embedding; Feature Extraction; Sentiment Analysis
Received: 18 June 2019      Published: 14 March 2020
ZTFLH:  TP391.1  
Corresponding Authors: Hui Nie     E-mail: issnh@mail.sysu.edu.cn

Cite this article:

Hui Nie,Huan He. Identifying Implicit Features with Word Embedding. Data Analysis and Knowledge Discovery, 2020, 4(1): 99-110.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0702     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I1/99

Framework of the Research
Word2Vec-Based CBOW and Skip-gram Model[24]
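Term | Semantic relatedness | Term | Semantic relatedness
地铁口 (metro entrance) | 0.875 | 商业街 (shopping street) | 0.685
地铁 (metro) | 0.869 | 解放碑 (Jiefangbei) | 0.685
轻轨站 (light-rail station) | 0.739 | 春熙路 (Chunxi Road) | 0.681
公交站 (bus stop) | 0.727 | 夫子庙 (Fuzimiao) | 0.664
王府井 (Wangfujing) | 0.694 | 车站 (station) | 0.661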
Semantic-related Terms of “Metro”
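The relatedness scores in this table come from comparing word vectors. A minimal sketch of that lookup follows, assuming cosine similarity as the relatedness measure; the three-dimensional vectors, the query word, and the function names are invented for illustration and are not the trained embeddings used in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(query, vectors, topn=3):
    """Rank every other word in `vectors` by similarity to `query`."""
    scores = [(word, cosine(vectors[query], vec))
              for word, vec in vectors.items() if word != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
toy_vectors = {
    "地铁站": [0.9, 0.1, 0.0],
    "地铁口": [0.8, 0.2, 0.1],
    "公交站": [0.6, 0.5, 0.1],
    "早餐": [0.0, 0.1, 0.9],
}

neighbours = most_similar("地铁站", toy_vectors)
for word, score in neighbours:
    print(word, round(score, 3))
```

With trained vectors, the same ranking step would yield a list like the one tabulated above, from which semantically related words can be added to a sentence's representation.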
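Feature category | Feature indicators
Staff quality | 热情 (enthusiastic), 阿姨 (auntie, i.e. housekeeping staff), 亲切 (kind), 礼貌 (polite), 帮忙 (help), 客人 (guest), 行李 (luggage), 顾客 (customer), 询问 (inquire), 友好 (friendly)
Location | 方便 (convenient), 地铁站 (metro station), 地铁 (metro), 分钟 (minutes), 距离 (distance), 远 (far), 步行 (on foot), 走 (walk), 出行 (getting around), 外滩 (the Bund)
Hotel service | 入住 (check in), 退 (check out), 办理 (process), 房 (room), 时间 (time), 升级 (upgrade), 慢 (slow), 免费 (free), 长 (long), 现金 (cash), 手续 (procedures), 安排 (arrange)
Cleanliness | 烟味 (smoke smell), 打扫 (clean up), 干净 (clean), 灰 (dust), 脏 (dirty), 蟑螂 (cockroach), 清洗 (wash), 清洁 (cleaning), 臭 (smelly), 灰尘 (dust), 整洁 (tidy)
Comfort / living environment | 吵 (noisy), 安静 (quiet), 舒适 (comfortable), 睡 (sleep), 晚上 (night), 舒服 (comfortable), 休息 (rest), 静 (quiet), 睡眠 (sleep), 闹 (noisy), 半夜 (midnight)
Basic facilities / amenities | 洗澡 (bathe), 水 (water), 淋浴 (shower), 旧 (old), 床头 (bedside), 公共 (shared), 方便 (convenient), 无线 (wireless), 提供 (provide), 双人床 (double bed), 沐浴 (bath)
Price | 元 (yuan), 贵 (expensive), 便宜 (cheap), 收费 (charge), 收 (collect), 钱 (money), 税 (tax), 物超所值 (excellent value), 高 (high), 物有所值 (good value)
Food | 吃 (eat), 餐 (meal), 餐點 (meals), 餐飲 (dining), 面食 (wheat-based food), 粗粮 (coarse grains), 套餐 (set meal), 用餐 (dine), 餐 (meal), 好吃 (tasty), Brunch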
Feature Indicators in Hotel-Specific Reviews
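To illustrate how indicator words can point to an implicit feature, the sketch below assigns a sentence to the category whose indicators it contains most often. This is a deliberately simplified stand-in, not the Coo_score or Ind_index measures reported on this page (their definitions are not given here); the lexicon is a small subset of the table above, and the function name and tokenized input are hypothetical.

```python
from collections import Counter

# Subset of the indicator table, keyed by English category names (an assumption).
INDICATORS = {
    "location": {"地铁站", "地铁", "步行", "距离"},
    "price": {"贵", "便宜", "收费", "元"},
    "cleanliness": {"干净", "脏", "蟑螂", "灰尘"},
}

def infer_feature(tokens, indicators=INDICATORS):
    """Return the category with the most indicator hits, or None if no hit."""
    hits = Counter()
    for category, words in indicators.items():
        hits[category] = sum(1 for token in tokens if token in words)
    category, count = hits.most_common(1)[0]
    return category if count > 0 else None

# A pre-segmented review sentence with no explicit feature word such as 位置.
sentence = ["离", "地铁站", "步行", "五", "分钟"]
print(infer_feature(sentence))
```

A plain hit count ignores how strongly each indicator is tied to a category, which is the kind of information a co-occurrence or indicator weighting would add.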
A Hotel-Specific Review on Booking.com
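Feature category | Feature terms
Staff quality | 态度 (attitude), 前台 (front desk), 员工 (staff), 服务员 (attendant), 店员 (clerk), 服务生 (waiter), 接待员 (receptionist)
Location | 位置 (location), 交通 (transport), 地点 (place), 周边 (surroundings), 景点 (attractions), 地段 (district)
Comfort / living environment | 氛围 (atmosphere), 噪音 (noise), 舒适度 (comfort level), 隔音 (sound insulation), 安全感 (sense of security), 安全性 (safety), 舒适感 (sense of comfort)
Basic facilities / amenities | 设施 (facilities), 空调 (air conditioning), 卫生间 (toilet), 浴室 (bathroom), 用品 (supplies), 设备 (equipment), 热水 (hot water), 装饰 (decoration), 摆设 (furnishings)
Price | 价格 (price), 性价比 (value for money), 服务费 (service charge), 价位 (price level), 房价 (room rate), 价钱 (price), 费用 (cost)
Food | 早餐 (breakfast), 餐饮 (dining), 美食 (fine food), 早饭 (breakfast), 中餐 (Chinese food), 食品 (food), 餐点 (meals)
Cleanliness | 味道 (smell), 气味 (odor), 清洁度 (cleanliness), 卫生 (hygiene)
Hotel service | 效率 (efficiency), 服务 (service), 服務 (service)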
A Set of Feature-Terms About Hotel
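Rule | Description
Rule 1 | If the annotator judges that a review sentence evaluates a hotel feature F, the sentence is labeled F; if F does not belong to any of the 8 categories above, it is labeled "Other".
Rule 2 | If the sentence evaluates the hotel as a whole or expresses the user's own feelings, e.g. "大致上都还是很满意的。" ("On the whole, I was quite satisfied."), it is regarded as containing no evaluated feature and is labeled "None".
Rule 3 | A sentence that contains no feature, or that contains more than one evaluated feature, is labeled "None".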
Annotation Rules
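The three rules reduce to a small decision procedure. The sketch below is one reading of them, assuming the annotator's judgment arrives as a list of feature names; the English category names, the function name, and the example feature "parking" are illustrative, not part of the original annotation scheme.

```python
# English renderings of the 8 feature categories (an assumption of this sketch).
CATEGORIES = {
    "staff quality", "location", "hotel service", "cleanliness",
    "comfort", "basic facilities", "price", "food",
}

def annotate(judged_features):
    """Map the features an annotator sees in one sentence to a single label."""
    if len(judged_features) != 1:
        # Rules 2 and 3: no evaluated feature, or more than one.
        return "None"
    feature = judged_features[0]
    # Rule 1: keep F if it is a known category, otherwise "Other".
    return feature if feature in CATEGORIES else "Other"

print(annotate(["location"]))           # one known feature
print(annotate(["parking"]))            # one feature outside the 8 categories
print(annotate(["location", "price"]))  # several features
print(annotate([]))                     # overall impression only
```

Under this reading, every labeled sentence carries exactly one feature, which is what lets identification be treated as single-label classification.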
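Feature category | Sentences with implicit features | Share (%)
Location | 694 | 33.936
Hotel service | 416 | 20.342
Basic facilities / amenities | 347 | 16.968
Comfort / living environment | 259 | 12.665
Staff quality | 120 | 5.867
Price | 84 | 4.108
Cleanliness | 72 | 3.521
Food | 53 | 2.593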
Distribution of Implicit Feature Dataset
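The shares in this table follow directly from the sentence counts. The short check below recomputes them; the counts and reported percentages are copied from the table, while the English category keys and variable names are assumptions of this sketch.

```python
counts = {
    "location": 694, "hotel service": 416, "basic facilities": 347,
    "comfort": 259, "staff quality": 120, "price": 84,
    "cleanliness": 72, "food": 53,
}
reported = {
    "location": 33.936, "hotel service": 20.342, "basic facilities": 16.968,
    "comfort": 12.665, "staff quality": 5.867, "price": 4.108,
    "cleanliness": 3.521, "food": 2.593,
}

total = sum(counts.values())
shares = {name: 100 * n / total for name, n in counts.items()}

for name in counts:
    print(f"{name}: {shares[name]:.3f} (reported {reported[name]})")

# Ratio between the largest and smallest class, a rough measure of imbalance.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(total, round(imbalance_ratio, 1))
```

The counts sum to 2045 sentences, and the recomputed shares agree with the reported ones to within about 0.002 percentage points. The largest category has roughly 13 times as many sentences as the smallest, which is the imbalance the later experiments address.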
Results of Implicit Feature Identification(Relationship-based Inference Method)
F1-score Based on Three Feature Selection Strategies
Performance of Classifiers Based on Different Feature Selection Models
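Method | Scheme | Accuracy | P (macro) | R (macro) | F1 (macro)
Relation inference | Coo_score | 0.713 | 0.701 | 0.499 | 0.535
Relation inference | Ind_index | 0.658 | 0.597 | 0.574 | 0.564
Relation inference | Coo_score+Ind_index | 0.729 | 0.728 | 0.533 | 0.576
Classification | Term frequency | 0.724 | 0.603 | 0.571 | 0.576
Classification | χ2 statistic | 0.725 | 0.678 | 0.582 | 0.607
Classification | Information gain | 0.742 | 0.648 | 0.609 | 0.619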
Experiment Results of Relation-Inference Method and Classification
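Information gain, the best-performing feature selection scheme in this table, scores a term by how much knowing its presence reduces uncertainty about the class. The sketch below uses the textbook definition on a four-sentence toy corpus; the corpus, labels, and function names are invented for illustration, and the exact variant used in the study is not stated on this page.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """H(C) minus the expected entropy after splitting on presence of `term`."""
    present = [y for doc, y in zip(docs, labels) if term in doc]
    absent = [y for doc, y in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional

# Hypothetical segmented sentences, each represented as a set of tokens.
docs = [{"地铁", "方便"}, {"步行", "地铁"}, {"贵"}, {"便宜", "元"}]
labels = ["location", "location", "price", "price"]

for term in ("地铁", "方便"):
    print(term, round(information_gain(docs, labels, term), 3))
```

Here 地铁 separates the two classes perfectly and receives the maximum score of 1 bit, while 方便 appears in only one sentence and scores lower; ranking terms this way and keeping the top ones is the usual selection step.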
Method | P (macro) | R (macro) | F1 (macro)
Information gain | 0.648 | 0.609 | 0.619
Information gain + semantic-related terms | 0.638 | 0.668 | 0.644
Global Classifying Performance
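The effect of adding semantic-related terms can be read off as percentage-point differences between the two rows. A small sketch using the values from the table (variable names are assumptions):

```python
baseline = {"P": 0.648, "R": 0.609, "F1": 0.619}    # information gain only
with_terms = {"P": 0.638, "R": 0.668, "F1": 0.644}  # plus semantic-related terms

# Differences expressed in percentage points, rounded to one decimal.
change_in_points = {
    metric: round((with_terms[metric] - baseline[metric]) * 100, 1)
    for metric in baseline
}
print(change_in_points)
```

Recall rises by about 5.9 points and F1 by about 2.5 points while precision drops by about 1 point, which matches the 5.91 and 2.48 figures in the abstract when those are read as absolute (percentage-point) gains computed from unrounded scores.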
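Before / After: before and after introducing semantic-related terms.
Feature category | Sample share (%) | Before P | Before R | Before F1 | After P | After R | After F1
Location | 33.936 | 0.886 | 0.909 | 0.897 | 0.943 | 0.896 | 0.918
Hotel service | 20.342 | 0.820 | 0.754 | 0.783 | 0.840 | 0.699 | 0.761
Basic facilities / amenities | 16.968 | 0.572 | 0.736 | 0.642 | 0.605 | 0.705 | 0.650
Comfort / living environment | 12.665 | 0.702 | 0.595 | 0.6435 | 0.6344 | 0.6117 | 0.622
Staff quality | 5.867 | 0.655 | 0.593 | 0.621 | 0.549 | 0.6876 | 0.609
Price | 4.108 | 0.656 | 0.567 | 0.599 | 0.685 | 0.7251 | 0.700
Cleanliness | 3.521 | 0.533 | 0.435 | 0.454 | 0.474 | 0.4657 | 0.459
Food | 2.593 | 0.363 | 0.279 | 0.312 | 0.374 | 0.5511 | 0.430
Macro-average | - | 0.649 | 0.609 | 0.619 | 0.638 | 0.668 | 0.644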
Feature-specific Classifiers Performances
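The macro-average row can be reproduced as the unweighted mean of the eight per-category scores, so small categories such as Food count as much as Location. The check below copies the per-category values from the table; the function and variable names are assumptions.

```python
# (P, R, F1) per category, in table order from Location to Food.
before = [
    (0.886, 0.909, 0.897), (0.820, 0.754, 0.783), (0.572, 0.736, 0.642),
    (0.702, 0.595, 0.6435), (0.655, 0.593, 0.621), (0.656, 0.567, 0.599),
    (0.533, 0.435, 0.454), (0.363, 0.279, 0.312),
]
after = [
    (0.943, 0.896, 0.918), (0.840, 0.699, 0.761), (0.605, 0.705, 0.650),
    (0.6344, 0.6117, 0.622), (0.549, 0.6876, 0.609), (0.685, 0.7251, 0.700),
    (0.474, 0.4657, 0.459), (0.374, 0.5511, 0.430),
]

def macro_average(rows):
    """Unweighted mean of each metric column across categories."""
    n = len(rows)
    return tuple(sum(column) / n for column in zip(*rows))

macro_before = macro_average(before)
macro_after = macro_average(after)
print([round(x, 4) for x in macro_before])
print([round(x, 4) for x in macro_after])
```

The recomputed means agree with the reported row to within 0.001. Note that the macro F1 is the mean of the per-category F1 scores, not the harmonic mean of macro P and macro R, which is why it can fall below both.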
Index Changes After Semantic-related Terms Introduced into the Model
A Set of Examples of Implicit Feature Identification
Experiment Results in Different Datasets (F1)
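Dataset | Scheme | P | R | F1
Imbalanced dataset | Coo_score+Ind_index | 0.728 | 0.533 | 0.576
Imbalanced dataset | Information gain | 0.648 | 0.609 | 0.619
Imbalanced dataset | Coo_score+Ind_index + semantic-related terms | 0.730 | 0.547 | 0.590
Imbalanced dataset | Information gain + semantic-related terms | 0.638 | 0.668 | 0.643
Balanced dataset | Coo_score+Ind_index | 0.717 | 0.717 | 0.705
Balanced dataset | Information gain | 0.655 | 0.630 | 0.630
Balanced dataset | Coo_score+Ind_index + semantic-related terms | 0.754 | 0.759 | 0.750
Balanced dataset | Information gain + semantic-related terms | 0.726 | 0.718 | 0.717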
The Results of Different Models in Two Datasets
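To compare the two datasets at a glance, the sketch below picks the best scheme per dataset by F1 and computes how much each scheme gains from balancing. The F1 values are copied from the table; the scheme names are English renderings and the variable names are assumptions.

```python
f1 = {
    "imbalanced": {
        "Coo_score+Ind_index": 0.576,
        "Information gain": 0.619,
        "Coo_score+Ind_index + related terms": 0.590,
        "Information gain + related terms": 0.643,
    },
    "balanced": {
        "Coo_score+Ind_index": 0.705,
        "Information gain": 0.630,
        "Coo_score+Ind_index + related terms": 0.750,
        "Information gain + related terms": 0.717,
    },
}

# Best scheme on each dataset.
best = {dataset: max(scores, key=scores.get) for dataset, scores in f1.items()}

# F1 gain of each scheme when moving from the imbalanced to the balanced set.
gain = {
    scheme: round(f1["balanced"][scheme] - f1["imbalanced"][scheme], 3)
    for scheme in f1["imbalanced"]
}
print(best)
print(gain)
```

The classification scheme with related terms leads on the imbalanced data, while the relation-inference scheme with related terms leads on the balanced data and shows the largest gain from balancing (about 0.16 in F1).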
[1] Liu Bing. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions[M]. Translated by Liu Kang, Zhao Jun. Beijing: China Machine Press, 2017. (in Chinese)
[2] Tubishat M, Idris N, Abushariah M A M. Implicit Aspect Extraction in Sentiment Analysis: Review, Taxonomy, Opportunities, and Open Challenges[J]. Information Processing & Management, 2018, 54(4): 545-563.
[3] Qiu G, Liu B, Bu J, et al. Expanding Domain Sentiment Lexicon Through Double Propagation[C]// Proceedings of the 21st International Joint Conference on Artificial Intelligence. 2009: 1199-1204.
[4] Song H, Fan Y, Liu X, et al. Extracting Product Features from Online Reviews for Sentimental Analysis[C]// Proceedings of the 6th International Conference on Computer Sciences and Convergence Information Technology. 2011: 745-750.
[5] Zhu J, Wang H, Zhu M, et al. Aspect-Based Opinion Polling from Customer Reviews[J]. IEEE Transactions on Affective Computing, 2011, 2(1): 37-49.
[6] Wang Wei, Wang Hongwei, Sheng Xiaobao. Extracting Product Features and Opinions from Chinese Online Reviews: A Comparative Study on Multi-domains[J]. Journal of Industrial Engineering and Engineering Management, 2017, 31(4): 52-62. (in Chinese)
[7] Tang Xiaobo, Liu Guangchao. Research Review on Fine-grained Sentiment Analysis[J]. Library and Information Service, 2017, 61(5): 132-140. (in Chinese)
[8] Zhang Y, Zhu W. Extracting Implicit Features in Online Customer Reviews for Opinion Mining[C]// Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013: 103-104.
[9] Sun L, Li S, Li J Y, et al. A Novel Context-based Implicit Feature Extracting Method[C]// Proceedings of the 2014 International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2014: 420-424.
[10] Schouten K, Frasincar F. Finding Implicit Features in Consumer Reviews for Sentiment Analysis[C]// Proceedings of the 14th International Conference on Web Engineering. Springer, Cham, 2014: 130-144.
[11] Hai Z, Chang K, Kim J J. Implicit Feature Identification via Co-occurrence Association Rule Mining[C]// Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing. 2011: 393-404.
[12] Wang W, Xu H, Wan W. Implicit Feature Identification via Hybrid Association Rule Mining[J]. Expert Systems with Applications, 2013, 40(9): 3518-3531.
[13] Zhang Li, Xu Xin. Implicit Feature Identification in Product Reviews[J]. New Technology of Library and Information Service, 2015(12): 42-47. (in Chinese)
[14] Hai Z, Chang K, Cong G, et al. An Association-Based Unified Framework for Mining Features and Opinion Words[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): Article No. 26.
[15] Xu H, Zhang F, Wang W. Implicit Feature Identification in Chinese Reviews Using Explicit Topic Mining Model[J]. Knowledge-Based Systems, 2015, 76: 166-175.
[16] Hajar E H, Mohammed B. Hybrid Approach to Extract Adjectives for Implicit Aspect Identification in Opinion Mining[C]// Proceedings of the 11th International Conference on Intelligent Systems: Theories and Applications (SITA). IEEE, 2016: 1-5.
[17] Qiu Yunfei, Ni Xuefeng, Shao Liangshan. Research on Extracting Method of Commodities Implicit Opinion Targets[J]. Computer Engineering and Applications, 2015, 51(19): 114-118. (in Chinese)
[18] Yan Z, Xing M, Zhang D, et al. EXPRS: An Extended PageRank Method for Product Feature Extraction from Online Consumer Reviews[J]. Information & Management, 2015, 52(7): 850-858.
[19] Qiu Guang, Zheng Miao, Zhang Hui, et al. Implicit Product Feature Extraction Through Regularized Topic Modeling[J]. Journal of Zhejiang University: Engineering Science, 2011, 45(2): 288-294. (in Chinese)
[20] Zhou Qingqing, Zhang Chengzhi. Fine-grained Aspect Extraction from Online Customer Reviews[J]. Journal of the China Society for Scientific and Technical Information, 2017, 36(5): 484-493. (in Chinese)
[21] Li Liangqiang, Yuan Hua, Ye Kai, et al. Extraction Product Features from Online Reviews Based on Word-Vector-Representation[J]. Journal of Systems Engineering, 2018, 33(5): 687-697. (in Chinese)
[22] Lin Jianghao, Zhou Yongmei, Yang Aimin, et al. Building of Domain Sentiment Lexicon Based on Word2Vec[J]. Journal of Shandong University: Engineering Science, 2018, 48(3): 40-47. (in Chinese)
[23] Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3(6): 1137-1155.
[24] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv:1301.3781.
[25] Ding Shengchun, Meng Meiren, Li Xiao. Study of Subjective Sentence Identification Oriented to Chinese Microblog[J]. Journal of the China Society for Scientific and Technical Information, 2014, 33(2): 175-182. (in Chinese)
[26] Che W, Li Z, Liu T. LTP: A Chinese Language Technology Platform[C]// Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations. Association for Computational Linguistics, 2010: 13-16.