Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (5): 54-65    DOI: 10.11925/infotech.2096-3467.2019.1006
Extracting Product Properties with Dependency Relationship Embedding and Conditional Random Field
Li Chengliang, Zhao Zhongying, Li Chao, Qi Liang, Wen Yan
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract  

[Objective] This paper designs multiple word representation methods to capture latent semantic features and extract product properties from reviews. [Methods] First, we used word properties, dependency relationships and embedding techniques to construct three types of word representations, covering basic, structural and category semantic information. Then, we applied a conditional random field (CRF) model to extract product properties using this semantic information. [Results] The precision of the proposed DepREm-CRF was 3.97 percentage points higher than that of the baseline CRF. Its F1 value was up to 7.65 percentage points higher than those of popular competing models. [Limitations] More research is needed to investigate the relationship between online sentiments and properties. [Conclusions] The proposed method effectively extracts properties from product reviews and lays a good foundation for fine-grained sentiment analysis research.

Key words: Aspect Extraction; Dependency Relationship; Conditional Random Field; Comments Analysis; Relationship Embedding
Received: 05 September 2019      Published: 15 June 2020
ZTFLH:  TP393 G35  
Corresponding Authors: Zhao Zhongying     E-mail: zzysuin@163.com

Cite this article:

Li Chengliang, Zhao Zhongying, Li Chao, Qi Liang, Wen Yan. Extracting Product Properties with Dependency Relationship Embedding and Conditional Random Field. Data Analysis and Knowledge Discovery, 2020, 4(5): 54-65.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.1006     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I5/54

The Framework of DepREm-CRF
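Symbol          Meaning
D               The dataset
S_s             The s-th review in D
w_sn            The n-th word in the s-th review
w_sn^pos        The part of speech of word w_sn
w_sn^lemm       The lemma of word w_sn
(w_sm, w_sn)    A dependency relation exists between words w_sm and w_sn
w_sn_w          The dependency relation weight of word w_sn
G_s             The dependency graph of the s-th review, built from its dependency relations
SubSent_si      The i-th dependency sub-sentence of the s-th review, derived from G_s
e_wsn           The dependency-relation word vector of word w_sn
C_wsn^1         The cluster category of the dependency-relation word vector of word w_sn
b               The dependency category vector
C_wsn^2         The polysemy cluster category of word w_sn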
The Notations and Descriptions
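Semantic information type          Content
Basic semantic information         Part-of-speech tags, lemmas, dependency relation weights
Structural semantic information    Clusters of dependency-relation word vectors
Category semantic information      Word semantic categories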
The Semantic Representation of Words
Type        Text
Raw data    I love the operating system and the preloaded software
POS tags    PRP VBP DT VBG NN CC DT JJ NN
An Example of Part-of-Speech Tagging
Type        Text
Raw data    I love the operating system and the preloaded software
Lemmas      I love the operate system and the load software
An Example of Lemmatization
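The part-of-speech and lemma examples above can be read as per-token inputs to a CRF. The following is a minimal sketch, not the authors' implementation: the function name `token_features`, the feature keys, and the one-token context window are all assumptions made for illustration. The dependency relation weight and the two cluster categories listed in the notation table would be added as further keys in the same way.

```python
from typing import Dict, List


def token_features(tokens: List[str], pos_tags: List[str],
                   lemmas: List[str], i: int) -> Dict[str, str]:
    """Build a feature dictionary for token i (hypothetical feature names)."""
    feats = {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "lemma": lemmas[i],
    }
    # Assumed context window of one token on each side.
    if i > 0:
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


# Example sentence, tags, and lemmas taken from the tables above.
tokens = "I love the operating system and the preloaded software".split()
pos_tags = "PRP VBP DT VBG NN CC DT JJ NN".split()
lemmas = "I love the operate system and the load software".split()

features = [token_features(tokens, pos_tags, lemmas, i)
            for i in range(len(tokens))]
print(features[3])
```

Dictionaries of this shape are a common input format for CRF toolkits, which is why the sketch uses them; the source does not say which toolkit was used.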
An Example of BIO
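As a rough illustration of BIO labeling for aspect extraction, the sketch below tags the example sentence used in the tables above. It assumes plain `B`, `I`, `O` labels and assumes that "operating system" and "preloaded software" are the annotated aspect terms; neither assumption is confirmed by this page, and `bio_tags` is a hypothetical helper.

```python
from typing import List


def bio_tags(tokens: List[str], aspects: List[List[str]]) -> List[str]:
    """Label each token B (begins an aspect term), I (inside one), or O (outside)."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for aspect in aspects:
        n = len(aspect)
        if n == 0:
            continue  # nothing to match
        target = [a.lower() for a in aspect]
        for start in range(len(tokens) - n + 1):
            window = lowered[start:start + n]
            # Only tag spans that are still entirely unlabeled.
            if window == target and all(t == "O" for t in tags[start:start + n]):
                tags[start] = "B"
                for k in range(start + 1, start + n):
                    tags[k] = "I"
    return tags


tokens = "I love the operating system and the preloaded software".split()
# Assumed gold aspect terms for this sentence (illustrative only).
aspects = [["operating", "system"], ["preloaded", "software"]]
tags = bio_tags(tokens, aspects)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```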
Dataset    Training set (reviews)    Test set (reviews)    Aspect terms
L-14       3,045                     800                   3,012
R-15       1,315                     685                   2,499
R-16       2,000                     676                   3,367
Yelp       800,000                   200,000               5,867,511
Training Sets and Testing Sets for DepREm-CRF
Model (L-14 dataset)    P (%)    R (%)    F1 (%)
CRF                     83.89    69.42    75.97
CRF + basic             87.02    76.73    81.55
CRF + structural        87.48    76.48    81.61
CRF + category          86.66    76.19    81.09
DepREm-CRF              87.86    78.31    82.81
The Results of DepREm-CRF with Different Semantic Information
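The F1 column above is consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported P and R values; the helper name `f1_score` and the dictionary layout are illustrative choices, not part of the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) on the L-14 dataset, copied from the table above.
results = {
    "CRF": (83.89, 69.42, 75.97),
    "CRF + basic": (87.02, 76.73, 81.55),
    "CRF + structural": (87.48, 76.48, 81.61),
    "CRF + category": (86.66, 76.19, 81.09),
    "DepREm-CRF": (87.86, 78.31, 82.81),
}

recomputed = {name: round(f1_score(p, r), 2)
              for name, (p, r, _) in results.items()}

for name, (p, r, reported) in results.items():
    print(f"{name:18s} reported={reported:.2f} recomputed={recomputed[name]:.2f}")

# Precision gain of DepREm-CRF over the plain CRF baseline.
precision_gain = round(results["DepREm-CRF"][0] - results["CRF"][0], 2)
print("precision gain:", precision_gain)
```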
Aspect frequency    Model               Aspect term set
High-frequency      CRF + basic         price/features/performance/OS/screen/operating system/USB ports/hard drive/speed
High-frequency      CRF + structural    price/features/performance/OS/screen/operating system/USB ports/hard drive/speed/battery life/works
High-frequency      CRF + category      price/features/performance/OS/screen/operating system/USB ports/hard drive
High-frequency      DepREm-CRF          price/features/performance/OS/screen/operating system/USB ports/hard drive/speed/battery life/works/retina display
Low-frequency       CRF + basic         battery/Keyboard/itune/screen display/configure/components
Low-frequency       CRF + structural    battery/Keyboard/itune/screen display/configure/components/Microsoft Windows/Microsoft Office
Low-frequency       CRF + category      battery/Keyboard/itune/screen display
Low-frequency       DepREm-CRF          battery/Keyboard/itune/screen display/configure/components/Microsoft Windows/Microsoft Office/aluminum casing/Screen resolution
An Example of Term Extraction with Different Semantic Information
Model               L-14     R-15     R-16     Yelp
BiLSTM+CRF          80.57    70.83    74.49    80.45
Unsupervised-CRF    75.16    69.73    -        -
DE-CNN              81.59    -        74.37    -
MFE-CRF             76.53    70.31    73.81    79.38
DepREm-CRF          82.81    71.96    74.67    84.29
Comparison of DepREm-CRF with Other Competitive Models (F1:%)
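To make the size of the gains in the comparison table concrete, the sketch below computes the F1 margin of DepREm-CRF over each baseline wherever a score is reported. The variable names are illustrative, and missing entries ("-" in the table) are simply omitted from the dictionaries.

```python
# F1 (%) values copied from the comparison table; "-" entries are left out.
baselines = {
    "BiLSTM+CRF": {"L-14": 80.57, "R-15": 70.83, "R-16": 74.49, "Yelp": 80.45},
    "Unsupervised-CRF": {"L-14": 75.16, "R-15": 69.73},
    "DE-CNN": {"L-14": 81.59, "R-16": 74.37},
    "MFE-CRF": {"L-14": 76.53, "R-15": 70.31, "R-16": 73.81, "Yelp": 79.38},
}
proposed = {"L-14": 82.81, "R-15": 71.96, "R-16": 74.67, "Yelp": 84.29}

# Margin over the strongest reported baseline on each dataset.
margin_over_best = {}
for dataset, score in proposed.items():
    best = max(s[dataset] for s in baselines.values() if dataset in s)
    margin_over_best[dataset] = round(score - best, 2)

# Largest margin over any single baseline on any dataset.
largest = max(
    (round(proposed[d] - s[d], 2), model, d)
    for model, s in baselines.items()
    for d in s
)

print(margin_over_best)
print(largest)
```

The largest single margin, over Unsupervised-CRF on L-14, matches the "up to 7.65" F1 improvement stated in the abstract.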
The Accuracy of Term Extraction with Different Typed and Scaled Auxiliary Corpora