Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (5): 54-65    DOI: 10.11925/infotech.2096-3467.2019.1006
Extracting Product Properties with Dependency Relationship Embedding and Conditional Random Field
Li Chengliang, Zhao Zhongying, Li Chao, Qi Liang, Wen Yan
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
[Objective] This paper designs multiple word representation methods to capture latent semantic features and extract product properties from reviews. [Methods] First, we used word properties, dependency relationships and embedding techniques to construct three types of word representations, which carry basic, structural and category semantic information. Then, we applied a conditional random field (CRF) model to extract product properties using this semantic information. [Results] On the L-14 dataset, the precision of the proposed DepREm-CRF was 3.97 percentage points higher than that of the baseline CRF. Its F1 value was up to 7.65 percentage points higher than those of popular competing models. [Limitations] More research is needed to investigate the relationship between online sentiments and properties. [Conclusions] The proposed method effectively extracts properties from product reviews and lays a good foundation for fine-grained sentiment analysis research.

Key words: Aspect Extraction; Dependency Relationship; Conditional Random Field; Comments Analysis; Relationship Embedding
Received: 05 September 2019      Published: 15 June 2020
CLC Number: TP393 G35
Corresponding Authors: Zhao Zhongying

Li Chengliang, Zhao Zhongying, Li Chao, Qi Liang, Wen Yan. Extracting Product Properties with Dependency Relationship Embedding and Conditional Random Field. Data Analysis and Knowledge Discovery, 2020, 4(5): 54-65.

The Framework of DepREm-CRF
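Symbol        Meaning
D             The dataset
Ss            The s-th review in D
wsn           The n-th word in the s-th review
wsnpos        The part of speech of word wsn
wsnlemm       The lemma of word wsn
(wsm,wsn)     A dependency relationship exists between word wsm and word wsn
wsn_w         The dependency relationship weight of word wsn
Gs            The dependency graph of the s-th review, built from its dependency relationships
SubSentsi     The i-th dependency clause of the s-th review, derived from Gs
ewsn          The dependency relationship word vector of word wsn
Cwsn1         The cluster category of the dependency relationship word vector of word wsn
b             The dependency category vector
Cwsn2         The polysemy cluster category of word wsn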
The Notations and Descriptions
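The notation for dependency pairs (wsm,wsn) and the dependency graph Gs can be illustrated with a minimal sketch. The dependency pairs below are hypothetical, and the page does not define how the weight wsn_w is computed, so node degree is used purely as a labeled stand-in and should not be read as the authors' formula.

```python
from collections import defaultdict


def build_dependency_graph(pairs):
    """Build an undirected adjacency map (a stand-in for Gs) from word pairs."""
    graph = defaultdict(set)
    for w_m, w_n in pairs:
        graph[w_m].add(w_n)
        graph[w_n].add(w_m)
    return dict(graph)


def degree_weights(graph):
    """ASSUMPTION: node degree stands in for the dependency weight wsn_w."""
    return {word: len(neighbours) for word, neighbours in graph.items()}


# Hypothetical dependency pairs for the fragment "I love the operating system".
pairs = [
    ("love", "I"),
    ("love", "system"),
    ("system", "the"),
    ("system", "operating"),
]
graph = build_dependency_graph(pairs)
weights = degree_weights(graph)
print(weights)
```

Words are keyed by their surface form here for brevity; a sentence that repeats a word would need (position, word) keys to keep the two occurrences apart.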
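Semantic information category      Content
Basic semantic information         Part-of-speech tagging, lemmatization, dependency relationship weight
Structural semantic information    Dependency relationship word vectors and their clustering
Category semantic information      Word semantic category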
The Semantic Representation of Words
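To make the three kinds of semantic information (basic, structural, category) concrete, the sketch below assembles them into one feature dictionary per token, the input shape that many CRF toolkits accept. The feature names, the context-window choice and the token values are all hypothetical and are not taken from the paper.

```python
def token_features(tokens, i):
    """Combine basic, structural and category information for token i."""
    tok = tokens[i]
    feats = {
        "word.lower": tok["word"].lower(),
        # Basic semantic information (wsnpos, wsnlemm, wsn_w).
        "pos": tok["pos"],
        "lemma": tok["lemma"],
        "dep_weight": tok["dep_weight"],
        # Structural semantic information (Cwsn1).
        "dep_cluster": tok["dep_cluster"],
        # Category semantic information (Cwsn2).
        "sense_cluster": tok["sense_cluster"],
    }
    # ASSUMPTION: a one-token context window on each side.
    if i > 0:
        feats["prev.pos"] = tokens[i - 1]["pos"]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.pos"] = tokens[i + 1]["pos"]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens):
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical annotations for the fragment "the operating system".
tokens = [
    {"word": "the", "pos": "DT", "lemma": "the",
     "dep_weight": 1, "dep_cluster": 4, "sense_cluster": 0},
    {"word": "operating", "pos": "VBG", "lemma": "operate",
     "dep_weight": 1, "dep_cluster": 7, "sense_cluster": 2},
    {"word": "system", "pos": "NN", "lemma": "system",
     "dep_weight": 3, "dep_cluster": 7, "sense_cluster": 2},
]
features = sentence_features(tokens)
print(features[1])
```

Dropping one group of keys from the dictionary gives a reduced feature set, which is one way to set up a comparison between variants that use only part of the semantic information.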
Type        Text
Raw data    I love the operating system and the preloaded software
An Example of Part-of-Speech Tagging
Type             Text
Raw data         I love the operating system and the preloaded software
Lemmatization    I love the operate system and the load software
An Example of Lemmatization
An Example of BIO
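As an illustration of BIO labeling, the sketch below tags the example sentence used above, marking the first word of each property term as B, the following words of the term as I, and everything else as O. The choice of "operating system" and "preloaded software" as the property terms is an assumption made for illustration; the page does not show the labeled version of the sentence.

```python
def bio_tags(tokens, property_terms):
    """Assign B/I/O tags by matching each property term against the tokens."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for term in property_terms:
        words = term.lower().split()
        n = len(words)
        for start in range(len(tokens) - n + 1):
            span_is_free = all(tag == "O" for tag in tags[start:start + n])
            if lowered[start:start + n] == words and span_is_free:
                tags[start] = "B"
                for k in range(start + 1, start + n):
                    tags[k] = "I"
    return tags


tokens = "I love the operating system and the preloaded software".split()
# ASSUMPTION: these two phrases are the property terms of the sentence.
tags = bio_tags(tokens, ["operating system", "preloaded software"])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```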
Dataset    Training set size (reviews)    Test set size (reviews)    Number of properties
L-14       3,045                          800                        3,012
R-15       1,315                          685                        2,499
R-16       2,000                          676                        3,367
Yelp       800,000                        200,000                    5,867,511
Training Sets and Testing Sets for DepREm-CRF
Model             L-14 dataset
                  P(%)     R(%)     F1(%)
CRF               83.89    69.42    75.97
CRF+Basic         87.02    76.73    81.55
CRF+Structural    87.48    76.48    81.61
CRF+Category      86.66    76.19    81.09
DepREm-CRF        87.86    78.31    82.81
The Results of DepREm-CRF with Different Semantic Information
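The F1 column in the table above is consistent with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported P and R values; the English row labels are this sketch's own naming.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the L-14 results table.
reported = {
    "CRF": (83.89, 69.42),
    "CRF+Basic": (87.02, 76.73),
    "CRF+Structural": (87.48, 76.48),
    "CRF+Category": (86.66, 76.19),
    "DepREm-CRF": (87.86, 78.31),
}
for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
```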
Property category    Model             Property term set
High-frequency       CRF+Basic         price/features/performance/OS/screen/operating system/USB ports/hard drive/speed
High-frequency       CRF+Structural    price/features/performance/OS/screen/operating system/USB ports/hard drive/speed/battery life/works
High-frequency       CRF+Category      price/features/performance/OS/screen/operating system/USB ports/hard drive
High-frequency       DepREm-CRF        price/features/performance/OS/screen/operating system/USB ports/hard drive/speed/battery life/works/retina display
Low-frequency        CRF+Basic         battery/Keyboard/itune/screen display/configure/components
Low-frequency        CRF+Structural    battery/Keyboard/itune/screen display/configure/components/Microsoft Windows/Microsoft Office
Low-frequency        CRF+Category      battery/Keyboard/itune/screen display
Low-frequency        DepREm-CRF        battery/Keyboard/itune/screen display/configure/components/Microsoft Windows/Microsoft Office/aluminum casing/Screen resolution
An Example of Term Extraction with Different Semantic Information
Model L-14 R-15 R-16 Yelp
BiLSTM+CRF 80.57 70.83 74.49 80.45
Unsupervised-CRF 75.16 69.73 - -
DE-CNN 81.59 - 74.37 -
MFE-CRF 76.53 70.31 73.81 79.38
DepREm-CRF 82.81 71.96 74.67 84.29
Comparison of DepREm-CRF with Other Competitive Models (F1:%)
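The margins implied by the comparison table can be read off with a few lines of arithmetic. The sketch below computes, for each dataset, the F1 gap between DepREm-CRF and both the strongest and the weakest competitor that reports a score; entries shown as "-" in the table are simply left out. The largest gap it finds, 7.65 points on L-14, matches the figure quoted in the abstract.

```python
# F1 scores (%) copied from the comparison table; "-" entries are omitted.
baselines = {
    "BiLSTM+CRF": {"L-14": 80.57, "R-15": 70.83, "R-16": 74.49, "Yelp": 80.45},
    "Unsupervised-CRF": {"L-14": 75.16, "R-15": 69.73},
    "DE-CNN": {"L-14": 81.59, "R-16": 74.37},
    "MFE-CRF": {"L-14": 76.53, "R-15": 70.31, "R-16": 73.81, "Yelp": 79.38},
}
deprem_crf = {"L-14": 82.81, "R-15": 71.96, "R-16": 74.67, "Yelp": 84.29}


def f1_margins(ours, others):
    """Per dataset: (gap to best competitor, gap to weakest competitor)."""
    result = {}
    for dataset, score in ours.items():
        reported = [s[dataset] for s in others.values() if dataset in s]
        result[dataset] = (
            round(score - max(reported), 2),
            round(score - min(reported), 2),
        )
    return result


margins = f1_margins(deprem_crf, baselines)
for dataset, (vs_best, vs_weakest) in margins.items():
    print(f"{dataset}: +{vs_best} over best, +{vs_weakest} over weakest")
```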
The Accuracy of Term Extraction with Auxiliary Corpora of Different Types and Sizes
