Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (12): 30-40    DOI: 10.11925/infotech.2096-3467.2019.0494
Modeling Users with Word Vector and Term-Graph Algorithm
Hui Nie
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
[Objective] This paper proposes a review-based user modeling method that aims to improve personalized information push services. [Methods] First, we identified product feature terms in reviews with the help of a pre-trained word embedding model. Then, we built a term graph based on the semantic correlation among these feature terms. Finally, we used the TextRank algorithm to compute each user's interest in product features and to model their product preferences. [Results] The user models generated by the proposed algorithm were consistent with the manually created ones (nearly 90% semantic correlation). The proposed model achieved an F1-score of 0.55, higher than those of the classic TF-based bag-of-words models. [Limitations] More manually labeled data and further research are needed to improve the domain-specific analysis. [Conclusions] The proposed model helps analyze online reviews more effectively and supports new applications for recommender systems.
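
The three steps in [Methods] can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy 3-dimensional vectors stand in for pre-trained word embeddings, and the edge threshold of 0.5, the use of cosine similarity as edge weight, and the damping factor of 0.85 are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_term_graph(vectors, threshold=0.5):
    """Link two terms when their similarity reaches the (assumed) threshold."""
    terms = sorted(vectors)
    graph = {term: {} for term in terms}
    for i, s in enumerate(terms):
        for t in terms[i + 1:]:
            sim = cosine(vectors[s], vectors[t])
            if sim >= threshold:
                graph[s][t] = sim  # undirected edge weighted by similarity
                graph[t][s] = sim
    return graph


def textrank(graph, damping=0.85, max_iter=100, tol=1e-8):
    """Weighted TextRank by power iteration over an undirected term graph."""
    n = len(graph)
    scores = {term: 1.0 / n for term in graph}
    # Total edge weight attached to each node, used to normalise its votes.
    strength = {term: sum(nbrs.values()) for term, nbrs in graph.items()}
    for _ in range(max_iter):
        updated = {}
        for term, nbrs in graph.items():
            incoming = sum(scores[s] * w / strength[s] for s, w in nbrs.items())
            updated[term] = (1 - damping) / n + damping * incoming
        delta = max(abs(updated[t] - scores[t]) for t in graph)
        scores = updated
        if delta < tol:
            break
    return scores


# Hypothetical embeddings for feature terms found in one user's reviews.
toy_vectors = {
    "screen": [1.0, 0.1, 0.0],
    "display": [0.9, 0.2, 0.0],
    "resolution": [0.8, 0.3, 0.1],
    "battery": [0.0, 1.0, 0.1],
    "price": [0.0, 0.0, 1.0],
}

graph = build_term_graph(toy_vectors)
scores = textrank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy input the three screen-related terms form a connected cluster and therefore outrank the isolated terms, which is the intuition behind reading TextRank scores as a user's interest in product features.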

Received: 10 May 2019      Published: 25 January 2020
Hui Nie. Modeling Users with Word Vector and Term-Graph Algorithm. Data Analysis and Knowledge Discovery, 2019, 3(12): 30-40.

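| Feature-opinion extraction rule template | Coverage | Example |
|---|---|---|
| a (opinion) ← SBV ← n (feature) | 73% | 像素(n)挺高(a)的 ("the pixel count (n) is quite high (a)") |
| a (opinion) → VOB → v ← SBV ← n (feature) | 13.8% | 就是价钱(n)有(v)点小贵(a) ("it's just that the price (n) is (v) a bit expensive (a)") |
| a (opinion) → COO → a (opinion) ← SBV ← n (feature) | 5.6% | 屏幕(n)精致(a)漂亮(a) ("the screen (n) is exquisite (a) and beautiful (a)") |
| a (opinion) ← SBV ← v (feature) | 4.2% | 运行(v)挺流畅(a)的 ("it runs (v) quite smoothly (a)") |
| a (opinion) ← SBV ← v ← ATT ← n (feature) | 1.9% | 电池(n)续航(v)很给力(a) ("the battery (n) life (v) is great (a)") |

Notes: SBV: subject-verb relation; VOB: verb-object relation; ATT: attributive (modifier-head) relation; COO: coordinate relation; a: adjective; v: verb; n: noun.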
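
| Out-of-vocabulary word | Semantically related feature words / similarity | Mean semantic relatedness to feature words | Merged into feature lexicon? |
|---|---|---|---|
| 菜单 (menu) | 按钮 (button)/0.625, 闪屏 (splash screen)/0.619, 截屏 (screenshot)/0.591, 图标 (icon)/0.565, 屏保 (screensaver)/0.552 | 0.591 | |
| 人脸 (face) | 人脸识别 (face recognition)/0.607, 图像 (image)/0.563, 截屏 (screenshot)/0.535, 照片 (photo)/0.488, 成像 (imaging)/0.485 | 0.536 | |
| 物美价廉 (good quality at a low price) | 性价比 (cost-performance ratio)/0.586, 国产货 (domestic goods)/0.550, 回头率 (head-turning rate)/0.504, 价钱 (price)/0.502, 正品 (genuine product)/0.493 | 0.527 | |
| 水货 (gray-market goods) | 行货 (licensed goods)/0.741, 国产货 (domestic goods)/0.603, 换货 (exchange of goods)/0.586, 正品 (genuine product)/0.581, 国产机 (domestic phone)/0.577 | 0.618 | |
| 京东 (JD.com) | 商城 (mall)/0.348, 物流 (logistics)/0.247, android/0.239, 新品 (new product)/0.238, 国产 (domestic)/0.236 | 0.261 | |
| 华为 (Huawei) | 手机 (mobile phone)/0.393, 网络 (network)/0.330, 电信 (telecom)/0.329, 三星 (Samsung)/0.328, IOS/0.324 | 0.341 | |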
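
The mean relatedness column can be approximated by averaging the five listed similarities for each out-of-vocabulary word, as the sketch below does for two rows. The similarity values are copied from the table; `MERGE_THRESHOLD` and the `should_merge` helper are hypothetical and only illustrate how a cutoff could drive the merge decision, since the criterion the paper actually applied is not shown here.

```python
from statistics import mean

# Top-5 similarities to existing feature words, copied from the table rows.
neighbour_similarities = {
    "菜单": [0.625, 0.619, 0.591, 0.565, 0.552],  # menu
    "京东": [0.348, 0.247, 0.239, 0.238, 0.236],  # JD.com
}

# Hypothetical cutoff chosen for illustration only.
MERGE_THRESHOLD = 0.5


def mean_relatedness(similarities):
    """Average similarity between a candidate word and its nearest feature words."""
    return mean(similarities)


def should_merge(similarities, threshold=MERGE_THRESHOLD):
    """Illustrative rule: merge when the mean relatedness reaches the cutoff."""
    return mean_relatedness(similarities) >= threshold


relatedness = {
    word: mean_relatedness(sims) for word, sims in neighbour_similarities.items()
}
decisions = {
    word: should_merge(sims) for word, sims in neighbour_similarities.items()
}
```

Because the listed similarities are already rounded to three decimals, the recomputed means can differ from the table in the last digit.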
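
| User interest model | Model description | Precision P (mean) | Recall R (mean) | F1 (mean) |
|---|---|---|---|---|
| Semantic_Model | Word2Vec-based term-graph model, $\varepsilon = 0.5$ | 0.4564 | 0.7582 | 0.5505 |
| Feature_Model | User interest model built from term frequency over the feature words in the review content | 0.4336 | 0.7339 | 0.5269 |
| Term_Model | User interest model built from term frequency over the terms (nouns, verbal nouns, verbs) in the review content | 0.2278 | 0.7327 | 0.3322 |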
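
The reported figures are means, which matters when reading the F1 column: for Semantic_Model, plugging the mean P and mean R into 2PR/(P+R) gives about 0.570 rather than 0.5505. That is what one would expect if F1 is computed per user and then averaged. The sketch below illustrates this under that assumption; the two users and their feature sets are invented for the example.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for one user's feature terms."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def macro_average(pairs):
    """Average P, R and F1 over users (each pair is predicted, gold)."""
    per_user = [precision_recall_f1(pred, gold) for pred, gold in pairs]
    n = len(per_user)
    return tuple(sum(scores[i] for scores in per_user) / n for i in range(3))


# Hypothetical users: (features ranked by the model, manually labelled features).
users = [
    ({"screen", "battery", "price", "camera"}, {"screen", "battery"}),
    ({"price", "logistics"}, {"price", "screen", "camera", "battery"}),
]

mean_p, mean_r, mean_f1 = macro_average(users)

# F1 recomputed from the averaged P and R, shown only for comparison.
f1_of_means = 2 * mean_p * mean_r / (mean_p + mean_r)
```

Here the mean of the per-user F1 scores and the F1 of the mean precision and recall come out different, so the three columns of the table should be read as separate averages.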
