Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (7): 42-51    DOI: 10.11925/infotech.2096-3467.2018.1017
Research on Product Characteristics Extraction and Hedonic Price Based on User Comments
Xiuxian Wen, Jian Xu
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  

[Objective] This paper proposes a method for extracting product characteristics from user comments, aiming to address the issues facing hedonic price research. [Methods] First, we extracted keywords from user comments. Then, we identified the product characteristics favored by consumers through keyword clustering and established a hedonic price model. Finally, we tested the proposed model on sales data for new properties in Guangzhou. [Results] We found seven real estate characteristics for which the user comments show significant consumer preferences. The model's goodness of fit (R²) reached 0.760, the Durbin-Watson (DW) statistic was 2.013, and the correlation coefficient between user preferences and real estate prices was 0.989. [Limitations] The experimental data were collected only from a real estate website. [Conclusions] The new model based on user comments can accurately evaluate product prices. It also helps avoid multicollinearity among the independent variables and supports further study of business and consumer behavior.

Key words: Hedonic Price, Characteristic Extraction, User Comments, Keywords, Word Vectors
Received: 11 September 2018      Published: 06 September 2019
ZTFLH:  G350.7  
Corresponding Authors: Jian Xu     E-mail: issxj@mail.sysu.edu.cn

Cite this article:

Xiuxian Wen, Jian Xu. Research on Product Characteristics Extraction and Hedonic Price Based on User Comments. Data Analysis and Knowledge Discovery, 2019, 3(7): 42-51.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.1017     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I7/42
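
The keyword clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each extracted keyword has already been mapped to a word vector; the keywords, the 2-D vectors, and the `kmeans` helper below are hypothetical, and plain k-means stands in for whichever clustering variant the authors actually used.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means on a list of equal-length float tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical 2-D "word vectors"; real embeddings have far more dimensions.
keywords = ["metro", "bus", "school", "kindergarten"]
vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]

# Seed the two clusters with the first and third keyword (an arbitrary choice).
labels, _ = kmeans(vectors, centroids=[vectors[0], vectors[2]])

clusters = {}
for word, lab in zip(keywords, labels):
    clusters.setdefault(lab, []).append(word)
print(clusters)
```

Each resulting cluster groups keywords that would then be read as one candidate product characteristic, such as transport or education in this toy input.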

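| Characteristic | Description |
| --- | --- |
| District | Population density is taken as an indicator of a district's residential attractiveness. Based on the 2016 resident population density of each district of Guangzhou, areas surrounding Guangzhou (e.g., Qingyuan, Foshan) are coded 1, and the 11 districts of Guangzhou are coded 2-12, with a higher code for a higher resident population density. |
| Metro | Number of metro stations within a 2 km radius of the property (unit: count) |
| Unit size | Because a property is a collection of homes with many unit types, its smallest and largest unit sizes are quantified separately (unit: m²) |
| Commercial amenities | Sum of the numbers of shopping venues, banks, and restaurants within a 2 km radius of the property (unit: count) |
| Green environment | The property's own greening rate, expressed as a percentage |
| Schools | Number of schools within a 2 km radius of the property (unit: count) |
| Bus | Number of bus stops within a 2 km radius of the property (unit: count) |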
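
Several of the characteristics above are counts of points of interest within a 2 km radius of a property. A minimal sketch of such a count is given below; it assumes great-circle (haversine) distance on latitude/longitude pairs, which the source does not specify, and the coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def count_within(center, places, radius_km=2.0):
    """Count the places lying within radius_km of the center point."""
    return sum(
        1 for lat, lon in places
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )

# Hypothetical property location and metro station coordinates.
site = (23.12, 113.32)
stations = [
    (23.125, 113.33),   # a little over 1 km away
    (23.20, 113.32),    # about 9 km away
    (23.12, 113.335),   # about 1.5 km away
]

n = count_within(site, stations)
print(n)  # 2
```

The same helper would serve for schools, bus stops, and commercial amenities by swapping the list of places.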
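
| Property | Price (yuan/m²) | District code | Min. unit size (m²) | Max. unit size (m²) | Metro stations (count) | Commercial amenities (count) | Greening rate | Schools (count) | Bus stops (count) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 金融街融御 | 60,000 | 11 | 135 | 140 | 2 | 75 | 40% | 25 | 25 |
| 路劲天隽峰 | 25,500 | 5 | 96 | 227 | 5 | 75 | 45% | 25 | 25 |
| 珠江金茂府 | 49,667 | 9 | 109 | 171 | 9 | 75 | 35% | 25 | 25 |
| 保利·中航城 | 23,000 | 3 | 79 | 126 | 0 | 26 | 30% | 2 | 4 |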
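
| Independent variable | Unstandardized coefficient | Standardized coefficient | Significance (two-tailed) | VIF |
| --- | --- | --- | --- | --- |
| (Constant) | 8.840 | | 0.000 | |
| District (XQ) | 0.106 | 0.560 | 0.000 | 2.224 |
| Metro (DT) | 0.038 | 0.237 | 0.000 | 1.971 |
| Min. unit size (XH) | 0.002 | 0.267 | 0.000 | 2.121 |
| Max. unit size (DH) | 0.000 | -0.028 | 0.583 | 2.333 |
| Commercial amenities (SY) | 0.008 | 0.198 | 0.001 | 3.227 |
| Greening rate (LH) | 0.057 | 0.007 | 0.841 | 1.278 |
| Schools (XX) | -0.005 | -0.070 | 0.301 | 4.265 |
| Bus (GJ) | -8.751E-005 | -0.001 | 0.985 | 2.434 |

Model R²: 0.760; adjusted R²: 0.752; Durbin-Watson: 2.013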
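
As a rough illustration of how the regression table can be read, the sketch below applies the unstandardized coefficients to one sample property and relates the VIF and Durbin-Watson figures to their definitions. Two points are assumptions rather than statements from the source: that the dependent variable is the natural log of price (suggested by the size of the constant), and that the greening rate enters as a fraction (0.40 rather than 40). All function names are hypothetical.

```python
import math

# Unstandardized coefficients copied from the regression table.
INTERCEPT = 8.840
COEF = {
    "XQ": 0.106, "DT": 0.038, "XH": 0.002, "DH": 0.000,
    "SY": 0.008, "LH": 0.057, "XX": -0.005, "GJ": -8.751e-5,
}

def predict_log_price(features):
    """Linear predictor; ASSUMED to be ln(price in yuan/m^2)."""
    return INTERCEPT + sum(COEF[name] * features[name] for name in COEF)

def r2_from_vif(vif):
    """VIF_j = 1 / (1 - R_j^2), so R_j^2 = 1 - 1 / VIF_j."""
    return 1.0 - 1.0 / vif

def durbin_watson(residuals):
    """Sum of squared successive residual differences over sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# First sample property; greening rate ASSUMED to enter as a fraction.
sample = {"XQ": 11, "DT": 2, "XH": 135, "DH": 140,
          "SY": 75, "LH": 0.40, "XX": 25, "GJ": 25}
log_price = predict_log_price(sample)
print(round(log_price, 4), round(math.exp(log_price)))

# The largest VIF in the table (schools, 4.265) in terms of auxiliary R^2.
print(round(r2_from_vif(4.265), 4))

# Toy residuals that alternate in sign push the statistic toward 4.
print(durbin_watson([1, -1, 1, -1]))
```

Under these assumptions the prediction for the first sample property falls below its listed price of 60,000 yuan/m², which is unsurprising for a model with an R² of 0.760. A Durbin-Watson value near 2, as reported, indicates little first-order autocorrelation in the residuals.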