Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (7): 14-22    DOI: 10.11925/infotech.2096-3467.2018.1098
Analyzing Sentiment Distribution with Spatial-textual Data of Multi-dimensional Clustering
Ke Li1, Yuya Sasaki2
1(School of Information Management, Nanjing University, Nanjing 210046, China)
2(Graduate School of Information Science and Technology, Osaka University, Osaka 565-0871, Japan)

[Objective] This paper builds a spatial-textual sentiment analysis model based on multi-dimensional WaveCluster, aiming to analyze text sentiment and spatial position together and efficiently. [Methods] First, we integrated several Yelp datasets into a spatial-textual database. Second, we used lexicon-based sentiment analysis to generate feature vectors. Third, we proposed a new method that analyzes the data with a Hybrid model, a Textual-Spatial model, and a multi-dimensional clustering model. [Results] Multi-dimensional clustering based on the db2 or bior2.2 wavelet recognized clusters more accurately than DBSCAN and K-means in spatial-textual feature mining. It was also the fastest method for data sizes ranging from 100 thousand to 10 million. [Limitations] We used a unigram model for sentiment analysis, which scores words in isolation and cannot analyze whole sentences. [Conclusions] The proposed Textual-Spatial model can effectively reveal how sentiment tendency is distributed across spatial-textual data. The Hybrid model offers a new way for spatial-textual recommender systems to compute sentiment similarity and spatial proximity simultaneously.
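The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The lexicon, the function names, the grid size, and the threshold are all hypothetical, and a single Haar low-pass step stands in for the db2 and bior2.2 wavelets named above. Each review becomes a three-dimensional feature (two spatial coordinates plus a unigram sentiment score), the features are counted on a grid, the grid is smoothed, and dense neighboring cells are grouped into clusters.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy lexicon; the lexicon used in the paper is not given here.
LEXICON = {"good": 1.0, "great": 1.0, "tasty": 1.0,
           "bad": -1.0, "slow": -1.0, "rude": -1.0}

def unigram_sentiment(text):
    """Mean polarity of the lexicon words in `text`; 0.0 when none match."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def quantize(points, bounds, cells):
    """Count points per grid cell; `bounds` holds one (lo, hi) per dimension.
    Points are assumed to lie inside the bounds."""
    grid = defaultdict(int)
    for point in points:
        idx = tuple(min(int((v - lo) / (hi - lo) * cells), cells - 1)
                    for v, (lo, hi) in zip(point, bounds))
        grid[idx] += 1
    return grid

def haar_lowpass(grid, dims):
    """One Haar level, low-pass band only (up to scaling):
    the mean count over each 2x...x2 block of cells."""
    coarse = defaultdict(float)
    for idx, count in grid.items():
        coarse[tuple(i // 2 for i in idx)] += count / 2 ** dims
    return coarse

def connected_cells(cells):
    """Group cells that touch, diagonals included, into clusters."""
    cells, clusters = set(cells), []
    while cells:
        stack, cluster = [cells.pop()], set()
        while stack:
            cell = stack.pop()
            cluster.add(cell)
            for step in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + s for c, s in zip(cell, step))
                if neighbor in cells:
                    cells.remove(neighbor)
                    stack.append(neighbor)
        clusters.append(cluster)
    return clusters

def cluster_reviews(reviews, bounds, cells=8, threshold=0.2):
    """reviews: iterable of (x, y, text). Returns clusters of coarse cells."""
    points = [(x, y, unigram_sentiment(text)) for x, y, text in reviews]
    coarse = haar_lowpass(quantize(points, bounds, cells), dims=3)
    dense = [idx for idx, value in coarse.items() if value >= threshold]
    return connected_cells(dense)
```

Under these assumptions, three positive reviews near one corner of the unit square and three negative reviews near the opposite corner form two separate clusters, while an isolated neutral review falls below the threshold and is treated as noise. The sketch also shows the limitation stated in the abstract: `unigram_sentiment("not good")` returns 1.0, because a unigram lexicon ignores the negation.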

Key words: Spatial-Textual Data; Sentiment Distribution Analysis; Wavelet Transform; Clustering
Received: 08 October 2018      Published: 06 September 2019
CLC Number: G35
Corresponding Author: Ke Li

Cite this article:

Ke Li, Yuya Sasaki. Analyzing Sentiment Distribution with Spatial-textual Data of Multi-dimensional Clustering. Data Analysis and Knowledge Discovery, 2019, 3(7): 14-22.

