Data Analysis and Knowledge Discovery

Is Big Data Analytics Beyond the Reach of Small Companies?

Yang Cao, Wenfei Fan, Tengfei Yuan

Data Analysis and Knowledge Discovery. 2017, 1(9): 1-7. https://doi.org/10.11925/infotech.2096-3467.2017.0723

Big data analytics is often prohibitively costly. It is typically conducted by parallel processing on a cluster of machines, and is considered a privilege of big companies that can afford the resources. This position paper argues that big data analytics is accessible to small companies with constrained resources. As evidence, we present BEAS, a framework for querying big relations with constrained resources, based on bounded evaluation and data-driven approximation.

Extracting Entity Relationship with Word Embedding Representation Features

Zhang Qin, Guo Hongmei, Zhang Zhixiong

Data Analysis and Knowledge Discovery. 2017, 1(9): 8-15. https://doi.org/10.11925/infotech.2096-3467.2017.09.01

[Objective] This study explores word embedding representation features for entity relationship extraction, aiming to add semantic information to the existing methods. [Methods] First, we used features at the word embedding representation, vocabulary, and grammar levels to extract relations with Naive Bayesian, Decision Tree, and Random Forest models. Then, we obtained the optimal subset of the full feature set. [Results] The Decision Tree algorithm achieved the best accuracy, 0.48, with the full feature set. The F₁ score of Member-Collection (E₂, E₁) was 0.70, and the dependency could help us extract the relations. [Limitations] The relation extraction results need to be improved for small sample sizes and complex situations. The word vector training method could be further optimized. [Conclusions] This study demonstrates the effectiveness of the three types of features, and the word embedding representation features play an important role in extracting relations.

Identifying Reviews with More Positive Votes: Case Study of Amazon.cn

Wu Jiang, Liu Wanwan

Data Analysis and Knowledge Discovery. 2017, 1(9): 16-27. https://doi.org/10.11925/infotech.2096-3467.2017.09.02

[Objective] This article examines which online reviews attract more positive votes from consumers, aiming to identify high-quality reviews based on the information adoption and negative bias theories. [Methods] First, we retrieved 12,393 reviews on cellphones from Amazon.cn. Then, we investigated the impact of review characteristics on the number of positive votes using zero-inflated negative binomial regression and text analysis methods. The characteristics we studied include the reviewer’s credibility and the review’s quality and extremity. [Results] The usefulness of the reviewer’s previous postings, the information quality of the review, the number of comments, extreme ratings, and the negative level of the review helped reviews receive more positive votes. However, whether the reviewer had bought the product and the number of previously posted reviews had a negative influence on the number of votes. [Limitations] This study investigated only cellphones. [Conclusions] This paper helps e-commerce websites improve their review ranking algorithms.

Building Product Recommendation Model Based on Tags

Tu Haili, Tang Xiaobo

Data Analysis and Knowledge Discovery. 2017, 1(9): 28-39. https://doi.org/10.11925/infotech.2096-3467.2017.09.03

[Objective] This paper proposes a personalized product recommendation model based on tags in the social e-commerce environment. [Methods] First, we calculated users’ interests and preferences from tagging frequency and time. Then, we constructed a product ontology of the commercial community based on the tag features and search conditions of the e-commerce website. Third, we used the ontology to standardize tag semantics and to classify goods. Fourth, we found clusters containing user preferences, and calculated the similarity between the tags of goods and the user preferences in each cluster. Finally, we identified the goods that a specific user had not tagged but would prefer. [Results] We examined the model with data on 200 randomly selected active users of popular items from the FanDongXi website. [Limitations] We used only the frequency and time of users’ tags to calculate their interests and preferences. [Conclusions] The proposed method performs better than recommendation methods based on collaborative filtering.

Evaluating the Influence of China’s Webcast Platforms Based on Link Analysis

Shi Yutian, Zhu Qinghua, Zhao Yuxiang, Chen Xiaowei

Data Analysis and Knowledge Discovery. 2017, 1(9): 40-48. https://doi.org/10.11925/infotech.2096-3467.2017.09.04

[Objective] This article tries to objectively evaluate the influence of China’s webcast platforms with the help of link analysis. [Methods] First, we used the Google search engine and Alexa.com to collect the link data of 20 popular webcast platforms in China. Then, we examined their influence with a modified grey correlation analysis method. [Results] We obtained a ranking of the 20 webcast platforms and analyzed their characteristics. [Limitations] We could not obtain comprehensive data from the webcast platforms, and the sample size was limited. [Conclusions] The overall level of current webcast platforms is not high. This article proposes strategies to increase the influence of webcast platforms.
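
The abstract does not describe how the grey correlation analysis was modified, so the following is only a minimal sketch of the standard, unmodified grey relational grade, written to illustrate the kind of ranking the method produces. The function name, the distinguishing coefficient `rho = 0.5`, and the sample indicator values are assumptions for illustration, not data from the paper.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Standard grey relational grade of each candidate series against a reference.

    Assumes all series are already normalized to a comparable scale.
    rho is the distinguishing coefficient (0.5 is a common default, assumed here).
    """
    # Absolute difference between the reference and each candidate, per indicator.
    deltas = [[abs(r - x) for r, x in zip(reference, series)] for series in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference on every indicator.
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        # Grey relational coefficient per indicator, then the mean as the grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical normalized link indicators for two platforms (not from the paper).
ideal = [1.0, 1.0, 1.0]
platforms = [[1.0, 1.0, 1.0], [0.0, 0.5, 1.0]]
grades = grey_relational_grades(ideal, platforms)
print(grades)
```

A higher grade means the platform's indicators lie closer to the reference series, so sorting platforms by grade yields a ranking.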

Detecting Community in Scientific Collaboration Network with Bayesian Symmetric NMF

Shi Xiaohua, Lu Hongtao

Data Analysis and Knowledge Discovery. 2017, 1(9): 49-56. https://doi.org/10.11925/infotech.2096-3467.2017.09.05

[Objective] This study proposes and examines a new method to identify communities in the collaboration network of scientific researchers. [Methods] First, we retrieved the needed data from information science journal articles published from 2012 to 2016. Then, we used Automatic Relevance Determination with the Bayesian Symmetric Non-negative Matrix Factorization method to find the target communities. Finally, we compared the performance of our method with that of existing ones. [Results] The proposed method achieved better results than the existing methods. [Limitations] We did not optimize our data with researcher identifiers. [Conclusions] The proposed method could effectively find communities in the scientific collaboration network.

Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words

Gao Yongbing, Yang Guipeng, Zhang Di, Ma Zhanfei

Data Analysis and Knowledge Discovery. 2017, 1(9): 57-64. https://doi.org/10.11925/infotech.2096-3467.2017.09.06

[Objective] This paper aims to remove unrelated information from official Weibo (micro-blog) profiles, and then retrieve the posts on official events. [Methods] First, we trained a word2vec model on the official Weibo datasets. Then, we proposed a burst-word detection method for official micro-blogs based on the influence of Weibo posts, the base weight, and the related official profiles. Third, we calculated the similarity of blog posts with the burst words, and used a hierarchical clustering algorithm to select burst words for the target events. [Results] The proposed algorithm had better precision (63.5%), recall (85.5%), and F value (0.73) than the traditional TF-IDF and TextRank algorithms. [Limitations] The official profiles did not have enough historical data on the events. [Conclusions] The burst words help us detect official events effectively from the official Weibo profiles.
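
The reported F value can be checked against the reported precision and recall. The sketch below assumes the F value is the balanced F₁ measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # Avoid division by zero when both scores are zero.
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f_measure(0.635, 0.855)
print(round(f1, 2))  # prints 0.73
```

Under that assumption, the three reported figures are mutually consistent.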

Sentiment Analysis of Weibo Opinion Leaders: Case Study of “Illegal Vaccine” Event

He Yue, Zhu Can

Data Analysis and Knowledge Discovery. 2017, 1(9): 65-73. https://doi.org/10.11925/infotech.2096-3467.2017.09.07

[Objective] This paper tries to identify the opinion leaders of Weibo and examines their roles in information dissemination. [Methods] We adopted a two-step clustering method to identify opinion leaders of the “illegal vaccine” event. Then, we created a network matrix for these opinion leaders based on their relationships. Finally, we analyzed the sentiments of the Weibo users to evaluate the role of the opinion leaders’ network. [Results] The users’ overall sentiment was negative. The opinion leaders’ network had a significant impact on the sentiments of average users. [Limitations] We examined our method with only one event. [Conclusions] Celebrities and opinion leaders play an important role in swaying public opinion online.

Analyzing Online Reviews with Dynamic Sentiment Topic Model

Li Hui, Hu Yunfeng

Data Analysis and Knowledge Discovery. 2017, 1(9): 74-82. https://doi.org/10.11925/infotech.2096-3467.2017.09.08

[Objective] This paper analyzes online reviews to identify the patterns of their topic contents and sentiments. [Methods] First, we obtained the sentiment of the reviews with the SSTM model. Then, we proposed a DSTM model based on the document, document sentiment distribution and words. Finally, we estimated the distribution of sentiment-topic and the keywords. [Results] We modeled the review datasets by time slice and found the changing trends of contents and sentiments over time. [Limitations] The proposed model did not include the relationship among different subjects, which might generate errors. [Conclusions] The DSTM model, which integrates the external time features, can effectively analyze the evolution of online review topics.

Extracting Product Features with Weight-based Apriori Algorithm

Li Changbing, Pang Chongpeng, Li Meiping

Data Analysis and Knowledge Discovery. 2017, 1(9): 83-89. https://doi.org/10.11925/infotech.2096-3467.2017.09.09

[Objective] This paper aims to reduce noise while extracting product features from customer comments. [Methods] First, we used the TF-IDF and variance selection methods to extract the needed data. Then, we set thresholds to filter the extracted words and obtain the product feature set. Third, we generated frequent item sets with the Apriori algorithm. Finally, we defined various thresholds to obtain the optimal sets, which automatically extracted product features from user comments. [Results] We examined the effectiveness of the proposed method with comment texts on mobile phone products. Comparing the automatically extracted characteristics with the manually identified ones, we found that the precision (P) was 72.44%, the recall (R) was 77.59%, and the comprehensive F value reached 74.93%. [Limitations] The precision needs to be improved, and there might be some human errors in the manually identified terms. [Conclusions] The Apriori algorithm could help us extract product features effectively.
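
To illustrate the frequent item set step, here is a minimal sketch of the plain, unweighted Apriori algorithm. It does not include the weighting, TF-IDF filtering, or thresholds the paper describes, since the abstract gives no details on them. The function name, the support threshold, and the sample comment terms are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of the surviving candidates against all transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical candidate feature words per comment (not data from the paper).
comments = [
    {"screen", "battery", "price"},
    {"screen", "battery"},
    {"battery", "price"},
    {"screen", "camera"},
]
result = apriori(comments, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these sample comments and a minimum support of 2, "camera" is dropped as infrequent, and the pair of "screen" and "price" is rejected because the two words co-occur only once.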

Expanding Support Ability of CSpace for Audios and Videos Resources

Wu Zhiqiang, Zhu Zhongming, Yao Xiaona, Wang Sili

Data Analysis and Knowledge Discovery. 2017, 1(9): 90-96. https://doi.org/10.11925/infotech.2096-3467.2017.09.10

[Objective] This paper aims to expand the ability of the CSpace Institutional Repository to support audio and video resources. [Context] The ever-growing volume of audio and video resources requires us to expand the Institutional Repository’s supporting ability, which helps us retrieve knowledge and increase the academic value of these resources more effectively. [Methods] First, we analyzed user needs and the development of audio and video support services in Institutional Repositories at home and abroad. Then, we constructed an extension framework for the supporting functions. Finally, we chose the key technologies and methods to build an experimental platform, and explored its feasibility in CSpace. [Results] The proposed method helped us convert the formats of audio and video clips, analyze video scenes, and develop a video player with scene navigation functions. [Conclusions] The transcoding technology for audio and video works effectively; however, other supporting functions could be further improved. The format conversion technology for audio and video in CSpace could expand its supporting services.

25 September 2017, Volume 1 Issue 9