Current Issue
    Volume 1, Issue 9
    Original Article
    Is Big Data Analytics Beyond the Reach of Small Companies?
    Yang Cao,Wenfei Fan,Tengfei Yuan
    2017, 1 (9): 1-7.  DOI: 10.11925/infotech.2096-3467.2017.0723

    Big data analytics is often prohibitively costly. It is typically conducted by parallel processing on a cluster of machines, and is considered a privilege of big companies that can afford the resources. This position paper argues that big data analytics is accessible to small companies with constrained resources. As evidence, we present BEAS, a framework for querying big relations with constrained resources, based on bounded evaluation and data-driven approximation.

    Extracting Entity Relationship with Word Embedding Representation Features
    Zhang Qin,Guo Hongmei,Zhang Zhixiong
    2017, 1 (9): 8-15.  DOI: 10.11925/infotech.2096-3467.2017.09.01

    [Objective] This study explores word embedding representation features for entity relationship extraction, aiming to add semantic information to existing methods. [Methods] First, we used features at the word embedding representation, vocabulary and grammar levels to extract relations with Naive Bayes, Decision Tree and Random Forest models. Then, we obtained the optimal subset of the full feature set. [Results] With full features, the Decision Tree algorithm achieved the best accuracy, 0.48. The F1 score for Member-Collection (E2, E1) was 0.70, and dependency information helped extract the relations. [Limitations] The relation extraction results for small samples and complex cases need to be improved, and the word vector training method could be further optimized. [Conclusions] This study demonstrates the effectiveness of the three types of features; the word embedding representation features play an important role in relation extraction.

    Identifying Reviews with More Positive Votes: Case Study of Amazon.cn
    Wu Jiang,Liu Wanwan
    2017, 1 (9): 16-27.  DOI: 10.11925/infotech.2096-3467.2017.09.02

    [Objective] This article examines which online reviews attract more positive votes from consumers, aiming to identify high-quality reviews based on the information adoption and negativity bias theories. [Methods] First, we retrieved 12 393 cellphone reviews from Amazon.cn. Then, we investigated the impact of review characteristics on the number of positive votes using zero-inflated negative binomial regression and text analysis methods. The characteristics studied include the reviewer’s credibility and the review’s quality and extremity. [Results] The usefulness of the reviewer’s previous posts, the information quality of the review, the number of comments, extreme ratings, and the negativity of the review helped reviews receive more positive votes. However, whether the reviewer had bought the product and the number of reviews the reviewer had previously posted had a negative influence on the number of votes. [Limitations] This study only investigated cellphone reviews. [Conclusions] This paper helps e-commerce websites improve their review ranking algorithms.
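As a rough illustration of the count model named in the methods, the sketch below writes out the probability mass function of a zero-inflated negative binomial distribution. The parameter names (`mu` for the mean, `alpha` for the dispersion, `pi` for the share of structural zeros) and the example values are assumptions chosen for illustration; the abstract does not state the parameterization or the fitted values the authors used.

```python
import math


def nb_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative binomial probability of count k, with mean mu > 0 and dispersion alpha > 0."""
    r = 1.0 / alpha              # "size" parameter
    p = r / (r + mu)             # success probability
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k: int, mu: float, alpha: float, pi: float) -> float:
    """Zero-inflated negative binomial: a structural zero with probability pi,
    otherwise a negative binomial count."""
    base = (1.0 - pi) * nb_pmf(k, mu, alpha)
    return pi + base if k == 0 else base


# Hypothetical parameters: many reviews receive no votes at all.
p_zero_votes = zinb_pmf(0, mu=2.0, alpha=0.5, pi=0.3)
```

The extra mass at zero is what distinguishes this model from a plain negative binomial: a review can have zero votes either because it was never going to be voted on (the `pi` component) or because the count process happened to produce zero.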

    Building Product Recommendation Model Based on Tags
    Tu Haili,Tang Xiaobo
    2017, 1 (9): 28-39.  DOI: 10.11925/infotech.2096-3467.2017.09.03

    [Objective] This paper proposes a personalized product recommendation model based on tags in the social e-commerce environment. [Methods] First, we calculated users’ interests and preferences from tagging frequency and time. Then, we constructed a product ontology of the commercial community based on the tag features and search conditions of the e-commerce website. Third, we used the ontology to standardize tag semantics and to classify goods. Fourth, we found clusters containing user preferences, and calculated the similarity between the tags of goods in each cluster and the user’s preferences. Finally, we identified the goods that a specific user had not tagged but would prefer. [Results] We examined the model with data from 200 randomly selected active users of popular items on the FanDongXi website. [Limitations] We only used the frequency and time of users’ tags to calculate their interests and preferences. [Conclusions] The proposed method performs better than collaborative-filtering-based recommendation methods.
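One way to picture the first and fourth steps is sketched below: each tagging event contributes a weight that fades with age, and a product is scored by the cosine similarity between its tags and the user's weighted tags. The exponential half-life decay, the function names, and the sample tags are all assumptions made for illustration; the abstract does not say how frequency and time are combined or which similarity measure is used.

```python
import math
from collections import defaultdict


def interest_profile(tag_events, now, half_life_days=30.0):
    """Sum one time-decayed unit of weight per tagging event.

    tag_events: iterable of (tag, day) pairs, where day is measured in days.
    half_life_days: hypothetical decay parameter, not taken from the paper.
    """
    weights = defaultdict(float)
    for tag, day in tag_events:
        age = now - day
        weights[tag] += 0.5 ** (age / half_life_days)
    return dict(weights)


def cosine(u, v):
    """Cosine similarity between two sparse tag-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical tagging history: two recent "vintage" tags, one old "canvas" tag.
events = [("vintage", 100), ("vintage", 95), ("canvas", 40)]
profile = interest_profile(events, now=100)
product_tags = {"vintage": 1.0, "leather": 1.0}
score = cosine(profile, product_tags)
```

Frequent and recent tags dominate the profile, so an untagged product that shares them scores higher than one matching only stale tags.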

    Evaluating the Influence of China’s Webcast Platforms Based on Link Analysis
    Shi Yutian,Zhu Qinghua,Zhao Yuxiang,Chen Xiaowei
    2017, 1 (9): 40-48.  DOI: 10.11925/infotech.2096-3467.2017.09.04

    [Objective] The article tries to objectively evaluate the influence of China’s webcast platforms with the help of link analysis. [Methods] First, we used the Google search engine and Alexa.com to collect the link data of 20 popular webcast platforms in China. Then, we examined their influence with a modified grey correlation analysis method. [Results] We obtained a ranking of the 20 webcast platforms and analyzed their characteristics. [Limitations] We could not obtain comprehensive data from the webcast platforms and the sample size was limited. [Conclusions] The overall level of current webcast platforms is relatively low. This article proposes strategies to increase the influence of webcast platforms.
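For readers unfamiliar with grey correlation (grey relational) analysis, the sketch below computes the standard, unmodified grey relational grade of each candidate series against an ideal reference series. The authors' modification is not described in the abstract, so this shows only the textbook baseline; the indicator values and the function name are hypothetical.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Standard grey relational grade of each candidate series against a reference.

    All series must already be normalized to a common scale (for example [0, 1]).
    rho is the distinguishing coefficient, conventionally 0.5.
    """
    deltas = [[abs(r - x) for r, x in zip(reference, series)] for series in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:                       # every candidate equals the reference
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical normalized link indicators for two platforms; the reference is the ideal.
ideal = [1.0, 1.0, 1.0]
platforms = [[0.9, 0.8, 1.0], [0.4, 0.5, 0.3]]
grades = grey_relational_grades(ideal, platforms)
```

Ranking platforms by grade, highest first, gives an ordering of the kind reported in the results.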

    Detecting Community in Scientific Collaboration Network with Bayesian Symmetric NMF
    Shi Xiaohua,Lu Hongtao
    2017, 1 (9): 49-56.  DOI: 10.11925/infotech.2096-3467.2017.09.05

    [Objective] This study proposes and examines a new method to identify communities in the collaboration network of scientific researchers. [Methods] First, we retrieved the needed data from information science journal articles published from 2012 to 2016. Then, we used Automatic Relevance Determination to find the target communities with the Bayesian Symmetric Non-negative Matrix Factorization method. Finally, we compared the performance of our method with that of existing ones. [Results] The proposed method achieved better results than the others. [Limitations] We did not refine our data using researcher identifiers. [Conclusions] The proposed method could effectively find communities in the scientific collaboration network.

    Detecting Events from Official Weibo Profiles Based on Post Clustering with Burst Words
    Gao Yongbing,Yang Guipeng,Zhang Di,Ma Zhanfei
    2017, 1 (9): 57-64.  DOI: 10.11925/infotech.2096-3467.2017.09.06

    [Objective] This paper aims to remove unrelated information from official Weibo (micro-blog) profiles and retrieve the posts on official events. [Methods] First, we trained the word2vec machine learning model on the official Weibo datasets. Then, we proposed a burst-word detection method for official micro-blogs based on the influence of Weibo posts, the base weight and the related official profiles. Third, we calculated the similarity of blog posts with the burst words, and used a hierarchical clustering algorithm to select burst words for the target events. [Results] The proposed algorithm had better precision (63.5%), recall (85.5%) and F value (0.73) than the traditional TF-IDF and TextRank algorithms. [Limitations] The official profiles did not have enough historical data on the events. [Conclusions] The burst words help us detect official events effectively from the official Weibo profiles.
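The reported F value can be checked from the precision and recall, assuming it is the balanced F1 score (the harmonic mean of the two); the abstract does not name the variant, so that reading is an assumption. A minimal sketch:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_measure(0.635, 0.855)
print(round(f1, 2))  # 0.73
```

Under that assumption the three reported figures are mutually consistent.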

    Sentiment Analysis of Weibo Opinion Leaders: Case Study of “Illegal Vaccine” Event
    He Yue,Zhu Can
    2017, 1 (9): 65-73.  DOI: 10.11925/infotech.2096-3467.2017.09.07

    [Objective] This paper tries to identify the opinion leaders of Weibo and examines their roles in information dissemination. [Methods] We adopted a two-step clustering method to identify opinion leaders of the “illegal vaccine” event. Then, we created a network matrix for these opinion leaders based on their relationships. Finally, we analyzed the sentiments of the Weibo users to evaluate the role of the opinion leaders’ network. [Results] The overall sentiment of users was negative. The opinion leaders’ network had a significant impact on the sentiments of average users. [Limitations] We only examined our method with one event. [Conclusions] Celebrities and opinion leaders play an important role in swaying public opinion online.

    Analyzing Online Reviews with Dynamic Sentiment Topic Model
    Li Hui,Hu Yunfeng
    2017, 1 (9): 74-82.  DOI: 10.11925/infotech.2096-3467.2017.09.08

    [Objective] This paper analyzes online reviews to identify the patterns of their topic contents and sentiments. [Methods] First, we obtained the sentiment of the reviews with the SSTM model. Then, we proposed a DSTM model based on the document, document sentiment distribution and words. Finally, we estimated the distribution of sentiment-topic and the keywords. [Results] We modeled the review datasets by time slice and found the changing trends of contents and sentiments over time. [Limitations] The proposed model did not include the relationship among different subjects, which might generate errors. [Conclusions] The DSTM model, which integrates the external time features, can effectively analyze the evolution of online review topics.

    Extracting Product Features with Weight-based Apriori Algorithm
    Li Changbing,Pang Chongpeng,Li Meiping
    2017, 1 (9): 83-89.  DOI: 10.11925/infotech.2096-3467.2017.09.09

    [Objective] This paper aims to reduce noise while extracting product features from customer comments. [Methods] We used the TF-IDF and variance selection methods to extract the needed data. Then, we set thresholds to filter the extracted words and obtain the product feature set. Third, we generated frequent item sets with the Apriori algorithm. Finally, we defined various thresholds to obtain the optimal sets, which automatically extracted product features from user comments. [Results] We examined the effectiveness of the proposed method with comment texts on mobile phone products. Comparing the automatically extracted characteristics with the manually identified characteristics, we found that the precision (P) was 72.44%, the recall (R) was 77.59%, and the comprehensive F value reached 74.93%. [Limitations] The precision needs to be improved, and there might be some human error in the manually identified terms. [Conclusions] The Apriori algorithm could help us extract product features effectively.
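The frequent item set step can be sketched with a plain, unweighted Apriori pass, shown below on a toy set of comments. This is the textbook algorithm rather than the weight-based variant in the title, whose weighting scheme the abstract does not describe; the sample words and the 0.5 support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    transactions: list of sets of items; min_support: fraction in (0, 1].
    """
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical candidate feature words, one set per comment.
comments = [
    {"screen", "battery"},
    {"screen", "battery", "camera"},
    {"screen", "camera"},
    {"battery", "price"},
]
result = apriori(comments, min_support=0.5)
```

Here "price" is dropped as infrequent, and the pair "battery" and "camera" fails the threshold, so no three-word set survives the prune step.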

    Expanding Support Ability of CSpace for Audios and Videos Resources
    Wu Zhiqiang,Zhu Zhongming,Yao Xiaona,Wang Sili
    2017, 1 (9): 90-96.  DOI: 10.11925/infotech.2096-3467.2017.09.10

    [Objective] The paper aims to expand the CSpace Institutional Repository’s support for audio and video resources. [Context] The ever-growing volume of audio and video resources requires us to expand the Institutional Repository’s support, which helps users retrieve knowledge and increases the academic value of these resources more effectively. [Methods] First, we analyzed user needs and the development of audio and video support services in Institutional Repositories at home and abroad. Then, we constructed an extension framework for the supporting functions. Finally, we chose the key technologies and methods to build an experimental platform, and explored its feasibility in CSpace. [Results] The proposed method helped us convert the formats of audio and video clips, analyze video scenes and develop a video player with scene navigation functions. [Conclusions] The transcoding technology for audio and video works effectively, and the format conversion technology in CSpace could expand its supporting services. However, other supporting functions could be further improved.

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn