Table of Contents

25 October 2017, Volume 1 Issue 10
    

    Original Article
  • Li Hui, Chai Yaqing
    Data Analysis and Knowledge Discovery. 2017, 1(10): 1-11. https://doi.org/10.11925/infotech.2096-3467.2017.0338

    [Objective] This article quantitatively studies the sentiment polarity of online comments based on the attributes of their targets. [Methods] First, we analyzed the comments in terms of their objects, attributes and contents. Second, we extracted the attribute words and the corresponding comment sets. Third, we introduced attribute factors and calculated their values with a modified TF-IDF formula. Finally, we implemented a quantitative analysis algorithm based on the attribute features in Python. [Results] Compared with traditional machine learning classifiers (e.g., Naive Bayes (NB) and SVM), our method achieved higher sentiment classification accuracy when all attribute factors were assigned equal weights. [Limitations] The comment selection method and the coefficient parameters of the proposed algorithm need further improvement. [Conclusions] Our method could effectively improve the accuracy of sentiment classification.
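The abstract does not give the modified TF-IDF formula, so the following is only a minimal sketch under stated assumptions: plain smoothed TF-IDF stands in for the authors' formula to weight each attribute word, and a product's score is the factor-weighted mean polarity of each attribute's comment set. The lexicon `POLARITY`, the function names and the toy comments are hypothetical, not taken from the paper.

```python
import math

# Hypothetical toy polarity lexicon; the paper's sentiment resources are not given.
POLARITY = {"good": 1, "great": 1, "fast": 1, "bad": -1, "poor": -1, "slow": -1}


def tfidf_factors(comment_sets):
    """Attribute factors from plain smoothed TF-IDF (a stand-in for the paper's
    modified formula). Each comment is one document; factors are normalised to sum to 1."""
    docs = [c.lower().split() for cs in comment_sets.values() for c in cs]
    n_tokens = sum(len(d) for d in docs)
    raw = {}
    for attr in comment_sets:
        tf = sum(d.count(attr) for d in docs) / n_tokens
        df = sum(1 for d in docs if attr in d)
        idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed so idf > 0
        raw[attr] = tf * idf
    total = sum(raw.values())
    return {a: (w / total if total else 1 / len(raw)) for a, w in raw.items()}


def comment_polarity(comment):
    """Sum of lexicon polarities of the words in one comment."""
    return sum(POLARITY.get(w, 0) for w in comment.lower().split())


def weighted_score(comment_sets, factors):
    """Factor-weighted sum of the mean comment polarity of each attribute."""
    score = 0.0
    for attr, comments in comment_sets.items():
        if comments:
            mean = sum(comment_polarity(c) for c in comments) / len(comments)
            score += factors.get(attr, 0.0) * mean
    return score


comment_sets = {
    "battery": ["battery is good", "battery is great"],
    "screen": ["screen is poor"],
}
factors = tfidf_factors(comment_sets)
equal = {a: 1 / len(comment_sets) for a in comment_sets}  # equal-weight setting
print(factors, weighted_score(comment_sets, factors), weighted_score(comment_sets, equal))
```

Passing `equal` instead of the TF-IDF factors corresponds to the equal-weight setting mentioned in the Results; in this toy input the positive battery comments and the negative screen comment then cancel out.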

  • He Yue, Yin Xiaojia, Zhu Chao
    Data Analysis and Knowledge Discovery. 2017, 1(10): 12-20. https://doi.org/10.11925/infotech.2096-3467.2017.0313

    [Objective] This study tries to identify the characteristics of consumers, aiming to improve the performance of precision marketing. [Methods] First, we conducted sentiment analysis of Weibo posts. Second, we divided the Weibo users into nine groups with Ward's clustering method and measured their influence. Third, we analyzed each user group in terms of sentiment and influence. Finally, we extracted the users' characteristics with a modified customer value matrix. [Results] We found significant differences in users' sentiment toward a specific cell phone brand. Fashion-conscious women and IT industry workers favored this brand, and they could also persuade members of other groups to choose it. [Limitations] We only used common indicators to measure Weibo users' influence. [Conclusions] The proposed method could effectively identify consumers' characteristics and support precision marketing.
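Ward clustering is agglomerative: it starts with every user as a singleton and repeatedly merges the two clusters whose union adds the least within-cluster sum of squares. The naive sketch below shows that criterion on hypothetical user features (mean post sentiment, log10 follower count); the features, the toy data and `k=3` are assumptions for illustration, whereas the paper used nine groups and its own indicators.

```python
def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion: repeatedly merge the two
    clusters whose union adds the least within-cluster sum of squares, until k remain."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        dim = len(points[0])
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                sq_dist = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
                # Increase in within-cluster SSE if clusters a and b are merged
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical user features: (mean post sentiment, log10 follower count)
users = [(0.9, 5.0), (0.8, 5.1), (-0.7, 2.0), (-0.6, 2.1), (0.1, 3.5), (0.0, 3.6)]
result = ward_clusters(users, k=3)
print(result)
```

This cubic-time loop is only meant to make the criterion visible; for real data, `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=9, criterion="maxclust")` computes the same kind of partition efficiently.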

  • Bao Chuhan, Jia Danping, He Lin, Ma Xiaowen, Ai Yuxi
    Data Analysis and Knowledge Discovery. 2017, 1(10): 21-31. https://doi.org/10.11925/infotech.2096-3467.2017.0491

    [Objective] This paper studies the figures in Chinese articles in the field of library and information science (LIS), aiming to establish new principles for summarizing them. [Methods] We proposed a framework and rules for figure summarization based on manual indexing and the features of LIS papers. Then, we evaluated the performance of the new system with SPSS. [Results] Compared with the existing figure-text model, our method processed the information in figures more effectively. [Limitations] We need to extract more information from the figures, analyze the influence of different chart types, and add automatic indexing functions to the new system. [Conclusions] The proposed method could effectively summarize figures in scholarly articles.

  • Wang Zhongqun, Wu Dongsheng, Jiang Sheng, Huang Subin
    Data Analysis and Knowledge Discovery. 2017, 1(10): 32-42. https://doi.org/10.11925/infotech.2096-3467.2017.0408

    [Objective] This paper tries to select credible comments from a large number of online product reviews, aiming to help consumers make purchasing decisions. [Methods] First, we proposed the concept of the mainstream feature-opinion pair with the help of big data. Then, we established a credibility ranking model based on how widely each feature-opinion pair is recognized across different users' comments. [Results] We found that the mainstream feature-opinion pairs of online product reviews were relatively stable among users of Taobao, Tmall and Jingdong. Compared with existing models, the reviews ranked by our method covered more product features, and their helpfulness increased by 7.5%. [Limitations] We did not consider the specific semantic context of the comments when ranking their credibility. [Conclusions] The more mainstream feature-opinion pairs a comment contains, the more credible it is.
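The Conclusions state that a comment containing more mainstream feature-opinion pairs is more credible. A minimal sketch of that idea, assuming the pairs have already been extracted from each review: a pair counts as "mainstream" if at least a hypothetical share `min_support` of reviews mention it, and each review is scored by the summed support of the mainstream pairs it contains. The threshold, the support-weighted score and the toy reviews are assumptions, not the paper's model.

```python
from collections import Counter


def mainstream_pairs(reviews, min_support=0.3):
    """Return {pair: support} for feature-opinion pairs mentioned by at least
    `min_support` (a hypothetical threshold) of all reviews."""
    counts = Counter(p for pairs in reviews.values() for p in set(pairs))
    n = len(reviews)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def rank_by_credibility(reviews, min_support=0.3):
    """Rank review ids by the summed support of the mainstream pairs they contain."""
    main = mainstream_pairs(reviews, min_support)

    def score(rid):
        return sum(main.get(p, 0.0) for p in set(reviews[rid]))

    return sorted(reviews, key=score, reverse=True)


reviews = {
    "r1": [("battery", "long-lasting"), ("screen", "clear")],
    "r2": [("battery", "long-lasting")],
    "r3": [("battery", "long-lasting"), ("screen", "clear"), ("price", "high")],
    "r4": [("box", "ugly")],
}
ranking = rank_by_credibility(reviews)
print(ranking)
```

Weighting by support rather than simply counting pairs is one design choice among several; a plain count of mainstream pairs would follow the Conclusions sentence more literally.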

  • Li Xiangdong, Ruan Tao, Liu Kang
    Data Analysis and Knowledge Discovery. 2017, 1(10): 43-52. https://doi.org/10.11925/infotech.2096-3467.2017.0702

    [Objective] This paper aims to improve the performance of text classification systems through Wikipedia-based feature expansion. [Methods] First, we established the CDFmax-IDF method, based on a modified TF-IDF, to retrieve the candidate word list. Then, we used Wikipedia to extend the document features and computed the relationships among direct links, categories and indirect links, which determined the semantic relevance of the words. Finally, we proposed an improved LDA model, wLDA, to model the extended features and texts. [Results] The proposed method improved the macro-F1 and micro-F1 of Naive Bayes, KNN and SVM classifiers by 1.6%-2.8% and 1.4%-2.7%, respectively. [Limitations] We did not consider the properties of the words or the relationships among them. [Conclusions] Feature expansion based on Wikipedia improves the effectiveness of automatic document classification.
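The Results report gains in both macro-F1 and micro-F1. The two differ in where averaging happens: macro-F1 computes F1 per class and averages the scores, while micro-F1 pools true positives, false positives and false negatives across classes first. The sketch below illustrates these standard definitions on toy labels; it does not reproduce the paper's CDFmax-IDF or wLDA steps, whose formulas the abstract does not give.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_  # F1 = 2TP / (2TP + FP + FN)
        return 2 * tp_ / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


y_true = ["sport", "sport", "sport", "tech"]
y_pred = ["sport", "sport", "tech", "tech"]
macro, micro = f1_scores(y_true, y_pred)
print(macro, micro)
```

For single-label classification, micro-F1 equals accuracy (here 3/4), while macro-F1 gives the smaller "tech" class equal weight. `sklearn.metrics.f1_score` with `average="macro"` or `average="micro"` computes the same quantities.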

  • Han Pu, Wang Peng
    Data Analysis and Knowledge Discovery. 2017, 1(10): 53-63. https://doi.org/10.11925/infotech.2096-3467.2017.0503

    [Objective] This article explores the status and process of information dissemination in social network systems, aiming to reveal the mechanism of online information evolution. [Methods] First, we added adjustable parameters to the scale-free network model and the infectious disease model. Then, we ran the modified model on the NetLogo platform to simulate the evolution of public opinion. [Results] We found that a variable propagation rate better described the online information dissemination process, and that information flow in a large network could be effectively guided and controlled during the stage when the propagation rate was increasing. [Limitations] We need a better classification method for the target population. [Conclusions] The proposed model could simulate information evolution and support the monitoring, guidance and control of online public opinion.
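The abstract does not publish the NetLogo model, so the following is only a Python sketch of the general setup it describes: an infectious-disease (SIR) process running on a Barabási-Albert scale-free network, with a propagation rate that changes over time. The schedule `rising_then_falling`, the recovery rate `gamma` and the network size are hypothetical values chosen for illustration.

```python
import random


def barabasi_albert(n, m, rng):
    """Scale-free graph by preferential attachment: each new node links to m
    distinct existing nodes chosen with probability proportional to degree."""
    adj = {v: set() for v in range(n)}
    for i in range(m + 1):  # seed: complete graph on m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    pool = [v for v in range(m + 1) for _ in range(m)]  # one entry per edge end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            pool.append(t)
        pool.extend([new] * m)
    return adj


def simulate_sir(adj, beta_schedule, gamma, seeds, rng, max_steps=1000):
    """Discrete-time SIR spreading; beta_schedule(t) is the propagation rate at step t.
    Returns one (S, I, R) count tuple per step."""
    state = {v: "S" for v in adj}
    for s in seeds:
        state[s] = "I"
    history = []
    for t in range(max_steps):
        counts = tuple(sum(1 for x in state.values() if x == c) for c in "SIR")
        history.append(counts)
        if counts[1] == 0:  # no spreaders left
            break
        beta = beta_schedule(t)
        nxt = dict(state)
        for v, x in state.items():
            if x != "I":
                continue
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    nxt[u] = "I"
            if rng.random() < gamma:
                nxt[v] = "R"
        state = nxt
    return history


def rising_then_falling(t):
    """Hypothetical schedule: the rate rises for 10 steps, then decays."""
    return 0.05 + 0.02 * t if t < 10 else 0.25 * 0.8 ** (t - 10)


rng = random.Random(7)
graph = barabasi_albert(500, 2, rng)
hist = simulate_sir(graph, rising_then_falling, gamma=0.2, seeds=[0], rng=rng)
print(hist[-1])
```

Passing a constant function such as `lambda t: 0.1` gives the fixed-rate baseline, so the effect of a changing propagation rate can be compared directly.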

  • Wu Cong, Zhao Yuxiang, Zhu Qinghua
    Data Analysis and Knowledge Discovery. 2017, 1(10): 64-76. https://doi.org/10.11925/infotech.2096-3467.2017.0475

    [Objective] This paper analyzes the characteristics of crowdfunding videos in terms of their status quo, contents and features. [Methods] Based on the theory of task presentation affordance, we first constructed a two-dimensional (initiator and participant) framework to analyze the contents and formats of the videos. Then, we examined the proposed framework on zhongchou.com, a Chinese crowdfunding platform. [Results] We found that (I) videos could significantly improve the effectiveness of crowdfunding by attracting more supporters, yet few Chinese crowdfunding projects used videos; (II) the contents and characteristics of crowdfunding videos showed both similarities and differences. [Limitations] The analytical framework needs to be extended, refined, and then tested on other crowdfunding platforms. [Conclusions] This study helps optimize the design of crowdfunding platforms and improve the performance of crowdfunding projects.

  • Jia Junzhi, Li Xiao
    Data Analysis and Knowledge Discovery. 2017, 1(10): 77-84. https://doi.org/10.11925/infotech.2096-3467.2017.0366

    [Objective] This paper examines the use of owl:sameAs links in the Web of Data. [Methods] First, we extracted owl:sameAs links from the BTC 2014 dataset. Then, we analyzed the structure of the sample data, as well as their domain names and instance types. [Results] The retrieved owl:sameAs links were sparse, and most entities were connected to only one other entity. [Limitations] Our sample was small, and a more comprehensive analysis is needed. [Conclusions] Our study lays a foundation for data integration, ontology alignment and knowledge discovery on the Web of Data.
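The sparsity finding ("most entities were connected to only one other entity") comes down to counting owl:sameAs links per entity. A minimal sketch, assuming the data arrive as N-Triples or N-Quads lines and that only statements whose subject, predicate and object are all IRIs matter; the regular expression is a simplification rather than a full RDF parser, and the sample IRIs are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Matches lines whose subject, predicate and object are all IRIs; a trailing graph
# label (N-Quads) is ignored. This is a simplification, not a full RDF parser.
IRI_TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>")


def same_as_stats(lines):
    """Count owl:sameAs links per entity and per (subject domain, object domain) pair."""
    degree = Counter()
    domain_pairs = Counter()
    for line in lines:
        m = IRI_TRIPLE.match(line)
        if not m or m.group(2) != SAME_AS:
            continue  # not an IRI-to-IRI owl:sameAs statement
        s, o = m.group(1), m.group(3)
        degree[s] += 1
        degree[o] += 1
        domain_pairs[(urlparse(s).netloc, urlparse(o).netloc)] += 1
    single = sum(1 for d in degree.values() if d == 1)
    share_single = single / len(degree) if degree else 0.0
    return degree, domain_pairs, share_single


sample = [
    "<http://a.example.org/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <http://b.example.net/Q1> <http://g.example/1> .",
    "<http://a.example.org/Paris> <http://www.w3.org/2002/07/owl#sameAs> <http://b.example.net/Q2> .",
    "<http://a.example.org/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <http://c.example.com/b> .",
    '<http://a.example.org/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin" .',
]
degree, pairs, share = same_as_stats(sample)
print(share, pairs.most_common(1))
```

In this toy sample four of the five linked entities have exactly one owl:sameAs link; the domain-pair counter gives the kind of domain-name breakdown the Methods mention.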

  • Wang Sili, Liu Wei, Zhu Zhongming, Wu Zhiqiang, Wang Jinping
    Data Analysis and Knowledge Discovery. 2017, 1(10): 85-93. https://doi.org/10.11925/infotech.2096-3467.2017.0783

    [Objective] This paper proposes a new system to automatically track, acquire, store and manage scientific information, aiming to support research in related fields. [Methods] We developed the new system based on CSpace and resolved many technical issues. Then, we tested the new system with marine information. [Results] The proposed system could automatically retrieve multi-source heterogeneous scientific information, supporting the construction of a science and technology platform. [Limitations] The information acquisition procedure of the new system was complex, and it could not retrieve documents from password-protected sites. [Conclusions] The proposed method could expand CSpace's data acquisition and integration functions, and might be transferred to other fields.

  • Wei Xing, Hu Dehua, Yi Minhan, Zhu Qizhen, Zhu Wenjie
    Data Analysis and Knowledge Discovery. 2017, 1(10): 94-104. https://doi.org/10.11925/infotech.2096-3467.2017.0641

    [Objective] This study aims to construct a disease-gene-drug correlation network for diabetes mellitus (DM). [Methods] First, we proposed a new data cube-based approach to construct the disease-gene-drug correlation network for DM. Then, we measured the associations among the biological entities. [Results] Using data retrieved from the PubMed database, we constructed three 1-D vertex cubes, three 2-D square cubes and one 3-D disease-gene-drug network, which revealed 411 associations among 14 subclasses of DM, 23 genes and 24 drugs. We also constructed 8 optimal disease-gene-drug subnetworks of DM. [Limitations] The data analysis involved some subjective judgments, and changes in user behavior may also influence the results. [Conclusions] The proposed algorithm outperforms existing ones and provides new directions for research on personalized medical treatment.
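The three 1-D, three 2-D and one 3-D cubes correspond to the three entity types taken one, two and three at a time. A minimal sketch, assuming each PubMed article has already been annotated with sets of disease, gene and drug mentions and that co-occurrence counts serve as the association measure; the annotation step, the measure and the placeholder entity names are assumptions, not the paper's method or results.

```python
from collections import Counter
from itertools import product

DIMS = ("disease", "gene", "drug")


def build_cubes(articles):
    """Build co-occurrence cubes from per-article entity sets:
    three 1-D cubes (one per type), three 2-D cubes (one per type pair)
    and one 3-D disease-gene-drug cube."""
    one_d = {d: Counter() for d in DIMS}
    two_d = {(a, b): Counter() for i, a in enumerate(DIMS) for b in DIMS[i + 1:]}
    three_d = Counter()
    for art in articles:
        for d in DIMS:
            one_d[d].update(art[d])  # each entity counted once per article
        for (a, b), cube in two_d.items():
            cube.update(product(art[a], art[b]))  # every pair mentioned together
        three_d.update(product(art["disease"], art["gene"], art["drug"]))
    return one_d, two_d, three_d


# Placeholder annotations for three hypothetical articles
articles = [
    {"disease": {"T2DM"}, "gene": {"GENE_A"}, "drug": {"DRUG_X"}},
    {"disease": {"T2DM"}, "gene": {"GENE_A", "GENE_B"}, "drug": {"DRUG_X"}},
    {"disease": {"T1DM"}, "gene": set(), "drug": {"DRUG_Y"}},
]
one_d, two_d, three_d = build_cubes(articles)
print(three_d.most_common())
```

An article with no gene mention contributes to the 1-D and disease-drug cubes but not to the 3-D cube, because the Cartesian product with an empty set is empty.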