Data Analysis and Knowledge Discovery

Analyzing Interaction of MOOC Users with Iteration Super Centrality

Wu Jiang,He Chaocheng,Ma Panhao

Data Analysis and Knowledge Discovery. 2017, 1(8): 1-8. https://doi.org/10.11925/infotech.2096-3467.2017.08.01

[Objective] This paper evaluates the activity level of MOOC forum participants and the quality of the forum themes, aiming to improve the participation of forum users and increase their social impact. [Methods] We proposed a new concept and algorithm, “Iterative Super Centrality”, which iterates until convergence. We used the nodes of the entire network to determine their importance and influence. [Results] The proposed ISCen (Iterative Super Centrality) algorithm could measure the importance of nodes and their ability to disseminate knowledge. [Limitations] We only examined one course and did not analyze the super-network indicators. [Conclusions] “Iterative Super Centrality” can reveal the activity level of the forum participants and the quality of the online content, and thereby help improve MOOC services.

Personalized Book Recommendation Based on User Preferences and Commodity Features

Hou Yinxiu,Li Weiqing,Wang Weijun,Zhang Tingting

Data Analysis and Knowledge Discovery. 2017, 1(8): 9-17. https://doi.org/10.11925/infotech.2096-3467.2017.08.02

[Objective] This paper identifies the fine-grained preferences of online bookstore users, aiming to optimize the personalized book recommendation service. [Methods] First, we conducted sentiment analysis of the book features through readers’ comments, which indicated their preferences. Then, we calculated the books’ sentiment scores based on the readers’ comments. Finally, the user preference matrix and the sentiment score matrix were matched to personalize the book recommendation. [Results] We retrieved the needed data from Amazon’s book comments, and then conducted an experiment to compare the results of our new method with those of traditional collaborative filtering methods. We found that the proposed method improved the precision, recall and coverage by 0.030, 0.097 and 0.2812, respectively. [Limitations] We did not consider the impact of time on users’ preferences, and the feature types might not be comprehensive due to the limited number and quality of Amazon’s book comments. [Conclusions] The proposed method improves the performance of the personalized book recommendation service.
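
The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that a user's preferences are per-feature weights and that each book carries per-feature sentiment scores, and it ranks books by their dot product. The feature names, weights, scores, and the `recommend` function are hypothetical and are not taken from the paper.

```python
def recommend(user_prefs, book_scores, top_n=2):
    """Rank books by the dot product of preference weights and sentiment scores."""
    ranked = []
    for title, scores in book_scores.items():
        # A feature missing from a book contributes nothing to the match.
        match = sum(weight * scores.get(feature, 0.0)
                    for feature, weight in user_prefs.items())
        ranked.append((title, match))
    # Highest match first; ties are broken alphabetically by title.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


# Hypothetical data: one user's feature weights and three books' feature scores.
user_prefs = {"plot": 0.6, "price": 0.1, "printing": 0.3}
book_scores = {
    "Book A": {"plot": 0.9, "price": 0.2, "printing": 0.5},
    "Book B": {"plot": 0.4, "price": 0.9, "printing": 0.8},
    "Book C": {"plot": 0.1, "price": 0.5, "printing": 0.2},
}

for title, match in recommend(user_prefs, book_scores):
    print(title, round(match, 2))
```

With these made-up numbers the plot-oriented user is matched to "Book A" first, because plot carries most of the weight.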

Identifying Topics of Online Public Opinion

Li Zhen,Ding Shengchun,Wang Nan

Data Analysis and Knowledge Discovery. 2017, 1(8): 18-30. https://doi.org/10.11925/infotech.2096-3467.2017.08.03

[Objective] This paper aims to identify the topics of online public opinion. [Methods] We constructed a model to extract public opinion based on the information content of Weibo posts, the relationships among users, and user behaviors. [Results] We built a public opinion network, extracted and clustered relevant topics, and constructed a two-mode “user-topic” network together with the evolution of the opinion topics. The proposed method could identify topics of online public opinion effectively. [Limitations] The influence of users’ attributes on topic identification remains to be investigated. [Conclusions] We could identify the topics of online public opinion based on social network analysis with the help of the LDA model.

Analyzing Private College Students’ Online Lifestyle with Web-logs

Chen Runwen,Qiu Yong,Huang Wenbin,Wang Jun

Data Analysis and Knowledge Discovery. 2017, 1(8): 31-38. https://doi.org/10.11925/infotech.2096-3467.2017.0511

[Objective] This study reveals private college students’ typical online lifestyles based on their usage of a navigational Web portal. [Methods] First, we collected the click and search data of the navigation page specifically designed for students. Then, we modeled the data and applied the K-means clustering algorithm to categorize the student behaviors. [Results] We found six major behaviors among private college students. These students mainly use the Web to watch videos, while only a small number of them use the Web to learn. [Limitations] The size and dimensions of the data need to be expanded. [Conclusions] This study identifies typical online lifestyles of private college students, which could help schools improve their administration and services.
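
As a rough illustration of the clustering step, the following is a minimal pure-Python K-means sketch. It assumes each student is represented by a vector of click counts per content category; the two-dimensional data (video clicks, study clicks), the choice of `k=2`, and the random initialization are invented for the example and do not reflect the paper's data or its six clusters.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical (video clicks, study clicks) per student.
students = [(9, 1), (8, 2), (10, 0), (1, 9), (0, 10), (2, 8)]
centroids, labels = kmeans(students, k=2)
print(labels)
```

On real log data the features would normally be normalized first, and the number of clusters would be chosen by inspection or by a criterion such as the within-cluster sum of squares.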

Collaboration Recommendation of Finance Research Based on Multi-feature Fusion

Yu Chuanming,Gong Yutian,Zhao Xiaoli,An Lu

Data Analysis and Knowledge Discovery. 2017, 1(8): 39-47. https://doi.org/10.11925/infotech.2096-3467.2017.08.05

[Objective] Research collaboration builds an important social network system. This paper proposes a new recommendation model for research collaboration in finance, aiming to promote scientific collaboration and improve research productivity. [Methods] First, we established the scientific collaboration networks at the individual, institutional and regional levels. Then, we established a recommendation model based on network neighbors and paths. Finally, we conducted an empirical study to examine the model at the three levels. [Results] A total of 68 905 articles published from 2000 to 2014 on finance were analyzed to construct their research collaboration networks. The AUC values of the proposed model at the individual, institutional and regional levels were 84.25%, 87.34%, and 91.84%, respectively, which were higher than those of the traditional algorithms. [Limitations] The training and testing sets were only split by time. More segmentation methods are needed to optimize the new model. [Conclusions] This study helps researchers find collaboration opportunities, and provides new directions for studies on scientific collaboration networks.
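
To make the evaluation concrete, here is a small sketch of neighbor-based link prediction scored with AUC. It uses the common-neighbors index, which is one of the traditional neighbor-based baselines and is not the multi-feature model proposed in the paper. The toy co-authorship graph, the held-out pair, and all function names are assumptions made for illustration.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) collaboration pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of collaborators shared by u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def auc(positive_scores, negative_scores):
    """Probability that a held-out true link outscores a non-link (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical training network (earlier period) and one held-out later collaboration.
train_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
held_out = [("A", "D")]
non_links = [("A", "E"), ("B", "E"), ("C", "E")]

adj = build_adjacency(train_edges)
pos = [common_neighbors(adj, u, v) for u, v in held_out]
neg = [common_neighbors(adj, u, v) for u, v in non_links]
print(auc(pos, neg))
```

Splitting the edges by publication year, as the abstract describes, corresponds to choosing `train_edges` from the earlier period and `held_out` from the later one.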

Predicting Online Users’ Ratings with Comments

Zhang Hongli,Liu Jiying,Yang Sinan,Xu Jian

Data Analysis and Knowledge Discovery. 2017, 1(8): 48-58. https://doi.org/10.11925/infotech.2096-3467.2017.08.06

[Objective] This study aims to build an effective prediction mechanism for online ratings, with the help of Web users’ comments. [Methods] We proposed a model with the following modules: Web users’ comment acquisition, predictive variable acquisition, prediction analysis, and evaluation of the prediction results. We retrieved 30 movies of different types, along with users’ comments, from the Web. We used 27 movies to build the model, which was then examined with the remaining movies. [Results] We employed stepwise regression to select variables, which included the number of raters, the number of participants posting comments, the number of people who wanted to watch the movie, and the sentiment value of the positive comments. The prediction results were quite close to the IMDb scores, and the maximum and the minimum differences were 0.0644 and 0.0227. [Limitations] The sample size, the accuracy of sentiment features, and the compatibility of the model could be improved. [Conclusions] The proposed model effectively predicts movie scores and detects the “water army” (paid posters) online.

Evaluating Business Reputation with E-Commerce Comments

Wang Yu,Li Xiuxiu

Data Analysis and Knowledge Discovery. 2017, 1(8): 59-67. https://doi.org/10.11925/infotech.2096-3467.2017.08.07

[Objective] This paper proposes a new method to evaluate business reputation based on e-commerce comments. [Methods] First, we modified the keyword extraction and clustering algorithm based on the HNC theory and text mining methods. Then, we extracted the cluster labels and calculated the weight of each cluster of the collected comments. [Results] We established a business reputation dimension system, using cellphone users’ reviews posted on the Jingdong Online Shopping Platform. [Limitations] Some of the word symbols were generated manually due to the incomplete HNC thesaurus, which hinders larger-scale comment analysis. [Conclusions] The business reputation evaluation system can identify the commodity features that users really care about.

Identifying Key Nodes in Social Network with Improved PageRank Algorithm

Chen Xiaowei,Shi Yutian

Data Analysis and Knowledge Discovery. 2017, 1(8): 68-75. https://doi.org/10.11925/infotech.2096-3467.2017.08.08

[Objective] This paper modifies the PageRank algorithm for signed networks, aiming to identify the key nodes in social networks. [Methods] Based on the theory of signed networks, we proposed the KeyRank algorithm, which combined the PageRank algorithm with node centrality. We examined the new algorithm with user data from the Slashdot website to obtain every user’s ranking. [Results] The rankings of the PageRank algorithm, in-degree and the M-PR algorithm had a significant, medium-level positive correlation with the rankings obtained with the KeyRank algorithm. [Limitations] The KeyRank algorithm ignored the interactions between the positive and negative links in each iteration. [Conclusions] There is a difference between the rankings of nodes by the traditional and KeyRank algorithms. The signed links have an important impact on the rankings, which shows the improved algorithm’s theoretical and practical significance.
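
For reference, the baseline that KeyRank modifies can be sketched as a power iteration. The code below is standard, unsigned PageRank only; it does not implement KeyRank or any handling of negative links, whose details the abstract does not give. The toy link graph, the damping factor of 0.85, and the tolerance are assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration on a dict of node -> list of out-links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical directed graph: three users link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

In a signed network such as Slashdot's friend and foe links, a ranking of this kind treats every link as an endorsement, which is the limitation a signed variant sets out to address.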

Patent Classification Based on Multi-feature and Multi-classifier Integration

Jia Shanshan,Liu Chang,Sun Lianying,Liu Xiaoan,Peng Tao

Data Analysis and Knowledge Discovery. 2017, 1(8): 76-84. https://doi.org/10.11925/infotech.2096-3467.2017.08.09

[Objective] This paper aims to automatically assign the correct IPC to patent applications with the help of a multi-feature and multi-classifier integration method. [Methods] First, we extracted the TFIDF features of all dictionaries and information gains, as well as the vector features of document and topic models from patent applications. Then, we used the collected data to train the NB, SVM, and AdaBoost classifiers. Finally, we established the feature-class matrix and predicted the final IPC with the F1 weight matrix. [Results] We examined our new method with 10 patent classes from 2014 to 2016 in the field of engines and pumps. The accuracy of top prediction, all categories, and two guesses was 78.9%, 80.1% and 91.2%, respectively. [Limitations] The size of the training corpus is limited, as it only includes three years of patent data. [Conclusions] The proposed method could effectively improve the accuracy of patent classification in the field of engines and pumps.
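
One plausible reading of the final integration step is a weighted vote in which each classifier's prediction counts in proportion to its F1 score for the predicted class. The sketch below illustrates that idea only; the paper's actual feature-class matrix is not described in the abstract, and the F1 values, the predictions, and the `weighted_vote` function are hypothetical. The labels F02B and F04B are used purely as example IPC subclasses.

```python
from collections import defaultdict


def weighted_vote(predictions, f1_weights):
    """Combine per-classifier predictions, weighting each vote by that classifier's F1 for the class."""
    totals = defaultdict(float)
    for classifier, label in predictions.items():
        totals[label] += f1_weights[classifier].get(label, 0.0)
    # Highest total first; ties are broken by label for determinism.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical predictions for one patent application.
predictions = {"NB": "F02B", "SVM": "F04B", "AdaBoost": "F04B"}
# Hypothetical per-class F1 scores measured on a validation set.
f1_weights = {
    "NB": {"F02B": 0.90, "F04B": 0.60},
    "SVM": {"F02B": 0.70, "F04B": 0.50},
    "AdaBoost": {"F02B": 0.65, "F04B": 0.55},
}

ranking = weighted_vote(predictions, f1_weights)
print(ranking[0][0])   # top prediction
print(ranking[:2])     # candidates for a "two guesses" evaluation
```

Returning the full ranking rather than a single label makes it straightforward to evaluate both top-prediction and two-guess accuracy from the same output.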

Analyzing Textual Sentiment Based on HNC Theory

Gao Ge,Luo Junmei,Wang Yu

Data Analysis and Knowledge Discovery. 2017, 1(8): 85-91. https://doi.org/10.11925/infotech.2096-3467.2017.08.10

[Objective] This study proposes a new method to conduct sentiment analysis with comment texts, aiming to deal with the issues posed by new online terms. [Methods] Based on the Hierarchical Network of Concepts (HNC) theory, we defined symbols for the new words, which could be processed more efficiently. [Results] The proposed method analyzed the sentiment of the textual message effectively. [Limitations] Our method could only process short texts, and we still need to manually create symbols for the new words. [Conclusions] We proposed an effective way to conduct sentiment analysis.

Interface Services and Applications of Open Data Platform

Weng Danyu,Zhai Jun,Yuan Changfeng,Lin Yan

Data Analysis and Knowledge Discovery. 2017, 1(8): 92-99. https://doi.org/10.11925/infotech.2096-3467.2017.0492

[Objective] This paper aims to identify the differences between the interface services of governmental open data platforms in China and the developing trends around the world. [Context] During the 13th Five-Year Plan period, China will build a national open data platform to promote the sharing and in-depth utilization of data at all levels, which demands a large number of interface services. [Methods] We analyzed the major issues facing the open data platform interface services in China, based on the popular international open data platforms and the W3C API best practices. We also introduced the open data protocol OData to discuss key steps for launching standardized API services. [Results] Building OData services and issuing documentation and machine-readable metadata could help our API services follow best practices. [Conclusions] Adopting internationally accepted standards could improve the user experience of interface services in China.
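
To show what a standardized OData interface looks like from the client side, the sketch below builds a query URL with the `$filter`, `$select`, and `$top` system query options defined by the OData protocol. The service root, the `Datasets` entity set, and the property names are hypothetical and do not refer to any real platform; the `build_odata_url` helper is likewise an assumption.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical service root; not a real open data platform.
SERVICE_ROOT = "https://data.example.org/odata"


def build_odata_url(entity_set, filter_expr=None, select=None, top=None):
    """Compose an OData request URL from a few system query options."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if select:
        options["$select"] = ",".join(select)
    if top is not None:
        options["$top"] = top
    url = f"{SERVICE_ROOT}/{entity_set}"
    if options:
        # Keep "$" and "," readable; percent-encode spaces and quotes.
        url += "?" + urlencode(options, safe="$,", quote_via=quote)
    return url


url = build_odata_url(
    "Datasets",
    filter_expr="city eq 'Qingdao'",
    select=["title", "updated"],
    top=5,
)
print(url)
print(parse_qs(urlsplit(url).query))  # decode the options back for inspection
```

An OData service also exposes a machine-readable description of its entity sets at `$metadata` under the service root, which corresponds to the metadata practice mentioned in the results.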

25 August 2017, Volume 1 Issue 8
