Data Analysis and Knowledge Discovery

Impacts of “Poster-Follower” Sentiment on Stock Market Performance

Zhang Ning, Yin Lemin, He Lifeng

Data Analysis and Knowledge Discovery. 2018, 2(6): 1-12. https://doi.org/10.11925/infotech.2096-3467.2017.1174

[Objective] This paper investigates the relationship between the “Bullish Sentiment Index” (BSI) of online posts and their follow-up comments and the performance of the stock market. [Methods] First, we classified the sentiment of comments on the Shanghai Stock Exchange Composite Index using semantic analysis. Then, we determined the sentiment tendencies of these posts and constructed their “Poster-Follower” BSI. Finally, we used linear and nonlinear models to test the proposed method empirically. [Results] The BSI built with the proposed text mining method could effectively predict the stock market trend, especially returns. [Limitations] We considered only two sentiment polarities; more research is needed to capture sentiment strength. [Conclusions] The Bullish Sentiment Index could effectively predict the overall stock market trend by measuring investor sentiment.
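
As a rough illustration of how a bullish sentiment index can be computed from comments that have already been classified, the sketch below uses a log-ratio form that is common in the message-board sentiment literature. The abstract does not state the paper's formula, so the function name `bullish_sentiment_index`, the label strings, and the log-ratio form itself are assumptions.

```python
import math
from collections import Counter

def bullish_sentiment_index(labels):
    """Log-ratio bullishness for one period (assumed form, not the paper's).

    labels: iterable of "bullish" / "bearish" strings, one per classified comment.
    Returns ln((1 + n_bull) / (1 + n_bear)); a positive value means net bullish.
    """
    counts = Counter(labels)
    n_bull = counts["bullish"]
    n_bear = counts["bearish"]
    return math.log((1 + n_bull) / (1 + n_bear))

# Hypothetical one-day sample of classified comments.
day = ["bullish", "bullish", "bearish", "bullish"]
bsi = bullish_sentiment_index(day)
```

Adding 1 to both counts keeps the index defined on days with no bullish or no bearish comments, and an empty day yields 0.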

Incentive Investments on Information Security for Libraries: An Evolutionary Game-theory Approach

Zhu Guang, Feng Mining, Zhang Weiwei

Data Analysis and Knowledge Discovery. 2018, 2(6): 13-24. https://doi.org/10.11925/infotech.2096-3467.2017.1101

[Objective] This paper analyzes libraries’ investment in information security from the perspectives of benefit and cost, aiming to improve the effectiveness and efficiency of library security management. [Methods] First, we used evolutionary game theory to define two players: the library and the technical enterprise. Then, we explored their intentions to invest in information security. Third, we analyzed the benefits and costs of investment, the payoff matrices, and the evolutionarily stable strategies (ESS). Finally, we designed an incentive mechanism to encourage investment in information security. [Results] Investment by libraries and enterprises was correlated with benefit growth and cost reduction. When the benefit growth was small, the players were less likely to invest. Once the profit growth became large, the players tended to invest and then adopted different strategies. [Limitations] We did not design a nonlinear profit function. Other factors, such as users’ demands and advertising effects, should also be included. [Conclusions] This study promotes the development of information security management in libraries.

Identifying Competitive Intelligence Based on Knowledge Element

Sun Lin, Wang Yanzhang

Data Analysis and Knowledge Discovery. 2018, 2(6): 25-36. https://doi.org/10.11925/infotech.2096-3467.2017.0996

[Objective] This study tries to identify competitive intelligence based on implicitly correlated knowledge, aiming to help enterprises gain the upper hand in fierce competition. [Methods] First, we constructed a knowledge system for competitive intelligence based on the metadata. Then, we generated a network from the relationships among the attributes of these metadata. Finally, we identified competitive intelligence through similarity analysis and multi-attribute merging. [Results] We established a network of knowledge metadata properties covering the enterprise’s financial and sales indexes, R&D ability, and other resources. We identified the business ties based on the intelligence metadata of product HS, and merged the metadata of MGIS market planning. [Limitations] The proposed system could be improved with a larger sample size. [Conclusions] This study addresses the demands of complex relation identification and intelligence analysis. It also benefits competitive advantage evaluation, crisis warning, and decision making.

Analyzing Public Opinion from Microblog with Topic Clustering and Sentiment Intensity

Wang Xiufang, Sheng Shu, Lu Yan

Data Analysis and Knowledge Discovery. 2018, 2(6): 37-47. https://doi.org/10.11925/infotech.2096-3467.2017.1107

[Objective] This paper builds a model to monitor trending topics on microblogs, aiming to address text drift and the quantification of sentiment polarity. [Methods] First, we proposed a public opinion analysis model based on topic clustering and sentiment intensity. Then, we used time series regression analysis to predict sentiment changes in the trending topics. [Results] The prediction accuracy of our model reached 88.97%, about 7% higher than that of the iLab-Edinburgh model. [Limitations] More research is needed on early warning mechanisms for emergency events. [Conclusions] The proposed model could improve the accuracy of predicting sentiment changes, providing an effective way to analyze public opinion on microblogs.

Impacts and Corrections of Natural Weight on Nonlinear Sci-tech Reviews: Case Study of TOPSIS Method

Yu Liping, Song Xiayun, Wang Zuogong

Data Analysis and Knowledge Discovery. 2018, 2(6): 48-57. https://doi.org/10.11925/infotech.2096-3467.2017.1124

[Objective] This paper explores the implicit natural weights in science and technology evaluation indexes and proposes a method to address them. [Methods] First, we analyzed data on the JCR 2016 mathematics journals with the TOPSIS method to find the influence of natural weights on this nonlinear evaluation method. Then, we proposed a method that raises the dynamic maximum mean to the standardized level, aiming to eliminate these impacts. [Results] We found that natural weights had significant effects on nonlinear evaluation methods. For the weighted method, the design weights, the natural weights, and the evaluation method all affected the actual weights. For the non-weighted method, the natural weights and the evaluation method affected the actual weights. Eliminating the natural weights could effectively reduce the influence of the evaluation method on the actual weights, which helps the design weights play a bigger role. The distribution of index data also affected the actual weights. [Limitations] The proposed method is still an approximation algorithm and cannot yield exactly equal means. [Conclusions] To evaluate science and technology products fairly, we must pay attention to natural weights, which introduce a systematic error.
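
For readers unfamiliar with TOPSIS, the following is a minimal sketch of the standard method (vector normalization, weighting, distances to the ideal and anti-ideal solutions, closeness score). It does not implement the paper's natural-weight correction, which the abstract does not specify; the function name `topsis` and the sample data are hypothetical.

```python
import math

def topsis(matrix, weights, benefit):
    """Standard TOPSIS closeness scores, one per alternative.

    matrix:  rows are alternatives (e.g. journals), columns are indicators.
    weights: one design weight per indicator.
    benefit: one bool per indicator; True if larger values are better.
    Assumes no indicator column is all zeros and the rows are not all identical.
    """
    n_cols = len(weights)
    # Vector-normalize each column, then apply the design weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # Ideal and anti-ideal value for each indicator.
    best = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
            for j in range(n_cols)]
    worst = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
             for j in range(n_cols)]
    scores = []
    for r in v:
        d_best = math.sqrt(sum((r[j] - best[j]) ** 2 for j in range(n_cols)))
        d_worst = math.sqrt(sum((r[j] - worst[j]) ** 2 for j in range(n_cols)))
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical data: three journals, two benefit-type indicators.
data = [[3.0, 120.0], [1.5, 400.0], [2.0, 200.0]]
scores = topsis(data, weights=[0.5, 0.5], benefit=[True, True])
```

Because the score depends on squared distances in the normalized space, the spread of each indicator's values influences the ranking even when the design weights are equal.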

The Correlation Between Altmetrics and Citations

Wu Pengmin, Chen Ting, Wang Xiaomei

Data Analysis and Knowledge Discovery. 2018, 2(6): 58-69. https://doi.org/10.11925/infotech.2096-3467.2018.0354

[Objective] This paper studies the characteristics of Altmetrics for articles in high-quality journals, including their correlation with citation counts, differences among disciplines, and the contributions of sub-indicators. The findings are also compared with previous results. [Methods] We selected 68 journals from the Nature Index as data sources and used a machine learning method to classify the papers they published. Then, we used the Spearman correlation test to examine the relationship between Altmetrics and traditional citation indexes, as well as the contributions of sub-indicators in various disciplines. Finally, we evaluated the effectiveness of using Altmetrics to identify highly cited papers with ROC curve analysis. [Results] The performance of Altmetrics differed significantly among disciplines. In high-quality journals, the correlation between Altmetrics and citations was stronger, and the contributions of News, Blog, and Twitter to the Altmetrics were larger. Altmetrics could help identify highly cited papers. [Limitations] The data collection period is short, and the data set needs to be expanded based on the characteristics of the disciplines. [Conclusions] Compared with previous results based on full data sets, Altmetrics for articles in high-quality journals are distinctive, and the correlation between Altmetrics and citations is stronger.
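
The Spearman correlation used in the Methods can be sketched in a few lines: rank both variables (averaging ranks for ties) and take the Pearson correlation of the ranks. This is a generic illustration, not the authors' code; the function names and the per-paper values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-paper values: Altmetric-style scores and citation counts.
altmetric_scores = [12.0, 3.5, 40.0, 0.0, 7.0]
citations = [30, 4, 55, 1, 9]
rho = spearman(altmetric_scores, citations)
```

A rank-based measure suits this kind of data because both attention scores and citation counts are typically heavily skewed.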

Analyzing Growth Trends and Attachment Mode of Social Blog Tags

Ye Guanghui, Hu Jinglan, Xu Jian, Xia Lixin

Data Analysis and Knowledge Discovery. 2018, 2(6): 70-78. https://doi.org/10.11925/infotech.2096-3467.2017.1311

[Objective] This study reveals the formation mechanism of network nodes, aiming to examine the growth trend and attachment mode of social blog tags. [Methods] First, we proposed a tag growth model with the help of statistics and network analysis. Then, we established the categories of tag links and their corresponding counts, and summarized the connection rules of newly added tags. Finally, we defined indicators of degree dependency and examined whether the probability of tag connection follows the preferential attachment mode. [Results] Tag growth followed a linear pattern. The tag distribution had a single central peak with a steep left side and a gentle right side, which did not fit a power-law distribution. [Limitations] We did not explain the impacts of users’ tagging behaviors on the network connections. [Conclusions] Neither the “new tag-old tag” nor the “old tag-old tag” model fully complies with the preferential attachment mode.

Identifying E-commerce User Types Based on Complex Network Overlapping Community

Qian Xiaodong, Li Min

Data Analysis and Knowledge Discovery. 2018, 2(6): 79-91. https://doi.org/10.11925/infotech.2096-3467.2018.0101

[Objective] This paper presents an algorithm to identify composite types of e-commerce users, aiming to improve e-commerce operators’ personalized marketing services. [Methods] First, we built the node distance matrix based on the characteristics of user access sequences. Then, we modified the Jaro-Winkler distance algorithm by redefining the matching number, the editing cost, and the rules. Third, we used the improved algorithm to calculate the distance matrix of user access sequences. Based on the distance matrix, we distinguished central from non-central users to construct a complex network for identifying composite user types. We used the improved CNM algorithm to obtain the initial user types, then optimized them with a fuzzy membership function to obtain the composite types. [Results] Compared with CONGA, the proposed algorithm improved the NMI by 15.60%. Applied to real users’ online data, its overall clustering coefficient was 10.87% higher than that of CONGA. The time complexity of the new algorithm was also lower. [Limitations] Three parameters of the proposed algorithm must be set subjectively. [Conclusions] The user network conforms to the characteristics of a small-world model and has the typical morphology of a complex network. The algorithm can effectively identify the composite types of e-commerce users.
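
As background for the Methods, here is a sketch of the standard, unmodified Jaro-Winkler similarity applied to access sequences. The paper's redefinitions of matching number, editing cost, and rules are not given in the abstract and are not reproduced here; the page-category sequences are hypothetical.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two sequences (0.0 to 1.0)."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Items match only if equal and no farther apart than this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, item in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == item:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched items that appear in a different order.
    m1 = [s1[i] for i in range(len(s1)) if matched1[i]]
    m2 = [s2[j] for j in range(len(s2)) if matched2[j]]
    transpositions = sum(a != b for a, b in zip(m1, m2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)

# Hypothetical access sequences: each item is a visited page category.
user_a = ["home", "search", "item", "cart", "pay"]
user_b = ["home", "search", "cart", "item", "pay"]
distance = 1 - jaro_winkler(user_a, user_b)
```

Computing `distance` for every pair of users would fill a distance matrix of the kind the abstract describes.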

Extracting Topics and Their Relationship from College Student Mentoring

Pang Beibei, Gou Juanqiong, Mu Wenxin

Data Analysis and Knowledge Discovery. 2018, 2(6): 92-101. https://doi.org/10.11925/infotech.2096-3467.2018.0066

[Objective] This paper proposes a framework for small-scale knowledge acquisition and modeling, aiming to manage the deep mentoring of college students more effectively. [Methods] First, we used LDA to identify the topics of the collected documents, as well as the phrases describing those topics. Second, we used concept hierarchy analysis to obtain the relations among these topics. Finally, we encoded the modeling results as an ontology for knowledge retrieval. [Results] This study further refined the granularity of topic knowledge on the basis of LDA modeling, which reduced the difficulty of modeling topics and describing their relationships. [Limitations] We did not examine the expanded knowledge base generated by new deep mentoring documents. [Conclusions] The proposed framework supports the modeling and retrieval of multi-granularity knowledge from deep mentoring, such as problem identification, communication methods, and guidance skills.

Collaborative Filtering Algorithm Based on Gray Correlation Analysis and Time Factor

Wang Daoping, Jiang Zhongyang, Zhang Boqing

Data Analysis and Knowledge Discovery. 2018, 2(6): 102-109. https://doi.org/10.11925/infotech.2096-3467.2018.0017

[Objective] This paper presents a collaborative filtering algorithm based on gray correlation analysis and a time factor, aiming to address the low similarity discrimination and user interest drift of traditional algorithms. [Methods] First, we proposed a new method to calculate user similarity based on the gray relational degree. Then, we used a time weight function to improve the Pearson correlation coefficient. Third, we created a hybrid similarity calculation method and made recommendations based on the neighbors of the target user. Finally, we used the MovieLens dataset to evaluate the new algorithm. [Results] Compared with traditional collaborative filtering algorithms and with those considering gray correlation analysis or the time factor alone, the proposed algorithm reduced the mean absolute error (MAE). [Limitations] The proposed algorithm takes longer to calculate the hybrid similarity. [Conclusions] The hybrid similarity method improves the accuracy of the items recommended to target users and has good prospects for commercial adoption.
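
To make the gray relational degree concrete, the sketch below computes the standard gray relational grade between a target user's ratings and other users' ratings on the same items, with the conventional distinguishing coefficient of 0.5. How the paper adapts this measure and combines it with the time-weighted Pearson coefficient is not stated in the abstract, so the function name and the sample ratings are assumptions.

```python
def gray_relational_grades(reference, candidates, rho=0.5):
    """Gray relational grade of each candidate sequence against the reference.

    reference:  ratings of the target user on the co-rated items.
    candidates: list of rating sequences, one per other user, same item order.
    rho:        distinguishing coefficient (0.5 is the conventional default).
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference exactly.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        # Relational coefficient per item, then the mean as the grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

# Hypothetical ratings on the same four co-rated items (1-5 scale).
target = [5, 3, 4, 1]
others = [[5, 3, 4, 2], [1, 5, 2, 4]]
grades = gray_relational_grades(target, others)
```

A higher grade means the candidate's rating pattern stays closer to the target's across the items, so the first user here would be the nearer neighbor.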

25 June 2018, Volume 2 Issue 6
