Data Analysis and Knowledge Discovery

Exploring User Mental Models of Online Music Classification System: Case Study of College Students

Xiang Xue, Yuxiang Zhao

Data Analysis and Knowledge Discovery. 2019, 3(2): 1-12. https://doi.org/10.11925/infotech.2096-3467.2018.0747

[Objective] This paper explores the classification system of an online music platform from the perspective of user experience, aiming to optimize music classification and retrieval functions. [Methods] Based on mental model theory, we chose Wangyi Yun Music as the experimental platform and invited college students as participants. We then conducted two rounds of experiments to investigate the static structure of users’ mental models for music information interactions. [Results] By clustering the experimental results, we obtained multi-level, single-level and hybrid user mental models. [Limitations] Because of the small sample size and the narrow age range of participants, our results might not be representative. We did not examine the impacts of geographical and cultural factors on user mental models. [Conclusions] The study provides a theoretical foundation and practical guidance for optimizing music retrieval and user experience.

Examining Reposts of Micro-bloggers with Planned Behavior Theory

Linna Xi, Yongxiang Dou

Data Analysis and Knowledge Discovery. 2019, 3(2): 13-20. https://doi.org/10.11925/infotech.2096-3467.2018.0424

[Objective] This paper explores the factors that influence the reposting behaviors of Microblog (Weibo) users. [Methods] Based on the theory of planned behavior, we evaluated the sentiment of Weibo users and the impacts of the Weibo timeline on users’ reposting behaviors. [Results] The degree of similarity between Weibo users’ real-world and online sentiments, as well as the number of followers, had significant impacts on users’ reposting behaviors. The timeline feature had little effect on reposting behaviors. [Limitations] We only examined users who logged in to Weibo at a specific time. [Conclusions] This study could improve the performance of public opinion management, personalized recommendation, and advertising campaigns on Weibo.

Recommending Personalized Contents from Cross-Domain Resources Based on Tags

Jiaxin Ye, Huixiang Xiong

Data Analysis and Knowledge Discovery. 2019, 3(2): 21-32. https://doi.org/10.11925/infotech.2096-3467.2018.0497

[Objective] This study tries to recommend personalized contents from cross-domain resources based on the relationships among online tags. [Methods] First, we proposed a cross-domain resource recommendation model. Second, we identified tags appropriate for cross-domain recommendations. Third, we combined the DBSCAN algorithm with the tag vector to obtain the initial recommendation candidates. Finally, we used the TF-IDF algorithm along with the personalized tags to improve the initial list. [Results] The recall, precision, and F-measure of the resource-based recommendation method were 0.82, 0.75, and 0.78. The recall, precision, and F-measure of the user-tag-based recommendation method were 0.80, 0.74, and 0.77. Our results were strongly correlated with users’ interests. [Limitations] The number of tags for the initial recommendation candidates was small, so the tags could not fully represent the resources. It is difficult to collect tags for the second-round recommendation. [Conclusions] Once tags from different domains are related to each other, we can use them to recommend contents from cross-domain resources.
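
The reported F-measures can be checked against the reported recall and precision. The sketch below assumes that "F-measure" here means the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall and precision as reported in the abstract above.
resource_based = f_measure(precision=0.75, recall=0.82)
user_tag_based = f_measure(precision=0.74, recall=0.80)

print(round(resource_based, 2))  # 0.78
print(round(user_tag_based, 2))  # 0.77
```

Under this assumption both rounded values agree with the figures given in the abstract.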

Conversational Topic Intensity Calculation and Evolution Analysis of WeChat Group

Hongqinling Wang, Zhichao Ba, Gang Li

Data Analysis and Knowledge Discovery. 2019, 3(2): 33-42. https://doi.org/10.11925/infotech.2096-3467.2018.0552

[Objective] This paper studies the characteristics of WeChat user interaction and information dissemination by exploring the topic structure and evolution within real WeChat groups. [Methods] Taking conversation samples from three typical WeChat groups as research objects, we introduced conversation analysis theory from linguistics to analyze the phenomena and characteristics of WeChat group conversations. We then designed a topic intensity calculation model based on member activeness, communication intensity and turn density, and explored the topic structure characteristics and evolution rules in different types of WeChat groups. [Results] The linguistic phenomena of WeChat group conversations and daily conversations show both similarities and differences. Including turn-taking in the topic intensity calculation model has obvious advantages over using the number of messages. Different types of WeChat groups have their own topic evolution rules. [Limitations] More types of WeChat groups could be included. [Conclusions] This study helps reveal how topics develop in WeChat groups, and is of great significance to the monitoring of Internet public opinion and disaster prevention.

Research on Stock Market Weighted Prediction Method Based on Micro-blog Sentiment Analysis

Mingqing Zhao, Shengqiang Wu

Data Analysis and Knowledge Discovery. 2019, 3(2): 43-51. https://doi.org/10.11925/infotech.2096-3467.2018.0546

[Objective] This paper aims to construct a stock market weighted prediction model based on micro-blog sentiment analysis. [Methods] Using the Baidu index, we selected the initial keywords for micro-blog search with the time difference correlation coefficient and a random forest. We then collected micro-blog posts with a Web crawler, processed the text with text mining techniques, judged the sentiment polarity of each post after word segmentation, analyzed the factors that affect a post's influence, and used information gain to determine the weight of each post. [Results] The trend of the aggregated sentiment is basically consistent with the trend of stock prices, and the prediction accuracy is higher. [Limitations] A better adjustment function for word frequency is needed. Feature selection does not take into account the relationships among features. [Conclusions] The empirical results show that the model predicts well.
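
A time difference correlation coefficient is commonly computed as the Pearson correlation between a candidate leading series and a target series shifted by a lag. The sketch below illustrates that general idea only; the abstract does not give the exact formula the authors used, and the function names and the two short series are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leading, target, lag):
    """Correlate leading[t] with target[t + lag], for lag >= 0."""
    n = min(len(leading), len(target))
    if lag < 0 or n - lag < 2:
        raise ValueError("lag leaves fewer than two overlapping points")
    return pearson(leading[: n - lag], target[lag:n])


def best_lag(leading, target, max_lag):
    """Lag in 0..max_lag with the largest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: abs(lagged_correlation(leading, target, lag)),
    )


# Hypothetical data: the target repeats the search index two steps later.
search_index = [1, 3, 2, 5, 4, 6, 8, 7]
stock_index = [0, 0, 1, 3, 2, 5, 4, 6]

print(best_lag(search_index, stock_index, max_lag=3))
```

A keyword whose search index has a high correlation at a positive lag would be a candidate leading indicator under this reading.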

Patent Technology Analysis of Microalgae Biofuel Industrial Chain Based on Topic Model

Jie Zhang, Junbo Zhao, Dongsheng Zhai, Ningning Sun

Data Analysis and Knowledge Discovery. 2019, 3(2): 52-64. https://doi.org/10.11925/infotech.2096-3467.2017.1319

[Objective] This paper analyzes the technologies of the microalgae biofuel industrial chain and their inheritance based on a topic model, aiming to promote technological innovation in this industry in China. [Methods] First, we constructed a model of the microalgae biofuel industrial chain, and built the mapping among the industrial chain, technical topics and patents with an improved LDA topic method. Then, we identified the R&D subjects and analyzed the technology development trend. Finally, we constructed a patent-weighted citation network based on semantic similarity to draw the patent development map for each industrial chain segment. [Results] On the algorithm side, the improved LDA method identifies topics more accurately. The study also reveals the development trend of microalgae biofuel industrial chain technology and the technical inheritance of industrial chain segments. [Limitations] This paper only focuses on microalgae biofuel industrial chain technology. Researchers need some background knowledge of the target industry before applying these models and results to other industries. [Conclusions] The study identifies the key technical segments and hot spots of the microalgae biofuel industrial chain, and shows that technological innovation in this field requires coordination among more than one segment.

A Study on the Mechanism of Media Collaboration on the Spread of Internet Public Opinion

Yanshuang Mei, Hengmin Zhu, Jing Wei

Data Analysis and Knowledge Discovery. 2019, 3(2): 65-71. https://doi.org/10.11925/infotech.2096-3467.2018.0613

[Objective] This paper studies the mechanism of media collaboration in topic propagation and its application in guiding and controlling the dissemination of public opinion topics. [Methods] We used simulation to construct a model of public opinion topic propagation under media synergy, and explored how the collaborating media, and the time point and duration of their intervention, influence the dissemination of public opinion topics. [Results] The simulation results show that, compared with a single medium, the collaborative network formed by multiple media plays a stronger role in promoting topic communication, and that this effect depends on when the media intervene and for how long. [Limitations] In the simulation experiment, the Internet public opinion carrier network is a real network, but the media collaboration network and the topic communication data all come from simulation. [Conclusions] Collaboration among media is an important way in which media act on topic propagation. Rational use of media synergy helps control and guide the dissemination of public opinion topics scientifically and efficiently.

Study on a Method of Feature Classification Selection Based on χ² Statistics

Zhanglu Tan, Zhaogang Wang, Han Hu

Data Analysis and Knowledge Discovery. 2019, 3(2): 72-78. https://doi.org/10.11925/infotech.2096-3467.2018.0509

[Objective] This paper improves the χ² statistic for feature selection. The traditional χ² statistic cannot guarantee a balance of information between categories, which harms classification performance. [Methods] By analyzing the feature selection process of the traditional χ² statistic and its limitations, we proposed a feature classification selection method based on the χ² statistic, which selects the feature words of each class according to the degree of correlation between the feature words and that class. [Results] We evaluated the effect of the improved method on text classification with SVM as the classification model. The results showed that the feature classification selection method based on the χ² statistic significantly improved the accuracy, the average classification accuracy, the lowest classification accuracy, the stability and the system running time. [Limitations] When the number of selected feature words was small, the difference before and after the improvement was not obvious. [Conclusions] The feature classification selection method based on the χ² statistic can effectively improve the stability and generalization performance of the classification model, reduce the fluctuation of classification accuracy and improve the efficiency of the classification process.
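
The idea of selecting feature words class by class can be sketched with the standard χ² statistic for a 2×2 term/class contingency table. This is a minimal illustration, not the authors' algorithm: the function names, the rule of keeping only positively associated terms, and the toy documents are all assumptions made for the example.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_per_class(docs, k):
    """Pick the top-k terms for each class separately.

    docs is a list of (label, set_of_terms) pairs. Selecting per class
    keeps small classes from being crowded out of a single global list.
    """
    labels = sorted({label for label, _ in docs})
    vocab = sorted({term for _, terms in docs for term in terms})
    selected = {}
    for label in labels:
        scores = {}
        for term in vocab:
            a = sum(1 for l, ts in docs if l == label and term in ts)
            b = sum(1 for l, ts in docs if l != label and term in ts)
            c = sum(1 for l, ts in docs if l == label and term not in ts)
            d = sum(1 for l, ts in docs if l != label and term not in ts)
            # Assumption: keep only terms positively associated with the class.
            if a * d > b * c:
                scores[term] = chi_square(a, b, c, d)
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[label] = ranked[:k]
    return selected


# Hypothetical toy corpus.
docs = [
    ("sports", {"match", "goal", "team"}),
    ("sports", {"goal", "team", "coach"}),
    ("finance", {"stock", "market", "team"}),
    ("finance", {"stock", "bank"}),
]

print(select_per_class(docs, k=1))
```

A single global ranking by χ² can leave some classes with few or no selected words; ranking within each class is one straightforward way to restore the balance the abstract describes.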

Automatically Rating Query Ambiguity with Alt-Metrics

Sisi Gui, Xiaojuan Zhang, Xin Wang

Data Analysis and Knowledge Discovery. 2019, 3(2): 79-89. https://doi.org/10.11925/infotech.2096-3467.2018.0449

[Objective] This paper aims to find better alt-metrics for automatically rating query ambiguity. [Methods] First, we chose several existing automatic metrics based on documents, users and queries. Then, we modified one of them with query category occurrences. Finally, we examined the relationship between the modified alt-metrics and other automatic or human rating metrics. Their correlations were tested with Pearson and symmetric AP correlation coefficients. Their degrees of agreement were tested with macro average accuracy and macro average F1. [Results] The proposed method showed a significant relationship with human rating, and achieved an F1 of 0.623 and an accuracy of 0.707. [Limitations] We only examined the proposed model with data from online directories. [Conclusions] Automatic rating metrics for query ambiguity can hardly be replaced by other automatic counterparts. Considering the occurrences of top-level categories for each query could improve the degrees of agreement for automatic metrics. Compared with the existing automatic metrics, the proposed method can be used to replace the human metrics for query ambiguity.

Construction of an Adverse Drug Reaction Extraction Model Based on Bi-LSTM and CRF

Xiaoxiao Zhu, Zunqi Yang, Jing Liu

Data Analysis and Knowledge Discovery. 2019, 3(2): 90-97. https://doi.org/10.11925/infotech.2096-3467.2018.0617

[Objective] To improve the extraction of adverse drug reactions from social media, this paper proposes a method for handling non-standard social media texts. [Methods] The method, Bi-LSTM-CRF, combines LSTM and CRF, and was implemented with TensorFlow. LSTM can use context information, while CRF can account for the dependence among output tags. We constructed an adverse drug reaction extraction model based on Bi-LSTM-CRF. [Results] We carried out a series of experiments on the Twitter dataset. The proposed Bi-LSTM-CRF method achieved the highest F-measure (0.7963) for adverse drug reaction extraction, compared with other methods including CRF, forward LSTM, backward LSTM, and Bi-LSTM. [Limitations] The experiments were performed on only one corpus, and the validity of the proposed method needs to be verified on other data sources. [Conclusions] Combining Bi-LSTM and CRF can effectively deal with non-standard texts in social media. The model constructed in this paper can identify adverse drug reactions effectively and support relevant departments in decision-making.
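
The role of the CRF layer, accounting for dependence among output tags, can be illustrated with Viterbi decoding over per-token scores plus tag-transition scores. This is a toy sketch, not the authors' TensorFlow model: in a Bi-LSTM-CRF the per-token (emission) scores would come from the Bi-LSTM and the transition scores would be learned, whereas every number, the tag set and the function name below are made up for illustration.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions: one {tag: score} dict per token.
    transitions: {(previous_tag, current_tag): score}; missing pairs score 0.
    """
    # best[t][tag] = (score of the best path ending in tag at t, previous tag)
    best = [{tag: (emissions[0][tag], None) for tag in tags}]
    for t in range(1, len(emissions)):
        column = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[t - 1][p][0] + transitions.get((p, cur), 0.0),
            )
            score = (
                best[t - 1][prev][0]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            column[cur] = (score, prev)
        best.append(column)
    # Backtrack from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag][0])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


tags = ["O", "B-ADR", "I-ADR"]
# Hypothetical scores for three tokens, e.g. "caused", "head", "ache".
emissions = [
    {"O": 2.0, "B-ADR": 0.1, "I-ADR": 0.0},
    {"O": 0.2, "B-ADR": 1.0, "I-ADR": 1.2},
    {"O": 0.3, "B-ADR": 0.2, "I-ADR": 1.5},
]
# An entity may not start with I-ADR right after O; B-ADR then I-ADR is favoured.
transitions = {("O", "I-ADR"): -5.0, ("B-ADR", "I-ADR"): 0.5}

# Picking the best tag per token independently gives an invalid sequence.
greedy = [max(tags, key=lambda tag: token[tag]) for token in emissions]
decoded = viterbi(emissions, transitions, tags)

print(greedy)   # ['O', 'I-ADR', 'I-ADR']
print(decoded)  # ['O', 'B-ADR', 'I-ADR']
```

The transition penalty is what turns the inconsistent per-token choice into a well-formed entity span.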

Constructing a Domain Sentiment Lexicon Based on Chinese Social Media Text

Cuiqing Jiang, Yibo Guo, Yao Liu

Data Analysis and Knowledge Discovery. 2019, 3(2): 98-107. https://doi.org/10.11925/infotech.2096-3467.2018.0578

[Objective] This study aims to construct a domain sentiment lexicon by discovering unrecognized sentiment words in user-generated contents on Chinese social media, and to apply the lexicon to sentiment analysis of automotive comments. [Methods] First, words in HowNet are selected as the seeds, and the PMI and Word2Vec algorithms are each used to calculate the sentiment polarity of the candidate words on a real automotive corpus. Then, the two sets of results are combined according to ensemble rules. Finally, sentiment classification experiments are compared to show that the proposed method is effective. [Results] The accuracy rate of the lexicon constructed with the proposed method is 21.6% higher than that of HowNet. The lexicons constructed by PMI and Word2Vec increase it by 3.7% and 2.1%, respectively. Meanwhile, the numbers of positive and negative sentiment words are greatly increased. [Limitations] The corpus comes from a single source, so the method has certain limitations in guiding other fields. [Conclusions] The sentiment lexicon constructed by this method can be applied effectively to sentiment analysis of social media texts.
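
One common way to score a candidate word with PMI and seed words is the semantic orientation measure: the sum of its PMI with positive seeds minus the sum of its PMI with negative seeds. The sketch below shows that general technique under stated assumptions; it is not the authors' implementation. The smoothing constant, the function names, the English toy documents and the seed words (which stand in for HowNet seeds) are all hypothetical.

```python
from math import log2


def pmi(word, seed, docs, smoothing=0.01):
    """Pointwise mutual information from document co-occurrence counts.

    docs is a list of token sets. The smoothing term (an assumption of
    this sketch) avoids log(0) when the two words never co-occur.
    """
    n = len(docs)
    both = sum(1 for d in docs if word in d and seed in d)
    word_count = sum(1 for d in docs if word in d)
    seed_count = sum(1 for d in docs if seed in d)
    if word_count == 0 or seed_count == 0:
        return 0.0
    return log2((both + smoothing) * n / (word_count * seed_count))


def so_pmi(word, docs, positive_seeds, negative_seeds):
    """Semantic orientation: positive if the word sides with positive seeds."""
    pos = sum(pmi(word, seed, docs) for seed in positive_seeds)
    neg = sum(pmi(word, seed, docs) for seed in negative_seeds)
    return pos - neg


# Hypothetical tokenized comments about cars.
docs = [
    {"engine", "smooth", "good"},
    {"smooth", "good", "seat"},
    {"noisy", "bad", "engine"},
    {"noisy", "bad"},
]
positive_seeds = {"good"}
negative_seeds = {"bad"}

for candidate in ("smooth", "noisy"):
    score = so_pmi(candidate, docs, positive_seeds, negative_seeds)
    print(candidate, "positive" if score > 0 else "negative")
```

A Word2Vec-based score could be combined with this one by a voting or threshold rule, which is the kind of ensemble step the abstract mentions without detailing.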

Knowledge Discovery of Online Health Communities with Weighted Knowledge Network

Juhua Wu, Yu Wang, Ming Li, Shaoyun Cai

Data Analysis and Knowledge Discovery. 2019, 3(2): 108-117. https://doi.org/10.11925/infotech.2096-3467.2018.0619

[Objective] This paper integrates knowledge from fragmented user messages, aiming to identify the needs of online health communities with information extraction and popular topic analysis techniques. [Methods] First, we used the Octopus Collector to retrieve posts from the BHC Forum of 39 Health Net. Then, we applied the weighted knowledge network model to explore the data. Finally, with the help of the ICTCLAS 2013, BibExcel and Ucinet packages, we conducted word segmentation, obtained word frequencies, and filtered and visualized the data. [Results] We constructed a user knowledge network and sub-networks for knowledge exchanges, user attention and the most popular topics. By combining word frequency and attention, we identified topics and the relationships among them. [Limitations] More research is needed to examine the changing topics of different online health communities, reply posts, and time spans. [Conclusions] This study addresses the issues of fragmented knowledge and users’ information needs. It supports website knowledge management and medical diagnosis.

The Construction of Digital Medical Information Service Evaluation System Based on User Perceived Value

Jian Li, Mingyue Wang, Luming Xu, Yingchun Tian

Data Analysis and Knowledge Discovery. 2019, 3(2): 118-126. https://doi.org/10.11925/infotech.2096-3467.2018.0488

[Objective] From the perspective of user perceived value, this paper constructs an interactive impact assessment system for medical information services that integrates users, information technology and hospital staff. [Methods] We calculated the proportions of statistical indicators and screened the indicators with principal component analysis, and used the Grey clustering method to determine the index correlation matrix and the critical value in order to optimize the indices. The combined method was then used to assess the quality of the medical information service evaluation system. [Results] The evaluation index system includes 58 indicators in 9 categories. The service value dimension has the largest weight (0.2059), and the risk cost dimension has the smallest weight (0.0405). [Limitations] The questionnaire data set is small. The index scores were determined by a few experts and are subjective to a certain degree. [Conclusions] In theory, the proposed evaluation system can provide a decision-making basis for the construction, planning and management of hospital medical information services. In practice, it can improve the level of hospital medical information services and user satisfaction.

25 February 2019, Volume 3 Issue 2
