Data Analysis and Knowledge Discovery

Review of Expert Retrieval and Expert Ranking Studies

Ye Guanghui,Xia Lixin

Data Analysis and Knowledge Discovery. 2017, 1(2): 1-10. https://doi.org/10.11925/infotech.2096-3467.2017.02.01

[Objective] This paper reviews the expert retrieval and expert ranking literature to provide theoretical foundations for future studies. [Coverage] 65 papers were retrieved from the Web of Science (WOS), CNKI and other databases using the keywords “expert retrieval”, “expert ranking”, and “ranking fusion”. [Methods] We analyzed research evaluating expert retrieval and fusion rankings, aiming to solve the issues of insufficient expert coverage and heavy computation of expert features. [Results] We found that most expert retrieval systems adopted the relationship attribute fusion method, and the credibility of search results was decided by the users’ satisfaction and the quality of the retrieved documents. Expert ranking was established by FRM, PageRank, D-S theory, social network and complex network analysis. Empirical research showed that the fusion ranking results were generally better than the baseline ones. [Limitations] More comparative research on different ranking methods is needed. [Conclusions] Related studies help build expert consulting platforms from the perspectives of expert information organization, expert selection and expert opinion fusion.
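
As a rough illustration of what ranking fusion means in this setting, the sketch below merges several ranked lists of experts with a Borda count. The Borda count is used here only as a generic, well-known fusion rule; it is an assumption for illustration and is not one of the methods named in the abstract. The function name `borda_fuse` and the expert names are hypothetical.

```python
def borda_fuse(rankings):
    """Fuse several ranked lists of experts with a Borda count.

    Illustrative only: Borda count is a generic fusion rule, not the
    method of any paper covered by the review.
    """
    candidates = {c for ranking in rankings for c in ranking}
    n = len(candidates)
    scores = dict.fromkeys(candidates, 0)
    for ranking in rankings:
        # The top position earns n points, the next n - 1, and so on.
        # Experts missing from a list earn nothing from that list.
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Highest fused score first; ties broken alphabetically.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Three hypothetical base rankings of the same three experts.
fused = borda_fuse([
    ["ann", "bob", "cat"],
    ["bob", "ann", "cat"],
    ["bob", "cat", "ann"],
])
```

Here `bob` collects 8 points, `ann` 6 and `cat` 4, so the fused order places `bob` first even though one base ranking preferred `ann`.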

An Early Warning Algorithm for Public Opinion of Safety Emergency

Tian Shihai,Lyu Deli

Data Analysis and Knowledge Discovery. 2017, 1(2): 11-18. https://doi.org/10.11925/infotech.2096-3467.2017.02.02

[Objective] This study proposes a new early warning model to track public sentiment online, aiming to improve the transparency and response speed for safety emergencies. [Methods] We used the modified LSA+SVM algorithm to build an early warning model, which retrieved public opinion data by meta search. [Results] We examined the new model with three different incidents, and found it was practical and fast. The precision rate was 85.75% when the semantic dimension was kept at 10. [Limitations] This method was more effective for safety incidents that draw public attention and discussion. [Conclusions] The proposed algorithm helps us build an early warning system for public opinion, which provides suggestions to related companies and government organizations.

Identifying Hot Topics from Mobile Complaint Texts

Fang Xiaofei,Huang Xiaoxi,Wang Rongbo,Chen Zhiqun,Wang Xiaohua

Data Analysis and Knowledge Discovery. 2017, 1(2): 19-27. https://doi.org/10.11925/infotech.2096-3467.2017.02.03

[Objective] This paper aims to extract valuable information from a large amount of complaint texts with the help of Chinese message processing technologies. [Methods] First, we analyzed the characteristics of the complaint texts, and then clustered them with the k-means algorithm. Second, we extracted topics from the texts of each category with the LDA model. Meanwhile, we calculated the weight of each word in each topic, as well as the mean of the document probability distribution. Third, we analyzed the topics with the highest means and used the document supporting rates to identify the trending ones. [Results] The document supporting rates of the topics extracted by this study were three times higher than the average ones. [Limitations] We did not investigate the semantic relationship among the topics. [Conclusions] The LDA model is an effective method to detect hot topics in mobile complaints and points to directions for future studies.

Extracting Keywords with Modified TextRank Model

Xia Tian

Data Analysis and Knowledge Discovery. 2017, 1(2): 28-34. https://doi.org/10.11925/infotech.2096-3467.2017.02.04

[Objective] This study aims to improve the single document keyword extraction algorithm by adding the world knowledge vector from Wikipedia to the TextRank model. [Methods] First, we created a new word embedding model based on the Word2Vec model with Wikipedia’s Chinese data. Second, we clustered the nodes of the TextRank word graph to adjust the voting importance of each cluster. Third, we calculated the random walk probability with additional factors of coverage and location. Finally, we obtained the node scores through iterative computation of the transition matrix, and then selected the Top N words as the keywords. [Results] The performance of the new TextRank model was much better than other methods when the Top N value was less than or equal to 7. If we only retrieved three keywords, the F measure reached its maximum value, which was 3.374% higher than the best existing results. When the Top N value was larger than 7, the results were similar to the traditional TextRank method. [Limitations] The computation cost was increased due to the cluster analysis. [Conclusions] The new weighted TextRank model could extract keywords effectively.
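
For orientation, the sketch below shows the traditional TextRank baseline that the abstract compares against: words become nodes of a co-occurrence graph and scores are obtained by iterating the random-walk update. It is not the modified model from the paper; the Word2Vec clustering and the coverage and location factors are deliberately left out. The function name `textrank_keywords`, the window size, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50, top_n=3):
    """Rank words with plain, unweighted TextRank (baseline sketch only)."""
    # Link every pair of distinct words that co-occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)

    # Iterate the update: each node shares its score equally among neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:top_n]


# Toy token sequence in which "hub" co-occurs with every other word.
top = textrank_keywords("hub a hub b hub c".split(), window=1, top_n=1)
```

In this toy graph `hub` is linked to all three other words, so it receives the highest score. A modified model along the lines of the abstract would replace the equal sharing of scores with weights derived from clusters, coverage and position.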

Constructing Dynamic Social Tag Cloud for User Interests

Xie Mengyao,Pan Xuwei

Data Analysis and Knowledge Discovery. 2017, 1(2): 35-40. https://doi.org/10.11925/infotech.2096-3467.2017.02.05

[Objective] Social tags can be used for the recommendation and navigation sections of information retrieval systems. This paper proposes a method to construct a dynamic user tag cloud based on temporal evolution to reveal the changes in user interests. [Methods] We established the tags’ dynamic weights with the forgetting and strengthening characteristics of memory in psychology. Thus, the dynamic user tag cloud reflects the user’s changing focus. [Results] Compared with the existing ones, the proposed algorithm could effectively sort the tags, and then make accurate predictions or recommendations. [Limitations] The proposed method performed well over long periods of time because a user’s interests do not change significantly in a short period of time. [Conclusions] The proposed algorithm could effectively identify a user’s interests and then improve personalized services.
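
One simple way to picture weights that both forget and strengthen is sketched below: every use of a tag contributes one unit of weight that decays exponentially with age, so repeated use strengthens a tag and disuse lets it fade. The exponential form, the 30-day half-life and the function name `dynamic_tag_weights` are assumptions for illustration; the abstract does not give the paper's actual weighting formula.

```python
import math


def dynamic_tag_weights(tag_events, now, half_life_days=30.0):
    """Sum exponentially decayed contributions of each tag use.

    tag_events: iterable of (tag, day) pairs, where day is a number of
    days on the same scale as `now`. The decay form is an assumption.
    """
    decay = math.log(2) / half_life_days
    weights = {}
    for tag, day in tag_events:
        age = now - day
        if age < 0:
            continue  # ignore events dated after `now`
        # Forgetting: older uses count less. Strengthening: uses add up.
        weights[tag] = weights.get(tag, 0.0) + math.exp(-decay * age)
    return weights


# Hypothetical usage log: "travel" used once long ago, "python" twice recently.
weights = dynamic_tag_weights(
    [("travel", 0), ("python", 60), ("python", 90)], now=90
)
```

With a 30-day half-life, the single old use of `travel` has decayed to 0.125, while the two recent uses of `python` add up to 1.5, so `python` would be drawn larger in the tag cloud.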

Building Asian Tumor-patients Prognostic Model with Bayesian Network and SEER Database——Case Study of Non-Small Cell Lung Cancer

Yin Bincan,Xin Shichao,Zhang Han,Zhao Yuhong

Data Analysis and Knowledge Discovery. 2017, 1(2): 41-46. https://doi.org/10.11925/infotech.2096-3467.2017.02.06

[Objective] This study aims to improve the tumor-prognostic assessment for Asian patients who were diagnosed with Non-Small Cell Lung Cancer (NSCLC). The proposed model identifies the factors influencing the patients’ survival status and predicts their prognostic situation. [Methods] First, we used a single-factor statistical method and logistic regression to identify the prognostic variables. Second, we employed the Bayesian Network algorithm to construct the prognostic survival model for the Asian NSCLC patients. Finally, we compared the performance of our model with three other algorithms. [Results] The identified prognostic variables include age, tumor size, grade, tumor stage, as well as the lymph node ratio. The proposed model could predict NSCLC patients’ prognostic survival status effectively. [Limitations] The SEER database had a limited number of prognostic factors, which may influence the prediction accuracy. [Conclusions] The Bayesian Network could help us build an optimal prognosis model for cancer patients to improve their survival rates. The proposed model is better than the Decision Tree, Support Vector Machine and Artificial Neural Network models.

Identifying Chinese Microblog Author Gender Based on Dependency

Qi Ruihua

Data Analysis and Knowledge Discovery. 2017, 1(2): 58-63. https://doi.org/10.11925/infotech.2096-3467.2017.02.08

[Objective] This paper proposes a new method to identify the gender of Chinese microblog authors with the help of dependency features. [Methods] This study collected public posts from Tencent Microblogs and extracted the dependency features, which were analyzed and compared with existing vocabulary, structure, function word, and part-of-speech tagging features. [Results] A controlled experiment showed that the proposed method obtained the highest values of precision, recall and F-measure. [Limitations] The new method needs to be examined with a larger corpus. [Conclusions] The proposed method is the most effective way to identify the gender of microblog authors.

Segmenting Chinese Words from Food Safety Emergencies

Zhang Yue,Wang Dongbo,Zhu Danhao

Data Analysis and Knowledge Discovery. 2017, 1(2): 64-72. https://doi.org/10.11925/infotech.2096-3467.2017.02.09

[Objective] This paper examines automatic word segmentation models, which play a key role in building databases for food safety administration. We used the statistical learning method based on conditional random fields to segment words from food safety emergencies. [Methods] First, we analyzed the length of target words and conducted multiple experiments on the selection and template of word features for the automatic segmentation methods. Second, we identified the impacts of different features and templates on the segmentation results. [Results] We found that selecting more features might not yield better results due to feature interference. About 46.62% of the phrases from the corpus of food safety emergencies only contained two or three words. The first words before and after the current word in the feature template have a greater effect on the results. [Conclusions] We have identified the optimal feature and template for the automatic segmentation of words, and the F score reaches 92.88% with the 5Tag features.
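
To make the idea of tag-based segmentation concrete, the sketch below converts an already segmented sentence into per-character labels of the kind a conditional random field is trained on. It assumes one common five-label convention (B, B2, M, E, S); the abstract does not spell out which labels its "5Tag" scheme uses, so this tag set, the function name `to_five_tags` and the sample words are assumptions for illustration.

```python
def to_five_tags(segmented_words):
    """Label each character of each word with an assumed 5-tag scheme.

    S  = single-character word
    B  = first character, B2 = second character of a longer word
    M  = any further middle character, E = last character
    """
    tagged = []
    for word in segmented_words:
        n = len(word)
        if n == 0:
            continue  # skip empty tokens
        if n == 1:
            tags = ["S"]
        elif n == 2:
            tags = ["B", "E"]
        else:
            tags = ["B", "B2"] + ["M"] * (n - 3) + ["E"]
        tagged.extend(zip(word, tags))
    return tagged


# Hypothetical segmented input: two two-character words and one single character.
labels = to_five_tags(["食品", "安全", "的"])
```

A CRF feature template then describes each position through its surrounding characters, for example the character immediately before and immediately after the current one, which is the context the abstract reports as most influential.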

Analyzing Sentiments of Micro-blog Posts Based on Support Vector Machine

Yang Shuang,Chen Fen

Data Analysis and Knowledge Discovery. 2017, 1(2): 73-79. https://doi.org/10.11925/infotech.2096-3467.2017.02.10

[Objective] This paper proposes a new method based on the Support Vector Machine to monitor online public opinion. [Methods] We extracted fourteen linguistic characteristics of the micro-blog posts and analyzed their sentiments with the Support Vector Machine. [Results] The precision, recall and F value of the proposed method were 82.40%, 81.91%, and 82.10%, respectively. [Limitations] The size of the training corpus needs to be expanded. [Conclusions] The proposed method could effectively analyze the sentiments of micro-blog posts.

Constructing Users Profiles with Content and Gesture Behaviors

Wang Qiangbing,Zhang Chengzhi

Data Analysis and Knowledge Discovery. 2017, 1(2): 80-86. https://doi.org/10.11925/infotech.2096-3467.2017.02.11

[Objective] This paper constructs user profiles by gauging users’ interests from gesture behaviors and related contents in a mobile article reading system. [Context] Constructing user profiles from content and gesture behaviors can identify users’ mobile reading interests and profiles effectively. [Methods] First, we collected user gesture behaviors (such as tap, double tap, swipe, drag, pinch in/out) as well as the corresponding contents from a mobile article reading system. Second, we established the user model based on the collected data and reading time. [Results] Users could find their own reading interests while browsing papers with our system, which helps us build user profiles. [Conclusions] Users’ gesture behaviors reveal their reading interests, which could improve the performance of marketing and personalized recommendation systems.

A Sentiment Analysis Model Based on Temporal Characteristics of Travel Blogs

Cheng Cuiqiong,Xu Jian

Data Analysis and Knowledge Discovery. 2017, 1(2): 87-95. https://doi.org/10.11925/infotech.2096-3467.2017.02.12

[Objective] This study aims to find the temporal-distribution patterns of tourists’ attitudes towards their destinations through sentiment analysis of travel blogs. [Context] More and more tourists collect information on their destinations from travel blogs, which provide enormous business opportunities. [Methods] We proposed a sentiment analysis model based on the temporal characteristics of travel blogs. It includes the following modules: data collection, preprocessing, identifying sentiment words, weight calculation, and analysis. The model was examined with four types of travel blogs. [Results] The number of posts with “good” emotion was always higher than the others each month. The volatility of the “good”, “happiness” and “disgust” emotions was the highest in different months. The volatility of emotion over time was not correlated with the number of related travel blogs. There was no relationship between the peak/off seasons and the emotion of tourists. [Conclusions] The proposed model could identify changes in tourist sentiment over time, which provides new information for tourism managers and potential visitors.

25 February 2017, Volume 1 Issue 2
