Data Analysis and Knowledge Discovery

Review of Data Analysis Methods in Measuring Technology Fusion and Trend

Li Shuying,Fang Shu

Data Analysis and Knowledge Discovery. 2017, 1(7): 2-12. https://doi.org/10.11925/infotech.2096-3467.2017.0546

[Objective] This paper reviews the literature on technology convergence/fusion, aiming to explore research progress in the field and provide a reference for further studies. [Coverage] We retrieved 73 papers in Chinese and English from the Web of Science (WOS), CNKI and other databases using the keywords “Technology Convergence” or “Technology Fusion”. [Methods] We reviewed the concepts of technology convergence/fusion and the related data analysis methods. [Results] We found that the number of studies on technology convergence/fusion was increasing. Their data analysis methods used patents as indicators, illustrated evolution paths with patent citation networks, and established fusion tracks with co-classification analysis. [Limitations] More research is needed to compare the hybrid methods. [Conclusions] The data analysis methods for technology convergence/fusion still require much optimization, and many knowledge gaps remain to be filled.

Evaluating Brands of Agriculture Products: A Literature Review

Wang Xueying,Zhang Zixuan,Wang Hao,Deng Sanhong

Data Analysis and Knowledge Discovery. 2017, 1(7): 13-21. https://doi.org/10.11925/infotech.2096-3467.2017.0431

[Objective] This paper analyzes the titles of research evaluating brands of agriculture products in China, aiming to summarize the latest developments in this field. [Methods] First, we used k-means to cluster the retrieved titles. Then, we employed factor analysis, multidimensional scaling, and hierarchical clustering to examine the data. [Results] We identified the total number of articles published each year, as well as the research topics, brand types, evaluation methods and perspectives, and impact factors of these studies. [Limitations] We did not examine the keywords and abstracts of the selected literature. [Conclusions] The clustering results reveal the developments of related research. However, our study does not discuss types of products and methods of inter-brand evaluation.

Evolution Path and Hot Topics of Citizen Science Studies

Zhang Xuanhui,Zhao Yuxiang

Data Analysis and Knowledge Discovery. 2017, 1(7): 22-34. https://doi.org/10.11925/infotech.2096-3467.2017.07.04

[Objective] This paper investigates the origin and status quo of citizen science studies abroad, aiming to promote the development of similar research in China. [Coverage] We retrieved 1,796 papers from the Web of Science (WOS) core collection, using the keywords “citizen science” or “crowd science”. [Methods] We employed bibliometrics, social network analysis and content analysis, as well as visualization tools, to illustrate the evolution path of citizen science and its popular research topics. This paper also analyzed citizen science research in Library and Information Science. [Results] We found that crowd wisdom and the open science paradigm were highly emphasized thanks to the development of the Internet and mobile technologies. Citizen science had been growing rapidly and included the following perspectives: project, theoretical, and participant studies. Although the main focus of citizen science research was on natural science, Library and Information Science had produced promising outcomes. [Limitations] We did not include conference papers or full-text analysis in this study. [Conclusions] Library and Information Science could play an important role in the future study of citizen science.

Review of Information Retrieval Research: Case Study of Conference Papers

Yang Chaofan,Deng Zhonghua,Peng Xin,Liu Bin

Data Analysis and Knowledge Discovery. 2017, 1(7): 35-43. https://doi.org/10.11925/infotech.2096-3467.2017.07.05

[Objective] This paper reviews conference papers on information retrieval, aiming to identify the research hotspots and development trends in this field. [Coverage] Papers published by ACL, ACMMM, ICML, KDD, and SIGIR from 2012 to 2016. [Methods] We first collected the abstracts and keywords of these papers and processed them with a word segmentation package. Then, we analyzed the data with statistical tests. [Results] We found that mobile search was the most popular topic and that information retrieval models had been optimized. Filtering and recommendation received more attention from researchers. Information retrieval studies established close ties with artificial intelligence. User privacy protection and health information retrieval were also popular. [Limitations] We only collected the abstracts and keywords. More research is needed to study the full texts and citations. [Conclusions] This paper presents the latest developments in information retrieval research.

Detecting Online Rumors with Sentiment Analysis

Shou Huanrong,Deng Shuqing,Xu Jian

Data Analysis and Knowledge Discovery. 2017, 1(7): 44-51. https://doi.org/10.11925/infotech.2096-3467.2017.0479

[Objective] This paper aims to identify rumors automatically with the help of sentiment analysis. [Methods] First, we chose high-quality and low-quality information sources. Then, we calculated the sentiment value of the information from each source and the difference between them. Based on the assumption that information from high-quality sources was more reliable, information from low-quality channels could be listed as a rumor if the sentiment difference between them exceeded the pre-set threshold. [Results] We applied the proposed method to information on food and health as well as on health and medical issues, and successfully identified twenty-three rumors from thirty suspected cases. The accuracy rate of rumor detection was 76.67%, the F-value was 83.34%, and the recall and precision were 71.42% and 100%, respectively. For non-rumor messages, the F-value, recall, and precision were 72.73%, 100% and 57.14%, respectively. [Limitations] We did not extract the data automatically from different sources and the sample size was relatively small. [Conclusions] Sentiment analysis could help us identify rumors effectively.
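
The threshold rule described in the Methods can be sketched as follows. This is only an illustration under stated assumptions: the function name `flag_rumors`, the per-topic score dictionaries, and the example threshold are hypothetical, and the abstract does not say how the sentiment values themselves are computed.

```python
def flag_rumors(trusted_scores, candidate_scores, threshold):
    """Flag topics whose sentiment in a low-quality source deviates
    from the high-quality source by more than `threshold`.

    Both arguments map a topic identifier to a sentiment value
    (hypothetical representation, not taken from the paper).
    """
    flagged = []
    for topic, candidate in candidate_scores.items():
        if topic not in trusted_scores:
            continue  # no high-quality reference, so no comparison is possible
        if abs(candidate - trusted_scores[topic]) > threshold:
            flagged.append(topic)
    return flagged


# Hypothetical scores: only topic "a" differs by more than the threshold.
trusted = {"a": 0.1, "b": 0.5}
candidates = {"a": 0.9, "b": 0.6, "c": 1.0}
suspects = flag_rumors(trusted, candidates, threshold=0.5)
```

Topics without a high-quality counterpart are skipped here rather than flagged, which is a design choice of this sketch and not something the abstract specifies.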

Multi-Label Classification of Chinese Books with LSTM Model

Deng Sanhong,Fu Yuyangzi,Wang Hao

Data Analysis and Knowledge Discovery. 2017, 1(7): 52-60. https://doi.org/10.11925/infotech.2096-3467.2017.0484

[Objective] This paper proposes a new method for automatically cataloguing Chinese books based on the LSTM model, aiming to solve the issues facing single-label and multi-label classification. [Methods] First, we introduced deep learning algorithms to construct a new classification system with a character embedding technique. Then, we trained the LSTM model with strings consisting of titles and keywords. Finally, we constructed multiple binary classifiers, which were examined with bibliographic data from three universities. [Results] The proposed model performed well and had practical value. [Limitations] We only analyzed five categories of Chinese bibliographies, and the granularity of classification was coarse. [Conclusions] The proposed Chinese book classification system based on the LSTM model could preprocess data and learn incrementally, and could be transferred to other fields.
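
The step of constructing multiple binary classifiers can be illustrated with a minimal sketch of how a multi-label task is commonly decomposed into one yes/no task per label. The helper names, the example category codes, and the 0.5 cut-off are assumptions made for illustration; the LSTM model itself is not reproduced here, and the abstract does not describe the authors' exact setup.

```python
def to_binary_targets(label_sets, all_labels):
    """Turn per-book label sets into one 0/1 target list per label,
    so that each label can be handled by its own binary classifier."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def combine_predictions(scores, cutoff=0.5):
    """Merge per-label scores for one book into a predicted label set.
    `scores` maps label -> score from that label's binary classifier."""
    return sorted(label for label, score in scores.items() if score >= cutoff)


# Hypothetical category codes for three books.
books = [{"TP"}, {"TP", "G2"}, {"F"}]
targets = to_binary_targets(books, ["F", "G2", "TP"])
predicted = combine_predictions({"F": 0.1, "G2": 0.7, "TP": 0.9})
```

Because each label is decided independently, a book can receive zero, one, or several labels, which is what allows the same setup to cover both single-label and multi-label cases.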

Fine-grained Sentiment Analysis Based on Weibo

Dun Xinhui,Zhang Yunqiu,Yang Kaixi

Data Analysis and Knowledge Discovery. 2017, 1(7): 61-72. https://doi.org/10.11925/infotech.2096-3467.2017.0516

[Objective] This paper conducts a fine-grained sentiment analysis of Weibo posts by dividing the sentiments into eight categories and calculating their intensity values. [Methods] First, we analyzed the Weibo corpus to construct the question word list. Besides the seven sentiments defined by DUTIR, we added “suspected” to the list. Then, we used the Pointwise Mutual Information method, together with the impacts of negative words and degree adverbs, to construct the expression symbol dictionary. We employed Python to retrieve the needed data from Weibo, and applied the jiebaR package to segment the words. Finally, we classified the sentiments and calculated their intensity. [Results] We obtained the proportions of the eight sentiment categories and the sentiment intensity for commonly used diabetes drugs. The Precision values of “angry” and “sad” were the highest (85.73% and 83.05%), while the Recall and F values of “happy” and “like” were the highest (more than 81%). The Precision, Recall and F values of “suspected” were 77.33%, 78.58% and 77.95%, respectively. [Limitations] The sentiment dictionary needs to be expanded. [Conclusions] The proposed model could analyze the sentiment of Weibo posts more effectively than traditional methods.
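
For readers unfamiliar with Pointwise Mutual Information, the standard definition is PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below computes it from co-occurrence counts. The function name, the count-based inputs, and the example numbers are illustrative assumptions; the abstract does not state how the authors applied PMI when building their dictionary.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise Mutual Information from raw counts.

    pair_count: posts containing both x and y
    x_count, y_count: posts containing x, and posts containing y
    total: total number of posts
    """
    if min(pair_count, x_count, y_count, total) <= 0:
        raise ValueError("all counts must be positive")
    # p(x, y) / (p(x) * p(y)) simplifies to pair_count * total / (x_count * y_count)
    return math.log2(pair_count * total / (x_count * y_count))


# Hypothetical counts: a symbol and a seed sentiment word co-occur
# ten times more often than independence would predict.
association = pmi(pair_count=10, x_count=20, y_count=50, total=1000)
```

A positive value means the two items co-occur more often than chance, zero means they look independent, and a negative value means they tend to avoid each other.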

Sentiment Analysis in Cross-Domain Environment with Deep Representative Learning

Yu Chuanming,Feng Bolin,An Lu

Data Analysis and Knowledge Discovery. 2017, 1(7): 73-81. https://doi.org/10.11925/infotech.2096-3467.2017.0506

[Objective] This study trains a model on a source domain with rich labeled data and projects the source and target domain documents into the same feature space, aiming to address the performance issues the target domain faces due to the lack of data. [Methods] First, we collected Chinese, English and Japanese comments on books, DVDs and music from Amazon. Then, we proposed a Cross Domain Deep Representation Model (CDDRM) based on the Convolutional Neural Network (CNN) and Structural Correspondence Learning (SCL) techniques. Finally, we conducted cross-domain knowledge transfer and sentiment analysis. [Results] We found that the best F value of CDDRM was 0.7368, which indicated the effectiveness of the proposed model. [Limitations] The F1 value of our model on long articles needs to be improved. [Conclusions] Transfer learning could help supervised learning obtain good classification results with small training sets. Compared with traditional methods, CDDRM does not require the training and testing sets to have the same or similar data structures.

Feature Selection Based on Modified QPSO Algorithm

Li Zhipeng,Li Weizhong

Data Analysis and Knowledge Discovery. 2017, 1(7): 82-89. https://doi.org/10.11925/infotech.2096-3467.2017.07.10

[Objective] This study proposes an algorithm for feature selection, aiming to improve the precision and efficiency of text classification. [Methods] First, we selected features based on their characteristics. Then, we constructed the algorithm with extension theory to strengthen its searching ability. Finally, we compared the performance of different methods for text classification. [Results] Compared with IG, MI and QPSO, the proposed algorithm had better accuracy in feature selection. [Limitations] The efficiency of our algorithm needs to be improved. [Conclusions] The modified QPSO algorithm is an effective way to select features.

Improving Collaborative Filtering Recommendation Based on Trust Relationship Among Users

Xue Fuliang,Liu Junling

Data Analysis and Knowledge Discovery. 2017, 1(7): 90-99. https://doi.org/10.11925/infotech.2096-3467.2017.07.11

[Objective] This paper tries to improve user similarity calculation in collaborative filtering recommendation by using the trust relationships among users. When there is no similar user for members of the target group, we recommend the most trusted users as the similar users. [Methods] First, we retrieved the trusted users as candidates for the similar users. Second, we combined the trusted and the target users to form a project score set, and evaluated the estimated value of the projects receiving no comment from the target group. Third, we quantified the trust relationship among users to form a regulation factor. Finally, we calculated the adjustment factor, created the similarity cluster of users, and made cross-recommendations among similar users. [Results] The collaborative filtering recommendation method based on trust relationships had better performance than traditional ones. [Limitations] We only examined the new method with one sample dataset containing trust relationships. More research is needed to test the proposed method with other datasets. [Conclusions] The trust relationships among users contain valuable information, which could be used to calculate user similarity for collaborative filtering recommendation services and could effectively address the sparsity and cold-start issues.
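
The fallback idea in the Objective, using the most trusted users when no similar user is available, can be sketched as below. The function name, the similarity floor, and the neighbor count `k` are hypothetical; the regulation and adjustment factors mentioned in the Methods are not modeled, since the abstract does not define them.

```python
def pick_neighbors(similarities, trust, min_similarity=0.5, k=2):
    """Choose up to k neighbor users for a target user.

    similarities: user -> rating-based similarity with the target user
    trust: user -> trust value the target user assigns to that user
    Falls back to the most trusted users when nobody is similar enough.
    """
    similar = sorted(
        (user for user, sim in similarities.items() if sim >= min_similarity),
        key=lambda user: similarities[user],
        reverse=True,
    )
    if similar:
        return similar[:k]
    # Cold-start or sparse case: no usable similarity, so rely on trust.
    return sorted(trust, key=trust.get, reverse=True)[:k]


# Hypothetical data: the target user has no sufficiently similar peer.
neighbors = pick_neighbors({"a": 0.1}, {"c": 0.8, "d": 0.3, "e": 0.5})
```

In this sketch trust is consulted only when similarity fails entirely; blending the two signals would be another reasonable design.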

25 July 2017, Volume 1 Issue 7
