Data Analysis and Knowledge Discovery

Recognizing Core Topic Sentences with Improved TextRank Algorithm Based on WMD Semantic Similarity

Wang Zixuan,Le Xiaoqiu,He Yuanbiao

Data Analysis and Knowledge Discovery. 2017, 1(4): 1-8. https://doi.org/10.11925/infotech.2096-3467.2017.04.01

[Objective] This paper aims to automatically recognize key sentences describing the research topics of scientific papers. [Methods] First, we organized sentence sets using paper sections as the unit. Then, we calculated the WMD (Word Mover's Distance) between sentences with domain-specific word embeddings we trained. Third, we optimized the iterative process of the TextRank algorithm and used external features to adjust sentence weights. Finally, we identified the core topic sentences by ranking sentences in descending order of weight. [Results] We evaluated the proposed method on scientific papers about climate change and compared it with the traditional TextRank algorithm. Its F-value was about 5% higher than that of TextRank. [Limitations] Sentence feature extraction needs to be improved, and the word embedding training and related parameters of the proposed method need further optimization. [Conclusions] The improved TextRank algorithm could effectively recognize the core sentences within sections of scientific papers, and, with weights adjusted by external features, it could recognize the core topic sentences of a whole paper.
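The weighting step can be illustrated with a minimal weighted TextRank sketch. It assumes the sentence-to-sentence WMD values have already been computed elsewhere and are passed in as a distance matrix. The 1 / (1 + d) distance-to-similarity transform, the multiplicative external-feature adjustment and all function names are illustrative assumptions, not the paper's exact formulas.

```python
# Sketch only: weighted TextRank over a precomputed sentence distance matrix
# (e.g. WMD values). Transform and feature adjustment are assumptions.

def distances_to_similarities(dist):
    """Map a symmetric distance matrix to similarities via 1 / (1 + d) (assumed)."""
    n = len(dist)
    return [[0.0 if i == j else 1.0 / (1.0 + dist[i][j]) for j in range(n)]
            for i in range(n)]


def textrank(sim, damping=0.85, tol=1e-6, max_iter=200):
    """Iterate WS(i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(j)."""
    n = len(sim)
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            incoming = sum(sim[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if j != i and out_weight[j] > 0)
            new_scores.append((1 - damping) + damping * incoming)
        change = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:  # stop once scores have converged
            break
    return scores


def rank_sentences(dist, feature_weights, top_k):
    """Scale TextRank scores by external feature weights (hypothetical
    multiplicative rule) and return the top_k sentence indices, best first."""
    scores = textrank(distances_to_similarities(dist))
    adjusted = [s * w for s, w in zip(scores, feature_weights)]
    order = sorted(range(len(adjusted)), key=lambda i: adjusted[i], reverse=True)
    return order[:top_k]
```

For three sentences where sentence 0 is close to both others (`dist = [[0.0, 0.5, 0.5], [0.5, 0.0, 3.0], [0.5, 3.0, 0.0]]`), `rank_sentences(dist, [1.0, 1.0, 1.2], top_k=2)` returns `[0, 2]`: the external weight lifts sentence 2 above sentence 1 but not above the central sentence 0.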

Analyzing Dynamic Informational, Navigational and Transactional Online Queries

Zhang Xiaojuan

Data Analysis and Knowledge Discovery. 2017, 1(4): 9-19. https://doi.org/10.11925/infotech.2096-3467.2017.04.02

[Objective] This paper aims to improve search engine performance by analyzing dynamic informational, navigational and transactional online queries. [Methods] First, the author analyzed user intentions in terms of queries, Web documents and information needs. Second, for each category of query intention, the author investigated how Web documents and information needs changed for different trending queries. [Results] The distributions of popular informational, transactional and navigational queries differed. Informational queries depended more on Web documents and information needs than the other two types. [Limitations] The data for this study were collected over 29 days. More research is needed to automatically identify and aggregate popular queries. [Conclusions] Search engines need to return diversified results for informational queries. For navigational queries, they need to keep the relevant pages on the first results page; they should also maintain the original ranking of relevant pages for queries related to user behavior, and improve the novelty of results for entertainment-related queries.

Analyzing Academic Community Based on Co-author Network

Qing Yaxian,Li Rui,Wu Huayi

Data Analysis and Knowledge Discovery. 2017, 1(4): 20-29. https://doi.org/10.11925/infotech.2096-3467.2017.04.03

[Objective] This paper aims to categorize and evaluate academic communities and identify the patterns of their development and change by analyzing the co-author network. [Methods] First, we used a fast community detection algorithm to locate academic communities in the co-author network. Then, we proposed an index to evaluate the academic impact of scholars in the retrieved communities. Finally, we chose the three most influential communities to explore their life cycles. [Results] Based on data from the Annals of the Association of American Geographers, the proposed index could effectively identify leading researchers as well as changing trends in research topics. [Limitations] We only collected data from one journal, which might yield incomplete results. [Conclusions] The proposed method could analyze academic communities from various aspects and provide scientific knowledge for scholars from different fields.
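A fast community detection step of this kind can be sketched as greedy agglomerative modularity maximization in the style of Newman's fast algorithm. This is a plain-Python illustration on an unweighted co-author edge list; the stopping rule (halt once no merge raises modularity) is a simplification, the function name is hypothetical, and the paper's own algorithm and scholar impact index may differ.

```python
from collections import defaultdict


def greedy_modularity_communities(edges):
    """Start from singleton communities and repeatedly merge the linked pair
    with the largest modularity gain
        dQ = L_ab / m - 2 * a_a * a_b,
    where L_ab is the number of edges between communities a and b, m the
    total number of edges and a_x the share of edge endpoints in community x.
    Stops as soon as no merge increases modularity (simplification).
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    members = {node: {node} for node in degree}  # community label -> nodes
    label = {node: node for node in degree}      # node -> community label
    if m == 0:
        return sorted(sorted(c) for c in members.values())
    while True:
        # Count edges running between each pair of distinct communities.
        links = defaultdict(int)
        for u, v in edges:
            a, b = label[u], label[v]
            if a != b:
                links[tuple(sorted((a, b)))] += 1
        best_gain, best_pair = 0.0, None
        for (a, b), l_ab in links.items():
            share_a = sum(degree[x] for x in members[a]) / (2 * m)
            share_b = sum(degree[x] for x in members[b]) / (2 * m)
            gain = l_ab / m - 2 * share_a * share_b
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves modularity
            break
        keep, absorb = best_pair
        for node in members[absorb]:
            label[node] = keep
        members[keep] |= members.pop(absorb)
    return sorted(sorted(c) for c in members.values())
```

On a toy network of two co-author triangles joined by a single edge, the sketch recovers the two triangles as separate communities.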

Recognizing Dynamic Academic Impacts of Scholars Based on Cooperative Network

Fan Ruxia,Zeng Jianxun,Gao Yaruixi

Data Analysis and Knowledge Discovery. 2017, 1(4): 30-37. https://doi.org/10.11925/infotech.2096-3467.2017.04.04

[Objective] This paper tries to identify highly cooperative scholars and their dynamic academic impacts with a highly cooperative scholar recognition algorithm and a scholar impact recognition algorithm. [Methods] First, we identified highly cooperative scholars based on their number of collaborators. Then, we estimated the impacts of these scholars and their teams with the number of publications and degree centrality. [Results] The number of highly cooperative scholars varied among teams. The dynamic academic impacts of highly cooperative scholars either grew steadily or fluctuated at a mature stage. [Limitations] We only used two indicators to measure the impacts of scholars; more indicators are needed to analyze complex cases. [Conclusions] The proposed method could effectively identify the highly cooperative scholars of a team and their dynamic academic impacts.
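The two indicators can be sketched as follows, assuming each paper is given as a list of author names. Collaborators are counted as distinct co-authors, and degree centrality is the collaborator count divided by n - 1. The threshold `min_collaborators`, the per-year slicing and all function names are hypothetical choices for illustration.

```python
from itertools import combinations


def coauthor_stats(papers):
    """Return (collaborators, publication counts, degree centrality) for
    every author appearing in `papers` (a list of author-name lists)."""
    collaborators, pubs = {}, {}
    for authors in papers:
        unique = set(authors)
        for a in unique:
            collaborators.setdefault(a, set())
            pubs[a] = pubs.get(a, 0) + 1
        for a, b in combinations(unique, 2):
            collaborators[a].add(b)
            collaborators[b].add(a)
    n = len(collaborators)
    # Degree centrality: share of the other n - 1 authors one has worked with.
    centrality = {a: (len(c) / (n - 1) if n > 1 else 0.0)
                  for a, c in collaborators.items()}
    return collaborators, pubs, centrality


def highly_cooperative(papers, min_collaborators):
    """Authors with at least `min_collaborators` distinct co-authors
    (hypothetical threshold rule)."""
    collaborators, _, _ = coauthor_stats(papers)
    return sorted(a for a, c in collaborators.items() if len(c) >= min_collaborators)


def impact_trajectory(papers_by_year, scholar):
    """Per-year (year, publications, degree centrality) for one scholar,
    a simple way to track impact dynamically (assumed slicing)."""
    trajectory = []
    for year in sorted(papers_by_year):
        _, pubs, centrality = coauthor_stats(papers_by_year[year])
        trajectory.append((year, pubs.get(scholar, 0), centrality.get(scholar, 0.0)))
    return trajectory
```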

Recommending Scientific Research Collaborators with Link Prediction and Extremely Randomized Trees Algorithm

Lv Weimin,Wang Xiaomei,Han Tao

Data Analysis and Knowledge Discovery. 2017, 1(4): 38-45. https://doi.org/10.11925/infotech.2096-3467.2017.04.05

[Objective] This paper proposes a method to recommend scientific research collaborators based on link prediction and machine learning, which improves the precision of traditional methods. [Methods] First, we used link prediction indices to build the input features and adopted the Extremely Randomized Trees (ET) algorithm to train the classifier. Then, we obtained the optimal weight combination with a traversal algorithm and combined the classification results linearly. Finally, we obtained the best collaborator recommendations. [Results] The improved ET method performed better than existing methods in recommending collaboration cities. Besides, the proposed method was less affected by factors such as network structure and could be applied more widely. [Limitations] Scientific research collaboration is affected by cooperation motivation, geography, language and many other factors. The weighted author network did not account for authors from the same city or the same organization. [Conclusions] The proposed method could produce better recommendation results, which might help universities, institutions and individuals identify academic collaborators.
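The feature-input step can be sketched with standard neighborhood-based link prediction indices computed for a candidate author pair. Which indices the paper actually used is not stated in the abstract; the five below are common choices, and the function names are hypothetical. The resulting vectors could then be used to train a classifier such as scikit-learn's `ExtraTreesClassifier` (not shown here).

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected co-author adjacency sets from (author, author) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def link_features(adj, u, v):
    """[common neighbors, Jaccard, Adamic-Adar, resource allocation,
    preferential attachment] for the candidate pair (u, v)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    # Guard against log(1) = 0 (only possible when u == v).
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)
    resource_allocation = sum(1.0 / len(adj[z]) for z in common)
    preferential = len(nu) * len(nv)
    return [len(common), jaccard, adamic_adar, resource_allocation, preferential]
```

For the toy network A-B, A-C, B-C, B-D, the pair (A, D) shares one neighbor (B, degree 3), so its features are `[1, 0.5, 1/ln 3, 1/3, 2]`.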

Analyzing Continuance Intention of Health APP Users Based on Information Ecology

Zhang Min,Luo Meifen,Nie Rui,Zhang Yan

Data Analysis and Knowledge Discovery. 2017, 1(4): 46-56. https://doi.org/10.11925/infotech.2096-3467.2017.04.06

[Objective] This paper aims to explore the factors affecting the continuance intention of mobile health application users. [Methods] From the perspective of information ecology, we first analyzed information, user, technology and information environment factors. Then, we proposed a research hypothesis model based on the expectation confirmation model (ECM). [Results] We collected user behavior data from the server logs of multiple mobile health applications and from questionnaires. A total of 288 valid samples were obtained and analyzed with SmartPLS 2.0. We found that all original relationships of the ECM held in the mobile environment. The accuracy and consensus of information, perceived health threats, response time and ease of use, as well as the direct and indirect network externalities of the environment, were all positively correlated with the confirmation and perceived usefulness of mobile health applications. Users' eHealth literacy increased confirmation but restrained perceived usefulness. [Limitations] The sample size needs to be expanded, and the conclusions need to be generalized further. [Conclusions] Users' continuance behavior with mobile health apps is influenced by information, user, technology and environment factors.

Identifying Semantic Relations of Clusters Based on Linked Data

Cui Jiawang,Li Chunwang

Data Analysis and Knowledge Discovery. 2017, 1(4): 57-66. https://doi.org/10.11925/infotech.2096-3467.2017.04.07

[Objective] This paper introduces a model based on linked data to identify semantic relations in co-word analysis results. [Methods] First, we retrieved literature on the related research from Google Scholar, Springer and CNKI. Then, we analyzed the relations among the resulting clusters. Finally, we constructed and examined a semantic relation model for clusters based on the graph structure of linked data. [Results] Linked data helped us effectively explore potential semantic relations among keywords. [Limitations] Due to the limits of the collected linked data, we only identified some semantic relations, such as hierarchical, simple relevance and class-instance relations. More research is needed to improve the quality of linked data. [Conclusions] The proposed model could successfully discover semantic relations among keywords, which helps us gain more insight from cluster analysis.
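As a rough sketch of how linked-data predicates could be mapped onto the relation types named above, the snippet assumes triples written with SKOS, RDF and RDFS prefixed names. The predicate-to-relation table and the function name are hypothetical simplifications; the paper's model works on the full graph structure of linked data, whereas this only checks direct links between two keywords.

```python
# Hypothetical mapping from common linked-data predicates to relation types.
RELATION_TYPE = {
    "skos:broader": "hierarchical",
    "skos:narrower": "hierarchical",
    "rdfs:subClassOf": "hierarchical",
    "rdf:type": "class-instance",
    "skos:related": "relevance",
}


def keyword_relations(triples, kw1, kw2):
    """Relation types that directly link two keywords in a collection of
    (subject, predicate, object) triples, in either direction."""
    found = set()
    for s, p, o in triples:
        if {s, o} == {kw1, kw2} and p in RELATION_TYPE:
            found.add(RELATION_TYPE[p])
    return sorted(found)
```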

Predicting Dropout Rates of MOOCs with Sliding Window Model

Lu Xiaohang,Wang Shengqing,Huang Junjie,Chen Wenguang,Yan Zengwang

Data Analysis and Knowledge Discovery. 2017, 1(4): 67-75. https://doi.org/10.11925/infotech.2096-3467.2017.04.08

[Objective] This paper aims to improve MOOC course quality and pedagogy by analyzing dropout behaviors with data from Peking University's MOOC on Coursera. [Methods] We extracted 19 major features from the logs and then constructed a sliding window model to predict dropout rates. [Results] The precision of the proposed model stayed above 90%. The SVM and LSTM methods further improved its performance. [Limitations] The new method needs to be examined with smaller courses. [Conclusions] Predicting dropout rates could help us improve course quality effectively.
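One way to turn activity logs into training samples for a sliding window model is sketched below. It assumes a single weekly activity count per learner in place of the paper's 19 features, and it defines dropout as no activity in any later week; both are assumptions for illustration, and the function name is hypothetical. The resulting samples could then be fed to a classifier such as an SVM.

```python
def sliding_window_samples(weekly_activity, window):
    """Build (learner, end_week, features, label) samples.

    weekly_activity maps a learner id to a list of weekly activity counts.
    For each window position, the features are the counts of the `window`
    weeks ending before `end`; the label is 1 (dropout) if the learner shows
    no activity in any later week (hypothetical dropout definition).
    """
    samples = []
    for learner, counts in weekly_activity.items():
        # Stop one week early so every sample has at least one future week.
        for end in range(window, len(counts)):
            features = counts[end - window:end]
            label = int(not any(counts[end:]))
            samples.append((learner, end, features, label))
    return samples
```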

Modeling User’s Interests Based on Image Semantics

Zeng Jin,Lu Wei,Ding Heng,Chen Haihua

Data Analysis and Knowledge Discovery. 2017, 1(4): 76-83. https://doi.org/10.11925/infotech.2096-3467.2017.04.09

[Objective] This paper proposes a new modeling method based on the semantics of images shared on microblogs to predict users' interests accurately. [Methods] First, we crawled the image data of Sina Weibo users. Then, we extracted high-level semantic information from these images. Finally, we predicted users' interests with an image semantic classifier trained by SVM. [Results] The proposed method could predict users' interests effectively. For the 169 Sina Weibo users, the precision, recall and F-value were 97.38%, 98.92% and 98.14%, respectively. [Limitations] The test corpus needs to be expanded to obtain more comprehensive results. [Conclusions] The proposed model could predict users' interests effectively, which lays some theoretical and technical foundations for applying high-level image semantics.
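Assuming the F-value is the balanced F1 score (the harmonic mean of precision P and recall R), the reported figures are internally consistent:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 97.38 \times 98.92}{97.38 + 98.92} = \frac{19265.6592}{196.30} \approx 98.14\%$$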

Building Semantic Enrichment Framework for Scientific Literature Retrieval System

Xie Jing,Wang Jingdong,Wu Zhenxin,Zhang Zhixiong,Wang Ying,Ye Zhifei

Data Analysis and Knowledge Discovery. 2017, 1(4): 84-93. https://doi.org/10.11925/infotech.2096-3467.2017.04.10

[Objective] This paper aims to improve scientific literature retrieval systems with semantic recognition and knowledge relationship computing. [Methods] First, we identified and extracted semantic objects from scientific literature. Then, we calculated and established semantic relations among the objects using data-mining tools. Finally, we built a multidimensional semantic index for these objects and relations and designed a new data organization model. [Results] The new system effectively identified semantic information and improved the user experience. [Limitations] We need to expand the dataset used in this study and evaluate the new system in other domains. [Conclusions] The proposed system could retrieve more knowledge and points to some future directions.

Application of Text Clustering Method Based on Improved CFSFDP Algorithm

Zhan Chunxia,Wang Rongbo,Huang Xiaoxi,Chen Zhiqun

Data Analysis and Knowledge Discovery. 2017, 1(4): 94-99. https://doi.org/10.11925/infotech.2096-3467.2017.04.11

[Objective] This paper aims to improve the unsatisfactory performance of the CFSFDP (clustering by fast search and find of density peaks) algorithm with particle swarm optimization. [Methods] First, we determined the cluster centers by searching for the optimal local density and distance thresholds, which increased the accuracy of the results. These cluster centers have relatively high local density and distance, which reduces the influence of isolated points. Then, we examined the proposed method on a randomly selected dataset from the question-answer database of a college entrance exam consulting platform. [Results] The modified CFSFDP algorithm performed better than the original one. [Limitations] We did not include semantic relations when processing the texts. [Conclusions] The proposed algorithm could achieve good clustering results and improve the efficiency of the consulting personnel.
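The center-selection rule can be sketched as follows. The sketch computes the cutoff-kernel local density ρ and the distance δ to the nearest denser point, picks as centers the points whose ρ and δ both exceed fixed thresholds, and assigns every other point the label of its nearest denser neighbor. In the paper the two thresholds are searched with particle swarm optimization; here they are passed in by hand. The cutoff distance `dc`, the function name and the use of plain numeric tuples instead of text vectors are assumptions for illustration.

```python
import math


def cfsfdp(points, dc, rho_min, delta_min):
    """Density-peak clustering with fixed (hand-set) center thresholds.

    Returns (labels, centers): labels[i] is the cluster id of point i and
    centers lists the indices chosen as cluster centers.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Cutoff-kernel local density: number of neighbors strictly closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Visit points from densest to sparsest; ties broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j
    # Centers: high local density AND far from any denser point.
    centers = [i for i in order if rho[i] > rho_min and delta[i] > delta_min]
    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    # Non-centers inherit the label of their nearest denser neighbor, which
    # is already labeled because points are visited in density order.
    for i in order:
        if labels[i] == -1 and nearest_denser[i] != -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers
```

With two well-separated groups of four points, each having one dense middle point, `cfsfdp(points, dc=1.0, rho_min=2, delta_min=5.0)` selects the two middle points as centers and labels each group separately.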

25 April 2017, Volume 1 Issue 4