Data Analysis and Knowledge Discovery

A Survey of Sentiment Analysis on Social Media

Ying Tan, Jin Zhang, Lixin Xia

Data Analysis and Knowledge Discovery. 2020, 4(1): 1-11. https://doi.org/10.11925/infotech.2096-3467.2019.0769

[Objective] This paper reviews recent research on sentiment analysis of social media. [Coverage] A total of 163 papers were collected, of which 91 are cited in this review. They cover social media and sentiment analysis, were retrieved from the Web of Science Core Collection for 2015-2019, and were supplemented through citation analysis and browsing. [Methods] Content analysis is used to explore the tasks, techniques, and applications of sentiment analysis on social media. [Results] A variety of sentiment analysis tasks are summarized, refined sentiment analysis techniques for social media platforms are clarified, and application fields are discussed. [Limitations] The steps and procedures of sentiment analysis algorithms are not analyzed in depth. [Conclusions] The findings provide an overview of sentiment analysis research, including state-of-the-art techniques, applications, and challenges on social media platforms.

Monitoring and Forecasting Economic Performance with Big Data

Jiandong Wang

Data Analysis and Knowledge Discovery. 2020, 4(1): 12-26. https://doi.org/10.11925/infotech.2096-3467.2019.1380

[Objective] This article reviews current research in China and abroad on economic monitoring with big data. [Coverage] We searched the WOS, CNKI, and EI databases with the keywords “Big data + Economics / Economy”. A total of 163 Chinese papers and 107 English papers, as well as seven monographs on big data economics, were retrieved. 157 representative documents were identified based on their relevance and quality. [Methods] This paper summarizes the research methods, data sources, and conclusions of the retrieved literature published in the past ten years. [Results] Seven typical research paths were found from the perspectives of monitoring and forecasting. The former includes improving traditional surveys with big data, constructing new economic monitoring indicators, “nowcasting”, and analyzing economic performance. The latter includes building advance economic forecasting indicators, improving traditional forecasting models, and establishing new forecasting models. [Limitations] This article only examines research from specific fields in the past ten years, and its scope needs to be expanded. [Conclusions] Using big data for macroeconomic monitoring and forecasting has great potential but also faces practical difficulties. The differences and connections between big data and traditional economic analytics, as well as their impacts, also merit study.

Advances in Patent Network

Peng Guan, Yuefen Wang

Data Analysis and Knowledge Discovery. 2020, 4(1): 26-39. https://doi.org/10.11925/infotech.2096-3467.2019.1201

[Objective] The paper systematically reviews current studies on patent networks, and then summarizes research questions and developing trends. [Coverage] We used “Patent Network” as the search term for the Web of Science and CNKI core journal databases, respectively. A total of 465 English papers and 196 Chinese papers were retained after removing duplicate and irrelevant ones. Our final list included 106 representative articles on topic labeling. [Methods] Firstly, we used a community discovery algorithm to explore topics in the keyword co-occurrence network. Then, we extracted research topics from these Chinese and English papers to identify research trends. Finally, we reviewed the most highly cited papers from each trending topic. [Results] The construction methods of patent networks include cooperation, reference, technology transfer, and technology similarity. The popular research methods include social network analysis, complex networks, and text mining. [Limitations] We only studied the representative literature; more research is needed to expand our analysis to all research topics. [Conclusions] Patent network analysis is an emerging field. Research on the evolution mechanisms, models, and simulation experiments of patent networks needs to be strengthened. More and more researchers focus on the semantic analysis of patent networks, as well as the construction of comprehensive patent networks.
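
As an illustration of the first step in the Methods above, a keyword co-occurrence network can be built by counting how often two keywords appear in the same paper. The sketch below is a minimal example under stated assumptions: the function name `cooccurrence_edges` and the sample keyword lists are hypothetical and are not taken from the paper, and the community discovery step that the authors run on the resulting network is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count, for every unordered keyword pair, the number of papers containing both."""
    edges = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical keyword lists, one per paper (illustrative data only).
papers = [
    ["patent network", "social network analysis", "text mining"],
    ["patent network", "text mining"],
    ["patent network", "technology transfer"],
]

edges = cooccurrence_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

The edge weights produced this way would typically be passed to a graph library for community detection; which algorithm the authors used is not stated in the abstract.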

Survey of Attribute Reduction Methods

Jie Ma, Yan Ge, Hongyu Pu

Data Analysis and Knowledge Discovery. 2020, 4(1): 40-50. https://doi.org/10.11925/infotech.2096-3467.2018.1278

[Objective] This paper reviews the methods, developing trends, and applications of attribute reduction, aiming to support systematic research in this field. [Coverage] From the Web of Science and CNKI, we retrieved 142 articles on attribute reduction, using the keywords “Attribute Reduction” and “属性约简”. We also refined the results through topic selection, intensive reading, and retrospective searching. [Methods] We surveyed the fundamentals of attribute reduction, and then summarized its leading research. [Results] Popular research on attribute reduction methods focused on rough sets, granular computing, and formal concept analysis. Its developing trends were closely related to the dynamics of data and the fusion of intelligent algorithms. [Limitations] We only briefly discussed the merging of attribute reduction algorithms. [Conclusions] We explored the developing trends of attribute reduction methods.

Qualitative Data Analysis in Chinese Social Science Studies: The Case of Nvivo

Hong Pan, Li Tang

Data Analysis and Knowledge Discovery. 2020, 4(1): 51-62. https://doi.org/10.11925/infotech.2096-3467.2019.1227

[Objective] Massive amounts of unstructured data have emerged, which makes the effective use of qualitative data analysis tools increasingly important. This paper systematically reviews Nvivo-applied research in Chinese social science. [Coverage] We used “Nvivo” as the keyword to search the CNKI database. A total of 327 sample articles from 2008 to 2018 were retained after manual cleaning. [Methods] We used the content analysis method to code the sample literature, and then analyzed the application status of qualitative data analysis tools. [Results] (I) Application subject. Over the last decade, Nvivo-applied research grew rapidly in China. However, the links between research teams and institutions were rather weak. (II) Application process. More than 80% of the methods were content analysis for non-obtrusive designs and interviews for obtrusive designs. Fewer than 10% of Nvivo-applied studies included all four steps of data coding, coding test, coding analysis, and theoretical modeling. (III) Application object. These studies focused on grounded theory, qualitative research, and content analysis, and the leading researchers were from public administration, library and information science, and journalism and communication. [Limitations] Nvivo-applied research should be improved from the perspectives of scientific research cooperation, step normalization, method diversification, and data diversity. [Conclusions] In the future, qualitative data analysis tools could play a better role in social science studies thanks to their powerful data coding and theoretical construction functions.

Knowledge Representation Based on Deep Learning: Network Perspective

Chuanming Yu, Haonan Li, Manyi Wang, Tingting Huang, Lu An

Data Analysis and Knowledge Discovery. 2020, 4(1): 63-75. https://doi.org/10.11925/infotech.2096-3467.2019.0505

[Objective] This paper explores better representation models for the semantic relationships among knowledge objects. [Methods] Based on existing network representation learning algorithms, we proposed a combined knowledge network representation learning model (CKNRL), using integrated learning and deep learning techniques. [Results] We examined our new model on a knowledge network link prediction task with a Chinese and English parallel news corpus. The AUC value of the CKNRL model was 0.929, which was higher than those of the traditional algorithms, namely DeepWalk (0.925), Node2Vec (0.926), and SDNE (0.899). [Limitations] Our study was based on the word co-occurrence network, and more research is needed to examine the CKNRL model for link prediction on more types of knowledge networks. [Conclusions] The semantic relationships among knowledge objects can be better represented by the proposed fusion model.
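
For readers unfamiliar with the metric reported above, AUC in link prediction is commonly read as the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link. The sketch below shows that pairwise definition only; the function name `auc` and the sample scores are hypothetical, and this is not the authors' evaluation code.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Fraction of (true link, non-link) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores from some link predictor (illustrative data only).
held_out_links = [0.9, 0.8, 0.4]      # scores for edges hidden from training
sampled_non_links = [0.7, 0.3, 0.2]   # scores for node pairs with no edge

result = auc(held_out_links, sampled_non_links)
print(f"AUC = {result:.3f}")
```

A value of 0.5 corresponds to random scoring and 1.0 to a perfect separation of links from non-links, which is the scale on which the reported values should be read.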

An Evolutionary Schema for Metadata Description

Xuhui Li, Tao Yu, Ting Li, Yiwen Li, Jinguang Gu

Data Analysis and Knowledge Discovery. 2020, 4(1): 76-88. https://doi.org/10.11925/infotech.2096-3467.2019.0791

[Objective] This paper proposes an evolutionary schema for metadata description, aiming to address the frequent schema changes in information system applications. [Methods] First, we summarized related research and formalized the structural description of the conceptual schema. Then, we summarized the common forms of conceptual schema evolution by combining schema structures. Finally, we determined the evolution schema structure based on the normal model. [Results] This paper established an evolutionary mechanism for metadata description and the ENM (Evolutionary Normal Metadata) model. [Limitations] This paper is only a preliminary study. More in-depth discussion is needed on the theoretical nature of structural expressions of the normal conceptual schema. [Conclusions] The proposed method has a strong ability in the concept description of semantic features.

Automatic Identification of Term Citation Object with Feature Fusion

Na Ma, Zhixiong Zhang, Pengmin Wu

Data Analysis and Knowledge Discovery. 2020, 4(1): 89-98. https://doi.org/10.11925/infotech.2096-3467.2019.0869

[Objective] This paper explores methods for automatically identifying term citation objects in scientific papers, using feature fusion and a pseudo-label noise reduction strategy. [Methods] First, we converted the identification of term citation objects into a sequence labeling task. Then, we combined linguistic and heuristic features of term citation objects in the BiLSTM-CNN-CRF input layer, which enhanced their feature representations. Finally, we designed a pseudo-label learning noise reduction mechanism, and compared the performance of different models. [Results] The optimal F1 value of our method reached 0.6018, which was 8% higher than that of the BERT model. [Limitations] The experimental data was collected from computer science articles; thus, our model needs to be examined with data from other fields. [Conclusions] The proposed method could effectively identify term citation objects.

Identifying Implicit Features with Word Embedding

Hui Nie, Huan He

Data Analysis and Knowledge Discovery. 2020, 4(1): 99-110. https://doi.org/10.11925/infotech.2096-3467.2019.0702

[Objective] The paper tries to extract implicit features from online reviews, aiming to obtain complete product-specific information and users’ evaluations from reviews. [Methods] We compared the performance of two leading methods for implicit feature extraction: relation-based inference and classification. Then, we introduced the word embedding model, an online review corpus, and semantically related words to improve each algorithm’s effectiveness. Finally, we examined the impacts of dataset equilibrium on the algorithms. [Results] To identify implicit features, the classification-based methods performed better than those based on relation inference with the non-equilibrium dataset. Word embedding significantly improved the quality of the sentence model, which increased the recall and F1 scores by 5.91% and 2.48% respectively. With the equilibrium dataset, the relation-inference methods did a better job and the best F1-score was 0.7503 (word embedding). [Limitations] The size of the corpus for training word embedding and the balanced dataset needs to be expanded. [Conclusions] The appropriate modeling schemes based on the target datasets and the equilibrium datasets yield better results. Word embedding helps us optimize the methods for classification.

Classification of Short Texts Based on nLD-SVM-RF Model

Bengong Yu, Yumeng Cao, Yangnan Chen, Ying Yang

Data Analysis and Knowledge Discovery. 2020, 4(1): 111-120. https://doi.org/10.11925/infotech.2096-3467.2019.0790

[Objective] This paper addresses the issue of data sparseness in short texts and improves the performance of short text classification. [Methods] We proposed a multi-channel text model for the input of a short text classifier by integrating semantics, word order features, and topic features. Then, we created the classification method named nLD-SVM-RF with the help of SVM and random forest algorithms. Finally, we examined the new model with short complaint texts. [Results] We compared the performance of our new model with the SVM and RF single classifiers using Doc2vec as the feature. When n = 5, the accuracy of the nLD-SVM-RF method increased by 9.70% and 6.25%, respectively. [Limitations] The experimental data size needs to be expanded. [Conclusions] The nLD-SVM-RF model provides a practical solution for the business community to analyse short texts and improve decision-making.

Automatic Concept Update Strategy Towards Heterogeneous Terminology Integration

Haixia Sun, Panpan Deng, Jiao Li, Liu Shen, Qing Qian

Data Analysis and Knowledge Discovery. 2020, 4(1): 121-130. https://doi.org/10.11925/infotech.2096-3467.2019.0955

[Objective] This paper proposes a method for updating integrated concepts as the versions of source Knowledge Organization Systems (KOSs) evolve, aiming to promote the dynamic development of the heterogeneous terminology integration system. [Methods] Our model focuses on the terms, synonym sets, and preferred terms of concepts. Firstly, we identified the change types of terms and the change modes of preferred terms of concepts in source KOSs by exact string matching. Then, we recognized the change patterns of their synonym sets through a concept vector space. Finally, we updated the synonym sets and preferred terms of integrated concepts based on fusion rules and similarity. We also assessed the results yielded by our method using the medical integrated concept set of STKOS and its important sources, MeSH and HUGO. [Results] The synonymous merging rate of new terms from source KOSs reached 94.96%, and the update accuracy of the preferred terms of changed integrated concepts reached 99.91%. [Limitations] We did not consider the ambiguity of terms, and the results were affected by the number of vocabularies and the update order. [Conclusions] The proposed method can be applied to update concepts of synonymous knowledge organization systems as their source KOSs evolve.

Retrieving Scientific Documents with Formula Description Structure and Word Embedding

Xinyu Zai, Xuedong Tian

Data Analysis and Knowledge Discovery. 2020, 4(1): 131-138. https://doi.org/10.11925/infotech.2096-3467.2019.0943

[Objective] This study proposes a scientific document retrieval method combining formula matching and text ranking, which addresses the challenges posed by mathematical expressions. [Methods] First, we used the analysis algorithm for formula description structure to study the mathematical expressions. Then, we acquired formula structure information, and retrieved technical documents based on mathematical expressions. Meanwhile, we obtained the query keyword and document word vectors with the help of a word embedding model. Finally, we ranked the documents based on the similarity between the two word vectors. [Results] The recall and precision scores of our new model were 0.77 and 0.63, which were 24.2% and 23.5% higher than those of the traditional scientific document retrieval methods. [Limitations] Our method only focuses on expressions in LaTeX format. [Conclusions] The proposed model combining formula and document keywords improves the performance of scientific document retrieval.
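
The final ranking step described in the Methods above can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the similarity measure, which the abstract does not specify; the function names `cosine` and `rank_documents`, the document names, and the three-dimensional vectors are all hypothetical stand-ins for real word embedding output.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document names ordered from most to least similar to the query vector."""
    return sorted(doc_vecs, key=lambda name: cosine(query_vec, doc_vecs[name]), reverse=True)


# Hypothetical query and document vectors (illustrative data only).
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [1.0, 0.0, 0.9],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}

ranking = rank_documents(query, docs)
print(ranking)
```

In the method described above, this text-based ranking is applied to documents already retrieved through formula matching; that earlier stage is not modeled here.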

25 January 2020, Volume 4 Issue 1