Data Analysis and Knowledge Discovery

Review of Attention Mechanism in Natural Language Processing

Shi Lei,Wang Yi,Cheng Ying,Wei Ruibin

Data Analysis and Knowledge Discovery. 2020, 4(5): 1-14. https://doi.org/10.11925/infotech.2096-3467.2019.1317

[Objective] This paper summarizes the evolution and applications of the attention mechanism in natural language processing. [Coverage] We searched for “attention” in the title/topic fields of WoS, ACM Digital Library, arXiv and CNKI from January 2015 to October 2019. We then manually screened the results for literature in the field of natural language processing and obtained 68 related papers. [Methods] We first summarized the general attention mechanism and sorted out its derivations. Second, we thoroughly reviewed their applications in natural language processing tasks. [Results] Applications of the attention mechanism in natural language processing focused on sequence labeling, text classification, reasoning and generative tasks. There were adaptation rules between tasks and the various attention mechanisms. [Limitations] Some adaptations between the mechanisms and the tasks were inferred from the overall performance of the model. More research is needed to examine the performance of different mechanisms. [Conclusions] The study of the attention mechanism has effectively promoted the development of natural language processing. However, how the mechanism works is not yet clear. Future research should focus on bringing attention mechanisms closer to human attention.
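
The abstract refers to a general attention mechanism without giving a formula. As a rough illustration only, the sketch below implements scaled dot-product attention for a single query, one widely used variant; whether it matches the formulation the review treats as "general" is an assumption, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights over the keys and the weighted
    sum of the value vectors (the context vector).
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    context = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, context


# Toy example (hypothetical data): the query aligns with the first key.
weights, context = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

In this toy case the first key receives the larger weight, so the context vector is pulled toward the first value vector.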

A Systematic Review of Factors Influencing Online Trust

Zhang Yi,Yang Yi,Deng Wen

Data Analysis and Knowledge Discovery. 2020, 4(5): 15-26. https://doi.org/10.11925/infotech.2096-3467.2019.1376

[Objective] This paper identifies the factors influencing online trust, which helps us gain more insight into users’ needs and the impacts of internal and external environments. It explains the effects of these factors, with the aim of improving online trust. [Coverage] We searched Web of Science, CNKI and other databases for “online trust”, “network trust”, or “system trust”, and retrieved 91 representative publications. [Methods] We reviewed the development and concepts of online trust, and explored research on the main influencing factors. [Results] Online trust research focused on the trustors, the trusted objects, the technology platforms and the external environments, as well as their effects. Emerging technologies also influenced research on online trust and its reconstruction. The trends in theme evolution were closely related to developments in trust theory and technology. [Limitations] This study only discussed the influencing factors and evaluation metrics. [Conclusions] Online trust research could be improved in terms of theoretical models, research methods and perspectives.

Constructing Knowledge Graph for Financial Equities

Lv Huakui,Hong Liang,Ma Feicheng

Data Analysis and Knowledge Discovery. 2020, 4(5): 27-37. https://doi.org/10.11925/infotech.2096-3467.2019.0929

[Objective] This paper constructs a financial knowledge graph from the perspective of equity, which provides new directions for financial research. [Context] Existing financial research mainly analyzes data on creditor’s rights. Our study helps regulators and researchers by visualizing financial equity data. [Methods] With the help of knowledge connection, we constructed a knowledge graph for Chinese financial equities based on ownership and shareholding proportions. Then, we visualized the relationships among the financial institutions. [Results] Our knowledge graph had more than 45.86 million nodes and 145.74 million relationships. Users could query entities and their relationships up to three layers deep. [Conclusions] The proposed method analyzes the financial network from the perspective of equity, which overcomes the limitations of existing research focused on creditor’s rights.
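
The abstract mentions querying entities and their relationships for up to three layers but does not describe how the query is executed. One plausible reading is a depth-limited breadth-first traversal over shareholding edges. The sketch below assumes a small in-memory adjacency list; the entity names, shareholding proportions and the `holdings_within` function are all hypothetical and are not taken from the paper.

```python
from collections import deque


def holdings_within(graph, start, max_layers=3):
    """Return {entity: layer} for entities reachable from `start`
    by following at most `max_layers` shareholding edges.

    `graph` maps a shareholder to a list of (held_entity, proportion).
    """
    layer_of = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = layer_of[node]
        if depth == max_layers:
            continue  # do not expand beyond the layer limit
        for held, _proportion in graph.get(node, []):
            if held not in layer_of:
                layer_of[held] = depth + 1
                queue.append(held)
    del layer_of[start]  # report only the entities that were reached
    return layer_of


# Hypothetical shareholding chain: A holds B, B holds C, and so on.
equity_graph = {
    "A": [("B", 0.60)],
    "B": [("C", 0.30)],
    "C": [("D", 0.50)],
    "D": [("E", 0.20)],
}
reachable = holdings_within(equity_graph, "A")
```

Here `E` sits four layers away from `A`, so it is left out of the result. At the scale reported in the abstract, a graph database would be the more likely home for such a query; the in-memory dictionary is used only to keep the sketch self-contained.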

Automatic Data Processing Strategy of Citation Anomie Based on Feature Fusion

Li Junlian,Wu Yingjie,Deng Panpan,Leng Fuhai

Data Analysis and Knowledge Discovery. 2020, 4(5): 38-45. https://doi.org/10.11925/infotech.2096-3467.2020.0201

[Objective] This paper normalizes different expressions of the same cited document, in order to achieve standardized control and management of periodical citation data and to alleviate the data quality problems caused by citation anomie. [Methods] Taking the construction of a periodical citation database as the target scenario, we analyzed the core characteristics of periodical citation data according to the reference standards. We obtained subsets of effective features based on a decision tree and accuracy, specified the execution priority of the decision rules, and constructed an automatic data processing strategy based on multi-feature fusion. [Results] We selected 10,000 periodical citation sample records and 10,000 validation records from the Chinese Biomedical Citation Index (CBMCI) for the experiment. The proposed feature fusion approach achieved 99.72% and 98.70% accuracy in journal citation normalization on these two datasets, respectively. [Limitations] This study only explored citation anomie data for Chinese periodicals and has not yet covered citations in other languages or of other types. [Conclusions] The proposed method could automatically standardize large-scale journal citation data with high efficiency, thus reducing the burden of labor-intensive manual intervention. The idea of feature fusion can also be applied to automatic normalization strategies for other types of cited documents.

Coreference Resolution Based on Dynamic Semantic Attention

Deng Siyi,Le Xiaoqiu

Data Analysis and Knowledge Discovery. 2020, 4(5): 46-53. https://doi.org/10.11925/infotech.2096-3467.2019.1321

[Objective] This paper tries to identify coreference more effectively, aiming to address the issues of ambiguous anaphor meaning and complex antecedent structure. [Methods] We established an end-to-end framework and used score ranking to identify coreference relationships. First, we calculated scores for all spans to retrieve the “mentions”. Then, we used the scores of candidate mention pairs to determine coreference relationships. We also built span representations from multiple external semantic representations. Finally, we combined the scores of the two parts to generate the final list. [Results] We examined our model on the OntoNotes benchmark datasets. The precision, recall and F1 values of our model were 2.02%, 0.42% and 1.14% higher than those of the SOTA model. [Limitations] The training data sets only included news, talk shows and weblogs. More sci-tech literature is needed to further improve the model’s performance. [Conclusions] The proposed model could identify coreferences more effectively.

Extracting Product Properties with Dependency Relationship Embedding and Conditional Random Field

Li Chengliang,Zhao Zhongying,Li Chao,Qi Liang,Wen Yan

Data Analysis and Knowledge Discovery. 2020, 4(5): 54-65. https://doi.org/10.11925/infotech.2096-3467.2019.1006

[Objective] This paper designs multiple word representation methods, aiming to obtain latent semantic features and extract product properties from reviews. [Methods] First, we used word properties, dependency relationships and embedding techniques to construct three types of word representations, which included basic, structural and category semantic information. Then, we applied a conditional random field model to extract product properties with this semantic information. [Results] The accuracy of the proposed method was 3.97% higher than that of the DepREm-CRF. Its F₁ value was up to 7.65% better than those of the popular methods. [Limitations] More research is needed to investigate the relationship between online sentiments and properties. [Conclusions] The proposed method is able to effectively extract properties from product reviews, and lays a good foundation for fine-grained sentiment analysis research.

Calculating Word Similarities Based on Formal Concept Analysis

Liu Ping,Peng Xiaofang

Data Analysis and Knowledge Discovery. 2020, 4(5): 66-74. https://doi.org/10.11925/infotech.2096-3467.2019.1297

[Objective] This paper tries to add a topic layer between the document and word layers, aiming to calculate word similarities effectively. [Methods] First, we proposed a topic definition and representation model based on the theory of formal concept analysis. Then, we mapped words to the topic layer. Finally, we developed an algorithm to calculate word similarities with the help of topic-to-topic relationships. [Results] We analyzed papers from the SIGIR conference from 2006 to 2016 with the proposed method to calculate word similarities in the field of information retrieval. The precision and recall of the proposed method were up to 30% and 21% higher than those of the FastText method. [Limitations] The proposed method relies on the quality of the feature words extracted from documents. [Conclusions] The proposed method utilizes the semantic relations among associated topics, and effectively calculates word similarities.
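
The abstract builds its topic layer on formal concept analysis but does not spell out the construction. For orientation, the sketch below enumerates the formal concepts of a tiny document-by-word context by brute force: a concept pairs a set of documents (the extent) with the set of words they all share (the intent). Treating each such concept as a candidate "topic" is an assumption about the paper's approach, and the documents, words and function names are hypothetical.

```python
from itertools import combinations


def documents_with(words, context):
    """Documents whose word set contains every word in `words`."""
    return {doc for doc, doc_words in context.items() if words <= doc_words}


def shared_words(docs, context, all_words):
    """Words common to every document in `docs` (all words if `docs` is empty)."""
    if not docs:
        return set(all_words)
    return set.intersection(*(context[d] for d in docs))


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every word subset.

    Brute force: exponential in the number of words, so toy inputs only.
    """
    all_words = set().union(*context.values())
    concepts = set()
    for size in range(len(all_words) + 1):
        for combo in combinations(sorted(all_words), size):
            extent = documents_with(set(combo), context)
            intent = shared_words(extent, context, all_words)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical context: three documents described by feature words.
context = {
    "d1": {"query", "ranking"},
    "d2": {"query", "index"},
    "d3": {"query", "ranking", "index"},
}
concepts = formal_concepts(context)
```

For realistic corpora a dedicated lattice-construction algorithm would be needed; the exhaustive enumeration here is only meant to make the extent/intent definition concrete.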

Wei Guohui,Zhang Fengcong,Fu Xianjun,Wang Zhenguo

Data Analysis and Knowledge Discovery. 2020, 4(5): 75-83. https://doi.org/10.11925/infotech.2096-3467.2019.0974

[Objective] This paper tries to measure the similarity of traditional Chinese medicine components, and then establish a discriminant method for their cold and hot natures. [Methods] Traditional Chinese medicines with similar compositions have similar medicinal properties. Therefore, we used ultraviolet (UV) spectra to characterize their components and obtained the UV spectrum data of 61 traditional Chinese medicines. Then, we used the Mahalanobis distance to measure the similarities among these UV spectrum data. Finally, we constructed a prediction and recognition model for cold and hot natures based on the majority voting algorithm. [Results] We evaluated the proposed model with cross validation and extrapolation techniques. With petroleum ether as the solvent, the areas under the ROC curve for cross validation and extrapolated prediction were 0.883 and 0.866, and the corresponding predictive accuracies were 0.754 and 0.776. With multi-solvent comprehensive analysis, the accuracies of cross validation and extrapolation were 0.672 and 0.686. [Limitations] The data size of our study needs to be expanded. [Conclusions] The proposed model could effectively identify the ultraviolet spectra of traditional Chinese medicine components.
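
The abstract combines the Mahalanobis distance with majority voting but gives no implementation details. The sketch below shows one way the two could fit together: rank labelled samples by Mahalanobis distance to a new sample and let the nearest few vote. It assumes each medicine is reduced to two numeric features so that the covariance matrix can be inverted by hand; the feature values, the choice of k = 3 and all function names are hypothetical.

```python
import math
from collections import Counter


def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) for 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between a and b using a 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-singular covariance)
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^T * inverse(cov) * d, with the 2x2 inverse written out explicitly
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quad)


def predict_nature(sample, training, k=3):
    """Majority vote among the k training samples closest to `sample`."""
    cov = covariance_2d([features for features, _ in training])
    ranked = sorted(
        training, key=lambda item: mahalanobis_2d(sample, item[0], cov)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature summaries of UV spectra with known natures.
training = [
    ((1.0, 1.0), "cold"), ((1.2, 2.0), "cold"), ((0.8, 3.0), "cold"),
    ((3.0, 1.0), "hot"), ((3.2, 2.0), "hot"), ((2.8, 3.0), "hot"),
]
prediction = predict_nature((1.1, 2.1), training)
```

Real spectra have many more dimensions than two, in which case a linear algebra library would be used to invert (or regularize) the covariance matrix.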

Subspace Cross-modal Retrieval Based on High-Order Semantic Correlation

Zhu Lu,Tian Xiaomeng,Cao Sainan,Liu Yuanyuan

Data Analysis and Knowledge Discovery. 2020, 4(5): 84-91. https://doi.org/10.11925/infotech.2096-3467.2019.0912

[Objective] This paper converts heterogeneous multi-modal data into an isomorphic representation, aiming to address the semantic gap and improve the accuracy of cross-modal retrieval. [Methods] First, we determined the high-order semantic correlation between multi-modal data. Then, we combined the annotation and the structure information of the multi-modal data. Finally, we transformed the data of different modalities into an isomorphic representation for direct retrieval. [Results] We examined our method on three open datasets: WIKI, NUS-WIDE and XMedia. The average MAP values obtained by our method were 0.1113, 0.0910 and 0.1850 higher than the best results of CCA, JGRHML, SCM and JFSSL. [Limitations] Our method is not applicable to semi-supervised and unsupervised data. [Conclusions] The proposed method effectively improves the accuracy of cross-modal retrieval.

Recommending Tourism Attractions Based on Segmented User Groups and Time Contexts

Zheng Songyin,Tan Guoxin,Shi Zhongchao

Data Analysis and Knowledge Discovery. 2020, 4(5): 92-104. https://doi.org/10.11925/infotech.2096-3467.2019.1080

[Objective] This study tries to provide personalized recommendations for tourists, aiming to improve the low efficiency of user decision-making caused by information overload. [Methods] We proposed a new SPT (user Similarity, Popular spot and Time) algorithm, and used real data from Ctrip to compare its recommendation results with those of traditional algorithms. We also proposed a method to construct the training set based on “segmented user groups” and examined its impact on the recommendation results. [Results] The SPT algorithm yielded better results than traditional recommendation methods in precision, recall, coverage and popularity. The algorithm based on “segmented user groups” further improved the effectiveness of recommendation. The precision and recall of the proposed algorithm reached 43.75% and 61.59%. [Limitations] The algorithm could not find similar users for new users. Our new method requires further testing with more datasets. [Conclusions] The proposed method improves the recommendation of tourism attractions, as well as tourists’ decision-making and personalized services.

Personalized Recommendation Model Based on Collaborative Filtering Algorithm of Learning Situation

Su Qing,Chen Sizhao,Wu Weimin,Li Xiaomei,Huang Tiankuan

Data Analysis and Knowledge Discovery. 2020, 4(5): 105-117. https://doi.org/10.11925/infotech.2096-3467.2019.1092

[Objective] This paper proposes a personalized model based on learning situation, which recommends schemes for learners and addresses the issue of information overload. [Methods] First, we constructed a PAD-CF collaborative filtering algorithm based on three factors related to learning situation. Then, we introduced the knowledge map and the degree centrality of knowledge points to retrieve the recommended points. [Results] Compared to Pearson-CF, Edurank, and CF-SPM, the proposed model improved the F value by 6.24%, 2.68%, and 1.98%, respectively. The growth rates were 3.87%, 2.39%, and 1.43%. [Limitations] We need to add more complicated learning factors to improve the accuracy of the predicted knowledge points. [Conclusions] The proposed model is highly practical for real-world cases.
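
The abstract uses the degree centrality of knowledge points without defining it. Under the standard graph definition, a node's degree centrality is its number of neighbours divided by the number of other nodes. The sketch below computes it for an undirected knowledge map given as an edge list; the knowledge points, the edges and the `degree_centrality` function name are hypothetical, and how the paper feeds these values into its recommendations is not stated in the abstract.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph.

    `edges` is a list of (point_a, point_b) pairs; each point's score is
    its number of distinct neighbours divided by (number of points - 1).
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    count = len(neighbours)
    if count < 2:
        return {point: 0.0 for point in neighbours}
    return {
        point: len(adjacent) / (count - 1)
        for point, adjacent in neighbours.items()
    }


# Hypothetical knowledge map: prerequisite links between knowledge points.
knowledge_edges = [
    ("variables", "loops"),
    ("variables", "functions"),
    ("functions", "recursion"),
]
centrality = degree_centrality(knowledge_edges)
most_connected = max(centrality.values())
```

Points with higher scores are linked to more of the other knowledge points, which is one reason a recommender might prioritize them.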

Identifying Scenic Spot Entities Based on Improved Knowledge Transfer

Zhao Ping,Sun Lianying,Tu Shuai,Bian Jianling,Wan Ying

Data Analysis and Knowledge Discovery. 2020, 4(5): 118-126. https://doi.org/10.11925/infotech.2096-3467.2019.0907

[Objective] This paper addresses the issues facing labeled data in the recognition of scenic spots. [Methods] We proposed an improved knowledge transfer algorithm for entity recognition and used datasets from the People’s Daily to evaluate our new model. [Results] Our method’s accuracy was 1.62% higher than that of the model using all labeled data. [Limitations] More research is needed to examine the expansion of samples. [Conclusions] The proposed method uses less labeled data in entity recognition and provides better technical support for tourism recommendation.

25 May 2020, Volume 4 Issue 5
