Current Issue, Volume 4 Issue 5
    Review of Attention Mechanism in Natural Language Processing
    Shi Lei,Wang Yi,Cheng Ying,Wei Ruibin
    2020, 4 (5): 1-14.  DOI: 10.11925/infotech.2096-3467.2019.1317

    [Objective] This paper summarizes the evolution and application of the attention mechanism in natural language processing. [Coverage] We searched for “attention” in the title/topic fields of WoS, ACM Digital Library, arXiv and CNKI from January 2015 to October 2019. Then, we manually screened the literature in the field of natural language processing, and obtained 68 related papers. [Methods] We first summarized the general attention mechanism and sorted out its derivations. Second, we thoroughly reviewed their applications in natural language processing tasks. [Results] The application of the attention mechanism in natural language processing focused on sequence labeling, text classification, reasoning and generative tasks. There were adaptation rules between tasks and the various attention mechanisms. [Limitations] Some adaptations between the mechanisms and the tasks were obtained from the overall performance of the model. More research is needed to examine the performance of different mechanisms. [Conclusions] The study of the attention mechanism has effectively promoted the development of natural language processing. However, its mechanism of action is not yet clear. Future research should focus on bringing attention mechanisms closer to human attention.
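
As a minimal illustration of the general attention mechanism that the review summarizes, the sketch below scores one query against a set of keys, normalizes the scores with softmax, and returns the weighted sum of the values. The dot-product scoring function and the names `softmax` and `attention` are assumptions made for illustration; they are not taken from the paper.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Dot-product attention for a single query vector (illustrative only).

    query:  list of floats, length d
    keys:   list of key vectors, each of length d
    values: list of value vectors, one per key
    Returns (context_vector, weights).
    """
    # Score each key against the query with a dot product (assumed scoring function).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The context vector is the attention-weighted sum of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights
```

Other variants differ mainly in how the scores are computed and in where the queries, keys and values come from.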

    A Systematic Review of Factors Influencing Online Trust
    Zhang Yi,Yang Yi,Deng Wen
    2020, 4 (5): 15-26.  DOI: 10.11925/infotech.2096-3467.2019.1376

    [Objective] This paper tries to identify the factors influencing online trust, which helps us gain more insight into users’ needs and the impacts of internal and external environments. It also explains the effects of these factors, with the aim of improving online trust. [Coverage] We searched Web of Science, CNKI and other databases with “online trust”, “network trust”, or “system trust”, and retrieved 91 representative publications. [Methods] We reviewed the developments and concepts of online trust, and explored research on the main influencing factors. [Results] Online trust research focused on the trustors, the trusted objects, the technology platforms and the external environments, as well as their effects. Emerging technologies also influenced online trust and reconstruction research. The theme evolution trends were closely related to the developments of trust theory and technology. [Limitations] This study only discussed the influencing factors and evaluation metrics. [Conclusions] Online trust research could be improved in terms of theoretical models, research methods and perspectives.

    Constructing Knowledge Graph for Financial Equities
    Lv Huakui,Hong Liang,Ma Feicheng
    2020, 4 (5): 27-37.  DOI: 10.11925/infotech.2096-3467.2019.0929

    [Objective] This paper constructs a financial knowledge graph from the perspective of equity, which provides new directions for financial research. [Context] Existing financial research mainly analyzes data on creditor’s rights. Our study helps regulators and researchers by visualizing financial equity data. [Methods] With the help of knowledge connection, we constructed a knowledge graph for Chinese financial equities based on their ownership and the proportion of shareholdings. Then, we visualized the relationships among the financial institutions. [Results] Our knowledge graph had more than 45.86 million nodes and 145.74 million relationships. Users could query entities and their relationships for up to three layers. [Conclusions] The proposed method analyzes the financial network from the perspective of equity, which overcomes the limitations of existing research focusing on creditor’s rights.
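
To make the "up to three layers" query concrete, the following sketch runs a depth-limited breadth-first search over shareholding edges. It is only an illustration under stated assumptions: the dictionary-based graph, the function name `query_layers`, and the toy data are hypothetical, and the abstract does not say how the authors store or query their graph.

```python
from collections import deque

def query_layers(holdings, start, max_layers=3):
    """Breadth-first search over shareholding edges, limited to max_layers hops.

    holdings: dict mapping a shareholder to a list of (company, share_ratio).
    Returns a list of (holder, company, share_ratio, layer) tuples.
    """
    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        holder, depth = queue.popleft()
        if depth == max_layers:
            continue  # do not expand beyond the layer limit
        for company, ratio in holdings.get(holder, []):
            found.append((holder, company, ratio, depth + 1))
            if company not in seen:
                seen.add(company)
                queue.append((company, depth + 1))
    return found

# Toy data, invented purely for illustration.
toy = {
    "A": [("B", 0.60)],
    "B": [("C", 0.30)],
    "C": [("D", 0.51)],
    "D": [("E", 0.10)],
}
```

Calling `query_layers(toy, "A")` walks the chain from A through B and C to D and stops there, because the edge from D to E would lie in a fourth layer.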

    Automatic Data Processing Strategy of Citation Anomie Based on Feature Fusion
    Li Junlian,Wu Yingjie,Deng Panpan,Leng Fuhai
    2020, 4 (5): 38-45.  DOI: 10.11925/infotech.2096-3467.2020.0201

    [Objective] To normalize different expressions of the same cited document, realize standard control and management of periodical citation data, and alleviate the data quality problems caused by citation anomie. [Methods] Taking the construction of a periodical citation database as the target scenario, we analyzed the core characteristics of periodical citation data according to the reference standards. Subsets of effective features were obtained based on the decision tree and accuracy, the execution priority of the decision rules was specified, and an automatic data processing strategy was constructed based on multi-feature fusion. [Results] A sample set of 10,000 periodical citations and a validation set of 10,000 citations were selected from the Chinese Biomedical Citation Index (CBMCI) for the experiment. The proposed feature fusion approach achieved 99.72% and 98.70% accuracy in journal citation normalization on these two datasets, respectively. [Limitations] This article only explored Chinese periodical citation anomie data and has not yet covered citations of other languages and types. [Conclusions] The proposed method could automatically standardize large-scale journal citation data with high efficiency, thus reducing the burden of labor-intensive manual intervention. The idea of feature fusion can also be applied to automatic normalization strategies for other types of cited documents.

    Coreference Resolution Based on Dynamic Semantic Attention
    Deng Siyi,Le Xiaoqiu
    2020, 4 (5): 46-53.  DOI: 10.11925/infotech.2096-3467.2019.1321

    [Objective] This paper tries to identify coreference more effectively, aiming to address the issues of ambiguous anaphor meaning and complex antecedent structure. [Methods] We established an end-to-end framework and used score ranking to identify coreference relationships. First, we calculated scores for all spans to retrieve the “mentions”. Then, we used scores of the candidate mention pairs to determine coreference relationships. We also built span representations with multiple external semantic representations. Finally, we combined the scores of the two parts to generate the final list. [Results] We examined our model with the OntoNotes benchmark datasets. The precision, recall and F1 values of our model were 2.02%, 0.42% and 1.14% higher than those of the SOTA model. [Limitations] The training data sets only included news, talk shows, and weblogs. More sci-tech literature is needed to further improve the model’s performance. [Conclusions] The proposed model could identify coreferences more effectively.

    Extracting Product Properties with Dependency Relationship Embedding and Conditional Random Field
    Li Chengliang,Zhao Zhongying,Li Chao,Qi Liang,Wen Yan
    2020, 4 (5): 54-65.  DOI: 10.11925/infotech.2096-3467.2019.1006

    [Objective] This paper designs multiple word representation methods, aiming to obtain latent semantic features and extract product properties from reviews. [Methods] First, we used word properties, dependency relationships and embedding techniques to construct three types of word representations, which included basic, structural and category semantic information. Then, we applied a conditional random field model to extract product properties with this semantic information. [Results] The accuracy of the proposed method was 3.97% higher than that of the DepREm-CRF. Its F1 value was up to 7.65% better than the popular ones. [Limitations] More research is needed to investigate the relationship between online sentiments and properties. [Conclusions] The proposed method is able to effectively extract properties from product reviews, and lays a good foundation for fine-grained sentiment analysis research.

    Calculating Word Similarities Based on Formal Concept Analysis
    Liu Ping,Peng Xiaofang
    2020, 4 (5): 66-74.  DOI: 10.11925/infotech.2096-3467.2019.1297

    [Objective] This paper tries to add a topic layer between the document and word layers, aiming to calculate word similarities effectively. [Methods] First, we proposed a topic definition and representation model based on the theory of formal concept analysis. Then, we mapped words to the topic layer. Finally, we developed an algorithm to calculate word similarities with the help of topic-to-topic relationships. [Results] We analyzed papers of the SIGIR conference from 2006 to 2016 with the proposed method to calculate word similarities in the field of information retrieval. The precision and recall of the proposed method were up to 30% and 21% higher than those of the FastText method. [Limitations] The proposed method relies on the quality of the feature words extracted from documents. [Conclusions] The proposed method utilizes the semantic relations among associated topics, and effectively calculates word similarities.
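
For readers unfamiliar with formal concept analysis, the sketch below enumerates the formal concepts of a tiny context by brute force. Treating documents as objects and feature words as attributes is an assumption made here for illustration, as are the function names and the toy context; the abstract does not specify how the authors define their topics.

```python
from itertools import combinations

def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (derivation operator)."""
    sets = [context[o] for o in objects]
    if not sets:
        # By convention, the empty object set shares all attributes.
        return set().union(*context.values())
    return set.intersection(*sets)

def common_objects(attributes, context):
    """Objects that have every attribute in `attributes`."""
    return {o for o, attrs in context.items() if attributes <= attrs}

def formal_concepts(context):
    """Brute-force enumeration of (extent, intent) pairs; toy sizes only."""
    concepts = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Toy context, invented for illustration: documents mapped to feature words.
toy_context = {
    "d1": {"query", "ranking"},
    "d2": {"query", "index"},
    "d3": {"query", "ranking", "index"},
}
```

Each resulting pair groups a maximal set of documents with the maximal set of words they all share, which is one plausible reading of a "topic" sitting between the document and word layers.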

    Similarity Measurement of Traditional Chinese Medicine Components for Cold-hot Nature Discrimination
    Wei Guohui,Zhang Fengcong,Fu Xianjun,Wang Zhenguo
    2020, 4 (5): 75-83.  DOI: 10.11925/infotech.2096-3467.2019.0974

    [Objective] This paper tries to measure the similarity of traditional Chinese medicine components, and then establish a discriminant method for their cold and hot natures. [Methods] Traditional Chinese medicines with similar compositions have similar medicinal properties. Therefore, we used ultraviolet spectra to characterize their components and retrieved the UV spectrum data of 61 traditional Chinese medicines. Then, we used the Mahalanobis distance to measure the similarities of these UV spectrum data. Finally, we constructed a prediction and recognition model for cold and hot natures based on the majority voting algorithm. [Results] We evaluated the proposed model with cross validation and extrapolation techniques. With petroleum ether as the solvent, the areas under the ROC curve of cross validation and extrapolated prediction were 0.883 and 0.866, respectively. The predictive accuracies of cross validation and extrapolated prediction were 0.754 and 0.776. With multi-solvent comprehensive analysis, the accuracies of cross validation and extrapolation were 0.672 and 0.686. [Limitations] The data size of our study needs to be expanded. [Conclusions] The proposed model could effectively identify the ultraviolet spectrum of traditional Chinese medicine components.
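
The combination of Mahalanobis distance and majority voting can be sketched as a nearest-neighbor classifier. This is a hedged illustration, not the authors' model: the choice of k nearest neighbors, the function names, and the assumption that an inverse covariance matrix `cov_inv` has already been estimated from the spectra are all introduced here.

```python
import math
from collections import Counter

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance sqrt((x - y)^T * cov_inv * (x - y)).

    cov_inv is the inverse covariance matrix as a list of rows
    (assumed to be estimated beforehand from the reference data).
    """
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for i, di in enumerate(diff):
        for j, dj in enumerate(diff):
            total += di * cov_inv[i][j] * dj
    return math.sqrt(total)

def predict_nature(sample, references, cov_inv, k=3):
    """Label a sample by majority vote among its k nearest references.

    references: list of (feature_vector, label) pairs, e.g. label "cold" or "hot".
    """
    ranked = sorted(references, key=lambda r: mahalanobis(sample, r[0], cov_inv))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Using an odd k with two classes avoids tied votes; how the paper resolves ties, or whether it uses a neighborhood at all, is not stated in the abstract.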

    Subspace Cross-modal Retrieval Based on High-Order Semantic Correlation
    Zhu Lu,Tian Xiaomeng,Cao Sainan,Liu Yuanyuan
    2020, 4 (5): 84-91.  DOI: 10.11925/infotech.2096-3467.2019.0912

    [Objective] This paper converts heterogeneous multi-modal data into isomorphic representations, aiming to address the semantic gaps and improve the accuracy of cross-modal retrieval. [Methods] First, we determined the high-order semantic correlation between multi-modal data. Then, we combined the annotation and the structure information of the multi-modal data. Finally, we transformed the data of different modalities into isomorphic representations for direct retrieval. [Results] We examined our method with three open datasets: WIKI, NUS-WIDE and XMedia. The average MAP values obtained by our method were 0.1113, 0.0910 and 0.1850 higher than the best results of CCA, JGRHML, SCM and JFSSL. [Limitations] Our method is not applicable to semi-supervised and unsupervised data. [Conclusions] The proposed method effectively improves the accuracy of cross-modal retrieval.

    Recommending Tourism Attractions Based on Segmented User Groups and Time Contexts
    Zheng Songyin,Tan Guoxin,Shi Zhongchao
    2020, 4 (5): 92-104.  DOI: 10.11925/infotech.2096-3467.2019.1080

    [Objective] This study tries to provide personalized recommendations for tourists, aiming to address the low efficiency of user decision-making caused by information overload. [Methods] We proposed a new SPT (user Similarity, Popular spot and Time) algorithm, and used real data from Ctrip to compare its recommendation results with those of traditional algorithms. We also proposed a method to construct the training set based on “segmented user groups” and examined its impact on the recommendation results. [Results] The SPT algorithm yielded better results than traditional recommendation methods in precision, recall, coverage and popularity. The algorithm based on “segmented user groups” further improved the effectiveness of recommendation. The precision and recall of the proposed algorithm reached 43.75% and 61.59%. [Limitations] The algorithm could not find similar users for new users. Our new method requires further testing with more datasets. [Conclusions] The proposed method improves recommendation results for tourism attractions, as well as tourists’ decision-making and personalized services.

    Personalized Recommendation Model Based on Collaborative Filtering Algorithm of Learning Situation
    Su Qing,Chen Sizhao,Wu Weimin,Li Xiaomei,Huang Tiankuan
    2020, 4 (5): 105-117.  DOI: 10.11925/infotech.2096-3467.2019.1092

    [Objective] This paper proposes a personalized model based on learning situation, which recommends schemes for learners and addresses information overload issues. [Methods] First, we constructed a PAD-CF collaborative filtering algorithm based on three factors related to learning situation. Then, we introduced the knowledge map and the degree centrality of knowledge points to retrieve the recommended points. [Results] Compared to Pearson-CF, Edurank, and CF-SPM, the proposed model improved the F value by 6.24%, 2.68%, and 1.98%, respectively. The growth rates were 3.87%, 2.39%, and 1.43%. [Limitations] We need to add more complicated learning factors to improve the accuracy of predicted knowledge points. [Conclusions] The proposed model is highly practical for real-world cases.

    Identifying Scenic Spot Entities Based on Improved Knowledge Transfer
    Zhao Ping,Sun Lianying,Tu Shuai,Bian Jianling,Wan Ying
    2020, 4 (5): 118-126.  DOI: 10.11925/infotech.2096-3467.2019.0907

    [Objective] This paper addresses the issues facing labeled data in the recognition of scenic spots. [Methods] We proposed an improved knowledge transfer algorithm for entity recognition and used datasets from the People’s Daily to evaluate our new model. [Results] Our method’s accuracy was 1.62% higher than that of the model using all labeled data. [Limitations] More research is needed to examine the expansion of samples. [Conclusions] The proposed method uses less labeled data in entity recognition and provides better technical support for tourism recommendation.

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn