Current Issue, Volume 6 Issue 9
    Forecasting Developments of Core Topics in Science and Technology with Trend Analysis
    Cui Ji, Zhang Jinpeng, Bao Zhou, Ding Shengchun
    2022, 6 (9): 1-13.  DOI: 10.11925/infotech.2096-3467.2021.1451

    [Objective] The study creates a predictive model based on trending topics and analyzes the related literature, aiming to forecast the developments of core topics. [Methods] First, we analyzed the characteristics of research topics in scientific and technological literature. Then, we identified the core topics with strategic coordinates. Finally, we used the ARIMA model and the exponential smoothing method to predict the topics’ trending degrees. [Results] The mean absolute error and mean root mean square error of the exponential smoothing method were both smaller than those of the ARIMA model. [Limitations] The selection of initial parameters for the model, the distribution of coefficients and the number of published papers affect the prediction performance. [Conclusions] The two proposed models could yield better prediction results for growing and emerging topics.
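
As a rough illustration of the comparison described in this abstract, the sketch below applies simple exponential smoothing to a short series and scores the forecasts with mean absolute error and root mean square error. It is not the authors' implementation: the smoothing form, the `alpha` value, the function names, and the sample series are all assumptions chosen for illustration.

```python
import math


def exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts; forecasts[t] is the prediction for series[t]."""
    forecasts = [series[0]]  # assumption: seed the first forecast with the first observation
    for t in range(1, len(series)):
        forecasts.append(alpha * series[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared forecast error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical yearly trending degrees for one topic (illustrative values only).
trend = [10.0, 12.0, 14.0, 16.0]
forecast = exponential_smoothing(trend, alpha=0.5)
print(forecast)
print(mean_absolute_error(trend, forecast), root_mean_square_error(trend, forecast))
```

The same two error functions could be applied to forecasts from any other model, such as ARIMA, to compare methods on equal terms.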

    Analyzing Characteristics of ESI Discipline Distribution in China, U.S. and U.K. with Sub-Disciplines and Text Contents
    Zhang Wanshu, Yao Haitao, Wang Xuefeng
    2022, 6 (9): 14-26.  DOI: 10.11925/infotech.2096-3467.2021.1439

    [Objective] This paper examines the highly cited papers from ESI, aiming to identify the characteristics of their discipline distributions in China, the United States and the United Kingdom. [Methods] First, we merged the sub-disciplines and text contents based on the general framework of biodiversity. Then, we constructed three indicators: discipline variety, discipline balance and discipline disparity. Finally, we analyzed the changes in these indicators over a five-year period. [Results] There is a gap between China and the United States or the United Kingdom in the diversity of Social Sciences and Biomedical Sciences, in the balance of Engineering, Mathematics, as well as Environment & Ecology, and in the disparity of Computer Sciences, Geosciences, Botanic and Animal Sciences. However, some indicators showed an upward trend. [Limitations] More research is needed to examine the threshold of discipline coverage, as well as the contribution differences due to the order of authors’ nationalities. [Conclusions] Our study identifies the differences among China, the United States and the United Kingdom in the distribution of research disciplines, which benefits discipline evaluation and future developments.
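
The abstract does not give formulas for the three indicators, so the sketch below uses definitions that are common in the diversity literature and should be read as assumptions: variety as the number of disciplines present, balance as Shannon evenness, and disparity as the mean pairwise distance between the disciplines present. The discipline names, paper counts, and distances are hypothetical.

```python
import math


def variety(counts):
    """Number of disciplines with at least one paper."""
    return sum(1 for c in counts.values() if c > 0)


def balance(counts):
    """Shannon evenness in [0, 1]; defined here as 0.0 when fewer than two disciplines are present."""
    present = [c for c in counts.values() if c > 0]
    if len(present) < 2:
        return 0.0
    total = sum(present)
    entropy = -sum((c / total) * math.log(c / total) for c in present)
    return entropy / math.log(len(present))


def disparity(counts, distance):
    """Mean pairwise distance between the disciplines present.

    `distance` maps alphabetically ordered name pairs to a dissimilarity in [0, 1].
    """
    present = sorted(name for name, c in counts.items() if c > 0)
    pairs = [(a, b) for i, a in enumerate(present) for b in present[i + 1:]]
    if not pairs:
        return 0.0
    return sum(distance[pair] for pair in pairs) / len(pairs)


# Hypothetical counts of highly cited papers per discipline.
papers = {"Chemistry": 6, "Physics": 3, "Economics": 1}
# Hypothetical dissimilarities between disciplines.
dist = {
    ("Chemistry", "Economics"): 0.9,
    ("Chemistry", "Physics"): 0.3,
    ("Economics", "Physics"): 0.9,
}
print(variety(papers), balance(papers), disparity(papers, dist))
```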

    Poet’s Emotional Trajectory in Time and Space: Case Study of Li Bai for Digital Humanities
    Gao Jinsong, Zhang Qiang, Li Shuaike, Sun Yanling, Zhou Shubin
    2022, 6 (9): 27-39.  DOI: 10.11925/infotech.2096-3467.2021.1413

    [Objective] This paper explores changes in poets’ time-space trajectories and emotional dimensions, aiming to provide a new research perspective for the humanities field. [Context] We improve the visualization of current digital humanities research and the accessibility of its results, with the help of ontology and GIS technology. [Methods] We constructed a poet ontology model for Li Bai, a famous poet in China’s Tang Dynasty, and created the knowledge model for his related concepts and relationships. Then, we used GIS technology to present the changes in Li Bai’s temporal and spatial emotional trajectory, which helped us explore the tacit knowledge. [Results] Li Bai’s life trajectories spanned more than half of today’s China, with the most frequent trajectories near today’s Nanjing. Dangtu was Li Bai’s “sorrowful and joyful” place, while Nanjing was Li Bai’s “sorrowful” place. Li Bai was more “joyful” than “sorrowful” in his youth, while he became more “sorrowful” than “joyful” in middle age. Li Bai was “sorrowful and joyful” in his later years. [Conclusions] This paper provides practical guidelines for studying poets’ emotional trajectories in time and space, which benefits humanities research.

    Detecting Multimodal Sarcasm Based on SC-Attention Mechanism
    Chen Yuanyuan, Ma Jing
    2022, 6 (9): 40-51.  DOI: 10.11925/infotech.2096-3467.2021.1362

    [Objective] This paper designs an SC-Attention fusion mechanism, aiming to address the low prediction accuracy and the difficulty of fusing multimodal features in the existing detection models for multimodal sarcasm. [Methods] First, we used the CLIP and RoBERTa models to extract features from pictures, picture attributes, and texts. Then, we combined the SC-Attention mechanism with SENet’s attention mechanism to establish the Co-Attention mechanism and fuse multimodal features. Third, we re-allocated attention feature weights according to the original modalities. Finally, we input the features to the fully connected layers to detect sarcasm. [Results] The accuracy and F1 of the proposed model reached 93.71% and 91.68%, which were 10.27 and 11.5 percentage points higher than the existing ones. [Limitations] We need to examine our model with more datasets. [Conclusions] The proposed model reduces information redundancy and feature loss, which effectively improves the accuracy of multimodal sarcasm detection.

    News Recommendation with Latent Topic Distribution and Long and Short-Term User Representations
    Tang Jiao, Zhang Lisheng, Sang Chunyan
    2022, 6 (9): 52-64.  DOI: 10.11925/infotech.2096-3467.2021.1376

    [Objective] This paper proposes a news recommendation model based on contents and additional information on users’ current preferences, aiming to improve the performance of the existing ones. [Methods] We established a news representation model integrating the titles, abstracts, full-texts, as well as explicit and potential topics. We also built a user representation model utilizing the long and short-term user interests as well as their current concerns and preferences. [Results] We examined the proposed model with two large-scale news recommendation datasets. It reached 69.51% on AUC, 34.09% on MRR, 37.25% on nDCG@5, and 43.01% on nDCG@10 with the first dataset. For the second one, we had 66.05% on AUC, 30.93% on MRR, 34.30% on nDCG@5, and 40.46% on nDCG@10, which were all higher than the seven baseline models. [Limitations] More research is needed to study users with few historical behaviors. [Conclusions] The proposed model could create vectors for news contents and user representations using advanced natural language processing techniques. It also effectively improves the performance of news recommendation models.
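
For readers unfamiliar with the ranking metrics reported above, the sketch below computes MRR and nDCG@k for a single impression. It assumes binary click labels and takes MRR as the reciprocal rank of the first clicked item, which is one common definition; the paper's exact evaluation protocol is not stated in the abstract. The labels, scores, and function names are hypothetical.

```python
import math


def rank_by_score(labels, scores):
    """Return the click labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(ranked_labels):
    """1 / position of the first clicked item, or 0.0 if nothing was clicked."""
    for position, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / position
    return 0.0


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top k positions."""
    return sum(
        label / math.log2(position + 1)
        for position, label in enumerate(ranked_labels[:k], start=1)
    )


def ndcg_at_k(ranked_labels, k):
    """DCG normalized by the DCG of the ideal ordering; 0.0 if there are no clicks."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


# Hypothetical impression: 1 = clicked news item, 0 = not clicked.
labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.7]
ranked = rank_by_score(labels, scores)
print(ranked, reciprocal_rank(ranked), ndcg_at_k(ranked, 5))
```

Averaging these per-impression values over all impressions in a test set gives dataset-level figures of the kind reported in the abstract.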

    Classifying Reasons of Hotel Reviews with Domain ERNIE and BiLSTM Model
    Zhang Zhipeng, Mao Yusheng, Zhang Liyi
    2022, 6 (9): 65-76.  DOI: 10.11925/infotech.2096-3467.2021.1303

    [Objective] This paper proposes a classification model to identify reasons in hotel reviews from online booking platforms. [Methods] First, we constructed a pretraining corpus with millions of online reviews and manually annotated the ORSC dataset for the proposed model. Then, we extracted the text features of the ORSC dataset by adding the constructed corpus to the ERNIE model. Finally, we used the BiLSTM model to merge all features and identify reviews with reasons. [Results] On the ORSC dataset, the DERNIE model’s accuracy was 91.33% while the F1 value was 91.20%. After adding BiLSTM features, the accuracy increased to 94.57% and the F1 value became 94.62%. [Limitations] The pre-trained language models require a large amount of data from the additional corpus, which might affect the computing speed and efficiency. [Conclusions] Our new model can effectively identify reason sentences from online reviews.

    CNN-SM: Identifying Words on Defective Products with Sememe and Multi-features
    You Xindong, Yuan Menglong, Zhang Le, Lv Xueqiang
    2022, 6 (9): 77-85.  DOI: 10.11925/infotech.2096-3467.2021.1369

    [Objective] This paper proposes a CNN model based on the sememe and multi-features, aiming to improve the recognition accuracy of words on defective consumer products. [Methods] First, we created the model’s input with a distributed word vector fused with sememe. Then, we added part-of-speech features and randomly embedded word position vectors to the input. Finally, we removed the max pooling and increased the information contained in the depth vector output by the convolution kernel, which provided sufficient information for word classification. [Results] Compared with the CNN model only adding word position vectors, the proposed method improved the precision, recall and F1 values by 0.021, 0.002 and 0.012, respectively. [Limitations] We need to improve the polarity recognition of the same expression in different scenarios. [Conclusions] The sememe, part-of-speech features, and the removal of the pooling layer could improve the performance of the model for domain word recognition.

    Extracting Entities for Enterprise Risks Based on Stroke ELMo and IDCNN-CRF Model
    Yang Meifang, Yang Bo
    2022, 6 (9): 86-99.  DOI: 10.11925/infotech.2096-3467.2021.1308

    [Objective] This paper proposes a new model to learn the text characteristics and contextual semantic relevance, aiming to extract entities for enterprise risks more effectively. [Methods] Our entity extraction model is based on stroke ELMo embedded in the IDCNN-CRF. First, we used the bidirectional language model to pre-train on large-scale unstructured data for enterprise risks and obtained the stroke ELMo vector as the input feature. Then, we sent it to the IDCNN network for training, and utilized the CRF to process the output layer of IDCNN. Finally, we got the optimal entity sequence labeling for the enterprise risks. [Results] The F value of the proposed model is 91.9%, which is 2.0% higher than the performance of BiLSTM-CRF deep neural network models. The running speed of our model is 2.36 times faster than the BiLSTM-CRF. [Limitations] More research is needed to examine this model in more fields. [Conclusions] The proposed model provides a reference for constructing an entity corpus of enterprise risks.

    Entity Recognition and Labeling for Medical Literature Based on Neural Network
    Zhao Ruijie, Tong Xinyu, Liu Xiaohua, Lu Yonghe
    2022, 6 (9): 100-112.  DOI: 10.11925/infotech.2096-3467.2021.1414

    [Objective] This paper proposes a new entity recognition model, aiming to find new knowledge effectively and improve the utilization of medical papers. [Methods] We constructed a pharmaceutical entity recognition model based on Attention-BiLSTM-CRF and examined it on the public datasets of GENIA Term Annotation Task and BioCreative II Gene Mention Tagging. We also used the model to annotate abstracts of biomedical scientific papers. [Results] The F1 values of our model on the two datasets were 81.57% and 84.23%, while the accuracy rates were 92.51% and 97.85%. These results are better than those of the benchmark models. Moreover, our model has more advantages in processing extremely unbalanced data. [Limitations] The volume of data and application of entity labeling experiments are relatively homogeneous. [Conclusions] The proposed model improves the effectiveness of entity recognition and mining of new medical knowledge.

    Drug Recommendation Based on Graph Neural Network with Patient Signs and Medication Data
    Cheng Quan, She Dexin
    2022, 6 (9): 113-124.  DOI: 10.11925/infotech.2096-3467.2021.1452

    [Objective] This paper proposes a new drug recommendation algorithm based on the graph neural network integrating patient signs and medication history, aiming to improve illness diagnosis and treatment. [Methods] First, we constructed a transitive relationship model for abnormal signs and drugs based on the Graph Neural Network (GNN). Then, we designed a precise drug recommendation plan with sign perception and built a heterogeneous graph for the “sign-patient-drug” relationship. Third, our model learned the node representation with sign perception using the R-GCN encoder. Finally, we designed a sign-aware interaction decoder, which integrated the abnormal signs to recommend drugs accurately. [Results] We examined the proposed model with diagnosis and treatment records of three types of diseases from the MIMIC-Ⅲ dataset. Compared with the SVD, NeuMF and NGCF models, the proposed method’s Recall@20 value increased by 5.76, 5.33 and 0.91 percentage points, respectively. Meanwhile, it increased the NDCG@20 value by 5.03, 4.25 and 2.67 percentage points. [Limitations] Our method did not include the dynamic changes of patients’ drug use due to the development of diseases. [Conclusions] The proposed drug recommendation method is effective and feasible. This model could perceive the impacts of patient signs on medication, which lays foundations for precise drug recommendation algorithms integrating multi-dimensional information.

    Analyzing Medical Semantic Association with Complex Network
    Zhang Junliang, Fang Xuemei, Zhang Fan, Liu Xiwen, Zhu Peng
    2022, 6 (9): 125-137.  DOI: 10.11925/infotech.2096-3467.2021.1178

    [Objective] This paper aims to study medical semantic association with the help of complex networks. [Methods] First, we constructed a medical semantic association network using the medical semantic concepts as nodes and semantic associations as edges. Then, we analyzed the network characteristics and semantic communities. Finally, we created vectors for the semantic concepts and conducted semantic clustering analysis with the neural network. [Results] We retrieved relevant literature on “coronavirus” from MEDLINE of PubMed and built a semantic association network with 43 nodes and 877 edges. Then, we visualized the network characteristics, semantic communities and semantic clusters. [Limitations] The experimental data size needs to be expanded. [Conclusions] The proposed network effectively describes the semantic association among medical concepts and benefits medical knowledge discovery services.

    Reader Preference Analysis and Book Recommendation Model with Attention Mechanism of Catalogs
    Wang Dailin, Liu Lina, Liu Meiling, Liu Yaqiu
    2022, 6 (9): 138-152.  DOI: 10.11925/infotech.2096-3467.2021.1317

    [Objective] This paper proposes a new reader preference analysis method as well as a personalized book recommendation model (IABiLSTM), aiming to improve the accuracy of the existing algorithms. [Methods] First, we extracted the semantic features of books according to their titles and catalog contents. We used the BiLSTM network to capture the long-distance dependency of the texts and word order context information. We also utilized the Two-layer Self-Attention mechanism to enhance the deeper semantic expression of book catalog features. Then, we analyzed readers’ historical browsing behaviors, which were quantified with an interest function. Third, we combined the semantic features of books with readers’ interests to generate their preference vectors. Fourth, we calculated the similarity between the vectors of candidate books’ semantic features and readers’ preferences, and predicted the scores for personalized book recommendation. [Results] We examined our model on Douban Reading and Amazon datasets, and set the N value as 50. The MSE, Precision and Recall reached 1.1%, 89.1%, and 85.2% on the Douban data, while they were 1.2%, 75.2%, and 72.8% with the Amazon data. These results were better than those of the comparison model. [Limitations] More research is needed to examine our model with other datasets. [Conclusions] The proposed model improves the accuracy of book recommendation, and benefits common NLP tasks.
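
The final matching step in this abstract, comparing a reader's preference vector with candidate books' feature vectors, might look roughly like the sketch below. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the vectors, book titles, and function names are hypothetical; in the paper the vectors would come from the BiLSTM and attention layers rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(preference, candidates, top_n):
    """Return the top_n (title, score) pairs ranked by similarity to the preference vector."""
    scored = [
        (title, cosine_similarity(preference, vector))
        for title, vector in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical two-dimensional vectors, for illustration only.
reader_preference = [1.0, 0.0]
candidate_books = {
    "Book A": [1.0, 0.0],
    "Book B": [0.0, 1.0],
    "Book C": [1.0, 1.0],
}
top_books = recommend(reader_preference, candidate_books, top_n=2)
print(top_books)
```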

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn