Volume 6, Issue 7
    Original article
    Graph Databases for Complex Network Analysis
    Liu Chunjiang, Li Shuying, Hu Hanlin, Fang Shu
    2022, 6 (7): 1-11.  DOI: 10.11925/infotech.2096-3467.2021.1168

    [Objective] This paper systematically reviews the progress and trends of graph database research and applications for complex network analysis. [Coverage] We searched Web of Science, Scopus, and CNKI for Chinese and English literature and retrieved a total of 15 graph databases and open-source packages, 21 practical cases, and 14 research papers. [Methods] First, we compared mainstream graph database products developed in China and abroad. Then, we explored the latest solutions for complex network analysis, including algorithms (such as centrality, path finding, link prediction, and community detection), graph visualization, performance, and related applications. [Results] Graph databases have become important analysis tools and research methods for complex network analysis and big data mining, and they work closely with graph computing engines. [Limitations] This paper only examined a few representative cases. [Conclusions] Graph databases can effectively query, represent, and analyze complex network data to reveal their patterns and structures. Their representation of multi-dimensional data is crucial for mining implicit relationships.
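As a rough illustration of two of the algorithm families the review covers (centrality and path finding), the sketch below computes normalized degree centrality and a breadth-first shortest path over a small in-memory graph. It is not how any particular graph database implements these procedures; the adjacency-dict representation, the function names, and the toy network are assumptions made for illustration.

```python
from collections import deque

def degree_centrality(graph):
    """Normalized degree centrality: degree / (n - 1) for each node."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def shortest_path(graph, source, target):
    """Breadth-first search on an unweighted graph; returns one shortest path or None."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                if neighbor == target:
                    # Walk the parent links back to the source, then reverse.
                    path = [target]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
    return None

# Hypothetical undirected network (each edge listed in both directions).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}
print(degree_centrality(graph)["B"])   # 0.75
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

In a graph database the same questions would normally be answered by the product's own query language or algorithm library rather than by client-side code like this.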

    Review of Studies Identifying Disruptive Technologies
    Zhang Jinzhu, Wang Qiuyue, Qiu Mengmeng
    2022, 6 (7): 12-31.  DOI: 10.11925/infotech.2096-3467.2022.0142

    [Objective] This paper reviews the literature on identifying disruptive technologies, aiming to examine research topics and development trends and to establish a framework for further studies. [Coverage] We searched CNKI and Web of Science for Chinese and English papers with relevant keywords. We retrieved 1,974 papers published between 2011 and 2020 for quantitative analysis, and 61 papers published between 2001 and 2020 for qualitative analysis. [Methods] First, we identified popular topics and development trends through quantitative analysis. Then, we examined the highly cited papers and the latest literature to review their research methods. Finally, we built a framework based on the quantitative and qualitative results and used it to predict future trends. [Results] Studies identifying disruptive technologies were more common in the fields of information technology, medical treatment, the chemical industry, and high-end manufacturing. They used multiple methodologies, approaching the problem from the perspectives of the technologies themselves, products, sci-tech information mining, and the external environment. We established three frameworks for disruptive technology identification and explored possible future developments. [Limitations] Research on macro indicators, such as social and economic issues, needs a more comprehensive review. [Conclusions] Research on disruptive technology identification has become interdisciplinary, incorporating more quantitative methods and nonlinear algorithms based on deep learning.

    Recommending Medical Literature with Random Forest Model and Query Expansion
    Ding Hao, Hu Guangwei, Qi Jianglei, Zhuang Guangguang
    2022, 6 (7): 32-43.  DOI: 10.11925/infotech.2096-3467.2021.1148

    [Objective] This paper tries to find valuable content in the large body of medical literature, aiming to help physicians make diagnoses and to improve medical literature recommendation. [Methods] We proposed a new method based on the random forest model and keyword query expansion. First, we used the MeSH dictionary and an automatically constructed acronym dictionary to fully link keywords with the corresponding articles at three levels: sentence, paragraph, and document. Then, we calculated multiple similarity measures between topics and articles. For each article, we also calculated its PageRank score and its HITS authority weight from the citation network of the literature set. [Results] Compared with the average of the 10 highest NDCG@100 scores from the TREC Clinical Decision Support follow-up evaluation, the overall average of the proposed method differed by no more than 0.9%. [Limitations] Newly published papers and “Sleeping Beauty” papers may rank lower in retrieval because they receive few citations early on, so our method cannot recommend them accurately. [Conclusions] The proposed method effectively improves medical literature recommendation.
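For readers less familiar with the two citation-based weights mentioned above, the following is a minimal power-iteration sketch of PageRank and of HITS authority scores over a citation network. The damping factor of 0.85, the fixed iteration count, the dangling-node handling, and the toy network are assumptions; the abstract does not say how these weights are parameterized or how they are combined with the similarity scores.

```python
import math

def all_nodes(links):
    """Collect every article that cites or is cited."""
    return set(links) | {t for targets in links.values() for t in targets}

def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; links maps each article to the articles it cites."""
    nodes = all_nodes(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling article (cites nothing in the set): spread its rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

def normalize(scores):
    """Scale a score vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}

def hits(links, iterations=100):
    """Return (hub, authority) scores; authority rewards being cited by good hubs."""
    nodes = all_nodes(links)
    hub = {node: 1.0 for node in nodes}
    auth = {node: 0.0 for node in nodes}
    for _ in range(iterations):
        auth = {node: 0.0 for node in nodes}
        for source in nodes:
            for target in links.get(source, []):
                auth[target] += hub[source]
        auth = normalize(auth)
        hub = normalize({node: sum(auth[t] for t in links.get(node, [])) for node in nodes})
    return hub, auth

# Hypothetical citation network: each key cites the articles in its list.
citations = {"p1": ["p3"], "p2": ["p1", "p3"], "p3": [], "p4": ["p3"]}
rank = pagerank(citations)
hub, auth = hits(citations)
print(max(rank, key=rank.get), max(auth, key=auth.get))  # p3 p3
```

Both scores reward heavily cited articles, which is also why newly published papers tend to score low under such weights.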

    Subject Topic Mining and Evolution Analysis with Multi-Source Data
    Li Hui, Hu Jixia, Tong Zhiying
    2022, 6 (7): 44-55.  DOI: 10.11925/infotech.2096-3467.2021.1296

    [Objective] This paper examines the evolution of research topics, helping researchers quickly identify the status quo and trends in their fields. [Methods] First, we merged multi-source datasets and divided the domain research topics by time period. Then, we calculated topic importance from topic popularity, density, and closeness centrality. Third, we used semantic similarity to identify related topics in adjacent time periods. Finally, we combined fluctuations in topic importance with topic similarity to determine evolution types and paths. [Results] We examined our model with papers on artificial intelligence and analyzed changes in topics over the past 20 years. We identified the popular research topics and their evolution paths, which showed clear topic fusion and splitting across four periods. [Limitations] The topic naming rules could be improved, and because artificial intelligence research is still booming, we could not show its whole life cycle. [Conclusions] The proposed model can effectively reveal the evolution of research topics.
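The similarity step described above can be sketched as follows: topics from two adjacent periods are represented as word-weight vectors, linked when their cosine similarity reaches a threshold, and a later topic fed by two or more earlier topics is flagged as a fusion (one earlier topic feeding several later topics as a split). The vector representation, the 0.5 threshold, and the example topics are hypothetical, and this sketch ignores the topic-importance fluctuation that the authors also combine in.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def link_topics(earlier, later, threshold=0.5):
    """Return (earlier topic, later topic, similarity) pairs at or above the threshold."""
    links = []
    for name_a, vec_a in earlier.items():
        for name_b, vec_b in later.items():
            sim = cosine(vec_a, vec_b)
            if sim >= threshold:
                links.append((name_a, name_b, sim))
    return links

def evolution_types(links):
    """Fusion: 2+ earlier topics feed one later topic. Split: one feeds 2+ later topics."""
    out_degree = Counter(a for a, _, _ in links)
    in_degree = Counter(b for _, b, _ in links)
    return {
        "fusion": sorted(b for b, d in in_degree.items() if d > 1),
        "split": sorted(a for a, d in out_degree.items() if d > 1),
    }

# Hypothetical topic-word weights for two adjacent periods.
period_1 = {"T1": {"neural": 0.6, "network": 0.4}, "T2": {"expert": 0.7, "system": 0.3}}
period_2 = {"U1": {"neural": 0.4, "network": 0.2, "expert": 0.4},
            "U2": {"knowledge": 0.6, "graph": 0.4}}
links = link_topics(period_1, period_2)
print([(a, b) for a, b, _ in links])  # [('T1', 'U1'), ('T2', 'U1')]
print(evolution_types(links))         # {'fusion': ['U1'], 'split': []}
```

Topic U2, which links to nothing in the earlier period, would be read as a newly emerging topic under this scheme.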

    Mining Online User Profiles and Self-Presentations: Case Study of NetEase Music Community
    Wu Jiang, Liu Tao, Liu Yang
    2022, 6 (7): 56-69.  DOI: 10.11925/infotech.2096-3467.2021.1449

    [Objective] This paper explores the patterns, evolution, and group differences of online users’ self-presentation topics, as well as their influence on community recognition. [Methods] First, we identified online users of the NetEase Music community and constructed their profiles from the perspectives of qualification and participation. Then, we used the BERT model to cluster users’ short comments and identify their self-presentation topics. Third, we used cosine similarity to analyze topic evolution and group differences. Finally, we used covariance to analyze the impact of self-presentation topics on community recognition. [Results] We identified eight self-presentation topics; the proportion of “reviews” decreased while that of “recollection” increased. “Interaction” topics were more popular in the “relax” style than in other styles. The proportion of each topic was almost the same at different times. Under the “recollection” theme, the cosine similarity of quality users was higher than that of other users. The cosine similarity of continuous participants was higher than that of inactive participants. The impact of users’ self-presentation topics on their community recognition was significant at the 0.1 level. [Limitations] More research is needed to examine users of other online communities. [Conclusions] “Recollection” is the most popular self-presentation topic, and topics are affected by style and time. As the community developed, the topics became more diverse, with clear differences among user groups.

    Detecting Signals of Adverse Drug Reactions with Data from Online Health Community
    Guo Jinjing, Xia Guanghui, Huang Qi, He Liyun, Zhang Huabing
    2022, 6 (7): 70-86.  DOI: 10.11925/infotech.2096-3467.2021.1263

    [Objective] Online health communities provide new information for detecting adverse drug reaction (ADR) signals. This study identifies ADR signals from patients’ reviews and generates early warnings for potential side effects of antidiabetic drugs. [Methods] First, we retrieved patients’ reviews (adverse reactions) of antidiabetic drugs from the Ask a Patient website. Then, we combined natural language processing techniques and lexicons (UMLS and MedDRA) to normalize and map these reviews. Third, we constructed a drug-ADR co-occurrence matrix and used the proportional reporting ratio (PRR) method to identify drug-ADR pairs meeting the signal detection threshold. Finally, we invited experts to interpret the extracted results and evaluated them against Drugs.com standards. [Results] A total of 539 drug-ADR pairs were identified, with an overall identification accuracy of 85% and recall of 82%. [Limitations] The accuracy of ADR term identification was affected by non-ADR terms included in MedDRA, such as those for examinations, surgical operations, and the social environment. [Conclusions] The proposed model enriches the data sources and methods of ADR signal detection.
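The PRR step above compares how often an ADR is mentioned with one drug versus all other drugs, using a 2×2 table derived from the co-occurrence data. A minimal sketch follows. The abstract does not state the signal detection threshold, so the rule used here (PRR of at least 2 with at least 3 co-occurring reviews, a commonly used rule of thumb that is often paired with a chi-square test omitted here) is an assumption, and the drug names and reviews are hypothetical.

```python
import math

def contingency(reviews, drug, adr):
    """Build the 2x2 table (a, b, c, d) for one drug-ADR pair.

    a: reviews of the drug mentioning the ADR; b: reviews of the drug without it;
    c: reviews of other drugs mentioning the ADR; d: reviews of other drugs without it.
    """
    a = b = c = d = 0
    for review_drug, adrs in reviews:
        mentions = adr in adrs
        if review_drug == drug:
            if mentions:
                a += 1
            else:
                b += 1
        elif mentions:
            c += 1
        else:
            d += 1
    return a, b, c, d

def prr(a, b, c, d):
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    if a + b == 0 or c + d == 0:
        raise ValueError("both the drug group and the comparator group need reviews")
    if c == 0:
        return math.inf if a > 0 else 0.0
    return (a / (a + b)) / (c / (c + d))

def is_signal(a, b, c, d, min_prr=2.0, min_count=3):
    """Assumed threshold: PRR >= min_prr and at least min_count co-occurring reviews."""
    return a >= min_count and prr(a, b, c, d) >= min_prr

# Hypothetical normalized reviews: (drug, set of mapped ADR terms).
reviews = (
    [("drug_A", {"diarrhoea"})] * 4
    + [("drug_A", {"nausea"})]
    + [("drug_B", {"diarrhoea"})] * 2
    + [("drug_B", {"headache"})] * 8
)
table = contingency(reviews, "drug_A", "diarrhoea")
print(table, prr(*table), is_signal(*table))  # (4, 1, 2, 8) 4.0 True
```

Running the same computation for every nonzero cell of the co-occurrence matrix yields the list of candidate drug-ADR signals for expert review.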

    Building Multi-Source Semantic Knowledge Graph for Drug Repositioning
    Zhang Han, An Xinyu, Liu Chunhe
    2022, 6 (7): 87-98.  DOI: 10.11925/infotech.2096-3467.2021.1364

    [Objective] This paper constructs a cross-platform semantic knowledge graph from complete datasets to help discover novel drug knowledge. [Methods] First, we designed a new model for the knowledge graph, which integrates semantic relations from PubMed, DrugBank, and CTD and includes knowledge fusion and attribute definition. Then, we conducted drug repositioning with pathway identification and link prediction to discover new treatments for cancer. [Results] The F-score of pathway identification (0.57) was slightly higher than that of link prediction (0.56). The more pathways there are between a drug and an indication, the more likely a positive prediction. [Limitations] Because the reasoning mechanism relies on existing associations among knowledge units, it is hard to discover new indications for drugs without known targets. The huge data volume also makes it difficult to update the knowledge graph dynamically. [Conclusions] The proposed knowledge graph can effectively find new drug indications and improve the efficiency of drug research and development.
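The finding that more pathways between a drug and an indication make a positive prediction more likely suggests a simple illustration: enumerate multi-hop paths from a drug to disease nodes in a triple store and rank candidate indications by path count, then score predictions with F1. This is a hypothetical sketch, not the authors' reasoning mechanism; the `Disease:` prefix convention, the three-hop limit, and the triples are all assumptions.

```python
from collections import Counter

def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = {}
    for head, relation, tail in triples:
        index.setdefault(head, []).append((relation, tail))
    return index

def count_paths(index, drug, max_hops=3):
    """Count simple paths of 2..max_hops edges from a drug to each disease node."""
    counts = Counter()

    def walk(node, depth, visited):
        for _relation, nxt in index.get(node, []):
            if nxt in visited:
                continue
            # Direct (one-edge) drug-disease links are not counted as discovered pathways.
            if nxt.startswith("Disease:") and depth + 1 >= 2:
                counts[nxt] += 1
            if depth + 1 < max_hops:
                walk(nxt, depth + 1, visited | {nxt})

    walk(drug, 0, {drug})
    return counts

def f_score(predicted, gold):
    """F1 of predicted drug-indication pairs against a gold standard."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical triples: drug -> target gene -> disease.
triples = [
    ("Drug:D1", "targets", "Gene:G1"),
    ("Drug:D1", "targets", "Gene:G2"),
    ("Gene:G1", "associated_with", "Disease:C1"),
    ("Gene:G2", "associated_with", "Disease:C1"),
    ("Gene:G2", "associated_with", "Disease:C2"),
]
ranking = count_paths(build_index(triples), "Drug:D1").most_common()
print(ranking)  # [('Disease:C1', 2), ('Disease:C2', 1)]
```

This also makes the stated limitation concrete: a drug with no target edges has no paths to follow, so no candidate indications can be ranked for it.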

    Matching Similar Cases with Legal Knowledge Fusion
    Zheng Jie, Huang Hui, Qin Yongbin
    2022, 6 (7): 99-106.  DOI: 10.11925/infotech.2096-3467.2022.0040

    [Objective] This paper constructs a similar-case matching model that integrates legal knowledge, aiming to improve the accuracy of case matching. [Methods] First, we concatenated the legal knowledge with the case texts so that the model could learn features of legal knowledge and text information simultaneously. Then, we used an LSTM network to model the texts segment by segment, which increased the length of text the model can accommodate. Finally, we jointly trained the model with a triplet loss and an adversarial contrastive loss to enhance its robustness. [Results] The proposed model significantly improved the accuracy of similar case matching, which was 7.07% higher than that of the baseline BERT model. [Limitations] Because we used longer text sequences for matching, our model is more time-consuming than other models. [Conclusions] The proposed model has stronger matching and generalization abilities, which benefits legal case retrieval.
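The triplet loss named above trains the encoder so that a case lies closer to a similar case than to a dissimilar one by at least a margin. A minimal sketch of the standard hinge form follows; the Euclidean distance, the margin of 1.0, and the 2-d embeddings are assumptions, and the adversarial contrastive term used in joint training is not shown.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two case embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the similar case is closer to the anchor
    than the dissimilar case by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def batch_triplet_loss(triplets, margin=1.0):
    """Mean triplet loss over (anchor, positive, negative) embedding triples."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)

# Hypothetical 2-d embeddings of case documents.
anchor, similar = [0.0, 0.0], [0.0, 1.0]
print(triplet_loss(anchor, similar, [3.0, 4.0]))  # 0.0 (margin already satisfied)
print(triplet_loss(anchor, similar, [0.0, 1.5]))  # 0.5
```

Only triplets that violate the margin produce a nonzero loss, so training effort concentrates on hard, easily confused case pairs.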

    STNLTP: Generating Chinese Patent Abstracts Based on Integrated Strategy
    Zhang Le, Du Yifan, Lü Xueqiang, Dong Zhian
    2022, 6 (7): 107-117.  DOI: 10.11925/infotech.2096-3467.2021.1307

    [Objective] This paper proposes an abstracting model for Chinese patents based on an integrated strategy (STNLTP), aiming to reduce the duplicated content and long-document dependency issues of existing automatic abstracting techniques. [Methods] First, we introduced a patent term dictionary and used sememe vectors based on the SAT model to represent traditional Chinese medicine patents. Then, following the integrated strategy, we used the TextRank, Lead4, and NMF models to extract key sentences from the patents. Third, we selected the optimal key sentences through clustering and redundancy removal. Finally, we fed these sentences to a pointer-generator network based on Transformer character vectors to generate the abstracts. [Results] The new model combines extractive and generative methods. Compared with the existing RLCPAR model, it improved ROUGE-1, ROUGE-2, and ROUGE-L by 2.00%, 9.73%, and 2.35%, respectively. [Limitations] The generated abstracts still contain some errors. [Conclusions] The new STNLTP model can effectively generate Chinese patent abstracts.
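The ROUGE metrics reported above measure n-gram and longest-common-subsequence overlap between a generated abstract and a reference. The sketch below computes character-level ROUGE-N and ROUGE-L (precision, recall, F1), a common choice for Chinese; the abstract does not say which ROUGE implementation, tokenization, or variant (recall vs. F1) the authors used, and the example strings are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prf(overlap, cand_total, ref_total):
    """Precision, recall, and F1 from a match count and the two totals."""
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return p, r, 2 * p * r / (p + r)

def rouge_n(candidate, reference, n=1):
    """Character-level ROUGE-N for Chinese text."""
    cand, ref = ngrams(list(candidate), n), ngrams(list(reference), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return prf(overlap, sum(cand.values()), sum(ref.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def rouge_l(candidate, reference):
    """Character-level ROUGE-L based on the longest common subsequence."""
    return prf(lcs_length(candidate, reference), len(candidate), len(reference))

# Hypothetical generated abstract vs. reference abstract.
cand, ref = "一种中药组合物", "一种中药制剂"
print(round(rouge_n(cand, ref, 1)[1], 3), round(rouge_n(cand, ref, 2)[1], 3))  # 0.667 0.6
```

ROUGE-2 rewards matching adjacent character pairs, so it is typically more sensitive than ROUGE-1 to fluency and ordering.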

    Identifying Financial Text Causality with Bi-LSTM and Two-way CNN
    Zhang Shunxiang, Zhang Zhenjiang, Zhu Guangli, Zhao Tong, Huang Ju
    2022, 6 (7): 118-127.  DOI: 10.11925/infotech.2096-3467.2021.1344

    [Objective] This paper proposes a network model combining Bi-LSTM and a two-way CNN, which addresses missing feature information in causality identification and improves its accuracy. [Methods] First, we used Bi-LSTM to generate feature matrices for the financial texts. Then, we extracted causal features from these matrices using a two-way CNN with different convolution kernels. Third, we concatenated the feature vectors obtained by max pooling and average pooling. Finally, we passed the concatenated vectors to a fully connected layer for output. [Results] The accuracy of our new model reached 82.3%, at least 3% higher than that of existing methods. [Limitations] We did not build a function module specific to financial texts. [Conclusions] The proposed model can effectively identify causality in documents.
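The convolution and pooling steps above can be illustrated in a few lines: two branches convolve the Bi-LSTM output matrix with kernels of different sizes, each resulting feature map is max-pooled and average-pooled, and the pooled values are concatenated into one vector for the fully connected layer. This is a toy sketch with one hand-set filter per branch; the kernel sizes (2 and 3), the weights, the ReLU activation, and the input values are assumptions, whereas the real model learns many filters.

```python
def conv1d(matrix, kernel):
    """Valid 1-D convolution over a (positions x dims) matrix with one filter, then ReLU."""
    k, dims = len(kernel), len(kernel[0])
    outputs = []
    for i in range(len(matrix) - k + 1):
        total = sum(matrix[i + r][c] * kernel[r][c] for r in range(k) for c in range(dims))
        outputs.append(max(0.0, total))
    return outputs

def pool_and_concat(feature_maps):
    """Max-pool and average-pool every feature map, then concatenate the results."""
    max_part = [max(fm) for fm in feature_maps]
    avg_part = [sum(fm) / len(fm) for fm in feature_maps]
    return max_part + avg_part

# Hypothetical Bi-LSTM output: 4 token positions x 2 feature dimensions.
bilstm_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
branch_a = conv1d(bilstm_out, [[1.0, 0.0], [0.0, 1.0]])               # kernel size 2
branch_b = conv1d(bilstm_out, [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]])  # kernel size 3
features = pool_and_concat([branch_a, branch_b])
print(branch_a, branch_b)  # [2.0, 1.0, 1.0] [0.0, 1.0]
print(features)            # max-pooled values first, then average-pooled values
```

Max pooling keeps the strongest local causal cue, while average pooling retains information about the whole sentence; concatenating both is one way to reduce the missing-feature problem the paper targets.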

    Evolution of Public Sentiments During COVID-19 Pandemic
    Bian Xiaohui, Xu Tong
    2022, 6 (7): 128-140.  DOI: 10.11925/infotech.2096-3467.2021.0711

    [Objective] This study analyzes social media posts during the COVID-19 pandemic, aiming to reveal temporal and spatial differences in public opinion, sentiment evolution under different circumstances, and the trans-regional spread of public sentiment. [Methods] First, we used the Latent Dirichlet Allocation (LDA) model to generate latent topics and related keyword groups, and analyzed the evolution of public sentiment from the perspectives of global and individual topics. Then, we described the trans-regional spread of public sentiment with a social spread model adapted from the classic Independent Cascade Model. [Results] The model summarized the general patterns of temporal evolution and spatial difference, as well as the impacts of distance to epidemic centers and of financial levels. We also found two different types of topics that indicate the reasons for popularity and sentiment differences, as well as multi-view connections among these topics. The strength of trans-regional sentiment spread is affected by both regional distance and the epidemic situation. [Limitations] The framework cannot process multimodal data. [Conclusions] The proposed model can help local governments develop strategies suited to specific conditions and pay more attention to the impacts of related events. Local governments should also strengthen regional cooperation and coordination in controlling pandemics and monitoring public sentiment.
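For reference, the classic Independent Cascade Model that the study adapts works as follows: each newly activated node gets exactly one chance to activate each inactive neighbor, succeeding with that edge's probability, and the process stops when no new nodes are activated. The sketch below implements only this classic model with a Monte Carlo spread estimate, not the authors' adapted version; the region graph and edge probabilities are hypothetical (in a regional setting they might, for example, decrease with distance).

```python
import random

def independent_cascade(edges, seeds, rng=None):
    """One run of the Independent Cascade model.

    edges: dict mapping a region to a list of (neighbor, activation probability).
    Each newly activated region gets exactly one chance to activate each inactive neighbor.
    """
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in edges.get(node, []):
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active

def expected_spread(edges, seeds, runs=1000, seed=42):
    """Monte Carlo estimate of how many regions the sentiment eventually reaches."""
    rng = random.Random(seed)
    return sum(len(independent_cascade(edges, seeds, rng)) for _ in range(runs)) / runs

# Hypothetical region graph with hypothetical activation probabilities.
edges = {"R0": [("R1", 0.6), ("R2", 0.2)], "R1": [("R3", 0.5)], "R2": [("R3", 0.5)]}
print(expected_spread(edges, ["R0"]))  # close to the exact expectation of 2.17
```

For this small graph the exact expectation is 1 + 0.6 + 0.2 + [1 - (1 - 0.6 × 0.5)(1 - 0.2 × 0.5)] = 2.17, which the Monte Carlo estimate approximates.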

    A Text-Aligned Cross-Language Sentiment Classification Method Based on Adversarial Networks
    Yang Wenli, Li Nana
    2022, 6 (7): 141-151.  DOI: 10.11925/infotech.2096-3467.2021.1462

    [Objective] This paper tries to improve the accuracy of cross-language sentiment classification by narrowing the distribution gap between bilingual text pairs in a shared space. [Methods] During sentiment knowledge transfer, we aligned word pairs and text pairs simultaneously by adjusting a balance coefficient. Then, we combined this alignment with a language discriminator to generate the conversion matrix through adversarial network optimization. Finally, we used a multi-feature fusion hierarchical neural network to represent the texts, their contexts, and the topic relevance of words and sentences, which addressed long-distance feature dependencies in the texts. [Results] We evaluated our model on the standard NLP&CC 2013 datasets; its average cross-language sentiment classification accuracy was 83.66%, 2.30% higher than that of the benchmark model. [Limitations] This method was only tested with Chinese and English datasets. More research is needed to evaluate its effectiveness with other languages. [Conclusions] Improving the similarity of bilingual texts can effectively increase the accuracy of cross-language sentiment classification.

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn