Table of Contents

25 November 2021, Volume 5 Issue 11
    

  • Li Xiao, Qu Jiansheng
    Data Analysis and Knowledge Discovery. 2021, 5(11): 1-12. https://doi.org/10.11925/infotech.2096-3467.2021.0515

    [Objective] This paper reviews the latest applications and evolution of meta-analysis in the social sciences. [Methods] First, we summarized the main characteristics of meta-analysis and the key problems facing its application in the social sciences. Then, we conducted case studies with the MetaBUS and CoDa databases. Finally, we examined meta-analysis from different perspectives. [Results] Meta-analyses in the social sciences mainly analyze aggregated data with traditional effect-size measures, such as standardized mean differences and correlation coefficients. At present, the key issues are effect size deviation, a lack of transparency and quality assessment, and heavy demands on time and labor. Meta-analytic databases and meta-analytic research can benefit each other. Data repositories, the open science movement and artificial intelligence technology have all had significant impacts on meta-analytic research. [Limitations] The content analysis is based on a sample of studies, so it may not comprehensively reveal the characteristics and problems of meta-analysis. [Conclusions] Many problems remain to be addressed for meta-analysis in the social sciences, and all parties need to work together to improve this research and draw better conclusions.
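The two effect-size measures named in the abstract above can be illustrated with a minimal sketch. It assumes the common textbook forms (Cohen's d with a pooled standard deviation, and Fisher's z transform for correlations); the abstract does not say which variants the reviewed studies used, and the function names are illustrative only.

```python
import math


def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def fisher_z(r):
    """Fisher's z transform of a correlation coefficient r, with -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


# Toy numbers, not taken from any study in this issue.
d = cohens_d(mean_t=12.0, mean_c=10.0, sd_t=4.0, sd_c=4.0, n_t=30, n_c=30)
z = fisher_z(0.5)
```

With equal standard deviations of 4 and a mean difference of 2, the toy example gives d = 0.5.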

  • Sheng Shu, Huang Qi, Yang Yang, Xie Qiwen, Qin Xinguo
    Data Analysis and Knowledge Discovery. 2021, 5(11): 13-28. https://doi.org/10.11925/infotech.2096-3467.2021.0260

    [Objective] This paper explores the core framework of the message exchange standard Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR), aiming to standardize medical data formats and disease terms in Chinese. [Methods] We proposed a healthcare data interoperability method based on the FHIR framework. Then, we combined the ontology standardization conceptual model and the Disease Ontology to regulate the expression of disease terms, using ontology construction, mapping and migration techniques. [Results] We retrieved 176 electronic medical records from the YiXiang platform with a Python crawler. After ontology mapping and migration, we fully standardized the medical records and disease term coding in the FHIR data format. [Limitations] We did not standardize the semantics of heterogeneous medical data of multiple types. [Conclusions] This study provides a new perspective for constructing a standard medical records system and related technologies in China.

  • Yu Chuanming, Zhang Zhengang, Kong Lingge
    Data Analysis and Knowledge Discovery. 2021, 5(11): 29-44. https://doi.org/10.11925/infotech.2096-3467.2021.0491

    [Objective] This study systematically reviews the internal mechanisms and influencing factors of knowledge graph representation models, aiming to investigate their impacts on specific tasks. [Methods] For the link prediction task, we compared the performance of translation-based and semantic matching-based knowledge graph representation models on the FB15K, WN18, FB15K-237 and WN18RR datasets. [Results] On the Hits@1 metric, the TuckER model achieved the best values on the WN18, FB15K-237 and WN18RR datasets (0.9460, 0.2633 and 0.4430, respectively), while the ComplEx model yielded the highest value on the FB15K dataset (0.7314). [Limitations] We only compared the effects of knowledge graph representation models on the link prediction and knowledge base QA tasks. More research is needed to examine their performance on information retrieval, recommendation systems and other tasks. [Conclusions] There are significant differences between the translation-based and the semantic matching-based knowledge graph representation models. The score function, negative sampling and optimization method of the knowledge graph representation model, as well as the proportion of training data, have significant impacts on the results of link prediction.
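For readers unfamiliar with the Hits@1 metric reported above, the following minimal sketch shows how Hits@k is usually computed for link prediction: each test triple yields a rank for the true entity among all candidates, and Hits@k is the share of triples whose rank is at most k. The scores are toy values, and the tie-handling rule (only strictly higher scores push the rank down) is an assumption; the paper's evaluation protocol may differ.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate, where a higher score is better.

    Only candidates with a strictly higher score are counted ahead of it.
    """
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def hits_at_k(ranks, k):
    """Fraction of test cases whose true candidate is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy candidate scores for three test triples: (scores, index of the true entity).
cases = [
    ([0.9, 0.2, 0.1], 0),  # true entity ranked 1st
    ([0.3, 0.8, 0.5], 2),  # true entity ranked 2nd
    ([0.1, 0.7, 0.4], 0),  # true entity ranked 3rd
]
ranks = [rank_of_true(scores, idx) for scores, idx in cases]
hits1 = hits_at_k(ranks, 1)
hits3 = hits_at_k(ranks, 3)
```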

  • Ding Hao, Ai Wenhua, Hu Guangwei, Li Shuqing, Suo Wei
    Data Analysis and Knowledge Discovery. 2021, 5(11): 45-58. https://doi.org/10.11925/infotech.2096-3467.2021.0292

    [Objective] This paper constructs a prediction model based on hybrid time series to improve recommendation accuracy. [Methods] First, we constructed a trend prediction model using a neural network and fuzzy clustering for interest fluctuations of different magnitudes. Then, we used the neural network to extract and predict the sliding features of small-fluctuation series. Finally, we used the membership degree of fuzzy clustering to divide the relationships in large-fluctuation series data. [Results] User simulation tests with four groups of experimental data showed that extracting data features for different amplitudes of interest fluctuation yielded more accurate predictions, with an RMSE 19.18% lower and a Hit-Ratio 45.78% higher than those of other algorithms. [Limitations] The analysis of time fluctuation relies on historical data, so an additional cold-start algorithm is needed to preprocess sparse historical data. [Conclusions] This method can effectively process fluctuations of interest and improve personalized information services.
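The "membership degree of fuzzy clustering" mentioned above can be illustrated with the standard fuzzy c-means membership formula. This is only a sketch under that assumption: the abstract does not name the fuzzy clustering algorithm, and the one-dimensional data, the cluster centers and the fuzzifier m = 2 are all hypothetical.

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of a scalar x in each cluster (requires m > 1).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i = |x - center_i|.
    """
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x coincides with one or more centers: split full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# A fluctuation amplitude of 1.0 against hypothetical "small" (0.0) and
# "large" (4.0) cluster centers.
u_small, u_large = fcm_memberships(1.0, [0.0, 4.0])
```

The memberships always sum to 1, so a point near a center receives most of its weight from that cluster while still belonging partly to the others.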

  • Cheng Tiejun, Wang Man, Huang Baofeng, Feng Lanping
    Data Analysis and Knowledge Discovery. 2021, 5(11): 59-67. https://doi.org/10.11925/infotech.2096-3467.2021.0525

    [Objective] This paper tries to predict the development trend of online public opinion in emergencies. [Methods] First, we identified multiple uncertain factors affecting the evolution of online public opinion. Then, we constructed a CEEMDAN-BP prediction model combining Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), phase-space reconstruction and a Back Propagation (BP) network. Finally, we conducted an empirical study to examine the new model with three emergencies. [Results] Our CEEMDAN-BP model could better predict the development trend of online public opinion in emergencies. The average absolute errors of prediction in the three emergencies were 8.60%, 17.98% and 11.97%, respectively. Our model’s prediction accuracy and stability were better than those of existing models. [Limitations] The experimental data were based on daily statistics, which could not fully reflect changes in public opinion. [Conclusions] The CEEMDAN-BP model can effectively predict the development trend of online public opinion in emergencies, which helps the relevant departments prepare for and manage emergencies.
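Phase-space reconstruction, one of the components named above, is commonly done by time-delay embedding: a scalar series is turned into vectors of lagged values that can serve as network inputs. The sketch below shows only that step. The embedding dimension `dim` and delay `tau` are placeholder values, not the paper's settings, and the abstract does not describe how its model chooses them.

```python
def delay_embed(series, dim, tau):
    """Time-delay embedding of a scalar series.

    Returns vectors [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]].
    """
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [[series[i + j * tau] for j in range(dim)] for i in range(n_vectors)]


# Toy daily counts; dim and tau are illustrative only.
daily_counts = [1, 2, 3, 4, 5, 6]
vectors = delay_embed(daily_counts, dim=3, tau=2)
# vectors == [[1, 3, 5], [2, 4, 6]]
```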

  • Han Pu, Zhang Wei, Zhang Zhanpeng, Wang Yuxin, Fang Haoyu
    Data Analysis and Knowledge Discovery. 2021, 5(11): 68-79. https://doi.org/10.11925/infotech.2096-3467.2021.0339

    [Objective] This paper proposes a multi-channel MCMF-A model for Weibo posts based on feature fusion and an attention mechanism, aiming to further explore the semantic information of public health emergencies. [Methods] First, we generated word vectors with Word2vec and FastText at the feature vector embedding level, and merged them with the vectors of part-of-speech and position features. Second, we constructed a multi-channel layer based on CNN and BiLSTM to extract local and global features of Weibo posts. Third, we used the attention mechanism to extract important features of the texts. Finally, we merged the multi-channel outputs and used the softmax function for sentiment classification. [Results] We examined the MCMF-A model with 42,384 Weibo posts on COVID-19. The F1 value of the proposed model reached 90.21%, which was 9.71% and 9.14% higher than the benchmark CNN and BiLSTM models, respectively. [Limitations] More research is needed to expand the size of the experimental data and to include multi-modal information such as images and voice. [Conclusions] The proposed model can effectively conduct sentiment analysis of Weibo posts.

  • Wang Hong, Shu Zhan, Gao Yinquan, Tian Wenhong
    Data Analysis and Knowledge Discovery. 2021, 5(11): 80-88. https://doi.org/10.11925/infotech.2096-3467.2021.0347

    [Objective] This paper proposes a new method to identify implicit discourse relations based on a single classifier and a multi-task learning model. [Methods] First, we modeled the implicit and explicit discourse relations with the multi-task learning method. Then, we converted the four-class classification problem into binary classification problems and trained the single classifier. [Results] We examined our new method with the HIT-CDTB data set. For the corpus with extended and parallel relations, the F1 values reached 0.94 and 0.81 respectively, which were significantly improved with four inter-sentence relations. [Limitations] The performance of our model could be improved with more distributed and expanded datasets. [Conclusions] The proposed method yields the best results on the HIT-CDTB data set. Deleting connectives adds noise to the training set and negatively affects the model’s performance.

  • Wang Song, Yang Yang, Liu Xinmin
    Data Analysis and Knowledge Discovery. 2021, 5(11): 89-101. https://doi.org/10.11925/infotech.2096-3467.2021.0544

    [Objective] This paper proposes a method to discover the potential of user ideas, aiming to effectively identify creative ones in open innovation communities. [Methods] First, we analyzed the formation process of creative value and constructed a dual network structure for user ideas. Then, we developed a model based on graph attention networks to discover their potential value. Finally, we trained the model to learn the node characteristics of this dual network and mapped the relationships between the networks. [Results] The model was empirically examined with data from a typical open innovation community. The results show that the proposed model achieved an accuracy of 90.49%, higher than other relevant baseline models. [Limitations] The model was only validated on the Meizu community dataset, and needs to be extended to other open innovation communities in future studies. [Conclusions] The combination of the dual network structure and the graph attention network can effectively identify the potential value of user ideas in open innovation communities, which provides technical support for increasing user participation and fully utilizing community innovation resources.

  • Wu Shengnan, Pu Hongjun, Tian Ruonan, Liang Wenqi, Yu Qi
    Data Analysis and Knowledge Discovery. 2021, 5(11): 102-113. https://doi.org/10.11925/infotech.2096-3467.2021.0323

    [Objective] This paper tries to identify the main parameters influencing link prediction algorithms, using network structures and data from multiple studies. [Methods] We retrieved empirical research on link prediction from China and abroad, which included 5 papers, 22 networks, 26 algorithms and 278 studies. We used three-level meta-analysis and Bayesian network meta-analysis to explore the network structures and their impacts on the algorithms’ performance. [Results] The algorithms included in our study generally had a good predictive effect (MD = 1.1832, 95% CI: 1.0005 to 1.3659). The network density, average degree and clustering coefficient were the main factors affecting the prediction results (p < 0.05). The Katz, LHN-II, MFI, LRW and SRW algorithms yielded better results on sparse networks, with SUCRA values greater than 0.5. [Limitations] Our research does not include empirical analysis with large-scale data. [Conclusions] With the help of meta-analysis, our study explores the development directions for link prediction algorithms.
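The pooled estimate and 95% confidence interval reported above can be sanity-checked with a short calculation. The sketch assumes a symmetric normal-approximation interval (estimate plus or minus 1.96 standard errors); the paper's three-level model may have used a different interval construction, so the implied standard error is only indicative.

```python
Z_95 = 1.96  # two-sided 95% normal quantile (assumed interval construction)


def ci_midpoint(lower, upper):
    """Center of a confidence interval; equals the estimate if the CI is symmetric."""
    return (lower + upper) / 2


def implied_standard_error(lower, upper, z=Z_95):
    """Standard error implied by a symmetric interval of half-width z * SE."""
    return (upper - lower) / (2 * z)


midpoint = ci_midpoint(1.0005, 1.3659)       # about 1.1832, matching the reported MD
se = implied_standard_error(1.0005, 1.3659)  # about 0.0932
```

The interval is centered on the reported MD of 1.1832, which is consistent with a symmetric interval, and its lower bound lies above zero.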

  • Wu Yanwen, Cai Qiuting, Liu Zhi, Deng Yunze
    Data Analysis and Knowledge Discovery. 2021, 5(11): 114-123. https://doi.org/10.11925/infotech.2096-3467.2021.0548

    [Objective] This paper proposes a new method based on multi-source data fusion and scene similarity calculation to accurately recommend digital resources to users. [Methods] First, we constructed a scene model integrating multi-source data and obtained its abstract representation. Then, we calculated the scene similarity based on the detailed similarity index. Finally, we predicted the scene list and corresponding resources according to the predicted similarity levels, and optimized the recommendation results. [Results] Compared with CF Pearson, CF cosine, IOS and user-MRDC, the proposed CF-SSC algorithm performed best on MAE (0.688) and was slightly inferior to user-MRDC on RMSE (0.936). It required the fewest neighbors (20) to reach the optimal MAE and RMSE values. [Limitations] Our new algorithm was only tested with small data sets. [Conclusions] The proposed similarity algorithm improves the prediction accuracy and the effectiveness of the resource recommendation system.
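MAE and RMSE, the two indices compared above, are computed from the same prediction errors but weight them differently, which is why an algorithm can lead on one and trail on the other. A minimal sketch with toy ratings (not data from the paper):

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [4, 3, 5]
predicted = [3, 3, 3]
toy_mae = mae(actual, predicted)    # errors 1, 0, 2 -> 1.0
toy_rmse = rmse(actual, predicted)  # sqrt(5 / 3), about 1.291
```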

  • Li Zhenyu, Li Shuqing
    Data Analysis and Knowledge Discovery. 2021, 5(11): 124-134. https://doi.org/10.11925/infotech.2096-3467.2021.0136

    [Objective] This paper tries to construct a deep collaborative filtering model that can capture local relevance as well as explicit and implicit feedback. [Methods] In the explicit recommendation tasks, we embedded similar groups found by implicit feedback search. Then, we created models for the user-item group, user-similar-item group and item-similar-user group with a multi-layer perceptron. [Results] We examined the new algorithm with MovieLens datasets. Compared with existing methods, the MAE and RMSE of our model were reduced by 10.94% and 11.79%, respectively. [Limitations] More research is needed to identify the optimal number of nearest neighbors for different datasets. [Conclusions] The new model can generate recommendation results more effectively.

  • Ji Youshu, Wang Dongbo, Huang Shuiqing
    Data Analysis and Knowledge Discovery. 2021, 5(11): 135-144. https://doi.org/10.11925/infotech.2096-3467.2021.0311

    [Objective] This paper proposes an unsupervised method to automatically extract synonyms from ancient Chinese, aiming to develop more effective algorithms in this field. [Methods] First, we constructed a sentence-level aligned corpus of ancient and modern Chinese. Then, we used a word alignment algorithm to process the corpus. Finally, we extracted the synonyms based on the word alignments. [Results] The proposed method could automatically extract ancient Chinese synonyms. It generated 16,272 sets of synonyms with an accuracy of 40.12%. [Limitations] This method does not work on corpora that lack sentence-level alignment between ancient and modern Chinese. More research is needed to improve the word segmentation and alignment algorithms, which will yield better extraction results. [Conclusions] The proposed method could expand manually compiled thesauri, and lead human computing research to the semantic level.
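One plausible reading of the final step above is that ancient words aligned to the same modern word become a candidate synonym set. The sketch below illustrates that idea only; the grouping rule, the function name and the toy alignment pairs are assumptions, since the abstract does not state the paper's actual extraction rule.

```python
from collections import defaultdict


def synonym_sets(alignments):
    """Group ancient words by the modern word they align to.

    `alignments` is an iterable of (ancient_word, modern_word) pairs, as a
    word aligner might produce. Only groups with at least two distinct
    ancient words are returned as candidate synonym sets.
    """
    groups = defaultdict(set)
    for ancient, modern in alignments:
        groups[modern].add(ancient)
    return {modern: sorted(words) for modern, words in groups.items() if len(words) >= 2}


# Toy alignment pairs (hypothetical, not from the paper's corpus).
pairs = [("曰", "说"), ("云", "说"), ("吾", "我"), ("余", "我"), ("行", "走")]
candidates = synonym_sets(pairs)
# candidates == {"说": ["云", "曰"], "我": ["余", "吾"]}
```

A rule this permissive would admit alignment errors as synonyms, which is consistent with the modest accuracy reported and with the call for better segmentation and alignment.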

  • Dong Miao, Su Zhongqi, Zhou Xiaobei, Lan Xue, Cui Zhigang, Cui Lei
    Data Analysis and Knowledge Discovery. 2021, 5(11): 145-152. https://doi.org/10.11925/infotech.2096-3467.2021.0671

    [Objective] This paper tries to improve the performance of PubMedBERT for chemical-induced disease (CID) entity relation classification. [Methods] We proposed a classification model based on PubMedBERT, which was fine-tuned with Text-CNN. Then, we input entity pairs and sentence pairs into the model. Third, we used PubMedBERT to encode CID texts and obtain their global features. Finally, we captured important local information from the global features with Text-CNN to decide whether the entity pairs have a CID relation. [Results] The precision, recall and F1 value of this method on the BioCreative V CDR dataset reached 78.3%, 73.5% and 75.8% respectively, which were at least 3.1%, 1.5% and 3.3% higher than other methods. [Limitations] This model only examines CID texts, and more research is needed to evaluate its performance on clinical data or corpora from other domains. [Conclusions] This method can capture the features of CID texts and improve their entity relation classification.
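The three figures reported above are mutually consistent: F1 is the harmonic mean of precision and recall, and a short check with the abstract's own numbers reproduces the reported value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f1_score(78.3, 73.5)  # about 75.8
```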