Data Analysis and Knowledge Discovery

Exploring the Generation Mechanism of User’s Danmaku Commenting Behavior in Reaction Videos: Based on Cognitive-Affective Personality System Theory

Ye Xujie, Zhao Yuxiang, Zhang Xuanhui

Data Analysis and Knowledge Discovery. 2023, 7(2): 1-14. https://doi.org/10.11925/infotech.2096-3467.2022.0968

[Objective] This paper explores the underlying reasons for users’ danmaku commenting behavior in reaction videos and contributes to the literature on value co-creation in the content creation of danmaku videos. [Methods] This paper takes reaction videos on the Bilibili video website as examples. Using the danmaku comments of 11 popular videos in different camps as samples, we conduct open coding following the grounded theory approach. Based on the Cognitive-Affective Personality System (CAPS) theory framework, this paper builds a theoretical model of the generation mechanism of users’ danmaku commenting behavior in reaction videos. [Results] The results suggest that, under CAPS theory, users’ danmaku commenting behavior in reaction videos generally follows the path “situation → cognitive-affective units → behavior”. In addition, users’ knowledge accumulation directly affects danmaku commenting behavior. [Limitations] The model constructed using grounded theory may have subjective bias. Its generalizability still needs to be tested through further analysis of a large sample of reaction videos. [Conclusions] This model yields implications for promoting the dissemination of emerging digital content, and sheds light on value-adding, value transformation, and value co-creation in the content creation of danmaku videos.

Knowledge Graph Completion Model Based on Entity and Relation Fusion

Zhang Zhengang, Yu Chuanming

Data Analysis and Knowledge Discovery. 2023, 7(2): 15-25. https://doi.org/10.11925/infotech.2096-3467.2022.1027

[Objective] This study aggregates the global information of knowledge graph through a weighted graph convolutional neural network and a relational induction mechanism, aiming to enhance the quality of the knowledge graph representation and completion. [Methods] We proposed an end-to-end learning model for the knowledge graph completion task, which included a neighborhood information aggregation module, an entity relationship fusion module, an interaction module, as well as a prediction module. This new model aggregates the neighborhood information of entities to enrich their representations. It also enhances the interaction between entities and relationship representations with a core tensor. [Results] We examined the new model with the FB15K237, WN18RR, Kinship, and UMLS datasets. Compared with traditional knowledge graph completion models, the Hits@1 indicators of the proposed model increased by 4.1%, 3.9%, 17.8%, and 5.3% on the four datasets, respectively. [Limitations] We did not explore the performance of our new model on information retrieval and recommendation systems. [Conclusions] The proposed model significantly improves the effectiveness of the knowledge graph completion, which helps us identify missing information in knowledge graphs and may benefit information retrieval and automatic Q&A applications.
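
The Hits@1 indicator reported above is a standard link-prediction metric: the share of test triples for which the correct entity is ranked first among all candidates. A minimal sketch of the computation follows; the function name `hits_at_k` and the example ranks are hypothetical illustrations, not taken from the paper.

```python
def hits_at_k(ranks, k=1):
    """Return the fraction of test triples whose correct entity ranks within the top k.

    `ranks` holds the 1-based rank of the correct entity for each test triple.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for five test triples (illustration only, not data from the paper).
example_ranks = [1, 3, 1, 7, 2]
hits1 = hits_at_k(example_ranks, k=1)  # two of five triples ranked first
hits3 = hits_at_k(example_ranks, k=3)  # four of five triples ranked in the top three
```

Hits@1 is the strictest member of the Hits@k family, so gains on it indicate that the model places the correct entity first more often, not merely near the top.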

Analysis of Neural Network Modules for Named Entity Recognition of Chinese Medical Texts

Duan Yufeng, He Guoxiu

Data Analysis and Knowledge Discovery. 2023, 7(2): 26-37. https://doi.org/10.11925/infotech.2096-3467.2022.0908

[Objective] This paper decomposes neural network-based named entity recognition models for Chinese medical texts. We investigate how single neural network modules and the collaboration of multiple modules affect entity recognition performance. [Methods] First, we chose the benchmark datasets from CCKS2017, CCKS2019, and IMCS-NER for named entity recognition tasks. Second, we conducted extensive experiments to compare the performance of different single modules within each layer of the models. Third, we built and compared entity recognition models based on ensemble, parallel, and serial neural models. [Results] Using hfl/chinese-macbert-base, hfl/chinese-roberta-wwm-ext, or hfl/chinese-bert-wwm-ext in the symbolic representation layer significantly improved the performance of entity recognition models, with average F1-scores of 0.8816, 0.8816, and 0.8812, respectively. Stacking neural models at the context encoding layer improved the performance of the neural network. Moreover, ensembled neural networks achieved the best performance, with F1-scores of 0.9330, 0.8211, and 0.9181, respectively. [Limitations] More research is needed to examine our findings with datasets in other languages. [Conclusions] The characteristics of single neural modules and their collaboration could significantly affect the performance of named entity recognition for Chinese medical texts.

Recognizing Intensity of Medical Query Intentions Based on Task Knowledge Fusion and Text Data Enhancement

Zhao Yiming, Pan Pei, Mao Jin

Data Analysis and Knowledge Discovery. 2023, 7(2): 38-47. https://doi.org/10.11925/infotech.2096-3467.2022.0919

[Objective] This paper proposes a recognition model for the intensity of medical query intentions based on task knowledge fusion and text data enhancement, aiming to improve the representation of query word vectors and to expand labeled data sets. [Methods] First, we used the SimBERT model to enhance the text data of a small task data set. Then, we used a medical query text corpus to incrementally pre-train the BERT model and obtain the MQ-BERT (Medical-Query BERT) model with task knowledge. Finally, we introduced Bi-LSTM and other models to compare the classification performance before and after text data enhancement. [Results] The F-Score of our new MQ-BERT model reached 92.22%, which is superior to that of the existing model by the Alibaba team on the same task data set (F-Score=87.5%). With text data enhancement, the classification performance of our new model improved further (F-Score=95.34%), which is 7.84 percentage points higher than that of the MC-BERT model. [Limitations] The data selection in the incremental pre-training process could be further optimized. [Conclusions] Task knowledge fusion and text data enhancement can effectively improve the recognition accuracy of the intensity of medical query intentions, which benefits the development of medical information retrieval systems.

Detecting Mis/Dis-information from Social Media with Semantic Enhancement

Wang Hao, Gong Lijuan, Zhou Zeyu, Fan Tao, Wang Yongsheng

Data Analysis and Knowledge Discovery. 2023, 7(2): 48-60. https://doi.org/10.11925/infotech.2096-3467.2022.0923

[Objective] This paper builds an automated detection model to effectively identify mis/dis-information from social media, aiming to balance the speed and accuracy of processing massive data. [Methods] Classification models are the mainstream technique for detecting mis/dis-information. However, most of them cannot extract deep semantic features from the texts. Therefore, we used the single-text-feature BFID model (BERT False-Information-Detection) as the benchmark model, and proposed two new methods with fused semantic enhancement to detect mis/dis-information. [Results] We examined the new models with data from Sina Weibo. The accuracy of the model with the fused sentiment feature, BFID-SEN (BFID-Sentiment), increased by about 1.59 percentage points, while the accuracy of the model with the fused image feature, BFID-IMG (BFID-Image), improved by 0.78 percentage points. [Limitations] The ability to fuse semantic enhancement is limited by the small corpus size, the sentiment categories, and the multimodal disinformation training datasets. [Conclusions] The proposed methods identify false information from social media more effectively.

Designing and Implementing Automatic Title Generation System for Sci-Tech Papers

Wang Yufei, Zhang Zhixiong, Zhao Yang, Zhang Mengting, Li Xuesi

Data Analysis and Knowledge Discovery. 2023, 7(2): 61-71. https://doi.org/10.11925/infotech.2096-3467.2022.0933

[Objective] This paper designs an automatic title generation system based on the abstracts of Chinese sci-tech papers, aiming to help researchers compose better titles. [Methods] First, we constructed a large-scale training dataset based on the CSCD database. Then, we created a title generation model with the help of BERT-UniLM. Finally, we designed the system interface using the HTTP protocol to enable open calls. [Results] The implemented system could generate appropriate titles for articles. [Limitations] Since the BERT model limits the maximum token length, our new system automatically truncates abstracts exceeding the length limit, which might affect title generation. [Conclusions] This paper provides convenient tools for researchers and literature services, and also benefits the automatic generation of titles for other scientific and technological documents.
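
The truncation mentioned under Limitations can be pictured with a short sketch. It assumes the 512-position limit of standard BERT checkpoints and reserves two positions for the [CLS] and [SEP] markers; the abstract states neither the system's actual limit nor its tokenizer, so `truncate_tokens`, its parameters, and the character-level tokens below are hypothetical.

```python
def truncate_tokens(tokens, max_len=512, num_special=2):
    """Keep only as many tokens as fit once special tokens are accounted for.

    ASSUMPTION: max_len=512 matches standard BERT checkpoints; the system's
    real limit is not given in the abstract. num_special reserves room for
    markers such as [CLS] and [SEP].
    """
    budget = max_len - num_special
    if budget < 0:
        raise ValueError("max_len must be at least num_special")
    return tokens[:budget]


# Hypothetical abstract of 600 character-level tokens (illustration only).
long_abstract = list("文" * 600)
kept = truncate_tokens(long_abstract)
dropped = len(long_abstract) - len(kept)  # tokens lost from the tail of the abstract
```

Because everything past the budget is discarded, information that appears late in a long abstract never reaches the model, which is why truncation can affect the generated title.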

A Fine-Grained Sentiment Recognition Method Based on OCC Model and Triggering Events

Shen Lining, Yang Jiayi, Pei Jiaxuan, Cao Guang, Chen Gongzheng

Data Analysis and Knowledge Discovery. 2023, 7(2): 72-85. https://doi.org/10.11925/infotech.2096-3467.2022.0957

[Objective] This paper tries to enrich the event logic of traditional fine-grained sentiment analysis from the perspective of emotion-triggering events. [Methods] We analyzed the OCC model’s sentiment generation rules and conditions to create the <event, sentiment> binary groups using event extraction and text classification methods. [Results] The proposed model constructed rules for emotion generation and built a theoretical basis for classifying sentiments. The model effectively identified emotion-triggering events (F1=0.9338) and sentiments (F1=0.9637). It generated <event, sentiment> binary groups (F1=0.8892) to realize event-level fine-grained sentiment analysis. [Limitations] The structure of sentiment generation rules is simple and cannot reflect the diversity of netizens’ emotions. The corpus built at present has domain limitations and each corpus only contains one type of emotion-triggering event. [Conclusions] By associating event evaluations and emotions with the help of the OCC model, our new model becomes more in line with human thinking. The model has good interpretability and transferability, which enhances the granularity level of emotional objects in existing studies. It provides new ideas for research in the field of textual sentiment analysis.

Extracting Emotion-Cause Pairs Based on Multi-Label Seq2Seq Model

Zhang Siyang, Wei Subo, Sun Zhengyan, Zhang Shunxiang, Zhu Guangli, Wu Houyue

Data Analysis and Knowledge Discovery. 2023, 7(2): 86-96. https://doi.org/10.11925/infotech.2096-3467.2022.0985

[Objective] This paper explores new algorithms to extract emotion-cause pairs based on a multi-label Seq2Seq model. [Methods] First, we used the pre-trained BERT model to obtain semantically rich word vectors. Then, we used Bi-GRU and LSTM to obtain the global and local features of the texts. Finally, we introduced a hybrid attention mechanism to merge the features and improve the integrity of these semantic features. [Results] Compared with the latest FSS-GCN model, the F1 value of our new model for emotion-cause pairs increased by 0.98 and 11.60 percentage points on two data sets. The F1 value for emotion extraction increased by 0.87 and 1.10 percentage points, while the F1 value for cause extraction increased by 0.79 and 2.31 percentage points, respectively. [Limitations] Our new model mainly examined explicit emotion-cause pairs and did not explore implicit emotion-cause pairs. [Conclusions] The proposed model improves the F1 values of extracting emotion-cause pairs.

Detecting Fake News Based on Title-Content Difference

Liu Shang, Shen Yifan

Data Analysis and Knowledge Discovery. 2023, 7(2): 97-107. https://doi.org/10.11925/infotech.2096-3467.2022.0293

[Objective] This paper proposes a fake news detection method based on the difference between news titles and contents, aiming to address the difficulty of extracting features from short news texts and of retrieving comments. [Methods] First, we designed the Cos-Gap calculation method to obtain the difference between the textual and emotional features of news titles and those of news contents. Then, we constructed a News Differential Heterogeneous Graph Network (NDHN) based on the obtained differential features and Heterogeneous Graph Attention Networks. The NDHN contains edges constructed from the differential features and nodes for title, content, and emotion constructed from semantic and emotional features. [Results] We examined the proposed model on the GossipCop dataset and found that the NDHN can improve the classification accuracy by 2.7% and the F1 by 3.2%. [Limitations] This method is suitable for analyzing news with titles and has limitations for untitled texts from Sina Weibo or Twitter. [Conclusions] The new model could effectively detect fake news from social media.
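
The abstract does not define Cos-Gap. As a rough illustration of how a cosine-based title-content difference could be computed, the sketch below assumes the gap is one minus the cosine similarity of two feature vectors. This formula, the function names, and the example vectors are assumptions made for illustration and may differ from the authors' definition.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


def cos_gap(title_vec, content_vec):
    """ASSUMED definition: 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    return 1.0 - cosine_similarity(title_vec, content_vec)


# Hypothetical feature vectors (illustration only, not data from the paper).
aligned_gap = cos_gap([1.0, 0.0], [2.0, 0.0])     # title and content point the same way
orthogonal_gap = cos_gap([1.0, 0.0], [0.0, 3.0])  # title and content are unrelated
```

Under this assumed definition, a larger gap means the title diverges more from the content, which is the kind of signal the abstract describes feeding into the graph edges.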

AEMIA: Extracting Commodity Attributes Based on Multi-level Interactive Attention Mechanism

Su Mingxing, Wu Houyue, Li Jian, Huang Ju, Zhang Shunxiang

Data Analysis and Knowledge Discovery. 2023, 7(2): 108-118. https://doi.org/10.11925/infotech.2096.3467.2022.1083

[Objective] This paper develops a new model to improve the perception of structural features and of the correlation between text features, aiming to fully explore the internal semantics and extract attributes. [Methods] First, we extracted the features of text, syntax, and part of speech. Then, we merged different features to obtain complete text structure features. Third, we designed a multi-layer interactive attention mechanism, which focuses on the deep correlation between text structural features and text features. Fourth, we adopted a bilinear fusion strategy to ensure information integrity. Finally, we extracted attributes with common classifiers. [Results] We examined the new model with publicly available data sets, and found its extraction accuracy was at least 1.2 percentage points higher than that of the existing methods. [Limitations] The model is insensitive to implicit attribute words, and its performance drops sharply when a sentence contains more than three implicit attribute words. [Conclusions] The proposed method can effectively improve the accuracy of commodity attribute extraction.

Early Identification of Star Inventor Types in the Perspective of Innovation Duality

Liu Xiang, Liu Xiang, Yu Bowen

Data Analysis and Knowledge Discovery. 2023, 7(2): 119-128. https://doi.org/10.11925/infotech.2096-3467.2022.0330

[Objective] Identifying star inventors by the number of patents and patent citations has an obvious time-lag effect. Therefore, this paper constructs a graph convolutional neural network to find emerging star inventors effectively. [Methods] This paper defines four types of star inventors: “composite”, “consolidation”, “breakthrough”, and “development”, which can also be grouped as “continuity innovation” and “breakthrough innovation”. Then, we constructed a model based on a graph convolutional neural network that combines patent titles and cooperation relationships to find star inventors. [Results] We examined our model with patent data in the field of molecular biology and microbiology. The overall accuracy of this model in identifying the innovation types of star inventors reached 79.4%, which was about 15% higher than the method using word vectors. [Limitations] The proposed model could not identify “breakthrough star inventors” effectively. [Conclusions] Our new model could reduce the time-lag effect of the existing methods and identify the innovation type of star inventors earlier.

Mapping and Analyzing Personal Academic Trajectory from Multiple Dimensions

Xie Zhen, Ma Jianxia, Hu Wenjing

Data Analysis and Knowledge Discovery. 2023, 7(2): 129-140. https://doi.org/10.11925/infotech.2096-3467.2022.0329

[Objective] This paper proposes a multi-dimensional framework for visualizing personal academic trajectories. [Methods] Guided by a timeline, we employed statistical analysis, semantic technology, and visualization tools to represent a scholar’s academic trajectory from the dimensions of research output, research theme, research context, and content evolution. [Results] We examined the proposed model with two scholars of cryosphere science. Compared with the existing tools, the proposed framework expands the dimension of data analysis and enriches the visualization. [Limitations] The data sources mainly came from scholarly articles, and other academic achievements, such as patents and projects, need further integration. Moreover, integrating multiple software tools during the mapping process requires further work. [Conclusions] This method can be used in academic profiling, scholarly evaluation, and selecting representative works, which provides a reference for integrating and analyzing personal academic achievements.

Constructing Large-scale Knowledge Graph for Massive Sci-Tech Literature

Du Yue, Chang Zhijun, Dong Mei, Qian Li, Wang Ying

Data Analysis and Knowledge Discovery. 2023, 7(2): 141-150. https://doi.org/10.11925/infotech.2096-3467.2022.0328

[Objective] This paper builds a large-scale knowledge graph for scientific research, which meets the needs of sci-tech information services and improves on the data consistency of traditional models. [Methods] First, we proposed an implicit knowledge graph construction method. Then, we used identification tools for entity feature fields and implicit relationships to continuously update entities and discover entity relationships. [Results] We examined the proposed model on a big data platform for PB-level sci-tech literature. When entity data change, the implicit knowledge graph updates only the entity data and does not modify their relationships. The model could retrieve all scholars from one institution through the predefined interface, and the average processing time was one hundredth of that of the triple-type knowledge graph. [Limitations] It is difficult to handle cases that do not satisfy the implicit relational data structure, and the entity data must be stored in a technical cluster with a search engine. [Conclusions] The proposed method could effectively mitigate the data consistency issues caused by changes in entity information. It helps us construct large-scale scientific research knowledge graphs, which benefits the management, dissemination, and utilization of sci-tech knowledge.

25 February 2023, Volume 7 Issue 2