[Objective] This paper addresses data sparsity and the dynamic drift of user interests by combining multimodal feature fusion with deep reinforcement learning. [Methods] First, we used pre-trained models and an attention mechanism to learn intra-modal representations and fuse three modalities. Then, we built a model of user-item interactions. Finally, we applied a deep reinforcement learning algorithm to capture user interest drift in real time and to balance long- and short-term rewards, producing personalized recommendations. [Results] Compared with the best-performing baseline, the proposed model improved Precision@5 by 11.8%, 16.5%, and 11.4%, and NDCG@5 by 5.3%, 8.0%, and 6.4% on the MovieLens-1M, MovieLens-100K, and Douban datasets, respectively. [Limitations] User interaction histories in the Douban dataset are relatively short, so the model cannot learn user preferences as accurately during training, and its recommendation results there are weaker than in the MovieLens experiments. [Conclusions] The proposed model integrates multimodal information to reconstruct the state representation network of deep reinforcement learning, improving recommendation performance.
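To make the fusion step concrete, the following is a minimal sketch of attention-based fusion over three modality embeddings, assuming pre-extracted per-modality features of a shared dimension (e.g., from pre-trained text and image encoders); the class name, dimensions, and single-layer scoring function are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse three modality embeddings with learned attention weights.

    Hypothetical sketch: the paper's actual fusion network, modality
    encoders, and embedding sizes are not given in the abstract.
    """

    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each modality embedding

    def forward(self, text: torch.Tensor, image: torch.Tensor,
                meta: torch.Tensor) -> torch.Tensor:
        # Stack the per-modality embeddings: (batch, 3, dim)
        stacked = torch.stack([text, image, meta], dim=1)
        # Attention weights over the three modalities: (batch, 3, 1)
        weights = torch.softmax(self.score(stacked), dim=1)
        # Weighted sum yields the fused item representation: (batch, dim)
        return (weights * stacked).sum(dim=1)

# Usage with dummy features standing in for pre-trained-model outputs
fusion = AttentionFusion(dim=128)
t, i, m = (torch.randn(4, 128) for _ in range(3))
fused = fusion(t, i, m)  # (4, 128) fused item embeddings
```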
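Likewise, a minimal sketch of the reinforcement learning step, assuming a DQN-style value network over a GRU-encoded interaction history built from the fused multimodal embeddings; the algorithm choice, network shapes, and the reward-mixing weight `beta` are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class DRLRecommender(nn.Module):
    """Hypothetical DQN-style recommender over fused multimodal states.

    The abstract only states that the state representation network is
    rebuilt from multimodal features; the GRU encoder and Q-network
    here are illustrative choices.
    """

    def __init__(self, dim: int = 128):
        super().__init__()
        # Encodes the user's recent interaction history into a state vector
        self.state_net = nn.GRU(dim, dim, batch_first=True)
        # Scores a (state, candidate item) pair with a Q-value
        self.q_net = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, history: torch.Tensor,
                candidates: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq, dim) fused embeddings of past items
        _, h = self.state_net(history)            # (1, batch, dim)
        state = h.squeeze(0)                      # (batch, dim)
        n = candidates.size(1)                    # candidates: (batch, n, dim)
        s = state.unsqueeze(1).expand(-1, n, -1)  # (batch, n, dim)
        q = self.q_net(torch.cat([s, candidates], dim=-1))
        return q.squeeze(-1)                      # (batch, n) Q-values

def reward(short_term: float, long_term: float, beta: float = 0.5) -> float:
    # Blend immediate feedback (e.g., a click) with delayed engagement;
    # the mixing weight `beta` is an assumption, not from the paper.
    return short_term + beta * long_term
```

At serving time such an agent would recommend the candidate with the highest Q-value and update the state as new interactions arrive, which is how interest drift would be tracked in real time under this sketch.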