Data Analysis and Knowledge Discovery

Investigating the Influence of Interdisciplinary Knowledge Integration on High-Value Patent

Hou Jianhua, Deng Xianjiang, Tang Shiqi

Data Analysis and Knowledge Discovery. 2025, 9(3): 69-82. https://doi.org/10.11925/infotech.2096-3467.2024.0353

[Objective] This study aims to explore the influence of interdisciplinary knowledge integration on the emergence of high-value patents and to delineate their distinctive characteristics. [Methods] High-value patents are operationalized as patents that receive the China Patent Gold Award. Interdisciplinary knowledge integration is quantified by two dimensions: IPC classification and patent knowledge units. Regression analysis investigates the effects of interdisciplinary knowledge integration, measured by these two dimensions, on both patent award status and individual patent value dimensions. [Results] The analysis reveals that high-value patents tend to exhibit a narrower interdisciplinary scope in terms of IPC classification, while simultaneously demonstrating a more diverse knowledge structure. In particular, interdisciplinary knowledge integration, when indicated by IPC classification, shows an inverted U-shaped relationship with patent value. Conversely, interdisciplinary knowledge integration, when indicated by knowledge units, shows a negative correlation with patent value. [Limitations] This study is limited by its reliance on the China Patent Gold Award as the sole proxy for high-value patents, which may not fully encompass the multifaceted nature of high-value patent characteristics. [Conclusions] This research provides valuable insights into the proactive identification and protection of high-value patents. Furthermore, the findings inform strategies to enhance upstream patent quality control and to facilitate effective patent translation and commercial utilization.
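An inverted U-shaped relationship of this kind is commonly tested by adding a squared term to the regression. The abstract does not give the model specification, so the form below is only an illustrative assumption. The symbols $V_i$ (value of patent $i$), $D_i$ (IPC-based interdisciplinary integration), and $X_i$ (control variables) are hypothetical names, not taken from the paper.

```latex
V_i = \beta_0 + \beta_1 D_i + \beta_2 D_i^2 + \gamma^\top X_i + \varepsilon_i,
\qquad \beta_1 > 0,\ \beta_2 < 0,
\qquad D^{*} = -\frac{\beta_1}{2\beta_2}
```

Under this sketch, estimated value rises with $D_i$ up to the turning point $D^{*}$ and declines beyond it.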

Participation Motivations of Skill Crowdsourcing Service Providers Based on Self-Determination Theory: An Empirical Analysis of Platform Data

Wang Xiaolun, Yao Qian, Lin Jiahui, Zhao Yuxiang, Sun Zhihao, Lin Xinlan

Data Analysis and Knowledge Discovery. 2025, 9(1): 55-64. https://doi.org/10.11925/infotech.2096-3467.2024.0098

[Objective] Based on self-determination theory, this study explores the motivations of service providers to participate in tasks on skill crowdsourcing platforms. [Methods] We retrieved 15,641 bids and 2,385 service provider records from the epwk.com platform. We used TF-IDF and BERT to analyze text features and calculate motivation variables. Finally, because the dependent variables are counts, we constructed a negative binomial regression model. [Results] The motivations and behaviors of service providers participating in skill crowdsourcing were significantly correlated at the 1% level (R²=23.10%). Task difficulty improved the model’s explanatory power, negatively moderating competence and reputation (p<0.05) while positively moderating social recognition (p<0.01). [Limitations] The representativeness is limited to a single platform. Future studies could collect data from multiple platforms for comparative validation. External factors such as platform dynamics and policy environments might interfere with the data, which should be considered in future research to deepen the conclusions. [Conclusions] This paper expands the theoretical foundation for service provider participation in crowdsourcing tasks and offers practical insights for service providers, buyers, and platforms.

Supplementary Q&A Recommendation Based on Transfer Learning Enhanced Multi-Label Multi-Document Classifier

Li Ying, Li Ming

Data Analysis and Knowledge Discovery. 2024, 8(10): 89-99. https://doi.org/10.11925/infotech.2096-3467.2023.0683

[Objective] This paper proposes a recommendation method for supplementary question-and-answer (Q&A) based on a multi-label, multi-document Q&A classification model enhanced by transfer learning. It aims to identify and recommend supplementary answers in online Q&A communities. [Methods] We introduced new features alongside existing ones to classify the supplementary relationships between questions and answers. Then, we established a transfer learning-enhanced multi-label, multi-document classification model to identify and recommend supplementary answers. [Results] We conducted three meta-tasks on real datasets from the Zhihu community. The proposed method improves precision, recall, and F1 score by 48.29%, 15.75%, and 32.53%, respectively, on average. [Limitations] The method was only applied to health-related Q&A topics in Zhihu and has yet to be validated across different platforms or topics. [Conclusions] The proposed recommendation method effectively recommends supplementary answers. It helps users in Q&A communities obtain more comprehensive answers and promotes knowledge utilization within the community.

Predicting Overall Budget Performance Evaluation of Research Institutions

He Jun, Yu Jianjun, Rong Xiaohui

Data Analysis and Knowledge Discovery. 2024, 8(10): 136-145. https://doi.org/10.11925/infotech.2096-3467.2023.0645

[Objective] This paper aims to ensure the objectivity, timeliness, and accuracy of the overall budget performance evaluation of research institutions, and to improve the efficiency of performance evaluation work. [Methods] We proposed a method for predicting research institutions’ overall budget performance evaluation based on LightGBM. Our method integrates various data from scientific research management information systems. It uses machine learning algorithms to analyze and predict the overall budget performance evaluation results by correlating research inputs and outputs with performance. [Results] In the application of the overall budget performance evaluation of research institutions, the accuracy of the proposed method reached 94.12%. The human resources required for the budget performance evaluation process were reduced from 10 people to 5, and the time cost was shortened from 38 days to about 10 days. [Limitations] Some performance evaluation indicators are subjective and difficult to quantify using business data from scientific research management information systems. [Conclusions] The proposed method performs well in predicting overall budget performance evaluation results. It mitigates fairness issues arising from subjective evaluation and reduces the human resources and time required for budget performance evaluation, thereby improving evaluation efficiency.

CCI-ClipCap: A Chinese Ceramic Image Description Model Based on Prompt Paradigm

Shi Bin, Wang Hao, Liu Maolin, Deng Sanhong

Data Analysis and Knowledge Discovery. 2024, 8(10): 146-158. https://doi.org/10.11925/infotech.2096-3467.2023.0688

[Objective] This study aims to construct a Chinese Ceramic Image Description Model (CCI-ClipCap) to provide technical support for ceramic culture research and digital preservation. [Methods] Based on ClipCap, the prompt paradigm is introduced to improve the model’s understanding of cross-modal data, enabling automatic description of ceramic images. Additionally, we proposed a text similarity evaluation method tailored for structured textual representation. [Results] The CCI-ClipCap model improved the multi-modal fusion process with the prompt paradigm, effectively extracting information from ceramic images and generating accurate textual descriptions. Compared to baseline models, the BLEU and ROUGE values increased by 0.04 and 0.14, respectively. [Limitations] The data used originated from the British Museum collections, not native Chinese datasets. This single-source data may affect the model’s performance. [Conclusions] The CCI-ClipCap model generates text with rich levels of expression, demonstrating a solid understanding of ceramic knowledge and exhibiting high professionalism.

Link Prediction in Patent Citation Networks Based on Graph and Semantic Representation Learning

Hu Wei, Li Shuying, Zhang Xin, Yang Ning

Data Analysis and Knowledge Discovery. 2024, 8(10): 28-43. https://doi.org/10.11925/infotech.2096-3467.2024.0737

[Objective] This study optimizes a link prediction model in the patent citation network to enhance the analysis and prediction of technological evolution. It also further improves theories and methods related to technology diffusion. [Methods] We constructed a new framework for link prediction modeling (Graph-PatentBERT-RF) based on the characteristics of patent literature. First, we used the GraphSAGE model to obtain the vectorized representation of the training set’s patent citation network. Second, we used the PatentBERT model to obtain semantic representation vectors of patent texts in four thematic dimensions. Then, these vectors were combined with other features to train a random forest model. Finally, we obtained the optimized link prediction probabilities in the patent citation network. [Results] An empirical study in quantum sensing demonstrated that the Graph-PatentBERT-RF model achieves optimal comprehensive prediction performance, with an F1-score over 2.2% higher than the baseline models. Our model also illustrated the nonlinear relationships and complex interactions across more than four levels among citation relationships, multidimensional technical text, and time lag features. [Limitations] The data preprocessing steps need further optimization to improve the model’s performance. [Conclusions] The constructed model enhances the overall predictive performance of patent citation networks, providing an optimized solution to the current issue of incomplete citation data, and contributes to the development of various applications in technology evolution analysis based on citation networks.

Automatic Detection Model for “Paper Mills”

Hu Tianyi, Liu Jianhua, E Haihong, Ding Junpeng, Qiao Xiaodong

Data Analysis and Knowledge Discovery. 2024, 8(10): 125-135. https://doi.org/10.11925/infotech.2096-3467.2023.0937

[Objective] This study explores feature models for the automated detection of articles by “paper mills” across multiple dimensions. It aims to provide critical support for the governance of research integrity and quality control of academic publishing in China. [Methods] We retrieved retraction records and associated data resources of “paper mill” articles from websites like Retraction Watch to construct the first open dataset for training and evaluating the automated detection model for paper mills. We developed a classification model for “paper mill” papers (RWTA-Model) using a text random walk strategy and text attention mechanism. We modeled 33 grammatical features of “paper mill” papers. Finally, we used the SHAP method to identify significant features automatically. [Results] The F1 scores based on title structure features, abstract structure features, and main text structure features reached 0.7669, 0.8423, and 0.8480, respectively. For the three types of article structure data, the proposed method achieved the best results when compared to various baseline methods and identified 12 significant grammatical features. [Limitations] The supporting feature construction dataset primarily focuses on the biomedical field, presenting a potential risk of domain bias. [Conclusions] The constructed classification model based on title, abstract, and main text structures, and the 33-dimensional automatic detection feature model, can effectively identify “paper mill” papers and uncover multidimensional features, supporting the automated detection of papers from paper mills.

Conflict Identification, Classification and Differentiation Analysis Based on Public Appeals: Case Study of Pension Insurance Disputes

Qiu Jiangnan, Xu Xuedong, Lu Yanxia, Yang Zhilong

Data Analysis and Knowledge Discovery. 2025, 9(2): 106-119. https://doi.org/10.11925/infotech.2096-3467.2023.1371

[Objective] This paper identifies and classifies issues from public appeals. It also explores regional differences in issue types and response rates. [Methods] Taking pension insurance disputes as an example, the ERNIE model was enhanced with knowledge and data through domain-specific vocabulary construction, key appeal content extraction, and simple data augmentation. An ERNIE-BiLSTM contradiction identification and classification model was developed to deeply analyze contradictions in public appeals in low-data-resource scenarios, addressing existing studies’ lack of quantitative methods for social contradiction analysis. Finally, a differentiation analysis of contradictions was conducted based on the classification results. [Results] During the data collection period, pension insurance payment-related conflicts were more frequent in Henan and Liaoning provinces, while pension insurance service-related conflicts were more prevalent in Guangdong Province and Beijing. Significant differences in response rates were observed across different types of contradictions. [Limitations] This paper does not consider the correlation between different types of conflicts. [Conclusions] This paper reveals the inter-provincial differences in pension insurance disputes, providing governments with insights into hotspots and trends to assist in decision-making.

Revisiting Deep Learning-based Rumor Detection Models with Interpretable Tools

He Guoxiu, Ren Jiayu, Li Zongyao, Lin Chenxi, Yu Haiyan

Data Analysis and Knowledge Discovery. 2024, 8(4): 1-13. https://doi.org/10.11925/infotech.2096-3467.2023.0684

[Objective] This study explores whether content-based deep detection models can identify the semantics of rumors. [Methods] First, we used the BERT model to identify the key features of rumors from benchmark datasets in Chinese and English. Then, we used two interpretability tools, LIME (based on local surrogate models) and SHAP (based on cooperative game theory), to analyze whether these features reflect the nature of rumors. [Results] The key features calculated by the interpretability tools on different models and datasets differed significantly, and it was difficult to establish a semantic relationship between the features and rumors. [Limitations] The datasets and models examined in this study need to be expanded. [Conclusions] Deep learning-based rumor detection models only work with the features of the training set and lack sufficient generalization and interpretability for diverse real-world scenarios.

Automatic Multi-Label Classification of South China Sea Maps Based on AlexNet Model

Qi Xiaoying, Li Hanyu, Yang Haiping

Data Analysis and Knowledge Discovery. 2024, 8(4): 76-87. https://doi.org/10.11925/infotech.2096-3467.2023.0081

[Objective] This paper aims to achieve multi-semantic classification of maps and meet the needs for precise map retrieval and intelligence analysis. [Methods] We designed a map category system and proposed a multi-label map classification strategy. This strategy enabled the automatic classification of South China Sea maps with the AlexNet convolutional neural network classification model. [Results] The F1 value of the proposed model is 0.979. This model can effectively realize the multi-label automatic classification of the South China Sea maps. [Limitations] The deep categories of multi-label annotated datasets need to be supplemented. [Conclusions] This paper provides a reference for the semantic-based scientific classification of maps, precise retrieval, and cross-category association.

Research on Cross-Type Text Classification Technology Based on Multi-Task Learning

Song Donghuan, Hu Maodi, Ding Jielan, Qu Zihao, Chang Zhijun, Qian Li

Data Analysis and Knowledge Discovery. 2025, 9(2): 12-25. https://doi.org/10.11925/infotech.2096-3467.2023.0885

[Objective] This study addresses the issue of low classification accuracy in conventional text classification tasks due to factors such as sparse domain-specific training data and significant differences between types. [Methods] We constructed a novel classification model based on the BERT-DPCNN-MMOE framework, integrating the deep pyramid convolutional networks with the multi-gate control unit mechanism. Then, we designed multi-task and transfer learning experiments to validate the effectiveness of the new model against eight well-established and innovative models. [Results] This research independently constructed cross-type multi-task data as the basis for training and testing. The BERT-DPCNN-MMOE model outperformed the other eight baseline models in multi-task and transfer learning experiments, with F1 score improvements exceeding 4.7%. [Limitations] Further research is needed to explore the model’s adaptability to other domains. [Conclusions] The BERT-DPCNN-MMOE model performs better in multi-task and cross-type text classification tasks. It is of significance for future specialized intelligence classification tasks.

Identifying Core Technologies Based on the Commerce Control List-Patent Network Mapping: Case Study of Industrial Software

Zhu Yujing, Chen Fang, Wang Xuezhao

Data Analysis and Knowledge Discovery. 2024, 8(10): 1-13. https://doi.org/10.11925/infotech.2096-3467.2023.0699

[Objective] In response to Western technology export controls on China, this study proposes a method for identifying critical core technologies by mapping the U.S. Commerce Control List (CCL) to a patent-based dual-layer network. The goal is to provide a reference for selecting and prioritizing technology breakthrough directions. [Methods] The study integrates the CCL and patent data to build a dual-layer network consisting of a CCL-related network and a weighted patent citation network. We used a community detection algorithm to identify technology clusters in both layers and calculated the semantic similarity of inter-layer clusters to achieve automatic mapping. Using Word2Vec and the n-gram method, we extracted keywords from each cluster to represent technical topics. Finally, we identified the patent clusters with the highest similarity to the CCL clusters as critical core technologies. [Results] Empirical results in industrial software demonstrate that this method identifies 12 distinct patent clusters with the highest similarity to the CCL clusters, all of which have a similarity of over 0.85. They involve integrated circuit IP cores, precision measurement, process control, motion control, and turbine detection. Literature research has verified them as key core technologies in industrial software. [Limitations] The study only focused on industrial software for empirical research. The technical approach can be improved, and the identification results require further interpretation and analysis. [Conclusions] The proposed method efficiently and accurately identifies key core technology at a micro-level, features a high degree of automation, and is highly readable, providing significant practical application value.

Citation Recommendation Using Heterogeneous Network Representation Learning and Attention Mechanism

Zhang Jinzhu, Sun Wenwen, Qiu Mengmeng

Data Analysis and Knowledge Discovery. 2024, 8(10): 14-27. https://doi.org/10.11925/infotech.2096-3467.2023.0724

[Objective] This study aims to expand the heterogeneous network in citation recommendations by including more nodes and relationships. It seeks to provide deep semantic representations and reveal how different relationships impact citation recommendations, ultimately improving the effectiveness of such recommendations. [Methods] By introducing semantic links, we constructed a heterogeneous network representation learning model incorporating an attention mechanism. This model generates deep semantic and structural representations, as well as similarity metrics for citation recommendations. We also conducted ablation experiments to explore the impact of different factors on citation recommendation. [Results] After introducing semantic links, the citation recommendation model’s AUC improved by 0.012. With the addition of a dual-layer attention mechanism, there was a further improvement of 0.079 in AUC. Compared to the baseline model CR-HBNE, the AUC and AP improved by 0.185 and 0.204, respectively. [Limitations] Manual selection of relationship paths is inefficient, and evaluating the recommendation results based on only two metrics is relatively simplistic. [Conclusions] The proposed method fully utilizes the complex associations and deep semantic information among citations, effectively improving citation recommendation performance.

Extracting Few-Shot Relation Based on Prompt Ensemble

Xu Haoshuai, Hong Liang, Hou Wenjun

Data Analysis and Knowledge Discovery. 2024, 8(10): 66-76. https://doi.org/10.11925/infotech.2096-3467.2023.0973

[Objective] This paper addresses the challenge of constructing label mapping in prompt learning-based relation extraction methods when labeled data is scarce. [Methods] The proposed approach enhances prompt effectiveness by injecting relational semantics into the prompt template. Data augmentation is performed through prompt ensemble, and an instance-level attention mechanism is used to extract important features during the prototype construction process. [Results] On the public FewRel dataset, the accuracy of the proposed method surpasses the baseline model by 2.13%, 0.55%, 1.40%, and 2.91% in four few-shot test scenarios, respectively. [Limitations] The method does not utilize learnable virtual prompt templates in constructing prompt templates, and there is still room for improvement in the representation of answer words. [Conclusions] The proposed method effectively mitigates the problem of limited information and insufficient accuracy in prototype construction under few-shot scenarios, improving the model’s accuracy in few-shot relation extraction tasks.
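The instance-level attention used during prototype construction can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so the version below assumes dot-product similarity between each support instance and the query, followed by a softmax; the function name `attention_prototype` and the toy vectors are hypothetical.

```python
import math


def attention_prototype(support, query):
    """Build a class prototype as an attention-weighted mean of support embeddings.

    Assumption (not from the paper): attention scores are dot products
    between each support embedding and the query embedding.
    """
    # Dot-product score of every support instance against the query.
    scores = [sum(s_d * q_d for s_d, q_d in zip(s, query)) for s in support]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of support embeddings, dimension by dimension.
    dim = len(query)
    prototype = [sum(w * s[d] for w, s in zip(weights, support)) for d in range(dim)]
    return prototype, weights


# Toy example: two support instances of one relation, one query instance.
support_set = [[1.0, 0.0], [0.0, 1.0]]
query_vec = [1.0, 0.0]
proto, attn = attention_prototype(support_set, query_vec)
```

In this sketch, support instances that resemble the query receive larger weights, so a noisy instance contributes less to the prototype than it would under a plain average.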

Can Phonetics and Orthography Effectively Enhance Chinese Character Representation?

Duan Yufeng, Zhang Meicong, Liu Yanzuo, He Guoxiu

Data Analysis and Knowledge Discovery. 2024, 8(10): 100-111. https://doi.org/10.11925/infotech.2096-3467.2023.0665

[Objective] This study aims to investigate the effectiveness of using phonetic and orthographic features to enhance the representation of Chinese characters. [Methods] Based on the Named Entity Recognition (NER) task, we used a general embedding module, a bidirectional LSTM module, and a fully connected network with Softmax activation as the baseline embedding, context-encoding, and decoding layers, respectively. Then, we compared the changes in Micro-F1 scores and entity-specific F1 scores after enhancing character embeddings with Chinese pinyin, images, Wubi input codes, Four-Corner codes, Cangjie codes, and radicals, using datasets such as MSRA, PeopleDaily, CCKS2017, Resume, and E-Commerce. [Results] Using phonetic and orthographic enhanced embeddings led to a performance decrease of nearly 0.01 in the MSRA and PeopleDaily datasets. At the same time, there was no statistically significant change in performance in the CCKS2017, Resume, and E-Commerce datasets. [Limitations] Using only 32×32-pixel images of simplified Chinese characters may affect the extraction of orthographic features. [Conclusions] While phonetic and orthographic features can enhance the representation of Chinese characters, they also introduce noise. They lead to varying impacts on model performance across different corpora and entities.

Social Short Text Expansion Based on Two-Layer Heterogeneous Network

Wu Shufang, Wang Hongbin, Zhu Jie, Chen Ting

Data Analysis and Knowledge Discovery. 2024, 8(10): 77-88. https://doi.org/10.11925/infotech.2096-3467.2023.0703

[Objective] This paper aims to expand social short texts by leveraging heterogeneous relationships in social networks. It addresses the issues of fragmentation and the use of internet slang in social short texts. [Methods] First, we measured the unevenness of hotspot words in social information based on dispersion, which improved the TF-IDF method to obtain initial features. Then, we constructed a two-layer heterogeneous social network consisting of three sub-networks based on the heterogeneous relationships in social networks. Finally, we quantified the importance of users, text similarity, and user recognition of social texts to obtain multiple extended sources and expand social short texts. [Results] Compared with the existing short text feature expansion methods, the proposed model’s precision, recall, and F1 value improved by about 13%, 19%, and 18%, respectively. [Limitations] We did not consider the influence of indirect relationships on the construction of heterogeneous social networks. [Conclusions] Using the heterogeneous relationships in social networks can obtain more reasonable expansion sources and effectively expand social short texts.

Semantic Discovery of Online Health Information Based on Improved CasRel Entity-Relationship Extraction Model

Cheng Quan, Jiang Shihui, Li Zhuozhuo

Data Analysis and Knowledge Discovery. 2024, 8(10): 112-124. https://doi.org/10.11925/infotech.2096-3467.2023.0638

[Objective] This paper aims to achieve semantic discovery and relation extraction from a large amount of complex user-generated information from an online healthcare platform. [Methods] First, we constructed a semantic discovery model for online health information based on an improved CasRel model. Then, we introduced the ERNIE-Health pre-trained model, which is more suitable for the healthcare domain, into the text encoding layer of the CasRel-based model. Finally, we used a multi-level pointer network in the entity and relation decoding layer to annotate and fuse subject features for relations and object decoding via neural networks. [Results] Compared to the original model, the improved CasRel entity-relation extraction model increased the F₁-scores of entity recognition and entity-relation extraction tasks for online health information semantic discovery by 7.62% and 4.87%, respectively. [Limitations] The overall effectiveness of the model still needs to be validated with larger datasets and empirical studies on health information from different disease types. [Conclusions] Three sets of comparative experiments validated the effectiveness of the improved CasRel entity-relation extraction model for online diabetes health information semantic discovery tasks.

Text Sentiment Classification Algorithm Based on Prompt Learning Enhancement

Huang Taifeng, Ma Jing

Data Analysis and Knowledge Discovery. 2024, 8(3): 77-84. https://doi.org/10.11925/infotech.2096-3467.2023.0004

[Objective] This paper aims to improve the low accuracy of sentiment classification using the pre-trained model with insufficient samples. [Methods] We proposed a prompt learning enhanced sentiment classification algorithm, Pe (prompt ensemble)-RoBERTa. Unlike traditional fine-tuning methods, it modified the RoBERTa model with integrated prompts. The new model could understand the downstream tasks and extract the text’s sentiment features. [Results] We examined the model on several publicly accessible Chinese and English datasets. The average sentiment classification accuracy of the model reached 93.2% with fewer samples. Compared with fine-tuned and discrete prompts, our new model’s accuracy improved by 13.8% and 8.1%, respectively. [Limitations] The proposed model only processes texts for binary sentiment classification tasks. It did not involve the more fine-grained sentiment classification tasks. [Conclusions] The Pe-RoBERTa model can extract text sentiment features and achieve high accuracy in sentiment classification tasks.

A Multilingual Sentiment Analysis Model Based on Continual Learning

Zhao Jiayi, Xu Yuemei, Gu Hanwen

Data Analysis and Knowledge Discovery. 2024, 8(10): 44-53. https://doi.org/10.11925/infotech.2096-3467.2023.0714

[Objective] This study addresses the performance degradation due to catastrophic forgetting when multilingual models handle tasks in new languages. [Methods] We proposed a multilingual sentiment analysis model, mLMs-EWC, based on continual learning. The model incorporates continual learning into multilingual models, enabling it to learn new language features while retaining the linguistic characteristics of previously learned languages. [Results] In continual sentiment analysis experiments involving three languages, the mLMs-EWC model outperformed the Multi-BERT model by approximately 5.0% in French and 4.5% in English tasks. Additionally, the mLMs-EWC model was evaluated on a lightweight distilled model, showing an improvement of up to 24.7% in English tasks. [Limitations] This study focuses on three widely used languages, and further validation is needed to assess the model’s generalization capability to other languages. [Conclusions] The proposed model can alleviate catastrophic forgetting in multilingual sentiment analysis tasks and achieve continual learning on multilingual datasets.
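The abstract does not expand the "EWC" in mLMs-EWC. Assuming it refers to Elastic Weight Consolidation, a widely used continual-learning regularizer, the idea can be sketched as a quadratic penalty that discourages parameters important to earlier languages from drifting. The function names, the flat parameter lists, and the toy numbers below are hypothetical and not taken from the paper.

```python
def ewc_penalty(params, old_params, fisher, lam):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    params     -- current parameter values (flat list of floats)
    old_params -- values learned on the previous language(s)
    fisher     -- per-parameter importance (diagonal Fisher information estimate)
    lam        -- strength of the regularizer
    """
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


def ewc_loss(task_loss, params, old_params, fisher, lam=1.0):
    """Total loss on the new language: task loss plus the EWC penalty."""
    return task_loss + ewc_penalty(params, old_params, fisher, lam)


# Toy example: only the first parameter moved, and it has importance 2.0.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=1.0)
total = ewc_loss(0.5, [1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=1.0)
```

Parameters with large importance values are held close to their earlier values, while unimportant ones remain free to adapt to the new language, which is how this kind of penalty is meant to limit catastrophic forgetting.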

Aspect-Based Sentiment Analysis Based on PRM-GCN

Yu Bengong, Cao Chengwei

Data Analysis and Knowledge Discovery. 2024, 8(10): 54-65. https://doi.org/10.11925/infotech.2096-3467.2023.0722

[Objective] This paper aims to address the problem in current aspect-based sentiment analysis research, where the use of sentiment knowledge to enhance syntactic dependency graphs overlooks syntactic reachability and positional relationships between words and does not adequately extract semantic information. [Methods] We proposed an aspect-based sentiment analysis model based on a position-weighted reachability matrix and multi-space semantic information extraction. First, we used a reachability matrix to incorporate syntactic reachability relationships between words into the syntactic dependency graph, and we employed position-weighting to adjust the matrix to enhance contextual feature extraction. Second, we integrated the sentiment features with the enhanced dependency graph to extract aspect word features. Third, we used the multi-head self-attention mechanism combined with a graph convolutional network (GCN) to learn contextual semantic information from multiple feature spaces. Finally, we fused feature vectors containing positional information, syntactic information, affective knowledge, and semantic information for sentiment polarity classification. [Results] Compared to the best-performing models, the proposed model improved accuracy on the Lap14, Rest14, and Rest15 datasets by 1.00%, 1.25%, and 0.76%, respectively. When using BERT, the PRM-GCN-BERT model’s accuracy on the Lap14, Rest14, Rest15, and Rest16 datasets increased by 0.50%, 0.22%, 1.98%, and 0.31%, respectively. [Limitations] The proposed model was not applied to Chinese or other language datasets. [Conclusions] The proposed model enhances feature aggregation in graph convolutional networks, improves contextual feature extraction, and boosts semantic learning effectiveness, thereby significantly improving the accuracy of aspect-based sentiment analysis.
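The first step of the method, a position-weighted reachability matrix over the dependency graph, can be sketched as follows. The abstract gives no formulas, so this sketch makes two assumptions: reachability is the transitive closure of directed head-to-dependent edges (computed with Warshall's algorithm), and the position weight decays linearly as 1 - |i - j| / n. Both choices, the function names, and the example parse are hypothetical.

```python
def reachability_matrix(n, edges):
    """Transitive closure of a directed dependency graph (Warshall's algorithm).

    n     -- number of words in the sentence
    edges -- (head, dependent) index pairs from a dependency parse
    """
    # Every word reaches itself; direct edges give one-step reachability.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for head, dep in edges:
        reach[head][dep] = True
    # If i reaches k and k reaches j, then i reaches j.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def position_weighted(reach):
    """Scale reachable pairs by an assumed linear decay in word distance."""
    n = len(reach)
    return [
        [(1.0 - abs(i - j) / n) if reach[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# "The battery life is great": assumed parse with "great" (index 4) as root.
words = ["The", "battery", "life", "is", "great"]
dep_edges = [(4, 2), (2, 1), (2, 0), (4, 3)]
R = reachability_matrix(len(words), dep_edges)
W = position_weighted(R)
```

In this example the root word reaches every other word through multi-hop paths that a plain adjacency matrix would miss, and the weighting then favors pairs that are also close together in the sentence.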

A Review on Methods for Domain Knowledge Evolution Analysis

Li Xuesi, Zhang Zhixiong, Wang Yufei, Liu Yi

Data Analysis and Knowledge Discovery. 2024, 8(1): 1-15. https://doi.org/10.11925/infotech.2096-3467.2023.1280

Abstract (622) PDF (1998) HTML (239)

[Objective] Domain knowledge evolution analysis has been a long-standing research topic in the field of Library and Information Science. This paper provides a comprehensive review of the research methods for domain knowledge evolution analysis, both nationally and internationally, aiming to offer valuable references for future studies in this area. [Coverage] We conducted searches in CNKI and Web of Science using keywords related to domain knowledge evolution. The search results were manually evaluated and analyzed, and a total of 84 key publications closely related to the methods of domain knowledge evolution analysis were selected for review. [Methods] By reviewing the research literature, we clarified the relevant concepts of domain knowledge evolution. Based on this, we classified the existing domain knowledge evolution analysis methods into three categories: citation-based, structure-based, and content-based. For each category, we first elucidated the theoretical basis, then explained the basic analytical framework and highlighted the relevant advances. Finally, we summarized the existing methods of domain knowledge evolution analysis and provided perspectives. [Results] The three categories of existing methods for domain knowledge evolution analysis rely on their respective scientific theories. With the advancement of technology and the improvement of data resources, these methods continue to deepen and refine the analytical framework for the study of evolution. Although significant research achievements have been made, there has been no breakthrough in the research perspective of knowledge evolution analysis, and the limitations within the current research paradigm remain unresolved. [Limitations] The review was based on selected literature, which may not have comprehensively covered all relevant research. [Conclusions] Based on the summary and analysis of the current research, we believe that two directions are worth focusing on in future research on domain knowledge evolution analysis: first, exploring new entry points for domain knowledge evolution analysis, and second, attempting to integrate existing research methods to overcome the limitations of current analytical approaches.

ULEO: Unified Language of Experiment Operations for Representation of Synthesis Protocols

Fu Yun, Zhu Liya, Li Dan, Sun Mengge, Zhang Jianfeng, Liu Xiwen

Data Analysis and Knowledge Discovery. 2024, 8(1): 30-39. https://doi.org/10.11925/infotech.2096-3467.2023.0867

Abstract (316) PDF (2284) HTML (79)

[Objective] This study addresses the unified representation of experimental operation verbs in synthesis experiment protocols, in order to provide high-quality experimental protocol data for science intelligence and robotics. [Methods] We used a collaborative approach driven by data and expert knowledge to identify and standardize experimental operation verbs from literature and patent texts related to synthesis. First, we used advanced open-source large models such as ChatGLM2-6B to identify experimental operation verbs. Then, we combined Wu-Palmer and cosine similarity to standardize these verbs. Finally, we assessed their classification accuracy with expert knowledge. [Results] The study identified 149 operation verbs for inorganic synthesis experiments and 141 operation verbs for organic synthesis experiments. Expert judgment revealed that many of the 124 operation terms appearing in both groups do not possess distinct category characteristics. Therefore, we merged the two categories, yielding 166 experimental operation verbs that represent the operations in organic, inorganic, and hybrid synthesis experiments. [Limitations] The study only employed basic prompt engineering techniques to direct the large model to recognize experimental operation verbs from publicly accessible datasets. This study focused on operation terms involved in synthesis, engineering, and basic steps without considering operation terms in dynamic, analytical, and name reactions. [Conclusions] This study establishes a unified language for representing experimental operations in synthesis, applicable to organic, inorganic, and hybrid synthesis reactions. It could inform the future development of scientific robotics experiments.
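
The merged total reported in the abstract is consistent with a simple inclusion-exclusion count over the two verb sets. The short check below uses only the three figures stated in the abstract; the variable names are illustrative.

```python
# Figures taken from the abstract above
inorganic_verbs = 149  # operation verbs found in inorganic synthesis experiments
organic_verbs = 141    # operation verbs found in organic synthesis experiments
shared_verbs = 124     # operation terms appearing in both groups

# Inclusion-exclusion: verbs in either set, counting shared verbs once
merged_verbs = inorganic_verbs + organic_verbs - shared_verbs
print(merged_verbs)
```

The result matches the 166 merged verbs stated in the abstract, which implies that 25 verbs are specific to the inorganic set and 17 to the organic set.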
