• 2024
  • No.1
  • Published:25 January 2024
  • ISSN: 2096-3467
  • Directed by: Chinese Academy of Sciences
  • Sponsored by: National Science Library, Chinese Academy of Sciences
      25 January 2024, Volume 8 Issue 1
    A Review on Methods for Domain Knowledge Evolution Analysis
    Li Xuesi, Zhang Zhixiong, Wang Yufei, Liu Yi
    2024, 8 (1): 1-15.  DOI: 10.11925/infotech.2096-3467.2023.1280

    [Objective] Domain knowledge evolution analysis has been a long-standing research topic in Library and Information Science. This paper provides a comprehensive review of research methods for domain knowledge evolution analysis, both nationally and internationally, aiming to offer valuable references for future studies in this area. [Coverage] We conducted searches in CNKI and Web of Science using keywords related to domain knowledge evolution. The search results were manually evaluated and analyzed, and a total of 84 key publications closely related to the methods of domain knowledge evolution analysis were selected for review. [Methods] By reviewing the research literature, we clarified the relevant concepts of domain knowledge evolution. On this basis, we classified the existing domain knowledge evolution analysis methods into three categories: citation-based, structure-based, and content-based. For each category, we first elucidated the theoretical basis, then explained the basic analytical framework and highlighted relevant advances. Finally, we summarized the existing methods of domain knowledge evolution analysis and offered perspectives. [Results] The three categories of existing methods rely on their respective scientific theories. With advances in technology and the improvement of data resources, these methods continue to deepen and refine the analytical framework for evolution studies. Although significant research achievements have been made, there has been no breakthrough in the research perspective of knowledge evolution analysis, and the limitations of the current research paradigm remain unresolved. [Limitations] The review was based on selected literature and may not comprehensively cover all relevant research. [Conclusions] Based on our summary and analysis of current research, we believe two directions are worth focusing on in future research on domain knowledge evolution analysis: first, exploring new entry points for domain knowledge evolution analysis; second, attempting to integrate existing research methods to overcome the limitations of current analytical approaches.

    Review of Interpretable Machine Learning for Information Resource Management
    Liu Zhifeng, Wang Jimin
    2024, 8 (1): 16-29.  DOI: 10.11925/infotech.2096-3467.2023.0244

    [Objective] This paper systematically summarizes the research on interpretable machine learning methods and their applications for information resource management. It identifies possible areas of improvement and provides insights for future research. [Coverage] We searched for interpretable machine learning papers in CNKI and Web of Science. A total of 44 related articles were retrieved for review. [Methods] First, starting from the machine learning process, we constructed a general interpretable machine learning framework. Then, we thoroughly reviewed the classification of interpretable machine learning methods. Finally, we discussed interpretable machine learning applications for information resource management. [Results] The general interpretable machine learning framework consists of three modules: pre-explanation, explainable models, and post-explanation. Post-explanation methods have been widely applied in health informatics, online public opinion, scientometrics, and social network user behavior, commonly using methods such as SHAP and feature importance analysis. Much of the existing research lacks diversity and integration in its applied methods, explores causal relationships insufficiently, explains multi-source heterogeneous data inadequately, and needs broader domain applications. [Limitations] This review focuses on the applications and shortcomings of interpretable machine learning. It does not delve into the algorithm principles. [Conclusions] Future research should strengthen the integration of interpretable machine learning methods, explore interpretable machine learning based on causal machine learning, and introduce interpretable machine learning methods for multi-source heterogeneous data. We should also broaden applications in various domains such as information recommendation, information retrieval, and informetrics.

    ULEO: Unified Language of Experiment Operations for Representation of Synthesis Protocols
    Fu Yun, Zhu Liya, Li Dan, Sun Mengge, Zhang Jianfeng, Liu Xiwen
    2024, 8 (1): 30-39.  DOI: 10.11925/infotech.2096-3467.2023.0867

    [Objective] This study addresses the unified representation of experimental operation verbs in synthesis experiment protocols, providing high-quality experimental protocol data for science intelligence and robotics. [Methods] We used a collaborative approach driven by data and expert knowledge to identify and standardize experimental operation verbs from literature and patent texts related to synthesis. First, we used advanced open-source large models such as ChatGLM2-6B to identify experimental operation verbs. Then, we combined Wu-Palmer and cosine similarity to standardize these verbs. Finally, we assessed their classification accuracy with expert knowledge. [Results] The study identified 149 operation verbs for inorganic synthesis experiments and 141 operation verbs for organic synthesis experiments. Expert judgment revealed that many of the 124 operation terms appearing in both groups do not possess distinct category characteristics. Therefore, we merged the two categories, yielding 166 experimental operation verbs representing the operations in organic, inorganic, and hybrid synthesis experiments. [Limitations] The study only employed basic prompt engineering techniques to direct the large model to recognize experimental operation verbs from publicly accessible datasets. This study focused on operation terms involved in synthesis, engineering, and basic steps without considering operation terms in dynamic, analytical, and named reactions. [Conclusions] This study establishes a unified language for representing experimental operations in synthesis, applicable to organic, inorganic, and hybrid synthesis reactions. It could inform the future development of scientific robotics experiments.
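The combination of Wu-Palmer and cosine similarity described above can be illustrated with a stdlib-only toy sketch. This is not the authors' implementation: the hypernym taxonomy, the embedding vectors, and the `alpha` blending weight are all hypothetical.

```python
import math

# Toy hypernym taxonomy for a few operation verbs (hypothetical data).
parent = {
    "stir": "mix", "blend": "mix", "mix": "operate",
    "heat": "operate", "reflux": "heat", "operate": None,
}

def depth(node):
    # Depth of a node counted from the taxonomy root (root has depth 1).
    d = 1
    while parent.get(node):
        node, d = parent[node], d + 1
    return d

def ancestors(node):
    out = []
    while node:
        out.append(node)
        node = parent.get(node)
    return out

def wu_palmer(a, b):
    # Wu-Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b)),
    # where the LCS is the deepest shared ancestor.
    common = set(ancestors(a)) & set(ancestors(b))
    lcs = max(common, key=depth)
    return 2 * depth(lcs) / (depth(a) + depth(b))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

# Hypothetical embedding vectors for the same verbs.
vec = {"stir": [0.9, 0.1], "blend": [0.8, 0.2], "reflux": [0.1, 0.9]}

def combined(a, b, alpha=0.5):
    # Weighted blend of taxonomic and distributional similarity;
    # verbs above a chosen threshold could be merged under one standard term.
    return alpha * wu_palmer(a, b) + (1 - alpha) * cosine(vec[a], vec[b])
```

With these toy values, "stir" and "blend" (siblings under "mix" with similar vectors) score much higher than "stir" and "reflux", which is the behavior a standardization step would exploit.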

    Predicting User Churn of Smart Home-based Care Services Based on SHAP Interpretation
    Liu Tianchang, Wang Lei, Zhu Qinghua
    2024, 8 (1): 40-54.  DOI: 10.11925/infotech.2096-3467.2022.1168

    [Objective] This study constructs a user churn prediction model for smart home-based care services. It utilizes the SHAP interpretation method to analyze the impact of different features on user churn. [Methods] First, we retrieved more than 300,000 community home-based care service orders from 2019 to 2021. Then, we incorporated the RFM model (RFM-MLP), Maslow's hierarchy of needs theory, the Anderson model, and the Boruta algorithm to identify 11 characteristics across three categories: user values, service selections, and individual features. Third, we chose the XGBoost model from five established machine learning models for its best performance in predicting user churn. Finally, we employed the SHAP interpretation method to examine feature impact, feature dependence, and single-sample analysis. [Results] The predictive model achieves an accuracy and F1 score of approximately 87%. Noteworthy features for predicting user churn in smart home-based care services include the number of domestic service purchases, usage length, and user age. [Limitations] Our data was from a single region. The data quality and algorithm complexity could be improved in the future. [Conclusions] The SHAP interpretation method effectively balances accuracy and interpretability in machine learning prediction models. The insights gained provide a foundation for optimizing operational strategies and content design on smart home-based care service platforms.
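The RFM features underlying the model can be sketched with stdlib Python. The order records and reference date below are made up, and this is only the classic Recency/Frequency/Monetary computation, not the paper's full RFM-MLP pipeline.

```python
from datetime import date

# Hypothetical order records: (user_id, order_date, amount).
orders = [
    ("u1", date(2021, 1, 10), 30.0),
    ("u1", date(2021, 6, 5), 45.0),
    ("u2", date(2019, 3, 2), 20.0),
]

def rfm(orders, today=date(2021, 12, 31)):
    # Recency: days since last order; Frequency: order count;
    # Monetary: total spend -- the three classic RFM features.
    feats = {}
    for user, d, amt in orders:
        last, f, m = feats.get(user, (None, 0, 0.0))
        last = d if last is None else max(d, last)
        feats[user] = (last, f + 1, m + amt)
    return {u: ((today - last).days, f, m) for u, (last, f, m) in feats.items()}
```

Each user then gets a (recency, frequency, monetary) triple that downstream models such as XGBoost can consume alongside other features.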

    Sentiment Analysis with Abstract Meaning Representation and Dependency Grammar
    Li Xuelian, Wang Bi, Li Lixin, Han Dixuan
    2024, 8 (1): 55-68.  DOI: 10.11925/infotech.2096-3467.2022.1259

    [Objective] This paper aims to combine the deep semantic representation and surface syntactic structure of natural language sentences. [Methods] We proposed an integration strategy based on semantic and syntactic rule concatenation and applied it to aspect-based sentiment analysis. This strategy used answer set programming (ASP) to represent abstract meaning representation (AMR), dependency grammar (DEP), and part of speech (POS) as ASP facts. It also integrated DEP and POS through rule-body extension based on AMR rules, so that two or more language features of a sentence were concatenated into the rule body. Based on this strategy, we developed the AMR-DEP-POS-C and AMR-DEP-C models. [Results] We examined the new methods on eight publicly available review datasets. AMR-DEP-POS-C achieved a complementary relationship between semantics and syntax and performed better than baseline methods based on semantics, syntax, and deep learning. [Limitations] Our new models rely on the accuracy of the existing AMR and DEP parsers. [Conclusions] AMR-DEP-POS-C can effectively integrate different language features and brings new research perspectives and tools for aspect-based sentiment analysis.

    Personalized Recommendation Algorithm with Review Sentiments and Importance
    Li Hui, Hu Yaohua, Xu Cunzhen
    2024, 8 (1): 69-79.  DOI: 10.11925/infotech.2096-3467.2022.1270

    [Objective] To address the data sparsity issue and explore the impact of emotional expression on user feature learning, this paper proposes a personalized recommendation algorithm based on the sentiment and importance of online reviews. [Methods] First, we used the BERT pre-trained language model to generate vector representations of review texts. Then, we fed them into a Bi-GRU network to learn their semantic features. We also weighted the review vectors using sentiment weights and attention mechanisms. Finally, we utilized the DeepFM algorithm for deep interaction between user and product features to predict the user's rating of the products. [Results] We examined the proposed model on the Amazon product data. Our model reduced the RMSE and MAE metrics by up to 24.43% and 31.44% compared to the baseline models. Compared with using the attention mechanism alone, our method reduced RMSE and MAE by up to 2.59% and 3.89%. [Limitations] The sentiment analysis method cannot represent users' emotional tendencies towards different attributes of a product. [Conclusions] The proposed method considers the influence of user sentiment on user feature expression, improving recommendation accuracy.
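The sentiment-and-attention weighting step can be illustrated schematically. This is a stdlib sketch with made-up vectors and weights, not the paper's BERT/Bi-GRU/DeepFM pipeline; the idea is only that sentiment weights modulate attention scores before review vectors are aggregated.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_reviews(review_vecs, attn_scores, sentiment_weights):
    # Combine attention scores with per-review sentiment weights,
    # then aggregate review vectors into one user/item representation.
    combined = [a * s for a, s in zip(attn_scores, sentiment_weights)]
    w = softmax(combined)
    dim = len(review_vecs[0])
    return [sum(w[i] * review_vecs[i][j] for i in range(len(w)))
            for j in range(dim)]
```

A review with a stronger sentiment weight thus contributes more to the aggregated representation than one with the same attention score but weaker sentiment.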

    Dynamic Movie Recommendation Considering Long-Term and Short-Term Interest and Its Evolution
    Liu Rui, Chen Ye
    2024, 8 (1): 80-89.  DOI: 10.11925/infotech.2096-3467.2022.1162

    [Objective] This paper proposes a personalized dynamic movie recommendation model that considers the evolution of long-term and short-term interest, capturing the dynamic changes in users' interests to improve recommendation accuracy. [Methods] First, users' interest is divided into long-term interest and short-term interest based on psychological motivation, and the model uses interest ratings and attention frequency to calculate the interest values. Second, the model combines a time window with a forgetting function to obtain the time weight; the short-term interest value and the time weight are combined to reflect the evolution of short-term interest. Finally, the model constructs a user-item rating matrix to predict the target user's score by integrating the movie score with the long-term and short-term interest values. [Results] On the Douban dataset, the method's score prediction error was smaller overall than that of other recommendation methods; it performed best with an MAE of 1.0031 and RMSE of 1.2160, reached with 20 neighbors. [Limitations] Both explicit and implicit feedback are needed to calculate the long-term and short-term interest values, so the computational complexity of the proposed method is relatively high. [Conclusions] The recommendation method can accurately capture dynamic changes in user interest, effectively reduce score prediction error, and improve recommendation accuracy.
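The time-window-plus-forgetting-function weighting can be sketched as exponential decay outside a recency window. This is an illustrative guess at the mechanism, not the paper's exact formula; the `window` and `decay` parameters are hypothetical.

```python
import math

def time_weight(days_ago, window=30, decay=0.1):
    # Interactions inside the recent window keep full weight;
    # older ones decay exponentially (a forgetting-curve sketch).
    if days_ago <= window:
        return 1.0
    return math.exp(-decay * (days_ago - window))

def short_term_interest(events):
    # events: (interest_value, days_ago) pairs; recent events dominate.
    return sum(v * time_weight(t) for v, t in events)
```

Under this scheme a month-old rating counts fully, while a year-old one contributes almost nothing to the short-term interest value.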

    Identifying User Satisfaction Levels and Evolution Patterns in Exploratory Search
    Zhao Yiming, Chen Zhan, Zhang Fan
    2024, 8 (1): 90-103.  DOI: 10.11925/infotech.2096-3467.2022.1281

    [Objective] This paper identifies user satisfaction levels in exploratory search and reveals the interaction and evolution between user satisfaction and query reformulation patterns. [Methods] First, we extracted the characteristics of user queries and their temporal sequences. Then, we used four supervised learning algorithms to predict user satisfaction levels. Third, we identified the interaction between user satisfaction and query reformulations. Finally, we developed new recommendation strategies for query reformulation in intelligent exploratory search assistance. [Results] We examined the proposed model on an open benchmark dataset, and the model's prediction accuracy reached 74%, surpassing existing baseline models. There is a significant association between user satisfaction and query reformulation patterns. [Limitations] User satisfaction represents only one of the search perspectives. Future research should focus on constructing a comprehensive and unified description and classification system for users in exploratory search. [Conclusions] The proposed model further enhances the performance of user satisfaction prediction. It provides theoretical support for intelligent search assistance strategies.

    Knowledge Distillation with Few Labeled Samples
    Liu Tong, Ren Xinru, Yin Jinhui, Ni Weijian
    2024, 8 (1): 104-113.  DOI: 10.11925/infotech.2096-3467.2022.1155

    [Objective] This paper uses the knowledge distillation method to improve the performance of a small-parameter model guided by a high-performance large-parameter model when labeled samples are insufficient. It tries to address the issue of sample scarcity and reduce the cost of high-performance large-parameter models in natural language processing. [Methods] First, we used noise purification to obtain valuable data from an unlabeled corpus. Then, we added pseudo labels to increase the number of labeled samples. Meanwhile, we added a knowledge review mechanism and a teaching assistant model to the traditional distillation model to realize comprehensive knowledge transfer from the large-parameter model to the small-parameter model. [Results] We conducted text classification and sentiment analysis tasks with the proposed model on the IMDB, AG_NEWS, and Yahoo! Answers datasets. With only 5% of the original data labeled, the new model's accuracy was only 1.45%, 2.75%, and 7.28% lower than that of the traditional distillation model trained with the original data. [Limitations] We only examined the new model on text classification and sentiment analysis tasks in natural language processing, which should be expanded in the future. [Conclusions] The proposed method achieves a better distillation effect and improves the performance of the small-parameter model.
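Soft-target knowledge distillation, the core mechanism the model builds on, can be sketched with temperature-softened distributions. This is the generic Hinton-style distillation loss, not the paper's exact knowledge-review/teaching-assistant variant.

```python
import math

def softmax_t(logits, T):
    # Softmax with temperature T; higher T softens the distribution.
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between the temperature-softened teacher and student
    # distributions, scaled by T^2 as in standard distillation.
    p = softmax_t(teacher_logits, T)
    q = softmax_t(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T
```

The loss is zero when student and teacher agree and grows as their softened predictions diverge, which is what drives knowledge transfer to the small model.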

    Classifying Ancient Chinese Text Relations with Entity Information
    Tang Xuemei, Su Qi, Wang Jun
    2024, 8 (1): 114-124.  DOI: 10.11925/infotech.2096-3467.2022.1367

    [Objective] This paper integrates entity information with pre-trained language models to classify relations in ancient Chinese texts. [Methods] First, we utilized special tokens in the input layer of the pre-trained model to mark the positions of entity pairs. We also appended entity-type descriptions after the original relation sentences. Second, we extracted semantic information of entities from the output of the pre-trained language model. Third, we employed a CNN model to incorporate the positional information of each token relative to the start and end entities. Finally, we concatenated sentence representations, entity semantic representations, and CNN outputs and passed them through a classifier to obtain relation labels. [Results] Compared to pre-trained language models, our new model's Macro F1 score was 3.5% higher on average. [Limitations] Analysis of the confusion matrix reveals a tendency for errors in predicting relations with the same entity type pairs. [Conclusions] Combining entity information and pre-trained language models enhances the effectiveness of ancient Chinese relation classification.
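Marking entity-pair positions with special tokens, as in the first step above, can be sketched as follows. The marker names `[E1]`/`[E2]` are illustrative, not necessarily the paper's tokens.

```python
def mark_entities(tokens, head_span, tail_span):
    # Wrap head and tail entity spans with special marker tokens so the
    # encoder sees entity positions. Spans are (start, end) with end exclusive.
    spans = sorted(
        [(head_span, "[E1]", "[/E1]"), (tail_span, "[E2]", "[/E2]")],
        key=lambda x: x[0][0], reverse=True,
    )
    out = list(tokens)
    # Insert from right to left so earlier indices are not shifted.
    for (start, end), open_t, close_t in spans:
        out.insert(end, close_t)
        out.insert(start, open_t)
    return out
```

For example, with head span (0, 1) and tail span (4, 5) on a five-token sentence, the markers bracket the first and last tokens while leaving the rest untouched.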

    Identifying Structural Elements of Scholarly Abstracts with ERNIE-DPCNN
    Hu Zhongyi, Shui Diancheng, Wu Jiang
    2024, 8 (1): 125-144.  DOI: 10.11925/infotech.2096-3467.2022.1359

    [Objective] This paper proposes an effective model to automatically extract key elements from unstructured abstracts of academic literature. [Methods] First, we used the ERNIE model to represent the abstracts. Then, we utilized the DPCNN to extract semantic features. Finally, we built the identification model. [Results] We evaluated the proposed model using a library and information science dataset. The precision, recall, and F1-score values were all above 0.95, outperforming benchmark models. [Limitations] Since the corpus used in this study is from a specific domain, more research is needed to assess the model's performance in other fields. [Conclusions] The proposed model can represent abstracts more comprehensively, improving the identification of structural elements in unstructured abstracts.

    Extracting Long Terms from Sparse Samples
    Lyu Xueqiang, Yang Yuting, Xiao Gang, Li Yuxian, You Xindong
    2024, 8 (1): 135-145.  DOI: 10.11925/infotech.2096-3467.2022.1231

    [Objective] This paper proposes a model combining head and tail pointers with active learning, which addresses sparse-sample issues and helps identify long terms in the weapons domain. [Methods] First, we used the BERT pre-trained language model to obtain word vector representations. Then, we extracted long terms with the head-tail pointer network. Third, we developed a new active learning sampling strategy to select high-quality unlabeled samples. Finally, we iteratively trained the model to reduce its dependence on the data scale. [Results] The F1 value for extracting long terms improved by 0.50%. With the help of active learning post-sampling, we used about 50% of the high-quality data to achieve the same F1 value as with 100% of the high-quality training data. [Limitations] Due to limited computing power, the dataset in this paper was small, and the active learning sampling strategy requires more processing time. [Conclusions] The head-tail pointer and active learning method can extract long terms effectively and reduce the cost of data annotation.
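Decoding term spans from a head-tail pointer network's outputs can be sketched as pairing each predicted head with the nearest following unused tail. This is a common heuristic; the threshold and pairing rule are assumptions, not the paper's exact decoder.

```python
def decode_spans(head_probs, tail_probs, threshold=0.5):
    # head_probs[i] / tail_probs[i]: probability that token i starts / ends
    # a term. Pair each head with the nearest following unused tail.
    heads = [i for i, p in enumerate(head_probs) if p >= threshold]
    tails = [i for i, p in enumerate(tail_probs) if p >= threshold]
    spans, used = [], set()
    for h in heads:
        for t in tails:
            if t >= h and t not in used:
                spans.append((h, t))
                used.add(t)
                break
    return spans
```

Because long terms span many tokens, this pointer-pairing scheme avoids the label explosion that sequence-labeling tags would face on lengthy terminology.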

    Constructing Patent Knowledge Graph with SpERT-Aggcn Model
    He Yu, Zhang Xiaodong, Zheng Xin
    2024, 8 (1): 146-156.  DOI: 10.11925/infotech.2096-3467.2022.1142

    [Objective] This paper proposes an information extraction model (SpERT-Aggcn) and constructs knowledge graphs for green cooperation patents based on it. The model helps identify nested entities and improves the accuracy of relationship extraction for knowledge graphs. [Methods] First, we utilized the SpERT-Aggcn model to extract nested entities and relationships from patent abstracts. Then, we built an ontology using Protégé and mapped the triples to the constructed ontology. [Results] In relationship extraction, the SpERT-Aggcn model's F1 score was 2.61% higher than the SpERT model's, and 4.42% higher on long-distance relationship extraction tasks. The constructed knowledge graph for green cooperation patents contained 699,517 entities and 3,241,805 relationships. [Limitations] The F1 score of SpERT-Aggcn for extracting short-distance relationships was lower than the SpERT model's, indicating a weaker capability in identifying short-distance relationships. [Conclusions] The proposed model can help construct better knowledge graphs.

      Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn