The explosive growth of Artificial Intelligence (AI) has both positive and negative impacts on many aspects of human society and the environment. While academia and industry are optimistic about AI's positive impacts on education, environmental sustainability, healthcare systems and quality, and the transportation of people and goods, experts are concerned about the potential harm and danger of its negative impacts if they are not contained. This essay discusses why it is imperative to emphasize the trustworthiness of AI and reviews current developments in assuring trustworthy AI. The convenience and efficiency of AI applications are embraced by experts and the general public alike, but containing the potential negative impacts, harms, and dangers introduced by ill-intentioned or even malicious actors is a tremendous, complex challenge. Establishing trustworthy AI is considered a major approach to containing the negative impacts of AI. Efforts in trustworthy AI span two broad areas: policies and regulations from governments, and research and development (R&D) from academia and industry. Policies and regulations focus on ethical, legal, and robustness principles that guide R&D in trustworthy AI. A commonly shared view in the research literature is that trustworthy AI should be reliable, safe, secure, private, available, and usable. The requirements for trustworthy AI may vary across population groups. One important development in trustworthy AI is the shift from model-centric AI to data-centric AI. The data-centric paradigm emphasizes data quality through the systematic design of datasets used for machine learning, encompassing data design, data sculpting, and data strategies, with data policy applied throughout the process. Both policy and technical developments in shaping trustworthy AI and containing its negative impacts present many new research and development opportunities for academia and industry.
[Objective] This study explores the explainability mechanisms in explainable recommendation models from the perspectives of embedding and post-processing. [Coverage] Literature searches were conducted in Google Scholar and CNKI using the keywords “explainable recommendation”, “interpretable recommendation”, and “explainable AI”. After topic filtering, a total of 64 representative papers on explainability methods were selected and reviewed using the backward snowballing technique. [Methods] From the embedding perspective, the explainability methods for recommendations were studied by analyzing four aspects: knowledge graphs, deep learning, attention mechanisms, and multi-task learning. From the post-processing perspective, the explainability methods were explored by analyzing five aspects: predefined templates, sentence retrieval, natural language generation, reinforcement learning, and knowledge graphs. The explainability methods were compared in detail in terms of their logical reasoning, performance characteristics, and limitations. Finally, the study provided an outlook on the pressing issues that need to be addressed in explainability research. [Results] Explainability effectively enhances the persuasiveness of recommendation systems, improves the user experience, and is a crucial approach to increasing the transparency and trustworthiness of recommendation systems. [Limitations] The study did not address the evaluation metrics for explainability algorithms. [Conclusions] Although existing explainability methods can satisfy the explanation requirements of many applications to a certain extent, numerous challenges remain in research areas such as conversational interactive explanations and causal explanations.
[Objective] The study investigates the characteristics of user querying behavior in a generative artificial intelligence (GAI) environment, and evaluates the suitability and effectiveness of GAI technology in search engines. [Methods] Behavioral data were collected through user experiments and questionnaires. Statistical analyses, including the non-parametric Wilcoxon test and the chi-square test, were performed to compare the variations in user search behavior patterns across different search engine environments. [Results] Compared to traditional search engines, the GAI-based search engine shows an average increase of 5.61 characters in query length, an extension of 8.92 seconds in query construction time, and an increase of 1.25 words in task descriptions. In particular, the use of translation and system-following strategies rose to 29.30% and 12.11%, respectively. In addition, users’ subjective satisfaction scores rose by 0.88 points. [Limitations] The study did not examine broader user search behaviors, such as browsing and utilization of search results. [Conclusions] While GAI technology can enhance search engines and improve users’ search experience, it also poses challenges related to high cognitive load, low credibility, and complex interactions.
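To make the statistical comparison concrete, here is a minimal sketch (with made-up numbers, not the study's data) of how a paired Wilcoxon signed-rank test and a chi-square test of the kind named above can be run with scipy.stats:

```python
# A minimal sketch of the kind of paired comparison described above: a Wilcoxon
# signed-rank test on per-user query lengths from two search-engine conditions
# and a chi-square test on strategy-usage counts. All numbers are illustrative
# placeholders, not the study's data.
import numpy as np
from scipy.stats import wilcoxon, chi2_contingency

# Hypothetical paired observations: query length per user in each condition.
query_len_traditional = np.array([12, 15, 9, 20, 14, 11, 17, 13])
query_len_gai = np.array([18, 21, 15, 24, 19, 16, 25, 17])

stat, p = wilcoxon(query_len_traditional, query_len_gai)
print(f"Wilcoxon signed-rank: statistic={stat:.2f}, p={p:.4f}")

# Hypothetical contingency table: strategy used (rows) x engine type (columns).
strategy_counts = np.array([[30, 75],   # translation strategy
                            [12, 31]])  # system-following strategy
chi2, p, dof, _ = chi2_contingency(strategy_counts)
print(f"Chi-square: chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
```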
[Objective] This paper uses the stem cell field as an empirical example to investigate the evolution of research teams, the changes in author influence, and their relationships. [Methods] We employed the Leiden algorithm to extract community structures in the collaboration network. Then, we determined the leaders based on the influence variance. We utilized the Pearson correlation coefficient to explore the relationship between community size evolution and changes in leader node influence. [Results] The research teams exhibited distinct leader characteristics, with a degree variance significantly higher than that of a random network. Eleven of the 15 leaders continuously led their teams across all time slots. In approximately 80% of the communities, community size and leader centrality were linearly correlated, with R-values close to 1 and P-values less than 0.05. [Limitations] This paper simplifies complex leadership models by selecting only the most representative expert. [Conclusions] Key figures lead their teams over a considerable period, and changes in leader influence are strongly correlated with team size.
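As an illustration of the analysis pipeline described above, the following sketch uses the leidenalg and python-igraph packages together with scipy's Pearson correlation; the toy graph, the highest-degree leader rule, and the per-slot series are assumptions for demonstration, not the paper's data or exact procedure:

```python
# A minimal sketch of the two analysis steps named in the abstract: Leiden
# community detection on a collaboration network, then a Pearson correlation
# between community size and a leader's centrality over time. Toy data only.
import igraph as ig
import leidenalg
from scipy.stats import pearsonr

# Toy co-authorship graph; edges stand in for collaboration links.
g = ig.Graph(edges=[(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])

# Leiden partition (modularity objective); each part is one research team.
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
for i, community in enumerate(partition):
    degrees = [g.degree(v) for v in community]
    leader = community[degrees.index(max(degrees))]  # highest-degree member as leader
    print(f"community {i}: size={len(community)}, leader node={leader}")

# Hypothetical per-time-slot series for one community: size vs. leader centrality.
community_size = [8, 11, 15, 21, 26]
leader_centrality = [0.21, 0.27, 0.35, 0.44, 0.52]
r, p = pearsonr(community_size, leader_centrality)
print(f"Pearson r={r:.3f}, p={p:.4f}")
```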
[Objective] By constructing a three-dimensional “scenario-problem-method” research framework for altmetrics, this article aims to enrich the research design of altmetrics analysis and promote the healthy and sustainable development of altmetrics. [Methods] Drawing on mature frameworks in the science of science and informetrics, and combining them with the characteristics of altmetrics, a research framework is constructed from four dimensions: application scenarios, research questions, key methods, and exploratory methods. [Results] The application scenarios of altmetrics are summarized as scientific evaluation, scientific communication, and knowledge diffusion. For scientific evaluation scenarios, the research questions concern indicator application, influencing factors, and indicator construction; for scientific communication scenarios, communication strategies, communication structures, communication trends, and science-society interaction; and for knowledge diffusion, diffusion strategy, diffusion structure, and diffusion effect. Three key analytical methods, namely causal inference, network analysis, and machine learning, are combined to explain each research design according to the problem. [Limitations] The framework proposed in this study faces challenges in terms of operability and implementation; further empirical testing is needed. [Conclusions] The framework proposed in this article is conducive to moving altmetrics into a more substantive phase of development.
[Objective] This paper constructs a new mobile visual search model to address the problem of convenient retrieval and scenario awareness of narrative murals. [Methods] We built a graph for the mural scenario with contexts as elements, based on context awareness theory and information foraging theory. Then, we extracted the murals' global and local visual features by combining multiple models and performed feature matching using the dot product. In specific contexts, we realized scenario association with the context graph and scenario-based mobile visual search, facilitating user perception and understanding. [Results] When searching for murals related to time, place, person, and events, the proposed model's average mAP value reached 0.840, outperforming models such as VGG16, BOW_KAZE, and HOG. [Limitations] We did not consider the impact of the user's scenario on search intent. [Conclusions] The proposed mobile visual search model can effectively retrieve context-related murals and offers a reference for public cultural institutions launching scenario-based mobile visual search services.
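The feature-matching step can be illustrated with a short sketch; the vectors below are random placeholders standing in for the global and local visual features, and the dot product over L2-normalized vectors is one common way to realize the matching described above:

```python
# A minimal sketch of dot-product feature matching, under the assumption that
# global and local descriptors have already been extracted and concatenated.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

# Hypothetical database of mural features: concatenated global + local vectors.
db_features = l2_normalize(rng.normal(size=(100, 512)))   # 100 murals
query_feature = l2_normalize(rng.normal(size=(512,)))     # one query image

# Dot product of L2-normalized vectors equals cosine similarity.
scores = db_features @ query_feature
top_k = np.argsort(-scores)[:5]
print("top-5 mural ids:", top_k, "scores:", np.round(scores[top_k], 3))
```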
[Objective] This paper constructs a machine learning-based weak signal identification model for disruptive technologies. It aims to discover early-stage disruptive technologies and explore their disruptive potential relative to existing mainstream technologies. [Methods] By summarizing the core characteristics of disruptive technologies' weak signals, we designed a Disruptive Index-Patent (DI-P) based on patent citation categories. We also constructed historical disruptive technology corpora and designed a machine learning-based weak signal identification model for disruptive technologies. Machine learning models including Logistic Regression, Gaussian Naive Bayes, Stochastic Gradient Descent, Gradient Boosting Trees, and Random Forests were selected for comprehensive prediction. Finally, we explored the future disruptive paths of technologies' weak signals through link prediction. [Results] We conducted an empirical analysis in hydrogen storage and used the citation-category-based DI-P to obtain historical disruptive technology corpora. Its accuracy and AUC values were better than those of RDI and DI. By comparing the weak signals of disruptive technologies with high-value patents, we identified potential future disruptive paths from the perspectives of cost, efficiency, and safety. [Limitations] The empirical analysis covers only a single field, the data sources are limited to patent data and strategic plans, and the prediction model's accuracy is limited. [Conclusions] Combining machine learning models with link prediction methods allows the signals of disruptive technologies and their disruption paths to be identified precisely and at fine granularity.
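A minimal sketch of the comprehensive prediction step with the classifier families listed above, using scikit-learn on synthetic data rather than the patent corpus:

```python
# A minimal sketch: several scikit-learn classifiers scored side by side for
# predicting whether a technology signal becomes disruptive. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=12, weights=[0.8], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB": GaussianNB(),
    "SGD": SGDClassifier(loss="log_loss", random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Use decision_function when predict_proba is unavailable.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_te)[:, 1]
    else:
        scores = model.decision_function(X_te)
    print(f"{name}: AUC={roc_auc_score(y_te, scores):.3f}")
```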
[Objective] Existing fake news detection models based on dissemination patterns cannot sufficiently explore and integrate users' sentiment preferences. This paper aims to address this issue and improve the accuracy of these models. [Methods] We constructed a fake news detection model, USPGCN, which integrates news dissemination patterns and communicators' sentiment preferences. First, we extracted the sentiment preference characteristics of communicators from their historical posts and enriched the news text features with these characteristics. Second, based on the news dissemination patterns, we fused the communicators' sentiment preferences with the dissemination patterns using a graph convolutional neural network and mixed pooling functions. Third, we integrated the enriched news text features with the pooling results. Finally, we fed these representations into a classifier to obtain the final classification results. [Results] Compared with baseline models on the publicly available GossipCop and PolitiFact datasets, the new model's precision reached 0.9739 and 0.9048, respectively, outperforming the baselines and demonstrating its effectiveness. [Limitations] This study does not consider certain cases, such as communicators sharing news simply because it is trending. [Conclusions] Integrating news dissemination patterns and communicator sentiment preferences can effectively improve the accuracy of fake news detection.
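The following sketch illustrates the graph-convolution and mixed-pooling idea in plain PyTorch; the layer sizes, adjacency normalization, and mean-plus-max pooling are illustrative assumptions, not the USPGCN implementation:

```python
# A minimal sketch of graph convolution over a propagation graph followed by
# "mixed" pooling. Node features would be sentiment-enriched representations.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Symmetric normalization of the adjacency matrix with self-loops.
        adj = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        adj_norm = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(self.linear(adj_norm @ x))

class PropagationEncoder(nn.Module):
    def __init__(self, in_dim=64, hidden=32):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(in_dim, hidden)
        self.gcn2 = SimpleGCNLayer(hidden, hidden)

    def forward(self, x, adj):
        h = self.gcn2(self.gcn1(x, adj), adj)
        # Mixed pooling: concatenate mean- and max-pooled graph representations.
        return torch.cat([h.mean(dim=0), h.max(dim=0).values], dim=-1)

x = torch.randn(10, 64)                      # 10 nodes in one propagation graph
adj = (torch.rand(10, 10) > 0.7).float()     # toy propagation edges
adj = ((adj + adj.t()) > 0).float()
graph_repr = PropagationEncoder()(x, adj)
print(graph_repr.shape)  # torch.Size([64])
```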
[Objective] To maintain order in the e-commerce market, it is particularly important to develop effective fake review identification technology. This paper aims to solve the data imbalance problem in fake review identification and the catastrophic forgetting problem in the model learning process. [Methods] This paper proposes a fake review identification method based on incremental learning to address the data imbalance problem, and introduces elastic weight consolidation to alleviate the catastrophic forgetting that may occur during learning. [Results] Experiments were conducted on the YelpCHI, YelpNYC, and YelpZIP datasets. Compared with an existing state-of-the-art method (En-HGAN), the F1 scores of our model on the three datasets increased by 17.2%, 16.1%, and 13.3%, respectively, and the AUC scores increased by 12.8%, 13.8%, and 13.6%, respectively. [Limitations] The method still has room for improvement on extremely imbalanced datasets, and the catastrophic forgetting caused by incremental learning is not fully eliminated. [Conclusions] The experimental results show that the proposed method is effective in identifying fake reviews and can provide technical support for maintaining the integrity of the e-commerce market.
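A minimal sketch of the elastic weight consolidation penalty referred to above; the tiny linear model and unit Fisher values are placeholders to keep it runnable, not the paper's detector:

```python
# A minimal sketch of elastic weight consolidation (EWC): a quadratic penalty
# that anchors parameters important for earlier batches while the model learns
# newly arriving data.
import torch
import torch.nn as nn

class EWC:
    def __init__(self, model, fisher, old_params, lam=100.0):
        self.model = model
        self.fisher = fisher          # per-parameter importance (diagonal Fisher)
        self.old_params = old_params  # parameter values after the previous task
        self.lam = lam

    def penalty(self):
        loss = 0.0
        for name, p in self.model.named_parameters():
            loss = loss + (self.fisher[name] * (p - self.old_params[name]) ** 2).sum()
        return 0.5 * self.lam * loss

model = nn.Linear(16, 2)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
# In practice the Fisher diagonal is estimated from gradients on the old task;
# ones are used here only to keep the sketch runnable.
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}
ewc = EWC(model, fisher, old_params)

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
task_loss = nn.functional.cross_entropy(model(x), y)
total_loss = task_loss + ewc.penalty()  # optimize this on the new data batch
total_loss.backward()
print(float(total_loss))
```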
[Objective] This paper proposes an early warning model for public opinion reversals based on sliding-window topic dissimilarity values, aiming to predict event reversals in a timely and accurate manner. [Methods] Comments on reversal events from social platforms were taken as the research object. First, we applied a sliding window to the time-series data. Then, we extracted topics from the windows using a topic model to compute the topic dissimilarity value. Third, we fed the texts into a trained sentiment analysis model to calculate emotional fluctuations. Finally, the two indicators over time were fed into a time-series anomaly detection procedure to determine whether a reversal occurred. [Results] We created a dataset from comments on representative public opinion reversal events that occurred between 2018 and 2022 for experimental verification. The accuracy of the proposed model on this dataset reached 98.15%, with F1 scores for positive and negative sentiments of 98.90% and 99.30%, respectively. [Limitations] It is difficult to accurately predict reversals in long-duration events with multiple contributing indicators of topic dissimilarity and emotional fluctuations. [Conclusions] Public opinion reversals are correlated with comment topics and public sentiment. The proposed model performs well in detecting public opinion reversals.
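A minimal sketch of the sliding-window topic-dissimilarity idea, using scikit-learn LDA and Jensen-Shannon distance as one possible instantiation; the comments, window size, and topic count are toy assumptions, not the paper's configuration:

```python
# A minimal sketch: comments are grouped into time windows, each window gets a
# topic distribution, and the dissimilarity between adjacent windows is
# measured with Jensen-Shannon distance.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

comments = [
    "the driver was at fault", "the driver should apologize",
    "new footage shows the pedestrian crossed illegally",
    "the pedestrian caused the accident", "everyone misjudged the driver",
    "the driver deserves an apology", "wait for the official report",
    "official report confirms the pedestrian was responsible",
]
window_size = 2
windows = [comments[i:i + window_size] for i in range(0, len(comments), window_size)]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(comments)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

# Average document-topic distribution per window.
window_topics = [lda.transform(vectorizer.transform(w)).mean(axis=0) for w in windows]

# Dissimilarity between consecutive windows; a spike suggests a possible reversal.
for t in range(1, len(window_topics)):
    d = jensenshannon(window_topics[t - 1], window_topics[t])
    print(f"window {t - 1} -> {t}: topic dissimilarity = {d:.3f}")
```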
[Objective] This study uses deep learning methods to analyze user-generated content, aiming to explore how customer perceived value is influenced by personality traits, which is significant for behavioral analysis on online platforms. [Methods] A framework based on group knowledge and fine-grained text decomposition is proposed. First, a topic model identifies the perceived-factor framework of the product or service, and reviews are decomposed accordingly. The enhanced Doc2Vec-IOVO multi-classification strategy calculates detailed perceived value scores, while NLP deep learning models assess users' Big Five personality traits. Second, the study evaluates the impact of personality on perceived value and examines how different personality traits affect this impact. Finally, the predictive value of personality measures is explored. [Results] The enhanced strategy significantly improved multi-level emotion recognition, achieving a maximum accuracy of 96.50%, an improvement of 18.28 percentage points over the baseline model. Incorporating the new features increased recognition accuracy by up to 2.66 percentage points. Neuroticism, extraversion, conscientiousness, and openness have significant effects on perceived value, with neuroticism having a negative effect and the others positive effects. Extraversion and neuroticism exert a stronger influence than the other traits, demonstrating the significant predictive value of personality indicators; using them improves the accuracy of user behavior prediction by 3.82 to 7.72 percentage points. [Limitations] The personality recognition model is based solely on the James English stream-of-consciousness dataset and lacks data in other languages and fields. [Conclusions] The proposed fine-grained perceived-factor analysis and text-based personality recognition methods can replace traditional survey methods and help companies efficiently and cost-effectively analyze user psychology, predict perception tendencies, and adjust business strategies.
[Objective] This paper addresses the scarcity of query-focused text summarization datasets and explores methods to meet researchers' personalized query needs. [Methods] Based on ChatGPT and prompt engineering, this paper constructed a generation and self-verification prompt chain and proposed an automated data annotation framework that uses large language models such as ChatGPT as “data annotators”. We then constructed the AMTQFSum dataset, consisting of query-focused summaries of academic conference records in natural language processing. [Results] AMTQFSum demonstrates superior data volume and length distribution. Under the UniEval model, AMTQFSum outperformed existing QFS datasets with average score improvements of 85% and 33%. We examined the benchmark effectiveness of the AMTQFSum dataset on six classic extractive and abstractive query-focused summarization models. The BART-based model achieved the best results, with ROUGE-1/2/L scores reaching 52.53%, 35.61%, and 44.80%, respectively. [Limitations] The dataset covers only a narrow range of fields. [Conclusions] The prompt-chain-based large language model annotation method provides a feasible solution for automated data annotation. The AMTQFSum dataset provides a foundational resource for query-focused summarization tasks.
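A minimal sketch of a generation-then-self-verification prompt chain; `call_llm` is a hypothetical placeholder for whatever chat-completion client is used, and the prompts are illustrative rather than the paper's actual templates:

```python
# A minimal sketch of a two-step annotation prompt chain. `call_llm` is a
# hypothetical placeholder, not a real library function.
def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError("Wire this to your chat-completion API of choice.")

def annotate_query_focused_summary(transcript: str, query: str) -> dict:
    # Step 1: generation - draft a summary focused on the given query.
    draft = call_llm(
        f"Summarize the following meeting transcript, answering only the query.\n"
        f"Query: {query}\nTranscript:\n{transcript}\nSummary:"
    )
    # Step 2: self-verification - ask the model to check faithfulness and coverage.
    verdict = call_llm(
        f"Query: {query}\nCandidate summary:\n{draft}\nTranscript:\n{transcript}\n"
        f"Does the summary answer the query and stay faithful to the transcript? "
        f"Reply 'PASS' or 'FAIL' with a one-sentence reason."
    )
    return {"query": query, "summary": draft, "verified": verdict.startswith("PASS")}
```

Only records that pass the verification step would be kept as annotated query-summary pairs; failed drafts can be regenerated or discarded.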
[Objective] This paper aims to automate the extraction of key technical information from complex patent texts and to overcome the dependency of traditional natural language processing models on extensive domain knowledge annotation. [Methods] We proposed an unsupervised key information extraction method based on knowledge self-distillation in large language models. By employing a multiple-role strategy, we conducted a structured analysis of Derwent rewritten patent abstracts. This method enhances the ability of large language models to extract and structurally analyze key content through the knowledge self-distillation strategy. [Results] In the entity and relation extraction tasks, our method's recall reached 95.40% and 51.49%, respectively. The accuracy of the structural analysis format reached 100%. We also achieved an F1-score of 5.01% on the RE-DocRED dataset, a public dataset for relation triplet extraction, under unsupervised and zero-shot settings. [Conclusions] The proposed method can effectively extract key information from patent texts without data annotation.
[Objective] This paper aims to improve the accuracy of automatically extracting technical words and function effects from patents. [Methods] First, ChatGPT is used as the teacher model and ChatGLM3 as the student model. Through knowledge distillation, the training data extracted by ChatGPT are used to fine-tune ChatGLM3, yielding multiple technical word extraction models and a function word extraction model. These models are then used to extract technical words and function words from the abstract, the first claim, and the technical effect segments of patents, respectively. [Results] Compared to ChatGPT, the fine-tuned technical word extraction models and the function word extraction model show higher accuracy and lower recall. The fine-tuned ChatGLM3 model for the first claim achieves the highest accuracy (0.734) and F1 value (0.724). The accuracy of the function word extraction model reached 0.649, higher than the commercial tool's 0.530. [Limitations] This study covers only a single technical field and patent language, the amount of validation data is small, and the data cleaning rules are not comprehensive enough. [Conclusions] This research scheme improves the accuracy of large language models in automatically extracting technical effects through knowledge distillation. Additionally, this study supports mining cutting-edge, innovative, and hotspot technologies from patents, facilitating higher-quality intelligent patent analysis.
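A minimal sketch of how teacher outputs can be packaged into instruction-tuning records for the student model; the file name, prompt wording, and JSON schema are assumptions for illustration, not the paper's pipeline:

```python
# A minimal sketch: teacher (e.g., ChatGPT) extractions are converted into
# instruction-tuning records that a student model (e.g., ChatGLM3) can be
# fine-tuned on. All field names and texts are illustrative placeholders.
import json

def build_distillation_record(segment_type: str, text: str, teacher_terms: list) -> dict:
    """One fine-tuning example: instruction + patent segment -> teacher's terms."""
    return {
        "instruction": f"Extract the technical terms from this patent {segment_type}.",
        "input": text,
        "output": "; ".join(teacher_terms),  # terms produced by the teacher model
    }

# Hypothetical teacher outputs collected beforehand.
teacher_outputs = [
    ("abstract", "A solid-state hydrogen storage tank using a metal hydride bed...",
     ["solid-state hydrogen storage", "metal hydride bed"]),
    ("first claim", "A tank comprising a porous alloy matrix configured to...",
     ["porous alloy matrix"]),
]

with open("distill_train.jsonl", "w", encoding="utf-8") as f:
    for segment_type, text, terms in teacher_outputs:
        record = build_distillation_record(segment_type, text, terms)
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
# The resulting JSONL can then be passed to the student's fine-tuning script.
```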
[Objective] This paper proposes an answer selection method integrating question classification with the RoBERTa model. It aims to address issues in existing pre-trained models, such as insufficient utilization of the semantic interaction information between question and answer sentences and unstable accuracy during fine-tuning. [Methods] We introduced an EAT annotation approach that retains the original entity semantics and combined it with multi-sentence joint RoBERTa modeling to construct an answer selection model. Additionally, we employed a two-stage fine-tuning process for transfer learning to enhance the model's stability during fine-tuning. [Results] The proposed method achieved P@1, MAP, and MRR scores of 0.843, 0.896, and 0.903 on the WikiQA dataset. On the TrecQA dataset, the scores reached 0.955, 0.944, and 0.974, respectively. Moreover, the method enhanced the stability of the model's accuracy convergence. [Limitations] For complex questions of the types “abbreviation (ABBR)” and “description (DESC)”, the new method cannot effectively extract key entities from the answer sentences, so the classification information cannot be used to enhance semantic interaction modeling between question and answer sentences. [Conclusions] The proposed method effectively improves model performance and robustness.
[Objective] This study aims to construct an argumentation structure suitable for civil judgment documents and to achieve automated extraction of argumentation elements. [Methods] Based on Toulmin's argument model, we constructed an argumentation structure for civil judgment documents to guide the annotation of an argumentation corpus of civil judgment documents. We then proposed a Context-Aware Multi-Head Attention Argumentation Element Classification Model (CAMA-AECM) for the automatic extraction of argumentation elements. [Results] The proposed model showed superior performance on datasets for different argumentation subjects. In terms of Macro-F1 score, the model achieved maximum improvements of 1.73%, 5.72%, and 3.92% on the datasets corresponding to the plaintiff, defendant, and court, respectively. [Limitations] Due to the cost and scale of constructing argumentation corpora, we did not explore the argumentation structure and features of judgment documents for all types of civil cases. [Conclusions] The model can effectively identify argumentation elements, both enhancing the in-depth exploration of argumentation knowledge in judgment documents and providing a new automated tool for judgment document analysis.
[Objective] In response to the scarcity of annotated data in the current research on entity relationship extraction and graph construction in the field of intangible cultural heritage (ICH), a lightly annotated relationship extraction scheme is proposed. [Methods] Using silk weaving domain texts as the data source, the SREP model is constructed, integrating domain-specific terminology dictionaries and LTP tools for entity recognition. Subsequently, the BERT model is utilized to vectorize the representation of entities and their contextual text, and various clustering algorithms are applied to different feature combinations for relationship extraction experiments to determine the optimal algorithm and feature combination. The Bootstrapping method is then employed for active learning to expand the instances of relationships. Finally, the extracted relationship triples are imported into Gephi to construct a domain-specific knowledge graph. [Results] The experimental results indicate that the K-means algorithm, combining entity intermediate text features with entity type features, achieved the best results in relationship extraction experiments, identifying five types of relationships. During the relationship instance expansion phase, the LR algorithm is more suitable for active learning methods, with an accuracy rate of 0.860, an improvement of 0.105 over the baseline. [Limitations] The effectiveness of the model needs further verification on larger datasets and relationship extraction in different fields. [Conclusions] The model proposed in this study can effectively extract entity relationships from ICH texts and achieve semantic mining and utilization of structured ICH texts, reducing the dependence on annotated data.
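A minimal sketch of the embedding-plus-clustering step, using Hugging Face transformers and scikit-learn K-means; the example sentences and cluster count are toy assumptions rather than the SREP configuration:

```python
# A minimal sketch: "entity intermediate text" (the span between two recognized
# entities) is encoded with BERT and grouped with K-means so that each cluster
# can be inspected as a candidate relation type.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

# Hypothetical intermediate-text snippets from silk weaving domain sentences.
contexts = ["织造于", "产自", "使用了", "由其制成", "流行于", "起源于"]

with torch.no_grad():
    batch = tokenizer(contexts, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    embeddings = ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # mean pooling

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)
for text, label in zip(contexts, labels):
    print(label, text)
```

In the described scheme, entity type features would be concatenated with these embeddings before clustering, and the resulting clusters would seed the Bootstrapping-based expansion of relation instances.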
[Objective] This paper designs an automatic construction algorithm for clustering labels and extracts representative phrases from clustered text groups to summarize their main contents and reveal the common information within the texts. [Methods] We proposed a phrase-level clustering label automatic construction algorithm based on Improved Association Rules (IAR). It selected representative word combinations in text clusters and mapped them back to the original text to obtain phrase-form labels. We adjusted the traditional association rule metrics, added new distinctiveness metrics, and designed a combination of metric weights. We also developed a labeling scheme for clustering label data and manually labelled a dataset of short research question sentences to evaluate the algorithm’s performance. [Results] The proposed algorithm achieves good performance on the dataset, with a ROUGE1-F1 score of 78.39%, effectively constructing concise and accurate labels automatically. [Limitations] Our method only constructed labels from the cluster texts without considering external lexicons, such as hypernyms. [Conclusions] This paper presents an effective clustering label automatic construction algorithm through improved association rules. It significantly enhances the interpretability of text clustering results and supports readers’ quick understanding of the cluster content.
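A minimal sketch of scoring candidate word combinations with support, confidence, and a distinctiveness term, using illustrative weights and toy documents; this follows the general idea described above rather than the exact IAR metric definitions:

```python
# A minimal sketch: for each candidate word pair in a target cluster, compute
# support, confidence, and a distinctiveness term that penalizes pairs that
# also appear in other clusters, then combine them with fixed weights.
from itertools import combinations

def combo_scores(target_cluster, other_clusters, top_n=3, weights=(0.4, 0.3, 0.3)):
    def docs_containing(combo, docs):
        return sum(all(w in doc for w in combo) for doc in docs)

    vocab = {w for doc in target_cluster for w in doc}
    scored = []
    for combo in combinations(sorted(vocab), 2):
        in_target = docs_containing(combo, target_cluster)
        if in_target == 0:
            continue
        support = in_target / len(target_cluster)
        confidence = in_target / docs_containing((combo[0],), target_cluster)
        in_others = docs_containing(combo, other_clusters)
        distinctiveness = in_target / (in_target + in_others)  # higher = more cluster-specific
        w_s, w_c, w_d = weights
        scored.append((w_s * support + w_c * confidence + w_d * distinctiveness, combo))
    return sorted(scored, reverse=True)[:top_n]

# Toy clusters: each document is a list of words.
target = [["text", "clustering", "label"], ["clustering", "label", "automatic"],
          ["text", "clustering", "evaluation"]]
others = [["image", "classification", "label"], ["speech", "recognition", "model"]]
for score, combo in combo_scores(target, others):
    print(f"{combo}: {score:.3f}")  # top combinations are mapped back to phrases in the text
```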
[Objective] This research aims to achieve the automatic quantification of semantic evaluation metrics for scientific papers using large language models, supporting the study of semantic evaluation of scientific literature. [Methods] First, we extracted rhetorical moves related to evaluation metrics from scientific papers using prompts at three levels of detail: standard, simplified, and detailed. Then, we compared the effectiveness of these prompts. Third, we fine-tuned a large language model with a small number of annotated samples to develop a model for quantifying semantic evaluation metrics. [Results] Based on the semantic content of the papers, we analyzed the “difficulty of experimental conditions” dimension, on which the proposed model achieved the best performance. With a training sample size of 100, its Micro-Acc and Fuzzy-Acc reached 0.72 and 0.87, respectively. [Limitations] The experiment only included scientific papers in computer science, and the effectiveness of the proposed method across different disciplines remains to be explored. [Conclusions] The proposed method demonstrates high accuracy and reliability in evaluating scientific papers. Increasing the level of detail in the prompts significantly improves the quantification effect. While increasing the number of samples during fine-tuning improves overall performance, the degree of improvement varies across scoring ranges.
[Objective] This paper addresses the insufficient text mining of needs and patent documents and inaccurate matching between needs and technology in existing research. [Methods] First, we combined the TDAM multi-task learning framework and the F-term patent identification method to design a more precise and effective requirements-technology matching process framework. Then, we verified the model’s effectiveness using the new energy vehicle field as an example. [Results] The new model’s needs-technology matching accuracy reached 0.819, nearly 13.1% higher than the S-LDA model and nearly 31.5% higher than the BiLSTM model. The proposed model’s recall rate was 0.796, with an F1 value of 0.807. [Limitations] We only collected Japanese patents, making the data sources not comprehensive enough. [Conclusions] The proposed model can generate patent technology that closely matches user needs, thereby assisting enterprises in developing technical solutions for specific consumer needs. It will also guide enterprises in determining the direction of technology research and development.
[Objective] This paper uses a knowledge graph to introduce external knowledge, combined with multimodal fusion and confidence detection mechanisms, to explore the relationship between clinical questions and medical images and to enhance performance on medical visual question answering (VQA) tasks. [Methods] We proposed a novel medical VQA model consisting of a text knowledge enhancement layer, an image embedding layer, a multimodal fusion layer, a confidence detection layer, and a prediction layer. The text knowledge enhancement layer embeds external knowledge graphs into the clinical question representation, the image embedding layer captures the medical image representations, the multimodal fusion layer captures the interaction between text and image, the confidence detection layer assesses the reliability of the data, and the prediction layer generates the prediction results. We conducted empirical studies on the VQA-RAD and PathVQA datasets. [Results] The optimal accuracy of the proposed model reached 59.3% and 16.2% on the two datasets, respectively, demonstrating the model's effectiveness. [Limitations] We only considered a single-language context; further validation on multilingual datasets is needed. [Conclusions] This study significantly improves the performance of medical VQA tasks and provides important reference value for enhancing the quality and efficiency of services in healthcare and other professional domains.
[Objective] This paper constructs an interactive matching model to address inaccurate responses, insufficient precision when returning multiple results, and polysemy problems in multi-turn dialogue for intelligent question-answering systems in the financial sector. [Methods] We proposed a multi-granularity and multi-attention interaction matching model (MGMAI) based on BERT. MGMAI comprises a preprocessing layer, a representation layer, an attention interaction layer, a semantic aggregation layer, and a dialogue selection layer. The model focuses on the key information in dialogues and utilizes it to achieve efficient dialogue matching. [Results] The MGMAI model was trained and validated on two open multi-turn dialogue datasets and fine-tuned on financial data. Experimental results showed that MGMAI outperformed the DCM model by 0.019, 0.010, and 0.007 on the R10@1, R10@2, and R10@5 metrics, respectively. [Limitations] The model was only tested in intelligent question-answering systems with financial data; we did not validate its generalization ability in other fields. [Conclusions] The MGMAI model can effectively improve the accuracy of multi-turn dialogue and handle the ambiguity issues facing intelligent question-answering systems in the financial sector. It shows potential application value and room for improvement.
[Objective] This study designs an image-based retrieval system for rural cultural and tourism destinations. It retrieves destinations that meet specific needs through image- and label-based searches, thus assisting tourists in making rural tourism decisions. [Methods] We constructed a database of rural cultural and tourism destinations and their images. Then, we built an image feature extraction model based on the ViT model and used the Milvus vector database to store image features and rural cultural tourism information. Third, we implemented a hybrid search combining labels and deep features. Finally, we developed the system using front-end and back-end technologies. [Results] The proposed method achieved good query accuracy on the experimental datasets, with a mAP@100 of 0.7642 on a self-built dataset, surpassing baseline models. [Limitations] The system is limited by the scale and variety of the experimental datasets, so retrieval results for some needs are less diverse and less well matched. [Conclusions] The proposed model can accurately retrieve related images, providing convenient and user-friendly services.
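A minimal sketch of the retrieval core, extracting ViT features with a pretrained backbone and comparing them by cosine similarity; in the described system the vectors would be stored and searched in Milvus together with destination labels, while a brute-force in-memory search stands in here:

```python
# A minimal sketch: ViT features are extracted with a pretrained backbone and
# compared by cosine similarity. Placeholder images stand in for destination photos.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

def vit_feature(image: Image.Image) -> torch.Tensor:
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    cls = outputs.last_hidden_state[:, 0]          # [CLS] token as the global feature
    return torch.nn.functional.normalize(cls, dim=-1).squeeze(0)

# Placeholder images; in practice these are destination photos from the database.
database_images = [Image.new("RGB", (224, 224), color) for color in ("red", "green", "blue")]
db_features = torch.stack([vit_feature(img) for img in database_images])

query = Image.new("RGB", (224, 224), "darkred")
scores = db_features @ vit_feature(query)           # cosine similarity (vectors are normalized)
print("best match index:", int(scores.argmax()))
```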
[Objective] A neural network model is used to solve the problem of ceramic ware type classification with few samples, and the model's classification performance is improved through multiscale and attention mechanism optimization. [Methods] A bottleneck structure based on a coordinate attention mechanism and multiscale fusion is proposed and applied to the residual network; it introduces relationships between scales and ultimately improves the residual network's multiscale modeling ability. [Results] On the public dataset of ceramic ware images, the model achieves a classification accuracy of 95.71% with only a small number of training samples, an improvement of 1.01 percentage points over the baseline model ResNet50. In terms of precision, recall, and F1 score, the proposed model outperforms ResNeSt50 by 20.43, 20.53, and 20.52 percentage points, respectively. [Limitations] Although the model's recognition accuracy and other metrics have increased, inference efficiency has decreased, making it unsuitable for scenarios requiring rapid ceramic ware classification. [Conclusions] The multiscale improvement approach is simple and effective for ceramic ware type classification, and this optimization strategy should be prioritized for this type of task and for similar humanities data.
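A minimal sketch of a coordinate attention block of the kind such a bottleneck builds on; channel sizes and the reduction ratio are illustrative, not the paper's configuration:

```python
# A minimal sketch of coordinate attention: features are pooled along height
# and width separately, encoded jointly, then split into two directional
# attention maps that reweight the input feature map.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                       # pool along width  -> (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # pool along height -> (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                        # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))    # (n, c, 1, w)
        return x * a_h * a_w                                         # direction-aware reweighting

x = torch.randn(2, 64, 56, 56)
print(CoordinateAttention(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```

In a bottleneck variant, such a block would sit after the convolutional branches so that multiscale features are reweighted before the residual addition.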