Data Analysis and Knowledge Discovery

Review of Recommendation Systems Based on Knowledge Graph

Zhu Dongliang, Wen Yi, Wan Zichen

Data Analysis and Knowledge Discovery. 2021, 5(12): 1-13. https://doi.org/10.11925/infotech.2096-3467.2021.0516

[Objective] This paper reviews the latest achievements of recommendation systems based on knowledge graphs. [Coverage] We used “knowledge graph”, “KG”, “recommendation system”, “RS”, and “recommended system” as keywords to search the Web of Science, CNKI, Wanfang, and other scholarly databases. A total of 70 documents were reviewed. [Methods] First, we summarized the classification of recommendation algorithms based on knowledge graphs. Then, we traced the development history of recommendation systems using different types of algorithms. Finally, we discussed the typical algorithms and their future development trends. [Results] The reviewed recommendation systems were based on connection, embedding, and hybrid methods. Each of the three types of algorithms has advantages and disadvantages in different scenarios. Future work should aim to maximize the use of graph information while reducing computing power consumption. [Limitations] We did not include commercial examples of recommendation systems. [Conclusions] Knowledge graphs and machine learning could effectively improve traditional recommendation algorithms.

Review of Automatic Citation Classification Based on Machine Learning

Zhou Zhichao

Data Analysis and Knowledge Discovery. 2021, 5(12): 14-24. https://doi.org/10.11925/infotech.2096-3467.2021.0608

[Objective] This paper summarizes the application of natural language processing and machine learning technology in automatic citation classification. [Coverage] We searched the Scopus database for “citation classification”, “citation polarity”, “citation function” and “feature selection”, and retrieved a total of 46 representative papers. [Methods] These studies were reviewed from the perspectives of citation classification process, tasks, and methods. Then, we discussed their future development trends and challenges. [Results] Research on citation classification is shifting from multi-class to binary classification. Deep learning models can classify the sentiments and functions of citations simultaneously. The challenges facing automatic citation classification include single-discipline corpora, the contested definition of citation contexts, and unbalanced classification data. [Limitations] This review does not discuss the many classification systems used in industry. [Conclusions] We need to develop evaluation methods for the re-use of scientific research outputs such as code, data, and corpora, which could help to build open science. Combining citation classification and citation counts could establish a multi-dimensional evaluation model. Based on the user’s search results, the system could recommend documents supporting or opposing the related research for further reading.

Matching Model for Technology Supply and Demand Texts Based on Multi-Layer Semantic Similarity

Li Gang, Yu Hui, Mao Jin

Data Analysis and Knowledge Discovery. 2021, 5(12): 25-36. https://doi.org/10.11925/infotech.2096-3467.2021.0524

[Objective] This paper proposes a new high-accuracy model, aiming to improve the matching of technology supply and demand texts and promote technology transfer. [Methods] First, we separated the titles and texts as two structure levels. Then, we calculated the word similarity and sentence similarity through a variety of methods. Finally, we constructed a Multi-layer Semantic Text Matching (MSTM) model based on multi-layer semantic similarity and the deep learning model. [Results] We found that different levels of information yielded different matching results. The accuracy of MSTM was 96.50%, which was higher than that of single BERT (90.70%), DSSM (87.80%), and ESIM (87.50%). [Limitations] Our new model only considers two levels of text structure. [Conclusions] This new model can help online technology trading services match supply and demand, as well as promote technology transfer.

Aspect-Level Sentiment Analysis Based on BAGCNN

Yu Bengong, Zhang Shuwen

Data Analysis and Knowledge Discovery. 2021, 5(12): 37-47. https://doi.org/10.11925/infotech.2096-3467.2021.0554

[Objective] This paper proposes a BERT-based Attention Gated Convolution Neural Network model (BAGCNN), aiming to improve the traditional aspect-level sentiment analysis algorithm. [Methods] First, the pre-trained BERT model generated feature representations for the texts and aspect words. Then, we introduced the multi-head self-attention mechanism to solve the problem of long-distance dependence of aspect words. Finally, we selectively extracted the multi-level context features paralleling the aspect words with the Gated Convolution Neural Network. [Results] Compared to the benchmark model, the accuracy of our new model improved by 4.24, 4.01, and 3.89 percentage points on the restaurant, laptop, and Twitter datasets, respectively. The size of the downstream parallel structure of the model was also reduced by 1.27 MB. [Limitations] The proposed model did not work well with data sets having significantly different text lengths. [Conclusions] The new BAGCNN model could effectively remove the context information irrelevant to the aspect words.

Automatic Classification of Citation Sentiment and Purposes with AttentionSBGMC Model

Zhou Wenyuan, Wang Mingyang, Jing Yu

Data Analysis and Knowledge Discovery. 2021, 5(12): 48-59. https://doi.org/10.11925/infotech.2096-3467.2021.0679

[Objective] This paper proposes a deep learning model, AttentionSBGMC, to improve the automatic classification of citation sentiment and purposes. [Methods] First, we used the SciBERT pre-training model to obtain the semantic representation vector for the sentences. Then, according to the characteristics of the texts, we used the BiGRU neural network and the multi-scale convolutional neural network (Multi-CNN) to extract their temporal global features and local key features. Third, we utilized the attention model to highlight the key features by redistributing the extracted features’ weights. Finally, we completed the classification tasks with linear layers. [Results] We examined the new method with two citation data sets. On the Abu-Jbara data set, the F1 values in three classification tasks (subjective versus objective citation emotion, positive versus negative citation emotion, and citation purpose) were 86.74%, 91.14%, and 84.92%, respectively. On the Athar data set, the F1 values in two classification tasks (subjective versus objective citation emotion, and positive versus negative citation emotion) were 88.50% and 86.59%, respectively. [Limitations] The proposed model was only examined on English data sets; future work needs to extend it to other languages. [Conclusions] The proposed model could effectively extract the important corpus features, and automatically classify citation sentiment and purposes.

Analyzing Knowledge Payment Behaviors with Information Adoption Model and Product Types

Qi Tuotuo, Bai Ruyu, Wang Tianmei

Data Analysis and Knowledge Discovery. 2021, 5(12): 60-73. https://doi.org/10.11925/infotech.2096-3467.2021.0588

[Objective] This paper explores the information quality of product descriptions and the credibility of knowledge producers, aiming to investigate their impacts on users’ knowledge payment behaviors as moderated by product types. [Methods] First, we retrieved data from Zhihu Live with the help of a Web crawler. Then, we studied the impacts with robust regression and text analysis methods based on the information adoption model. We also divided knowledge payment products into utilitarian and hedonic ones, and then compared their different action paths. [Results] The elaborateness, vividness, and relevance of product descriptions, as well as the reputation, experience, and information completeness of knowledge producers, positively affect knowledge payment behaviors. Compared with utilitarian products, the reputation and experience of knowledge producers in hedonic products have stronger impacts on knowledge payment behaviors. [Limitations] We did not compare knowledge payment behaviors across cultures, and only studied a single knowledge payment business model with cross-sectional data. [Conclusions] This paper summarizes the key factors affecting knowledge payment behaviors and the information adoption model. It provides practical guidelines for designing and marketing knowledge payment products.

Two-layer Transmission Model of WeChat Public Account with Bass Model and SIR Model

Yang Siluo, Xiao Aoxia

Data Analysis and Knowledge Discovery. 2021, 5(12): 74-87. https://doi.org/10.11925/infotech.2096-3467.2021.0402

[Objective] This paper constructs a two-layer transmission model for the content transmission of WeChat public accounts with the help of the Bass model. [Methods] First, we analyzed the transmission process of WeChat public account articles. Then, we developed a two-layer model combining the Bass diffusion model and the SIR model. Third, we conducted a KS test using data from the public account of “Library and Information Conference”. Finally, we analyzed the parameters and the initial conditions of the model with Python. [Results] The new model simulated the transmission process of the public account contents. The probability of readers no longer sharing, as well as non-subscribers’ exposure to information shared by others, has greater impacts on the dissemination of contents. [Limitations] This model did not include the complex network model for further analysis and did not study articles accessed more than 100,000 times. [Conclusions] The proposed model could help us monitor the dissemination of WeChat public account contents and manage online opinion.

Automatic Detection and Recognition of Oracle Rubbings Based on Mask R-CNN

Liu Fang, Li Huabiao, Ma Jin, Yan Sheng, Jin Peiran

Data Analysis and Knowledge Discovery. 2021, 5(12): 88-97. https://doi.org/10.11925/infotech.2096-3467.2021.0643

[Objective] This paper applies a deep learning algorithm to automatically detect and recognize Oracle rubbings, aiming to improve the research and promotion of traditional culture. [Methods] Based on the Mask R-CNN algorithm, we used the triplet loss function and a rotation angle regression technique to optimize and improve the accuracy of Oracle character classification. [Results] We examined our model with training datasets of Oracle rubbing images. The recall of Oracle characters reached 82%, and the detection and identification accuracy reached 95%, which met the expectations of the project. [Limitations] For severely damaged or ambiguous texts, the performance of our new algorithm needs to be improved. [Conclusions] The proposed model has practical value and could be further refined.
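
The abstract does not define its loss function. Assuming it refers to the standard triplet loss used in metric learning, a minimal sketch is given below. The function name, the Euclidean distance, the margin value, and the sample vectors are illustrative assumptions, not details taken from the paper.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on plain Python vectors (illustrative only).

    The loss is zero once the anchor is closer to the positive sample
    than to the negative sample by at least `margin`.
    Some formulations use squared distances instead; this one does not.
    """
    d_pos = math.dist(anchor, positive)  # distance to a same-class sample
    d_neg = math.dist(anchor, negative)  # distance to a different-class sample
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings of three character images.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # well separated
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # positive is farther
```

In the first call the negative is already far from the anchor, so no penalty is incurred; in the second the positive is farther than the negative, so the loss is positive.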

Wu Shengnan, Tian Ruonan, Pu Hongjun, Liang Wenqi, Zhang Yafei, Yu Qi, He Peifeng

Data Analysis and Knowledge Discovery. 2021, 5(12): 98-109. https://doi.org/10.11925/infotech.2096-3467.2021.0583

[Objective] This paper proposes a new knowledge discovery method for social media, aiming to predict the topic-related opportunities and emerging topics in medicine. [Methods] We developed a method combining the Co-LDA topic model and the link prediction algorithm to identify topic association opportunities. We examined the new model with data on diabetes drugs from social media. [Results] The AUC value of the prediction for the common network link without the right topics was higher than that with the right topics, and the Katz index was the optimal one. Future research on diabetes drugs is most likely to be related to the improvement of pharmacodynamic research and treatment plans. The development of the pharmaceutical industry and the new drug indications were related. [Limitations] We did not conduct multi-level analysis with emotional and time dimensions. The new algorithm is also very complex and did not perform well with poor network connectivity. [Conclusions] The proposed method could effectively predict the topic association opportunities in the field of medicine.
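
The abstract names the Katz index as the best-performing link predictor but does not give its form. The sketch below shows the standard Katz index as a truncated power series over the adjacency matrix; it is not the authors' implementation. The function name, the damping factor `beta`, the truncation length `max_len`, and the four-node path graph are all hypothetical.

```python
def katz_scores(adj, beta=0.1, max_len=20):
    """Approximate Katz scores: sum over l = 1..max_len of beta**l * A**l.

    `adj` is a square adjacency matrix given as a list of lists.
    The full series converges only if beta < 1 / (largest eigenvalue of A).
    """
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    # Start from the identity matrix, i.e. A**0.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = 1.0
    for _ in range(max_len):
        # power <- power * A, so it now counts walks one step longer.
        power = [
            [sum(power[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        weight *= beta
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


# Hypothetical topic network: a path 0 - 1 - 2 - 3.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = katz_scores(path)
```

Unlinked pairs with higher scores are treated as more likely future links. Here the pair (0, 2), joined by a walk of length two, scores higher than the pair (0, 3), whose shortest connecting walk has length three.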

Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning

Zhu Lu, Deng Fang, Liu Kun, He Tingting, Liu Yuanyuan

Data Analysis and Knowledge Discovery. 2021, 5(12): 110-122. https://doi.org/10.11925/infotech.2096-3467.2021.0604

[Objective] This paper uses a semantic auto-encoder to examine the correlation between low-level features and high-level semantics, aiming to reduce the heterogeneous gap between different modal data. It also combines the semantic auto-encoder with hash learning to improve the accuracy and speed of cross-modal retrieval. [Methods] First, we used the label information to learn the semantic joint representation of features and to construct an affine matrix. Then, we combined the auto-encoder with linear regression to learn the hash function. Finally, we obtained the optimal hash code with the help of similarity metrics. [Results] We examined our method on three open datasets (WIKI, MIRFLICKR, and NUS-WIDE) with four different code lengths. The average MAP values obtained by our method were 0.1135, 0.0278, and 0.0505 higher, respectively, than the best results of LSSH, FSH, ACQ, DBRC, SPDH, SePH, and SMH. [Limitations] Our method is mainly applicable to the linear projection of multi-modal data and fails to achieve good results for nonlinear problems. [Conclusions] The proposed method effectively improves the accuracy and speed of cross-modal retrieval tasks.

Extracting Drama Terms with GCN Long-distance Constrain

Ren Qiutong, Wang Hao, Xiong Xin, Fan Tao

Data Analysis and Knowledge Discovery. 2021, 5(12): 123-136. https://doi.org/10.11925/infotech.2096-3467.2021.0359

[Objective] This study proposes a new term extraction model for intangible heritage (traditional drama), which also helps us construct a term database. [Methods] First, we analyzed the characteristics of drama language from the perspectives of term category, semantic structure, and text length. Then, we added part-of-speech and domain features to the character representation obtained by the BERT-BiLSTM-CRF model. Finally, we incorporated the graph convolutional network (GCN) into the new model to capture the constraint relationships between distant words. [Results] The F1 value of the proposed model reached 91.11%, which was 1.3 percentage points higher than that of the baseline BERT-BiLSTM-CRF model. [Limitations] We only retrieved the experimental data from Baidu Baike and the official website of Intangible Cultural Heritage. Future work should include more free texts from other sources, more categories of drama terms, and external features. [Conclusions] The proposed model and the database for traditional drama terms will help us construct the knowledge graph for traditional drama.

Identifying Breakthrough Patent Topics by Measuring Technological Convergence: Case Study of Solar PV Domain

Han Fang, Zhang Shengtai, Feng Lingzi, Yuan Junpeng

Data Analysis and Knowledge Discovery. 2021, 5(12): 137-147. https://doi.org/10.11925/infotech.2096-3467.2021.0240

[Objective] This paper aims to identify the breakthrough topics from the core patents. [Methods] First, we retrieved the core patents from the Innography platform. Then, we identified the breakthrough innovative topics based on the core patents’ Rao-Stirling diversity indices as well as the LDA text mining method. Finally, we conducted an empirical study to examine the proposed method with patents from the solar PV domain. [Results] We found that the core patents were mainly related to disciplines such as optics, electricity, and architecture. We also identified 12 breakthrough innovative topics related to photoelectric conversion materials, photovoltaic applications, and thermoelectric power systems. [Limitations] More research is needed to explore the measurement of technological convergence using different patent classification methods. [Conclusions] The proposed method can effectively discover the breakthrough topics from a certain domain of patents.
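
The Rao-Stirling diversity index mentioned above is commonly defined as the sum, over pairs of distinct categories i and j, of p_i * p_j * d_ij, where p_i is the share of category i and d_ij is the distance between the two categories. The sketch below follows that common definition. The function name, the class shares, and the distance matrix are hypothetical, and the abstract does not state which distance measure or classification scheme the authors used.

```python
def rao_stirling(proportions, distance):
    """Rao-Stirling diversity summed over ordered pairs i != j.

    `proportions[i]` is the share of a patent's classifications in class i.
    `distance[i][j]` is the dissimilarity between classes i and j (0 to 1).
    Some authors sum over unordered pairs (i < j), which halves the value.
    """
    n = len(proportions)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += proportions[i] * proportions[j] * distance[i][j]
    return total


# Hypothetical patent spread over three technology classes.
shares = [0.5, 0.3, 0.2]
dist = [
    [0.0, 0.8, 0.4],
    [0.8, 0.0, 0.6],
    [0.4, 0.6, 0.0],
]
diversity = rao_stirling(shares, dist)

# A patent confined to a single class has no diversity.
single_class = rao_stirling([1.0, 0.0, 0.0], dist)
```

A higher value indicates that a patent draws on classes that are both evenly balanced and far apart, which is the sense in which the index is used as a proxy for technological convergence.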

Evaluating SMEs-Supporting Policies During COVID-19 Pandemic with K-Means Clustering

Zhao Zheng, Huang Qianqian, Tong Nannan

Data Analysis and Knowledge Discovery. 2021, 5(12): 148-157. https://doi.org/10.11925/infotech.2096-3467.2020.0320

[Objective] This paper tries to better understand the overall situation of the SMEs-supporting policies during the COVID-19 pandemic, aiming to promote the effective realization of policy objectives. [Methods] First, we collected the policy texts, relationship between corporate registration and investment, as well as the COVID-19 diagnosis data. Then, we calculated the number of policies issued by each province, the scores of the three major policy evaluation metrics, the degree of disaster, the industry structure and their economic ties with Hubei Province. Finally, we used the K-means clustering method to determine the degree of deviation from the enterprise policies in each province. [Results] The degree of deviation of the policies in Beijing, Shanghai, Fujian and other provinces is “Level Ⅰ”, while the degree of deviation in Hunan, Henan, and Yunnan is “Level Ⅲ”. Therefore, more SMEs-supporting policies need to be added in the “Level Ⅲ” provinces. [Conclusions] The proposed method could effectively evaluate the enterprise supporting policies in each Chinese province.

25 December 2021, Volume 5 Issue 12
