[Objective] This paper reviews the latest developments in structural diversity studies on social networks and discusses future research directions. [Coverage] We searched the Web of Science, Microsoft Academic, DBLP, CNKI, Wanfang Data and VIP with the terms “Structural Diversity” and “Structural Diversity and Social Networks”. A total of 55 representative related papers published from April 2012 to April 2022 were retrieved. [Methods] First, we traced the origins of structural diversity studies and analyzed their open issues. Then, we examined structural diversity research from three perspectives: model improvements, efficient algorithms, and practical applications. Finally, we discussed future work. [Results] Structural diversity is a model based on network topology features, which studies the factors affecting individuals’ major decision-making. The original model suffers from poor generality and low precision. Combined with graph mining techniques, the improved models perform well and have been applied in many fields. [Limitations] We only summarized research on structural diversity and did not compare it with other social contagion theories. [Conclusions] Graph mining algorithms can improve the structural diversity model’s group division. Structural diversity is an indicator for finding highly influential nodes and requires efficient search algorithms. Structural diversity has been applied to behavior and link prediction. Features that optimize this model merit more research to evaluate their performance.
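In the classic formulation, a user’s structural diversity is the number of connected components in the subgraph induced by that user’s neighbors. A minimal sketch, assuming the network is given as a plain adjacency dictionary (the toy graph is illustrative):

```python
def structural_diversity(adj, node):
    """Number of connected components in the subgraph induced by
    `node`'s neighbors (the node itself is removed)."""
    neighbors = set(adj.get(node, ()))
    seen, components = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        stack = [start]  # depth-first search restricted to the neighborhood
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            stack.extend(v for v in adj[u] if v in neighbors and v not in seen)
    return components

# Toy graph: node 0's neighbors are {1, 2, 3, 4}; 1-2 are connected,
# 3 and 4 are isolated, so node 0's structural diversity is 3.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
```

Nodes whose neighborhoods span many disconnected social contexts score higher, which is why the measure is used to find influential nodes.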
[Objective] This paper reviews the development of research on “obliteration by incorporation”, which re-examines the essence of “citation”. [Coverage] We searched for “obliteration by incorporation”, “OBI”, “knowledge diffusion” and “knowledge obsolescence” in Web of Science, Google Scholar and CNKI, and retrieved a total of 72 representative papers. [Methods] We examined the phenomenon of “obliteration by incorporation” from the perspectives of knowledge production, knowledge diffusion and knowledge obsolescence. [Results] Citations cannot reflect the knowledge contribution and academic value of work obliterated by incorporation, and therefore do not adequately reflect knowledge diffusion. [Limitations] More research is needed to explore measurements of the OBI phenomenon. [Conclusions] The phenomenon of “obliteration by incorporation” was observed half a century ago but has not received sufficient attention. The scientometrics community should re-examine the essence of citations in light of the phenomenon of “obliteration by incorporation”.
[Objective] This paper examines the factors and mechanisms affecting the patent examination cycle in China. [Methods] We retrieved 78,254 invention patent applications in the field of artificial intelligence in China. Then, we used the Kaplan-Meier method from survival analysis and the Cox proportional hazards regression model to obtain an overview of patent examination. Third, we analyzed the characteristics of patent objects and subjects to identify the factors significantly affecting patent examination cycles. [Results] In the field of artificial intelligence, the average survival period of the Chinese invention patent examination process was 32.81 months. The number of claims, the number of IPC classification IDs, and the number of inventors were protective factors that prolonged the patent examination cycle. The more citations a patent application had, the less time it required to obtain authorization. Universities and scientific research institutions, as well as government institutions and organizations, spent less time on patent examination than individuals, while applications from companies required longer examination cycles. [Limitations] The patent examination cycle is closely related to the examination process of the patent office and the examiners’ characteristics, which need more fine-grained studies. [Conclusions] Combining different technical fields with the characteristics of applicants would establish a diversified examination mode. Strengthening the use of automated technology and establishing better classification standards would improve patent examination efficiency.
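The Kaplan-Meier step can be illustrated with a toy estimator; the durations below are hypothetical, and a real study would rely on standard survival analysis software rather than this sketch:

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve S(t) from (duration, event) pairs.
    event=1 means the examination concluded at that time; event=0 means
    the application was censored (still pending when observed)."""
    at_risk = len(durations)
    surv, s = {}, 1.0
    data = sorted(zip(durations, events))  # process event times in order
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            removed += 1
            deaths += data[i][1]
            i += 1
        if deaths:
            s *= (at_risk - deaths) / at_risk  # KM product-limit update
            surv[t] = s
        at_risk -= removed
    return surv
```

Censored applications reduce the risk set without dropping the curve, which is what distinguishes survival analysis from a simple average of cycle lengths.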
[Objective] This paper proposes a new method to identify technology opportunities from documents with the help of causal knowledge. [Methods] The proposed method includes three steps: automatic extraction of causal pairs, construction of a causal network, and discovery of matching technology opportunities. First, we used rule matching to automatically extract causal pairs from multi-source data based on causal trigger words and rule templates, and represented these pairs as triples. Then, we constructed a causal network of technical elements and identified the demand factors arising during product use. Finally, we completed the potential causal correlations through link prediction on the causal network, matched them with user demand factors, and thereby discovered technology opportunities. [Results] We examined the proposed method with charging station data for electric vehicles (EVs). We found that battery performance and charging costs are the key factors for improving technical performance and user experience. The GraphSAGE algorithm predicted edge connections more accurately than Node2Vec, which effectively identifies potential technology opportunities. [Limitations] The accuracy of the proposed method needs to be improved. [Conclusions] The proposed method could effectively discover sci-tech innovation opportunities, as well as potential uncertain issues, which provides a reference for further technology optimization and industry upgrading.
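The rule-matching step can be sketched as follows; the English trigger patterns and example sentences are illustrative stand-ins for the paper’s Chinese causal trigger words and rule templates:

```python
import re

# Hypothetical trigger-word patterns; a real system would use the
# paper's curated Chinese trigger lexicon and richer rule templates.
TRIGGERS = re.compile(
    r"(?P<cause>[\w\s]+?)\s+(?:leads to|causes|results in)\s+(?P<effect>[\w\s]+)",
    re.IGNORECASE,
)

def extract_causal_pairs(sentences):
    """Extract (cause, 'causes', effect) triples via rule matching."""
    triples = []
    for s in sentences:
        m = TRIGGERS.search(s)
        if m:
            triples.append(
                (m.group("cause").strip(), "causes", m.group("effect").strip())
            )
    return triples

docs = [
    "Low battery capacity leads to short driving range",
    "High charging cost results in low user satisfaction",
]
```

The resulting triples become the edges of the causal network on which link prediction is later run.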
[Objective] This paper tries to identify the features and relationships of dynamic semantic information from Chinese character entities. [Methods] First, we used an attention mechanism and an improved convolutional neural network model to automatically extract features from the training data of a public corpus annotated with character entity relationships. Then, we compared our model’s performance with the existing ones in terms of entity relationship recognition efficiency, as well as entity relationship extraction effectiveness and efficiency. [Results] The CNN+Attention model outperformed the SVM, LR, LSTM, BiLSTM and CNN models in prediction accuracy. Our new model was 0.92% higher in accuracy, 0.80% higher in recall and 0.86% higher in F1 value than the BiLSTM model, which had relatively better extraction results among the baselines. [Limitations] We need to examine our model with more sample data sets. [Conclusions] The proposed model could effectively improve the accuracy and recall of entity relationship extraction for Chinese characters.
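The attention step can be illustrated by attention-weighted pooling over token feature vectors; the vectors, query, and dimensions below are hypothetical, and this sketch shows only the pooling mechanism, not the full CNN+Attention model:

```python
import math

def attention_pool(token_vecs, query):
    """Weight each token feature vector by softmax(dot(query, vec)) and
    return the attention-weighted sum -- the pooling step that lets the
    model focus on relation-bearing characters."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in token_vecs]
    m = max(scores)                              # stabilize the softmax
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(token_vecs[0])
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, token_vecs))
        for i in range(dim)
    ]
    return pooled, weights
```

Tokens whose features align with the query receive larger weights, so the pooled vector emphasizes the characters most relevant to the relation being classified.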
[Objective] This paper tries to address the issues facing traditional topic crawlers, such as low indexing rates and insufficient topic relevance. [Methods] We proposed a Two-step Dynamic Shark-Search (TDSS) algorithm based on Shark-Search, which divides the topic relevance calculation into the relevance of hyperlinks and the relevance of webpage topics. Then, we added new keywords extracted from topic-related pages to the established topic thesaurus, which improved the effectiveness of topic judgment. [Results] The TDSS crawler’s accuracy and indexing efficiency were 14.2% and 35% higher than those of comparable algorithms in the same experimental environment. [Limitations] More research is needed to maintain the crawler’s accuracy when there are excessive topic words. [Conclusions] The proposed algorithm could effectively improve the accuracy of topic information and retrieve more topic-related webpages.
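The two-step relevance calculation can be sketched as a linear combination of hyperlink (anchor-text) relevance and page relevance; the term-frequency cosine measure and the weight `alpha` are illustrative choices, not necessarily those of TDSS:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity of two texts under a bag-of-words model."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def tdss_priority(topic, anchor_text, page_text, alpha=0.5):
    """Two-step crawl priority: relevance of the hyperlink (anchor text)
    plus relevance of the linking page body, combined linearly."""
    return alpha * cosine(topic, anchor_text) + (1 - alpha) * cosine(topic, page_text)
```

Links with both relevant anchor text and a relevant source page are crawled first, which is how the frontier stays on topic.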
[Objective] This paper tries to improve trust evaluation with feature grouping and combination: the former provides replaceable features while the latter effectively reduces the feature dimensions. [Methods] First, we used the Markov Blanket to analyze the relationships among features and group similar, mutually substitutable ones. Then, we searched within and among groups to combine features with the RVNS method. [Results] In the case of missing features, the proposed model could effectively provide substitutes for the missing features and yielded stable trust evaluation results. The feature dimension was reduced to 1.7% of the original, and the average accuracy of trust evaluation was above 92%. [Limitations] More research is needed to extract knowledge more effectively from datasets with missing values. [Conclusions] The proposed model could effectively address missing data in trust evaluation.
[Objective] This paper proposes a model for entity coreference resolution which integrates neural networks and global reasoning. It tries to address the complexity of entity information in texts, as well as the ambiguity and sparse distribution of referential information. [Methods] First, we used the neural network model to extract entities and their antecedents from the documents. Then, we combined the context information of the sentences to perform global reasoning. Finally, we fed the reasoning results back into the neural network model to improve the accuracy of entity coreference resolution. [Results] We examined our new model on the OntoNotes 5.0 dataset. The new model’s F1 score reached 74.76% under the CoNLL evaluation standard. [Limitations] More precise knowledge reasoning needs to be added. [Conclusions] Compared with the existing models, the proposed model improves coreference resolution performance and better understands text semantic information.
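The final step of any such pipeline, turning per-mention antecedent decisions into coreference chains, can be sketched with a union-find pass; the mentions below are illustrative and the scoring model itself is omitted:

```python
def resolve_clusters(antecedents):
    """Turn per-mention antecedent decisions (mention -> antecedent, or
    None for a discourse-new mention) into coreference clusters."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for mention, ante in antecedents.items():
        if ante is not None:
            ra, rb = find(mention), find(ante)
            if ra != rb:
                parent[ra] = rb            # merge the two chains
    clusters = {}
    for m in antecedents:
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())
```

Global reasoning operates at exactly this cluster level: a bad pairwise link surfaces as an inconsistent chain, and the corrected decision is fed back to the scorer.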
[Objective] This paper proposes a fake comment detection method (IMTS) integrating image information and text semantics for Chinese e-commerce websites, aiming to address the proliferation of fake comments posted by the “Internet Water Army”. [Methods] First, we used a text convolutional neural network (TextCNN) and the BERT pre-training model to extract features from the review texts and obtain the corresponding feature vectors. Then, we integrated the reviewer features by concatenating the review text semantics with the output features of the reviewer ID, which enhanced the model’s capture of the overall semantic information. Third, we used a Residual Network (ResNet) to extract features from the pictures posted by users in comments and obtain the corresponding visual features. Finally, we conducted multimodal fusion of the text and visual features to detect fake comments. [Results] The IMTS method achieved 96.36% accuracy, 96.35% recall and a 96.35% F1 value on a self-built multimodal Chinese fake comment dataset. [Limitations] The dataset was small in scale, and only the BERT pre-training model was examined in the text processing stage. [Conclusions] The proposed method could effectively improve the overall detection accuracy of fake comments.
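The fusion step can be sketched as late fusion by feature concatenation followed by a linear scorer; the feature dimensions, weights, and decision rule below are illustrative, standing in for the learned classifier head:

```python
def fuse_and_score(text_vec, reviewer_vec, image_vec, weights, bias=0.0):
    """Late fusion: concatenate text, reviewer-ID, and image feature
    vectors, then apply a linear scorer; a score above 0 flags the
    comment as fake. All dimensions and weights are toy values."""
    fused = list(text_vec) + list(reviewer_vec) + list(image_vec)
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return fused, score > 0
```

In the real model the concatenated vector would feed a trained classification layer; the point of the sketch is that text, reviewer, and visual evidence are judged jointly rather than separately.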
[Objective] This paper applies social evolution analysis and System Dynamics to early warning strategies for managing undergraduates’ public opinion. [Methods] We conducted a system analysis of public opinion based on user behavior theory. We analyzed the mechanism among undergraduates, official institutions, the Internet environment, public opinion elements and social media with System Dynamics (SD). Finally, we built a new SD model for the early warning system of public opinion. [Results] We evaluated our model with three simulation experiments. The influence range of the control elements was verified, while the control effect of credibility was falsified. Compared with other fuzzy cognitive models, our algorithm’s ACR increased by 1.4% and its CPT was reduced by 50%. [Limitations] Extracting the related factors depends on the research object and environmental evolution, and our model needs to be continuously optimized in the future. [Conclusions] The proposed model creates an early warning mechanism for public opinion in undergraduate communities.
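A System Dynamics model of this kind reduces to stock-and-flow difference equations; the one-stock sketch below, with hypothetical spread and control rates, only illustrates the simulation mechanism, not the paper’s multi-factor model:

```python
def simulate_opinion(steps, heat=1.0, spread=0.30, control=0.20):
    """One-stock System Dynamics sketch: public-opinion 'heat' grows by
    a spread (diffusion) rate and decays by an intervention (control)
    rate each step. All rates are hypothetical toy values."""
    trace = [heat]
    for _ in range(steps):
        inflow = spread * heat    # diffusion among students
        outflow = control * heat  # official intervention
        heat += inflow - outflow
        trace.append(heat)
    return trace
```

Early warning then amounts to watching the simulated trajectory: when spread outpaces control, heat grows geometrically and crosses a threshold; when control dominates, it decays.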
[Objective] This paper tries to improve the accuracy of text representation and mining with the help of structural and functional information from Chinese medical records. [Methods] First, we proposed a new semantic representation strategy for the texts of Chinese medical records based on their structure-function features. Then, we used the BiLSTM-CRF model to recognize named entities, introducing structure information at the word vector level. Finally, we utilized the TextCNN model to extract local context features, which helped us obtain a vector representation with richer text semantics. [Results] The precision, recall and F1 values of the new model reached 93.20%, 95.19% and 94.19% respectively, while the classification accuracy reached 92.12%. [Limitations] Future research is needed to evaluate our model with more texts and refine the structure recognition process. [Conclusions] The proposed method could effectively improve the accuracy of named entity recognition and enrich the semantic connotation and representation of the texts.
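The CRF layer’s contribution to BiLSTM-CRF can be illustrated by its Viterbi decoding step, which scores whole label sequences instead of independent per-token tags; the tag set, emission scores, and transition scores below are hypothetical:

```python
def viterbi(emissions, transitions, tags):
    """Viterbi decoding over per-token tag scores (`emissions`) and a
    tag-to-tag `transitions` matrix -- the CRF inference step that keeps
    predicted BIO label sequences globally consistent."""
    n_tags = len(tags)
    score = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score, back = new_score, back + [ptr]
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(back):   # follow backpointers to recover the path
        best = ptr[best]
        path.append(best)
    return [tags[i] for i in reversed(path)]

# Toy scores: the transition matrix forbids O -> I, so an I tag is only
# reachable after B, enforcing well-formed entity spans.
transitions = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]  # rows/cols: O, B, I
emissions = [[0, 3, 0], [0, 0, 4]]                 # token 1 looks like B, token 2 like I
```

Because the transition scores penalize illegal tag pairs, the decoder prefers the consistent sequence B, I even when per-token scores alone would allow malformed ones.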
[Objective] This paper tries to build an accurate and effective forecasting model for major infectious diseases based on multiple machine learning models, aiming to predict outbreak trends and help formulate countermeasures in advance. [Methods] We established an ensemble prediction model that combines ANFIS, LSSVM and LSTM with optimal weights obtained from the Gray Wolf Optimization algorithm. Then, we assessed the model’s prediction performance with COVID-19 epidemic data. [Results] ANFIS, LSSVM and LSTM were respectively suitable for predicting confirmed, death and recovery cases. The average R2 of the proposed model reached 0.989, 0.993 and 0.987 for the three scenarios, and the average RMSEs were 37.37%, 63.93% and 53.37% lower than those of the single models, respectively. [Limitations] The model needs to be examined with data sets on other major infectious diseases. [Conclusions] The ensemble prediction model based on Gray Wolf Optimization can effectively merge the advantages of multiple machine learning models to obtain stable and accurate results.
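The optimal-weight combination can be sketched as follows; an exhaustive grid search over the weight simplex stands in for the Gray Wolf Optimization used in the paper, and the model predictions are toy data:

```python
from itertools import product

def ensemble(preds, weights):
    """Weighted combination of several models' prediction sequences."""
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]

def best_weights(preds, truth, step=0.1):
    """Search weight vectors summing to 1 for the combination that
    minimises squared error -- a simple stand-in for the Gray Wolf
    Optimization used to fit the ensemble weights."""
    k = len(preds)
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    best, best_err = None, float("inf")
    for w in product(grid, repeat=k):
        if abs(sum(w) - 1.0) > 1e-9:   # stay on the weight simplex
            continue
        err = sum((e - t) ** 2 for e, t in zip(ensemble(preds, w), truth))
        if err < best_err:
            best, best_err = w, err
    return best
```

A metaheuristic such as Gray Wolf Optimization explores the same weight space far more efficiently when the number of base models or the weight resolution grows.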