[Objective] This paper explores methods for automatically identifying actionable information in online reviews, aiming to help practitioners improve their follow-up work. [Methods] We framed the task as sentence-level classification and proposed a span-based model (SAII). First, we encoded the input sentences with BERT to generate token-level representations. Then, we enumerated all possible spans in the given sentences and generated informative span representations with an attention mechanism. Third, we proposed a multi-channel filtering strategy to preserve the spans closest to the key-element prototypes. Finally, we merged the refined span-level and context representations to predict actionable information. [Results] We examined the SAII model on two real-world datasets and found it yielded satisfactory results. Compared with the three best existing models, SAII’s F1 value increased by 7.91%/5.42%, 2.10%/2.73%, and 1.94%/1.46%. [Limitations] More research is needed to evaluate the effectiveness of the new model on multimodal datasets from other domains. [Conclusions] The SAII model could effectively identify actionable information in user-generated content.
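The span-enumeration step described above can be sketched in a few lines of pure Python; the function name, the maximum span length, and the toy sentence are illustrative assumptions, not details from the paper:

```python
def enumerate_spans(tokens, max_len=4):
    """Enumerate all candidate spans up to max_len tokens,
    returned as (start, end) index pairs with end exclusive."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans

# toy review sentence; each span would then be scored against
# the key-element prototypes by the filtering strategy
sentence = "the battery drains too fast".split()
spans = enumerate_spans(sentence, max_len=3)
```

In a real span-based model, each `(start, end)` pair would be mapped to a vector (e.g. by pooling the BERT token representations inside it) before filtering.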
[Objective] This paper aims to identify the defects, deficiencies, and difficulties of existing research on a given topic. [Methods] First, we transformed topic-problem instance pair extraction into a candidate phrase classification task. Then, we extracted candidate phrases from the problem sentences and constructed a syntactic dependency tree. Third, we built a syntactic-dependency-enhanced classification model based on a BiGCN and a Transformer interaction module. Fourth, we used this model to identify problem instances from the candidate phrases corresponding to a given topic. [Results] The proposed model effectively identified the problem instances and topic-problem instances. Its F1 value reached 83.7%, which is 2.8 percentage points higher than the baseline model. [Limitations] We did not examine referential relationships between sentences, which may omit some problem instances and reduce recall. [Conclusions] The proposed model could effectively identify the topic and problem instances.
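The core idea of propagating information over a syntactic dependency tree can be illustrated with a single mean-aggregation graph-convolution step in pure Python; this is a minimal sketch of the general GCN mechanism, not the paper's BiGCN, and the toy features and edges are invented:

```python
def gcn_layer(features, edges):
    """One graph-convolution step over a syntactic dependency tree:
    each token's new vector is the mean of its own vector and its
    neighbors' vectors (self-loop included, edges undirected)."""
    n = len(features)
    neighbors = [{i} for i in range(n)]      # self-loops
    for head, dep in edges:                  # undirected message passing
        neighbors[head].add(dep)
        neighbors[dep].add(head)
    dim = len(features[0])
    out = []
    for i in range(n):
        agg = [0.0] * dim
        for j in neighbors[i]:
            for d in range(dim):
                agg[d] += features[j][d]
        out.append([v / len(neighbors[i]) for v in agg])
    return out

# toy example: 3 tokens with 2-dim features,
# dependency edges given as (head, dependent) index pairs
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
updated = gcn_layer(feats, edges)
```

A real BiGCN would additionally use learned weight matrices and treat head-to-dependent and dependent-to-head directions separately.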
[Objective] Based on a cross-modal deep learning method, this paper analyzes consumers’ sentiments in travel reviews and identifies their sarcastic expressions. [Methods] First, we encoded the multi-modal information. Then, we extracted the interaction information between texts and pictures with a graph neural network. Finally, we used an attention mechanism to identify multi-modal features and sarcasm. [Results] We examined the proposed model with travel reviews from Yelp. The accuracy of sarcasm detection reached 88.77%, which is better than the baseline models. [Limitations] We only examined the proposed model with reviews on Hilton hotels, which needs to be expanded in the future. [Conclusions] The proposed model could extract interaction information between different modalities of data, which effectively improves the accuracy of sarcasm detection.
[Objective] This paper proposes an adversarial neural network model combining text and image data, aiming to improve the effectiveness of rumor detection. [Methods] First, we integrated the self-attention mechanism with the Bi-directional Long Short-Term Memory (BiLSTM) model to represent the text features. Then, we used the pre-trained VGG19 network to represent the image features. Finally, we used the adversarial neural network to learn the events’ common features. [Results] The proposed model is superior to the existing baselines in accuracy, precision, recall, and F1 score. Its accuracy on the Weibo and Twitter datasets is 3.6% and 3.5% higher, respectively, than the best baseline result. [Limitations] More research is needed to examine the feature associations between the modalities and to bridge the semantic gap of cross-modal data. [Conclusions] The proposed model could more effectively learn feature representations and detect rumors.
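The attention-over-BiLSTM-states step can be sketched as dot-product attention pooling in pure Python; the query vector and toy hidden states are illustrative, and a real model would learn them:

```python
import math

def attention_pool(states, query):
    """Dot-product attention over a sequence of hidden states
    (e.g. BiLSTM outputs): score each state against a query,
    softmax the scores, and return the weighted sum as the
    text representation."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    m = max(scores)                       # stabilized softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    return [sum(w * state[d] for w, state in zip(weights, states))
            for d in range(dim)]

# toy 2-step sequence of 2-dim hidden states
states = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
rep = attention_pool(states, query)
```

The resulting `rep` leans toward the state most aligned with the query, which is the behavior the self-attention layer exploits to emphasize rumor-indicative tokens.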
[Objective] This paper measures the influence of the Journal Impact Factor (JIF) and journal tier on scholars’ manuscript submission behaviors in different regions. [Methods] First, we collected a journal’s monthly acceptance records and its JIF from the Nature Publishing Group (NPG). Then, we examined submission behaviors with panel negative binomial regression against two journal ranking systems (JCR Quartiles and the CAS journal ranking). [Results] JIF has a negative effect on the number of papers submitted by scholars from the United States, Germany, and Japan, i.e., the number of these papers decreases as the JIF increases. In contrast, the journal’s JCR ranking had a positive effect on the number of accepted papers from all countries. The CAS journal ranking has a positive effect on the number of accepted submissions from China. After excluding the influence of international collaboration, these positive effects remain. The impact of the CAS journal ranking (27.2%) is much larger than that of the JCR journal partition table (3.7%). [Limitations] We could only access the author data of accepted/published papers from WoS-indexed NPG journals rather than complete submission records. [Conclusions] Scholars’ submission behaviors are influenced by their country of residence, independent research, and different journal ranking systems. The findings of this paper will enrich the study of factors influencing scholars’ submission behaviors across countries.
[Objective] This paper analyzes the citation intentions and sentiments of academic papers on the South China Sea issue, aiming to reveal their citation behaviors and emerging research ideas. [Methods] First, we used a feature-based SVM algorithm to automatically classify citations. Then, we built a citation classification knowledge map. Finally, we analyzed the papers’ citation characteristics in terms of their overall distribution, national differences, and topics. [Results] The macro-averaged F1 of the automatic classification model reached 0.75 and 0.72 on citation intentions and sentiments, respectively. Chinese scholars tend to defend China’s sovereignty over the South China Sea from a historical perspective. Their leading citation intentions are “technical basis” or “backgrounds”, while their citation sentiment is generally “neutral”. [Limitations] The size and comprehensiveness of the corpus need to be expanded. [Conclusions] Chinese scholars should strengthen their arguments on the South China Sea issue from a legal perspective.
[Objective] This paper proposes a Question-Answering (QA) model based on machine reading comprehension (MRC) to extract entities from Intangible Cultural Heritage (ICH) texts. [Methods] First, we constructed an ICH-entity-sensitive attention mechanism, which captures the interaction between contexts and questions and helps our model focus on the questions and related ICH entities. Then, we built the ICHQA model for entity extraction. [Results] We examined the ICHQA model with the ICH corpus. ICHQA’s F1 value reached 87.139%, which was better than the existing models. We also performed ablation studies and visualized ICHQA’s outputs. [Limitations] More research is needed to examine the proposed model with other corpora from the digital humanities. [Conclusions] The proposed model could effectively extract ICH entities.
[Objective] This paper proposes the IMtoc model to maximize influence with the help of overlapping community structure, aiming to quickly identify the most influential nodes in social networks. [Methods] First, we divided the whole social network into several overlapping communities. Then, we selected candidates from the nodes with the largest eigenvector centrality and the overlapping nodes. Finally, we identified the optimal nodes from the candidates with a greedy algorithm. [Results] We examined the proposed IMtoc algorithm on the large social network Git_web_ml dataset. Its running speed was about 91% and 65% faster than the CELF and IMRank algorithms, respectively. [Limitations] There is a large overlap between the influential nodes and the overlapping nodes. [Conclusions] The IMtoc algorithm could more effectively identify influential nodes in social networks.
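The greedy seed-selection step can be sketched in pure Python; here the marginal gain is simplified to one-hop coverage, a stand-in for the expected-spread estimates used by CELF-style algorithms, and the toy graph is invented:

```python
def greedy_seeds(graph, k):
    """Greedy seed selection for influence maximization:
    repeatedly pick the node with the largest marginal gain,
    here approximated as the number of newly covered nodes
    (the node itself plus its direct neighbors)."""
    covered, seeds = set(), []
    candidates = list(graph)
    for _ in range(k):
        best, best_gain = None, -1
        for node in candidates:
            gain = len(({node} | set(graph[node])) - covered)
            if gain > best_gain:
                best, best_gain = node, gain
        seeds.append(best)
        covered |= {best} | set(graph[best])
        candidates.remove(best)
    return seeds

# toy adjacency list: "a" is a hub, "e" bridges to "f"
graph = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a"],
    "d": ["a", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
seeds = greedy_seeds(graph, k=2)
```

IMtoc's speedup comes from restricting this greedy search to a small candidate set (high-centrality and overlapping nodes) instead of scanning every node in the network.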
[Objective] This paper proposes a knowledge graph model based on negative sampling and joint relational contexts, aiming to improve the quality of current translation-based knowledge graph embedding models. [Methods] First, we extracted the neighbors of the target entities from the original knowledge graph to generate context vectors. Then, we determined the properties of adjacent relations, which also provide information on the nature or type of a given entity. Third, we used the Concat function to aggregate the contexts of the entities involved in negative sampling and determined which entity attributes to replace. Finally, we adopted the triple embedding of the TransE model to generate negative triples and improved the similarity between positive and negative triples. [Results] We examined the proposed model on the FB15K-237 and WN18RR datasets. Entity link prediction was 18.3% and 29.2% higher than the benchmark model, while relation link prediction was 0.7% better than the benchmark’s best result. [Limitations] Our model only includes the semantics of the relational contexts, which makes it hard to determine their relative positions. [Conclusions] The proposed sampling strategy effectively improves the quality of negative triples, as well as the accuracy of knowledge graph embedding.
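The basic negative-sampling operation that TransE-style models build on, corrupting the head or tail of a true triple while rejecting corruptions that happen to be true, can be sketched as follows; the toy entities and triples are illustrative, and the paper's context-aware replacement strategy is more selective than this uniform corruption:

```python
import random

def corrupt_triple(triple, entities, known, rng):
    """Generate a negative triple by replacing the head or the
    tail with a random entity, rejecting candidates that are
    actually present in the knowledge graph (false negatives)."""
    head, rel, tail = triple
    while True:
        new_ent = rng.choice(entities)
        if rng.random() < 0.5:
            neg = (new_ent, rel, tail)   # corrupt the head
        else:
            neg = (head, rel, new_ent)   # corrupt the tail
        if neg not in known and neg != triple:
            return neg

entities = ["paris", "france", "berlin", "germany"]
known = {("paris", "capital_of", "france"),
         ("berlin", "capital_of", "germany")}
rng = random.Random(0)   # seeded for reproducibility
neg = corrupt_triple(("paris", "capital_of", "france"), entities, known, rng)
```

The paper's contribution is essentially to replace the uniform `rng.choice` with a choice informed by the aggregated relational contexts, so the generated negatives are harder and more informative for training.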
[Objective] This paper modifies the Dempster-Shafer evidence theory, aiming to identify untrusted Sina Weibo (Microblog) users under subjective uncertainties. [Methods] First, we used the evidence distance to improve the original Dempster-Shafer evidence theory. Then, we transformed the credibility of historical posts into evidence, which was merged to generate each user’s trust interval. Finally, we identified untrusted users with the Decision Tree algorithm and the trust interval. [Results] Compared with existing methods, the new model reduced processing time by 287.4 seconds, increased the F1 value by 31.9 percentage points, and received the optimal Chi-square value in the consistency test. [Limitations] We only investigated the subjective uncertainties due to time decay and evidence conflict; the impact of cognitive differences on subjective uncertainty still needs to be added. [Conclusions] The proposed method could effectively identify untrusted users on Sina Weibo.
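The evidence-merging step rests on Dempster's rule of combination, which can be sketched in pure Python; the two-hypothesis frame (trusted/untrusted) and the mass values are a toy example, and the paper's distance-based refinement is not shown:

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions over
    the same frame of discernment. Hypotheses are frozensets; mass
    assigned to conflicting (disjoint) pairs is normalized away."""
    combined = {}
    conflict = 0.0
    for h1, v1 in m1.items():
        for h2, v2 in m2.items():
            inter = h1 & h2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    norm = 1.0 - conflict
    return {h: v / norm for h, v in combined.items()}

# toy evidence from two historical posts about one user
T, U = frozenset({"trusted"}), frozenset({"untrusted"})
m1 = {T: 0.7, U: 0.3}
m2 = {T: 0.6, U: 0.4}
fused = combine(m1, m2)
```

Because both pieces of evidence lean toward "trusted", the fused belief in `T` (0.42/0.54 ≈ 0.78) is stronger than either source alone, which is the reinforcement behavior the trust interval is built on.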
[Objective] This paper proposes an algorithm that decides judicial case similarity with heterogeneous property graphs, aiming to improve the speed and precision of case similarity comparison. [Methods] First, we constructed a heterogeneous graph of legal case properties based on their contents and other related information. Then, we transformed text similarity into graph similarity and combined a graph attention network with a neighborhood node consensus matching method. Finally, the proposed model learned the local and global information of the heterogeneous property graphs and calculated the similarity of cases. [Results] We examined the new model on the similar case matching dataset from CAIL 2019. Our model outperformed other popular methods while requiring only 1.02% of their FLOPs. [Limitations] The precision of our model is positively correlated with the property graph’s complexity. However, since the graph is constructed offline, it does not increase the algorithm’s runtime complexity. [Conclusions] The proposed model could effectively improve the speed and precision of similarity comparison for legal cases.
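The idea of comparing property graphs instead of raw texts can be illustrated with a much simpler proxy, Jaccard similarity over labeled edge sets; this is not the paper's attention-based matching, and the toy case graphs are invented:

```python
def graph_similarity(g1, g2):
    """Jaccard similarity over labeled (head, relation, tail)
    edge sets: a crude stand-in for learned graph matching that
    still captures shared case properties."""
    e1, e2 = set(g1), set(g2)
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)

# two toy case property graphs sharing one property edge
case_a = [("defendant", "charged_with", "theft"),
          ("theft", "amount", "high")]
case_b = [("defendant", "charged_with", "theft"),
          ("theft", "amount", "low")]
sim = graph_similarity(case_a, case_b)
```

The learned model goes further by matching structurally similar but non-identical nodes (neighborhood consensus), which exact set intersection cannot do.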
[Objective] This paper aims to improve stock price prediction with the help of investors’ sentiment characteristics. [Methods] First, we constructed investors’ sentiment characteristics with the RoBERTa model and extracted stock price characteristics with the TCN network. Then, we used an attention mechanism to merge these characteristics. Finally, we constructed the new RoBERTa-TCN model for stock price prediction. [Results] Compared with three baseline models (LSTM, GRU, and TCN) on six stock datasets, the RoBERTa-TCN model improved by an average of about 0.4906 across four evaluation metrics. [Limitations] We did not examine the impact of trading dates on stock prices. [Conclusions] The RoBERTa-TCN model could effectively predict stock prices.
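The feature-merging step can be illustrated with a scalar sigmoid gate, a simpler stand-in for the attention-based fusion the paper describes; the gate weights and toy feature vectors are invented for illustration:

```python
import math

def gated_fusion(sentiment_vec, price_vec, gate_weights):
    """Fuse sentiment features and price features with a scalar
    sigmoid gate: out = g * sentiment + (1 - g) * price, where g
    is computed from the concatenated input features."""
    concat = sentiment_vec + price_vec
    z = sum(w * x for w, x in zip(gate_weights, concat))
    g = 1.0 / (1.0 + math.exp(-z))       # gate in (0, 1)
    return [g * s + (1.0 - g) * p
            for s, p in zip(sentiment_vec, price_vec)]

# with zero (untrained) gate weights, g = 0.5 and the two
# feature streams contribute equally
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

A trained gate (or the paper's attention weights) would learn to lean on sentiment features in news-driven periods and on price features otherwise.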