[Objective] This paper compares the performance of move recognition methods based on different deep learning algorithms. [Methods] Firstly, we built a large training corpus. Then, we used the traditional machine learning method SVM as a benchmark, and developed four move recognition models based on SVM, DNN, LSTM and Attention-BiLSTM. Finally, we conducted two rounds of experiments with sample sizes of 10,000 and 50,000. [Results] The Attention-BiLSTM method achieved the best results among the four methods in both experiments (F1=0.9375 with the larger sample). The SVM method outperformed DNN and LSTM in both experiments. When the sample size grew from 10,000 to 50,000, SVM showed the smallest F1 gain (0.0125), while LSTM showed the largest (0.1125). [Limitations] There is no universal test corpus for similar research, so our results could not be compared with those of other studies. [Conclusions] The bi-directional LSTM network structure and the attention mechanism can significantly improve the performance of move recognition. The deep learning methods work better with larger sample sizes.
[Objective] This study explores the interactive behaviors of online health community users in emergencies. [Methods] Firstly, we constructed a directed matrix based on the posts and replies, and illustrated the structure of this interactive network. Then, we conducted a small-world analysis, as well as correlation analyses between the centrality indices, the structural hole indices and user interactive behaviors. [Results] The whole network showed a small-world effect. Eigenvector centrality had a significant positive correlation with the number of posts, and degree centrality had a significant positive correlation with the number of replies. Meanwhile, structural holes had significant positive correlations with the number of replies. [Limitations] The data types were not diversified. [Conclusions] This study provides useful references and guidelines for the development of online health communities.
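The small-world analysis above amounts to checking that the network combines a high clustering coefficient with a short average path length. A minimal sketch on a toy undirected reply network (the graph data is invented for illustration, not from the study):

```python
from collections import deque

# Hypothetical reply network as an undirected adjacency dict (illustrative data).
graph = {
    0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2, 4}, 4: {3, 5}, 5: {4},
}

def clustering_coefficient(g):
    """Average local clustering: fraction of a node's neighbour pairs that are linked."""
    total = 0.0
    for node, nbrs in g.items():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is undefined for degree < 2; count it as 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in g[u])
        total += 2 * links / (k * (k - 1))
    return total / len(g)

def average_path_length(g):
    """Mean shortest-path length over all reachable node pairs (BFS from each node)."""
    dists = []
    for src in g:
        seen = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in g[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    q.append(v)
        dists.extend(d for n, d in seen.items() if n != src)
    return sum(dists) / len(dists)

C = clustering_coefficient(graph)
L = average_path_length(graph)
```

In practice one would compare C and L with those of a random graph of the same size and density; a small-world network has C much larger while L stays comparably small.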
[Objective] This paper proposes a new model based on a bidirectional long short-term memory network with improved attention, aiming to address the issues facing short text classification. [Methods] First, we used pre-trained word vectors to digitize the original texts. Then, we extracted their semantic features with the bidirectional long short-term memory network. Third, we calculated their global attention scores with the fused forward and reverse features in the improved attention layer. Finally, we obtained short text vector representations with deep semantic features. [Results] We used Softmax to predict the sample labels. Compared with the traditional CNN, LSTM and BLSTM networks, the proposed model improved the classification accuracy by up to 19.1%. [Limitations] The performance of our new model on long texts is not satisfactory. [Conclusions] The proposed model could effectively classify short texts.
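The global attention step described above can be sketched as scoring each token's fused forward-and-reverse feature vector against a context vector, then pooling tokens with the softmax weights. A minimal illustration; the vector values, dimensions and the dot-product scoring function are assumptions (a real model learns the context vector):

```python
import math

# Toy fused hidden states h_t (forward ++ reverse features) for a 4-token text.
H = [[0.2, 0.4], [0.9, 0.1], [0.5, 0.5], [0.1, 0.8]]
context = [0.6, 0.4]  # would be a learnable query vector in a real model

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Score each token's fused features against the context vector...
scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in H]
weights = softmax(scores)
# ...then pool the token features into one short-text representation.
text_vec = [sum(w * h[d] for w, h in zip(weights, H)) for d in range(len(H[0]))]
```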
[Objective] This paper proposes a review-based user modeling method, aiming to improve personalized information pushing services. [Methods] Firstly, we identified product feature-specific terms from reviews with the help of a pre-trained word embedding model. Then, we built a graph of feature-specific terms based on the semantic correlation among them. Finally, we used the TextRank algorithm to compute users’ interest in product features and model their preferences for products. [Results] The user models generated by our new algorithm were consistent with the manually created ones (with nearly 90% semantic correlation). Our F1-score was 0.55, better than those of the classic TF-based bag-of-words models. [Limitations] More manually labeled data and research are needed to improve the domain-specific analysis. [Conclusions] The proposed model helps us better analyze online reviews and develop new applications for recommendation systems.
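The TextRank step can be sketched as a weighted PageRank iteration over the term graph. The terms, edge weights, damping factor and iteration count below are illustrative assumptions, not the paper's settings:

```python
# Hypothetical feature-term graph: edge weights = semantic correlation between terms.
edges = {
    ("battery", "charge"): 0.8,
    ("battery", "screen"): 0.3,
    ("screen", "display"): 0.9,
}
nodes = sorted({n for e in edges for n in e})
w = {}
for (a, b), v in edges.items():
    w[(a, b)] = w[(b, a)] = v  # make the weight lookup symmetric

def textrank(nodes, w, d=0.85, iters=50):
    """Weighted TextRank: propagate scores along weighted edges until convergence."""
    score = {n: 1.0 for n in nodes}
    out_sum = {n: sum(w.get((n, m), 0.0) for m in nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            rank = sum(score[m] * w.get((m, n), 0.0) / out_sum[m]
                       for m in nodes if w.get((m, n), 0.0) > 0)
            new[n] = (1 - d) + d * rank
        score = new
    return score

# Scores stand in for the user's interest in each product feature.
interest = textrank(nodes, w)
```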
[Objective] This research proposes a method to discover city profiles based on semantic mining, aiming to capture public awareness of cities. [Methods] Firstly, we generated a hierarchical description framework of the city profile based on tag similarity and agglomerative hierarchical clustering. Secondly, we calculated the importance of social tags to reveal the semantic features of cities based on latent semantic mining. Finally, we selected social tags with a high degree of explanation for the city profile, and integrated them with the description framework to establish the hierarchical structure. [Results] With users’ reviews from Zhihu, we established structural city profiles for six provincial capitals in central China, which identified the public perception of these cities. [Limitations] More research is needed to extract high-quality social tags automatically and generate a better description framework for the city profiles. [Conclusions] The proposed method could extract city profiles from massive social tags and develop fine-grained descriptions.
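The first step, agglomerative hierarchical clustering over tag similarity, can be sketched as repeatedly merging the most similar clusters until similarity drops below a threshold. The tags, similarity scores, single-linkage choice and threshold are all illustrative assumptions:

```python
# Toy tag-similarity matrix (values are invented for illustration).
tags = ["food", "cuisine", "transit", "metro"]
sim = {
    ("food", "cuisine"): 0.9, ("transit", "metro"): 0.85,
    ("food", "transit"): 0.1, ("food", "metro"): 0.05,
    ("cuisine", "transit"): 0.12, ("cuisine", "metro"): 0.08,
}

def s(a, b):
    return sim.get((a, b), sim.get((b, a), 0.0))

def agglomerate(items, threshold):
    """Single-linkage agglomerative clustering: merge the two closest clusters
    until the best inter-cluster similarity falls below `threshold`."""
    clusters = [[t] for t in items]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = max(s(a, b) for a in clusters[i] for b in clusters[j])
                if link > best:
                    best, pair = link, (i, j)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        i, j = pair
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Yields one cluster per facet of the description framework.
hierarchy = agglomerate(tags, threshold=0.5)
```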
[Objective] This paper explores a content mining method for scientific references in patents (SRPs) based on text semantic representation, which improves the accuracy, comprehensiveness and interpretability of knowledge flow analysis. [Methods] Firstly, we extracted keywords and abstracts from patents to represent the SRPs, and created vectors for these items. Then, we computed the distance between vectors to calculate their semantic similarities. Finally, we obtained and mapped the topics of patents and SRP contents in the field of nanotechnology. [Results] We found our method could effectively map the relationships among sci-tech topics from the content perspective. [Limitations] We only conducted exploratory research with abstracts and keywords rather than full texts. [Conclusions] The proposed method improves the knowledge flow analysis of patents.
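Turning vector distance into semantic similarity is most commonly done with cosine similarity; the sketch below assumes that measure (the paper's exact metric, and the toy vectors, are assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two text vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical embeddings of a patent abstract and a cited paper (SRP).
patent_vec = [0.1, 0.7, 0.2]
srp_vec = [0.2, 0.6, 0.1]
similarity = cosine(patent_vec, srp_vec)
```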
[Objective] This study utilizes an annotated corpus with a pre-trained model, aiming to identify entities from corpora with limited annotation. [Methods] First, we collected online questions from patients with lung or liver cancers. Then we developed a KNN-BERT-BiLSTM-CRF framework combining instance transfer and parameter transfer, which recognized named entities with a small amount of labeled data. [Results] When the k value of instance transfer was set to 3, we achieved the best named entity recognition performance. The F value was 96.10%, which was 1.98% higher than that of models without instance-transfer techniques. [Limitations] The proposed method needs to be examined with entities of other diseases. [Conclusions] The cross-domain transfer learning method could improve the performance of entity identification.
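The instance-transfer part of the framework can be illustrated as a KNN selection of the source-domain sentences most similar to a target sentence. The sketch substitutes bag-of-words cosine similarity for the real model's BERT representations, and the sentences are invented:

```python
import math
from collections import Counter

# Invented source-domain and target-domain patient questions.
source = ["liver pain after meals", "lung cancer cough treatment", "liver enzyme levels high"]
target = ["liver pain at night"]

def vec(text):
    """Bag-of-words vector; a real system would use contextual embeddings."""
    return Counter(text.split())

def cos(a, b):
    dot = sum(a[t] * b[t] for t in a)
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

def knn_select(target_sent, source_sents, k=2):
    """Rank source instances by similarity to the target and keep the top k
    as extra training data (the instance-transfer step)."""
    scored = sorted(source_sents, key=lambda s: cos(vec(target_sent), vec(s)),
                    reverse=True)
    return scored[:k]

selected = knn_select(target[0], source, k=2)
```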
[Objective] This paper tries to effectively predict stroke risks, aiming to improve the diagnosis, treatment and intervention of stroke. [Methods] Firstly, we collected about 6,000 inpatient medical records from a top hospital. Then, we identified 12 risk factors affecting stroke with logistic regression modeling. Thirdly, we constructed a multi-layer neural network model to predict stroke risks. Finally, we implemented the model with Python to examine its effectiveness. [Results] First, total cholesterol, low-density lipoprotein and other indicators are the most important risk factors affecting the onset of stroke. Second, when the number of hidden layer neurons was 7, the accuracy of the risk prediction model reached 97.10%. [Limitations] We need to include more risk factors and use multiple machine learning models for comparative analyses. [Conclusions] The proposed model could effectively predict the stroke risks facing patients.
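The prediction step, a network with 7 hidden neurons over the 12 risk factors, can be sketched as a single forward pass. The weights below are random placeholders, not trained values, and the sigmoid activations are an assumption:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 12-input -> 7-hidden -> 1-output network, mirroring the "12 risk factors,
# 7 hidden neurons" setup; weights are untrained placeholders.
n_in, n_hidden = 12, 7
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [random.uniform(-1, 1) for _ in range(n_hidden)]
b2 = 0.0

def predict(x):
    """Return a stroke-risk probability in (0, 1) for one patient record x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)

risk = predict([0.5] * n_in)  # e.g. 12 normalized risk-factor values
```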
[Objective] This paper proposes a method based on a recommendation algorithm, portfolio theory and actual data from China’s online lending market, aiming to help investors make better decisions. [Methods] We collected transaction data from Renren Loan and constructed a bipartite graph network for the P2P scenario. Then, we used the recommendation algorithm and Markowitz portfolio theory to choose investment products. [Results] Under different K values, the accuracies of the improved bipartite graph recommendation algorithm with simple weighting were 0.055, 0.044, 0.039, 0.035, 0.036 and 0.032. These results were higher than those of the user-based collaborative filtering algorithm UCF (0.022, 0.019, 0.032, 0.032, 0.033, 0.034) and the item-based collaborative filtering algorithm ICF (0.007, 0.013, 0.014, 0.014, 0.014, 0.014). The recall rate was also higher than those of the other two algorithms. [Limitations] The sample dataset needs to be expanded. [Conclusions] Combining the recommendation algorithm with portfolio theory could find portfolios with better returns on investment.
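Bipartite graph recommendation with simple weighting can be illustrated with a standard two-step mass-diffusion scheme; whether the paper uses exactly this variant is an assumption, and the toy investor-product data is invented:

```python
# Toy investor -> loan-product bipartite graph (illustrative data).
invested = {
    "u1": {"p1", "p2"},
    "u2": {"p2", "p3"},
    "u3": {"p3"},
}
products = {p for ps in invested.values() for p in ps}
prod_degree = {p: sum(p in ps for ps in invested.values()) for p in products}

def recommend(user, k=2):
    """Two-step mass diffusion on the bipartite graph: spread resource from the
    user's products to neighbouring users and back, then rank unseen products
    by the resource they collect."""
    # step 1: products -> users
    user_res = {}
    for p in invested[user]:
        for u, ps in invested.items():
            if p in ps:
                user_res[u] = user_res.get(u, 0.0) + 1.0 / prod_degree[p]
    # step 2: users -> products
    prod_res = {}
    for u, res in user_res.items():
        share = res / len(invested[u])
        for p in invested[u]:
            prod_res[p] = prod_res.get(p, 0.0) + share
    unseen = {p: r for p, r in prod_res.items() if p not in invested[user]}
    return sorted(unseen, key=unseen.get, reverse=True)[:k]

recs = recommend("u1")
```

In the paper's pipeline, the ranked list would then be filtered through Markowitz portfolio optimization to balance expected return against risk.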
[Objective] This paper analyzes the investing behaviors of core communities, aiming to help venture capital institutions choose syndicate partners. [Methods] First, we collected events of venture capital investments in China from 2006 to 2017. Then, we used R to extract the syndication matrix and constructed the venture capital network. Finally, we identified the needed communities with the Louvain algorithm and the core community structure coefficient. [Results] The core communities differed in investing industries, areas and stages. Members of the core communities increasingly invested in the information services and cultural education industries, in developed regions, and at the initial stage. [Limitations] The proposed network was built according to syndication, which did not include the relationship between leading and following investments. [Conclusions] Identifying the core communities will help us understand the changing investment behaviors of the communities.
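The Louvain algorithm works by greedily maximizing modularity. A sketch of the modularity computation it optimizes, on an invented co-investment network and partition:

```python
# Toy co-investment network: undirected edges between VC institutions (invented).
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def modularity(edges, partition):
    """Newman modularity Q = sum_c (e_c/m - (d_c/2m)^2), the quantity the
    Louvain algorithm greedily maximizes when detecting communities."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in set(partition.values()):
        internal = sum(1 for u, v in edges
                       if partition[u] == c and partition[v] == c)
        deg_sum = sum(d for n, d in degree.items() if partition[n] == c)
        q += internal / m - (deg_sum / (2 * m)) ** 2
    return q

Q = modularity(edges, partition)
```

A clearly positive Q (here well above 0) indicates the partition captures denser-than-random syndication clusters, which is what makes a community a candidate "core community".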
[Objective] This paper aims to overcome the limitations of existing text similarity calculation methods by synthesizing multiple text information features such as semantics, syntax and word frequency. [Methods] First, we constructed a text complex network combining co-occurrence distance and dependency syntax. Then, we used information entropy to determine the weights of the dynamics characteristics. Finally, we utilized word embedding, syntactic structure and inverted file information to avoid the loss of word structure and semantics. [Results] Compared with the syntactic network + TF-IDF algorithm, the F1 value of the proposed algorithm increased by up to 12.1%. The result was 5.8% higher than that of the co-occurrence network + semantic method. The average F1 values were 5.8% and 1.6% better than those of the existing methods. [Limitations] The selection of relevant indicators in feature extraction needs to be further improved to address the importance of nodes more comprehensively. [Conclusions] Compared with the traditional methods, the proposed model could reduce the loss of text information and effectively improve the accuracy of text similarity calculation.
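Using information entropy to weight the dynamics characteristics can be sketched with the standard entropy weight method; the node-by-feature matrix below is invented:

```python
import math

# Toy matrix: rows = network nodes (words), columns = dynamics characteristics
# (e.g. degree, strength, betweenness); values are illustrative.
X = [
    [3.0, 0.2, 1.0],
    [1.0, 0.9, 2.0],
    [2.0, 0.5, 3.0],
]

def entropy_weights(X):
    """Entropy weight method: characteristics whose values vary more across
    nodes carry more information and therefore receive larger weights."""
    n, m = len(X), len(X[0])
    weights = []
    for j in range(m):
        col = [row[j] for row in X]
        total = sum(col)
        p = [v / total for v in col]
        # normalized entropy of the column, in [0, 1]
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        weights.append(1 - e)  # low entropy (high dispersion) -> high weight
    s = sum(weights)
    return [wj / s for wj in weights]

w = entropy_weights(X)
```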
[Objective] This study tries to improve the identification of emergency patients’ critical indicators. [Methods] First, we selected a multi-objective quantum-behaved particle swarm optimization algorithm. Then, we combined this algorithm with a machine learning classifier to propose a new method for screening the needed indicators. Finally, we compared the new method with two existing ones. [Results] The proposed method increased the search scope and reduced the data dimensionality, which helped us obtain indicators of clinical significance. [Limitations] The calculation of indicators’ importance needs to be optimized with a recursive method. [Conclusions] The proposed method could improve the recognition rates of critical patients.