
25 December 2018, Volume 2 Issue 12
    

  • Li Dong,Tong Shouchuan,Li Jiang
    Data Analysis and Knowledge Discovery. 2018, 2(12): 1-11. https://doi.org/10.11925/infotech.2096-3467.2018.0452

    [Objective] This paper explores the relationship between scientists’ interdisciplinary knowledge and their academic impact. [Methods] First, we collected 200 candidates from the 2016 National Natural Science Foundation Outstanding Youth Program and their articles indexed by the Web of Science. Second, we retrieved interdisciplinary co-authorship and citation data. Third, we used Brillouin’s index as a measure of interdisciplinarity and the h-index as a measure of academic influence. Finally, we calculated the correlation coefficients between interdisciplinarity and academic influence. [Results] We found no significant correlation between interdisciplinary collaboration and academic influence except in the field of biology, and no significant correlation between interdisciplinary citations and academic influence except in the areas of medicine or biology. [Limitations] Determining a scientist’s discipline from his/her affiliation might introduce bias. [Conclusions] A scientist’s interdisciplinary collaborations and citations are not necessarily correlated with his/her academic influence.
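
The two measures named in this abstract can be sketched as follows. This is a minimal illustration that assumes the standard definitions, Brillouin's index H = (ln N! - Σ ln n_i!) / N over per-discipline counts n_i and the usual h-index. The function names and sample numbers are hypothetical and are not taken from the paper.

```python
import math


def brillouin_index(counts):
    """Brillouin's index H = (ln N! - sum(ln n_i!)) / N for category counts n_i."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    # math.lgamma(n + 1) equals ln(n!), which avoids computing huge factorials.
    log_total_factorial = math.lgamma(total + 1)
    log_part_factorials = sum(math.lgamma(c + 1) for c in counts)
    return (log_total_factorial - log_part_factorials) / total


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical example: co-author counts per discipline, citations per paper.
print(brillouin_index([5, 5]), brillouin_index([9, 1]))
print(h_index([10, 8, 5, 4, 3]))
```

An evenly spread profile such as `[5, 5]` scores higher than a concentrated one such as `[9, 1]`, which is the sense in which the index measures interdisciplinarity.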

  • Li He,Zhu Linlin,Yan Min,Liu Jincheng,Hong Chuang
    Data Analysis and Knowledge Discovery. 2018, 2(12): 12-22. https://doi.org/10.11925/infotech.2096-3467.2018.0393

    [Objective] This paper aims to identify useful messages in open innovation communities, which contain much redundant and low-quality information. [Methods] First, we retrieved 23,137 user comments on programming bugs from the official Xiaomi MIUI Forum based on the information adoption model. Then, we applied binary logistic regression to explore the factors affecting the usefulness of these comments. [Results] The timeliness of information had a positive impact on its usefulness, the integrity of information also affected its usefulness, and the semantics of information had negative effects on its usefulness. The users’ previous experience did not influence the usefulness of information. However, users’ previous contributions had positive effects on the usefulness of information. [Limitations] The research data was collected from a small portion of one community, which might yield biased results. [Conclusions] This paper could help us effectively identify useful information in open innovation communities.
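
As a rough illustration of binary logistic regression, the sketch below fits a model by plain gradient descent on a tiny made-up data set. The two features (timeliness and prior contribution), their values and the labels are assumptions for demonstration only; the paper's variables, data and fitting software are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the log-loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical comments: [timeliness, prior_contribution], label 1 = useful.
rows = [[0.9, 0.8], [0.8, 0.3], [0.7, 0.9], [0.2, 0.1],
        [0.3, 0.7], [0.1, 0.2], [0.6, 0.4], [0.4, 0.6]]
labels = [1, 1, 1, 0, 0, 0, 1, 0]
weights, bias = fit_logistic(rows, labels)
print(weights, bias)
```

A positive fitted weight corresponds to a factor that raises the odds of a comment being useful, which is how results such as "timeliness had a positive impact" are typically read.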

  • Cheng Yong,Xu Dekuan,Lv Xueqiang
    Data Analysis and Knowledge Discovery. 2018, 2(12): 23-32. https://doi.org/10.11925/infotech.2096-3467.2018.0583

    [Objective] This paper aims to help computers answer questions accurately based on text comprehension. [Methods] First, we proposed a neural network model based on a hierarchical interaction mechanism. We introduced several human thinking mechanisms to build this model, including hierarchical processing, content filtering and multi-dimensional attention. Then, we ran the proposed model on the dataset from Chinese Machine Reading Comprehension (CMRC) 2017. [Results] The precision of the proposed method on the test set was 0.78, which was better than the best result of other published models. [Limitations] There was no further optimization for the potential answers. [Conclusions] The proposed hierarchical interactive network improves the machine’s ability to answer questions based on text comprehension.

  • Yu Chuanming,Gong Yutian,Wang Feng,An Lu
    Data Analysis and Knowledge Discovery. 2018, 2(12): 33-42. https://doi.org/10.11925/infotech.2096-3467.2018.0420

    [Objective] This paper tries to predict stock price fluctuation with the help of big data, aiming to improve forecasting accuracy and reduce trading risks. [Methods] We proposed a new Text and Price Combined Model (TPCM) to process comments retrieved from a stock forum. Then, we employed a deep representation learning algorithm to generate the text feature matrix and used the K-means clustering method to generate text categories. Finally, we used the Multi-Layer Perceptron (MLP) to predict stock price fluctuation based on the opening price, closing price and 15 other original price indicators. [Results] The accuracy of TPCM was 65.91%, which was 7.76 percentage points higher than that of the model (58.15%) employing price features only, and 11.37 percentage points higher than that of the model (54.54%) employing text features only. [Limitations] The study only used one stock to examine the proposed model. [Conclusions] Stock price forecasting could be improved through the combination of text and price, which creates novel perspectives for future studies.
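
The reported gains are absolute differences between accuracies (percentage points), not relative improvements. The short check below recomputes both from the three accuracies quoted in the abstract; the variable and function names are illustrative.

```python
# Accuracies (in percent) as quoted in the abstract.
tpcm = 65.91
price_only = 58.15
text_only = 54.54


def point_gain(model, baseline):
    """Absolute gain in percentage points."""
    return round(model - baseline, 2)


def relative_gain(model, baseline):
    """Relative gain over the baseline, in percent."""
    return round((model - baseline) / baseline * 100, 2)


print(point_gain(tpcm, price_only), point_gain(tpcm, text_only))
print(relative_gain(tpcm, price_only), relative_gain(tpcm, text_only))
```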

  • Liu Ping,Li Yanan,Yu Cong
    Data Analysis and Knowledge Discovery. 2018, 2(12): 43-51. https://doi.org/10.11925/infotech.2096-3467.2018.0419

    [Objective] This paper presents an approach to constructing an interactive knowledge map that facilitates browsing and keyword searching. [Methods] First, we modeled academic resources to reveal the implicit knowledge nodes and their complex relationships. Then, we built the interactive knowledge map based on user queries, which suggested associated terms and presented results in a lattice. [Results] We examined the proposed method with documents from the Proceedings of the International ACM SIGIR Conference from the past 10 years. We discovered hidden knowledge structures that help users locate core concepts and improve searching. [Limitations] The recommendation of relevant concepts needs to be improved. [Conclusions] The proposed interactive knowledge map helps users effectively explore the information space.

  • Cheng Xiufeng,Zhang Xinyi,Wang Ning
    Data Analysis and Knowledge Discovery. 2018, 2(12): 52-59. https://doi.org/10.11925/infotech.2096-3467.2018.0415

    [Objective] This paper tries to identify trending topics, aiming to help decision-making agencies manage online public opinion. [Methods] First, we proposed criteria for detecting the trending topics of a Q&A community. Then, we conducted an empirical study on China’s Zhihu Q&A community using the CART decision tree algorithm. [Results] The CART decision tree predicted the trending topics. [Limitations] We only collected data from a small portion of all topics on Zhihu. More data is needed for future studies. [Conclusions] The proposed method based on the CART decision tree algorithm could effectively predict trending topics in the Q&A community, which helps us choose popular content.
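
CART grows a classification tree by repeatedly choosing the split that minimises Gini impurity. The sketch below shows only that core step for one numeric feature and binary labels. The feature (answer count per topic) and the sample values are hypothetical; the paper's actual criteria and data are not shown here.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split, or
    (None, parent_gini) if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical topics: answer counts and whether the topic became trending.
answer_counts = [3, 5, 8, 40, 55, 70]
trending = [0, 0, 0, 1, 1, 1]
print(best_split(answer_counts, trending))
```

A full CART implementation applies this search to every feature and recurses on the two resulting subsets until a stopping rule is met.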

  • Chen Fen,Fu Xi,He Yuan,Xue Chunxiang
    Data Analysis and Knowledge Discovery. 2018, 2(12): 60-67. https://doi.org/10.11925/infotech.2096-3467.2018.0200

    [Objective] This paper tries to identify Weibo opinion leaders with the help of social network analysis and an influence diffusion model. [Methods] First, we analyzed the opinion leaders’ characteristics based on social network analysis. Then, we optimized the existing influence diffusion model from the perspectives of impact scope and extent. Finally, we applied the new model to find opinion leaders. [Results] Compared with the models built on centrality analysis or semantic similarity, the optimized model obtained a better ranking of opinion leaders, which was consistent with the Weibo data. [Limitations] We only examined the proposed method with data on GMO foods. [Conclusions] The proposed model could effectively identify Weibo opinion leaders.

  • Feng Guoming,Zhang Xiaodong,Liu Suhui
    Data Analysis and Knowledge Discovery. 2018, 2(12): 68-76. https://doi.org/10.11925/infotech.2096-3467.2018.0391

    [Objective] This study tries to address the issues facing long text representation and uses CapsNet to improve the accuracy of Chinese text classification. [Methods] First, we proposed an LDA matrix and word vectors to represent long texts. Then, we constructed a Chinese classification model based on CapsNet. Third, we examined the proposed model with the Sogou news corpus and the text classification corpus of Fudan University. Finally, we compared our results with those of classic models (e.g., TextCNN and DNN). [Results] The performance of the CapsNet model was better than that of the other models. The classification accuracy in five categories of short and long texts reached 89.6% and 96.9% respectively. The convergence speed of the proposed model was almost two times faster than that of the CNN model. [Limitations] The computational complexity of the model is high, which limits the size of the testing corpus. [Conclusions] The proposed Chinese text representation method and the modified CapsNet model have better accuracy, convergence speed and robustness than the existing ones.

  • Xiong Huixiang,Ye Jiaxin,Jiang Wuxuan
    Data Analysis and Knowledge Discovery. 2018, 2(12): 77-88. https://doi.org/10.11925/infotech.2096-3467.2018.0358

    [Objective] This paper tries to improve the DBSCAN algorithm and verify its feasibility and effectiveness in social tagging. [Methods] First, we analyzed the frequency of social tags for resources and their total appearances. Then, we examined the relationship between tags and resources to improve the DBSCAN clustering algorithm. Finally, we applied the new algorithm to cluster tags and users. [Results] We ran our experiment with data from Douban Movies. The modified DBSCAN algorithm improved the inter-object and inter-cluster correlations of social tags. [Limitations] The sample datasets need more in-depth mining. [Conclusions] The improved DBSCAN algorithm could effectively cluster social tags.
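
For orientation, the sketch below implements the standard, unmodified DBSCAN procedure on two-dimensional points. It is not the improved algorithm from the paper, whose tag-resource weighting is not described in the abstract. The sample points and parameter values are made up.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


# Hypothetical 2-D embedding of tags: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```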

  • Wang Ying,Wu Sizhu
    Data Analysis and Knowledge Discovery. 2018, 2(12): 89-97. https://doi.org/10.11925/infotech.2096-3467.2018.0423

    [Objective] This paper aims to convert the STKOS Metathesaurus from relational database records to RDF triples. [Methods] First, we defined the semantic schema of the STKOS based on its storage features and data characteristics. Then, we mapped the scientific terms, standard concepts, categories, as well as source concepts and terms with the help of R2RML. Finally, we converted the records stored in the relational database to RDF datasets with the R2RML parser. [Results] The proposed method processed the STKOS Metathesaurus automatically and generated 190 million RDF triples. All new records were stored in the Virtuoso database and could be queried with SPARQL. [Limitations] Predicates in R2RML lack flexibility; therefore, more complex data sets need to be split and transformed first. [Conclusions] The proposed model sheds light on future research on converting other relational database records or thesauri to RDF datasets.
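
The idea of mapping a relational row to RDF triples can be illustrated in a few lines. The sketch below is a hand-written conversion to N-Triples, not an R2RML mapping or the authors' schema. The namespace `http://example.org/stkos/`, the column names and the use of SKOS properties are all assumptions made for the example.

```python
BASE = "http://example.org/stkos/"  # hypothetical namespace
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def row_to_triples(row):
    """Turn one concept row (hypothetical columns) into N-Triples lines."""
    subject = f"<{BASE}concept/{row['concept_id']}>"
    label = escape_literal(row["pref_term"])
    triples = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{subject} <{SKOS}prefLabel> "{label}"@en .',
    ]
    if row.get("broader_id") is not None:
        broader = f"<{BASE}concept/{row['broader_id']}>"
        triples.append(f"{subject} <{SKOS}broader> {broader} .")
    return triples


# Hypothetical row as it might come from a relational "concept" table.
row = {"concept_id": 42, "pref_term": "data mining", "broader_id": 7}
for line in row_to_triples(row):
    print(line)
```

An R2RML mapping expresses the same subject, predicate and object templates declaratively, so that a processor can apply them to every row of a table.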

  • Fan Xinyue,Cui Lei
    Data Analysis and Knowledge Discovery. 2018, 2(12): 98-108. https://doi.org/10.11925/infotech.2096-3467.2018.0545

    [Objective] This paper tries to identify potential targets of antineoplastic drugs, aiming to provide references for future clinical work and experiments. [Methods] First, we retrieved the targets of antineoplastic drugs from the DrugBank database and combined them with the protein interaction information from the HPRD database. Then, we established the PPI network for these targets with Cytoscape and calculated the topological properties of the nodes. Third, we used SPSS single factor analysis and Weka’s information gain principle to choose the variables for topological attributes. Fourth, we introduced the SMOTE algorithm to process unbalanced data sets and constructed the prediction model for antineoplastic drug targets with the decision tree method. Finally, we compared the performance of our new model with those of the classic ones. [Results] The precision of the proposed model reached 73.18%. With the help of CBioPortal, we found 16 targets with prediction scores higher than 0.9. These targets could mutate and amplify in various tumors, which were analyzed with the case of NR5A1. [Limitations] The characteristics of target functions, sequence attributes, and other factors should also be included to construct the model. [Conclusions] The proposed model could predict the potential targets of antineoplastic drugs effectively.
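
SMOTE balances a data set by creating synthetic minority samples on the line segments between a minority point and one of its nearest minority neighbours. The sketch below shows that interpolation step only, under simplifying assumptions: Euclidean distance, a fixed random seed, and made-up two-dimensional feature vectors standing in for topological attributes.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority class: (degree, betweenness) pairs of known targets.
known_targets = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.5, 2.5)]
new_points = smote(known_targets, n_new=5)
print(new_points)
```

Because each synthetic point lies between two real minority points, the new samples stay inside the region already occupied by the minority class.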