  • 2019 No.8
  • Published:25 August 2019
  • Directed by: Chinese Academy of Sciences
  • Sponsored by: National Science Library, Chinese Academy of Sciences
  • Published by: Editorial Board of New
    Data Analysis and Knowledge Discovery
      25 August 2019, Volume 3 Issue 8
    Classifying Social Media Users with Machine Learning
    Gang Li,Huayang Zhou,Jin Mao,Sijing Chen
    2019, 3 (8): 1-9.  DOI: 10.11925/infotech.2096-3467.2018.1207

    [Objective] This paper uses multi-dimensional information about social media users to classify them automatically. [Methods] First, we defined four types of social media users: individual, media, government, and organization. Then, we extracted the following features from user profiles: demographic characteristics, naming, and self-descriptions. Third, we created a user classification model based on machine learning algorithms and evaluated its performance with a real Twitter dataset. [Results] Both precision and recall of the proposed model were greater than 83%. The naming, demographic characteristics, and self-description features made increasing contributions to the classification model. [Limitations] The sample size needs to be expanded, which would help us better analyze the characteristics of different users. [Conclusions] The proposed method could accurately identify four types of users, which benefits future research on social media user classification.

    Sentiment Analysis for Online User Reviews Based on Tripartite Network
    Weicong Lu,Jian Xu
    2019, 3 (8): 10-20.  DOI: 10.11925/infotech.2096-3467.2018.1030

    [Objective] This paper proposes a sentiment analysis method based on a tripartite network, aiming to reflect the indirect connections between nodes. [Methods] We constructed a “user-product-sentiment tag” tripartite network, which was split into three bipartite networks for network structure analysis. Then, we used the proposed tripartite network projection method to obtain the “two-sentiment one-mode” network of users and products. [Results] From a NetEase Cloud Music dataset, we obtained the associations among highly weighted related nodes, as well as information such as genre classifications, hot-rated songs, and fan groups. [Limitations] The large number of user nodes needs to be visualized in the future. [Conclusions] Based on the formation, splitting and projection of the sentiment tripartite network, we present the indirect connections between nodes and provide new perspectives for network sentiment analysis.
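
The splitting step described above can be sketched as follows. This is only an illustration under assumptions: the function name `split_tripartite`, the triple layout, and the toy reviews are hypothetical, and the abstract does not specify how edges are weighted, so simple co-occurrence counts are used here.

```python
from collections import Counter

def split_tripartite(triples):
    """Split (user, product, sentiment_tag) triples into three bipartite
    networks, each stored as a Counter mapping an edge to its weight."""
    user_product = Counter()
    user_tag = Counter()
    product_tag = Counter()
    for user, product, tag in triples:
        # Assumption: an edge weight is the number of triples containing the pair.
        user_product[(user, product)] += 1
        user_tag[(user, tag)] += 1
        product_tag[(product, tag)] += 1
    return user_product, user_tag, product_tag

# Hypothetical toy reviews, not taken from the paper's dataset.
reviews = [
    ("u1", "song_a", "positive"),
    ("u1", "song_b", "negative"),
    ("u2", "song_a", "positive"),
]
up, ut, pt = split_tripartite(reviews)
```

Each returned `Counter` is one bipartite network; a projection step could then combine them, but the paper's own projection rule is not given in the abstract and is not reproduced here.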

    Measuring Tech-Entropy of System Evolution: An Empirical Study of Patents
    Jianhua Hou,Pan Liu
    2019, 3 (8): 21-29.  DOI: 10.11925/infotech.2096-3467.2018.0904

    [Objective] This paper measures the development and life cycle of a technology system with an improved technology entropy method, aiming to provide a theoretical foundation for predicting technology development and for government decision-making. [Methods] We constructed a model measuring technological entropy based on information entropy and multiple indicators for the patented technology system. Then, we conducted an empirical analysis with the new model for carbon capture technology in China. [Results] We found that the target technology has completed the stages of sprouting and slow growth and is currently in the stage of rapid growth. [Limitations] The quality of the sample data needs to be improved. [Conclusions] The proposed method is an effective way to analyze the evolution trends of a patent technology system, which provides a better solution for identifying the life cycle of technologies.
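
For readers unfamiliar with information entropy, the quantity the model builds on can be sketched as below. This is plain Shannon entropy, not the paper's improved technology entropy, whose indicators and weighting are not given in the abstract; the function name and the example counts are hypothetical.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probabilities = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probabilities)

# Hypothetical example: patents spread evenly over four technology classes
# versus patents concentrated in a single class.
even_spread = shannon_entropy([10, 10, 10, 10])
concentrated = shannon_entropy([40, 0, 0, 0])
```

An even spread gives the maximum entropy for four classes (2 bits), while full concentration gives zero, which is the intuition behind using entropy to track how a technology system diversifies over time.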

    POI Recommendation Based on Geographic and Social Relationship Preferences
    Yan Wen,Lijian Ma,Qingtian Zeng,Wenyan Guo
    2019, 3 (8): 30-39.  DOI: 10.11925/infotech.2096-3467.2018.0764

    [Objective] This study tries to improve POI recommendation based on users’ geographic information and social relationships. [Methods] First, we proposed an MFDR model (MF with Distance-entropy and Refined-social-regularization), which introduced the concept of distance-entropy to refine users’ preferences and the frequency-based user-interest-matrix. Then, we applied the user-relationship-interest-matrix to refine the preferences with their social relationships. Finally, we used the regularization-based matrix factorization method to factorize the user-preference-matrix and user-relationship-interest-matrix to ensure their consistency. [Results] We examined the new model with the Gowalla and Brightkite check-in datasets, and found it outperformed existing POI recommendation algorithms. When the number of latent factors was 10 and the number of recommended POIs was 10, the precision and recall of MFDR on Gowalla reached 4.47% and 9.95%. These results were 30.71% and 28.93% higher than those of traditional POI recommendation models. [Limitations] The experimental datasets need to be expanded. [Conclusions] The proposed MFDR model based on geographical preference refinement and social-relationship preference implicit analysis is an effective way to recommend POIs.
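
The precision and recall figures above are reported for a fixed number of recommended POIs. A minimal sketch of how such top-k metrics are commonly computed for one user is given below; the function name and toy data are hypothetical, and the paper's exact evaluation protocol (for example, how results are averaged over users) is not stated in the abstract.

```python
def precision_recall_at_k(recommended, visited, k):
    """Precision@k and recall@k for a single user.

    recommended: ranked list of POI ids produced by a model.
    visited: set of POI ids the user actually checked in to (held-out data).
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(visited))
    precision = hits / k if k else 0.0
    recall = hits / len(visited) if visited else 0.0
    return precision, recall

# Hypothetical toy data for one user.
p, r = precision_recall_at_k(["a", "b", "c", "d"], {"b", "d", "e"}, k=2)
```

Here one of the top two recommendations is a hit, so precision@2 is 0.5 and recall@2 is one third. Low absolute values such as those reported above are typical for sparse check-in data.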

    Evaluating Information Services of Online Health Q&A Platform
    Chuang Hong,He Li,Lihui Peng,Yiming Xu
    2019, 3 (8): 41-52.  DOI: 10.11925/infotech.2096-3467.2018.1482

    [Objective] This paper explores evaluation methods for the information services of online health Q&A platforms, aiming to promote their sustainable development. [Methods] We introduced the SERVQUAL framework and established assessment indicators and an extension evaluation model. [Results] We examined the proposed model with Dingxiang Doctor, a health Q&A platform in China, to evaluate the quality of its information services. We found that its quality grade was 3 and the characteristic value of the grade variable was 2.955. These results indicated that Dingxiang Doctor maintains good services. However, its reliability, assurance and empathy need to be improved. [Limitations] The sample of this research is small, and the expert scoring method might be subjective. [Conclusions] The matter-element model and extension evaluation method can help us evaluate and improve the services of online health Q&A platforms.

    Sentence Function Recognition Based on Active Learning
    Guo Chen,Tianxiang Xu
    2019, 3 (8): 53-61.  DOI: 10.11925/infotech.2096-3467.2018.1198

    [Objective] This paper uses active learning methods, structured abstracts and a few annotations to create a classification model for sentence functions, aiming to reduce the dependence on manually labeled corpora. [Methods] First, we trained the SVM, CNN and Bi-LSTM classifiers with structured function sentences from abstracts. Second, with the help of active learning techniques, we predicted the functions of a large number of unlabeled common abstract sentences. Third, we automatically identified uncertain samples for manual annotation, which were used to optimize the initial classifier. Finally, we used active learning to improve the performance of the classifiers. [Results] We examined the new method with Library and Information Science literature. The precision, recall, and F1 values were 84.65%, 84.49%, and 84.57%, which were 3.25%, 3.24%, and 3.25% higher than those of the traditional methods. [Limitations] We only conducted five iterations to avoid the massive work of manual corpus annotation. [Conclusions] The active learning method could effectively discover the difference between the unlabeled corpus and the existing training corpus, which also reduces manual labeling costs. The proposed method might be used in citation and full-text analysis.
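
The step of identifying uncertain samples for manual annotation can be illustrated with least-confidence sampling. This is an assumption for illustration: the abstract does not say which uncertainty criterion the authors used, and the function name and probabilities below are hypothetical.

```python
def least_confident(probabilities, n):
    """Return the indices of the n samples whose highest predicted class
    probability is lowest, i.e. the samples the classifier is least sure of."""
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:n]

# Hypothetical class probabilities for four unlabeled sentences.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],
]
to_annotate = least_confident(probs, 2)
```

The two selected sentences would be sent to a human annotator, added to the training set, and the classifier retrained; repeating this loop is what one iteration of active learning amounts to.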

    Collaborative Filtering Recommendation Based on Item Quality and User Ratings
    Fusen Jiao,Shuqing Li
    2019, 3 (8): 62-67.  DOI: 10.11925/infotech.2096-3467.2018.1000

    [Objective] This paper proposes a modified collaborative filtering algorithm, aiming to improve the results of personalized recommendations. [Methods] First, we evaluated item quality and corrected user ratings based on their previous records. Then, we identified users with similar interests to generate better recommendations. [Results] We tested the new algorithm on the MovieLens dataset and found the MAE was 4.7% higher than those of the traditional or other modified methods. [Limitations] The new algorithm does not address the interest drift issue. [Conclusions] The proposed algorithm could recommend products to consumers more effectively.

    Extracting Keywords Based on Topic Structure and Word Diagram Iteration
    Mingzhu Sun,Jing Ma,Lingfei Qian
    2019, 3 (8): 68-76.  DOI: 10.11925/infotech.2096-3467.2018.0765

    [Objective] This paper integrates topic information into the TextRank model, aiming to improve the precision and recall of automatic keyword extraction. [Methods] First, we used LDA to create a model for document topics, and obtained the topic distribution of the candidate keywords. Then, we calculated the node weights with the topic-word probability distribution features. Third, we weighted the probability distributions of document-topic and topic-word characteristics as the node’s random jump probability. Finally, we constructed a new transition matrix for word graph iteration to improve the TextRank model. [Results] We examined the proposed model with 1559 news articles from the website of Southern Weekly. When the number of extracted keywords was three, the model’s keyword extraction precision values were 4.7% and 6.5% higher than those of the original TextRank and TF-IDF algorithms, respectively. [Limitations] The fusion algorithm increased computational complexity. [Conclusions] The proposed algorithm could extract keywords more effectively.
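
The idea of replacing TextRank's uniform random jump with a topic-informed one can be sketched as a biased PageRank iteration over a word graph. This is a simplified illustration: the function name, the graph, and the jump probabilities are hypothetical, and the way the paper derives the jump probabilities from LDA distributions is not reproduced.

```python
def biased_textrank(graph, jump, damping=0.85, iterations=50):
    """Iterate a PageRank-style score over a weighted word graph.

    graph: {word: {neighbor: edge_weight}}, every word has at least one neighbor.
    jump:  {word: probability}, summing to 1; in the paper's setting this
           would come from topic information (assumed given here).
    """
    scores = {w: 1.0 / len(graph) for w in graph}
    out_weight = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            # Score flowing into w from each neighbor v, split by edge weight.
            incoming = sum(
                scores[v] * graph[v][w] / out_weight[v]
                for v in graph
                if w in graph[v]
            )
            new_scores[w] = (1 - damping) * jump[w] + damping * incoming
        scores = new_scores
    return scores

# Hypothetical co-occurrence graph and topic-based jump probabilities.
graph = {
    "economy": {"policy": 1.0, "market": 1.0},
    "policy": {"economy": 1.0},
    "market": {"economy": 1.0},
}
jump = {"economy": 0.2, "policy": 0.6, "market": 0.2}
scores = biased_textrank(graph, jump)
```

In this toy graph "policy" and "market" have identical links, so the higher jump probability alone lifts "policy" above "market", which is the effect topic weighting is meant to have on keyword ranking.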

    ISA Biclustering Algorithm for Group Recommendation
    Shan Li,Yehui Yao,Hao Li,Jie Liu,Karmapemo
    2019, 3 (8): 77-87.  DOI: 10.11925/infotech.2096-3467.2018.1015

    [Objective] This paper tries to improve the recommendation algorithm, aiming to reduce the dependence on the number of groups (k value) at the categorization stage. [Methods] We used the ISA algorithm to modify the collaborative filtering algorithm and finish the clustering tasks from the perspectives of users and items. Then, we created a virtual user representing the group interests based on users’ expertise. Finally, we predicted the target users’ ratings based on the new collaborative filtering algorithm. [Results] This algorithm can remove the empirical dependence on k and improve the accuracy of the collaborative filtering recommendation algorithm. On the FilmTrust dataset, the MAE was reduced to 0.697 with 200 groups and to 0.693 with 500 groups. The RMSE was reduced to 1.022 with the MovieLens dataset. [Limitations] Several rounds of repeated experiments are needed to improve the quality of this study. [Conclusions] This algorithm does not depend on k, and effectively improves the performance of the collaborative filtering recommendation algorithm.
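
The two error measures reported above, MAE and RMSE, follow standard definitions that can be sketched as below. The ratings are hypothetical toy values, not data from FilmTrust or MovieLens.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large errors more than MAE does."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical predicted and actual ratings for four user-item pairs.
predicted = [4, 3, 5, 2]
actual = [5, 3, 4, 4]
example_mae = mae(predicted, actual)
example_rmse = rmse(predicted, actual)
```

For both measures a lower value means more accurate rating predictions, which is why a reduction is reported as an improvement.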

    Predicting Breast Cancer Survival Length with Multi-Omics Data Fusion
    Huiying Qi,Yuhe Jiang
    2019, 3 (8): 88-93.  DOI: 10.11925/infotech.2096-3467.2019.0021

    [Objective] This paper proposes a model using machine learning techniques and various omics data, aiming to better predict the survival length of breast cancer patients. [Methods] The prediction model was established with the random forest algorithm. It merged four types of omics data, including gene expression, copy number variation, DNA methylation and protein expression of breast cancer cases from the TCGA database. [Results] On the test data set, the model’s prediction precision reached 97.22%, and the recall was 98.13%. Compared with the existing models, the AUC value of our new algorithm was the highest (0.8393). [Limitations] The sample size needs to be expanded. [Conclusions] The proposed method is an effective way to predict breast cancer patients’ survival length.

    Ontology Reasoning for Financial Affairs with RBR and CBR
    Shaohua Qiang,Yunlu Luo,Yupeng Li,Peng Wu
    2019, 3 (8): 94-104.  DOI: 10.11925/infotech.2096-3467.2018.1137

    [Objective] This paper tries to predict the impacts of financial events/news on stock prices with financial, non-financial and public opinion factors. [Methods] We designed a financial affairs ontology based on Rule-Based Reasoning (RBR) and Case-Based Reasoning (CBR). Then, we created a SWRL rule-based reasoning model, which performed the rule-based reasoning using the Drools engine. Third, we designed a topic case database to describe the structure of the financial cases. Finally, we used the model to describe, retrieve, reuse, correct and preserve the data. [Results] We conducted an empirical study to examine the reliability of rule-based reasoning and case-based reasoning with enterprise data. [Limitations] We did not compare our model with the existing methods. [Conclusions] The proposed method could predict stock prices in a big data environment.

    Extracting New Words with Mutual Information and Logistic Regression
    Xianlai Chen,Chaopeng Han,Ying An,Li Liu,Zhongmin Li,Rong Yang
    2019, 3 (8): 105-113.  DOI: 10.11925/infotech.2096-3467.2018.1445

    [Objective] This paper modifies the method for new word extraction, which is used to improve the performance of medical text segmentation models. [Methods] With the help of the traditional mutual information model, we obtained the statistics of words and strings. Then, we established a logistic regression classification model with these data, and built an algorithm for new word identification. [Results] A series of experiments were carried out on the texts of electronic medical records from the Dermatology Department of Xiangya Hospital. Compared with PMI, PMI^2 and PMI^3, our model with logistic regression achieved the highest accuracy of new word extraction (0.803). [Limitations] To establish the logistic regression model for classification, we have to manually judge whether or not the training strings are words. [Conclusions] The proposed model and algorithm could effectively identify new words from medical records.
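
The mutual information statistic used as a baseline above can be sketched for adjacent token pairs as follows. Only plain PMI is shown; the features fed to the paper's logistic regression model are not listed in the abstract, and the function name and toy token sequence are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Pointwise mutual information (base 2) for each adjacent token pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n_unigrams
        p_y = unigrams[y] / n_unigrams
        # High PMI means x and y co-occur more often than chance predicts.
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Hypothetical token sequence; in practice tokens could be characters or
# segments from medical records.
scores = pmi_scores(["a", "b", "a", "b", "c"])
```

Pairs with high PMI, such as ("a", "b") in this toy sequence, are candidates for being a single word; a classifier can then decide which candidates are accepted.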

    Identifying Frontier Topics from Funding and Paper: Case Study of Carbon Nanotube
    Bowen Liu,Rujiang Bai,Yanting Zhou,Xiaoyue Wang
    2019, 3 (8): 114-122.  DOI: 10.11925/infotech.2096-3467.2018.1297

    [Objective] This paper analyzes the fine-grained characteristics of funding and paper data in English, aiming to identify the frontiers of scientific research. [Methods] We retrieved NSF-funded projects and WOS papers in the field of carbon nanotubes, and identified their LDA topics. Then, we compared their topic novelty, intensity and similarity. [Results] We found two trending topics, five emerging topics, four dying topics and two topics with potential. [Limitations] We did not evaluate our method with data in Chinese. [Conclusions] Compared with methods relying on a single data source or dimension, our method can identify the frontiers of scientific research more effectively.

      Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn