Current Issue, Volume 3 Issue 5
    Information Needs of Domestic and International HCQA Users: An Empirical Analysis
    Jing Shi,Chenlu Li,Yuxing Qian,Liqin Zhou,Bin Zhang
    2019, 3 (5): 1-10.  DOI: 10.11925/infotech.2096-3467.2018.0813

    [Objective] This paper identifies and analyzes the information needs of users in domestic and international health question-and-answer communities, aiming to find the patterns of their evolution and explore the underlying reasons. [Methods] First, we selected diabetes-related data from ManYouBang and DailyStrength. Then, we compared topic evolution and co-occurrence from the perspectives of theme and time, using theme coding, social network analysis and content analysis. [Results] The core need of HCQA users was “how to treat the disease”. In the chronic disease community, the “diet” theme was closely related to its co-occurring themes. [Limitations] Our research did not examine the relevance of answers to questions; more in-depth study of topic evolution and content is needed. [Conclusions] The domestic HCQA community is still developing, while its foreign counterparts are stable. The former has only the “question and answer” attribute, while the latter have both “Q&A” and “social” attributes.

    Predicting Stock Trends Based on News Events
    Mengji Zhang,Wanyu Du,Nan Zheng
    2019, 3 (5): 11-18.  DOI: 10.11925/infotech.2096-3467.2018.0871

    [Objective] This paper tries to predict stock trends with the help of deep learning models, financial data and related news events. [Methods] First, we built a classification model for news events. Then, we used recurrent neural networks to construct a forecasting model for stock trends based on news, capital flows and corporate financial reports. [Results] The proposed model improved prediction accuracy, reaching 76.22% and 77.36% for the mining and pharmaceutical manufacturing industries, respectively. [Limitations] We did not examine the different impacts of news headlines and full texts on the stock market. We only chose news events from the past year; this time span needs to be expanded. [Conclusions] News events could improve the accuracy of predicting stock trends.

    Appraising Home Prices with HEDONIC Model: Case Study of Seattle, U.S.
    Wancheng Chen,Haoran Dai,Yinghan Jin
    2019, 3 (5): 19-26.  DOI: 10.11925/infotech.2096-3467.2018.0881

    [Objective] This paper proposes a model based on the HEDONIC theory, aiming to assess home prices more efficiently, cost-effectively and accurately. [Methods] We used spatial analysis to extract important features from pre-processed data. Then, we built the model with Random Forest, KNN and Neural Networks. [Results] We tested our model on property price data from Seattle (USA) for 2014 to 2015 and found that its precision was 11.20% higher than that of the linear model. [Limitations] The sample data was not retrieved from the same time slice, which might affect the performance of our model. Using this model to assess home prices in China might produce biased results because of the different market environment and other factors. [Conclusions] The proposed model is a reliable method to appraise property prices.

    Reviewing Basic Methods of Entity Resolution
    Guangshang Gao
    2019, 3 (5): 27-40.  DOI: 10.11925/infotech.2096-3467.2018.1388

    [Objective] This paper discusses classical entity resolution methods and the logical thinking behind entity resolution theory. [Coverage] Google Scholar and CNKI were searched with the keywords “Entity Resolution”, “Collective Analysis”, “Crowdsourced”, “Active Learning”, “Privacy-Preserving” and “Entity Resolution” in Chinese. A total of 86 representative papers were then selected through topic screening, intensive reading and the retrospective method. [Methods] For each entity resolution method, the paper first summarizes and analyzes its basic idea and illustrates the resolution process, and then analyzes the key strategies, algorithms or techniques that existing research adopts to implement it. [Results] Entity resolution is a basic operation of data quality management and a key step in finding the value of data. [Limitations] The paper offers no in-depth analysis of the evaluation indicators and applications of each entity resolution method. [Conclusions] Although existing entity resolution methods can meet the requirements of most applications to some extent, they still face challenges in data heterogeneity, privacy protection and distributed environments in big data settings.

    Methods and Applications of Norwegian Model for Science and Technology Evaluation
    Qiang Liu,Yunwei Chen,Zhiqiang Zhang
    2019, 3 (5): 41-50.  DOI: 10.11925/infotech.2096-3467.2018.1222

    [Objective] This paper provides a comprehensive introduction to the Norwegian Model, aiming to promote the development of science and technology evaluation in China. [Methods] Using case studies, this paper first discusses the implementation of the Norwegian Model and success stories from regions outside Norway. Then, we explore the application of the Norwegian Model at various levels and in various subjects. Finally, we compare the Norwegian Model with two classic bibliometric measures. [Results] Six European countries used the Norwegian Model, a performance-based research funding system (PRFS), to promote their scientific publications. [Limitations] The Norwegian Model and its applications are still evolving; therefore, we are not able to discuss their future trends. [Conclusions] The Norwegian Model has some value in science and technology evaluation. More research is needed to explore its applications in China.

    Identifying Coordinate Text Blocks in Discourses
    Jingjing Pei,Xiaoqiu Le
    2019, 3 (5): 51-56.  DOI: 10.11925/infotech.2096-3467.2018.1380

    [Objective] This paper proposes a method that uses semantic and layout features to identify coordinate text blocks distributed across different paragraphs. It also provides a pre-trained model for these knowledge objects. [Methods] First, we used each paragraph as a processing unit and added layout features to the character and word vectors. Then, we concatenated the multi-dimensional features to represent each paragraph. Finally, we trained a convolutional neural network (CNN) model on the annotated data and obtained a recognition model for text blocks in a coordinate relationship. [Results] The proposed approach achieved a precision of 96% on manually annotated scientific papers, which was 3% higher than that of the baseline model. The recall also improved by 2%. [Limitations] Our model can only work with HTML files. More research is needed to examine it with other data formats. [Conclusions] The proposed method can effectively identify coordinate text blocks in discourses and can be used as a pre-trained model for coordinate knowledge objects.

    Revealing Sci-Tech Policy Evolution with Entity Relationship
    Jianhua Liu,Zhixiong Zhang,Qin Zhang
    2019, 3 (5): 57-67.  DOI: 10.11925/infotech.2096-3467.2018.1379

    [Objective] This paper tries to describe the evolutionary path of science and technology (S&T) policies using knowledge from documents generated in policy promotion. [Methods] We proposed a multi-index model that combines the direct semantic relationship, the direct co-occurrence relationship, the indirect co-occurrence relationship and a link path attenuation index. The S&T policy entities and their relationships used in the proposed model were extracted from the policy texts. We described the S&T policy evolution paths along the time dimension and then analyzed the structural features of policy entities and their relationships. [Results] We found the evolution paths of these policies at different stages, and 80% of the retrieved paths existed in the real world. [Limitations] The proposed model relies on human comparison and interpretation. In addition, the sample size needs to be expanded. [Conclusions] This study reveals the evolutionary path of S&T policies based on related records. It expands the scope and depth of S&T policy analysis research.

    Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning
    Jinzhu Zhang,Yiming Hu
    2019, 3 (5): 68-76.  DOI: 10.11925/infotech.2096-3467.2018.0659

    [Objective] This paper aims to automatically identify scientific references in patents (SRP), and then extract titles from SRPs to support in-depth data mining. [Methods] First, we used the Doc2Vec method to generate vectors for the patent citations. Then, we identified the SRPs with a support vector machine (SVM). Finally, we created vectors for the metadata (such as titles) of the SRPs and extracted titles with SVM. [Results] We examined the proposed method with patent citations from the genetic field. The accuracy of SRP recognition and title extraction reached 99.27% and 92.59%, respectively. The latter was 5.96% higher than that of the traditional methods. [Limitations] Manually tagging the training set was very time-consuming, and the experimental data must meet format requirements. [Conclusions] The proposed method could effectively identify and extract patent citations and titles.

    Classifying Short Text Complaints with nBD-SVM Model
    Bengong Yu,Yangnan Chen,Ying Yang
    2019, 3 (5): 77-85.  DOI: 10.11925/infotech.2096-3467.2018.0758

    [Objective] This paper tries to find an effective way to classify unstructured, short-text business complaints, aiming to improve the efficiency of corporate problem solving. [Methods] We first combined the topic model and the distributed representation technique to construct the input space vector for an SVM. Then, we integrated an ensemble learning method to build the nBD-SVM text classification model. [Results] We examined the proposed model with business complaint texts and found that its precision reached 81.83%, which is much higher than that of the traditional methods. [Limitations] We only evaluated our model with complaints from one company. [Conclusions] The proposed nBD-SVM model could process short-text business complaints effectively.

    Extracting Relationship of Agricultural Financial Texts with Attention Mechanism
    Yuemin Wu,Ganggui Ding,Bin Hu
    2019, 3 (5): 86-92.  DOI: 10.11925/infotech.2096-3467.2018.0818

    [Objective] This paper proposes a new method to extract relations from Chinese texts automatically. [Methods] We retrieved the annual reports of 224 listed agricultural companies from 2015 to 2017. Then, we adopted a Gated Recurrent Unit algorithm based on a double attention mechanism to extract the needed data. [Results] The average accuracy of our model on the agricultural financial dataset reached 78%. Compared with the Recurrent Neural Network algorithm, the average accuracy of the new model increased by about 12%. [Limitations] We only studied data from 224 companies; the sample needs to be expanded. [Conclusions] The proposed model can effectively extract relationships from agricultural financial texts.

    Health APPs and Privacy Concerns: A Three-Entities Game-theoretic Approach
    Guang Zhu,Hu Liu,Xinmeng Du
    2019, 3 (5): 93-106.  DOI: 10.11925/infotech.2096-3467.2018.0844

    [Objective] This paper studies the usage intention of mobile health APPs (mHealth) and privacy concerns. It analyzes the interactions among the behaviors of different entities, aiming to improve mHealth privacy protection and increase APP usage. [Methods] Using evolutionary game theory, we proposed a model to examine patient behaviors, mHealth APP providers and government regulation. Then, we analyzed the benefits, costs and losses of different behaviors to establish the payoff matrices and evolutionarily stable strategies (ESSs). Finally, we discussed the impacts of different factors on patient behaviors. [Results] The usage intention of mHealth APPs was correlated with the benefits from mHealth services and the probability of privacy leaks. However, government regulation had little impact on patients’ behaviors. The investment of mHealth service providers in privacy was correlated with APP usage intention, government regulation, costs and privacy loss, among other factors. Government regulation was correlated with costs and social credibility. [Limitations] We did not include a nonlinear benefit function in this study. Other factors, such as the success rate of regulation and advertisement effects, should also be examined. [Conclusions] This study promotes the development of mHealth services by analyzing the impacts of various factors on APP usage, privacy protection and government regulation.

    Analyzing Characteristics of Interdisciplinary Research Evolutions: Case Study of Medical Informatics
    Yujie Cao,Jin Mao,Rongqing Pan,Zhichao Ba,Gang Li
    2019, 3 (5): 107-116.  DOI: 10.11925/infotech.2096-3467.2018.0905

    [Objective] This paper explores the evolution of interdisciplinary research, aiming to identify its characteristics. [Methods] We chose “Medical Informatics” as an example of interdisciplinary study and divided its evolution into different phases. Then, we introduced interdisciplinary characteristics from the perspectives of knowledge input and output. Finally, we analyzed the co-word patterns of knowledge output to reveal the features of research evolution. [Results] Both the interdisciplinary degree indicators and the structural properties of the co-word network differed across the beginning, developing and stable stages of Medical Informatics research. At the stable stage, knowledge began to internalize and specialize while growing explosively. [Limitations] The sample of interdisciplinary fields needs to be further expanded. [Conclusions] Changes in the characteristics of interdisciplinary research are the result of multi-disciplinary knowledge input and output.

    Evaluating and Classifying Patent Values Based on Self-Organizing Maps and Support Vector Machine
    Cheng Zhou,Hongqin Wei
    2019, 3 (5): 117-124.  DOI: 10.11925/infotech.2096-3467.2018.0674

    [Objective] This paper proposes a new method for evaluating and classifying patent values. [Methods] With the help of value indicators, we designed a patent value analysis and classification system based on self-organizing maps (SOM) and support vector machine (SVM) techniques. We used the SOM to determine value categories, and then applied the random forest (RF) algorithm to rank value indicators based on their significance. Finally, we improved classification performance with the wrapper feature reduction method. [Results] The value tags determined by SOM effectively represented the patent values. Meanwhile, the value indicators were reduced from 14 to 10, and the classification accuracy increased from 76.28% to 86.89%. [Limitations] Further refinement of patent values in each category is needed, which might reduce the number of patent value indicators. [Conclusions] The proposed SOM-RF-SVM method could support research and development activities as well as reduce the dependence on human factors.

    Detecting Collusive Fraudulent Online Transaction with Implicit User Behaviors
    Jiaming Liang,Jie Zhao,Jianlong Zhou,Zhenning Dong
    2019, 3 (5): 125-138.  DOI: 10.11925/infotech.2096-3467.2018.0665

    [Objective] This paper explores a new data mining method for implicit user behaviors, aiming to improve the precision of the model for collusive fraud detection. [Methods] First, we proposed a framework for analyzing implicit user behaviors. Then, we designed a two-stage algorithm to select the needed implicit features. [Results] We examined our new model with massive data from an existing e-commerce platform and found that it was more effective than the existing ones. [Limitations] The size of our experimental dataset needs to be expanded. [Conclusions] Using implicit features is an effective way to improve the precision of the collusive fraud detection model.

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn