Current Issue, Volume 3 Issue 6
    Spatio-Temporal Characteristics of WMTS Access Sessions
    Ru Li,Rui Li,Jie Jiang,Huayi Wu
    2019, 3 (6): 1-11.  DOI: 10.11925/infotech.2096-3467.2018.0767

    [Objective] This paper explores the spatio-temporal statistical characteristics of users’ visits to Web Map Tile Service (WMTS). [Methods] First, we identified the WMTS sessions and extracted their targets with an efficient algorithm. Then, we studied the temporal features of user access sessions using daily session counts, the number of requests and the duration of each session, and the access speed per tile. For spatial characteristics, we described the relationship between users’ locations and their access targets in terms of provinces, cities, and distances. [Results] The users’ WMTS sessions followed a power-law distribution, and most of them were brief and efficient with clear objectives. Users from provinces with better information infrastructure tended to have more centralized and deeper WMTS sessions. Most of the WMTS sessions searched for targets within the same province or city, while 30% of the targets were within 43 km of the users’ city centers. [Limitations] The data were collected only from users who access WMTS frequently, so the sample needs to be expanded. [Conclusions] Describing users’ access characteristics at session granularity helps us understand users’ geographical information needs.
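
The abstract does not say how sessions were identified. As a rough illustration only, the sketch below uses a common heuristic: a new session starts whenever the gap between two consecutive tile requests exceeds a timeout. The 30-minute timeout and the names `split_sessions` and `session_stats` are assumptions for this example, not the authors' algorithm.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Group one user's request timestamps into sessions.

    A new session starts when the gap to the previous request
    exceeds `timeout` (an assumed value, not taken from the paper).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # gap too large: open a new session
    return sessions


def session_stats(session):
    """Return request count, duration and average seconds per tile request."""
    duration = (session[-1] - session[0]).total_seconds()
    requests = len(session)
    return {
        "requests": requests,
        "duration_s": duration,
        "seconds_per_tile": duration / requests,
    }


if __name__ == "__main__":
    start = datetime(2019, 1, 1, 8, 0, 0)
    requests = [
        start,
        start + timedelta(seconds=10),
        start + timedelta(seconds=20),
        start + timedelta(hours=2),
    ]
    for session in split_sessions(requests):
        print(session_stats(session))
```

Per-session statistics of this kind (request counts, durations, time per tile) are the quantities the temporal analysis in the abstract is built on.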

    Ranking Answer Quality of Popular Q&A Community
    Ming Yi,Tingting Zhang
    2019, 3 (6): 12-20.  DOI: 10.11925/infotech.2096-3467.2018.0696

    [Objective] This paper proposes a new method to rank the quality of answers from a popular Q&A community in China. [Methods] First, based on the information acceptance model, we established initial quality indicators for the answers’ perceived value. Then, we discretized these indicators with the K-Medoids clustering algorithm. Third, we reduced and weighted the indicators with the help of rough set theory. Finally, we generated the formal rankings with weighted grey correlation analysis. [Results] We evaluated the proposed method with 2,297 answers to six different types of questions from the Q&A website “Zhihu”. We found that the higher-ranked answers generally combined text with images. These answers were also more informative than others and involved active members of the Q&A community. [Limitations] The size of our dataset needs to be expanded, and the evaluation method of the proposed model could be optimized. [Conclusions] The proposed method is an effective way to rank the quality of answers from the Q&A community.
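
For readers unfamiliar with the final ranking step, here is a minimal sketch of the standard weighted grey relational (grey correlation) grade. It assumes the indicators are already normalized to [0, 1] and uses the conventional distinguishing coefficient of 0.5. The function name, the toy data and the weights are hypothetical; the paper's own indicators and rough-set weights are not given in the abstract.

```python
def grey_relational_grades(candidates, reference, weights, rho=0.5):
    """Weighted grey relational grade of each candidate against a reference.

    candidates: list of indicator vectors, assumed normalized to [0, 1]
    reference:  the ideal indicator vector (e.g. the best value per indicator)
    weights:    one weight per indicator, assumed to sum to 1
    rho:        distinguishing coefficient (0.5 is the conventional choice)
    """
    # Absolute difference between the reference and each candidate, per indicator.
    deltas = [[abs(r - x) for r, x in zip(reference, row)] for row in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference, so all coefficients are 1.
        return [sum(weights)] * len(candidates)

    grades = []
    for row in deltas:
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coefficients)))
    return grades


if __name__ == "__main__":
    answers = [[0.9, 0.8, 1.0], [0.4, 0.6, 0.2], [0.7, 0.9, 0.5]]
    ideal = [1.0, 1.0, 1.0]
    indicator_weights = [0.5, 0.3, 0.2]
    grades = grey_relational_grades(answers, ideal, indicator_weights)
    ranking = sorted(range(len(answers)), key=lambda i: grades[i], reverse=True)
    print(ranking)
```

A higher grade means the answer's indicator profile is closer to the ideal profile, so sorting by grade yields the ranking.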

    Semantic Representation of Design Process Knowledge Reuse
    Zhu Fu,Yuefen Wang,Xuhui Ding
    2019, 3 (6): 21-29.  DOI: 10.11925/infotech.2096-3467.2018.0846

    [Objective] This paper explores a semantic representation method for design process knowledge (DPK), aiming to reuse dynamic DPK effectively. [Methods] First, we introduced the idea of modular design after reviewing existing research. Then, we analyzed the contents and characteristics of DPK, and semantically modeled DPK based on double-layer modular packaging technology. Finally, we represented the semantic model with an ontology representation method. [Results] We took the conceptual design of a recoil system as an example and created a semantic representation of its DPK with OWL. [Limitations] We only examined the proposed method with a single case. [Conclusions] The proposed method could semantically represent and reuse knowledge of the dynamic design process.

    Visualizing Policy Texts Based on Multi-View Collaboration
    Yanan Yang,Wenhui Zhao,Jian Zhang,Shen Tan,Beibei Zhang
    2019, 3 (6): 30-41.  DOI: 10.11925/infotech.2096-3467.2018.0827

    [Objective] This paper visualizes the text mining process through a multi-view collaborative technique, aiming to identify patterns and insights more effectively. [Methods] Based on the textual word vector matrix, we processed the texts of multiple policy subjects with data cleaning, TF-IDF calculation, the vector space model, singular value decomposition and other methods. [Results] We examined the effectiveness of the proposed model with governmental information from Zhongguancun, Beijing, covering January 2016 to August 2017. [Limitations] The framework could not visualize single data points of large-scale texts. [Conclusions] Multi-view collaborative visualization is an effective way to interpret textual information.
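
As a reminder of what the TF-IDF step computes, the sketch below weights each term by its relative frequency in a document times the log of the inverse document frequency. This is one common TF-IDF variant; the abstract does not state which one the authors used. The function name `tf_idf` and the toy token lists are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for tokenized documents.

    documents: list of token lists.
    Returns one {term: weight} dict per document, using
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {term: (count / total) * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return weights


if __name__ == "__main__":
    policies = [
        ["tax", "policy", "policy"],
        ["tax", "innovation"],
    ]
    for doc_weights in tf_idf(policies):
        print(doc_weights)
```

Terms that occur in every document receive a weight of zero, which is why the resulting document-term matrix emphasizes subject-specific vocabulary before any decomposition is applied.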

    E-Coupon and Economic Performance of E-commerce
    Xiaozhou Dong,Xinkang Chen
    2019, 3 (6): 42-49.  DOI: 10.11925/infotech.2096-3467.2018.0995

    [Objective] This paper tries to categorize consumers by their elasticity with respect to electronic discount vouchers, using a heterogeneous mixture model. [Methods] We built the model from a heterogeneous mixture model and the shopping data of 22,234 members of a large online retail portal in China. We estimated the model parameters by maximum likelihood. [Results] Compared with the control group, the treatment group’s usage of electronic vouchers increased by 18.6%, their average spending increased by 43 yuan, and the overall contribution margin rose by 359 thousand yuan. [Limitations] We included only one explanatory variable (membership level) in the probability function. [Conclusions] The proposed model could help companies optimize the effects of their promotions and increase voucher usage, sales and gross profits.

    Semantic Matching for Sci-Tech Novelty Retrieval
    Junliang Yao,Xiaoqiu Le
    2019, 3 (6): 50-56.  DOI: 10.11925/infotech.2096-3467.2018.1390

    [Objective] This paper tries to identify records that are semantically similar to the novelty points among preliminary search results, aiming to retrieve the needed journal articles or patents automatically. [Methods] First, we designed a deep multi-task hierarchical classification model based on Bi-GRU-ATT. Then, we trained several hierarchical classification models using International Patent Classification (IPC) categories and patents. Third, we used a small amount of paper data to fine-tune the model for papers and patents. Finally, we compared the semantic categories of novelty points and candidate records to collect the matching ones. [Results] With two-level classification of patents under IPC (E21B), the new model’s precision was 82.37% and 73.55% at the two levels respectively, better than the benchmark models. For real novelty search point data, the precision of semantic matching was 88.13%, which was 15.16% higher than that of TF-IDF. [Limitations] We only examined our model with a small number of IPC categories. [Conclusions] The proposed method improves the semantic matching of novelty search points.

    Discovering Domain Vocabularies Based on Citation Co-word Network
    Qikai Cheng,Jiamin Wang,Wei Lu
    2019, 3 (6): 57-65.  DOI: 10.11925/infotech.2096-3467.2018.1159

    [Objective] This paper identifies basic vocabularies of a specific domain from academic papers, aiming to grasp the knowledge structure and development context. [Methods] We combined the citation network and the co-word analysis to construct a citation co-word network. Then, we used the PageRank algorithm to evaluate the importance of the candidate words. We examined the proposed method with 110,360 articles in computer science. [Results] Our new method was compared with the word frequency method and co-word analysis qualitatively and quantitatively. We found that the proposed method performed well, and the average precision of a blind selection experiment reached 72.6%. [Limitations] The proposed method was only examined with computer science articles. [Conclusions] The new strategies could improve the performance of basic vocabulary discovery in one specific domain.
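
To make the word-scoring step concrete, the following is a minimal power-iteration PageRank over a small directed word graph. How the authors derive edges from citations and co-occurrences is not described in the abstract, so the toy graph, the damping factor of 0.85 and the name `pagerank` are assumptions for illustration only.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank.

    edges: dict mapping each word to the list of words it points to.
    Returns a dict of scores that sum to 1.
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not edges.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for source, targets in edges.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical word graph: an edge u -> v means u "endorses" v.
    word_graph = {
        "indexing": ["retrieval"],
        "query": ["retrieval"],
        "retrieval": ["indexing"],
    }
    scores = pagerank(word_graph)
    for word in sorted(scores, key=scores.get, reverse=True):
        print(word, round(scores[word], 3))
```

Words that are pointed to by many well-connected words accumulate the highest scores, which is the sense in which PageRank measures the importance of candidate words.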

    Automatic Recognizing Legal Terminologies with Active Learning and Conditional Random Field Model
    Han Huang,Hongyu Wang,Xiaoguang Wang
    2019, 3 (6): 66-74.  DOI: 10.11925/infotech.2096-3467.2018.1226

    [Objective] This paper tries to identify legal terminologies automatically from large-scale legal texts, aiming to structure legal big data. [Methods] We used the Conditional Random Field (CRF) model as the classifier of the Active Learning algorithm to identify legal terms. After clustering the corpus with K-means, we used stratified sampling to extract the initial list that starts the Active Learning algorithm. Entropy was used as the basis of sample selection for Active Learning. The learning and sample selection steps of Active Learning were carried out iteratively until the harmonic mean F value of the model stabilized. Finally, the legal domain entity recognition model (AL-CRF) was generated. [Results] We ran the proposed model on Chinese judgment documents and found that the precision and recall of the AL-CRF model both exceeded 90%, and its F value was 4.85% higher than that of a CRF model trained with an equal labeling workload. [Limitations] The K-means clustering method is sensitive to noise and outliers, which may affect the performance of the model. [Conclusions] Conditional random fields combined with active learning could reduce the workload spent on low-quality samples and ensure the recognition quality.
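
The entropy-based selection step can be pictured as follows: the current model predicts a label distribution for each unlabeled sample, and the samples with the highest Shannon entropy (the least certain predictions) are sent for annotation. This is a simplified sketch that scores one distribution per sample; a CRF would produce sequence-level or token-level marginals, and the abstract does not say how the authors aggregate them. The names and probabilities below are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_most_uncertain(predictions, k):
    """Pick the k samples whose predicted distributions have the highest entropy.

    predictions: dict mapping a sample id to its list of label probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical model outputs for three unlabeled sentences.
    model_outputs = {
        "sentence_1": [0.98, 0.01, 0.01],  # confident prediction
        "sentence_2": [0.40, 0.30, 0.30],  # very uncertain
        "sentence_3": [0.70, 0.20, 0.10],  # moderately uncertain
    }
    print(select_most_uncertain(model_outputs, k=2))
```

In an active learning loop, the selected samples would be labeled, added to the training set, and the model retrained, repeating until the F value stops improving.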

    Discovering Important Locations with User Representation and Trace Data
    Qingtian Zeng,Mingdi Dai,Chao Li,Hua Duan,Zhongying Zhao
    2019, 3 (6): 75-82.  DOI: 10.11925/infotech.2096-3467.2018.1085

    [Objective] This paper tries to discover the important locations of users, aiming to provide good data support for user behavior studies. [Methods] We presented a model for predicting important locations based on user representation. First, we proposed a vectorized representation method to predict user behaviors based on Word2Vec. Then, we constructed a user relationship network based on the similarity of user vectors to extract core users. Finally, we predicted the important locations from the behaviors of core users. [Results] The precision of important location classification was 7% higher than that of the existing methods. Moreover, the residential and commercial areas were shown on the labeled map. [Limitations] Our method can only identify residential and business areas. [Conclusions] The proposed method could effectively find important locations and provide more support for studying user behaviors.

    Extracting Book Review Topics with Knowledge Base
    Ruihua Qi,Junyi Zhou,Xu Guo,Caihong Liu
    2019, 3 (6): 83-91.  DOI: 10.11925/infotech.2096-3467.2018.0887

    [Objective] This paper tries to extract topics from book reviews with the help of natural language semantics. [Methods] We proposed a method to retrieve explicit and implicit topic keywords using global semantic information from a common sense knowledge base. [Results] The sentence coverage rate and the lexical diversity of the proposed knowledge base method were 30.8% and 0.36% higher than those of the Double-Propagation algorithm. Then, based on the extracted topic words, we created a cluster map and identified the topic keywords by node cluster centrality. [Limitations] There is no domain knowledge base in the field of book reviews. [Conclusions] The proposed knowledge base method improves the sentence coverage and lexical diversity of topics extracted from book reviews.

    Disambiguating Author Names Automatically for Institutional Repository
    Wangqiang Zhang,Zhongming Zhu,Yamei Li,Linong Lu,Wei Liu
    2019, 3 (6): 92-98.  DOI: 10.11925/infotech.2096-3467.2018.0245

    [Objective] This paper tries to disambiguate author names in institutional repositories automatically, and to provide a mechanism for human intervention at the right time. [Methods] First, we analyzed the unique features of author name disambiguation. Then, we constructed a general disambiguation framework for the institutional repository. [Results] Our framework achieved good results in practice, with a precision of more than 99%. [Limitations] We did not examine author names without affiliation addresses, and there may be exceptions in the aliases of authors and institutions. [Conclusions] This framework could effectively disambiguate author names in institutional repositories, which helps us provide more value-added services.

    Deep Neural Network Learning for Medical Triage
    Kan Liu,Lu Chen
    2019, 3 (6): 99-108.  DOI: 10.11925/infotech.2096-3467.2018.0824

    [Objective] This paper proposes an algorithm to accurately assign specialists to outpatients based on their chief complaints and medical histories. [Methods] We applied the convolutional neural network model to classify medical short texts and to learn the correlation between medical terms, which served as the pre-training tasks. Then, we examined the structure, parameters and weights of the pre-trained model with actual texts of chief complaints and medical histories. Finally, we modified the network to obtain the final learning outcome. [Results] The F-score of the proposed approach reached 88% on the sample dataset, which was 6% higher than that of the current best baseline model. The pre-trained model significantly improved the training efficiency. [Limitations] We did not work directly with the patients’ actual complaints at the triage desk. We only used their electronic medical records, which might yield inaccurate results. [Conclusions] The proposed triage model improves the efficiency of medical triage and promotes precision medical treatment for patients.

    Visualizing Knowledge Graph of Academic Inheritance in Song Dynasty
    Haici Yang,Jun Wang
    2019, 3 (6): 109-116.  DOI: 10.11925/infotech.2096-3467.2018.1240

    [Objective] This study constructs a knowledge graph of academic relationships among scholars in China’s Song Dynasty, aiming to provide new techniques for knowledge exploration in humanities research. [Context] Our study addresses the usability and visualization issues facing digital collections (i.e., the China Biographical Database Project), and establishes a knowledge portal for history researchers and amateurs. [Methods] First, we built the ontology of ancient Chinese scholars. Then, we transformed their relationships into RDF data for the knowledge graph. Finally, we created an online platform to demonstrate the visualization results. [Results] We created the knowledge graph with five classes and 39 relationships based on 48,018 people and 6,599 geographic records. The Song’s Academic Inheritance Platform integrates the RelFinder visualization tool to display the entities’ relationships in the knowledge graph. [Conclusions] This study offers practical solutions for semantic research on the China Biographical Database Project and related fields in history.

    Classifying Baidu Encyclopedia Entries with User Behaviors
    Zhenyu He,Xiangxiang Dong,Qinghua Zhu
    2019, 3 (6): 117-122.  DOI: 10.11925/infotech.2096-3467.2018.1209

    [Objective] This paper classifies Baidu encyclopedia entries based on users’ information behaviors, aiming to identify entries with high potential value. [Methods] We chose usage and recognition levels as indicators, and proposed a new entry classification model based on the Boston matrix and a BP neural network. [Results] We classified the Baidu encyclopedia entries automatically with usage indicators and created development strategies for each category. Our new model correctly identified each entry’s category information. [Limitations] More research is needed to study newly generated entries and features that are difficult to quantify. [Conclusions] This research proposed an effective method to automatically classify online encyclopedia entries.

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938