
25 March 2019, Volume 3 Issue 3
    

  • Xuhui Li,Yang Liu
    Data Analysis and Knowledge Discovery. 2019, 3(3): 1-13. https://doi.org/10.11925/infotech.2096-3467.2018.0321

    [Objective] This paper summarizes spatio-temporal data modeling methods to provide a theoretical basis for studying the organization and management of spatio-temporal knowledge. [Coverage] We searched Baidu Academic, Google Scholar, CNKI, and Engineering Village for the terms “spatial-temporal data model” and “spatio-temporal data model”, restricted by time frame and journal type. The retrieved documents were screened by relevance to the research topic, and 64 were selected for review. [Methods] Spatio-temporal data models are classified by the level of abstraction of the modeled objects, and the related research is summarized at three levels: the physical layer, the logic layer and the application layer. [Results] In recent years, physical-layer studies have mainly focused on revising earlier models, while application-layer models have focused on meeting the specific needs of various fields. Research on the logic layer still needs to improve its expressive ability. [Limitations] There are few horizontal comparisons of spatio-temporal data models across levels. [Conclusions] Large-scale management and use of spatio-temporal information will leave broad room for the development of spatio-temporal data modeling.

  • Hongxia Xu,Chunwang Li
    Data Analysis and Knowledge Discovery. 2019, 3(3): 14-24. https://doi.org/10.11925/infotech.2096-3467.2018.0607

    [Objective] This paper reviews knowledge extraction from scientific literature. [Coverage] We searched CNKI and Google Scholar and obtained a total of 68 representative papers on knowledge extraction. [Methods] We used the literature survey method. First, we reviewed knowledge extraction in Library & Information Science and in Computer Science. Then, we classified and summarized the key extraction technologies. [Results] Based on the current research status and technological system, this paper presents the pros and cons of knowledge extraction technologies and a roadmap for them. [Limitations] There is little comparative study of knowledge extraction in different subjects. [Conclusions] The research framework helps readers gain a thorough understanding of the current status and offers some advice for scholars.

  • Guangshang Gao
    Data Analysis and Knowledge Discovery. 2019, 3(3): 25-35. https://doi.org/10.11925/infotech.2096-3467.2018.0784

    [Objective] This paper discusses the mechanism of the User Profiles construction process from the perspectives of design thinking and data types. [Coverage] We searched Google Scholar and CNKI with the keywords “User Personas” and “User Profiles”. We then selected 90 representative papers on User Personas through topic screening, intensive reading and retrospective search. [Methods] First, this paper studies the construction process of User Profiles from the perspective of design thinking, combining four perspectives: Goal-Directed, Role-Based, Engagement-Based and Fiction-Based. Second, it analyzes the construction process from the perspective of data types: Ontology or Concept, Subject or Topic, Interest or Preference, Behavior or Log, and Multidimension or Fusion. Next, the construction methods are compared in detail in three respects: logical ideas, performance characteristics and limitations. Finally, directions for future research on User Profiles are outlined. [Results] User Profiles technology plays a vital role in many areas, such as online public opinion governance, advertising and marketing, and personalized services. [Limitations] There is no in-depth analysis of the evaluation indicators of User Profiles algorithms. [Conclusions] Although existing User Profiles methods can meet the needs of many applications to a certain extent, in the era of big data they still face the challenges of data sparsity, intelligent scene perception and user interest migration.

  • Zhen Zhang,Jin Zeng
    Data Analysis and Knowledge Discovery. 2019, 3(3): 36-44. https://doi.org/10.11925/infotech.2096-3467.2018.0573

    [Objective] This paper tries to automatically extract keywords from user comments, aiming to help both buyers and sellers find valuable information. It supports customers' decision making and provides feedback for improving online services. [Methods] First, we defined the task of extracting keywords from user comments. Second, we proposed evaluation criteria from the perspectives of merchants and customers. Third, we constructed a language-model-based keyword extraction method (LMKE). Finally, we collected experimental data from Meituan.com and compared the performance of our method with two existing ones, TF-IDF and TextRank. [Results] The LMKE method scored 0.7665, 0.6701, 0.6200, 0.8187, 0.7326 and 0.6743 on P@5, P@10, P@20, nDCG@5, nDCG@10 and nDCG@20, respectively. [Limitations] Our dataset was built only from user comments on buffet services in Wuhan, China. [Conclusions] The discriminative LMKE model outperforms TF-IDF and TextRank.
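    The P@k and nDCG@k metrics reported above can be illustrated with a minimal sketch. The function names and the toy relevance list are assumptions made for illustration, and the ideal ranking is taken from the same list (a common simplification); the abstract does not specify how the authors computed their scores.

```python
import math

def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items judged relevant (gain > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k

def dcg_at_k(gains, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the best possible ordering of the same gains."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for five extracted keywords, in ranked order.
ranked_gains = [1, 0, 1, 1, 0]
p5 = precision_at_k(ranked_gains, 5)
n5 = ndcg_at_k(ranked_gains, 5)
```

    P@k only counts how many of the top k keywords are relevant, while nDCG@k also rewards placing relevant keywords earlier in the ranking.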

  • Shengchun Ding,Linlin Hou,Ying Wang
    Data Analysis and Knowledge Discovery. 2019, 3(3): 45-56. https://doi.org/10.11925/infotech.2096-3467.2018.0609

    [Objective] This paper introduces the concept of a product profile to address the deficiencies of static information and dynamic comments on existing e-commerce platforms. [Methods] The knowledge map, as a method of mining, organizing, storing and displaying product information, is introduced into product profile construction, and a knowledge-map-based method for constructing product profiles is proposed. [Results] Three experiments were designed to generate the data layer of a mobile phone knowledge map. The F value reached 77.52% in the named entity extraction experiment, 76.04% in the evaluation object-evaluation word extraction experiment, and 63.16% in the synonym discovery experiment. The experimental results verified the effectiveness of the proposed method. [Limitations] Relation extraction in product profile construction restricts the relation categories, so the number of relations in a profile is limited; the analysis of the product market circulation dimension is also limited. [Conclusions] This study helps shopping platforms improve product comparison and product search mechanisms so as to provide users with better products and services.

  • Yue Yuan,Dongbo Wang,Shuiqing Huang,Bin Li
    Data Analysis and Knowledge Discovery. 2019, 3(3): 57-65. https://doi.org/10.11925/infotech.2096-3467.2018.0213

    [Objective] In the context of digital humanities, and in order to mine knowledge from Pre-Qin literature more deeply and accurately, this paper studies how part-of-speech tagging sets of different sizes affect entity extraction models. [Methods] Based on training and testing corpora consisting of “Zuo Zhuan” and “Guo Yu” which have been manually labeled by the machine, three tagging sets of different sizes were formed, with the Pre-Qin part-of-speech tagging set of Nanjing Normal University as the main part, supplemented by the part-of-speech tagging sets of Peking University, the Institute of Computing Technology of the Chinese Academy of Sciences and the Ministry of Education. The entity extraction results on the same corpus were compared using conditional random fields and feature templates. [Results] Comparative experiments were carried out with the three part-of-speech tagging sets on the Pre-Qin classics “Zuo Zhuan” and “Guo Yu”. The F values of the three models were 82.53%, 83.42% and 84.07%, respectively. [Limitations] Feature selection needs further improvement, and the training results can be improved. [Conclusions] The results are helpful for extracting named entities from the ancient literature of the Pre-Qin period. The constructed part-of-speech tag set is suitable for part-of-speech tagging of ancient Chinese.

  • Sisi Gui,Wei Lu,Xiaojuan Zhang
    Data Analysis and Knowledge Discovery. 2019, 3(3): 66-75. https://doi.org/10.11925/infotech.2096-3467.2018.0550

    [Objective] This paper investigates the effectiveness of query-based features and compares the performance of two types of classifiers in a temporal query intent classification task. [Methods] This paper first reviews all query-based features and classifies them into three types according to their temporal relevance: atemporal, implicit temporal and explicit temporal. It then tests the accuracy of a temporal query intent classification task using a supervised classifier and a semi-supervised classifier separately, with various combinations of query-based features of different types. [Results] Among all tested query-based features, explicit temporal features achieve the best accuracy, especially the feature indicating whether a query contains a year. The performance hardly varies across classifiers. Our best macro-average accuracy of 81.14% is higher than that of previous studies with the same experimental setup. [Limitations] Because of limited dataset accessibility, our experiments were done on a small dataset. Only existing query-based features were studied, and no new feature was proposed or tested. [Conclusions] Using features with high temporal relevance can improve accuracy in temporal query intent classification, whereas features with low temporal relevance hardly improve accuracy.
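    As an illustration of the explicit temporal feature highlighted above (whether a query contains a year), the following sketch uses a regular expression. The pattern, the function name and the restriction to years 1900-2099 are assumptions made for this example, not the feature definition used in the paper.

```python
import re

# Assumption: a "year" is a standalone four-digit token from 1900 to 2099.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")

def contains_year(query):
    """Binary explicit temporal feature: True if the query mentions a year."""
    return YEAR_PATTERN.search(query) is not None

features = [contains_year(q) for q in ["olympics 2016 medal table", "weather forecast"]]
```

    A feature like this would be one column of the input to either classifier; the word boundaries keep longer numbers such as "12000" from being mistaken for a year.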

  • Qingmin Liu,Changqing Yao,Chongde Shi,Xiaojie Wen,Yueying Sun
    Data Analysis and Knowledge Discovery. 2019, 3(3): 76-82. https://doi.org/10.11925/infotech.2096-3467.2018.0684

    [Objective] This paper optimizes the vocabulary of Neural Machine Translation (NMT) in the scientific and technical domain to address the problem of vocabulary limitation and to improve translation performance. [Methods] Based on word formation and Point-wise Mutual Information (PMI), the paper proposes a method that optimizes the vocabulary while preserving the integrity of lexical meaning, which reduces the number of unknown words. [Results] The NTCIR-2010 corpus and abstracts of journal articles in the domain of automation and computer science were selected for experiments. The results were compared with those of the segmentation method and the sub-word method, which demonstrated the effectiveness of the proposed method. [Limitations] This paper did not cover the optimization of non-Chinese characters. [Conclusions] The experiments show that, in the scientific and technical domain, the vocabulary optimization algorithm based on scientific word formation achieves better translation performance.
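    The PMI statistic mentioned in the methods can be sketched as follows. This is the generic definition, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), estimated from adjacent-token counts; the function name and the toy token list are assumptions, and the sketch does not reproduce the paper's word-formation rules or its vocabulary optimization procedure.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Point-wise mutual information for every adjacent token pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Hypothetical token sequence; a real run would use a segmented corpus.
tokens = ["neural", "machine", "translation", "neural", "machine", "model"]
scores = pmi_scores(tokens)
```

    A high PMI suggests that two units co-occur more often than chance, which is the usual argument for keeping them together as one vocabulary entry.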

  • Xiwei Wang,Duo Wang,Qingxiao Zheng,Ya’nan Wei
    Data Analysis and Knowledge Discovery. 2019, 3(3): 83-94. https://doi.org/10.11925/infotech.2096-3467.2018.0487

    [Objective] This study analyzes the text of VR companies' information interaction on social media, with the purpose of investigating how VR companies enhance their competitiveness through this type of interaction. [Methods] We conducted text mining and text analysis of the social networking sites of 4 VR enterprises, using data analysis tools such as NVivo11. [Results] The results show significant differences in information interaction among the companies. Information interaction in online brand communities can help enterprises improve user stickiness and the dissemination of new product information. [Limitations] We selected only two social networking platforms and one industry in this research. [Conclusions] This study provides a new perspective for information interaction research and helps enterprises enhance their competitiveness through information interaction in online brand communities.

  • Peiyao Zhang,Dongsu Liu
    Data Analysis and Knowledge Discovery. 2019, 3(3): 95-101. https://doi.org/10.11925/infotech.2096-3467.2018.0625

    [Objective] This paper constructs a microblog topic evolution method to capture topic development trends correctly, which is of great significance for public sentiment warning. [Methods] First, the Skip-gram model is used to train a word vector model on the text set. The text of each time slice is input into the BTM to obtain candidate topics, and topic word vectors are constructed in the BTM topic dimension. Second, the k-means algorithm is used to cluster the topic word vectors into fused topics, and the topic evolution of the text set across time slices is established. [Results] The experimental results show that the F value of this method is 75%, which is about 10% higher than that of the topic model. This demonstrates the feasibility of the proposed method. [Limitations] There is no definite measurement standard for topic evolution, and the various topic evolution methods were not compared. [Conclusions] The proposed method can effectively extract topics at all stages and provides an effective approach to online public opinion analysis.

  • Xiang Li,Xiaodong Qian
    Data Analysis and Knowledge Discovery. 2019, 3(3): 102-111. https://doi.org/10.11925/infotech.2096-3467.2018.0837

    [Objective] This paper explores the factors influencing consumer convergence in e-commerce. [Methods] Starting from the BBV model, and in view of the characteristics of the commodity-consumer binary network, this paper optimized the model in two respects: selecting nodes partly by preference and partly at random, and separately defining the weight distribution method for the two types of nodes during network evolution. By comparing the evolution process and results of the model under different parameters, we explored the impact of node weight, the random factor and the growth ratio of the two types of nodes on consumer convergence. [Results] The evolution results showed that consumer convergence is influenced by node weight, the random factor and the growth ratio of the two types of nodes. [Limitations] Only some typical parameter values were selected, and the parameters lacked continuity. [Conclusions] Good initial online evaluation of a product, high consumer rationality and low commodity market activity all contribute to a higher level of consumer convergence.

  • Zhiqiang Wu,Zhongming Zhu,Wei Liu,Sili Wang
    Data Analysis and Knowledge Discovery. 2019, 3(3): 112-119. https://doi.org/10.11925/infotech.2096-3467.2018.0903

    [Objective] This paper aims to expand the knowledge analysis and visualization functions of CSpace and to integrate knowledge analysis and visualization services fully into users' knowledge utilization and knowledge innovation processes. [Context] Knowledge analysis and visualization is an important development direction in institutional repository research and construction. Expanding these functions could provide users with better knowledge services in the process of knowledge dissemination and utilization. [Methods] First, we rebuilt the functional framework for knowledge analysis and visualization. Then, we upgraded the Solr index and optimized the associated storage structure of knowledge based on Solr sub-documents. We designed and implemented specification and management functions for organization data, project data and journal data, used Echarts to build a modular, flexible, embeddable visualization tool set, and improved the basic service capabilities of knowledge analysis and visualization. Finally, we optimized and reconstructed the knowledge analysis and visualization functions based on users' knowledge application requirements. [Results] The extended knowledge analysis and visualization functions, which provide more fine-grained knowledge analysis, flexible customization, ubiquitous map visualization and export functions in CSpace, have been implemented and deployed in more than 30 scientific research institutions and universities. [Limitations] Because of data normalization problems, the developed subject analysis function has not yet been put into practical use. [Conclusions] Building knowledge analysis and visualization capabilities around user needs enhances the knowledge service attributes of institutional repositories and can effectively promote knowledge utilization and knowledge innovation.

  • Zhiqiang Liu,Yuncheng Du,Shuicai Shi
    Data Analysis and Knowledge Discovery. 2019, 3(3): 120-128. https://doi.org/10.11925/infotech.2096-3467.2018.0655

    [Objective] This paper uses the Hidden Markov Model (HMM) to extract key information, such as title, date, source and text, from news web pages. [Methods] The web document was transformed into a DOM tree and preprocessed. The information items to be extracted were mapped to states, and the observed values of the extracted items were mapped to the vocabulary. The application of HMM to key information extraction from web news was studied, and the algorithm was improved. [Results] With the improved HMM algorithm, the accuracy rate reaches 97% on average across the websites. [Limitations] The extraction model is slightly insufficient in classification ability and cannot accurately distinguish slight differences. [Conclusions] The experiment shows that this method offers high recognition accuracy, strong modeling ability and fast training with a small set of training data.
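    The mapping described above (information items as hidden states, observed values as vocabulary symbols) can be illustrated with standard Viterbi decoding. This is a textbook sketch, not the improved algorithm from the paper; the state names, observation symbols and all probabilities below are made-up assumptions.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            p, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], back)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical model: DOM nodes reduced to three coarse observation symbols.
states = ["title", "date", "text"]
start_p = {"title": 0.8, "date": 0.1, "text": 0.1}
trans_p = {
    "title": {"title": 0.1, "date": 0.7, "text": 0.2},
    "date": {"title": 0.05, "date": 0.15, "text": 0.8},
    "text": {"title": 0.05, "date": 0.05, "text": 0.9},
}
emit_p = {
    "title": {"short_heading": 0.8, "date_like": 0.1, "long_paragraph": 0.1},
    "date": {"short_heading": 0.1, "date_like": 0.8, "long_paragraph": 0.1},
    "text": {"short_heading": 0.1, "date_like": 0.1, "long_paragraph": 0.8},
}
nodes = ["short_heading", "date_like", "long_paragraph", "long_paragraph"]
labels = viterbi(nodes, states, start_p, trans_p, emit_p)
```

    In a real extractor the probabilities would be estimated from labeled pages rather than written by hand, and the observation symbols would be derived from the preprocessed DOM nodes.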