Current Issue
    Volume 31, Issue 10
    Automatic Quality Evaluation of Social Tags
    Zhang Chengzhi, Li Lei
    2015, 31 (10): 2-12.  DOI: 10.11925/infotech.1003-3513.2015.10.02

    [Objective] Automatically selecting or recommending high-quality tags is important for improving the performance of applications built on social tags. [Methods] Existing research on the quality evaluation of social tags treats the content attributes and the social attributes of tags separately and does not combine the two. In this paper, the authors use both the content and the social attributes of tags to evaluate tag quality with a statistical machine learning model. [Results] Experimental results show that, when content and social attributes are combined, the quality evaluation model based on SVM outperforms other models. [Limitations] Only blog tag data are used to evaluate the quality of social tags. The performance achieved with social attributes is not ideal, and some social attributes do not effectively improve the automatic classification of tag quality. [Conclusions] This work is useful for improving the performance of tag organization and related applications.

    Difference Research on Keywords Tagging Behavior for Academic User Blog: A Case Study of ScienceNet.cn
    Zhang Yingyi, Zhang Chengzhi, Chi Xuehua, Li Lei
    2015, 31 (10): 13-21.  DOI: 10.11925/infotech.1003-3513.2015.10.03

    [Objective] This paper aims to provide a basis for optimizing annotation systems and to enrich research on user annotation behavior in the network environment. [Context] Studying differences in keyword tagging behavior among user groups is one of the major tasks in user information behavior research. [Methods] To analyze the types of differences in the annotation behavior of ScienceNet.cn users, this paper selects five indicators from the perspectives of the tagging method of the system, keyword structure, and tagging motivation: keyword tagging ratio, user-generated keyword tagging ratio, average number of user-generated keywords, average length of user-generated keywords, and average reuse ratio of user-generated keywords. [Results] The results show that users who differ in occupation, major, registration time, and blog publishing frequency differ significantly in some tagging behaviors, whereas users who differ in gender or education show no significant differences in any tagging behavior. [Conclusions] Academic blogs can optimize their tagging systems according to the differences in users' annotation behavior.

    Clustering Machine-Generated Tags with Different Quality
    Zhang Chengzhi, Gu Xiaoxue
    2015, 31 (10): 22-29.  DOI: 10.11925/infotech.1003-3513.2015.10.04

    [Objective] Conventional clustering of tags or words does not consider how the quality of the clustered members affects the clustering results. This paper analyzes how the clustering results of machine-generated tags differ with tag quality and suggests ways to improve clustering by incorporating tag quality. [Methods] First, the authors fetch Chinese and English blog data from Engadget, preprocess the data to obtain candidate tags, and extract the tags' social and content features to calculate their weights. They use two strategies to distinguish tags of different quality and obtain different tag sets. They then calculate the similarities within these tag sets and apply the AP algorithm to obtain clustering results, which are compared and analyzed. [Results] The experimental results show that, for both Chinese and English tags, the clustering results of the Top5 tags are better than those of the Top5-10 tags, and the clustering results of tags weighted by social attributes are better than those of non-weighted tags. [Limitations] The method for distinguishing tag quality is relatively simple, and an effective method for evaluating tag quality is lacking. [Conclusions] Clustering results of high-quality machine-generated tags are better than those of low-quality tags. The clustering performance of machine-generated tags can be improved by weighting with the social attribute. The social attribute of tags can also be used to evaluate their quality.

    Combined with Annotated Content and User Attributes for Tag Clustering
    Gu Xiaoxue, Zhang Chengzhi
    2015, 31 (10): 30-39.  DOI: 10.11925/infotech.1003-3513.2015.10.05

    [Objective] To explore the impact of tags' annotated content, tags' user attributes, and their combination on tag clustering. [Methods] Using ScienceNet.cn blogs, the authors extract tag features, build a vector space model, and calculate the similarities between tags, weighting them with a linear method and a Sigmoid method; finally, the AP algorithm is used to cluster the tags. [Results] Experimental results show that, under subject classification, combining annotated content and user attributes improves the clustering results with both weighting methods, and the Sigmoid method performs best; under systematic classification, the combination of the two features does not perform as well as under subject classification and is even worse than the content feature alone. [Limitations] The experimental data set is small, and the classification used to evaluate the clustering results is not perfect. Moreover, the AP clustering algorithm lacks the ability to deal with big data. [Conclusions] Combining the two features can improve tag clustering results in some cases, and tag clustering should focus more on tag content.
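
As a rough illustration of the similarity step, the sketch below computes cosine similarity between two tags in a vector space model and blends a content-based score with a user-based score linearly. The sparse dict representation, the function names, and the mixing parameter `alpha` are assumptions made for this example and are not taken from the paper. The Sigmoid weighting is not sketched because the abstract does not give its form.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def linear_combination(content_sim, user_sim, alpha=0.5):
    """Blend two similarity scores; alpha is a hypothetical mixing weight."""
    return alpha * content_sim + (1.0 - alpha) * user_sim
```

With `alpha` close to 1 the blended score is driven by tag content, which matches the abstract's conclusion that content deserves more emphasis.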

    Survey on Hashtag Mining and Its Application
    Shao Jian, Zhang Chengzhi, Li Lei
    2015, 31 (10): 40-49.  DOI: 10.11925/infotech.1003-3513.2015.10.06

    [Objective] The authors analyze Hashtag research and summarize its current problems. After clarifying the theoretical and practical significance of Hashtag research, they outline directions for further research. [Coverage] About 60 publications from international conferences and journals (2007-2015) are investigated. [Methods] The authors survey Hashtag mining and its applications, and analyze the process and the different methods of Hashtag mining. [Results] Problems remain in how users use Hashtags and in Hashtag mining and applications. [Conclusions] Further study should focus on the theory of Hashtags, e.g., the motivation for using Hashtags and the factors that affect Hashtag use. The performance of Hashtag applications should be improved by combining methods and technologies from different disciplines.

    Co-evolution of Social Networks and Public Opinion Considering the Effect of Trust and Authority
    Zhu Hou
    2015, 31 (10): 50-57.  DOI: 10.11925/infotech.1003-3513.2015.10.07

    [Objective] To study the co-evolution rules of social networks and public opinion, considering the effects of trust and authority. [Methods] The author designs a computational model of trust and authority, expresses the interaction mechanism of public opinion with the relative agreement model, and then analyzes the co-evolution process of dynamic social networks and public opinion through computational experiments. [Results] The experimental results show that the consistency of public opinion is lower in dynamic social networks than in static networks, and informal groups form more easily. The trust values follow a power-law distribution, but authoritative individuals do not necessarily hold high-trust friendships. [Limitations] The cognitive computational models are embedded into the opinion model through parameter passing, and the synergy between them needs to be improved. [Conclusions] Trust and authority significantly influence the co-evolution process; organizations should control authoritative individuals to guide the direction of public opinion.
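
For readers unfamiliar with the relative agreement model, the sketch below shows one pairwise update in its commonly cited baseline form, where each agent holds an opinion `x` and an uncertainty `u`. It does not include the paper's trust and authority extensions, and the function name and the default rate `mu` are assumptions made for this example.

```python
def relative_agreement_update(x_i, u_i, x_j, u_j, mu=0.5):
    """Return agent j's (opinion, uncertainty) after being influenced by agent i.

    The opinion segments are [x - u, x + u]. Agent i influences agent j only
    when the overlap of the two segments exceeds agent i's uncertainty.
    """
    overlap = min(x_i + u_i, x_j + u_j) - max(x_i - u_i, x_j - u_j)
    if overlap <= u_i:
        return x_j, u_j  # too little agreement: agent j is unchanged
    agreement = overlap / u_i - 1.0  # relative agreement of i with j
    new_x = x_j + mu * agreement * (x_i - x_j)
    new_u = u_j + mu * agreement * (u_i - u_j)
    return new_x, new_u
```

A confident agent (small `u_i`) pulls an uncertain neighbour strongly while being barely moved itself, which is the asymmetry that makes the model a natural host for authority effects.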

    Spillover Effect of Internet Word of Mouth in Negative Events: Take the “Deadly Yuantong Express” Event for an Example
    He Yue, Song Lingxi, Qi Liyun
    2015, 31 (10): 58-64.  DOI: 10.11925/infotech.1003-3513.2015.10.08

    [Objective] To study the spillover effect of Internet Word of Mouth on enterprise brands, as a basis for enterprises to take timely measures against risks. [Methods] This paper uses the information entropy method to build an evaluation index system for the spillover effect of Internet Word of Mouth, and compares the direction and intensity of the spillover effect based on the “Deadly Yuantong Express” event on Sina Microblog. [Results] The experimental results show that users exhibit strong negative emotional tendencies throughout the event. The strength and direction of the event's spillover effect differ across competing brands. The negative spillover effect is more intense and lasts longer than the positive one. [Limitations] The spillover effect of Internet Word of Mouth on other related enterprises is not analyzed. [Conclusions] The proposed index system can be used to monitor the direction and intensity of the spillover effect of Internet Word of Mouth in negative events.
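
The information entropy method is often used to derive indicator weights from data: indicators whose values vary more across samples carry more information and receive larger weights. The sketch below shows the standard entropy weight calculation under that reading. The paper's actual indicators and preprocessing are not given in the abstract, so the function name and the input layout (rows are samples, columns are non-negative indicator values) are assumptions.

```python
import math


def entropy_weights(matrix):
    """Entropy-based weights for the columns (indicators) of a data matrix.

    Assumes at least two rows, non-negative values, and a positive sum
    in every column.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    if n_rows < 2:
        raise ValueError("need at least two samples")
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            if value > 0:  # 0 * log(0) is taken as 0
                p = value / total
                entropy -= p * math.log(p)
        entropy /= math.log(n_rows)  # scale entropy into [0, 1]
        divergences.append(max(0.0, 1.0 - entropy))
    total_divergence = sum(divergences)
    if total_divergence == 0.0:
        return [1.0 / n_cols] * n_cols  # all indicators equally uninformative
    return [d / total_divergence for d in divergences]
```

An indicator that takes the same value for every sample gets a weight of zero, since it cannot distinguish one sample from another.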

    A Brusher Detection Method Based on Principle Component Analysis and Random Forest
    Zhang Liyi, Zhang Jiao
    2015, 31 (10): 65-71.  DOI: 10.11925/infotech.1003-3513.2015.10.09

    [Objective] A new model based on Principal Component Analysis and Random Forest is proposed to detect Taobao brushers, reduce the dimensionality of the indicators, and improve the recognition rate. [Methods] This article uses Principal Component Analysis to reduce dimensions and Random Forest to classify users. To demonstrate the advantages of the detection model, it also builds detection models based on KNN and SVM, trained on the same data, and compares the detection accuracy and efficiency of these models. [Results] The experimental results show that the detection model based on Principal Component Analysis and Random Forest achieves 88.0% accuracy within 3 minutes. [Limitations] Most of the data come from third-party platforms and cannot fully reflect all types of brushing. [Conclusions] The detection model based on Principal Component Analysis and Random Forest has higher detection accuracy and efficiency.

    Research on Follow Influence of Triadic Structure in Social Network: Take Student Relation Network as an Example
    Wu Jiang, Zhang Jinfan
    2015, 31 (10): 72-80.  DOI: 10.11925/infotech.1003-3513.2015.10.10

    [Objective] To study the effects of different triadic structures on follow influence in relation formation. [Methods] This paper surveys 221 students with questionnaires at different times to capture the dynamic evolution of their relation network, and then analyzes the effects of different triadic structures on relation formation. [Results] The results show that triadic structures with reciprocity, transitivity, and reversed relationships are more likely to form a new relation. [Limitations] This paper cannot completely control for influences outside the relation network. [Conclusions] The pattern of relation formation is the same online and offline, which is valuable for business.

    Automatic Annotation of Bibliographical References in Chinese Patent Documents
    Jiang Chuntao
    2015, 31 (10): 81-87.  DOI: 10.11925/infotech.1003-3513.2015.10.11

    [Objective] This paper aims to automatically annotate four types of bibliographical references in Chinese patent documents: patents, standards, papers, and other public documents (such as monographs). [Methods] A pattern matching approach is used to annotate references to patents, standards, and public documents, and a two-phase machine learning approach is used to annotate paper references: the first phase automatically detects the sentences that contain citation information, and the second extracts 6 categories of bibliographic features from the results. [Results] The results of ten-fold cross validation show that the accuracy for annotating patents is 100%, the precision and recall for annotating standards are 92% and 94% respectively, and the precision and recall for annotating public documents are 80% and 71% respectively. For annotating paper references, the precision and recall are 95.7% and 96.0% in phase one and 95.3% and 94.9% in phase two. [Limitations] The pattern matching approach requires analyzing many patent documents manually, and the training set used by the proposed machine learning approach is relatively small. [Conclusions] The pattern matching approach achieves over 92% when annotating patents and standards, and the machine learning approach achieves 95% when annotating papers.
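
To give a sense of what pattern matching for patent and standard references can look like, the sketch below uses two regular expressions. Both patterns are hypothetical simplifications written for this example (a `CN` publication number with a kind-code letter, and a `GB` or `GB/T` standard number with a four-digit year). They are not the patterns used in the paper, which were derived from manual analysis of patent documents.

```python
import re

# Hypothetical patterns: lookarounds are used instead of \b because Chinese
# characters count as word characters in Python's default Unicode mode.
PATENT_PATTERN = re.compile(r"(?<![A-Za-z0-9])CN\d{7,12}[A-Z](?![A-Za-z0-9])")
STANDARD_PATTERN = re.compile(
    r"(?<![A-Za-z0-9])GB(?:/T)?\s?\d+(?:\.\d+)?-\d{4}(?!\d)"
)


def annotate_references(text):
    """Return (type, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in (("patent", PATENT_PATTERN),
                           ("standard", STANDARD_PATTERN)):
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()  # order by position in the text
    return [(label, matched) for _, label, matched in found]
```

Hand-written patterns like these tend to be precise on the formats they cover but miss unseen variants, which is consistent with the limitation the abstract reports about manual analysis effort.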

    A Chinese Term Extraction System in New Energy Vehicles Domain
    He Yu, Lv Xueqiang, Xu Liping
    2015, 31 (10): 88-94.  DOI: 10.11925/infotech.1003-3513.2015.10.12

    [Objective] Chinese term extraction in the new energy vehicles domain is a key problem that needs a dedicated method to improve precision and recall. [Methods] This paper uses a conditional random fields model as the extraction model and selects the word, word length, part of speech, dependencies, dictionary location, stop words, and other characteristics as feature templates. [Results] Experimental results show that the precision and recall are 93.12% and 90.47% respectively. Compared with the baseline, the method improves accuracy by 7.73%. [Limitations] This method can only partly improve the accuracy of the results. [Conclusions] Using dependency as one of the features of the conditional random fields model can improve precision and recall in the new energy vehicles domain.

    The Design and Implementation of Open Engine System for Scientific & Technological Knowledge Organization Systems
    Wang Ying, Zhang Zhixiong, Li Chuanxi, Liu Yi, Tang Yijie, Zhou Zijian, Qian Li, Fu Honghu
    2015, 31 (10): 95-101.  DOI: 10.11925/infotech.1003-3513.2015.10.13

    [Objective] This paper aims to realize the sharing and utilization of the Scientific & Technological Knowledge Organization System (STKOS). [Context] An effective storage and access engine is a prerequisite for putting a knowledge organization system to use. [Methods] The open engine system for STKOS is designed and implemented; it includes a semantic storage and index system, a semantic query and reasoning kernel, STKOS APIs for searching, browsing, associating, and navigating STKOS elements, and an open query and reasoning interface for external applications. [Results] The engine system is used to build the STKOS publishing service platform and a third-party retrieval system based on STKOS. [Conclusions] The open STKOS engine system makes it more convenient for science and technology literature information agencies and researchers to use STKOS.

    Personalized Book Recommender Service Deployment Using Apache Mahout
    Liu Dan
    2015, 31 (10): 102-108.  DOI: 10.11925/infotech.1003-3513.2015.10.14

    [Objective] To enrich resource discovery methods, raise user awareness, and promote book circulation at a time of declining circulation by providing a personalized book recommender service. [Methods] Using Apache Mahout, circulation log data are normalized, and a boolean user-based collaborative filtering recommender with a logarithm similarity algorithm generates personalized book recommendations, which are embedded in the OPAC. [Results] Personalized book recommendations are embedded in the OPAC and updated automatically every 10 days; readers without recommendations are shown the top 10 books. [Limitations] Because preference data are lacking, the available recommenders are limited to boolean user-based recommenders. [Conclusions] The personalized book recommendation service has attracted attention and a good reputation: 7.5% of readers click and read the recommendations, and about 3.1% borrow a recommended book.
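
The following is a minimal sketch of boolean user-based collaborative filtering, written in plain Python rather than against the Mahout API. It assumes that "logarithm similarity" refers to a log-likelihood ratio similarity between users' loan sets; that reading, the mapping of the ratio into a 0-to-1 score, the zero score for users with no shared loans, and all names and parameters are assumptions made for this example.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic of a 2x2 contingency table (0 means independence)."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g2 = 0.0
    for count, r, c in cells:
        if count > 0:  # empty cells contribute nothing
            g2 += count * math.log(count * total / (rows[r] * cols[c]))
    return max(0.0, 2.0 * g2)  # clamp tiny negative rounding errors


def user_similarity(items_a, items_b, n_items):
    """Similarity of two users from boolean (borrowed or not) data."""
    both = len(items_a & items_b)
    if both == 0:
        return 0.0  # sketch choice: no shared loans means no similarity
    only_a = len(items_a - items_b)
    only_b = len(items_b - items_a)
    neither = n_items - both - only_a - only_b
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    return 1.0 - 1.0 / (1.0 + llr)  # map [0, inf) into [0, 1)


def recommend(target, loans, k=2, top_n=10):
    """Recommend books borrowed by the k users most similar to target."""
    catalog = set().union(*loans.values())
    ranked = sorted(
        ((user_similarity(loans[target], books, len(catalog)), user)
         for user, books in loans.items() if user != target),
        key=lambda pair: (-pair[0], pair[1]),
    )
    scores = {}
    for sim, user in ranked[:k]:
        if sim <= 0.0:
            continue  # ignore neighbours with nothing in common
        for book in loans[user] - loans[target]:
            scores[book] = scores.get(book, 0.0) + sim
    return sorted(scores, key=lambda book: (-scores[book], book))[:top_n]
```

Because the input is only whether a reader borrowed a book, not how much they liked it, set-based similarities of this kind are a natural fit, which matches the limitation the abstract notes about missing preference data.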

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn