Current Issue
    Volume 29, Issue 9
    The Interoperability Needs and Standards Framework for Institutional Repositories
    Liang Na, Zhang Xiaolin
    2013, 29 (9): 1-7.  DOI: 10.11925/infotech.1003-3513.2013.09.01
    This paper describes three use scenarios for institutional repositories (IRs): knowledge management, knowledge services, and e-Research and e-Learning. It emphasizes the need to consider technical, semantic, and management interoperability from the viewpoints of multiple stakeholders, constructs a needs framework for interoperability, and systematically introduces the basic, extended, and management standards already in place or in development.
    Knowledge Organization Tool Catering to Service: Today and Future
    Xie Jing, Qian Aibing, Han Pu, Su Xinning
    2013, 29 (9): 8-14.  DOI: 10.11925/infotech.1003-3513.2013.09.02
    From the perspective of knowledge services, this paper divides knowledge organization tools into three groups: tools for basic knowledge acquisition and systematization, tools for establishing knowledge relationships, and tools for knowledge processing and visualization. Tools in the first group provide push services for knowledge elements. Tools in the second group mainly identify knowledge relationships and, together with the first group, support inference services. Tools in the third group are used for knowledge extraction, identification, and visualization, after which they provide user-oriented services through knowledge reorganization. Finally, the paper discusses future trends in knowledge organization tools and points out the characteristics of future tools.
    Linking and Mapping of Library Catalogue Data Based on MapReduce
    Yu Wei, Chen Junpeng
    2013, 29 (9): 15-22.  DOI: 10.11925/infotech.1003-3513.2013.09.03
    This paper transforms MARC data into linked data using the MapReduce model and the MODS Ontology. Through mappings among different linked open data sets, library catalogue data can become part of the linked open data community and provide efficient semantic data for knowledge discovery and semantic services.
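    The map and reduce steps of such a conversion could look roughly like the sketch below. It is not the authors' implementation: the `ex:` prefix, the field-to-predicate table, and the dictionary record layout are hypothetical stand-ins for MODS Ontology terms and parsed MARC records, and the shuffle phase is simulated in memory rather than run on a MapReduce cluster. The MARC tags used (001 control number, 245 $a title, 100 $a personal name) are standard.

```python
from collections import defaultdict

# Hypothetical mapping from MARC tag+subfield to a predicate (not real MODS Ontology URIs).
FIELD_TO_PREDICATE = {"245a": "ex:title", "100a": "ex:creator"}


def map_record(record):
    """Map step: emit (subject, (predicate, object)) pairs for one MARC record."""
    subject = "ex:record/" + record["001"]
    for field, predicate in FIELD_TO_PREDICATE.items():
        for value in record.get(field, []):
            yield subject, (predicate, value)


def reduce_triples(subject, pairs):
    """Reduce step: deduplicate the pairs of one subject and serialize them as triples."""
    return [f'{subject} {p} "{o}" .' for p, o in sorted(set(pairs))]


def run(records):
    """Simulate map, shuffle (group by subject), and reduce in memory."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    triples = []
    for subject in sorted(grouped):
        triples.extend(reduce_triples(subject, grouped[subject]))
    return triples
```

    Grouping by subject in the shuffle phase means duplicate catalogue records for the same control number collapse into one set of triples in the reducer.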
    Decoding Optimization in Tree Transducer based Translation Model
    Shi Chongde, Qiao Xiaodong, Wang Huilin
    2013, 29 (9): 23-29.  DOI: 10.11925/infotech.1003-3513.2013.09.04
    This paper proposes two methods to improve the efficiency of rule binarization and decoding in tree-transducer-based translation models. The authors convert synchronous transducer rules into four kinds of binary rules to reduce the number of temporary items, and propose the RR-CKY decoding algorithm, which avoids some redundant items during decoding. Experiments show that the two methods reduce the number of temporary items and speed up decoding. They also improve the quality of machine translation.
    Study on Keyword Extraction Using Word Position Weighted TextRank
    Xia Tian
    2013, 29 (9): 30-34.  DOI: 10.11925/infotech.1003-3513.2013.09.05
    This paper treats keyword extraction as a word importance ranking problem. A candidate keyword graph is constructed based on TextRank, and the influences of word coverage, location, and frequency are used to calculate the probability transition matrix. Word scores are then calculated iteratively, and the top N candidate keywords are selected as the final results. Experimental results show that the proposed word position weighted TextRank method outperforms both the traditional TextRank method and the LDA topic model method.
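    The general idea of biasing TextRank transitions by word position can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formula: the position weight (earlier first occurrence gives a larger weight), the window size, and the damping factor 0.85 are all assumed, and the coverage and frequency influences mentioned in the abstract are omitted.

```python
from collections import defaultdict


def position_weighted_textrank(words, window=3, damping=0.85, iterations=50, top_n=3):
    """Rank words on a co-occurrence graph with position-biased transitions."""
    n = len(words)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)
    # Assumed position weight: words that first appear earlier weigh more.
    weight = {w: 1.0 + (n - p) / n for w, p in first_pos.items()}

    # Undirected co-occurrence graph over a sliding window.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, n)):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    vocab = list(first_pos)
    score = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        new_score = {}
        for w in vocab:
            rank = 0.0
            for u in neighbors[w]:
                # Probability of moving u -> w is proportional to weight[w].
                total = sum(weight[v] for v in neighbors[u])
                rank += score[u] * weight[w] / total
            new_score[w] = (1 - damping) / len(vocab) + damping * rank
        score = new_score

    ranked = sorted(vocab, key=lambda w: score[w], reverse=True)
    return ranked[:top_n], score
```

    With uniform weights the transition probabilities reduce to those of plain TextRank on an unweighted graph, which makes the two variants easy to compare on the same input.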
    Identifying Synonyms Based on Sentence Structure Analysis
    Yu Juan, Yin Jidong, Fei Shu
    2013, 29 (9): 35-40.  DOI: 10.11925/infotech.1003-3513.2013.09.06
    This paper proposes a new method of identifying synonyms that reduces the deviation in calculating the semantic similarity between two different terms or phrases. The method first analyzes the sentence structures of the terms (or phrases) concerned, and then calculates the semantic similarity between them based on Tongyici Cilin (a Chinese thesaurus). It weights each word in the terms (or phrases) equally to reduce the identification errors made by gravity-centre-backward methods. Experiments show that the proposed method is accurate and has good potential for text mining and semantic retrieval applications.
    Fast Duplicate Detection for Chinese Texts Based on Semantic Fingerprint
    Li Gang, Mao Jin, Chen Jinghao
    2013, 29 (9): 41-47.  DOI: 10.11925/infotech.1003-3513.2013.09.07
    For Chinese texts, text features are first extracted and passed to the Simhash algorithm to generate semantic fingerprints. The Hamming distance between semantic fingerprints is used to determine the similarity between texts. As the last step of the duplicate detection process, the Single-Pass clustering algorithm clusters the generated fingerprints, and the resulting clusters are the final results. A comparison with the Shingle algorithm shows that the Simhash approach is superior in both precision and robustness, and that its speed makes it capable of processing large volumes of text.
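    The three steps named above (Simhash fingerprinting, Hamming distance, Single-Pass clustering) can be sketched as below. Several details are assumptions for illustration: tokens are taken to be already segmented, term frequency stands in for the feature weight, the first 64 bits of an MD5 digest serve as the token hash, and the distance threshold of 3 is arbitrary.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Build a Simhash fingerprint from tokens, weighted by term frequency (bits <= 64)."""
    vector = [0] * bits
    for token, freq in Counter(tokens).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(bits):
            vector[i] += freq if (h >> i) & 1 else -freq
    fingerprint = 0
    for i in range(bits):
        if vector[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def single_pass(fingerprints, threshold=3):
    """Assign each fingerprint to the first cluster whose representative is close enough."""
    clusters = []  # list of (representative fingerprint, member indices)
    for idx, fp in enumerate(fingerprints):
        for rep, members in clusters:
            if hamming(rep, fp) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append((fp, [idx]))
    return [members for _, members in clusters]
```

    Here the first member of each cluster acts as its representative. Single-Pass results depend on input order, so a different representative rule can produce different clusters.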
    Authorship Identification of Chinese UGC Based on Stylistics
    Lv Yingjie, Fan Jing, Liu Jingfang
    2013, 29 (9): 48-53.  DOI: 10.11925/infotech.1003-3513.2013.09.08
    Characteristics of information networks such as openness and virtuality make authorship identification difficult. This paper therefore proposes a stylistics-based approach to authorship identification for Chinese UGC. The authors integrate four types of features (lexical, syntactic, structural, and content-specific) into a set of writing-style features, and then use text classification techniques to identify authors. The experimental results demonstrate that the proposed approach identifies the authors of Chinese UGC efficiently.
    An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency
    Xiong Liyan, Tan Long, Zhong Maosheng
    2013, 29 (9): 54-59.  DOI: 10.11925/infotech.1003-3513.2013.09.09
    Existing methods for automatic Chinese term extraction focus on the high-frequency characteristics and unithood indicators of terms, while low-frequency terms and termhood indicators lack effective treatment. In response, this paper introduces a background corpus into the C-value method and proposes the concepts of word field distribution degree and effective word frequency. The paper then extracts terms automatically by calculating the EC-value (Effective C-value) of candidate terms, and improves the extraction of low-frequency terms by combining it with term cluster recognition and mining. A term extraction experiment in the computer field shows that the improved method (the EC-value method) measures the termhood of terms more effectively and improves the extraction performance for low-frequency terms.
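    For orientation, the baseline C-value score that the paper extends can be sketched as follows. This shows only the commonly cited form of C-value; the abstract does not give the EC-value formula, the word field distribution degree, or the effective word frequency, so none of them appear here. The input layout (a dictionary from word tuples to frequencies) is an assumption.

```python
import math


def is_nested(short, long):
    """True if the word sequence `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(candidates):
    """Baseline C-value for candidate terms.

    candidates: dict mapping a term (tuple of words) to its corpus frequency.
    A term nested in longer candidates has the mean frequency of those
    longer candidates subtracted from its own frequency.
    """
    scores = {}
    for term, freq in candidates.items():
        longer = [f for other, f in candidates.items()
                  if len(other) > len(term) and is_nested(term, other)]
        adjusted = freq - sum(longer) / len(longer) if longer else freq
        # log2 of the term length: single-word terms score 0 in this form.
        scores[term] = math.log2(len(term)) * adjusted
    return scores
```

    For example, with `("neural", "network")` at frequency 10 and `("deep", "neural", "network")` at frequency 4, the shorter term scores log2(2) × (10 − 4) = 6.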
    Research on the Credibility of Online Chinese Product Reviews
    Meng Meiren, Ding Shengchun
    2013, 29 (9): 60-66.  DOI: 10.11925/infotech.1003-3513.2013.09.10
    This paper aims to filter out low-credibility online Chinese product reviews so that consumers have valuable reviews on which to base purchase decisions. Based on an in-depth analysis of the characteristics of online Chinese product reviews and on related work, the authors conduct an empirical, questionnaire-based analysis of credibility factors. According to the results, the authors select four features (content integrity, emotional balance, review timeliness, and clarity of the publisher's identity), use CRFs as the classification model for review credibility, and conduct feature combination experiments to find the best feature combination. The classification accuracy is above 75% in all experiments. The results can improve the existing manual method of evaluating review effectiveness, offering new methods and ideas for the optimized filtering of online reviews.
    Study on Network Information Ecological Chain of Chinese Shopping Websites
    Li Beiwei, Xu Yue, Shan Jimin, Wei Changlong, Zhang Xinqi, Fu Jinxin
    2013, 29 (9): 67-73.  DOI: 10.11925/infotech.1003-3513.2013.09.11
    Taking the information ecological chain of Chinese online shopping websites as its research object, this paper establishes an evaluation index system. Using 20 shopping websites as examples, it identifies the distribution and characteristics of their information ecological chains through factor analysis and cluster analysis, and classifies the 20 websites according to the similarity of their development status. Finally, it puts forward countermeasures and suggestions for the problems found in their development.
    Research on Microblog Ranking Strategy with the Social Relations
    Tang Xiaobo, Fang Xiaoke
    2013, 29 (9): 74-81.  DOI: 10.11925/infotech.1003-3513.2013.09.12
    The emergence of social media has changed the retrieval environment. To address the shortcomings of retrieval ranking in microblogs, this paper analyzes microblogging social network relationships and proposes a microblog ranking strategy based on social relations. Specifically, social strength is added to the traditional PageRank ranking algorithm, and related indicators such as people popularity, information popularity, information quality, and the time factor are also considered. The experimental results show that AVG has a higher accuracy and that the strategy captures more social relationships than conventional ranking algorithms.
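    One way to add social strength to PageRank is to let each user pass rank to neighbours in proportion to edge weight, as in the sketch below. It covers only that one ingredient under assumed inputs: the popularity, quality, and time indicators from the abstract are left out, and the edge dictionary layout, the damping factor 0.85, and the even redistribution of rank from users without outgoing edges are illustrative choices.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """PageRank in which rank flows along edges in proportion to social strength.

    edges: dict mapping (source, target) to a positive social strength.
    """
    nodes = sorted({node for pair in edges for node in pair})
    count = len(nodes)
    out_strength = {node: 0.0 for node in nodes}
    for (src, _), strength in edges.items():
        out_strength[src] += strength

    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_strength[node] == 0.0)
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for (src, dst), strength in edges.items():
            new_rank[dst] += damping * rank[src] * strength / out_strength[src]
        rank = new_rank
    return rank
```

    Setting every strength to the same value recovers ordinary PageRank, so the effect of the social weighting can be isolated by comparing the two runs.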
    Person Name Attribute Knowledge Mining and Its Application for Query Classification
    Zhang Mei, Duan Jianyong, Xu Jichao
    2013, 29 (9): 82-87.  DOI: 10.11925/infotech.1003-3513.2013.09.13
    Web logs contain many named entity queries, and person name queries account for more than half of them. This paper uses Web logs and Wikipedia information to construct a person name knowledge base for query recommendation. First, person name entities are mined from Web logs and combined with their attributes extracted from Wikipedia. With the help of this person name knowledge, the person names in user queries are classified using attribute patterns and statistical methods. Related attribute knowledge is then used to recommend user intents. The results show that person name knowledge can be used effectively in query classification.
    Research on Short Text Clustering Algorithm for User Generated Content
    Zhao Hui, Liu Huailiang
    2013, 29 (9): 88-92.  DOI: 10.11925/infotech.1003-3513.2013.09.14
    This paper addresses two problems: short text features in user generated content have weak semantic descriptive power, and the traditional K-means algorithm for document clustering is sensitive to the initial cluster centers. It proposes supplementing the semantic feature information of short texts through feature extension based on the concepts, link structure, and category system of Wikipedia. A weighted complex network of the short text set is then built from the semantic relations between texts, and clustering is achieved by partitioning the nodes into communities with a K-means algorithm whose initial cluster centers are chosen according to the synthetic characteristics of the network nodes. Experimental results show that the proposed algorithm improves short text clustering.
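    Choosing K-means seeds from network node characteristics might look like the sketch below. It is an assumed simplification rather than the paper's procedure: node strength (the sum of a text's cosine similarities to all others) stands in for the "synthetic characteristics" of a node, the `max_sim` cutoff that keeps seeds apart is hypothetical, and the Wikipedia-based feature extension is taken as already applied to the input vectors.

```python
def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def choose_seeds(vectors, k, max_sim=0.5):
    """Pick k seed indices: strongest nodes first, skipping near-duplicates of chosen seeds."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    strength = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: strength[i], reverse=True)
    seeds = []
    for i in order:
        if len(seeds) < k and all(sim[i][s] < max_sim for s in seeds):
            seeds.append(i)
    for i in order:  # fall back to the strongest remaining nodes if needed
        if len(seeds) < k and i not in seeds:
            seeds.append(i)
    return seeds


def kmeans(vectors, seeds, iterations=20):
    """K-means with cosine similarity, started from the given seed indices."""
    centers = [list(vectors[s]) for s in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
                  for v in vectors]
        for c in range(len(centers)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

    Because the seeds are picked deterministically from the similarity network, repeated runs give the same clustering, unlike K-means with random initialization.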
    A Research of Knowledge Sharing Community Discovery Based on Interaction History Between Peers in P2P Networks
    Gao Haiyan, Dou Yongxiang, Qi Yilan
    2013, 29 (9): 93-98.  DOI: 10.11925/infotech.1003-3513.2013.09.15
    This paper proposes a P2P community discovery method based on the interaction history of knowledge sharing. First, it studies how interaction history is generated during the knowledge sharing process, forms a user interaction network on the basis of that history, and calculates the similarities between users. Then, cluster analysis is used to discover self-organized P2P knowledge sharing communities. Finally, an experiment is designed to verify the feasibility and efficiency of the method.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn