Data Analysis and Knowledge Discovery

The Interoperability Needs and Standards Framework for Institutional Repositories

Liang Na, Zhang Xiaolin

New Technology of Library and Information Service. 2013, 29(9): 1-7. https://doi.org/10.11925/infotech.1003-3513.2013.09.01

The paper describes three use scenarios for Institutional Repositories (IR): knowledge management, knowledge services, and e-Research & e-Learning. It emphasizes the need to consider technical, semantic, and management interoperability from the viewpoints of multiple stakeholders, constructs a needs framework for interoperability, and systematically introduces the basic, extended, and management standards already in place or under development.

Knowledge Organization Tool Catering to Service: Today and Future

Xie Jing, Qian Aibing, Han Pu, Su Xinning

New Technology of Library and Information Service. 2013, 29(9): 8-14. https://doi.org/10.11925/infotech.1003-3513.2013.09.02

From the perspective of knowledge service, this paper divides knowledge organization tools into three groups: tools for basic knowledge acquisition and systematization, tools for establishing knowledge relationships, and tools for knowledge processing and visualization. Tools in the first group provide push services for knowledge elements. Tools in the second group mainly identify knowledge relationships and, together with the first group, support inference services. Tools in the third group are used for knowledge extraction, identification, and visualization, after which they provide user-oriented services through knowledge reorganization. Finally, the paper discusses future trends in knowledge organization tools and points out the characteristics of future tools.

Linking and Mapping of Library Catalogue Data Based on MapReduce

Yu Wei, Chen Junpeng

New Technology of Library and Information Service. 2013, 29(9): 15-22. https://doi.org/10.11925/infotech.1003-3513.2013.09.03

In this paper, MARC data is transformed into linked data based on the MapReduce model and the MODS Ontology. Through mapping among different linked open data sets, library catalogue data can become part of the linked open data community and provide efficient semantic data for knowledge discovery and semantic services.

Decoding Optimization in Tree Transducer based Translation Model

Shi Chongde, Qiao Xiaodong, Wang Huilin

New Technology of Library and Information Service. 2013, 29(9): 23-29. https://doi.org/10.11925/infotech.1003-3513.2013.09.04

This paper proposes two methods to improve the efficiency of rule binarization and decoding in tree-transducer-based translation models. The authors convert synchronous transducer rules into four kinds of binary rules to reduce the number of temporary items, and propose the RR-CKY decoding algorithm, which avoids some redundant items during decoding. Experiments show that these two methods reduce the number of temporary items and speed up decoding. They also improve the quality of machine translation.

Study on Keyword Extraction Using Word Position Weighted TextRank

Xia Tian

New Technology of Library and Information Service. 2013, 29(9): 30-34. https://doi.org/10.11925/infotech.1003-3513.2013.09.05

Keyword extraction is treated as a word importance ranking problem. In this paper, a candidate keyword graph is constructed based on TextRank, and the influences of word coverage, location, and frequency are used to calculate the probability transition matrix. Word scores are then calculated iteratively, and the top N candidate keywords are selected as the final results. Experimental results show that the proposed word position weighted TextRank method outperforms the traditional TextRank method and the LDA topic model method.
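
The ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it models only the location influence, through a hypothetical `position_weight` dictionary (for example, a larger weight for words that appear in a title), and omits the coverage and frequency influences. The function names, window size, and damping factor are assumptions.

```python
from collections import defaultdict


def position_weighted_textrank(words, position_weight, window=2,
                               damping=0.85, iterations=50):
    """Score words on a co-occurrence graph with position-biased transitions."""
    # Build an undirected co-occurrence graph over a sliding window.
    neighbours = defaultdict(set)
    for i, w in enumerate(words):
        for v in words[i + 1: i + window + 1]:
            if v != w:
                neighbours[w].add(v)
                neighbours[v].add(w)

    vocab = sorted(neighbours)
    score = {w: 1.0 / len(vocab) for w in vocab}

    for _ in range(iterations):
        new_score = {}
        for w in vocab:
            incoming = 0.0
            for u in neighbours[w]:
                # The probability of moving from u to w is proportional to
                # w's position weight among all neighbours of u.
                total = sum(position_weight.get(x, 1.0) for x in neighbours[u])
                incoming += score[u] * position_weight.get(w, 1.0) / total
            new_score[w] = (1 - damping) / len(vocab) + damping * incoming
        score = new_score
    return score


def top_keywords(words, position_weight, n=3):
    """Return the n highest-scoring words (ties broken alphabetically)."""
    scores = position_weighted_textrank(words, position_weight)
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

With an empty `position_weight` every neighbour is equally likely, which reduces the sketch to plain TextRank on an unweighted graph; raising a word's weight pulls more of the score of its neighbours toward it.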

Identifying Synonyms Based on Sentence Structure Analysis

Yu Juan, Yin Jidong, Fei Shu

New Technology of Library and Information Service. 2013, 29(9): 35-40. https://doi.org/10.11925/infotech.1003-3513.2013.09.06

A new method of identifying synonyms is proposed to reduce the deviation in calculating the semantic similarity between two different terms or phrases. The method first analyzes the sentence structures of the terms (or phrases) concerned, and then calculates the semantic similarity between two terms (or phrases) based on Tongyici Cilin (a Chinese thesaurus). The method weights each word in the terms (or phrases) equally to reduce the identification errors made by gravity-centre-backward methods. Experiments show that the proposed method of identifying synonyms is accurate and has good potential for text mining and semantic retrieval applications.

Fast Duplicate Detection for Chinese Texts Based on Semantic Fingerprint

Li Gang, Mao Jin, Chen Jinghao

New Technology of Library and Information Service. 2013, 29(9): 41-47. https://doi.org/10.11925/infotech.1003-3513.2013.09.07

For Chinese texts, text features are first extracted and semantic fingerprints are generated with the Simhash algorithm. The Hamming distances between semantic fingerprints are used to determine the similarity between texts. Then, as the last step of duplicate detection, the Single-Pass clustering algorithm clusters the generated semantic fingerprints, and the resulting clusters are the final results. In a comparison with the Shingle algorithm, the experiment shows that the Simhash approach is superior in both precision and robustness, and that it can process large volumes of text because of its speed.
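
The three steps named in this abstract (Simhash fingerprinting, Hamming distance, Single-Pass clustering) can be sketched as follows. This is an illustrative sketch only, not the paper's system: feature extraction for Chinese text is assumed to have happened already and to yield a `{feature: weight}` dictionary, and the 64-bit width, the MD5-based feature hash, and the distance threshold are all assumptions.

```python
import hashlib

BITS = 64  # assumed fingerprint width


def simhash(features):
    """Build a fingerprint from a {feature: weight} dict by weighted bit voting."""
    votes = [0.0] * BITS
    for feature, weight in features.items():
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the feature
        for i in range(BITS):
            # Each feature votes +weight for a 1 bit and -weight for a 0 bit.
            votes[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def single_pass_cluster(fingerprints, threshold=3):
    """Assign each fingerprint to the first cluster whose first member is close."""
    clusters = []  # each cluster is a list of indices into `fingerprints`
    for idx, fp in enumerate(fingerprints):
        for cluster in clusters:
            if hamming_distance(fingerprints[cluster[0]], fp) <= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])  # no close cluster found: start a new one
    return clusters
```

Because each text is compared only against one representative per existing cluster, the clustering makes a single pass over the data, at the cost of depending on input order.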

Authorship Identification of Chinese UGC Based on Stylistics

Lv Yingjie, Fan Jing, Liu Jingfang

New Technology of Library and Information Service. 2013, 29(9): 48-53. https://doi.org/10.11925/infotech.1003-3513.2013.09.08

Characteristics of networked information such as openness and virtuality make authorship identification difficult. This paper therefore proposes an approach to authorship identification of Chinese UGC based on stylistics. The authors integrate four types of features (lexical, syntactic, structural, and content-specific) into writing-style features, and then use text classification techniques for authorship identification. The experimental results demonstrate that the proposed approach can be used efficiently for authorship identification of Chinese UGC.

An Automatic Term Extraction System of Improved C-value Based on Effective Word Frequency

Xiong Liyan, Tan Long, Zhong Maosheng

New Technology of Library and Information Service. 2013, 29(9): 54-59. https://doi.org/10.11925/infotech.1003-3513.2013.09.09

Existing automatic term extraction methods for Chinese focus on the high-frequency characteristics and unithood indicators of terms, while low-frequency terms and termhood indicators lack effective treatment. In response, this paper introduces a background corpus into the C-value method and proposes the concepts of word field distribution degree and effective word frequency. The paper then extracts terms automatically by calculating the EC-value (Effective C-value) of candidate terms, and improves the extraction of low-frequency terms by combining this with term cluster recognition and mining. A term extraction experiment in the computer field shows that the proposed EC-value method measures the termhood of terms more effectively and improves the extraction performance for low-frequency terms.
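
For orientation, the baseline C-value score that the paper builds on can be sketched as below. This shows only the commonly stated C-value formula; the EC-value, word field distribution degree, and effective word frequency are not defined in the abstract and are not reproduced here. Representing candidate terms as tuples of words is an assumption of this sketch.

```python
import math


def c_value(freq):
    """Compute baseline C-value scores.

    `freq` maps each candidate term (a tuple of words) to its corpus frequency.
    """
    def contains(longer, shorter):
        # True if `shorter` occurs as a contiguous sub-sequence of `longer`.
        n = len(shorter)
        return len(longer) > n and any(
            longer[i: i + n] == shorter for i in range(len(longer) - n + 1)
        )

    scores = {}
    for term, f in freq.items():
        containers = [other for other in freq if contains(other, term)]
        if containers:
            # Nested term: subtract the mean frequency of the longer
            # candidates that contain it.
            nested = sum(freq[o] for o in containers) / len(containers)
            scores[term] = math.log2(len(term)) * (f - nested)
        else:
            scores[term] = math.log2(len(term)) * f
    return scores
```

The length factor `log2(len(term))` is zero for single-word candidates, which is one reason the baseline favours multi-word terms.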

Research on the Credibility of Online Chinese Product Reviews

Meng Meiren, Ding Shengchun

New Technology of Library and Information Service. 2013, 29(9): 60-66. https://doi.org/10.11925/infotech.1003-3513.2013.09.10

This paper aims to filter out low-credibility online Chinese product reviews so that consumers are offered valuable reviews for their purchase decisions. Based on an in-depth analysis of the characteristics of online Chinese product reviews and on related work, the authors conduct an empirical analysis of credibility factors through questionnaires. According to the results, the authors select content integrity, emotional balance, review timeliness, and clarity of the publisher's identity as four features, use CRFs as the classification model for review credibility, and conduct feature combination experiments to find the best feature combination. The experiments achieve significant results, and the accuracy of the classification model is above 75% in all cases. The results can improve the existing manual effectiveness evaluation method, offering new methods and ideas for optimized filtering of online reviews.

Study on Network Information Ecological Chain of Chinese Shopping Websites

Li Beiwei, Xu Yue, Shan Jimin, Wei Changlong, Zhang Xinqi, Fu Jinxin

New Technology of Library and Information Service. 2013, 29(9): 67-73. https://doi.org/10.11925/infotech.1003-3513.2013.09.11

Taking the information ecological chain of Chinese online shopping websites as the research object, this paper establishes an evaluation index system. Using 20 shopping websites as examples, it characterizes the distribution and features of the information ecological chain of Chinese online shopping websites through factor analysis and cluster analysis, and classifies the 20 websites according to the similarity of their development. Finally, it puts forward countermeasures and suggestions for the problems found in their development.

Research on Microblog Ranking Strategy with the Social Relations

Tang Xiaobo, Fang Xiaoke

New Technology of Library and Information Service. 2013, 29(9): 74-81. https://doi.org/10.11925/infotech.1003-3513.2013.09.12

The emergence of social media has changed the retrieval environment. To address the shortcomings of retrieval ranking in microblogs, this paper analyzes social network relationships in microblogging and proposes a microblog ranking strategy that incorporates social relations. Specifically, social strength is added to the traditional PageRank ranking algorithm, and related indicators including people popularity, information popularity, information quality, and the time factor are considered. The experimental results show that AVG has a higher accuracy and can obtain more social relationships than the conventional ranking algorithm.
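
One way to read "adding social strength to PageRank" is as a PageRank whose transitions are weighted by tie strength. The sketch below illustrates that reading only; it is not the paper's algorithm. The `strength` structure, the function name, and the damping factor are hypothetical, and the popularity, quality, and time indicators mentioned in the abstract are not modelled.

```python
def social_pagerank(strength, damping=0.85, iterations=100):
    """Rank users by PageRank with strength-weighted transitions.

    `strength[u][v]` is a hypothetical positive tie strength from user u to v.
    """
    nodes = sorted(set(strength) | {v for out in strength.values() for v in out})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(iterations):
        new_rank = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            out = strength.get(u, {})
            total = sum(out.values())
            if total == 0:
                # A user with no outgoing ties spreads rank evenly.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
            else:
                # Rank flows along each tie in proportion to its strength.
                for v, w in out.items():
                    new_rank[v] += damping * rank[u] * w / total
        rank = new_rank
    return rank
```

Setting every strength to the same value recovers ordinary PageRank, so the effect of social strength can be isolated by comparing the two rankings on the same graph.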

Person Name Attribute Knowledge Mining and Its Application for Query Classification

Zhang Mei, Duan Jianyong, Xu Jichao

New Technology of Library and Information Service. 2013, 29(9): 82-87. https://doi.org/10.11925/infotech.1003-3513.2013.09.13

Web logs contain many named entity queries, and person name queries account for more than half of them. This paper uses Web logs and Wikipedia information to construct a person name knowledge base for query recommendation. First, person name entities are mined from Web logs, and their attributes are extracted from Wikipedia and combined. With the help of this person name knowledge, the person names in user queries are classified by attribute patterns and statistical methods. Related attribute knowledge is then used to recommend user intents. The results show that person name knowledge can be used effectively in query classification.

Research on Short Text Clustering Algorithm for User Generated Content

Zhao Hui, Liu Huailiang

New Technology of Library and Information Service. 2013, 29(9): 88-92. https://doi.org/10.11925/infotech.1003-3513.2013.09.14

Short text features in user generated content have weak semantic description ability, and the traditional K-means algorithm for document clustering is sensitive to the initial cluster centers. To address these problems, this paper proposes supplementing the semantic feature information of short texts through feature extension based on the concepts, link structure, and category system of Wikipedia. A weighted complex network of the short text set is then built from the semantic relations between texts, and text clustering is achieved by partitioning nodes into communities with a K-means algorithm whose initial cluster centers are chosen according to the synthetic characteristics of network nodes. Experimental results show that the proposed algorithm improves short text clustering.

A Research of Knowledge Sharing Community Discovery Based on Interaction History Between Peers in P2P Networks

Gao Haiyan, Dou Yongxiang, Qi Yilan

New Technology of Library and Information Service. 2013, 29(9): 93-98. https://doi.org/10.11925/infotech.1003-3513.2013.09.15

In this paper, a P2P community discovery method based on the interaction history of knowledge sharing is proposed. First, the generation of interaction history during the knowledge sharing process is studied, a user interaction network is formed on the basis of the interaction history, and the similarities between users are calculated. Then, a clustering analysis approach is used to discover self-organized P2P knowledge sharing communities. Finally, an experiment is designed to verify the feasibility and efficiency of the method.

25 September 2013, Volume 29 Issue 9