Data Analysis and Knowledge Discovery

Semantic Pattern Mapping Between RDBMS and Linked Data Based on Open Source Software

Bai Haiyan, Liang Bing

New Technology of Library and Information Service. 2011, 27(7/8): 1-7. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.01

The conceptual models of both RDBMS and linked data are built on real-world entities, their properties, and the relationships among them, so a mapping between the two is possible. The core of semantic pattern mapping is constructing and expressing linking relationships. The mapping language of the open source software D2R supports executing SQL against an RDBMS. Through its core language elements ClassMap and PropertyBridge and their properties, it turns relationships between different entities, within the same entity, and with external datasets into RDF links.
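
The abstract describes ClassMap and PropertyBridge only in outline. As a rough illustration of the idea, and not D2R's actual mapping language (which is itself written in RDF), the Python sketch below mimics a class map that mints one subject URI per table row and property bridges that turn columns into predicates, including a foreign key turned into a link. The table, columns, base URI and vocabulary are all hypothetical.

```python
import sqlite3

# Hypothetical vocabulary and URI patterns; NOT D2R syntax and not the paper's mapping.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# ClassMap-like rule: one table -> one RDF class, with a URI pattern for subjects.
CLASS_MAP = {
    "table": "author",
    "uri_pattern": BASE + "author/{id}",
    "rdf_class": BASE + "vocab/Author",
}

# PropertyBridge-like rules: column -> predicate with a literal object.
LITERAL_BRIDGES = [("name", BASE + "vocab/name")]

# Link rules: a foreign-key column becomes a URI pointing at another entity.
LINK_BRIDGES = [("org_id", BASE + "vocab/affiliation", BASE + "org/{}")]


def map_table(conn):
    """Return N-Triples-style lines for every row of the mapped table."""
    cur = conn.execute(f"SELECT * FROM {CLASS_MAP['table']}")
    cols = [d[0] for d in cur.description]
    triples = []
    for values in cur:
        row = dict(zip(cols, values))
        subject = CLASS_MAP["uri_pattern"].format(**row)
        triples.append(f"<{subject}> <{RDF_TYPE}> <{CLASS_MAP['rdf_class']}> .")
        for col, pred in LITERAL_BRIDGES:
            # Quotes inside literals are not escaped: acceptable for a sketch only.
            triples.append(f'<{subject}> <{pred}> "{row[col]}" .')
        for col, pred, target in LINK_BRIDGES:
            triples.append(f"<{subject}> <{pred}> <{target.format(row[col])}> .")
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER, name TEXT, org_id INTEGER)")
    conn.execute("INSERT INTO author VALUES (1, 'Example Author', 7)")
    print("\n".join(map_table(conn)))
```

Each row yields a typed subject, one literal triple per bridged column, and one link triple per foreign key, which is the shape of the "linkage" the abstract refers to.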

An Analysis of Fedora CMA

Shi Hongbo, Wu Zhenxin

New Technology of Library and Information Service. 2011, 27(7/8): 8-13. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.02

This paper examines in depth the structure and application mechanism of the Fedora Content Model Architecture (CMA), as well as the scalability, flexibility, and inheritability that CMA provides. Finally, based on two cases, it offers a preliminary discussion of how to use CMA to preserve complex digital content.

Research and Implementation of Textual Similarity in Distributed Environment

Zhao Huaming

New Technology of Library and Information Service. 2011, 27(7/8): 14-20. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.03

Traditional similarity algorithms suffer from poor performance and limits on data set size when mining massive data. Taking unstructured text as its subject, this paper introduces a Hadoop-based distributed textual similarity method that combines the Hive data mining platform with a PostgreSQL RDBMS, and describes its basic technical approach, implementation, and empirical study in detail. Test results show that Hive SQL effectively reduces the complexity of distributed data mining, but its real-time performance needs improvement.
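
To make the similarity computation concrete, here is a minimal single-machine Python sketch of cosine similarity over term-frequency vectors. That the paper uses cosine similarity is an assumption; the all-pairs function is written as an inverted-index join so that it parallels how a Hive SQL self-join on a (term, doc, weight) table could spread the work. Tokenization and document IDs are illustrative.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    # Whitespace tokenization; Chinese text would need word segmentation first.
    return Counter(text.lower().split())


def cosine(va, vb):
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def all_pairs_cosine(docs):
    """docs: {doc_id: text}. Joins postings on term, as a SQL self-join would."""
    vectors = {d: tf_vector(t) for d, t in docs.items()}
    norms = {d: math.sqrt(sum(w * w for w in v.values())) for d, v in vectors.items()}
    index = defaultdict(list)  # term -> [(doc_id, weight)], i.e. a (term, doc, weight) table
    for d, v in vectors.items():
        for term, w in v.items():
            index[term].append((d, w))
    dots = defaultdict(float)  # like GROUP BY (doc1, doc2) with SUM(w1 * w2)
    for postings in index.values():
        for i, (d1, w1) in enumerate(postings):
            for d2, w2 in postings[i + 1:]:
                dots[tuple(sorted((d1, d2)))] += w1 * w2
    # Only pairs sharing at least one term appear; all others have similarity 0.
    return {pair: s / (norms[pair[0]] * norms[pair[1]]) for pair, s in dots.items()}
```

Pairs that share no term never meet in the join, which is what keeps this formulation cheaper than comparing every document with every other.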

Design and Implementation of Web System Multi-stage Distributed Caching Mechanism

Wang Ke, Zhou Qiang, Li Chunwang

New Technology of Library and Information Service. 2011, 27(7/8): 21-25. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.04

This article presents a general design for a multi-level distributed caching mechanism for Web systems and its implementation based on open source software. The scheme covers multi-granularity cache organization, cache management across multiple levels of physical storage devices, and a cache key formation mechanism, among other techniques. It then presents a cache efficiency evaluation model, covering the acceleration principles of single-machine and distributed caches, together with an efficiency test that demonstrates the validity of the scheme.
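
The sketch below illustrates two ideas named in the abstract, a deterministic cache key and a multi-level lookup, in Python. It assumes an in-process LRU as level 1 and uses a plain dict as a stand-in for a shared distributed cache at level 2; `make_key`, `TwoLevelCache` and the loader callback are hypothetical names, not the paper's design.

```python
import hashlib
from collections import OrderedDict


def make_key(namespace, **params):
    # Sort parameters so the same request always yields the same key,
    # then hash to bound key length (an assumption, not the paper's scheme).
    raw = namespace + "?" + "&".join(f"{k}={params[k]}" for k in sorted(params))
    return namespace + ":" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TwoLevelCache:
    def __init__(self, l1_capacity, l2_store):
        self.l1 = OrderedDict()      # level 1: small in-process LRU
        self.capacity = l1_capacity
        self.l2 = l2_store           # level 2: any dict-like shared store (stand-in)

    def get(self, key, loader):
        if key in self.l1:           # level-1 hit: refresh LRU position
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:           # level-2 hit: promote to level 1
            value = self.l2[key]
        else:                        # miss at both levels: go to the origin
            value = loader()
            self.l2[key] = value
        self._put_l1(key, value)
        return value

    def _put_l1(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.capacity:
            self.l1.popitem(last=False)  # evict least recently used entry
```

Only a miss at both levels reaches the origin (for example a database), so the hit rates of the two levels together determine how much the scheme accelerates a request.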

Targeted Websites Distributed and Precise Harvest System for Network Monitoring Technology

Xie Jing, Qu Yunpeng, Liu Jianhua

New Technology of Library and Information Service. 2011, 27(7/8): 26-31. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.05

After analyzing existing open source crawling frameworks, the authors design and develop an accurate acquisition system based on Crawler4j that meets the requirements for real-time monitoring and accurate collection of resources. The paper describes the design and implementation of the system.

Building a Virtual Book Platform by MegaZine 3

Wei Chengfu, Nie Hua

New Technology of Library and Information Service. 2011, 27(7/8): 32-36. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.06

Special collections are what distinguish each library from others and allow it to stand on its own. Virtual books can present a library's special collections online simply, intuitively, and realistically, effectively supplementing traditional file browsing. To let readers appreciate its special collections online, Peking University Library designed and implemented a virtual book platform with MegaZine 3. Testing shows that MegaZine 3 is a useful and effective tool for presenting special collections online.

Usage Statistics of Institutional Repository Based on Faceted Search Engine Solr

Yao Xiaona, Zhu Zhongming

New Technology of Library and Information Service. 2011, 27(7/8): 37-40. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.07

This paper uses Solr to improve usage statistics in the Chinese Academy of Sciences Institutional Repository. Results show that the improved system responds quickly even on massive data.

Research on the Sensitivity and Specificity of Search Engines

Zhang Liyi, Chen Mingying

New Technology of Library and Information Service. 2011, 27(7/8): 41-46. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.08

This paper analyzes evaluation indexes for Web search engines using epidemiological screening theory without a gold standard. User experience scores and user judgments serve as prior information for Bayesian estimation, and Markov Chain Monte Carlo (MCMC) techniques are then used to estimate the sensitivity, specificity, and detection rate of Baidu and Google (Simplified Chinese).
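
For reference, the quantities being estimated have simple definitions when a gold standard is available; the paper's contribution is estimating them without one via MCMC, which this sketch does not attempt. The Python below gives the textbook definitions, read in retrieval terms, plus a conjugate Beta update showing one simple way prior information enters a Bayesian estimate. The counts and prior parameters are placeholders.

```python
def sensitivity(tp, fn):
    # Share of truly relevant items the engine returns: TP / (TP + FN).
    return tp / (tp + fn)


def specificity(tn, fp):
    # Share of truly irrelevant items the engine correctly leaves out: TN / (TN + FP).
    return tn / (tn + fp)


def beta_posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    # Conjugate Beta-Binomial update. prior_a / prior_b are where prior information
    # (for example, derived from user ratings) could enter; that mapping is an assumption.
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)
```

For example, `beta_posterior_mean(tp, fn, a, b)` shrinks the raw sensitivity `tp / (tp + fn)` toward the prior mean `a / (a + b)`, with the prior's influence fading as counts grow.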

Research on PostgreSQL-based TMX Storage and Implementation of Corpus Retrieval Platform

Dong Gui

New Technology of Library and Information Service. 2011, 27(7/8): 47-55. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.09

This paper first analyzes the architecture and limitations of current corpus retrieval systems. It then studies a TMX-based storage structure and the corresponding matching algorithm, and finally describes the functions of the system. The aim is to explore ways for corpus retrieval systems to process corpora at a deeper level and to demonstrate the feasibility of this approach.

A Cloud Storage Model Based on P2P

Wang Yamin, Liu Xiaowei, Han Xueling

New Technology of Library and Information Service. 2011, 27(7/8): 56-61. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.10

After analyzing the problems of current cloud storage, this paper presents a new P2P-based cloud storage model. The model uses the Chord algorithm to manage nodes and distribute client requests, which avoids problems of centralized architectures such as a single point of failure (SPOF) and performance bottlenecks, and achieves load balancing. The model uses storage clusters to manage user data, which simplifies system management. A replica management strategy is also applied, giving better scalability, fault tolerance, and performance.
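
A minimal Python sketch of the Chord placement rule: node names and keys are hashed onto one identifier circle, each key belongs to its successor node, and replicas go to the following nodes clockwise. It deliberately omits finger tables and the join/stabilization protocol, which give real Chord its O(log N) lookups and churn handling; the 16-bit identifier space and the successor-list replica scheme are assumptions, not details from the paper.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier circle has 2**M positions (assumption for the sketch)


def chord_id(name):
    # SHA-1 hash reduced to an M-bit identifier.
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    def __init__(self, nodes):
        # Nodes sorted by their position on the identifier circle.
        self.nodes = sorted((chord_id(n), n) for n in nodes)
        self.ids = [nid for nid, _ in self.nodes]

    def successor(self, key):
        # A key belongs to the first node whose id is >= the key id, wrapping past zero.
        i = bisect_left(self.ids, chord_id(key))
        return self.nodes[i % len(self.nodes)][1]

    def replicas(self, key, r):
        # Copies on the successor and the next r-1 nodes clockwise.
        i = bisect_left(self.ids, chord_id(key))
        return [self.nodes[(i + j) % len(self.nodes)][1] for j in range(min(r, len(self.nodes)))]
```

Because placement depends only on hashes, any node can compute where a key lives without consulting a central directory, which is the property that removes the single point of failure.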

Design and Implementation of Visual Co-word and Cluster Analyzer

Xing Meifeng, Xu Deshan

New Technology of Library and Information Service. 2011, 27(7/8): 62-67. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.11

Based on an analysis of the strengths and weaknesses of existing bibliometric software, and of the goals and workflow of research that uses bibliometric methods, this paper builds several bibliographic entry dictionaries, merges and corrects keywords, and integrates statistics, co-word analysis, and clustering into a single process. It then designs and implements a visual co-word and cluster analyzer.
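
The core count behind co-word analysis can be sketched in a few lines of Python: after normalizing keywords (here via a hypothetical synonym dictionary standing in for the paper's entry dictionaries), count how many papers share each keyword pair. The resulting pair counts are the input that a clustering step would then work on.

```python
from collections import Counter
from itertools import combinations


def co_word_matrix(records, synonyms=None):
    """records: one keyword list per paper. Returns (kw1, kw2) -> number of papers with both."""
    synonyms = synonyms or {}  # hypothetical variant -> preferred form (lowercase keys)
    counts = Counter()
    for keywords in records:
        # Normalize case, merge variants through the synonym dictionary, de-duplicate per paper.
        terms = sorted({synonyms.get(k.strip().lower(), k.strip().lower()) for k in keywords})
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts
```

Merging variants before counting matters: without the dictionary, "LOD" and "linked data" would split one co-occurrence signal across two rows of the matrix.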

N-gram Based on Cluster Label Extracting Algorithm for English Paper

Wu Suhui, Cheng Ying, Zheng Yanning, Pan Yuntao

New Technology of Library and Information Service. 2011, 27(7/8): 68-75. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.12

This paper proposes a novel N-gram-based algorithm for extracting cluster labels for English papers. Before clustering, the algorithm uses N-grams to build a list of domain phrases through prior learning on a large-scale corpus; it then clusters the English papers with the K-means algorithm. Finally, the highest-scoring N-gram terms in each cluster are extracted as its label. When scores are calculated, a term that appears in the domain phrase list receives double weight. Experimental results show that this improves the quality of cluster labels. In addition, an improved TFIDF calculation method is developed, and a new R@N method for evaluating cluster labels is proposed.
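
A minimal sketch of the labeling step in Python, assuming cluster membership is already known (for example from K-means, not shown). For simplicity, candidates are scored by raw n-gram frequency rather than the paper's improved TFIDF; the double weight for terms in the domain phrase list follows the abstract. Stopword removal is omitted, so on real text common words would need filtering.

```python
from collections import Counter


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cluster_label(docs, domain_phrases, max_n=3, top_k=1):
    """docs: texts already assigned to one cluster. Returns the top_k candidate labels."""
    scores = Counter()
    for text in docs:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            scores.update(ngrams(tokens, n))  # raw frequency (simplified score)
    # Candidates found in the domain phrase list get double weight.
    for term in scores:
        if term in domain_phrases:
            scores[term] *= 2
    return [term for term, _ in scores.most_common(top_k)]
```

Here a multi-word domain phrase such as "topic model" can outrank its equally frequent component words, which is the effect the double weighting is meant to have on label quality.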

Text Feature Selection Method Based on Particle Swarm Optimization

Lu Yonghe, Cao Lichao

New Technology of Library and Information Service. 2011, 27(7/8): 76-81. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.13

Considering the overall effect of text features on categorization results, this paper proposes a text feature selection method based on particle swarm optimization (PSOTFS), which uses the PSO algorithm to mine text feature selection rules. PSOTFS first preselects text features with CHI, then uses PSO to select features precisely from the preselected set. Each particle represents a feature selection rule, and the set of rules corresponds to the particle swarm. Classification precision serves as the fitness function, and grouping is used to reduce the dimensionality of the particles. Experimental results show that PSOTFS achieves better text categorization than CHI, information gain, document frequency, and mutual information.
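
PSOTFS starts from a CHI (chi-square) preselection. A sketch of that first step in Python follows, using the standard 2x2 contingency form of the statistic; the PSO search the paper layers on top is not shown. Documents are modeled as token sets, and the class names in any example are illustrative.

```python
def chi_square(a, b, c, d):
    """CHI statistic for term t and class c from a 2x2 table of document counts:
    a = in class, contains t;   b = not in class, contains t;
    c = in class, lacks t;      d = not in class, lacks t.
    chi2 = N * (a*d - c*b)**2 / ((a+c) * (b+d) * (a+b) * (c+d))
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def chi_preselect(docs, labels, target, k):
    """Return the k terms with the highest CHI score for class `target`.
    docs: list of token sets; labels: class label per document."""
    vocab = set().union(*docs)
    scored = []
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != target)
        c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == target)
        d = len(docs) - a - b - c
        scored.append((chi_square(a, b, c, d), term))
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]
```

Note that CHI is symmetric in the sign of association: a term that strongly indicates the *other* class scores as high as one that indicates the target class, which is one reason a second, precision-driven search stage can help.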

Approximately Duplicate Data Cleaning Algorithm Based on Improved Edit Distance

Ye Huanzhuo, Wu Di

New Technology of Library and Information Service. 2011, 27(7/8): 82-90. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.14

Similarity calculation is a key issue in cleaning approximately duplicate data, and the edit distance algorithm is widely used for this purpose. Starting from the traditional edit distance algorithm and analyzing factors that affect similarity results, such as sequence length and synonyms, this paper proposes an improved approximately duplicate data cleaning algorithm based on semantic edit distance. The algorithm uses a synonym thesaurus and a normalized distance metric, and can be applied to identifying similar records. Experimental results show that similarities computed by the improved algorithm agree better with sentence semantics and human intuition, so the method effectively improves the accuracy and precision of detecting approximately duplicate data.
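
The sketch below shows one way to combine the two ingredients named in the abstract, a synonym thesaurus and a normalized distance, in Python: a token-level Levenshtein distance in which synonymous tokens match at zero cost, divided by the longer sequence length. The word-level granularity, the dictionary format and the normalization by maximum length are assumptions; the paper's exact weighting may differ.

```python
def semantic_edit_distance(s, t, synonyms):
    """Token-level Levenshtein distance; synonyms (lowercase word -> set of words) match at zero cost."""
    def same(x, y):
        return x == y or y in synonyms.get(x, set()) or x in synonyms.get(y, set())

    m, n = len(s), len(t)
    prev = list(range(n + 1))  # distances from the empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if same(s[i - 1], t[j - 1]) else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or (synonym) match
        prev = cur
    return prev[n]


def similarity(a, b, synonyms):
    """1 - distance / length of the longer record, so the result lies in [0, 1]."""
    s, t = a.lower().split(), b.lower().split()
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1 - semantic_edit_distance(s, t, synonyms) / longest
```

Normalizing by length keeps one differing word from counting the same in a two-word record as in a twenty-word one, which is the sequence-length effect the abstract mentions.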

Information Retrieval System Based on Negative Association Rules and Frequent Itemsets Mining

Huang Mingxuan, Yu Ru

New Technology of Library and Information Service. 2011, 27(7/8): 91-96. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.15

This paper proposes a novel information retrieval system model based on negative association rules and frequent itemset mining, and explains its design concept and the function of each module. Key techniques for implementing the model and its search algorithm are also described. Experimental results show that the proposed model effectively improves information retrieval performance.
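
The quantities behind negative association rules can be sketched in Python. The key identity is supp(A ∧ ¬B) = supp(A) − supp(A ∧ B), so a negative rule's support and confidence follow directly from itemset supports. Itemsets are enumerated by brute force for clarity (Apriori or FP-growth would prune candidates), and how the rules feed into retrieval is not modeled here.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions (sets of items) containing every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def frequent_itemsets(transactions, min_sup, max_size=2):
    # Brute-force enumeration for clarity; real miners prune infrequent candidates.
    vocab = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            s = support(transactions, combo)
            if s >= min_sup:
                found[frozenset(combo)] = s
    return found


def negative_rule(transactions, a, b):
    """Support and confidence of A => not B, via supp(A, not B) = supp(A) - supp(A and B)."""
    sa = support(transactions, a)
    s_neg = sa - support(transactions, set(a) | set(b))
    return s_neg, (s_neg / sa if sa else 0.0)
```

A rule such as {rdf} ⇒ ¬{sparql} with high confidence says that documents containing the first term rarely contain the second, the kind of signal a retrieval model could use to down-weight or exclude terms.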

Design and Implementation of Semantic-based Sentiment Mining System

Li Gang, Wang Zhongyi

New Technology of Library and Information Service. 2011, 27(7/8): 97-103. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.16

Because of the complexity of natural language, sentiment mining still faces problems such as the domain dependence of sentiment words, recognition of implicit features, synonym recognition, and calculation of the sentiment strength of features. To address these problems, this paper proposes a sentiment mining method based on topic maps. By making full use of the semantic relationships between feature words and sentiment words, the method improves the accuracy of sentiment mining to a certain extent.

Topic Evolution Based on Seminal Document and Topic Model

Shan Bin, Li Fang

New Technology of Library and Information Service. 2011, 27(7/8): 104-109. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.17

This paper presents a new method for automatically inferring LDA topic evolution based on seminal documents. The semantic distribution of the seminal documents is used to guide the model for successive time slices and to link topics between consecutive slices. Experiments on the NIPS dataset and on Chinese news reports about the NPC and CPPCC show that the method not only finds correct evolutions in various forms but also avoids linking related topics that have no evolutionary relationship.

Study on Testing and Improving Nonlinear Evaluation Methods for Academic Journals

Yu Liping, Pan Yuntao, Wu Yishan

New Technology of Library and Information Service. 2011, 27(7/8): 110-115. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.18

In nonlinear evaluation, an anomaly sometimes occurs: the final evaluation score decreases even though the value of a component indicator increases. The paper proposes regression adjustment, a new testing and improvement method, to resolve this anomaly.

Research on Duplicated Literature Deletion Method Based on Cross-database Search

Hao Dan, Zhou Jinhui, Guan Bei, Wang Yanxi, Han Jixin

New Technology of Library and Information Service. 2011, 27(7/8): 116-120. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.19

Set against the task of compiling publication statistics by author and affiliation, this paper analyzes the specific causes of data redundancy in cross-database searching. It proposes four duplicate removal methods, based on a cross-Chinese-database ID, a cross-English-database ID, the DOI, and a “Title & Type” key. Applied in literature statistics work, these methods effectively resolve redundancy between different databases.
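
A rough Python sketch of layered duplicate removal: a record is dropped if it matches an earlier one on normalized DOI or on a normalized “Title & Type” key. The cross-database ID rules from the paper are not modeled, since the abstract does not describe how those IDs are built; the normalization choices and the record fields are assumptions.

```python
import re


def normalize_title(title):
    # Lowercase and drop punctuation/whitespace so formatting differences don't block a match.
    # In Python 3, \w covers Unicode letters, so Chinese titles are kept intact.
    return re.sub(r"[\W_]+", "", title.lower())


def dedupe(records):
    """records: dicts with 'title', optional 'doi' and 'type'. Keeps the first of each duplicate group."""
    seen = set()
    unique = []
    for rec in records:
        keys = []
        if rec.get("doi"):
            keys.append(("doi", rec["doi"].strip().lower()))  # DOIs are case-insensitive
        keys.append(("title_type", normalize_title(rec["title"]), rec.get("type", "").lower()))
        if any(k in seen for k in keys):
            continue  # matches an earlier record on DOI or on title + type
        seen.update(keys)
        unique.append(rec)
    return unique
```

Checking DOI first catches records whose titles were transcribed differently across databases, while the title-and-type key covers records, common in Chinese databases, that carry no DOI at all.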

Application of 3D Virtual Browsing Technology in Digital Library Construction Based on Virtools——3D Books Navigation System of Capital Normal University Library

Wang Shuo

New Technology of Library and Information Service. 2011, 27(7/8): 121-126. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.20

Taking the 3D virtual book navigation system of Capital Normal University Library as an example, the paper presents an application of virtual book navigation in China based on 3ds Max and Virtools. It mainly discusses how to create 3D models and achieve interactivity when users access the system via the Web OPAC or a URL. The system implements virtual book search and path navigation, real-time messaging, multimedia sharing, and a realistic virtual library walkthrough scene.

Design and Implementation of Library Bibliography Information Self SMS Push Service

Zhou Hong, Zhang Bei, Jiang Airong, Zhang Chengyu

New Technology of Library and Information Service. 2011, 27(7/8): 127-131. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.21

To serve patrons better with new technologies, Tsinghua University Library offers a self-service SMS push service for library bibliographic information. The service is built on information extraction from the OPAC, collection of patrons' mobile phone numbers through a self-built Web page, construction of a structured database, and the database synchronization feature of the “Qixintong” SMS system.

MELINETSⅡ Design and Implementation of Group Acceptance Inspection of Interviews ——Take Guangxi University Library for Example

Tang Xiaoxin

New Technology of Library and Information Service. 2011, 27(7/8): 132-136. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.22

A group acceptance function module is added to the library acquisitions system to avoid tedious steps in the book acceptance procedure. It speeds up acceptance and meets the needs of library outsourcing services. The paper presents the design ideas and process in detail and introduces the key technologies and solutions.

25 August 2011, Volume 27 Issue 7