
25 August 2011, Volume 27 Issue 7
    

  • Bai Haiyan, Liang Bing
    New Technology of Library and Information Service. 2011, 27(7/8): 1-7. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.01
    The conceptual models of RDBMS and linked data are both built on real-world entities, properties, and their relationships, so it is possible to build a mapping between them. The core of semantic pattern mapping is to construct and express linking relationships. The mapping language of the open source software D2R supports executing SQL against the RDBMS, and it transforms relationships between different entities, within the same entity, and with external data sets into RDF links through the core language elements ClassMap and PropertyBridge and their properties.
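    As a rough illustration of the mapping idea (not D2R's actual mapping language or API), the Python sketch below turns rows of a relational table into RDF triples. One step mints a typed resource per row, loosely in the spirit of ClassMap; the other steps emit a literal property and a foreign-key link, loosely in the spirit of PropertyBridge. The `paper` table, its columns, and the `http://example.org/` namespace are hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not from the paper
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(conn):
    """Map rows of a hypothetical `paper` table to N-Triples-style strings."""
    triples = []
    rows = conn.execute("SELECT id, title, author_id FROM paper ORDER BY id")
    for paper_id, title, author_id in rows:
        subject = f"<{BASE}paper/{paper_id}>"
        # Class-map-like step: each row becomes a resource typed as Paper.
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}Paper> .")
        # Property-bridge-like step: a column becomes a literal property.
        # (No literal escaping is done here; this is only a sketch.)
        triples.append(f'{subject} <{BASE}title> "{title}" .')
        # A foreign key becomes a link between two entities.
        triples.append(f"{subject} <{BASE}author> <{BASE}author/{author_id}> .")
    return triples


def demo():
    """Build a one-row in-memory database and map it to triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE paper (id INTEGER, title TEXT, author_id INTEGER)")
    conn.execute("INSERT INTO paper VALUES (1, 'Linked data mapping', 7)")
    return rows_to_triples(conn)
```

    Calling `demo()` yields three triples for the single row: a type statement, a title literal, and an author link.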
  • Shi Hongbo, Wu Zhenxin
    New Technology of Library and Information Service. 2011, 27(7/8): 8-13. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.02
    This paper examines the structure and application mechanism of the content model architecture (CMA), as well as the scalability, flexibility, and inheritability that CMA offers. Finally, based on two cases, it gives a preliminary discussion of how to use CMA to preserve complex digital content.
  • Zhao Huaming
    New Technology of Library and Information Service. 2011, 27(7/8): 14-20. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.03
    To address the performance problems and data-set size limits that traditional similarity algorithms face in mass-data mining, this paper takes unstructured textual data as its research subject and introduces a Hadoop-based distributed textual similarity algorithm that combines the Hive data mining platform with a PostgreSQL relational database. It describes the basic technical ideas, the implementation, and the empirical study in detail. The test results show that Hive SQL can effectively reduce the complexity of distributed data mining, but its real-time performance needs improvement.
  • Wang Ke, Zhou Qiang, Li Chunwang
    New Technology of Library and Information Service. 2011, 27(7/8): 21-25. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.04
    This article introduces a design scheme for a general-purpose, multi-stage distributed caching mechanism for Web systems, together with an implementation based on open source software. The scheme includes multi-granularity cache organization, cache management across multiple levels of physical storage devices, a cache key formation mechanism, and other techniques. It then presents a cache efficiency evaluation model covering single-machine and distributed cache acceleration principles, and an efficiency test that verifies the validity of the scheme.
  • Xie Jing, Qu Yunpeng, Liu Jianhua
    New Technology of Library and Information Service. 2011, 27(7/8): 26-31. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.05
    By analyzing existing open source collection frameworks, an accurate acquisition system is designed and developed based on Crawler4j. The system meets the requirements for real-time monitoring of resource collection and for accuracy. The paper introduces the design and implementation of the system.
  • Wei Chengfu, Nie Hua
    New Technology of Library and Information Service. 2011, 27(7/8): 32-36. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.06
    Special collections are what distinguish each library from others and give it an independent identity. Virtual books can present a library's special collections online simply, intuitively, and realistically, and are an effective supplement to traditional file browsing. To enable readers to appreciate its special collections online, Peking University Library designed and implemented a virtual book platform with MegaZine 3. Testing shows that MegaZine 3 is a useful and effective tool for presenting special collections online.
  • Yao Xiaona, Zhu Zhongming
    New Technology of Library and Information Service. 2011, 27(7/8): 37-40. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.07
    This paper adopts Solr to improve the usage statistics of the Chinese Academy of Sciences Institutional Repository. The results show that the improved system responds quickly even on massive data.
  • Zhang Liyi, Chen Mingying
    New Technology of Library and Information Service. 2011, 27(7/8): 41-46. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.08
    This paper analyzes the evaluation indexes of Web search engines using epidemiological screening theory in the absence of a gold standard. User experience scores and user judgments serve as the prior information for Bayesian estimation. It then uses the Markov chain Monte Carlo (MCMC) technique to estimate the sensitivity, specificity, and detection rate of Baidu and Google (Simplified Chinese).
  • Dong Gui
    New Technology of Library and Information Service. 2011, 27(7/8): 47-55. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.09
    This paper first analyzes the architecture and limitations of current corpus retrieval systems. It then studies a TMX-based storage structure and the corresponding matching algorithm. Finally, it describes the functions of the system. It aims to explore ways of processing corpora at a deeper level in corpus retrieval systems and to demonstrate their feasibility.
  • Wang Yamin, Liu Xiaowei, Han Xueling
    New Technology of Library and Information Service. 2011, 27(7/8): 56-61. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.10
    After analyzing the problems of current cloud storage, this paper presents a new cloud storage model based on P2P. The model applies the Chord algorithm to manage nodes and distribute clients' requests, which solves problems of centralized architectures such as single points of failure (SPOF) and performance bottlenecks, and achieves load balancing. The model uses storage clusters to manage users' data, which simplifies system management. A replica management strategy is also applied, which improves scalability, fault tolerance, and performance.
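    To make the Chord-based request routing concrete, here is a minimal Python sketch of the core idea only: nodes and keys are hashed onto one identifier ring, and each key is served by its successor node. It omits finger tables, node joins, and replication, and it is not the paper's implementation. The ring size `M`, the node names, and the `Ring` class are assumptions for illustration.

```python
import bisect
import hashlib

M = 16  # identifier bits; a deliberately small ring for illustration


def chord_id(key: str) -> int:
    """Hash a node name or data key onto the identifier ring [0, 2**M)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Static ring of nodes; a key belongs to its successor node."""

    def __init__(self, node_names):
        # Sort nodes by ring position so successor lookup can use bisection.
        self.nodes = sorted((chord_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successor(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        idx = bisect.bisect_left(self.ids, chord_id(key))
        return self.nodes[idx % len(self.nodes)][1]
```

    Because ownership depends only on ring positions, removing a node that does not own a given key leaves that key's owner unchanged, which is what lets such a design avoid a central coordinator.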
  • Xing Meifeng, Xu Deshan
    New Technology of Library and Information Service. 2011, 27(7/8): 62-67. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.11
    By analyzing the advantages and disadvantages of existing bibliometric software, as well as the purposes and workflow of scientific research based on bibliometric methods, this paper establishes a variety of bibliographic entry dictionaries, merges and corrects keywords effectively, and integrates the statistics, co-word analysis, and clustering processes. It then designs and implements a visual co-word and cluster analysis system.
  • Wu Suhui, Cheng Ying, Zheng Yanning, Pan Yuntao
    New Technology of Library and Information Service. 2011, 27(7/8): 68-75. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.12
    This paper proposes a novel N-gram-based algorithm for extracting cluster labels from English papers. Before clustering, the algorithm uses N-grams to generate a list of field phrases through prior learning on a large-scale corpus; it then clusters the English papers with the K-means algorithm. Finally, the highest-scoring N-gram term in each cluster is extracted as the label. In the score calculation, a term that appears in the field phrase list is given double weight. Experimental results show that the quality of the cluster labels is improved. In addition, an improved TFIDF calculation method is developed, and a new R@N method for evaluating cluster labels is proposed.
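    The label-selection step can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm: it assumes the documents of one cluster are already given (the K-means step is skipped), and it uses raw N-gram frequency as the base score in place of the paper's improved TFIDF. The function names and the sample field phrase list are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n_max=3):
    """Yield all word N-grams of length 1..n_max as space-joined strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def cluster_label(docs, field_phrases, n_max=3):
    """Pick the highest-scoring N-gram of one cluster as its label.

    Base score is raw frequency (an assumption); terms found in the
    field phrase list get double weight, as the abstract describes.
    """
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n_max))
    scores = {
        term: count * (2 if term in field_phrases else 1)
        for term, count in counts.items()
    }
    # Iterating in sorted order makes ties resolve alphabetically.
    return max(sorted(scores), key=scores.get)
```

    For example, with the two documents "support vector machine training" and "kernel support vector machine" and a field phrase list containing "support vector machine", the doubled weight lifts that trigram above the equally frequent shorter N-grams.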
  • Lu Yonghe, Cao Lichao
    New Technology of Library and Information Service. 2011, 27(7/8): 76-81. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.13
    Considering the overall impact of text features on text categorization results, this paper proposes a text feature selection method based on particle swarm optimization (PSOTFS), which mines text feature selection rules with the PSO algorithm. PSOTFS first uses CHI to preselect text features, then uses the PSO algorithm to select features precisely from the preselected set. Each particle represents a feature selection rule, and the set of feature selection rules corresponds to a particle swarm. Classification precision is used as the fitness function, and grouping is used to reduce the dimensionality of the particles. The experimental results show that PSOTFS achieves better text categorization effectiveness than CHI, information gain, document frequency, and mutual information.
  • Ye Huanzhuo, Wu Di
    New Technology of Library and Information Service. 2011, 27(7/8): 82-90. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.14
    Similarity calculation is a key issue in cleaning approximately duplicate data, and the edit distance algorithm is widely used for it. Starting from the traditional edit distance algorithm and analyzing sequence length, synonyms, and other factors that affect the similarity result, this paper proposes an improved algorithm for cleaning approximately duplicate data based on semantic edit distance. The algorithm uses a synonym thesaurus and a normalized distance metric, and can be applied to the identification of similar records. Experimental results show that the similarities it computes agree better with sentence semantics and human cognitive experience, so the method effectively improves the accuracy and precision of detecting approximately duplicate data.
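    One plausible reading of "semantic edit distance" is sketched below in Python: a word-level Levenshtein distance in which synonyms are first mapped to a canonical form, so substituting a synonym costs nothing, followed by normalization by the longer sequence length. The paper's exact cost function and thesaurus are not given in the abstract, so the `synonyms` mapping and the normalization choice here are assumptions.

```python
def canonical(word, synonyms):
    """Map a word to its canonical synonym; unknown words map to themselves."""
    return synonyms.get(word, word)


def semantic_similarity(a, b, synonyms):
    """Return a similarity in [0, 1] from a normalized word-level edit distance."""
    x = [canonical(w, synonyms) for w in a.lower().split()]
    y = [canonical(w, synonyms) for w in b.lower().split()]
    if not x and not y:
        return 1.0
    # Classic dynamic-programming edit distance, kept to two rows.
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, 1):
        cur = [i]
        for j, wy in enumerate(y, 1):
            cost = 0 if wx == wy else 1  # synonyms already share a form
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    # Normalize by the longer sequence so record length does not dominate.
    return 1.0 - prev[-1] / max(len(x), len(y))
```

    With the hypothetical thesaurus `{"automobile": "car"}`, the records "red car" and "red automobile" score 1.0, whereas without it they score 0.5, the same as "red car" and "blue car".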
  • Huang Mingxuan, Yu Ru
    New Technology of Library and Information Service. 2011, 27(7/8): 91-96. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.15
    A novel information retrieval system model based on negative association rules and frequent itemset mining is proposed, and its design concept and the function of each module are described. Key techniques for implementing the model and the search algorithm are also explained. The experimental results show that the proposed model effectively improves information retrieval performance.
  • Li Gang, Wang Zhongyi
    New Technology of Library and Information Service. 2011, 27(7/8): 97-103. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.16
    Due to the complexity of natural language, sentiment mining still faces problems such as the domain dependence of sentiment words, implicit feature recognition, synonym recognition, and the calculation of features' sentiment strengths. To solve these problems, this paper proposes a sentiment mining method based on topic maps. The method makes full use of the semantic relationships between feature words and sentiment words and can improve the accuracy of sentiment mining to a certain extent.
  • Shan Bin, Li Fang
    New Technology of Library and Information Service. 2011, 27(7/8): 104-109. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.17
    This paper presents a new method for automatically inferring LDA topic evolution based on seminal documents. The semantic distribution of the seminal documents is used to guide the model for the next time slice and to link topics between consecutive time slices. Experiments on the NIPS dataset and on Chinese newswire about the NPC and CPPCC show that the method not only captures the correct evolutions in various forms but also avoids linking related topics that have no evolution relationship.
  • Yu Liping, Pan Yuntao, Wu Yishan
    New Technology of Library and Information Service. 2011, 27(7/8): 110-115. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.18
    In nonlinear evaluation, an anomaly sometimes occurs in which the final evaluation score decreases while the values of the component indicators increase. The regression adjustment method, a new method for testing and improvement, is proposed as a solution to this anomaly.
  • Hao Dan, Zhou Jinhui, Guan Bei, Wang Yanxi, Han Jixin
    New Technology of Library and Information Service. 2011, 27(7/8): 116-120. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.19
    This paper is set against the background of compiling publication statistics by author and affiliation. It analyzes the specific causes of data redundancy in cross-database searching and proposes four duplicate removal methods: Cross Chinese Database ID, Cross English Database ID, DOI, and “Title & Type”. These methods have been applied effectively in literature statistics work and better solve the redundancy problem across different databases.
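    A minimal Python sketch of how the four matching keys might be combined is given below. It treats a record as a duplicate if it shares any one key with an earlier record. The abstract does not say how the methods are ordered or combined, so this rule, the field names (`cn_db_id`, `en_db_id`, `doi`, `title`, `type`), and the normalization are all assumptions.

```python
def dedup_keys(record):
    """Build the matching keys a record offers (field names are hypothetical)."""
    keys = []
    for field in ("cn_db_id", "en_db_id", "doi"):
        value = record.get(field)
        if value:
            keys.append((field, value.strip().lower()))
    title, rtype = record.get("title"), record.get("type")
    if title and rtype:
        # Collapse whitespace and case so trivial variants still match.
        normalized_title = " ".join(title.lower().split())
        keys.append(("title_type", (normalized_title, rtype.lower())))
    return keys


def remove_duplicates(records):
    """Keep the first record of each group that shares any matching key."""
    seen, unique = set(), []
    for record in records:
        keys = dedup_keys(record)
        is_duplicate = any(key in seen for key in keys)
        # Remember every key, so later records can match via any of them.
        seen.update(keys)
        if not is_duplicate:
            unique.append(record)
    return unique
```

    Checking the identifier keys alongside “Title & Type” means a record that lacks a DOI can still be matched by title, and a retitled reprint can still be matched by DOI.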
  • Wang Shuo
    New Technology of Library and Information Service. 2011, 27(7/8): 121-126. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.20
    Taking the 3D virtual book navigation system of Capital Normal University Library as an example, the paper introduces an application of virtual book navigation in China based on 3DsMax and Virtools. It mainly discusses how to create 3D models and implement interactivity when users visit the system via Web OPAC or URL. The system implements virtual book searching and path navigation, real-time message exchange, and multimedia sharing, as well as a realistic virtual walkthrough of the library.
  • Zhou Hong, Zhang Bei, Jiang Airong, Zhang Chengyu
    New Technology of Library and Information Service. 2011, 27(7/8): 127-131. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.21
    To serve patrons better with new technologies, Tsinghua University Library provides a self-service SMS push service for library bibliographic information. The service is based on extracting information from the OPAC, collecting patrons' mobile phone numbers through a self-built Web page, building a structured database, and using the database synchronization feature of the “Qixintong” SMS system.
  • Tang Xiaoxin
    New Technology of Library and Information Service. 2011, 27(7/8): 132-136. https://doi.org/10.11925/infotech.1003-3513.2011.07-08.22
    A group acceptance function module is added to the library acquisitions system to avoid tedious steps in the book acceptance procedure. It improves acceptance speed and meets the needs of library outsourcing services. The design ideas and process are presented in detail, and the key technologies and solutions are introduced.