Data Analysis and Knowledge Discovery

Targeted Websites Harvest System Based on Nutch

Xu Jian,Zhang Zhixiong

New Technology of Library and Information Service. 2009, 25(4): 1-6. https://doi.org/10.11925/infotech.1003-3513.2009.04.01

The paper analyzes typical open source Web crawling software, such as Nutch, Heritrix, WCT, and Web-Harvest. Based on this analysis, it proposes a targeted website harvesting system built on Nutch. Four key issues of the system are discussed in detail: selection of the initial seed websites, management of the harvest process, denoising of web page content, and discovery of new seed websites.

To Build Knowledge Organization Systems of Digital Library Based on Open Source Software

Bai Haiyan,Jiang Bo

New Technology of Library and Information Service. 2009, 25(4): 7-13. https://doi.org/10.11925/infotech.1003-3513.2009.04.02

This article analyzes the levels and structure of knowledge organization systems (KOS) in digital libraries, emphasizing four components: KOS building and management, KOS interoperation, KOS storage and administration, and semantic metadata generation. Related open source software is selected, and the application of each component in the process of digital library knowledge organization is introduced. Finally, the article presents a practical example of building a knowledge organization system in a digital library.

Analysis of Index Strategies in Web Archive

Sun Zhiru,Wu Zhenxin,Qu Yupeng

New Technology of Library and Information Service. 2009, 25(4): 14-18. https://doi.org/10.11925/infotech.1003-3513.2009.04.03

This article summarizes several typical indexing strategies by analyzing Web Archive projects that use Wayback as their access tool, and gives a preliminary analysis of the scope of application, merits, and faults of each strategy. It is intended as a reference for those working in this area.

Localization of the Open Source Full-text Retrieval Engine Based on Lucene

Wu Pengfei,Ma Fengjuan,Li Wenge,Guo Peng

New Technology of Library and Information Service. 2009, 25(4): 19-22. https://doi.org/10.11925/infotech.1003-3513.2009.04.04

This paper introduces the system architecture, indexing and retrieval process, and language analyzer of Lucene. To address Lucene's limitation that it supports only one-word and two-word segmentation, this paper develops a Chinese-English language analyzer, ZH_CNAnalyzer. Finally, an indexing and retrieval example using ZH_CNAnalyzer is given.

A Method for Generating Co-occurrence Matrix of Mass Data Based on Hadoop

Yang Daiqing,Zhang Zhixiong

New Technology of Library and Information Service. 2009, 25(4): 23-26. https://doi.org/10.11925/infotech.1003-3513.2009.04.05

Mass data processing is a focal point of information technology. This paper introduces the architecture of Hadoop, an open source parallel system, analyzes the Hadoop-based MapReduce programming framework, and proposes a method for generating the co-occurrence matrix of mass data through multiple MapReduce operations.
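
The abstract does not describe the individual MapReduce operations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two chained passes (count term pairs per document, then regroup the counts into matrix rows) and simulates them in memory in plain Python rather than through the Hadoop API. The helper `run_mapreduce`, the mapper and reducer names, and the sample documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce pass in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Pass 1 (assumed): count how often each unordered term pair shares a document.
def pair_mapper(doc_terms):
    for a, b in combinations(sorted(set(doc_terms)), 2):
        yield (a, b), 1


def pair_reducer(pair, counts):
    yield pair, sum(counts)


# Pass 2 (assumed): regroup the pair counts by term to form matrix rows.
def row_mapper(pair_count):
    (a, b), n = pair_count
    yield a, (b, n)
    yield b, (a, n)  # the co-occurrence matrix is symmetric


def row_reducer(term, entries):
    yield term, dict(entries)


def cooccurrence_matrix(docs):
    """Chain the two passes; returns {term: {other_term: count}}."""
    pair_counts = run_mapreduce(docs, pair_mapper, pair_reducer)
    return dict(run_mapreduce(pair_counts, row_mapper, row_reducer))


# Hypothetical sample data: each document is a list of terms.
docs = [
    ["nutch", "lucene", "hadoop"],
    ["hadoop", "mapreduce"],
    ["lucene", "hadoop"],
]
matrix = cooccurrence_matrix(docs)
```

Splitting the work into passes keeps each reducer's state small: the first pass only sums integers per pair, and the second only collects one row per term, which is the property that would let the same structure scale out on a real cluster.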

Research on the Mechanism of Grid Service Description in Digital Library

Zhang Ziran,Dong Hui

New Technology of Library and Information Service. 2009, 25(4): 27-32. https://doi.org/10.11925/infotech.1003-3513.2009.04.06

This paper introduces grid service description techniques for a multi-attribute digital library (DL) grid: setting a uniform metadata standard for each feature and describing each feature by its corresponding metadata standard. It discusses the levels of semantic service description in the DL grid and establishes an ontology-based semantic description model of DL grid services.

Comparative Study of Several Standards for Compound Digital Object in Digital Repository

Ma Jianxia

New Technology of Library and Information Service. 2009, 25(4): 33-39. https://doi.org/10.11925/infotech.1003-3513.2009.04.07

This paper introduces several standards for compound digital objects: METS, MPEG-21 DIDL, and OAI-ORE. The basic data models, applications, and characteristics of these standards are analyzed, and the ways they process digital objects are compared.

Design of Micro Self-evaluation Software for Library Website’s User Satisfaction Factor

Ding Shengchun,Li Chong,Li Li,Wang Xiaoqing

New Technology of Library and Information Service. 2009, 25(4): 40-43. https://doi.org/10.11925/infotech.1003-3513.2009.04.08

To identify the micro factors that affect user satisfaction with library websites, this paper proposes a flexible self-evaluation system. With this system, a library website can choose a suitable micro evaluation plan according to its own needs and diagnose for itself the underlying factors that affect user satisfaction. The system is highly user-definable: library managers can create questionnaires for surveying expert weights from its indicator templates or from self-defined indicators. Finally, the survey data are analyzed and displayed with 3D visualization graphics, and the micro factors that need improvement are identified.

Topology of the Knowledge Communication Network in Virtual Communities: Based on CSDN

Peng Hongbin,Wang Jun

New Technology of Library and Information Service. 2009, 25(4): 44-49. https://doi.org/10.11925/infotech.1003-3513.2009.04.09

This paper gives a systematic discussion of the Knowledge Communication Network (KCN) drawn from CSDN, seeking to uncover the characteristics of knowledge communication in virtual communities. First, the authors analyze the statistical properties of the network and point out that the small-world effect and the scale-free property do exist in it. They then identify two important motifs in knowledge communication by analyzing the triangles of the network.

The POS Mining Study on Search Engine’s Query Log

Lai Maosheng,Qu Peng

New Technology of Library and Information Service. 2009, 25(4): 50-56. https://doi.org/10.11925/infotech.1003-3513.2009.04.10

The paper analyzes query logs from the Sogou search engine for March 2007. POS tagging is used to characterize the high-frequency POS results. Web users rely primarily on nouns, supplemented by verbs, in Web queries; other parts of speech seldom appear. Function words in natural language, such as “的”, rarely appear in the high-frequency POS results. Web search queries differ from natural language in syntax to a certain degree while sharing some characteristics with it. Web users use nouns for concept-focused retrieval, and keywords remain the primary method of searching the Web. The high-frequency POS tagging results partially obey Zipf’s law.

Research on Conversation Policy in Knowledge Interchanging between Agents

Zhang Shaolong

New Technology of Library and Information Service. 2009, 25(4): 57-63. https://doi.org/10.11925/infotech.1003-3513.2009.04.11

Knowledge interchange between agents is a complex process that needs a conversation policy to govern the agents’ activities. The paper proposes a method based on an extended KQML language that simulates the handshaking mechanism of the TCP protocol. The method handles problems that arise during the interchange, such as establishing a conversation and assuring message delivery.

The Personalized Product Recommendation Method Based on Weighted XML Model

Li Shuqing

New Technology of Library and Information Service. 2009, 25(4): 64-69. https://doi.org/10.11925/infotech.1003-3513.2009.04.12

This paper proposes a new method for building a user preference model based on a weighted XML data structure, in which each node carries a weight value representing the user’s personalized information. It also designs a new algorithm for comparing the similarity of weighted XML models. Finally, the paper discusses in detail the implementation of a personalized product recommendation system based on this user preference model.

Research and Realization of Key Techniques of Network Subject Knowledge Database

Tan Chunmei,Duan Weihua,Cao Songqiang

New Technology of Library and Information Service. 2009, 25(4): 70-74. https://doi.org/10.11925/infotech.1003-3513.2009.04.13

Using the Visual Studio .NET development platform, C#, and XML, a network subject knowledge database system has been designed and developed. Key techniques are studied in this paper, including metadata acquisition from HTML Web pages and generation of XML files, knowledge point mining, and fast data transformation between the XML files of network subject knowledge and a relational database.

The Migration of CAIRIC Local System from Physical Servers to Virtual Machine

Wu Zhiqiang,Xu Ge,Li Ning

New Technology of Library and Information Service. 2009, 25(4): 75-78. https://doi.org/10.11925/infotech.1003-3513.2009.04.14

This article discusses solutions for migrating the CAIRIC local system from physical servers to virtual machines, based on library practice. Using backup and virtual machine techniques, the authors successfully migrated and upgraded the CAIRIC local system. The work provides a useful example for building library service platforms on virtual machine technology in the future.

Binarization for Document Image Based on Multi-scale Conditional Random Fields

Liu Kun,Lv Xueqiang,Wang Tao,Shi Shuicai

New Technology of Library and Information Service. 2009, 25(4): 79-81. https://doi.org/10.11925/infotech.1003-3513.2009.04.15

This paper proposes a new document image binarization algorithm based on multi-scale conditional random fields (mCRF). The algorithm treats binarization as a tagging process, using mCRF to label every pixel in the image and thereby binarize the full image. As a discriminative model, mCRF can accommodate arbitrary non-independent features, which makes full use of the information in the image. The results show that this algorithm performs better than the common threshold method.

Research on Topic Maps-based Tourism Document Organization Method

Li Qingmao

New Technology of Library and Information Service. 2009, 25(4): 82-87. https://doi.org/10.11925/infotech.1003-3513.2009.04.16

Using tourism documents of the Aba Zang and Qiang Autonomous Region as information resources, the author analyzes the principles for selecting topics and topic types when organizing tourism documents with topic maps, defines the associations among topics in tourism documents, proposes a methodological approach to constructing topic maps for tourism documents, and demonstrates the resulting organization.

An Extraction Model of Experience and Evaluation Article

Wu Shixian,Zhang Bilan

New Technology of Library and Information Service. 2009, 25(4): 88-92. https://doi.org/10.11925/infotech.1003-3513.2009.04.17

This paper proposes an extraction model for experience and evaluation articles and carries out an evaluation experiment on extracting such articles from blogs. Instead of syntactic analysis, the model relies on the collocation degree and distance of the experience object, experience action, and experience evaluation. The experimental results show that a system based on this model achieves high extraction precision.

Design and Application of Geographic Information System in the University Library

Liu Weihong

New Technology of Library and Information Service. 2009, 25(4): 93-97. https://doi.org/10.11925/infotech.1003-3513.2009.04.18

The system applies GIS to the management of university library information. To build the system, this paper uses the spatial query and spatial analysis functions of GIS, sets up a basic spatial geographic information system model, and then associates the spatial data with the attribute data. Users with different permissions can manage, retrieve, query, analyze, and apply library resources in a virtual environment. Readers can easily query a resource’s spatial position through its attribute data and can also obtain attribute data for the areas they are interested in.

25 April 2009, Volume 25 Issue 4
