Based on an analysis of several state-of-the-art knowledge extraction systems, namely MnM, KIM, Text2Onto, Amilcare and Melita, this paper shows that two kinds of technologies, machine learning and natural language analysis, have developed in parallel and benefited from each other. On the machine learning side, new methods such as Adaptive Information Extraction and Open Information Extraction have been put forward, with a trend toward Ontology Learning. On the natural language analysis side, Pattern-Based Annotation and Semantic Annotation methods have received more attention than ever, with a trend toward Ontology-Based Information Extraction. In addition, Controlled Language Information Extraction is introduced to reduce the cost of ontology construction and to allow non-specialists to create or edit ontological data using simple natural language.
Automatic Term Recognition (ATR) is a key process in knowledge technologies such as knowledge extraction and text mining. To enrich term-recognition-based text mining theories and methods and to support the construction of related systems, this paper reviews the main existing ATR methods and identifies the key problems in the process. By studying related programs and systems as well as existing term resources, developers can choose the most suitable approach for their own ATR systems.
Entity relation extraction is an important task in the text information extraction domain. This paper first summarizes the development of entity relation extraction in the context of MUC and ACE, and then points out that the main difficulties in relation extraction are the acquisition of training datasets, the acquisition of templates, and co-reference resolution. Based on an analysis of recent literature, systems and projects, it classifies entity relation extraction methods as follows: template-based methods, lexicon-driven methods, machine learning methods, ontology-driven methods, and hybrid methods. This analysis can help to build more efficient entity relation extraction systems in the future.
Text visualization is a method that uses computer technology to present specific text resources graphically. This paper analyzes the characteristics of current text visualization through an analysis of typical text visualization systems. Text visualization can be divided into four classes: vocabulary-based, article-based, time-series-based and topic-based, which reflect the main text visualization techniques. The final part discusses how text visualization is used in the current information environment.
Since keywords and key phrases can represent the features of a text, keyword extraction and filtration is of great significance for information retrieval, information extraction and knowledge discovery. This paper first surveys current keyword extraction methods. It then proposes a method for keyword extraction and filtration from medical texts, using existing thesauri and tools in the medical field together with the BM25F model. The proposed method mainly solves two key problems: the identification and extraction of keywords, and the evaluation of keyword value for filtration. The method is applied to documents in the field of osteoarthritis from 2001 to 2007 and its effectiveness is verified, offering an effective way to extract keywords for knowledge discovery.
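The BM25F weighting mentioned above combines per-field term frequencies (e.g., title versus body) with field boosts, length normalization and an IDF factor. A minimal sketch follows; the field names, boost values and parameter settings are illustrative assumptions, not taken from the paper.

```python
import math

def bm25f_score(term_freqs, field_lens, avg_field_lens, boosts, b, df, n_docs, k1=1.2):
    """Score one term in one document with BM25F.

    term_freqs: per-field raw frequency of the term in this document
    field_lens / avg_field_lens: this document's and the collection's field lengths
    boosts: per-field weights; b: per-field length-normalization strength
    df: number of documents containing the term; n_docs: collection size
    """
    # Weighted, length-normalized pseudo-frequency accumulated over fields.
    tf = 0.0
    for field, freq in term_freqs.items():
        norm = 1.0 - b[field] + b[field] * field_lens[field] / avg_field_lens[field]
        tf += boosts[field] * freq / norm
    # Standard BM25 IDF, then saturation with k1.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    return idf * tf / (k1 + tf)
```

In a keyword-filtration setting, candidate terms can be ranked by this score and those below a threshold discarded; a title match with a high boost then outweighs the same frequency in the body.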
This paper puts forward a model that eliminates ambiguity in Chinese word segmentation. The model first segments text with both MM (forward maximum matching) and RMM (reverse maximum matching), then compares the two segmentation results and outputs the more accurate one. The process is divided into three parts: discovery, extraction and disambiguation. Test results show that the model reduces the segmentation error rate caused by segmentation ambiguity.
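The MM/RMM comparison above can be sketched as follows. This is a minimal illustration with a set-based vocabulary; the disambiguation heuristic (prefer the result with fewer single-character words) is an assumed stand-in for the paper's own disambiguation step.

```python
def max_match(text, vocab, max_len=4, reverse=False):
    """Greedy maximum matching segmentation; reverse=True gives RMM."""
    result = []
    if reverse:
        i = len(text)
        while i > 0:
            # Try the longest suffix first; fall back to a single character.
            for l in range(min(max_len, i), 0, -1):
                piece = text[i - l:i]
                if l == 1 or piece in vocab:
                    result.append(piece)
                    i -= l
                    break
        return list(reversed(result))
    i = 0
    while i < len(text):
        # Try the longest prefix first; fall back to a single character.
        for l in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + l]
            if l == 1 or piece in vocab:
                result.append(piece)
                i += l
                break
    return result

def disambiguate(mm, rmm):
    """Assumed heuristic: prefer the segmentation with fewer single-character words."""
    if mm == rmm:
        return mm
    singles = lambda seg: sum(1 for w in seg if len(w) == 1)
    return mm if singles(mm) < singles(rmm) else rmm
```

For the classic ambiguous string 研究生命起源, MM yields [研究生, 命, 起源] while RMM yields [研究, 生命, 起源]; comparing the two exposes the ambiguous span so it can be resolved.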
This paper proposes a personalized Web page recommendation model based on sequential patterns. First, the model extracts the Web transaction set through Web usage data preparation. Second, it applies a sequential pattern mining algorithm to discover frequent (contiguous) sequences. Finally, it uses a frequent (contiguous) sequence tree to generate the user interest view and to provide the personalized recommendation set.
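The mining and recommendation steps above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it counts contiguous subsequences of sessions directly (rather than building the sequence tree) and recommends the pages that most often follow the user's recent path.

```python
from collections import defaultdict

def frequent_contiguous(sessions, min_support, max_len=3):
    """Count contiguous subsequences (once per session) and keep the frequent ones."""
    counts = defaultdict(int)
    for s in sessions:
        seen = set()
        for l in range(1, max_len + 1):
            for i in range(len(s) - l + 1):
                seq = tuple(s[i:i + l])
                if seq not in seen:  # support = number of sessions, not occurrences
                    seen.add(seq)
                    counts[seq] += 1
    return {seq: c for seq, c in counts.items() if c >= min_support}

def recommend(patterns, recent):
    """Recommend pages whose frequent pattern extends the user's recent contiguous path."""
    recs = {}
    for seq, c in patterns.items():
        if len(seq) > 1 and seq[:-1] == tuple(recent[-(len(seq) - 1):]):
            recs[seq[-1]] = max(recs.get(seq[-1], 0), c)
    return sorted(recs, key=recs.get, reverse=True)
```

Given sessions such as a→b→c, a→b→d, a→b→c with a support threshold of 2, the path a→b leads to a recommendation of c, since b→c and a→b→c are both frequent while b→d is not.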
This paper proposes a new query expansion method that combines user modeling research with ontology-based query expansion research to realize personalized semantic query expansion. It divides the process of personalized semantic query expansion into two stages: the mapping from keywords to the concepts contained in the user model, and semantic expansion at the ontology level; the algorithm for each stage is given in the paper. Experiments indicate that this method can improve both the precision and the recall of information retrieval, and meets personalized needs to a certain extent.
To address the problem that traditional information retrieval models cannot handle uncertain knowledge well, the author combines rough set and fuzzy set theory and puts forward an improved Web information retrieval model based on fuzzy rough sets. The author also proposes a key algorithm and a performance evaluation method based on the model. The model helps to raise the efficiency of information retrieval and is valuable in both theory and application.
An indexing frame for Web page information is constructed with reference to Dublin Core metadata. Characteristic information of Web pages is extracted, and automatic indexing of Web page information is realized using ADO technology. Experimental results indicate that the accuracy of mapping indexing information to Web pages reaches 100%. Finally, the classification and indexing technology is applied to the intelligent agent terminal of the complementary network architecture, and the effectiveness of the UCL indexing method is proved. Experimental results indicate that, through UCL-based automatic classification and indexing of Web page information, active information service is realized and users' individual demands are satisfied.
This paper designs and implements in Java an association rule mining algorithm named TidlistApriori, based on the transaction identifier lists (tidlists) of the database. Experimental results comparing TidlistApriori with hash-tree-based Apriori indicate that the algorithm improves the efficiency of finding frequent itemsets, and that TidlistApriori can serve as an efficient tool for mining topic associations.
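The tidlist idea above is that the support of a candidate itemset can be obtained by intersecting the transaction-id sets of its parts, avoiding repeated database scans. A minimal Python sketch follows (the paper's implementation is in Java; the candidate-generation details here are a simplified assumption, not the paper's exact TidlistApriori).

```python
from itertools import combinations

def tidlist_frequent_itemsets(transactions, min_support):
    """Find frequent itemsets level by level, using tidlist intersection for support."""
    # Build tidlists: 1-itemset -> set of transaction ids containing it.
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(frozenset([item]), set()).add(tid)
    current = {k: v for k, v in tidlists.items() if len(v) >= min_support}
    frequent = dict(current)
    while current:
        nxt = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == len(a) + 1:  # join two k-itemsets into a (k+1)-candidate
                tids = current[a] & current[b]  # support via tidlist intersection
                if len(tids) >= min_support and union not in nxt:
                    nxt[union] = tids
        frequent.update(nxt)
        current = nxt
    return {tuple(sorted(k)): len(v) for k, v in frequent.items()}
```

Each level needs only set intersections on the previous level's tidlists, which is where the speed-up over rescanning the transactions for every candidate comes from.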
This paper presents a text mining system based on the co-occurrence of bibliographic items in literature databases. The system produces the principal bibliometric indicators of a given document set drawn from PubMed and Web of Science, and presents some of the results with visualization techniques. Furthermore, it provides cluster analysis and association analysis by investigating the co-occurrence data of high-frequency MeSH terms, highly productive authors, highly cited papers and highly cited authors. Using these approaches, users can mine potential association rules among MeSH terms and carry out scientometric investigations.
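The co-occurrence data underlying such cluster and association analysis can be sketched as a simple pair count over bibliographic records; the record structure below (one list of MeSH terms per paper) is an assumption for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(records):
    """Count how many records contain each pair of terms.

    records: iterable of term lists, one per bibliographic record.
    Returns a Counter keyed by alphabetically ordered term pairs.
    """
    pair_counts = Counter()
    for terms in records:
        # De-duplicate within a record, order the pair canonically.
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts
```

The resulting pair counts form the co-occurrence matrix that clustering and association rule mining then operate on.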
Aiming at the deficiencies of existing Web analysis tools, this paper presents a reasonable Web application and development process, using Java to develop Webstat, which is then applied to website evaluation. Practice shows that the project is easy to operate, produces comprehensive and systematic results, and is practical.
This paper introduces the design and implementation of the remote access system of the National Science Library, Chinese Academy of Sciences. The system implements single sign-on based on SAML, authorization, access management and reverse proxy, and helps research users to access, anytime and anywhere, the digital resources purchased by their institutes.
A book cover service in Mashup mode has been designed and developed for the OPAC at Tsinghua University Library. When patrons search the OPAC, book covers are displayed seamlessly in the result pages, so patrons can use them intuitively. This article introduces the design and implementation of the book cover data source server, with emphasis on the design ideas of the external book cover data source, the method of building the data source with Servlet technology, and how to connect the server with the library management system.
After discussing metadata parallel harvesting frameworks, this paper presents an improved metadata parallel harvesting framework based on digital library grids, mobile agents and the OAI framework. It then describes the major components and the functions of the modules in this framework. Experimental results show that the framework overcomes shortcomings of previous parallel metadata harvesting frameworks, such as low performance and search inefficiency.
For the distributed, loosely coupled service system of a digital library consisting of many application services, this paper brings forward an application-layer solution for monitoring the digital library's network services, achieves the goal of managing all accessible services, and gives formulas for computing service performance and availability. Finally, it discusses the design and implementation of a service management system for the digital library.
An electronic resource access gateway system for solving the usage statistics problem of networked electronic resources has been developed at Xi'an Jiaotong University Library. This paper describes the system's design ideas and implementation in detail, including how to obtain valuable data, how to analyze the data to derive the needed information, the generation of statistical reports on electronic resources, and remaining problems.