DAKD
Current Issue, Volume 27, Issue 7
Semantic Pattern Mapping Between RDBMS and Linked Data Based on Open Source Software
Bai Haiyan, Liang Bing
2011, 27(7/8): 1-7. DOI: 10.11925/infotech.1003-3513.2011.07-08.01
Abstract
The conceptual models of both RDBMS and linked data are built on real-world entities, their properties, and the relationships among them, so a mapping between the two can be constructed. The core of semantic pattern mapping is to construct and express these linking relationships. The mapping language of the open source software D2R can execute SQL against an RDBMS and, through its core language elements ClassMap and PropertyBridge and their properties, convert relationships between different entities, within the same entity, and with external datasets into RDF links.
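To make the roles of ClassMap and PropertyBridge concrete, the Python sketch below turns one relational row into N-Triples. It is not D2R's mapping language or API; it only mimics the idea that a ClassMap mints one URI per row and a PropertyBridge maps a column to a predicate, emitting an RDF link instead of a literal when the column is a foreign key. The tables, the `http://example.org/...` URI patterns, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

@dataclass
class ClassMap:
    # Rough analogue of a D2R ClassMap: one URI per row (this sketch assumes the
    # pattern uses the row's "id" column) and the RDF class of every row.
    uri_pattern: str
    rdf_class: str

@dataclass
class PropertyBridge:
    # Rough analogue of a D2R PropertyBridge: maps one column to one RDF predicate.
    column: str
    predicate: str
    ref: Optional[ClassMap] = None  # if set, the column is a foreign key: emit a link, not a literal

def literal(value):
    # N-Triples string literal with backslashes, quotes and newlines escaped.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'

def row_to_triples(row, class_map, bridges):
    subject = class_map.uri_pattern.format(**row)
    triples = [f"<{subject}> <{RDF_TYPE}> <{class_map.rdf_class}> ."]
    for bridge in bridges:
        value = row.get(bridge.column)
        if value is None:  # SQL NULL: no triple
            continue
        if bridge.ref is not None:
            obj = "<" + bridge.ref.uri_pattern.format(id=value) + ">"
        else:
            obj = literal(value)
        triples.append(f"<{subject}> <{bridge.predicate}> {obj} .")
    return triples

# Hypothetical tables: paper(id, title, author_id) and author(id, ...).
author = ClassMap("http://example.org/author/{id}", "http://xmlns.com/foaf/0.1/Person")
paper = ClassMap("http://example.org/paper/{id}", "http://purl.org/ontology/bibo/Article")
bridges = [
    PropertyBridge("title", "http://purl.org/dc/terms/title"),
    PropertyBridge("author_id", "http://purl.org/dc/terms/creator", ref=author),
]
triples = row_to_triples({"id": 1, "title": "Semantic Pattern Mapping", "author_id": 7}, paper, bridges)
for t in triples:
    print(t)
```

The `author_id` bridge shows the "relationship between different entities" case: the foreign key becomes a link to another ClassMap's URI rather than a plain value.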
An Analysis of Fedora CMA
Shi Hongbo, Wu Zhenxin
2011, 27(7/8): 8-13. DOI: 10.11925/infotech.1003-3513.2011.07-08.02
Abstract
This paper examines in depth the structure and application mechanism of Fedora's Content Model Architecture (CMA), as well as the scalability, flexibility, and inheritability that CMA provides. Finally, based on two cases, it offers a preliminary discussion of how CMA can be used to preserve complex digital content.
Research and Implementation of Textual Similarity in Distributed Environment
Zhao Huaming
2011, 27(7/8): 14-20. DOI: 10.11925/infotech.1003-3513.2011.07-08.03
Abstract
To address the performance problems and data-set size limits that traditional similarity algorithms face when mining massive data, this paper takes unstructured text as its research subject and introduces a Hadoop-based distributed text similarity method that combines the Hive data mining platform with a PostgreSQL relational database. It describes the basic technical approach, the implementation, and an empirical study in detail. Test results show that Hive SQL effectively reduces the complexity of distributed data mining, but its real-time performance needs improvement.
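The abstract does not name the similarity measure, so the following is only a sketch assuming cosine similarity over term-frequency vectors. It is written in the shape a Hive/MapReduce job would take: postings are grouped by term (the "map"/GROUP BY step), and each shared term adds its weight product to a document pair (the "reduce" step), so pairs with no common term are never compared. Whitespace tokenization is a simplifying assumption; Chinese text would need a word segmenter.

```python
import math
from collections import Counter, defaultdict

def pairwise_dot_products(vecs):
    # "Map": group (doc, weight) postings by term, like GROUP BY term in Hive SQL.
    postings = defaultdict(list)
    for doc, vec in vecs.items():
        for term, weight in vec.items():
            postings[term].append((doc, weight))
    # "Reduce": every term contributes w1 * w2 to each document pair that shares it.
    dots = defaultdict(float)
    for plist in postings.values():
        plist.sort()
        for i, (d1, w1) in enumerate(plist):
            for d2, w2 in plist[i + 1:]:
                dots[(d1, d2)] += w1 * w2
    return dots

def cosine_all(docs):
    vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    norms = {d: math.sqrt(sum(w * w for w in v.values())) for d, v in vecs.items()}
    # Pairs absent from the result share no term, i.e. their cosine similarity is 0.
    return {pair: dot / (norms[pair[0]] * norms[pair[1]])
            for pair, dot in pairwise_dot_products(vecs).items()}

docs = {"d1": "hadoop hive sql", "d2": "hive sql postgresql", "d3": "library catalogue"}
sims = cosine_all(docs)
```

Here `d1` and `d2` share two of three terms, giving a similarity of 2/3, while `d3` never appears in a pair.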
Design and Implementation of Web System Multi-stage Distributed Caching Mechanism
Wang Ke, Zhou Qiang, Li Chunwang
2011, 27(7/8): 21-25. DOI: 10.11925/infotech.1003-3513.2011.07-08.04
Abstract
This article presents the design and an implementation, based on open source software, of a general-purpose multi-stage distributed caching mechanism for Web systems. The scheme covers multi-granularity cache organization, cache management across multiple levels of physical storage devices, and a cache key formation mechanism, among other techniques. It then presents a cache efficiency evaluation model, covering the acceleration principles of both single-machine and distributed caching, together with efficiency tests that confirm the validity of the scheme.
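A minimal sketch of two of the ideas named above, a deterministic cache key and a two-level lookup, is shown below. The key scheme (namespace plus an MD5 of the canonical JSON form of the request parameters) and the class names are assumptions, not the paper's design; a plain dict stands in for the shared distributed cache, since the abstract does not name the open source cache server used.

```python
import hashlib
import json
from collections import OrderedDict

def make_cache_key(namespace, params):
    # Hypothetical key scheme: logically identical requests get the same key
    # regardless of parameter order, because the JSON form is canonicalized.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return namespace + ":" + hashlib.md5(canonical.encode("utf-8")).hexdigest()

class LRUCache:
    """Small in-process level-1 cache with least-recently-used eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

class TwoLevelCache:
    # Level 1: per-process LRU. Level 2: a store shared by all processes
    # (a dict here, standing in for a distributed cache server).
    def __init__(self, l1_capacity, shared_store):
        self.l1 = LRUCache(l1_capacity)
        self.l2 = shared_store

    def get_or_compute(self, key, compute):
        value = self.l1.get(key)
        if value is not None:
            return value
        value = self.l2.get(key)
        if value is None:
            value = compute()  # miss on both levels: do the expensive work once
            self.l2[key] = value
        self.l1.put(key, value)
        return value
```

Because `None` is used as the miss marker, this sketch cannot cache `None` results; a real implementation would use a dedicated sentinel. Two `TwoLevelCache` instances sharing one store model two Web servers: the second server's first request is served from level 2 without recomputation.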
Targeted Websites Distributed and Precise Harvest System for Network Monitoring Technology
Xie Jing, Qu Yunpeng, Liu Jianhua
2011, 27(7/8): 26-31. DOI: 10.11925/infotech.1003-3513.2011.07-08.05
Abstract
Based on an analysis of existing open source crawling frameworks, an accurate harvesting system is designed and developed on top of Crawler4j, so that resource collection meets real-time monitoring and accuracy requirements. The paper describes the design and implementation of the system.
Building a Virtual Book Platform by MegaZine 3
Wei Chengfu, Nie Hua
2011, 27(7/8): 32-36. DOI: 10.11925/infotech.1003-3513.2011.07-08.06
Abstract
Special collections are what distinguish each library from others and give it an independent identity. Virtual books can present a library's special collections online simply, intuitively, and realistically, effectively supplementing traditional file browsing. To let readers appreciate its special collections online, Peking University Library designed and implemented a virtual book platform with MegaZine 3. Testing shows that MegaZine 3 is a useful and effective tool for presenting special collections online.
Usage Statistics of Institutional Repository Based on Faceted Search Engine Solr
Yao Xiaona, Zhu Zhongming
2011, 27(7/8): 37-40. DOI: 10.11925/infotech.1003-3513.2011.07-08.07
Abstract
This paper uses Solr to improve the usage statistics of the Chinese Academy of Sciences Institutional Repository. The results show that the improved system responds quickly even on massive data.
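The abstract does not show the queries, but usage statistics of this kind map naturally onto Solr faceting: ask for zero documents and let Solr count values of the facet fields. The sketch below builds such a request URL with standard Solr parameters (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`, `facet.mincount`) and parses the response. The core name `ir_usage` and the fields `item_id`, `event_type`, and `year` are hypothetical, and the sample response is illustrative only.

```python
from urllib.parse import urlencode

def usage_stats_url(base, core, item_id=None, facet_fields=("event_type", "year")):
    # rows=0: return no documents, only facet counts (the statistics themselves).
    params = [("q", "*:*"), ("rows", "0"), ("wt", "json"),
              ("facet", "true"), ("facet.mincount", "1")]
    if item_id is not None:
        # Filter query restricting the counts to one repository item (hypothetical field).
        params.append(("fq", f'item_id:"{item_id}"'))
    params += [("facet.field", f) for f in facet_fields]  # facet.field may repeat
    return f"{base}/solr/{core}/select?{urlencode(params)}"

def facet_counts(response_json, field):
    # With Solr's default JSON named-list format, facet values and their counts
    # alternate in one flat list: ["download", 12, "view", 40, ...].
    flat = response_json["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[::2], flat[1::2]))

url = usage_stats_url("http://localhost:8983", "ir_usage", item_id="123")
sample = {"facet_counts": {"facet_fields": {"event_type": ["download", 12, "view", 40]}}}
print(facet_counts(sample, "event_type"))  # {'download': 12, 'view': 40}
```

Counting in the search engine rather than scanning log tables is the kind of change that keeps response times low as usage data grows.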
Research on the Sensitivity and Specificity of Search Engines
Zhang Liyi, Chen Mingying
2011, 27(7/8): 41-46. DOI: 10.11925/infotech.1003-3513.2011.07-08.08
Abstract
This paper analyzes evaluation indexes for Web search engines using epidemiological screening theory without a gold standard. User experience scores and user judgments serve as prior information for Bayesian estimation, and Markov Chain Monte Carlo (MCMC) techniques are then used to estimate the sensitivity, specificity, and detection rate of Baidu and Google (Simplified Chinese).
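The abstract does not give the statistical model, so the following is only a sketch of one standard way to estimate sensitivity and specificity without a gold standard: a two-test latent class model (the two engines' relevance judgments assumed conditionally independent given true relevance) fitted by Gibbs sampling, with Beta priors playing the role of the prior information. The cell counts and prior parameters are made up for illustration and do not reproduce the paper's data or results.

```python
import random

def binomial_draw(n, p, rng):
    # Binomial sample as a sum of Bernoulli trials (keeps the sketch stdlib-only).
    return sum(1 for _ in range(n) if rng.random() < p)

def gibbs_two_tests(counts, priors, iters=5000, burn=1000, seed=0):
    """counts[(a, b)]: items judged a by engine 1 and b by engine 2 (1 = relevant).
    priors: Beta(alpha, beta) parameters for 'prev', 'se1', 'sp1', 'se2', 'sp2'."""
    rng = random.Random(seed)
    theta = {k: priors[k][0] / sum(priors[k]) for k in priors}  # start at prior means
    sums = {k: 0.0 for k in priors}
    total = sum(counts.values())
    for it in range(iters):
        # 1. Impute how many items in each cell are truly relevant (latent status).
        y = {}
        for (a, b), n in counts.items():
            p1 = theta["prev"] * (theta["se1"] if a else 1 - theta["se1"]) \
                 * (theta["se2"] if b else 1 - theta["se2"])
            p0 = (1 - theta["prev"]) * (1 - theta["sp1"] if a else theta["sp1"]) \
                 * (1 - theta["sp2"] if b else theta["sp2"])
            y[(a, b)] = binomial_draw(n, p1 / (p1 + p0), rng)
        # 2. Conjugate Beta updates given the imputed true status.
        pos = sum(y.values())
        theta["prev"] = rng.betavariate(priors["prev"][0] + pos, priors["prev"][1] + total - pos)
        for test, idx in (("1", 0), ("2", 1)):
            tp = sum(v for cell, v in y.items() if cell[idx] == 1)
            fn = sum(v for cell, v in y.items() if cell[idx] == 0)
            tn = sum(counts[cell] - v for cell, v in y.items() if cell[idx] == 0)
            fp = sum(counts[cell] - v for cell, v in y.items() if cell[idx] == 1)
            se, sp = "se" + test, "sp" + test
            theta[se] = rng.betavariate(priors[se][0] + tp, priors[se][1] + fn)
            theta[sp] = rng.betavariate(priors[sp][0] + tn, priors[sp][1] + fp)
        if it >= burn:
            for k in theta:
                sums[k] += theta[k]
    # Posterior means after burn-in.
    return {k: v / (iters - burn) for k, v in sums.items()}

# Illustrative inputs only.
counts = {(1, 1): 60, (1, 0): 15, (0, 1): 10, (0, 0): 15}
priors = {"prev": (1, 1), "se1": (8, 2), "sp1": (7, 3), "se2": (8, 2), "sp2": (7, 3)}
estimates = gibbs_two_tests(counts, priors, iters=2000, burn=500)
```

With only one population and two tests the model is not identifiable from the data alone, which is why informative priors (here, the user-derived information) are essential in this setting.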
Research on PostgreSQL-based TMX Storage and Implementation of Corpus Retrieval Platform
Dong Gui
2011, 27(7/8): 47-55. DOI: 10.11925/infotech.1003-3513.2011.07-08.09
Abstract
This paper first analyzes the architecture and limitations of current corpus retrieval systems. It then studies a TMX-based storage structure and the corresponding matching algorithm, and finally describes the functions of the system. The aim is to explore ways for corpus retrieval systems to process corpora at a deeper level and to demonstrate the feasibility of this approach.
A Cloud Storage Model Based on P2P
Wang Yamin, Liu Xiaowei, Han Xueling
2011, 27(7/8): 56-61. DOI: 10.11925/infotech.1003-3513.2011.07-08.10
Abstract
After analyzing the problems of current cloud storage, this paper presents a new P2P-based cloud storage model. The model uses the Chord algorithm to manage nodes and distribute client requests, which removes problems of centralized architectures such as single points of failure (SPOF) and performance bottlenecks, and achieves load balancing. It manages users' data with storage clusters, which simplifies system management, and applies a replica management strategy that improves scalability, fault tolerance, and performance.
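The core of Chord is consistent hashing: nodes and keys are hashed onto the same identifier ring, and each key is stored on its successor, the first node clockwise from the key's identifier. The sketch below shows that placement rule plus a simple successor-list replica strategy. It assumes a full membership view (a sorted list) instead of Chord's finger tables and O(log N) routing, uses a 16-bit ring for readability, and the replica strategy is an assumption, not necessarily the paper's.

```python
import bisect
import hashlib

M = 16  # identifier bits; Chord itself uses SHA-1's 160 bits

def chord_id(name):
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16) % (2 ** M)

class ChordRing:
    def __init__(self, nodes):
        self.ring = sorted((chord_id(n), n) for n in nodes)
        self.ids = [i for i, _ in self.ring]

    def _first_at_or_after(self, key):
        # Index of the first node whose identifier is >= the key's identifier,
        # wrapping to the start of the ring when the key is past the last node.
        return bisect.bisect_left(self.ids, chord_id(key)) % len(self.ring)

    def successor(self, key):
        return self.ring[self._first_at_or_after(key)][1]

    def replicas(self, key, r=3):
        # Assumed replica strategy: the successor plus the next r - 1 nodes on the ring.
        start = self._first_at_or_after(key)
        return [self.ring[(start + j) % len(self.ring)][1]
                for j in range(min(r, len(self.ring)))]

nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = ChordRing(nodes)
owner = ring.successor("file.txt")
```

Adding or removing a node only moves the keys between it and its predecessor, which is what lets such a model avoid a central metadata server and balance load.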
Design and Implementation of Visual Co-word and Cluster Analyzer
Xing Meifeng, Xu Deshan
2011, 27(7/8): 62-67. DOI: 10.11925/infotech.1003-3513.2011.07-08.11
Abstract
Having analyzed the strengths and weaknesses of existing bibliometric software, as well as the goals and workflow of research based on bibliometric methods, this paper builds several bibliographic entry dictionaries, merges and corrects keywords effectively, and integrates statistics, co-word analysis, and clustering into a single process. On this basis it designs and implements a visual co-word and cluster analyzer.
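The statistics step of a co-word analyzer can be sketched in a few lines: normalize each record's keywords through a merge dictionary, count keyword frequencies and pairwise co-occurrences, then compute an association strength for clustering. This sketch uses the equivalence index, a common co-word measure, as an assumption; the abstract does not say which measure the analyzer uses, and the synonym mapping is a hypothetical stand-in for its entry dictionaries.

```python
from collections import Counter
from itertools import combinations

def normalise(keywords, synonyms):
    # Case-fold and merge variant forms through a keyword dictionary (a plain dict here).
    return sorted({synonyms.get(k.strip().lower(), k.strip().lower()) for k in keywords})

def co_word_counts(records, synonyms):
    freq, pairs = Counter(), Counter()
    for keywords in records:
        terms = normalise(keywords, synonyms)
        freq.update(terms)
        pairs.update(combinations(terms, 2))  # each unordered pair counted once per record
    return freq, pairs

def equivalence_index(freq, pairs):
    # E_ij = C_ij^2 / (C_i * C_j): 1 when two keywords always appear together.
    return {(a, b): c * c / (freq[a] * freq[b]) for (a, b), c in pairs.items()}

records = [["Co-word analysis", "Clustering"],
           ["co-word analysis", "Bibliometrics"],
           ["Bibliometrics", "clustering", "co-word"]]
synonyms = {"co-word": "co-word analysis"}
freq, pairs = co_word_counts(records, synonyms)
strength = equivalence_index(freq, pairs)
```

Without the merge step, "co-word" and "co-word analysis" would be counted as separate keywords, which is why dictionary-based correction comes before the statistics.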
N-gram Based on Cluster Label Extracting Algorithm for English Paper
Wu Suhui, Cheng Ying, Zheng Yanning, Pan Yuntao
2011, 27(7/8): 68-75. DOI: 10.11925/infotech.1003-3513.2011.07-08.12
Abstract
This paper proposes a novel N-gram-based cluster label extraction algorithm for English papers. Before clustering, the algorithm uses N-grams to build a field phrase list through prior learning on a large-scale corpus; it then clusters the English papers with the K-means algorithm. Finally, the highest-scoring N-gram term in each cluster is extracted as its label; when scoring, a term that appears in the field phrase list receives double weight. Experimental results show that this improves the quality of cluster labels. In addition, an improved TFIDF calculation method is developed, and a new R@N method for evaluating cluster labels is proposed.
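The labeling step described above can be sketched as follows, assuming the K-means clusters are already given. Each cluster's N-grams (here n = 1 to 3) are scored with a plain TF-IDF computed over clusters, and the score is doubled for terms found in the field phrase list. The paper's improved TFIDF variant is not specified in the abstract, so plain TF-IDF is an assumption, as are the toy clusters and phrase list.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=3):
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def cluster_labels(clusters, field_phrases, top_k=1):
    """clusters: list of clusters, each a list of tokenized documents (lists of words)."""
    tf = [Counter(g for doc in docs for g in ngrams(doc)) for docs in clusters]
    df = Counter(g for counts in tf for g in counts)  # number of clusters containing each n-gram
    labels = []
    for counts in tf:
        scores = {}
        for g, c in counts.items():
            score = c * math.log(len(clusters) / df[g])  # plain TF-IDF over clusters (assumed)
            if g in field_phrases:
                score *= 2  # double weight for terms in the field phrase list
            scores[g] = score
        # Highest score first; ties broken alphabetically for determinism.
        labels.append(sorted(scores, key=lambda g: (-scores[g], g))[:top_k])
    return labels

clusters = [
    [["information", "retrieval", "model"], ["information", "retrieval", "evaluation"]],
    [["topic", "model", "evolution"], ["topic", "model"]],
]
field_phrases = {"information retrieval", "topic model"}
print(cluster_labels(clusters, field_phrases))  # [['information retrieval'], ['topic model']]
```

Without the doubling, "information retrieval" would tie with the single words "information" and "retrieval"; the phrase-list weight is what lifts the multi-word label above them.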
Text Feature Selection Method Based on Particle Swarm Optimization
Lu Yonghe, Cao Lichao
2011, 27(7/8): 76-81. DOI: 10.11925/infotech.1003-3513.2011.07-08.13
Abstract
Considering the overall impact of text features on text categorization results, this paper proposes a text feature selection method based on particle swarm optimization (PSOTFS), which uses the PSO algorithm to mine text feature selection rules. PSOTFS first preselects text features with CHI (chi-square), then uses PSO to select features precisely from the preselected set. Each particle represents a feature selection rule, and the set of feature selection rules corresponds to a particle swarm. Classification precision serves as the fitness function, and grouping is used to reduce the dimensionality of the particles. Experimental results show that PSOTFS achieves better text categorization effectiveness than CHI, information gain, document frequency, and mutual information.
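As an illustration of "a particle represents a feature selection rule", the sketch below runs a standard binary PSO (sigmoid transfer of velocities into selection probabilities) over a 0/1 mask of the CHI-preselected features. In PSOTFS the fitness would be the classification precision of a classifier trained on the selected features; here `toy_fitness`, which rewards a hypothetical set of useful features and penalizes the rest, stands in for it. The grouping step and all parameter values are omitted or assumed.

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    rng = random.Random(seed)
    # Each particle is a 0/1 mask over the preselected features.
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: the velocity sets the probability of selecting feature d.
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

USEFUL = {0, 2, 5}  # hypothetical informative features among 8 preselected ones

def toy_fitness(mask):
    # Stand-in for classification precision: reward useful features, penalize noise.
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & USEFUL) - 0.1 * len(chosen - USEFUL)

best_mask, best_fit = binary_pso(8, toy_fitness)
```

Since every fitness call would train and test a classifier in the real method, the CHI preselection matters: it keeps the mask short and the number of expensive evaluations manageable.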
Approximately Duplicate Data Cleaning Algorithm Based on Improved Edit Distance
Ye Huanzhuo, Wu Di
2011, 27(7/8): 82-90. DOI: 10.11925/infotech.1003-3513.2011.07-08.14
Abstract
Similarity calculation is a key issue in cleaning approximately duplicate data, and the edit distance algorithm is widely used for this purpose. Starting from the traditional edit distance algorithm and analyzing factors that affect the similarity result, such as sequence length and synonyms, this paper proposes an improved approximately duplicate data cleaning algorithm based on semantic edit distance. The algorithm uses a synonym thesaurus and a normalized distance metric and can be applied to the identification of similar records. Experimental results show that similarities calculated by the improved algorithm agree better with sentence semantics and with human intuition, so the method effectively improves the accuracy and precision of approximately duplicate data detection.
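The two ingredients named in the abstract, a synonym thesaurus and length normalization, can be combined as in the sketch below: a word-level Levenshtein distance in which substituting two synonyms costs `syn_cost` instead of 1, divided by the longer sequence length to give a similarity in [0, 1]. The zero synonym cost and the dict-based thesaurus (word to canonical form) are assumptions; the paper's exact weights are not given.

```python
def same_meaning(x, y, synonyms):
    # Two words match semantically if the thesaurus maps them to the same canonical form.
    return synonyms.get(x, x) == synonyms.get(y, y)

def semantic_edit_distance(a, b, synonyms, syn_cost=0.0):
    """Word-level Levenshtein distance; a synonym substitution costs syn_cost instead of 1."""
    m, n = len(a), len(b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = float(i)  # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = float(j)  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            elif same_meaning(a[i - 1], b[j - 1], synonyms):
                sub = syn_cost
            else:
                sub = 1.0
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[m][n]

def similarity(a, b, synonyms):
    # Normalize by the longer sequence so the score does not depend on raw length.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - semantic_edit_distance(a, b, synonyms) / longest

a = "Peking University Library".split()
b = "Beijing University Library".split()
print(similarity(a, b, {"Peking": "Beijing"}))  # 1.0
print(similarity(a, b, {}))                     # 0.666... with plain edit distance
```

With the thesaurus the two records are recognized as duplicates; plain edit distance treats "Peking" and "Beijing" as unrelated words.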
Information Retrieval System Based on Negative Association Rules and Frequent Itemsets Mining
Huang Mingxuan, Yu Ru
2011, 27(7/8): 91-96. DOI: 10.11925/infotech.1003-3513.2011.07-08.15
Abstract
A novel information retrieval system model based on negative association rules and frequent itemset mining is proposed, and its design concept and the function of each module are explained. Key techniques for implementing the model and the search algorithm are also described in detail. Experimental results show that the proposed model effectively improves information retrieval performance.
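The sketch below illustrates the two mining ingredients, treating each document as a set of terms: a level-wise (Apriori-style) search for frequent itemsets, and negative rules of the form A ⇒ ¬B, whose support follows from supp(A ∧ ¬B) = supp(A) − supp(A ∪ B). Restricting rules to single terms and the toy documents are simplifications; how the paper's retrieval model uses these rules is not described in the abstract.

```python
from itertools import combinations

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_sup, max_len=3):
    # Level-wise search: candidates of size k are unions of frequent (k-1)-itemsets.
    items = sorted({i for t in transactions for i in t})
    frequent, level, k = {}, [frozenset([i]) for i in items], 1
    while level and k <= max_len:
        current = {s: support(s, transactions) for s in level}
        current = {s: v for s, v in current.items() if v >= min_sup}
        frequent.update(current)
        k += 1
        level = list({a | b for a, b in combinations(list(current), 2) if len(a | b) == k})
    return frequent

def negative_rules(frequent, transactions, min_conf):
    # Rules "A => not B" between frequent single terms.
    singles = [s for s in frequent if len(s) == 1]
    rules = []
    for a in singles:
        for b in singles:
            if a == b:
                continue
            sup_a_not_b = frequent[a] - support(a | b, transactions)
            conf = sup_a_not_b / frequent[a]
            if conf >= min_conf:
                rules.append((set(a), set(b), round(conf, 3)))
    return rules

docs = [{"java", "programming"}, {"java", "coffee"}, {"java", "programming", "jvm"},
        {"coffee", "espresso"}, {"java", "programming"}]
freq_sets = frequent_itemsets(docs, min_sup=0.4)
rules = negative_rules(freq_sets, docs, min_conf=0.6)
```

A rule such as "java ⇒ ¬coffee" captures that documents about the programming language rarely discuss the drink, which is the kind of information a retrieval model can use to filter or down-rank off-topic results.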
Design and Implementation of Semantic-based Sentiment Mining System
Li Gang, Wang Zhongyi
2011, 27(7/8): 97-103. DOI: 10.11925/infotech.1003-3513.2011.07-08.16
Abstract
Because of the complexity of natural language, sentiment mining still faces problems such as the domain dependence of sentiment words, implicit feature recognition, synonym recognition, and the calculation of features' sentiment strength. To address these problems, this paper proposes a sentiment mining method based on topic maps. By making full use of the semantic relationships between feature words and sentiment words, the method improves the accuracy of sentiment mining to a certain extent.
Topic Evolution Based on Seminal Document and Topic Model
Shan Bin, Li Fang
2011, 27(7/8): 104-109. DOI: 10.11925/infotech.1003-3513.2011.07-08.17
Abstract
This paper presents a new method that automatically infers LDA topic evolution based on seminal documents. The semantic distribution of the seminal documents is used to guide the model for subsequent time slices and to link topics between consecutive time slices. Experiments on the NIPS dataset and on Chinese news reports about the NPC and CPPCC show that the method not only captures correct evolutions in various forms but also avoids linking topics that are related but have no evolutionary relationship.
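Linking topics between consecutive time slices ultimately requires comparing topic-word distributions. The sketch below shows a generic version of that step only: Jensen-Shannon divergence (base 2, so values lie in [0, 1]) between topics of slice t and slice t+1, with a link wherever the divergence falls below a threshold. It does not model the paper's seminal-document guidance; the threshold and toy topics are assumptions.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two {word: probability} distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(dist):
        # KL(dist || m); words with zero probability contribute nothing.
        return sum(pr * math.log2(pr / m[w]) for w, pr in dist.items() if pr > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

def link_topics(prev_topics, curr_topics, threshold):
    # Link topic i of slice t to topic j of slice t+1 when they are close enough.
    return [(i, j)
            for i, p in enumerate(prev_topics)
            for j, q in enumerate(curr_topics)
            if js_divergence(p, q) < threshold]

prev_topics = [{"svm": 0.5, "kernel": 0.5}, {"bayes": 0.6, "prior": 0.4}]
curr_topics = [{"svm": 0.4, "kernel": 0.4, "margin": 0.2}, {"neuron": 0.5, "spike": 0.5}]
print(link_topics(prev_topics, curr_topics, 0.5))  # [(0, 0)]
```

A fixed threshold on similarity alone tends to link topics that are merely related; the seminal-document guidance described in the abstract is aimed at exactly that failure mode.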
Study on Testing and Improving Nonlinear Evaluation Methods for Academic Journals
Yu Liping, Pan Yuntao, Wu Yishan
2011, 27(7/8): 110-115. DOI: 10.11925/infotech.1003-3513.2011.07-08.18
Abstract
In nonlinear evaluation, an abnormal phenomenon sometimes occurs in which the final evaluation score decreases while the values of the component indicators increase. The paper proposes regression adjustment, a new method for testing and improving nonlinear evaluation methods, as a solution to this abnormality.
Research on Duplicated Literature Deletion Method Based on Cross-database Search
Hao Dan, Zhou Jinhui, Guan Bei, Wang Yanxi, Han Jixin
2011, 27(7/8): 116-120. DOI: 10.11925/infotech.1003-3513.2011.07-08.19
Abstract
Against the background of compiling publication statistics by author and affiliation, this paper analyzes the specific causes of data redundancy in cross-database searching and proposes four duplicate removal methods, based on the Cross Chinese Database ID, the Cross English Database ID, the DOI, and “Title & Type”. These methods have been applied effectively in literature statistics work and better resolve redundancy across different databases.
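The four keys can be combined into one pass, as in the sketch below: each record yields every key it has, and a record is dropped if any of its keys has already been seen. The field names `cn_id` and `en_id` are hypothetical stand-ins for the cross-database IDs, and the title normalization (case-folding, dropping punctuation and spaces) is an assumption. DOIs are compared case-insensitively with common resolver prefixes stripped, since DOI names are case-insensitive.

```python
import re

def norm_title(title):
    # Case-fold and drop punctuation/whitespace (Unicode-aware, so Chinese characters survive).
    return re.sub(r"[\W_]+", "", title.casefold())

def norm_doi(doi):
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def dedup_keys(rec):
    keys = []
    if rec.get("cn_id"):  # hypothetical cross-Chinese-database ID field
        keys.append(("cn_id", rec["cn_id"]))
    if rec.get("en_id"):  # hypothetical cross-English-database ID field
        keys.append(("en_id", rec["en_id"]))
    if rec.get("doi"):
        keys.append(("doi", norm_doi(rec["doi"])))
    keys.append(("title_type", norm_title(rec["title"]), (rec.get("type") or "").lower()))
    return keys

def deduplicate(records):
    seen, kept = set(), []
    for rec in records:
        keys = dedup_keys(rec)
        if any(k in seen for k in keys):
            continue  # duplicate of an earlier record under at least one key
        seen.update(keys)
        kept.append(rec)
    return kept

records = [
    {"title": "A Cloud Storage Model Based on P2P", "type": "journal",
     "doi": "10.11925/infotech.1003-3513.2011.07-08.10"},
    {"title": "A cloud storage model based on P2P.", "type": "Journal",
     "doi": "https://doi.org/10.11925/INFOTECH.1003-3513.2011.07-08.10"},
    {"title": "A Cloud Storage Model Based on P2P", "type": "journal"},
    {"title": "A Cloud Storage Model Based on P2P", "type": "conference"},
]
kept = deduplicate(records)
```

Including the type in the title key keeps a conference version and a journal version of the same title as separate publications, which matters when counting output per author or affiliation.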
Application of 3D Virtual Browsing Technology in Digital Library Construction Based on Virtools——3D Books Navigation System of Capital Normal University Library
Wang Shuo
2011, 27(7/8): 121-126. DOI: 10.11925/infotech.1003-3513.2011.07-08.20
Abstract
Taking the 3D virtual book navigation system of Capital Normal University Library as an example, the paper introduces an application of virtual book navigation in China built with 3ds Max and Virtools. It mainly discusses how to create 3D models and how to provide interactivity when users access the system via the Web OPAC or a URL. The system implements virtual book search and path navigation, real-time messaging, and multimedia sharing, as well as a realistic virtual tour of the library.
Design and Implementation of Library Bibliography Information Self SMS Push Service
Zhou Hong, Zhang Bei, Jiang Airong, Zhang Chengyu
2011, 27(7/8): 127-131. DOI: 10.11925/infotech.1003-3513.2011.07-08.21
Abstract
To serve patrons better with new technologies, Tsinghua University Library provides a self-service SMS push service for library bibliographic information. The service is built on information extracted from the OPAC, patrons' mobile phone numbers collected through a self-built Web page, a structured database, and the database synchronization feature of the “Qixintong” SMS system.
MELINETSⅡ Design and Implementation of Group Acceptance Inspection of Interviews ——Take Guangxi University Library for Example
Tang Xiaoxin
2011, 27(7/8): 132-136. DOI: 10.11925/infotech.1003-3513.2011.07-08.22
Abstract
A group acceptance module is added to the library acquisitions system to avoid tedious steps in the book acceptance procedure. It speeds up acceptance and meets the needs of outsourced library services. The design and workflow are presented in detail, and the key technologies and solutions are introduced.
Copyright © 2016 Data Analysis and Knowledge Discovery. Tel/Fax: (010) 82626611-6626, 82624938. E-mail: jishu@mail.las.ac.cn