    Analysis of Data Characteristics in 3O Convergence Websites
    Gao Li, Zhou Jinhui, Liu Yajing
    2013, 29 (7/8): 1-12.  DOI: 10.11925/infotech.1003-3513.2013.07-08.01
    Abstract
    This paper builds on the paper "Open Access, Open Knowledge, Open Innovation Pushes for Open Knowledge Services——3O Convergence and a New Paradigmatic Shift for Research Libraries", published in issue 2 of New Technology of Library and Information Service in 2013. Based on a survey, classification and analysis of Chinese and international websites that exhibit the characteristics of 3O convergence, it summarizes the coverage, openness, computability and reusability of 3O resources, providing a reference for readers who wish to understand and use 3O convergent platforms.
    Infrastructure, Intelligence, Innovation: Driving the Data Science Agenda——A Comprehensive Review of IDCC2013
    Wu Zhenxin, Qi Yan, Fu Honghu, Liu Chao, Li Wenyan, Liu Xiaomin, Wang Yuju
    2013, 29 (7/8): 13-21.  DOI: 10.11925/infotech.1003-3513.2013.07-08.02
    Abstract
    This paper gives a systematic and comprehensive review of the 8th International Digital Curation Conference (IDCC2013). Centring on the conference theme "Infrastructure, Intelligence, Innovation: Driving the Data Science Agenda", participants presented, analyzed and discussed in depth a wide range of topics: institutional research data management, national perspectives on research data management, repositories and data archives, cloud services, education and training, confidentiality and open research data, formats and identifiers, cross-disciplinary data, arts and humanities data, formats and metadata, and data publication. These discussions reflect the research results, current status and challenges of both theory and practice in this field.
    Implementation of Semantic Retrieval Based on Ontology Created by SKOS and Association Rule Mining
    Liu Wei, Zhu Zhongming, Zhang Wangqiang, Wang Sili, Yao Xiaona, Lu Linong
    2013, 29 (7/8): 22-27.  DOI: 10.11925/infotech.1003-3513.2013.07-08.03
    Abstract
    This paper proposes a solution for constructing and applying an Ontology. First, the authors create the Ontology by converting a SKOS vocabulary. Then, association rules are mined to supply association properties between classes. Finally, semantic retrieval is achieved using Ontology-based retrieval and reasoning techniques.
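    The abstract does not say which mining algorithm or thresholds are used. As a minimal sketch, assuming each indexed document is described by the set of Ontology classes it belongs to, one-to-one rules between classes can be scored by support and confidence; a rule passing both thresholds would be a candidate association property. The function name, the input shape, the class labels and the threshold values are all illustrative assumptions.

```python
from itertools import permutations


def mine_class_rules(transactions, min_support, min_confidence):
    """Score one-to-one rules A -> B between Ontology classes (hypothetical helper).

    transactions: list of sets of class labels, e.g. the classes assigned
    to each indexed document. Returns (antecedent, consequent, support,
    confidence) tuples, sorted by rule.
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
        # Ordered pairs, so A -> B and B -> A are scored separately.
        for a, b in permutations(sorted(t), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), count in sorted(pair_count.items()):
        support = count / n                 # share of documents containing both classes
        confidence = count / item_count[a]  # estimate of P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


docs = [{"Rice", "Pest"}, {"Rice", "Pest", "Fertilizer"}, {"Rice"}, {"Wheat", "Fertilizer"}]
rules = mine_class_rules(docs, min_support=0.5, min_confidence=0.6)
```

    Here Pest -> Rice holds with confidence 1.0 and Rice -> Pest with about 0.67; this asymmetry is one reason to orient a property by confidence rather than by co-occurrence alone.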
    An Improved Best-First Search Algorithm Based Focused Crawling Research
    Qiao Jianzhong
    2013, 29 (7/8): 28-35.  DOI: 10.11925/infotech.1003-3513.2013.07-08.04
    Abstract
    By refining and reclassifying the characteristic factors that focused crawlers use to predict the priority of Web links, this paper introduces two new features, harvest rate and media type, as a basis for judging relevance, and proposes an improved Best-First Search algorithm. The algorithm uses a "fine-grained" policy to filter irrelevant Web pages, selects representative characteristic factors from multiple angles, and constructs a link priority formula to reveal and predict the topics of Web links comprehensively. A small-scale experiment comparing it with three other topic-focused search algorithms shows that the improved algorithm performs better in harvest rate and in the average number of links submitted.
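    The abstract names the factors (including harvest rate and media type) but not the priority formula itself. The sketch below shows only the general Best-First structure under stated assumptions: a hypothetical linear priority over anchor-text relevance, the crawl's harvest rate so far and a media-type score, with made-up weights and scores, run over an in-memory link graph instead of the live Web so the behaviour is easy to trace.

```python
import heapq

# Hypothetical media-type scores; a score of 0.0 means the link is filtered out.
MEDIA_TYPE_SCORE = {"text/html": 1.0, "application/pdf": 0.8, "image/jpeg": 0.0}


def link_priority(anchor_relevance, harvest_rate, media_type, weights=(0.5, 0.3, 0.2)):
    """Hypothetical linear priority over three characteristic factors."""
    w_anchor, w_harvest, w_media = weights
    return (w_anchor * anchor_relevance
            + w_harvest * harvest_rate
            + w_media * MEDIA_TYPE_SCORE.get(media_type, 0.0))


def best_first_crawl(seeds, graph, is_relevant, max_pages):
    """Best-first crawl over an in-memory graph.

    graph maps url -> list of (child_url, anchor_relevance, media_type).
    Returns the fetch order and the final harvest rate (relevant / fetched).
    """
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so priorities are negated
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched, relevant = [], 0
    while frontier and len(fetched) < max_pages:
        _, url = heapq.heappop(frontier)
        fetched.append(url)
        if is_relevant(url):
            relevant += 1
        harvest_rate = relevant / len(fetched)
        for child, anchor_rel, media in graph.get(url, []):
            # Filter: skip already-queued links and media types scored 0.0.
            if child in seen or MEDIA_TYPE_SCORE.get(media, 0.0) == 0.0:
                continue
            seen.add(child)
            priority = link_priority(anchor_rel, harvest_rate, media)
            heapq.heappush(frontier, (-priority, child))
    rate = relevant / len(fetched) if fetched else 0.0
    return fetched, rate


graph = {
    "seed": [("a", 0.9, "text/html"), ("b", 0.1, "text/html"), ("img", 0.9, "image/jpeg")],
    "a": [("c", 0.8, "application/pdf")],
}
order, rate = best_first_crawl(["seed"], graph, lambda u: u in {"seed", "a", "c"}, max_pages=10)
```

    In this toy graph the image link is filtered out, and the PDF reached through a relevant page is fetched before the weakly anchored HTML page "b".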
    Fronts and Hotspots of the Application Research on Folksonomy Abroad
    Bi Qiang, Wang Yu
    2013, 29 (7/8): 36-42.  DOI: 10.11925/infotech.1003-3513.2013.07-08.05
    Abstract
    By analyzing foreign applied research on Folksonomy from 2003 to 2012, the paper focuses on four representative and influential areas, namely Ontology, Library 2.0, Web semantic retrieval and subject information navigation, and analyzes their research fronts and hotspots. The paper also looks ahead to the integration of Folksonomy and Ontology, the application of the concept of "user participation" to Library 2.0, analysis and processing in Web semantic retrieval, and the future development of subject information navigation based on tag classification, in order to provide a reference for domestic Folksonomy studies.
    Study on Instance Learning Method of Internet User Preference Ontology
    Zhu Hengmin, Jia Danhua, Huang Zhenqi, Wang Chunhui
    2013, 29 (7/8): 43-48.  DOI: 10.11925/infotech.1003-3513.2013.07-08.06
    Abstract
    An Internet user preference Ontology can fully and accurately describe the interests and multidimensional preferences of Internet users. Because the large and constantly growing and changing set of instances is hard to collect manually, this paper studies learning methods for three representative instance types: topic-specific professional websites, brands and sporting events. The method enables semi-automatic construction of the Internet user preference Ontology, and experiments are designed to verify its effectiveness.
    Research on Text Clustering Based on Social Tagging
    He Wenjing, He Lin
    2013, 29 (7/8): 49-54.  DOI: 10.11925/infotech.1003-3513.2013.07-08.07
    Abstract
    In this paper, the authors use the social tags that annotate resources as feature items. Text clustering is implemented with the K-means algorithm and conducted successfully on a small data set. The implementation of the key techniques of tag-based text clustering, such as tag filtering and the clustering algorithm, is discussed in detail. The experiment shows that text clustering based on social tags performs better than clustering based on keywords and can improve clustering results.
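    A minimal sketch of the pipeline the abstract describes, assuming binary tag vectors, a simple document-frequency tag filter (the `min_count` threshold is an assumption) and plain K-means with squared Euclidean distance. Initial centroids are the first k documents only to keep the example deterministic; random or k-means++ initialization is more usual in practice. The tags are invented.

```python
from collections import Counter


def tag_vectors(doc_tags, min_count=2):
    """Binary tag vectors, keeping only tags used by at least min_count documents."""
    counts = Counter(tag for tags in doc_tags for tag in set(tags))
    vocab = sorted(t for t, c in counts.items() if c >= min_count)
    vectors = [[1.0 if t in s else 0.0 for t in vocab] for s in map(set, doc_tags)]
    return vocab, vectors


def kmeans(vectors, k, max_iter=100):
    """Plain K-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


doc_tags = [
    ["python", "programming", "code"],
    ["python", "code", "tutorial"],
    ["cooking", "recipe", "food"],
    ["recipe", "food", "dinner"],
    ["programming", "code"],
    ["food", "cooking"],
]
vocab, vectors = tag_vectors(doc_tags)
labels = kmeans(vectors, k=2)
```

    The filter drops the single-use tags "tutorial" and "dinner", and the three programming-tagged documents end up in one cluster and the three food-tagged documents in the other.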
    Model Construction and Experiment Analysis of Automatic Indexing for Chinese Books
    Wang Hao, Zou Jieli, Deng Sanhong
    2013, 29 (7/8): 55-62.  DOI: 10.11925/infotech.1003-3513.2013.07-08.08
    Abstract
    To address automatic keyword indexing for Chinese books, this paper introduces the machine learning algorithm of Conditional Random Fields (CRFs). By training on a large amount of existing, manually indexed keyword data for Chinese books, the method generates an annotation model that captures semantic relations and rule features among sequence entities, and then uses this model to predict and automatically extract the books' keywords. The paper mainly addresses two problems. First, because the choice of CRF parameters affects indexing performance, the authors run comparative tests from several angles to identify the optimal CRF parameter set for keyword indexing of Chinese books. Second, the authors discuss the effect of different observed features on keyword indexing and, through experimental analysis, identify four observed features that effectively improve indexing performance. Finally, an optimal keyword indexing model for Chinese books is constructed.
    Research of Automatically Recognizing Name in Pre-Qin Ancient Chinese Classics
    Tang Yafen
    2013, 29 (7/8): 63-68.  DOI: 10.11925/infotech.1003-3513.2013.07-08.09
    Abstract
    From the perspective of text mining and analysis in the digital humanities, this paper automatically recognizes ancient Chinese personal names in a Pre-Qin corpus using a Conditional Random Field (CRF) machine learning model. The trained model, which reaches an F-score of 91.52% on the cross-validation corpus, is identified as the best-performing model for ancient Chinese name recognition and is verified experimentally on a Pre-Qin corpus of 187,901 words. The research not only helps to extract named entities from Pre-Qin ancient literature but also supports exploring the relationships and backgrounds of people in other humanities and social science studies.
    Research on Author Name Disambiguation Algorithm in the Literature Database
    Guo Shu
    2013, 29 (7/8): 69-74.  DOI: 10.11925/infotech.1003-3513.2013.07-08.10
    Abstract
    This paper first analyzes GHOST, a graphical framework for name disambiguation, and then proposes a modified name disambiguation algorithm that incorporates text mining of bibliographic information. The new algorithm is better suited to literature databases and makes up for the limitations of GHOST. Using the title and publication name from the bibliographic records as computed features, the experiment shows that the algorithm achieves high precision and recall, with an F1 of 84%, which is satisfactory for name disambiguation.
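    The abstract reports precision, recall and an F1 of 84% but not how they are counted. One common convention for name disambiguation, assumed here, is pairwise: every pair of records sharing an author name is checked for whether the predicted clusters and the gold standard agree that both records belong to the same person. The record ids and cluster labels below are made up.

```python
from itertools import combinations


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall and F1 for author-name clusters.

    predicted, gold: dicts mapping record id -> cluster (person) id.
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(gold), 2):
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        if same_pred and same_gold:
            tp += 1  # correctly merged pair
        elif same_pred:
            fp += 1  # wrongly merged pair
        elif same_gold:
            fn += 1  # pair that should have been merged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {"p1": "A", "p2": "A", "p3": "A", "p4": "B"}
predicted = {"p1": 1, "p2": 1, "p3": 2, "p4": 2}
precision, recall, f1 = pairwise_prf(predicted, gold)
```

    Here one correct pair, one wrongly merged pair and two missed pairs give precision 0.5, recall 1/3 and F1 0.4.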
    Reviews on Development of Patent Citation Research
    Chen Liang, Zhang Zhiqiang, Shang Weijiao
    2013, 29 (7/8): 75-81.  DOI: 10.11925/infotech.1003-3513.2013.07-08.11
    Abstract
    As an important data source for strategic intelligence analysis, patent citation not only reflects knowledge flows in technological development but can also be used to track technology fronts and recognize technology status at both the country and organization levels. After exploring the definition and origin of patent citation, this paper describes the development of patent-patent citation analysis and patent-paper citation analysis according to the categories of patent citation, and lists some representative works. Finally, the paper summarizes the problems in patent citation analysis and offers corresponding suggestions.
    Review on Percentile-based Bibliometric Indicator
    Zhou Qun, Zuo Wenge, Chen Shiji
    2013, 29 (7/8): 82-88.  DOI: 10.11925/infotech.1003-3513.2013.07-08.12
    Abstract
    Percentiles have become established in bibliometrics as an important alternative to the Relative Citation Rate and have been applied to the evaluation of research performance. This paper introduces the background of percentile-based bibliometric indicators and describes the concept, types and advantages of using percentiles in bibliometrics. It also elaborates on the problems in calculating percentiles and assigning percentile ranks, and analyzes applications of percentile-based bibliometric indicators.
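    A minimal sketch of the tie problem behind percentile calculation: when many papers share a citation count, the percentile assigned to a paper depends on whether papers with equal counts are counted below it, above it or split evenly. The three rules and the reference set below are illustrative assumptions, not necessarily the variants the review discusses.

```python
def percentile_rank(citations, reference_set, ties="lower"):
    """Percentile rank (0-100) of a citation count within a reference set.

    ties: "lower" counts only papers with fewer citations, "upper" also
    counts papers with equal citations, "mid" counts half of the equal ones.
    """
    n = len(reference_set)
    if n == 0:
        raise ValueError("reference set is empty")
    below = sum(1 for c in reference_set if c < citations)
    equal = sum(1 for c in reference_set if c == citations)
    if ties == "lower":
        return 100.0 * below / n
    if ties == "upper":
        return 100.0 * (below + equal) / n
    if ties == "mid":
        return 100.0 * (below + equal / 2) / n
    raise ValueError(f"unknown tie rule: {ties!r}")


reference = [0, 0, 1, 2, 2, 2, 5, 8, 10, 20]
ranks = {rule: percentile_rank(2, reference, rule) for rule in ("lower", "mid", "upper")}
```

    The same paper with two citations lands at the 30th, 45th or 60th percentile depending only on the tie rule.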
    A New Feature Selection Method Based on Term Contribution in Co-word Analysis
    Hu Changping, Chen Guo
    2013, 29 (7/8): 89-93.  DOI: 10.11925/infotech.1003-3513.2013.07-08.13
    Abstract
    From the viewpoint of data dimensionality reduction, the common practice of constructing the co-word matrix from high-frequency words leaves considerable room for improvement. By comparing co-word analysis with traditional text processing tasks, including text categorization, text clustering and information retrieval, the authors introduce a new feature selection method based on term contribution and describe its algorithm. Experimental comparison shows that the new method clearly improves data quality and clustering results.
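    The abstract does not give its formula. One widely used definition of term contribution in text-clustering feature selection, assumed here and possibly different from the authors' variant, sums the product of a term's tf-idf weights over every pair of distinct documents, so a term scores highly only when it carries weight in several documents at once. The small corpus is invented for illustration.

```python
import math
from collections import Counter


def term_contribution(docs):
    """Term contribution over tokenized documents.

    TC(t) = sum over ordered document pairs i != j of w(t, d_i) * w(t, d_j),
    with w = tf * idf and idf = log(N / df), computed as
    (sum of weights)^2 - (sum of squared weights).
    """
    n = len(docs)
    tfs = [Counter(d) for d in docs]
    df = Counter(t for tf in tfs for t in tf)
    tc = {}
    for t, dft in df.items():
        idf = math.log(n / dft)
        weights = [tf[t] * idf for tf in tfs]  # Counter returns 0 for absent terms
        total = sum(weights)
        tc[t] = total * total - sum(w * w for w in weights)
    return tc


def top_terms(tc, k):
    """The k terms with the highest contribution (ties broken alphabetically)."""
    return sorted(tc, key=lambda t: (-tc[t], t))[:k]


docs = [
    ["library", "data", "ontology"],
    ["library", "data", "retrieval"],
    ["library", "ontology", "ontology"],
    ["library", "clustering"],
]
tc = term_contribution(docs)
```

    Although "library" is the most frequent term, it scores 0 because it occurs in every document (idf = 0), while "ontology" and "data" rank first and second; this illustrates how a contribution-based ranking can differ from selecting the highest-frequency words.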
    Analysis on Statistical Characteristic and Dynamics for User Behavior in Microblog Communities
    He Jing, Guo Jinli, Xu Xuejuan
    2013, 29 (7/8): 94-100.  DOI: 10.11925/infotech.1003-3513.2013.07-08.14
    Abstract
    Using complex network and statistical methods, this paper analyzes the network topology and user behavior characteristics of the Sina Weibo microblogging platform at the individual and group levels. The results show that human behaviors exhibit different multi-scaling characteristics: the node degree distribution and posting behavior approximately follow a power-law distribution, whereas forwarding and commenting behavior follows an exponentially truncated power-law distribution. On this basis, the interest-driven mechanism and heavy-tailed characteristics of user behavior are studied and some commonalities are obtained. The findings are helpful for research on the dynamics of public opinion propagation.
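    The abstract reports power-law and truncated power-law fits without naming the estimation method. As an assumption, the sketch shows the standard continuous maximum-likelihood estimate of the exponent above a hand-chosen x_min, checked on synthetic data drawn by inverse-transform sampling. Discrete counts such as degrees are often handled with an approximation or a discrete likelihood, and fitting the exponentially truncated form needs numerical optimization; neither is shown.

```python
import math
import random


def power_law_alpha(samples, x_min):
    """Continuous MLE of the power-law exponent: alpha = 1 + n / sum(ln(x / x_min)).

    Only samples >= x_min are used.
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need samples strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, n, rng):
    """Draw n samples from a continuous power law by inverting its CDF."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(42)
data = sample_power_law(2.5, 1.0, 20000, rng)
alpha_hat = power_law_alpha(data, 1.0)
```

    With 20,000 synthetic samples drawn at alpha = 2.5, the estimate should fall close to 2.5; real degree or posting data would additionally require choosing x_min, for example by a goodness-of-fit criterion.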
    Research on Chinese Patent Automatic Classification Method Based on Statistical Distribution
    Hu Bing, Zhang Jianli
    2013, 29 (7/8): 101-106.  DOI: 10.11925/infotech.1003-3513.2013.07-08.15
    Abstract
    Traditional automatic text classification algorithms based on the Vector Space Model do not take into account the distribution of terms across classes or the position of terms within the text, which leads to poor performance in patent classification. This paper proposes a Chinese patent automatic classification method based on statistical distribution. First, it puts forward a distribution information weighting factor that increases the weight of terms that appear frequently but in few classes. Then, drawing on the structural features of patent texts, it introduces a position information weighting factor to highlight the legal and technical characteristics of patents and the differences in content among the elements of a patent. Finally, a comparative experiment shows that the proposed method can considerably improve classification performance.
    Construction of Keywords-Chinese Library Classification Codes Integrated Thesaurus
    Yang He, Yang Yihong, Li Ning
    2013, 29 (7/8): 107-113.  DOI: 10.11925/infotech.1003-3513.2013.07-08.16
    Abstract
    Based on a large volume of manual indexing data accumulated over many years, this paper uses Mutual Information (MI), Chi-Square (χ²) and Maximum Likelihood Estimation (MLE) to analyze the correspondence between keywords and Chinese Library Classification codes, and constructs a natural-language classification thesaurus. The performance of the resulting Keywords-Chinese Library Classification Codes Integrated Thesaurus in automatic indexing of sci-tech literature is evaluated through closed and open tests.
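    The abstract names MI, χ² and MLE without giving formulas. A minimal sketch using the textbook 2x2 contingency-table forms, assuming counts of manually indexed records that do or do not carry a given keyword and a given classification code; reading MLE as the estimate of P(code | keyword) is an assumption, as are the counts in the example.

```python
import math


def keyword_class_scores(a, b, c, d):
    """Association between one keyword and one classification code (2x2 table).

    a: records with the keyword and the code   b: keyword without the code
    c: code without the keyword                d: neither
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Pointwise mutual information: log(P(keyword, code) / (P(keyword) * P(code))).
    mi = math.log(a * n / ((a + b) * (a + c))) if a else float("-inf")
    # Maximum-likelihood estimate of P(code | keyword).
    mle = a / (a + b) if a + b else 0.0
    return {"chi2": chi2, "mi": mi, "mle": mle}


scores = keyword_class_scores(a=40, b=10, c=60, d=890)
```

    For these counts χ² is about 286.5, pointwise MI is ln 8 ≈ 2.08 and the estimated P(code | keyword) is 0.8. A thesaurus builder could rank candidate codes for each keyword by one of these scores; how the paper combines them is not stated.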
    The Design and Implementation of Distributed Patent Information Extraction System
    Zhai Dongsheng, Zhang Xinqi, Zhang Jie, Kang Ning
    2013, 29 (7/8): 114-121.  DOI: 10.11925/infotech.1003-3513.2013.07-08.17
    Abstract
    As a vital patent source, the Derwent patent database offers rich patent resources. However, its output formats are limited and include only patent abstracts. This article designs a distributed Derwent patent extraction system based on a multi-agent platform. The system imports patent information into a local database and can also retrieve detailed information from the USPTO. The system proves effective, and the study contributes a practical information acquisition method for patent research.
    Constructing Statistical Analysis System of Electronic Periodical Databases Based on the Firewall Log Mining
    Wang Xiaoliang, Wang Wei
    2013, 29 (7/8): 122-126.  DOI: 10.11925/infotech.1003-3513.2013.07-08.18
    Abstract
    To obtain a clear picture of how electronic periodical databases are used, a statistics and analysis system based on firewall logs is proposed. Useful fields are identified and extracted from the firewall logs and stored in a relational database for further analysis and research. Using China Pharmaceutical University Library as a case, tests are carried out on several Chinese and foreign electronic periodical databases. The results show that methods based on firewall log extraction can be very effective for compiling access statistics for target databases, and they help decision makers form a macro-level view of the usage of subscribed databases.
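    The abstract does not describe the firewall or its log format. A minimal sketch, assuming a hypothetical one-line-per-connection format and a hypothetical table mapping destination hosts to subscribed databases (the host names are placeholders): allowed connections to known database hosts are parsed with a regular expression, stored in a relational table (SQLite here) and aggregated per database.

```python
import re
import sqlite3

# Hypothetical log format; real firewall logs differ by vendor and configuration.
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"src=(?P<src>\S+) dst=(?P<dst>\S+) action=(?P<action>\w+)"
)

# Hypothetical mapping from destination host to a subscribed database.
DATABASES = {"cjfd.example.cn": "ChinesePeriodicals", "journals.example.com": "ForeignJournals"}


def load_logs(lines, conn):
    """Extract useful fields from allowed connections and store them in a table."""
    conn.execute("CREATE TABLE IF NOT EXISTS access (day TEXT, src TEXT, db TEXT)")
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None or m["action"] != "allow":
            continue  # unparsable line or blocked connection
        db = DATABASES.get(m["dst"])
        if db is not None:
            conn.execute("INSERT INTO access VALUES (?, ?, ?)", (m["date"], m["src"], db))
    conn.commit()


def usage_by_database(conn):
    """Connection count and number of distinct client addresses per database."""
    return conn.execute(
        "SELECT db, COUNT(*), COUNT(DISTINCT src) FROM access GROUP BY db ORDER BY db"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_logs([
    "2013-03-01 09:15:02 src=10.1.2.3 dst=cjfd.example.cn action=allow",
    "2013-03-01 09:16:40 src=10.1.2.4 dst=cjfd.example.cn action=allow",
    "2013-03-01 09:17:05 src=10.1.2.3 dst=cjfd.example.cn action=allow",
    "2013-03-01 09:20:11 src=10.1.2.9 dst=journals.example.com action=allow",
    "2013-03-01 09:21:00 src=10.1.2.9 dst=www.example.org action=allow",
    "2013-03-01 09:22:00 src=10.1.2.5 dst=journals.example.com action=deny",
    "garbage line",
], conn)
```

    Counting distinct source addresses alongside raw connections gives only a rough proxy for the number of users; behind NAT or shared terminals it can under-count.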
    Web Dynamic Interactive Visualization of Knowledge Organization Systems with D3.js
    Zhang Yunliang, Zhang Zhaofeng, Zhang Xiaodan, Xu Deshan
    2013, 29 (7/8): 127-131.  DOI: 10.11925/infotech.1003-3513.2013.07-08.19
    Abstract
    This paper analyzes the basic visualization requirements of knowledge organization systems concerning nodes, links and related elements. After surveying visualization techniques against these requirements, D3.js is chosen to implement the design. The paper focuses on the key problems of user interaction in this Web visualization. The experiment shows that D3.js is a feasible and convenient choice for dynamic, interactive Web visualization of knowledge organization systems.
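    D3.js itself runs in the browser; to keep to a single language, the sketch below covers only the server-side step of exporting a knowledge organization system as the node-link JSON a D3 force-directed graph typically consumes. The `nodes`/`links` shape with `source` and `target` fields follows the common D3 convention; string ids work with `d3.forceLink().id(d => d.id)` in D3 v4 and later, while the D3 v3 force layout expected node indices or node objects. The function name, the `label` and `type` fields and the sample concepts are assumptions.

```python
import json


def kos_to_d3(concepts, broader):
    """Serialize concepts and broader relations as node-link JSON (hypothetical helper).

    concepts: dict concept id -> preferred label.
    broader: list of (narrower_id, broader_id) pairs; pairs naming unknown
    concepts are dropped so that every link endpoint exists as a node.
    """
    nodes = [{"id": cid, "label": label} for cid, label in sorted(concepts.items())]
    links = [
        {"source": n, "target": b, "type": "broader"}
        for n, b in broader
        if n in concepts and b in concepts
    ]
    return json.dumps({"nodes": nodes, "links": links}, ensure_ascii=False)


concepts = {"c1": "Information science", "c2": "Knowledge organization", "c3": "Thesauri"}
out = kos_to_d3(concepts, [("c2", "c1"), ("c3", "c2"), ("c4", "c2")])
```

    Dropping dangling relations on the server side avoids links whose endpoints the browser cannot resolve.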
    Construction and Application of Assistant Management System for Regulation Violation Cases in Tsinghua University Library
    Zhuang Mei, Wang Ping, Yang Jie, Chen Hong, Wang Yifei
    2013, 29 (7/8): 132-136.  DOI: 10.11925/infotech.1003-3513.2013.07-08.20
    Abstract
    The article begins by introducing the current state of managing regulation violation cases in academic libraries. The first section discusses the need for a suitable management system that serves as an assistant management scheme complementing promotion and education. The system design, construction and application are then described in detail, based on the Assistant Management System for Regulation Violation Cases in Tsinghua University Library. Summarizing this practice, the article analyzes the results of applying the system at Tsinghua University and proposes plans for future improvement.
    Study on Planning of Library Data Center Virtualization Network
    Xu Zhuobin, Lin Junwei
    2013, 29 (7/8): 137-142.  DOI: 10.11925/infotech.1003-3513.2013.07-08.21
    Abstract
    This paper addresses network design problems in the virtualized data center of Xiamen University Library, including the analysis and tuning of the virtualized infrastructure network. It proposes design principles of data segregation, link redundancy and bandwidth sharing in order to build a virtualized data center network that balances performance and reliability.
