Current Issue, Volume 29 Issue 7
    Analysis of Data Characteristics in 3O Convergence Websites
    Gao Li, Zhou Jinhui, Liu Yajing
    2013, 29 (7/8): 1-12.  DOI: 10.11925/infotech.1003-3513.2013.07-08.01
    This paper builds on "Open Access, Open Knowledge, Open Innovation Pushes for Open Knowledge Services: 3O Convergence and a New Paradigmatic Shift for Research Libraries", published in the second issue of New Technology of Library and Information Service in 2013. Based on a survey and analysis of domestic and international websites that exhibit 3O convergence, it summarizes the coverage, openness, computability, and reusability of 3O resources, giving readers a reference for understanding and using 3O convergent platforms.
    Infrastructure, Intelligence, Innovation: Driving the Data Science Agenda: A Comprehensive Review of IDCC2013
    Wu Zhenxin, Qi Yan, Fu Honghu, Liu Chao, Li Wenyan, Liu Xiaomin, Wang Yuju
    2013, 29 (7/8): 13-21.  DOI: 10.11925/infotech.1003-3513.2013.07-08.02
    This paper systematically reviews the 8th International Digital Curation Conference, whose theme was "Infrastructure, Intelligence, Innovation: Driving the Data Science Agenda". Participants presented, analyzed, and discussed issues in Institutional Research Data Management, National Perspectives in Research Data Management, Repositories/Data Archives, Cloud Services, Education & Training, Confidentiality/Open Research Data, Formats & Identifiers, Cross Disciplinary Data, Arts & Humanities Data, Formats/Metadata, and Data Publication. The discussions reflect the research results, current status, and challenges of both theory and practice in this field.
    Implementation of Semantic Retrieval Based on Ontology Created by SKOS and Association Rule Mining
    Liu Wei, Zhu Zhongming, Zhang Wangqiang, Wang Sili, Yao Xiaona, Lu Linong
    2013, 29 (7/8): 22-27.  DOI: 10.11925/infotech.1003-3513.2013.07-08.03
    This paper proposes a solution for Ontology construction and application. First, the authors create an Ontology by conversion from SKOS. Then, association rules are mined to supply association properties between classes. Finally, semantic retrieval is achieved using Ontology-based retrieval and reasoning techniques.
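The association-rule step can be illustrated with a minimal sketch. The abstract does not say which mining algorithm or thresholds the authors used, so the function name `pair_rules`, the thresholds, and the toy transactions are assumptions for illustration only. The sketch computes the standard support and confidence of one-to-one rules between items.

```python
from itertools import permutations


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for one-to-one rules.

    transactions: list of sets of items (hypothetical: concepts that co-occur
    in one record). This is a brute-force illustration, not an Apriori
    implementation and not the method of the paper.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n                   # share of records holding both
        confidence = count_ab / count_a if count_a else 0.0  # P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules
```

A rule that passes both thresholds could then be recorded as a candidate association property between the two classes; how the authors map rules to properties is not stated in the abstract.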
    An Improved Best-First Search Algorithm Based Focused Crawling Research
    Qiao Jianzhong
    2013, 29 (7/8): 28-35.  DOI: 10.11925/infotech.1003-3513.2013.07-08.04
    This paper refines and reclassifies the characteristic factors that focused crawlers use to predict the priority of Web links, introduces two new features (harvest rate and media type) as bases for judging relevance, and proposes an improved Best-First Search algorithm. The algorithm applies a "fine-grained" policy to filter out irrelevant Web pages, selects representative characteristic factors from multiple angles, and constructs a link priority formula to predict the topics of Web links comprehensively. A small-scale experiment comparing it with three other topic search algorithms shows that the improved algorithm performs better on harvest rate and the average number of links submitted.
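The general shape of a best-first crawl frontier can be sketched as follows. This is a generic illustration, not the paper's algorithm: the link priority formula is not given in the abstract, so `score` is a caller-supplied placeholder, and `get_links`, `min_score`, and `max_pages` are hypothetical names.

```python
import heapq
from itertools import count


def best_first_crawl(seeds, get_links, score, max_pages=100, min_score=0.0):
    """Visit pages in descending order of predicted link priority.

    get_links(url) returns the outgoing links of a page; score(parent, link)
    returns a priority estimate. Both are supplied by the caller.
    """
    tie = count()  # tie-breaker so equal scores pop in insertion order
    frontier = [(-1.0, next(tie), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)  # highest score first (negated)
        visited.append(url)
        for link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            s = score(url, link)
            if s >= min_score:  # drop links predicted to be irrelevant
                heapq.heappush(frontier, (-s, next(tie), link))
    return visited
```

The `min_score` cut-off stands in for the filtering of irrelevant pages; a real crawler would also need fetching, parsing, and politeness controls, which are omitted here.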
    Fronts and Hotspots of the Application Research on Folksonomy Abroad
    Bi Qiang, Wang Yu
    2013, 29 (7/8): 36-42.  DOI: 10.11925/infotech.1003-3513.2013.07-08.05
    By analyzing foreign applied research on Folksonomy from 2003 to 2012, the paper focuses on four representative and influential areas, namely Ontology, Library 2.0, Web semantic retrieval, and subject information navigation, and analyzes their frontier developments and research hotspots. The paper also looks ahead to the integration of Folksonomy and Ontology, the application of the concept of "user participation" to Library 2.0, analysis and processing for Web semantic retrieval, and the future development of subject information navigation based on tag classification, in order to provide references for domestic Folksonomy studies.
    Study on Instance Learning Method of Internet User Preference Ontology
    Zhu Hengmin, Jia Danhua, Huang Zhenqi, Wang Chunhui
    2013, 29 (7/8): 43-48.  DOI: 10.11925/infotech.1003-3513.2013.07-08.06
    An Internet user preference Ontology can fully and accurately describe the interests and multidimensional preferences of Internet users. Because its instances are numerous, constantly expanding, and changing, they are hard to collect manually. To address this problem, the paper studies learning methods for three representative kinds of instances: topic-specific professional websites, brands, and sporting events. The method enables semi-automatic construction of an Internet user preference Ontology, and experiments are designed to verify its effectiveness.
    Research on Text Clustering Based on Social Tagging
    He Wenjing, He Lin
    2013, 29 (7/8): 49-54.  DOI: 10.11925/infotech.1003-3513.2013.07-08.07
    In this paper, the authors select the social tags used to annotate resources as feature items. Text clustering is implemented with the K-means clustering algorithm and successfully carried out on a small data set. The implementation of the main techniques in tag-based text clustering, such as tag filtering and the clustering algorithm, is discussed in detail. The experiment indicates that text clustering based on social tags performs better than clustering based on keywords and can improve the clustering results.
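A minimal sketch of K-means over tag features, assuming each document is represented as a binary vector over the tag vocabulary. The representation, the deterministic initialization from the first `k` points, and the helper names `tag_vectors` and `kmeans` are illustrative assumptions; the abstract does not describe the authors' tag filtering or weighting.

```python
def tag_vectors(doc_tags):
    """Turn a list of tag sets into binary vectors over the sorted vocabulary."""
    vocab = sorted({t for tags in doc_tags for t in tags})
    vectors = [[1.0 if t in tags else 0.0 for t in vocab] for tags in doc_tags]
    return vocab, vectors


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain K-means; returns one cluster label per point.

    Centroids start at the first k points (an assumption made here so the
    result is reproducible; random restarts are more common in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Documents that share most of their tags end up close together in this space, which is the intuition behind using tags rather than extracted keywords as features.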
    Model Construction and Experiment Analysis of Automatic Indexing for Chinese Books
    Wang Hao, Zou Jieli, Deng Sanhong
    2013, 29 (7/8): 55-62.  DOI: 10.11925/infotech.1003-3513.2013.07-08.08
    To address automatic keyword indexing for Chinese books, this paper introduces the machine learning algorithm of Conditional Random Fields (CRFs). The method trains on a large volume of existing, manually indexed keyword data for Chinese books to generate an annotation model that captures semantic relations and rule features among sequence entities, and then uses the model for machine prediction to extract the books' keywords automatically. The paper mainly solves two problems. First, because the choice of CRF parameters affects indexing performance, the authors run comparative tests from several angles to identify the optimal CRF parameter set for keyword indexing of Chinese books. Second, the authors discuss the effect of different observed features on keyword indexing and demonstrate experimentally four observed features that effectively improve indexing performance. Finally, an optimal keyword indexing model for Chinese books is constructed.
    Research of Automatically Recognizing Name in Pre-Qin Ancient Chinese Classics
    Tang Yafen
    2013, 29 (7/8): 63-68.  DOI: 10.11925/infotech.1003-3513.2013.07-08.09
    From the perspective of text mining and analysis in the digital humanities, this paper uses the Conditional Random Field machine learning model to recognize ancient Chinese names automatically in a Pre-Qin corpus. The trained model with the best name recognition performance reaches an F-score of 91.52% in cross-validation and is verified experimentally on a Pre-Qin corpus containing 187 901 words. The research helps extract named entities from Pre-Qin literature and supports the exploration of relationships among people and their backgrounds in other humanities and social sciences.
    Research on Author Name Disambiguation Algorithm in the Literature Database
    Guo Shu
    2013, 29 (7/8): 69-74.  DOI: 10.11925/infotech.1003-3513.2013.07-08.10
    This paper first analyzes GHOST, a graphical framework for name disambiguation, and then presents a modified name disambiguation algorithm that incorporates text mining of literature information. The new algorithm is better suited to literature databases and makes up for the limitations of GHOST. Using title and publication name from the literature information as computing features, the experiment shows that the algorithm achieves high precision and recall, with F1 reaching 84%, which is good enough for name disambiguation.
    Reviews on Development of Patent Citation Research
    Chen Liang, Zhang Zhiqiang, Shang Weijiao
    2013, 29 (7/8): 75-81.  DOI: 10.11925/infotech.1003-3513.2013.07-08.11
    As an important data source for strategic intelligence analysis, patent citation not only reflects knowledge flow in technological development but can also be used for tracking technology fronts and recognizing technology status at both the country and organization levels. After exploring the definition and origin of patent citation, this paper describes the development of patent-patent citation analysis and patent-paper citation analysis methods according to the categories of patent citation, and lists some representative works. Finally, the paper summarizes the problems that occur in patent citation analysis and offers corresponding suggestions.
    Review on Percentile-based Bibliometric Indicator
    Zhou Qun, Zuo Wenge, Chen Shiji
    2013, 29 (7/8): 82-88.  DOI: 10.11925/infotech.1003-3513.2013.07-08.12
    Percentiles have become established in bibliometrics as an important alternative to the Relative Citation Rate and have been applied to the evaluation of research performance. This paper introduces the background of percentile-based bibliometric indicators and describes the concept, types, and advantages of using percentiles in bibliometrics. It also elaborates on the problems in calculating percentiles and assigning percentile ranks, and analyzes the application of percentile-based bibliometric indicators.
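One of the calculation problems, the handling of tied citation counts, can be made concrete with a small sketch. The three tie conventions and the function name `percentile_rank` are illustrative assumptions; the abstract does not say which conventions the review covers.

```python
def percentile_rank(citations, value, ties="half"):
    """Percentile rank (0 to 100) of `value` within a reference set.

    citations: citation counts of the papers in the reference set.
    ties: how papers with exactly `value` citations are counted:
      "exclude" counts only papers strictly below,
      "half"    counts tied papers with weight 0.5,
      "include" counts tied papers fully.
    """
    below = sum(1 for c in citations if c < value)
    equal = sum(1 for c in citations if c == value)
    if ties == "exclude":
        numerator = below
    elif ties == "half":
        numerator = below + 0.5 * equal
    elif ties == "include":
        numerator = below + equal
    else:
        raise ValueError("ties must be 'exclude', 'half', or 'include'")
    return 100.0 * numerator / len(citations)
```

For the reference set `[0, 1, 1, 3, 10]`, a paper with one citation ranks at 20, 40, or 60 depending on the convention, which shows why the tie rule has to be stated whenever percentile indicators are compared.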
    A New Feature Selection Method Based on Term Contribution in Co-word Analysis
    Hu Changping, Chen Guo
    2013, 29 (7/8): 89-93.  DOI: 10.11925/infotech.1003-3513.2013.07-08.13
    From the perspective of data dimension reduction, constructing the co-word matrix from high-frequency words leaves considerable room for improvement. By comparing co-word analysis with traditional text processing, including text categorization, text clustering, and information retrieval, the authors introduce a new feature selection method based on term contribution and describe its algorithm. Experimental comparison shows that the new method clearly improves data quality and clustering results.
    Analysis on Statistical Characteristic and Dynamics for User Behavior in Microblog Communities
    He Jing, Guo Jinli, Xu Xuejuan
    2013, 29 (7/8): 94-100.  DOI: 10.11925/infotech.1003-3513.2013.07-08.14
    Using complex network and statistical methods, this paper analyzes the network topology and user behavior characteristics of Sina microblogging at the individual and group levels. The results show that human behaviors have different multi-scaling characteristics: node degree distribution and microblog posting behavior approximately follow a power-law distribution, whereas forwarding and commenting behaviors follow an exponentially truncated power-law distribution. Based on this, the interest-driven mechanism and heavy-tailed characteristics of user behavior are studied and some commonalities are obtained. The findings are helpful for research on the dynamics of public opinion propagation.
    Research on Chinese Patent Automatic Classification Method Based on Statistical Distribution
    Hu Bing, Zhang Jianli
    2013, 29 (7/8): 101-106.  DOI: 10.11925/infotech.1003-3513.2013.07-08.15
    Traditional automatic text classification algorithms based on the Vector Space Model fail to consider the distribution of terms among classes and the position of terms within a class, which leads to poor performance in patent classification. This paper proposes an automatic classification method for Chinese patents based on statistical distribution. First, it puts forward a distribution information weighting factor to raise the weight of terms that appear frequently but in few classes. Then, drawing on the structural features of patent text, it introduces a position information weighting factor to highlight the legal and technical characteristics of patents and the differences in content among a patent's elements. Finally, a comparative experiment shows that the proposed method improves classification performance.
    Construction of Keywords-Chinese Library Classification Codes Integrated Thesaurus
    Yang He, Yang Yihong, Li Ning
    2013, 29 (7/8): 107-113.  DOI: 10.11925/infotech.1003-3513.2013.07-08.16
    Based on years of large-scale manual indexing data, this paper constructs a natural language classification thesaurus, using Mutual Information (MI), Chi-Square (χ²), and Maximum Likelihood Estimation (MLE) to analyze the correspondence between keywords and Chinese Library Classification codes. The performance of the Keywords-Chinese Library Classification Codes Integrated Thesaurus in automatic indexing of sci-tech literature is evaluated with closed and open tests.
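The three association measures can be sketched from a 2x2 contingency table of keyword and class co-occurrence. The formulas below are the standard textbook definitions (pointwise mutual information, Pearson chi-square, and the relative-frequency estimate of P(class | keyword)); whether the authors used exactly these variants is an assumption, and the function name and cell names are hypothetical.

```python
import math


def association_scores(a, b, c, d):
    """Score one keyword/class pair from co-occurrence counts.

    a: records with the keyword and the class code
    b: records with the keyword but not the class code
    c: records with the class code but not the keyword
    d: records with neither
    Assumes a > 0 and non-empty row and column totals.
    """
    n = a + b + c + d
    # Pointwise mutual information: log2( P(k, c) / (P(k) * P(c)) )
    pmi = math.log2(a * n / ((a + b) * (a + c)))
    # Pearson chi-square statistic for a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Maximum likelihood estimate of P(class | keyword)
    mle = a / (a + b)
    return pmi, chi2, mle
```

Keyword and class code pairs with high scores would be candidates for thesaurus entries; thresholds and the way the three measures are combined are not given in the abstract.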
    The Design and Implementation of Distributed Patent Information Extraction System
    Zhai Dongsheng, Zhang Xinqi, Zhang Jie, Kang Ning
    2013, 29 (7/8): 114-121.  DOI: 10.11925/infotech.1003-3513.2013.07-08.17
    As a vital patent source, the Derwent patent database offers rich patent resources. However, its output format is limited and includes only patent abstracts. This article designs a distributed Derwent patent extraction system based on a multi-agent platform. With it, patent information is imported into a local database, and detailed information from USPTO can also be acquired. The system is effective, and the study contributes a sound information acquisition method for patent research.