    Analysis of the Difference Between Digital Curation and Digital Preservation
    Zhang Zhixiong,Wu Zhenxin,Liu Jianhua,Guo Hongmei
    2014, 30 (1): 4-13.  DOI: 10.11925/infotech.1003-3513.2014.01.02
    [Objective] To analyze the difference between Digital Curation and Digital Preservation. [Coverage] The authors examine important historical documents and reports related to the two concepts, as well as the definitions of the two concepts released by major institutes in this research area, such as DCC, JISC and ARL. [Methods] By analyzing the history of the two concepts, the authors identify the underlying causes behind them. Based on the definitions given by major research institutes and researchers, the authors analyze the difference between the two concepts. The authors then compare the two concepts across 8 aspects to identify the differences more clearly. [Results] Although Digital Curation and Digital Preservation have similar meanings, they are two distinct concepts that differ in many aspects. [Limitations] A more detailed comparison of the tasks involved in the two concepts is needed to give a clearer explanation. [Conclusions] Digital Curation and Digital Preservation are two different concepts, but they are complementary. Libraries need a more active approach to carrying out digital preservation.
    A Comparative Analysis of Foreign Collaborative Information Search Systems
    Wu Dan,Yu Wenting
    2014, 30 (1): 14-23.  DOI: 10.11925/infotech.1003-3513.2014.01.03
    [Objective] A collaborative information search system is a tool for collaborative information retrieval. This paper provides a reference for its research and development. [Methods] Typical case study and comparative study are used to analyze four representative foreign collaborative information search systems, namely Annotate!, Cerchiamo, CoSearch and SearchTogether, comparing them in terms of framework structure, supporting technology, function and evaluation. [Results] Explicit collaborative search systems support synchronous retrieval, coordinate through the user interface, mostly adopt a C/S structure, use instant messaging and automatic division technologies, and are functionally richer. Tacit collaborative search systems support asynchronous retrieval, coordinate through deep algorithms, mostly adopt a multi-level structure, use data or agent technologies, and are functionally simpler. [Limitations] Because the research is at the experimental stage, empirical research methods are not adopted in this paper. [Conclusions] There is no fixed path for the development of collaborative search systems. Users’ functional requirements and the corresponding supporting technologies should be taken into consideration during design.
    Research on Plant Growth and Development Stage Named Entity Recognition for Text Mining
    Wang Run,He Lin,Wang Dongbo,Huang Shuiqing,Fan Yuanbiao
    2014, 30 (1): 24-27.  DOI: 10.11925/infotech.1003-3513.2014.01.04
    [Objective] This paper studies the extraction of plant growth and development stage entities (PDSE) from text. [Context] A PDSE is essentially a kind of named entity. Named entity recognition has become one of the most valuable basic technologies in the Natural Language Processing field and is widely used in many Natural Language Processing systems. [Methods] The paper adopts a multi-strategy approach based on conditional random fields (CRF) and rules, proposing and implementing a CRF template, feature functions and extraction rules tailored to the features of plant growth and development stage entities. It also tests the extraction performance on articles from the PubMed database. [Results] The experiment shows that the proposed hybrid strategy achieves high accuracy and recall. [Conclusions] This research provides a useful reference for biological text extraction.
    Experimental Study of Multilingual Text Clustering
    Deng Sanhong,Wan Jiexi,Wang Hao,Liu Xiwen
    2014, 30 (1): 28-35.  DOI: 10.11925/infotech.1003-3513.2014.01.05
    [Objective] To analyze the performance, key points and direction of characteristics translation and LSI in cross-language text clustering. [Methods] 2736 Chinese-English bilingual news texts are selected from several bilingual websites; clustering tests are carried out with the two methods, and measures such as recall, accuracy and F value are compared. [Results] The characteristics translation method improves clustering, while the LSI method does not achieve good results because of its time and space complexity. [Limitations] The sample needs to be expanded, and the LSI experiment needs to be repeated in a high-performance computing environment. [Conclusions] The characteristics translation method needs a more effective translation system, and the LSI method needs to address its computational complexity and the selection of the K value, among other issues.
    A Hierarchical Framework for User Intention Recognition
    Tang Jingxiao,Lv Xueqiang,Liu Chengyang,Li Han
    2014, 30 (1): 36-42.  DOI: 10.11925/infotech.1003-3513.2014.01.06
    [Objective] Every search engine query has an underlying query intention, and accurate intention identification can improve search efficiency. [Methods] For explicit-intent queries, the authors employ a sliding window strategy to find the maximum common substring, extract user intent templates, and then use the templates to identify the user intention. For implicit-intent queries, the authors use a multi-feature integration method to build a classifier for the final query intention recognition. [Results] Experimental results show that the hierarchical intention recognition framework achieves better precision than classifier-based methods, with accuracy improved by 19.04%. [Limitations] The acquisition of intention templates is limited, which limits explicit intention recognition. For large-scale data, the complexity of the pattern matching and machine learning algorithms is very high, and the algorithms need further optimization. [Conclusions] The experiment shows that the method is valid for Web intention recognition and helps to improve the intention recognition rate.
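    The abstract does not give the details of the template extraction step. As a rough illustration only, the idea of sliding a window over one query to find the longest substring shared with another can be sketched as follows. The function names and the sample queries are hypothetical and are not taken from the paper.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Slide a shrinking window over `a`; return the longest window found in `b`."""
    for size in range(min(len(a), len(b)), 0, -1):
        for start in range(len(a) - size + 1):
            window = a[start:start + size]
            if window in b:
                # Largest sizes are tried first, so the first hit is a longest match.
                return window
    return ""


def extract_template(queries):
    """Reduce a group of queries to the substring they all share (hypothetical helper)."""
    if not queries:
        return ""
    common = queries[0]
    for query in queries[1:]:
        common = longest_common_substring(common, query)
    return common


# Illustrative queries, not data from the paper.
template = extract_template(
    ["how to cook rice", "how to cook pasta", "how to cook eggs"]
)
```

    This brute-force version is quadratic in the number of windows and is only meant to show the idea; the abstract itself notes that pattern matching becomes costly on large-scale data.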
    Research on Domain Ontology Term Extraction
    Tang Qing,Lv Xueqiang,Li Zhuo,Shi Shuicai
    2014, 30 (1): 43-50.  DOI: 10.11925/infotech.1003-3513.2014.01.07
    [Objective] To extract as many Ontology terms as possible in order to improve the quality of Ontology construction. [Methods] This paper proposes an Ontology term extraction method based on term component extension. Using the aggregation characteristics and POS features of terms, it extracts term components with a word frequency comparison approach. Taking into account term length, term POS and the internal associative strength of the character strings within terms, reasonable extension rules are designed for extending components into candidate terms. Ontology terms are then filtered from the candidate terms using relational and contextual information. [Results] Experimental results show that the accuracy rate is 83.5% and the recall rate is 87%; the accuracy rate is 2.5 percentage points above the baseline. [Limitations] The method needs a balanced corpus to extract term components, and the term extraction performance is affected by the quality of the term. [Conclusions] The method is effective and is of positive significance for Ontology learning and Ontology construction.
    The Research and Analysis on Automatic Extraction of Science and Technology Literature Terms
    Zeng Wen,Xu Shuo,Zhang Yunliang,Zhai Juanhua
    2014, 30 (1): 51-55.  DOI: 10.11925/infotech.1003-3513.2014.01.08
    [Objective] Extraction of science and technology terms is a basic research problem for improving the efficiency of organizing and retrieving science and technology literature information. [Methods] The paper proposes an automatic extraction method based on the characteristics of science and technology terms and statistical computing. The method combines the linguistic characteristics of terms with statistical information, such as the combination strength between words and the position at which a term appears in the literature, to implement an automatic extraction algorithm. [Results] Experimental results show that the average accuracy of scientific term extraction can reach 51.2%. [Limitations] The statistical computing algorithm and the data processing still need further improvement, in terms of both the algorithm and the quality of the data. [Conclusions] The proposed method is effective.
    Media as a Community? Literature Based Topic Evaluation in Information Systems Discipline
    Zhao Yuxiang,Peng Xixian
    2014, 30 (1): 56-65.  DOI: 10.11925/infotech.1003-3513.2014.01.09
    [Objective] This paper aims to validate the assertion of Media as a Community (MaaC) based on IS journal publications. [Coverage] ISI Web of Science and the AIS journal rankings are used as the data sources to examine the topic, and 45 IS journals are selected. [Methods] The study employs co-word analysis and visualization to explore the research themes and evolution of social media and online community. [Results] The findings show that the two concepts, i.e., social media and online community, have evolved from an originally parallel relationship to an intertwined one. [Conclusions] Because the two concepts link smoothly and overlap, social media can, to some extent, be regarded as an online community from the literature perspective.
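    For readers unfamiliar with co-word analysis, its core counting step can be sketched as below: each paper contributes one count to every pair of keywords it lists together. This is a minimal sketch under assumed inputs; the keyword lists are invented for illustration, and the paper's actual data processing and visualization are not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count how many papers list each pair of keywords together."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates so a paper counts each pair once.
        unique = sorted({keyword.lower() for keyword in keywords})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Invented keyword lists, one per paper.
papers = [
    ["Social Media", "Online Community", "Trust"],
    ["social media", "online community"],
    ["online community", "trust"],
]
counts = coword_counts(papers)
```

    The resulting pair counts form the co-occurrence matrix that clustering or network visualization would typically take as input.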
    Technical Strength Evaluation Method Based on Patent Text Data
    Han Hongqi,Gui Jie,Xu Shuo,Liu Yuqin
    2014, 30 (1): 66-71.  DOI: 10.11925/infotech.1003-3513.2014.01.10
    [Objective] The paper proposes a method to evaluate enterprise technical strength based on patent text data without citation data. [Methods] Four indexes are used to evaluate technical strength comprehensively: the valid granted patent quantity index, the patent growth ratio index, the technical centrality index and the patent minimum value index. The four indexes reflect technical strength in terms of technical scale, growth, importance and value, respectively. [Results] An experimental comparison of the CII and TII indexes shows that citation analysis gives higher value to earlier published patents. A further experimental comparison of the TS and TSQGIV indexes shows the effectiveness of the proposed method. [Limitations] Enterprise names are not normalized during data pre-processing, which might introduce errors into the experimental results. [Conclusions] Compared with previous methods, the proposed method can evaluate the technical strength of companies without citation data.
    Chinese Organization Name Recognition in User Query Log
    Guan Xiaoda,Lv Xueqiang,Li Zhuo,Zheng Luexing
    2014, 30 (1): 72-78.  DOI: 10.11925/infotech.1003-3513.2014.01.11
    [Objective] To address the shortage of annotated query log data and the information asymmetry in organization name recognition in user query logs. [Methods] The paper proposes an automatic method for creating training data, which alleviates the shortage of annotated user query log data. The authors introduce adhesion features and construct a CRF model that recognizes organization names by integrating context information. [Results] Experiments on the Sogou user query log show that the precision rate can reach 72.80%, the recall rate can reach 86.73% and the F-measure can reach 79.16%. The method improves the F-measure by 30% compared with the traditional organization name recognition method. [Limitations] The error of a model trained on the automatically created training set will be greater than that of one trained on standard annotated user query log data. The scale of the organization name set affects the completeness of the model’s context knowledge. [Conclusions] The experimental results demonstrate that the method is effective.
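    The three reported figures can be cross-checked against each other, assuming the F-measure here is the usual balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used. The function name below is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1_percent = f_measure(0.7280, 0.8673) * 100
print(f"{f1_percent:.2f}")
```

    Under that assumption the value rounds to 79.16, which agrees with the reported F-measure.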
    Explore User’s Behavior of Academic Blog Based on EDTM: Take Blog.Sciencenet as an Example
    Xu Xiaojuan,Zhao Yuxiang,Zhu Qinghua
    2014, 30 (1): 79-86.  DOI: 10.11925/infotech.1003-3513.2014.01.12
    [Objective] This paper explores users’ behavior on academic blogs, taking blog.sciencenet as an example. [Methods] A model of user behavior, covering both the use and the cessation of use of the academic blog, is established with the Ethnographic Decision Tree Method (EDTM). [Results] The paper finds that the reasons for using the academic blog include the authenticity of the content, value and authority, sociability and the sharing of ideas, while the reasons for stopping include insufficient depth of blog content and content effectiveness that falls short of expectations. [Limitations] Because the research relies on convenience sampling, it has some limitations, which will be addressed in future work. [Conclusions] The value of the research lies in defining and testing the Ethnographic Decision Tree Method as a methodological supplement for investigating blog user behavior. The results not only point to the reasons why people use and give up academic blogs, but also have predictive value from the perspective of decision science.
    Design and Implementation of Library WeChat Public Platform Service in Development Mode
    Zhang Bei,Dou Tianfang,Zhang Chengyu,Li Jiefang
    2014, 30 (1): 87-91.  DOI: 10.11925/infotech.1003-3513.2014.01.13
    [Objective] To extend the service channels of Tsinghua University Library and enhance the patron experience by designing and developing a WeChat public platform service. [Context] The rise of the mobile Internet has made WeChat a platform that attracts much patron attention. At Tsinghua University, for example, nearly 80% of freshmen use WeChat. [Methods] The service uses the message receiving and sending interfaces provided by the WeChat public platform and embeds functionalities such as library hot news search and OPAC search into WeChat. [Results] Patrons can conveniently use the library’s services and resources via command interaction in a social network environment. [Conclusions] This application can enrich the service forms of the library and bring patrons closer to the library.
    The Construction of a Citation Retrieval and Analysis Automation System
    Zhang Sufang,Song Hu
    2014, 30 (1): 92-96.  DOI: 10.11925/infotech.1003-3513.2014.01.14
    [Objective] To discuss the construction of a citation retrieval and analysis automation system, describing its design principles, basic structure, function modules, implementation and results. [Context] Because citation retrieval and analysis involve too many manual operations and diverse user demands, the system is designed for librarians, teaching and research staff, and research management personnel. [Methods] The system is developed in Perl in a Linux environment. [Results] The system realizes the automatic acquisition of search results, statistical analysis of citation data, citation list formatting and a choice of various other-citing standards. [Conclusions] The system can improve retrieval efficiency.
    Application and Implementation of Two-dimensional Bar Code on Library Book Inquiry Machine
    Li Shanjie
    2014, 30 (1): 97-101.  DOI: 10.11925/infotech.1003-3513.2014.01.15
    [Objective] This paper aims to display the bibliographic information of queried books as a two-dimensional bar code in the management program of the book inquiry machine. [Context] SIRSI’s OPAC module cannot display retrieved bibliographic information as a two-dimensional barcode. To improve the retrieval efficiency of library readers on the inquiry machine, the management program of the book inquiry machine is used as the carrier platform for the two-dimensional barcode display. [Methods] Building on the earlier development of the inquiry machine management program, the open source components HtmlAgilityPack and QrCode.Net are used to extract the bibliographic data and display it as a two-dimensional barcode. [Results] When the reader browses the details of query results, the management program also displays the two-dimensional code. [Conclusions] The reader’s query efficiency on the inquiry machine is improved significantly.
