[Objective] Based on the requirements of trusted digital repository standards, this paper studies the ingest workflow of a trusted preservation system in digital preservation practice. [Methods] The digital preservation system of the National Science Library, Chinese Academy of Sciences, needs to receive, ingest, and archive data from multiple publishers, and the ingest workflow is an important part of this system. Based on trusted digital repository standards, workflow management theory, the trust chain mechanism, and a trusted workflow management model are applied to design and develop a trusted ingest workflow. [Results] The design and development of the ingest workflow of the digital preservation system are completed. [Conclusions] The workflow basically meets the ingest requirements of the preservation system and offers good flexibility, customizability, personalization, extensibility, and reusability.
[Objective] To understand the application status, practices, and techniques of integrating and embedding ORCID in institutional repositories (IRs). [Methods] Review the literature and analyze cases of integration practice between ORCID and IRs; analyze the integration features of common open source IR platform software. [Results] Development strategies, promotion mechanisms, and a technical framework for integration between IRs and ORCID are obtained. Examples and best practices of usage scenarios, embedding workflows, and development methods are summarized. [Conclusions] IRs in China should design and implement ORCID integration applications in different directions, namely authority control, workflow embedding, and simple data reuse, drawing on proven experience and their own needs.
[Objective] To detect the birth, extinction, development, merging, and splitting of topics in the literature of a given field. [Methods] This paper divides time windows according to the publication data of the literature and applies the LDA model to extract topics from each time window automatically. Topic association filter rules are used to determine the evolution relationships between topics in adjacent time windows, forming topic evolution paths over a continuous time period. [Results] By taking the continuity of topics into account, different types of topic evolution can be detected with high accuracy. [Limitations] This method fixes the size of the time windows without considering the diversity of topic evolution cycles. [Conclusions] This method can effectively reduce the interference of low-similarity topics in LDA and enhance the accuracy of evolution relationship recognition.
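The linking step described in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the topic association filter rules, so this example assumes a simple stand-in: two topics in adjacent windows are linked when the cosine similarity of their word distributions reaches a threshold. The function names `cosine`, `link_topics`, and `classify_evolution` and the threshold value are hypothetical.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two topic-word distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def link_topics(prev_topics, next_topics, threshold=0.5):
    """Assumed filter rule: keep (i, j) pairs whose similarity >= threshold."""
    return [
        (i, j)
        for i, p in enumerate(prev_topics)
        for j, q in enumerate(next_topics)
        if cosine(p, q) >= threshold
    ]


def classify_evolution(n_prev, n_next, links):
    """Label each topic's evolution type from the links between two windows."""
    successors = {i: [j for a, j in links if a == i] for i in range(n_prev)}
    predecessors = {j: [i for i, b in links if b == j] for j in range(n_next)}
    events = []
    for i, succ in successors.items():
        if not succ:
            events.append(("extinction", i, None))
        elif len(succ) > 1:
            events.append(("split", i, tuple(succ)))
    for j, pred in predecessors.items():
        if not pred:
            events.append(("birth", None, j))
        elif len(pred) > 1:
            events.append(("merge", tuple(pred), j))
        elif len(successors[pred[0]]) == 1:
            # One-to-one link: the topic simply continues to develop.
            events.append(("development", pred[0], j))
    return events


# Toy topic-word distributions over a four-word vocabulary (illustrative only).
prev_window = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
next_window = [[0.6, 0.4, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
links = link_topics(prev_window, next_window)
events = classify_evolution(len(prev_window), len(next_window), links)
```

In this toy input, the first topic continues into the next window, the second topic disappears, and a new topic appears. Chaining the links across consecutive windows would give the evolution paths mentioned in the abstract.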
[Objective] Patent keyword indexing plays an important role in natural language processing and is widely applied in many fields, such as patent retrieval, translation, and automatic summarization. [Methods] A K-proximity coupled graph is used to convert patents into a complex graph model, and an average connectivity weight is proposed based on the average path variation, the average clustering coefficient, and the current node's liquidity effect. Considering the location information, the word-gap information, and the inverse document frequency of keywords, a comprehensive patent correlation calculation method is proposed for the quantitative analysis of keyword importance. [Results] An experiment on patent literature in the sensor domain achieves a precision of 60.9% on top-8 and a recall of 73.4% on top-10. [Limitations] The results for low-frequency keywords are not good enough, which affects the indexing result. [Conclusions] Experimental results show that this method is effective and of positive significance for patent indexing.
[Objective] To accurately calculate the credibility of Wikipedia entries. [Methods] This paper builds a trust evaluation model that uses text analysis to compare the current version of an article with its historical versions, obtaining the effective edit content contributed by the editor of each version, and combines this with the number of references and images in the current version of the Wikipedia article. [Results] Empirical research shows that the model is able to distinguish high-trust Wikipedia articles from low-trust ones. [Limitations] The entry-level thresholds produced by this algorithm do not clearly distinguish B-level from C-level entries. [Conclusions] The algorithm is simple and effective; it can capture the changing process of an entry at the microscopic level and dynamically compute its trust value.
[Objective] To improve text categorization accuracy by modifying the weighting approach used in feature selection. [Methods] By introducing inner- and outer-category information and modifying TF-IDF weighting, this paper proposes the TF-IDF-CD approach, which is based on categorical description. TF-IDF-CD is combined with various classifiers, such as NB and SVM, in text categorization experiments on a balanced corpus and an unbalanced corpus respectively. Finally, the accuracies of different weighting approaches are compared with that of TF-IDF-CD. [Results] TF-IDF-CD performs well even with a small number of feature items. Compared with TF-IDF, when combined with various classifiers on different corpora, TF-IDF-CD greatly improves the average accuracy: the minimum increase is 14% and the maximum is up to 30%. Compared with the CTD approach, when combined with NB, SVM, and DT, TF-IDF-CD improves the average accuracy of text categorization by 1% to 13%. However, on the unbalanced corpus, when combined with KNN, the performance of TF-IDF-CD is 2% lower than that of CTD. [Limitations] When combined with the KNN classifier, which is sensitive to skewed data, TF-IDF-CD needs to be improved to resist the skew of unbalanced corpora. [Conclusions] Experimental results show that the TF-IDF-CD approach is effective.
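For reference, the baseline that the abstract above modifies can be sketched as follows. This is plain TF-IDF only: the abstract does not give the TF-IDF-CD formula, so the categorical-description terms are not reproduced here. The function name `tf_idf` and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TF-IDF weights for tokenized documents (baseline, not TF-IDF-CD)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Toy corpus (illustrative only).
docs = [["patent", "sensor"], ["patent", "library"]]
w = tf_idf(docs)
```

A term that occurs in every document, such as "patent" in this toy corpus, receives zero weight, while a term concentrated in one document is weighted up. A category-aware variant would add information about how terms are distributed within and across classes.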
[Objective] To solve the problem that recommender systems recommend outdated information resources to the target user. [Methods] This paper proposes a personalized recommendation method for information resources based on a dynamic tag-resource network graph. First, a resource network graph is established to form semantic relationships between resources, using the tags shared by two resource objects as the link between them. Second, a time-aware tag network graph is created from the links in the resource network graph to describe users' interest drift. Third, the top N information resource objects are recommended to the target user from the tag network graph by matching the target user's dynamic tags, which describe the user's interest drift. [Results] On the MovieLens data set, experimental results show that this method can trace and predict users' interest drift and recommend accurate resources to users. Its Mean Absolute Error (MAE) is about 15% lower than that of traditional methods. [Limitations] The method does not address recommendation in real-time dynamic environments, such as information retrieval, where users' interests drift rapidly. [Conclusions] The proposed method can recommend more accurate information resources to users whose interests drift.
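The first step in the abstract above, linking resources through shared tags, can be sketched as below. This is a minimal illustration under assumptions: the data layout (a mapping from resource ID to a set of tags) and the name `build_resource_graph` are hypothetical, and the time-aware tag graph and top-N matching steps are not covered.

```python
from itertools import combinations


def build_resource_graph(resource_tags):
    """Link every pair of resources that share at least one tag.

    Returns a dict mapping (resource_a, resource_b) to the set of shared tags.
    """
    edges = {}
    for a, b in combinations(sorted(resource_tags), 2):
        shared = resource_tags[a] & resource_tags[b]
        if shared:
            edges[(a, b)] = shared
    return edges


# Toy tag assignments (illustrative only).
resource_tags = {
    "r1": {"scifi", "space"},
    "r2": {"space", "drama"},
    "r3": {"comedy"},
}
graph = build_resource_graph(resource_tags)
```

Here only "r1" and "r2" are linked, through the shared tag "space". Attaching timestamps to tag assignments would be one way to extend this structure toward the time-aware tag graph described in the abstract.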
[Objective] To improve the library recommendation service and help readers select books of interest. [Methods] This paper proposes an Ontology-based, location-aware book recommendation model for libraries that applies Wi-Fi indoor positioning technology. The model constructs the user's profile based on a book classification Ontology, and then recommends books by combining the regional group profile and considering when to make a recommendation. [Results] The proposed method outperforms the existing Ontology-based hybrid recommendation method in accuracy and correlation by 13.6% and 21.8% respectively, and shows a 48.03% increase in set diversity compared with the content-based filtering method. [Limitations] The weights of user preferences and regional group preferences in the recommendation model are not discussed. [Conclusions] This research can improve the library recommendation service and provide location-aware personalized book recommendation.
[Objective] Detecting consumers' shopping needs by mining clickstream logs is vital for effective shopping guidance in the e-commerce environment. [Methods] This paper first labels the types of pages that users visit on Taobao.com, then uses K-means clustering to analyze the visit session data. Two clustering indexes are used: page type and page complexity. [Results] Based on page types, the visit sessions are clustered into four user need states: direct management, continuous searching, product browsing, and information seeking. These four types are then subdivided into nine detailed ones based on page complexity. [Limitations] The effectiveness of the user need state analysis needs to be further validated in a real-world environment. [Conclusions] Clustering analysis of visit sessions is an effective and operable method for detecting and describing e-shoppers' need states.
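The clustering step in the abstract above can be illustrated with a small, dependency-free K-means sketch. The feature layout is an assumption: each session is represented as a vector of page-type proportions, which the abstract does not specify. The function name `kmeans` and the deterministic initialization (first k points) are choices made for this example only.

```python
def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical sessions: [share of search pages, share of product pages].
sessions = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(sessions, k=2)
```

With this toy input, search-heavy sessions and product-heavy sessions fall into separate clusters. A second pass over each cluster using a page-complexity feature would mirror the two-index design the abstract describes.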
[Objective] This paper aims to construct search engine optimization indices to guide the construction of industry websites and enhance their core competitiveness. [Methods] An 'Eco-Search Engine Optimization' index system is constructed from the perspective of information ecology, and AHP is used to make an empirical analysis of 10 representative cloud storage sites at home and abroad. [Results] The empirical results show that the technological maturity of hardware and software creates a favorable industry environment for the development of industry websites. The eco-construction of search engine optimization receives more attention overseas than domestically. [Limitations] Only cloud storage sites were chosen as samples, and the number of representative industry websites is relatively small. [Conclusions] At the theoretical level, this paper provides a new research perspective for the construction of industry websites. At the practical level, it can guide industry sites in evaluating the efficiency of their search engine optimization.
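As an illustration of how AHP turns pairwise judgments into index weights, the sketch below uses the geometric-mean (row) approximation of the priority vector. The abstract above does not state which AHP variant or which comparison values were used, so the function name `ahp_weights` and the example matrix are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgment: criterion A is 3 times as important as criterion B.
pairwise = [
    [1.0, 3.0],
    [1.0 / 3.0, 1.0],
]
weights = ahp_weights(pairwise)
```

For this two-criterion matrix the weights come out as 0.75 and 0.25. A fuller treatment would also check the consistency ratio of larger matrices before using the weights.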
[Objective] To help Chinese researchers acquire an ORCID ID and manage academic achievements. [Context] ORCID aims to solve the name ambiguity problem in research and scholarly communication, which has also troubled Chinese academic circles for a long time. This work collaborates with ORCID to meet Chinese researchers' need for a persistent unique identifier. [Methods] Through the ORCID API, the system helps Chinese researchers acquire an ORCID ID. It is integrated and linked with multiple third-party systems, such as CAS IR, CSCD, and Web of Science, to show research achievements and create an academic profile automatically. [Results] A unique ORCID identifier is registered for each researcher. Researchers can manage their personal information and scientific outputs and synchronize these data with ORCID. [Conclusions] A large number of researchers at scientific research institutions and universities are using iAuthor to obtain ORCID IDs, which lays a good foundation for solving the name ambiguity problem.
[Objective] To avoid duplicated tool development, this paper aims to enable efficient reuse and sharing of the knowledge organization tools produced by the Science and Technology Knowledge Organization System (STKOS). [Context] The Construction and Application Demonstration of Knowledge Organization System Oriented Foreign Scientific and Technical Literature Information is one of the National Science & Technology Pillar Program projects of the Twelfth Five-year Plan Period. It aims to construct a multilayer system covering science, engineering, agriculture, and medicine that involves thesauri, Ontologies, and categories. A large number of tools were developed during the construction of STKOS. [Methods] The Equinox OSGi core framework implementation is used to build a plug-in integrated service system for knowledge organization tools. An automatic plug-in packaging process is designed and implemented, tools and plug-ins are stored in files and databases, and a plug-in allocation mechanism based on jBPM workflow is proposed. [Results] The construction of STKOS is completed, including the tool storage and release system, the tool integration framework and components, and the workflow-based integration of knowledge organization tools. [Conclusions] The integration, standardized management, and sharing of STKOS tools are realized.
[Objective] This article constructs a lightweight context-aware recommendation service platform based on Qualcomm's Gimbal™. [Context] Acquiring users' geographic positions and interests from mobile terminals and then providing context-aware personalized services can improve the user experience in libraries. [Methods] The Gimbal SDK is used in the Android environment to develop a client application for an academic library context-aware service, and Gimbal Manager parameters are set up, including Geo-fences, communication triggers, and information service content. Gimbal Manager then actively acquires user context and interests and pushes information content according to trigger conditions. [Results] When Android phone users who have installed the client application enter different Geo-fences, they receive information matching their personal interests pushed by Gimbal Manager. [Conclusions] The platform can provide context-aware personalized service and improve library service quality.