[Objective] This paper explores efficient methods of using the Institute of Software, Chinese Academy of Sciences Institutional Repository (ISCAS-IR), designs a solution to support research services by analyzing the needs of ISCAS-IR, and provides a reference for applying IRs to scientific services. [Context] The National Science Library, Chinese Academy of Sciences started the construction of institutional repositories in 2009, and ISCAS became one of the first demonstration institutions. Having completed data storage, ISCAS explores effective ways to support its research services with the IR. [Methods] Based on the data organization needs of ISCAS-IR, the authors design a proposal that satisfies the knowledge-service requirements for data extraction and integration, supplying the technology for ISCAS research support services. [Results] By making effective use of ISCAS-IR, researchers and the scientific management department are given a way to track research output, and the problem of compiling research-output statistics by hand is solved. [Conclusions] This paper provides methods and a practical proposal for IR users to analyze research needs, make effective use of IR data and improve the application of IR data.
[Objective] Build association relationships between authors and items in the Institutional Repository. [Methods] Match authors and items automatically, and send the results to the related authors for confirmation. [Results] An author alias library is established, a unique ID is assigned to each author, and the problem of accurately matching authors with their works is solved. [Limitations] This function requires higher-quality metadata, and the information collection process depends heavily on manual participation. [Conclusions] This study not only accurately matches authors with their works, but also provides accurate data for developing further knowledge services.
[Objective] To solve the data synchronization problem between the service provider and data providers that arises in the construction of the Federated Institutional Repository of CAS. [Context] The Federated Institutional Repository of CAS is built on the OAI metadata interoperability interface and can provide accurate and effective data to users only if the service provider keeps pace with the data providers. [Methods] This paper extends the OAI interface to implement functions such as resource-set updating, mapping-relation updating and invalid-data detection, and customizes a new metadata format and operation. [Results] The extended OAI interface effectively realizes data synchronization of resource collections and items between institutional repositories, and supports data exchange and sharing of complicated metadata formats. [Conclusions] This method effectively solves a practical problem and can be referenced by similar systems.
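The synchronization described above rests on incremental harvesting over an OAI-style interface. A minimal sketch of the harvest-and-apply loop, assuming a hypothetical `fetch_page` callable standing in for an OAI-PMH ListRecords request (a real implementation would issue HTTP requests and parse the XML responses):

```python
# Minimal sketch of incremental OAI-style harvesting with resumption tokens.
# `fetch_page` is a hypothetical stand-in for an OAI-PMH ListRecords request.

def harvest(fetch_page, from_date):
    """Collect all records changed since `from_date`, following resumption tokens."""
    records, token = [], None
    while True:
        page = fetch_page(from_date=from_date, resumption_token=token)
        records.extend(page["records"])
        token = page.get("resumptionToken")
        if not token:  # an absent or empty token means the harvest is complete
            break
    return records

def synchronize(local_store, harvested):
    """Apply harvested records to the local store; withdrawn items are purged."""
    for rec in harvested:
        if rec.get("deleted"):          # invalid-data detection: drop withdrawn items
            local_store.pop(rec["id"], None)
        else:
            local_store[rec["id"]] = rec["metadata"]
    return local_store
```

Handling deleted records in the same loop is what keeps the service provider from serving invalid items after a source repository withdraws them.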
[Objective] Practical methods are investigated to address the common problems of content shortage and low utilization that Institutional Repositories (IR) are confronted with. [Context] Most IRs worldwide face challenges of insufficient data/contents and limited customer service functions. To tackle these issues, extensive collections of contents are added to the China Agricultural University Institutional Repository (CAUIR), enabling the delivery of extended services. [Methods] Based on the successful implementation of an extensible IR at CAU, technical details are presented on content construction and the extension of service functions. Usage data of CAUIR prove that extending IR services can improve utilization. [Results] CAUIR has been extended to thirteen topics and provides a series of services. Over the past six years, the total number of user logins has reached 11.29 million, with an average of five thousand user logins per day. [Conclusions] Practice in the construction of CAUIR proves that expanding IR service functions for ordinary users is an effective measure to increase IR utilization.
[Objective] To ensure, through trustworthiness validation, the trustworthiness of the cloud library virtualized environment that contains users' resources and services. [Methods] By establishing a validation model in which a Trusted Third Party validates the cloud library platform providers and users respectively, this paper designs and implements the trustworthiness validation process for the cloud library virtualized environment. [Results] The approach ensures trustworthiness in the request, allocation and startup of virtual machines with little overhead, and ensures that the virtual machine assigned to a user is trusted. [Limitations] The overhead of validation during virtual machine usage and migration remains to be verified. [Conclusions] The research can ensure the trustworthiness of virtual machines and thereby build trust between cloud library users and platform providers.
[Objective] Explore the current status and future development trends of cloud services as their numbers change. [Methods] Collect entries on cloud service concepts from the Google search engine over the last 12 years and analyze them with time series analysis. [Results] Cloud services can be divided into steep-type, pulse-type and wave-type categories. Steep-type cloud services will continue to grow, though the growth will slow down; pulse-type cloud services are unlikely to grow; and the development trend of wave-type cloud services is unstable. [Limitations] Trends of cloud services are analyzed only along the time dimension. [Conclusions] Moving from the laboratory to the market, cloud services may develop toward centralized management services in the future, and industry-specific customization of cloud services will emerge.
[Objective] Find a collaboration pattern for domain Ontology construction based on FCA, using the concepts and techniques of the cloud computing environment. [Methods] Partition the formal context with MapReduce techniques in the cloud computing environment, construct concept lattices locally, and use feedback from experts and users to modify the domain Ontology. [Results] A new collaboration pattern for constructing domain Ontology in the cloud computing environment is designed, which improves the degree of automation of the construction process and lets people with different privileges participate, thereby increasing the efficiency and quality of the Ontology. [Limitations] This collaborative pattern is still in the conceptual design phase; it requires multi-user participation in practice under cloud computing to improve the collaboration solutions. [Conclusions] The construction of domain Ontology based on FCA can be extended to the cloud computing environment, and people at different levels can modify the Ontology through collaboration.
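The local lattice-construction step above starts from a formal context. A naive, dependency-free sketch of deriving formal concepts from a small context (object to attribute-set mapping); in the proposed pattern each partition of the context would be processed like this by one worker, and the merge and expert-revision steps are not shown:

```python
from itertools import combinations

# Naive FCA sketch: enumerate formal concepts of a small formal context.
# Exponential in the number of objects; illustrative only.

def derive_objects(context, attrs):
    """All objects having every attribute in `attrs` (extent of `attrs`)."""
    return {o for o, a in context.items() if attrs <= a}

def derive_attrs(context, objs):
    """All attributes shared by every object in `objs` (intent of `objs`)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set.union(*context.values())

def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every object subset."""
    concepts = set()
    objects = list(context)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = derive_attrs(context, set(subset))
            extent = derive_objects(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

Every concept's extent is the closure of some object subset, so this enumeration is complete, if inefficient; scalable variants are exactly what the MapReduce partitioning is meant to address.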
[Objective] This article contributes to the development of ManGO (Mandarin Grammar Online) for deep linguistic processing. [Context] On the platform of LKB (the Linguistic Knowledge Builder) and based on the Grammar Matrix, ManGO is developed in the environment of DELPH-IN (Deep Linguistic Processing with HPSG Initiative). The frameworks of its syntactic and semantic analysis are HPSG (Head-driven Phrase Structure Grammar) and MRS (Minimal Recursion Semantics) respectively. ManGO lays a solid foundation for further resource grammar development and commercial application. [Methods] First, linguistic knowledge is formalized according to systematic Ontological studies. Then, the computational implementation of ManGO goes through grammar customization, creation of a Chinese MRS test suite, lexicon building, definition of grammar rules and MRS representation. [Results] ManGO covers nearly all the major Chinese word types and grammar phenomena, and fully covers the Chinese MRS test suite. [Conclusions] ManGO is one of the earliest medium-sized computational grammars of Chinese. It serves as a bridge and effective carrier of interdisciplinary studies across formal grammar theory and computational linguistics.
[Objective] This paper proposes a framework for an intent-oriented intelligent search engine system and studies the key content ranking algorithm in detail. [Methods] The paper redesigns search engine algorithms around user search intent in three aspects, i.e., content storage, content retrieval and content ranking, and considers multiple factors in the content ranking algorithm, including relevance, reliability, variety and hotness of the content. [Results] Experiments indicate that the relevance of results from the intent-based intelligent search algorithm is consistently better than that of the traditional keyword-based algorithm. [Limitations] Building an intelligent search engine is complicated, and many technical and engineering problems remain to be resolved. Many more experiments are needed to further verify and improve the content ranking algorithm. [Conclusions] This research lays a foundation for building the next-generation intent-oriented intelligent search engine.
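The abstract names four ranking factors but publishes no combination rule, so the linear weighting below is purely an illustrative assumption of how relevance, reliability, variety and hotness might be fused into one score:

```python
# Illustrative multi-factor ranking. The weights and the linear combination
# are assumptions for illustration; the paper does not publish its formula.

WEIGHTS = {"relevance": 0.5, "reliability": 0.2, "variety": 0.1, "hotness": 0.2}

def score(doc):
    """Weighted sum of the four ranking factors, each expected in [0, 1]."""
    return sum(WEIGHTS[f] * doc[f] for f in WEIGHTS)

def rank(docs):
    """Order candidate results by descending combined score."""
    return sorted(docs, key=score, reverse=True)
```

A linear fusion like this makes the trade-off explicit: a highly relevant but unreliable result can still be outranked, which is the kind of behavior the multi-factor design targets.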
[Objective] Exploiting the succinct and hierarchical character of scholarly article outlines, this paper concentrates on a method to extract important and meaningful phrases from outlines. [Methods] The paper first adopts a combination of linguistic rules and terminology dictionaries to identify candidate phrases. It then calculates tf-idf based on syntactic dependencies between phrases, and quantifies the hierarchical feature according to the hierarchical structure of the outline. Finally, it combines tf-idf with the hierarchical feature to rank the candidate phrases and selects the keyphrases. [Results] Experiments show that the F-score of candidate phrase identification reaches 89.57%, and the F-score of keyphrase selection reaches 36.89%. [Limitations] The inadequate phrase extraction rules and the empirical values used in weight setting during tf-idf calculation keep the method from its optimal effect. [Conclusions] The method effectively extracts keyphrases from outlines and is suitable for keyphrase extraction from hierarchical structures.
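The ranking step above combines tf-idf with a hierarchy-derived weight. A minimal sketch under assumptions: the depth-based weight and the multiplicative combination are placeholders, since the paper's exact quantification of the hierarchical feature is not reproduced here:

```python
import math
from collections import Counter

# Sketch of keyphrase ranking: tf-idf times an assumed depth-based weight.

def tf_idf(phrase, outline, corpus):
    """tf within one outline, smoothed idf across a corpus of outlines."""
    tf = Counter(outline)[phrase] / len(outline)
    df = sum(1 for doc in corpus if phrase in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf

def hierarchy_weight(depth):
    """Assumed heuristic: phrases at shallower outline levels weigh more."""
    return 1.0 / (1 + depth)

def rank_phrases(outline, depths, corpus):
    """Rank candidate phrases by tf-idf multiplied by the hierarchy weight."""
    scores = {p: tf_idf(p, outline, corpus) * hierarchy_weight(depths[p])
              for p in set(outline)}
    return sorted(scores, key=scores.get, reverse=True)
```

The intent is only to show the shape of the fusion: a frequent phrase sitting high in the outline should outrank a rare phrase buried several levels down.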
[Objective] Under a machine learning paradigm, feature weighting and shallow hierarchical classification can effectively achieve Chinese Library Classification (CLC) of periodical articles. [Context] Traditional manual classification shows its limits against the background of "Big Data", and the digitization of periodicals means that automatic classification techniques can effectively relieve the pressure of manual classification work. [Methods] This paper introduces machine learning into the automatic classification of periodical articles. It analyzes and compares the performance of Support Vector Machines (SVM) and the BP Neural Network algorithm (BPNN) in automatic classification, transforms CLC into a three-level classification system following the idea of hierarchical classification, and sets feature weights based on the sources of the classification features. [Results] Classification experiments show that SVM is more suitable than BPNN for large-scale sparse data; the accuracy rates at the three levels reach 95.05%, 92.89% and 89.02%, the integrated accuracy rate is close to 80%, and multi-source feature weights lead to better classification results than a single source. [Conclusions] The study proves that a machine learning model with feature weighting and shallow hierarchical classification is feasible, rational and effective for the automatic classification of periodical articles, and presents a new approach to this task.
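The shallow three-level hierarchy means an article is routed through one classifier per level, each restricted to the classes under the parent node. A minimal sketch with placeholder per-node classifiers (the paper trains SVMs at each node; the callable interface here is an assumption):

```python
# Sketch of shallow hierarchical classification. `classifiers` maps a path of
# labels chosen so far to a callable that predicts the next-level label; the
# callables stand in for trained per-node models such as SVMs.

def classify_hierarchical(features, classifiers, levels=3):
    """Walk down `levels` levels; `classifiers[path]` predicts the next label."""
    path = ()
    for _ in range(levels):
        label = classifiers[path](features)
        path = path + (label,)
    return path  # e.g. ('T', 'TP', 'TP3') in CLC notation
```

Restricting each node's classifier to its parent's subtree is what yields the per-level accuracies reported above: errors compound down the path, which is why the integrated accuracy is lower than any single level's.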
[Objective] Highlight core characteristic words directly by sparsely decomposing the high-dimensional co-word matrix in co-word analysis. [Methods] Based on Penalized Matrix Decomposition (PMD), this article proposes a method to extract core characteristic words from sets of characteristic words. The authors experiment on articles about university libraries' use of SNS, and use Matlab R2012a to decompose the high-dimensional co-word matrix by PMD. [Results] Using PMD, 65 core characteristic words are extracted from all 1648 characteristic words, more than the 34 characteristic words extracted by principal component analysis, and they reveal the research hotspots of university libraries' use of social networks. [Limitations] The authors do not cover all characteristic words acquired from the literature, and the selection involves some subjectivity. [Conclusions] After conversion into a sparse matrix by PMD, the core characteristic words are easier to comprehend and explain, and they can also reveal some marginal subjects.
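A rank-1 sketch of the PMD idea: alternate between the left and right factors while soft-thresholding the word-side vector, so that only a few characteristic words keep nonzero weight. This follows the standard penalized-decomposition iteration, not necessarily the authors' exact Matlab configuration; plain-list linear algebra keeps it dependency-free:

```python
# Rank-1 sketch of Penalized Matrix Decomposition: alternating updates with
# an L1-style soft threshold on the word-side loading vector `v`.

def soft_threshold(v, lam):
    """Shrink each entry toward zero by `lam`; small entries become exactly 0."""
    return [max(abs(x) - lam, 0.0) * (1 if x >= 0 else -1) for x in v]

def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v] if n else v

def pmd_rank1(X, lam=0.1, iters=50):
    """Return (u, v): `v` is the sparse loading over characteristic words."""
    rows, cols = len(X), len(X[0])
    v = normalize([1.0] * cols)
    u = [0.0] * rows
    for _ in range(iters):
        u = normalize([sum(X[i][j] * v[j] for j in range(cols)) for i in range(rows)])
        v = normalize(soft_threshold(
            [sum(X[i][j] * u[i] for i in range(rows)) for j in range(cols)], lam))
    return u, v
```

The sparsity induced by the threshold is what lets PMD surface a small set of core words directly, instead of the dense loadings produced by plain principal component analysis.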
[Objective] This study investigates the relevance between the number of patent cooperations and the rate of culture-embedding change, and explores how culture-embedding perturbation influences enterprises' patent cooperation ability. [Methods] The study obtains the numbers of cooperative patents of six leading regional marine engineering equipment enterprises through the China patent inquiry system, collects culture-embedding data with a questionnaire, and uses a dynamic evaluation model with speed characteristics, proposed in this paper, to evaluate enterprise patent cooperation ability. [Results] The method not only avoids the direct impact of culture-embedding change on the state of patent cooperation, but also expands the differences in evaluation between enterprises. Traditional methods can only distinguish three kinds of rankings, whereas six kinds can be distinguished with this method. [Limitations] The influencing factors of enterprise patent cooperation ability are not fully covered, and the data sample is confined to a specific industry, which needs further extension. [Conclusions] This study is helpful for patent cooperation ability evaluation under distribution malformation or data discretization conditions, and promotes the smooth conduct of cooperative patent metrology work.