[Objective] To conduct a comprehensive analysis of scientific metadata standards and build a common design model for such standards. [Methods] We review and analyze six typical metadata standards from different research fields and design a common metadata standard for scientific data based on statistics. [Results] Metadata standards from different research fields differ markedly in format, organization, and expression, but their elements also share some similarities. [Conclusions] Discipline-oriented metadata standards for scientific data promote the development of scientific research, but they also pose a challenge to the unified management and service of scientific data. Building a common metadata specification from statistics on the metadata elements used in different research fields is one way to address this problem.
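The element statistics mentioned in this abstract can be illustrated with a minimal sketch. It assumes each standard is reduced to a flat list of element names and that an element counts as "common" when it appears in at least `min_count` standards. The element lists, the `common_elements` helper, and the threshold are hypothetical and are not taken from the six standards studied.

```python
from collections import Counter

def common_elements(standards, min_count):
    """Return element names that occur in at least min_count standards."""
    counts = Counter()
    for elements in standards.values():
        # Count each element once per standard, ignoring case.
        counts.update({e.lower() for e in elements})
    return sorted(e for e, c in counts.items() if c >= min_count)

# Hypothetical element lists, for illustration only.
standards = {
    "standard_a": ["Title", "Creator", "Date", "Spatial Coverage"],
    "standard_b": ["Title", "Creator", "Instrument"],
    "standard_c": ["Title", "Date", "Species"],
}

print(common_elements(standards, min_count=2))
```

Elements that pass the threshold are candidates for a common core; discipline-specific elements stay in the individual standards.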
[Objective] To inform recommendations for policy implementation, this study examines the policies of life sciences data repositories. [Methods] Through manual reading and screening, we investigated 38 life sciences data repositories that have clear policy statements, then summarized and analyzed these statements in terms of data submission, data management, and data use. [Results] The stakeholder groups of a subject data repository (data administrators, data contributors, and data users) are subject to different data rights management specifications. [Limitations] The study covers only 38 cases in the life sciences; it does not analyze how policy elements change over time and lacks details of policy implementation. [Conclusions] A sound policy system for a subject data repository should include a data submission policy (content definition, format specification, source requirements, and ownership statements), a data management statement (data disclosure, data registration, disclaimer, and data version management), and a data use specification (data access, recommended data citation, and data licensing).
[Objective] Content recommendation services for traditional digital literature resources cannot fully exploit users' potential information needs, and rating matrices are usually sparse. This paper presents an algorithm that combines collaborative filtering with association semantic links. [Methods] A content recommendation algorithm for digital literature resources is proposed that uses association semantic links and collaborative filtering. [Results] Experimental results show that the algorithm can address both the under-exploitation of users' potential information needs and the sparsity of the rating matrix. [Limitations] The study lacks a large-scale collection of digital resources, and the experimental cases are few. [Conclusions] The algorithm can fully exploit users' information needs and generate literature recommendations. Its validity and practicability are verified by experiments.
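For orientation, the collaborative filtering component can be sketched as plain user-based filtering over a sparse rating matrix stored as nested dictionaries. This is a generic baseline under stated assumptions, not the algorithm proposed in the abstract: the association semantic link step is not described there and is omitted, and the `cosine` and `predict` helpers and the sample ratings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (item -> rating)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, rated in ratings.items():
        if other == user or item not in rated:
            continue
        sim = cosine(ratings[user], rated)
        num += sim * rated[item]
        den += abs(sim)
    # None signals that no similar user has rated the item (the sparsity problem).
    return num / den if den else None

# Hypothetical ratings: user -> {document id -> rating}.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"c": 1, "d": 2},
}

print(predict(ratings, "u1", "c"))
```

The `None` return makes the sparsity issue visible: when no neighbor with overlapping ratings has rated the item, plain collaborative filtering has nothing to recommend, which is the gap an additional semantic signal is meant to fill.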
[Objective] This paper studies the relationship between the granularity of user preference mining and mining efficiency in collaborative filtering, aiming to identify the most efficient granularity. [Methods] Based on practical applications, user preference mining granularity is divided into three levels from coarse to fine. A corresponding preference mining algorithm is designed for each level, and the mining efficiency at the different granularities is compared through experiments. [Results] Experimental results show that preference mining efficiency decreases as the mining granularity changes from coarse to fine. [Limitations] The data include only users' consumption and rating data; other types of data are not yet covered. [Conclusions] Coarse-grained preference mining is better for discovering users' preferences.
[Objective] To explore an opinion evolution model based on information dissemination in micro-blogs. [Methods] By analyzing three kinds of user behavior in the micro-blog network (publishing, commenting, and forwarding), this paper proposes a new opinion evolution model that introduces the concepts of Sensitivity and Activity to measure a user's enthusiasm for obtaining new information and for discussing with others. Using the NetLogo platform, the paper first examines the influence of the parameters on the outcome of the evolution and then compares the model with the HK model by computer simulation. [Results] The trust threshold affects users' opinions. Sensitivity promotes the communication of information. Activity speeds up the dissemination of information and helps users' opinions stabilize. [Limitations] Current research on opinion dynamics relies mainly on theoretical analysis and experiments, so the model still needs to be tested on larger data sets to verify its adaptability. [Conclusions] The proposed model is based on the behavior of micro-blog users. The experimental results show that it can describe complex information dissemination and opinion updating in the micro-blog network.
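As background for the comparison in this abstract, the baseline HK (Hegselmann-Krause) bounded-confidence model can be sketched in a few lines. The sketch covers only that baseline: in each synchronous step, every agent adopts the mean opinion of all agents whose opinions lie within a trust threshold `eps` of its own. The Sensitivity and Activity parameters of the proposed model are not defined in the abstract and are not modeled here; the function names and sample values are assumptions.

```python
def hk_step(opinions, eps):
    """One synchronous HK update: average over opinions within eps."""
    updated = []
    for x in opinions:
        # An agent always trusts itself, so `near` is never empty.
        near = [y for y in opinions if abs(x - y) <= eps]
        updated.append(sum(near) / len(near))
    return updated

def hk_run(opinions, eps, max_steps=1000, tol=1e-9):
    """Iterate hk_step until opinions stop changing or max_steps is reached."""
    for _ in range(max_steps):
        nxt = hk_step(opinions, eps)
        if max(abs(a - b) for a, b in zip(opinions, nxt)) < tol:
            return nxt
        opinions = nxt
    return opinions

# Hypothetical initial opinions in [0, 1].
start = [0.0, 0.1, 0.9]
print(hk_run(start, eps=0.2))
```

With a small threshold the two nearby agents merge into one cluster while the distant agent keeps its opinion, which illustrates how the trust threshold shapes the final opinion distribution.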
[Objective] Opinion mining for products is attracting increasing attention and has become a hot research topic. Its results can be widely used in applications such as harmful information filtering, public opinion analysis, consumer guidance, and product improvement. Implicit feature identification plays an important role because implicit features are common in online comments and are difficult to identify. [Methods] This paper uses comments on a certain automobile brand that contain only explicit features to obtain refined multi-POS opinions and generates opinion clusters using Synonyms Forests. Opinions are also identified from phrases common in the field. A dictionary of the form {Feature, Opinion, Weight} is generated from the features and opinions, and the weights are calculated. A dictionary-based multi-strategy feature extraction algorithm is then applied, which considers the similarity between the opinions in unmatched comments that contain implicit features and the opinions in the dictionary. [Results] Implicit features can be extracted effectively, with an F-value of 75.55%, a good result for implicit feature identification. [Limitations] Data labeling is time-consuming. [Conclusions] Experiments with the new algorithm show positive results, and the algorithm has some practical value.
[Objective] To identify competitive products in the market and mine useful information for both enterprises and customers. [Methods] This paper proposes a sentiment analysis model based on comparative sentences, which computes feature scores for the compared products and visualizes the comparative relations between them. To verify the effectiveness of the model, an experiment on smart phones is conducted with the help of the Baidu search engine. [Results] The experiment selects 9 pairs of competing smart phone products from 28 pairs, so the results can help smart phone enterprises identify competitors. The model also visualizes the comparative relations between these products and provides purchase suggestions for customers. [Limitations] The accuracy of feature extraction is not high, and the recognition rate for comparative sentences needs improvement. [Conclusions] The experimental results are consistent with the facts, which demonstrates the effectiveness of the competitiveness analysis method presented in this paper and its value to enterprises.
[Objective] Based on comprehensive medical ontologies, this paper proposes a new algorithm to improve the precision of semantic similarity estimation for medical terminology. [Methods] Semantic parameters such as depth and distance are extracted from the hierarchy and semantic relationships of concepts in SNOMED CT and MeSH. The depth factor and the distance factor are then obtained by weighting with the concept density, and a semantic similarity function is established from them. [Results] The algorithm is applicable to both of these distinct medical ontologies, and the experimental results show that it correlates more strongly with manual scoring than conventional algorithms do. [Limitations] The algorithm depends on the hierarchy of the ontologies. [Conclusions] The new algorithm improves the precision of semantic similarity estimation for medical terminology.
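To make the role of depth and distance concrete, here is a sketch of one conventional depth-based measure, Wu-Palmer similarity, computed on a toy is-a hierarchy. It is a baseline of the kind the abstract compares against, not the proposed function: the concept-density weighting is not specified in the abstract and is left out. The sketch also assumes a single-parent tree, and the concept names and the `parent` map are hypothetical rather than drawn from SNOMED CT or MeSH.

```python
def path_to_root(parent, node):
    """List of nodes from `node` up to the root, starting with `node`."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def depth(parent, node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(parent, node))

def wu_palmer(parent, a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(parent, b))
    # The first ancestor of a that is also an ancestor of b is the
    # lowest common subsumer (lcs) in a single-parent tree.
    lcs = next(n for n in path_to_root(parent, a) if n in ancestors_b)
    return 2 * depth(parent, lcs) / (depth(parent, a) + depth(parent, b))

# Hypothetical toy hierarchy: child -> parent.
parent = {
    "heart disease": "disorder",
    "lung disease": "disorder",
    "myocarditis": "heart disease",
    "pericarditis": "heart disease",
}

print(wu_palmer(parent, "myocarditis", "pericarditis"))
print(wu_palmer(parent, "myocarditis", "lung disease"))
```

Siblings deep in the hierarchy score higher than concepts that only meet at the root, which is the behavior that depth and distance factors are meant to capture.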
[Objective] This paper aims to increase the accuracy of question answering systems and improve user satisfaction with them. [Context] Question answering has become an important research topic in natural language processing, but the accuracy of current systems is low. How to improve satisfaction with such systems has become a pressing question. [Methods] This paper analyzes the source code of ALICE and modifies it to use Chinese word segmentation. Based on an analysis of its internal reasoning, the paper puts forward a recommendation method. [Results] The domain ontology is integrated into the ALICE robot; the user's question is then analyzed and keywords are extracted. Finally, the ontology is searched and recommendations are returned. [Conclusions] Experiments show that customer satisfaction increases greatly after ontology-based recommendations are introduced.
[Objective] To establish a Chinese Plant Species Diversity Domain Ontology. [Methods] With BFO as the upper ontology, this paper takes the KACTUS method as a reference and builds the Chinese Plant Species Diversity Domain Ontology by reusing PO. The process includes pruning and consolidating PO, adding entities, adding relations, localizing the terminology into Chinese, and populating instances. [Results] This paper establishes a Chinese Plant Species Diversity Domain Ontology that includes 720 entities and more than 4,000 instances. In addition, some knowledge fragments describing Feronia limonia from “Flora of China” are expressed in OWL based on the ontology. [Limitations] The ontology's instances are not exhaustive because a complete domain dictionary is lacking. [Conclusions] The Chinese Plant Species Diversity Domain Ontology can support the formal representation of knowledge on plant species diversity.
[Objective] This study aims to design an appropriate curation process for managing cross-disciplinary data in the environmental health field in a stable and sustainable manner. [Methods] Referring to the Digital Curation Center (DCC) Curation Lifecycle Model, the authors formulate the environmental health data processing procedure as a standardized workflow and rigorously define the contents of each module. [Results] The workflow is applied to curate climate data and hospital registry data, providing backend support for the environmental health part of the medical knowledge service system. The results show that it can help manage cross-disciplinary data in practice. [Limitations] Because demands are diverse, the workflow needs further specification in areas such as the data model and data standardization. [Conclusions] The workflow can effectively bring together curators with different backgrounds, take both data quality and data size into account, and help curate cross-disciplinary data.
[Objective] A zero-watermarking algorithm with superior transparency and real-time performance is designed to protect the copyright of color image resources held by libraries, museums, and archives on the Internet. [Context] The algorithm can improve the visual quality of color images while meeting the real-time demands of copyright protection on the Internet. [Methods] The SURF points and vectors of the original color image are extracted. The zero-watermarking sequence is then produced by comparing the cosine angles between the SURF vectors and a secret reference vector. Finally, the copyright mark is obtained by the Arnold transform. [Results] The algorithm is robust against image attacks, with the BCR remaining above 0.7. The proposed scheme can also protect and identify the copyright of color images in real time. [Conclusions] The study helps protect the copyright of color images and can promote the sharing of digital information resources in libraries, museums, and archives.
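The building blocks named in this abstract can be sketched as follows. This is an illustration under several assumptions, not the authors' scheme: the feature vectors are supplied directly rather than extracted with SURF, the bit rule (1 when the cosine with the reference vector exceeds a threshold) is an assumed reading of "comparing the cosine angles", BCR is taken to be the fraction of matching bits, and the abstract does not say how the bit sequence and the Arnold-scrambled mark are combined. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def feature_bits(vectors, reference, threshold=0.0):
    """One bit per vector: 1 if its cosine with the reference exceeds threshold."""
    return [1 if cosine(v, reference) > threshold else 0 for v in vectors]

def arnold(grid, rounds=1):
    """Scramble a square grid with the Arnold map (x, y) -> (x + y, x + 2y) mod n."""
    n = len(grid)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = grid[x][y]
        grid = out
    return grid

def bcr(a, b):
    """Fraction of positions at which two equal-length bit sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical 2-D vectors standing in for image feature descriptors.
bits = feature_bits([[1, 0], [-1, 0], [1, 1]], reference=[1, 0])
print(bits)
print(arnold([[1, 2], [3, 4]]))
print(bcr([1, 0, 1, 1], [1, 0, 0, 1]))
```

Because the bits are derived from the image rather than embedded in it, the image itself is never modified, which is why zero-watermarking leaves visual quality untouched. The Arnold map is periodic, so applying it enough times restores the original grid.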
[Objective] To solve the problem of excessive downloading of digital resources in university libraries, this paper designs a digital resource monitoring and management system based on network sniffer technology. [Context] Existing solutions to the excessive downloading problem have some defects. To compensate for them, an optimized solution based on network sniffer technology is proposed. [Methods] This paper introduces network sniffer technology to curb excessive downloading of electronic resources. Taking the digital resource monitoring and management system of Dalian University of Technology Library as an example, it describes the underlying technical principles, the design approach, and the implementation of the modules. [Results] Without affecting the topology of the original network or users' habits, the system can identify and record readers' access to and downloads of electronic resources, and can effectively prevent excessive downloading by warning, and even blocking, users suspected of it. [Conclusions] The digital resource monitoring system based on sniffer technology can accurately monitor digital resources and effectively prevent excessive downloading.