[Objective] To support the automatic push and routing of open access articles from multiple publishers to the repositories of multiple funders and research organizations. [Methods] We summarize the challenges that self-archiving poses for authors and that push services pose for publishers, and analyze the requirements for a routing service. [Results] We propose iSwitch, a router service system; outline its functional modules for ingest, affiliation and funder identification, routing, and data management; and set out the standardization requirements for its design and the collaboration requirements for its operation. [Conclusions] iSwitch can automatically receive open access papers from publishers, identify each paper's authors, their affiliations, and the funding agencies, and push the paper to the corresponding institutional repositories, thereby helping to ensure the preservation and dissemination of institutional research output.
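The routing step described in the abstract above can be sketched as follows. This is a minimal illustration, not the iSwitch design: the `Article` fields, the `route` function, the name normalization, and the repository identifiers are all hypothetical, and a real service would need much more robust affiliation and funder matching.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical, minimal metadata record received from a publisher."""
    doi: str
    affiliations: list = field(default_factory=list)
    funders: list = field(default_factory=list)


def route(article, repositories):
    """Return the sorted ids of repositories that should receive the article.

    `repositories` maps a normalized organization name to a repository id.
    Organizations without a registered repository are simply skipped.
    """
    targets = set()
    for org in article.affiliations + article.funders:
        repo = repositories.get(org.strip().lower())  # naive normalization
        if repo is not None:
            targets.add(repo)
    return sorted(targets)


# Hypothetical registry and article, for illustration only.
registry = {"example university": "repo-eu", "example funder": "repo-ef"}
paper = Article(
    doi="10.0000/example",
    affiliations=["Example University ", "Unknown Lab"],
    funders=["Example Funder"],
)
targets = route(paper, registry)
```

Keeping the registry as a plain mapping means that adding a new receiving repository requires no change to the routing logic itself.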
[Objective] To describe the workflow and standards of iSwitch, a router service engine for open access articles. [Methods] We analyze the workflows, key technical needs, and applicable standards in light of the requirements of publishers, the router, and the receiving organizations during push and routing. [Results] We describe the iSwitch technical workflows and the information items required from the publisher to iSwitch and from iSwitch to the recipient, and suggest the standards to be used. [Conclusions] In the iSwitch workflow, the metadata description, packaging, and transmission of open access articles should follow the corresponding standards.
[Objective] To summarize the latest developments in visual analytics and discuss its further application in library and information science. [Methods] We first compare several characteristics of visual analytics and then, based on VAST papers from the past five years, review the field from five aspects: sensemaking, text analytics, high-dimensional data visual analysis, spatial and temporal analysis, and application cases. [Results] We explore the basic principles and interdisciplinary attributes of visual analytics. Studies in the field mainly proceed by developing new algorithms, improving existing models, and changing research perspectives. [Conclusions] Visual analytics research focuses on establishing basic algorithms and design principles for sensemaking and on making breakthroughs in text analytics, high-dimensional data analysis, and spatial and temporal data analysis. Although still developing, visual analytics is highly application oriented and widely used, and it provides methodological support for information services, especially intelligent services.
[Objective] To give users intuitive navigation of a domain knowledge base by visualizing text clustering results. [Methods] Building on automatic multi-level text organization by clustering, visual navigation of the texts is realized through topic discovery, dimensionality reduction, and visual display. [Results] A topic extraction algorithm named TF-ICF is put forward, and the domain knowledge base is displayed with an optimized treemap and scatter plot to help users get an overview of the knowledge base, find the topics they need, and understand the relations between texts. [Limitations] The visual display depends partly on manual work, and the interaction of the visualization needs further optimization. [Conclusions] The visualization method is successfully applied to a domain knowledge base and helps to improve the user experience.
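The abstract above names TF-ICF but does not give its formula. The sketch below assumes one plausible reading by analogy with TF-IDF: a term's relative frequency within a cluster multiplied by the log inverse of the number of clusters containing it. The function names, the weighting, and the toy data are assumptions for illustration and may differ from the authors' algorithm.

```python
import math
from collections import Counter


def tf_icf(clusters):
    """Score terms per cluster; `clusters` maps cluster id -> list of token lists."""
    # Pool all documents of a cluster and count its terms.
    counts = {
        cid: Counter(tok for doc in docs for tok in doc)
        for cid, docs in clusters.items()
    }
    n_clusters = len(counts)
    # Number of clusters in which each term occurs at least once.
    cluster_freq = Counter(term for c in counts.values() for term in c)
    scores = {}
    for cid, c in counts.items():
        total = sum(c.values())
        scores[cid] = {
            term: (n / total) * math.log(n_clusters / cluster_freq[term])
            for term, n in c.items()
        }
    return scores


def top_terms(scores, cid, k=3):
    """Return the k highest-scoring terms of a cluster (ties broken alphabetically)."""
    ranked = sorted(scores[cid].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy clusters, for illustration only.
toy = {
    "a": [["ontology", "search"], ["ontology", "index"]],
    "b": [["search", "cluster"], ["cluster", "tree"]],
}
toy_scores = tf_icf(toy)
```

Under this weighting a term that appears in every cluster, such as "search" in the toy data, scores zero and so never becomes a topic label.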
[Objective] To explore the form of reference networks by analyzing how references are cited and distributed within the text of academic articles. [Methods] Based on structured data from 575 academic articles, we use content extraction, similarity computation, and other techniques to build the reference network of each article, and use examples to analyze the interrelations among references and the reasons behind them. [Results] There is some negative correlation between the similarity of references and their relative distance. The reference networks of individual articles are diverse and follow different patterns. [Limitations] Parts of the full-text data are not accurate enough, which affects the experimental results. The measurement of relative distance among references lacks accuracy, and deeper text mining is needed to solve the problem. [Conclusions] The reference network structures can be roughly classified into three categories, each with different causes. The reference network of a single academic article needs further study.
[Objective] To study, through analysis and experiment, the feasibility of applying TimeML to the annotation of temporal relations in Chinese text. [Methods] Based on the characteristics of Chinese temporal expressions, this paper discusses the applicability of the main TimeML tags to Chinese text. [Results] Although Chinese and English differ in grammatical and syntactic structure, applying TimeML to Chinese is feasible. [Limitations] Because of these structural differences, the main TimeML tags cannot be applied to Chinese text in exactly the same way as to English text. [Conclusions] TimeML, a markup language for temporal relations in English text, can be effectively applied to the annotation of temporal relations in Chinese text. The study lays a foundation for temporal ordering inference over events and for further TRR research on Chinese text.
[Objective] To improve text classification based on the vector space model (VSM) using semantic increment and to verify the model experimentally. [Methods] After reviewing studies of semantic vectors and their improvements in text representation, this paper improves the VSM with semantic increment and proposes an implementation framework for the semantic vector representation of texts. Based on the mappings between words and concepts in a domain ontology, a concept hierarchy tree is constructed, words are positioned in it, the semantic similarity of concepts is calculated, and the semantic vector model of text representation is obtained. [Results] Comparative text classification experiments demonstrate that the proposed method is feasible and effective and that it outperforms traditional methods in Precision, Recall, and F1-Measure. [Limitations] The description of text semantic information is not yet adequate, and genuinely semantic methods of text modeling remain to be explored. In addition, more comparative experiments on several datasets should be conducted to obtain more reliable results. [Conclusions] The paper explores a semantic improvement to the traditional VSM, which is important for further work on text classification and semantic association.
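The abstract above does not say how concept similarity is computed on the hierarchy tree. As an illustration only, the sketch below substitutes the well-known Wu-Palmer measure, which scores two concepts by the depth of their lowest common ancestor relative to their own depths. The toy hierarchy and function names are hypothetical, and the paper's own measure may differ.

```python
def path_to_root(parent, node):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def wu_palmer(parent, a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)), root depth = 1."""
    path_a = path_to_root(parent, a)
    path_b = path_to_root(parent, b)
    ancestors_a = set(path_a)
    # First node on b's path that is also an ancestor of a: the lowest common ancestor.
    lcs = next(n for n in path_b if n in ancestors_a)
    depth_lcs = len(path_to_root(parent, lcs))
    return 2 * depth_lcs / (len(path_a) + len(path_b))


# Toy concept hierarchy (child -> parent), for illustration only.
tree = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}
sim_siblings = wu_palmer(tree, "dog", "cat")
sim_distant = wu_palmer(tree, "dog", "plant")
```

Sibling concepts deep in the tree score higher than concepts that share only the root, which is the behavior a semantic extension of the VSM would want when weighting related terms.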
[Objective] To study the workflow and framework of a commodity information recommender system that works without consumer behavior information. [Context] Recommender systems are an effective means of reducing information overload. However, because they rely heavily on consumer behavior information, they can suffer from the cold-start problem and raise consumers' privacy concerns. [Methods] Drawing on commodity domain knowledge, the interactive recommender system determines the consumer's requirements for quantitative commodity attributes from a rough statement of intended use, and then recommends suitable product information. [Results] A prototype system is designed for experimental study, and the results show high customer satisfaction. [Conclusions] The proposed method can alleviate the cold-start problem and consumers' privacy concerns to some extent.
[Objective] To organize and analyze approaches to topic evolution modeling based on topic models, summarize the advantages and disadvantages of each model, and introduce these methods into the field of information analysis. [Coverage] The literature was retrieved from Google Scholar and Web of Science using the keywords/topics "Topic/Theme Evolution", "Time Topic Model", and "Dynamic Topic Model", together with citation searching; 25 publications were finally used as references. [Methods] Through literature analysis, we examine each model's implementation mechanism, functional characteristics, advantages and disadvantages, and fields of application. [Results] Current models focus on variable topic numbers, online processing, and continuous time spans; many models offer one or two of these functions and can meet the needs of most applications. [Limitations] The specific implementations of some models are not analyzed in depth. [Conclusions] Evolution analysis over different text sources, granularities, and time spans should take the concrete requirements into account so that an appropriate model is chosen according to its features.
[Objective] Current trust-based recommendation algorithms face several issues: explicit trust values are not accurate enough, implicit trust values are hard to measure, and trust propagation paths are not easy to determine. To address them, this paper presents a recommendation algorithm based on random walk in the trust network. [Methods] The algorithm uses a bipartite network projection to measure the trust values between users. These values form the transition probability matrix for a random walk with restart on the user projection network. The walk continues until the trust distribution becomes steady, that is, until maximum trust entropy is reached; the trust distribution at that point is the final trust matrix. [Results] Experiments on the MovieLens dataset show that the improved bipartite recommendation algorithm, which adds user preferences, significantly improves Mean Absolute Error (MAE), Mean Reciprocal Rank (MRR), and normalized Discounted Cumulative Gain (nDCG) compared with other algorithms. [Limitations] Because the bipartite network projection algorithm has a cold-start problem, this algorithm also suffers from the new user/new item problem. [Conclusions] The algorithm makes recommendations more accurate and ranks successfully recommended objects near the top of the list, so it has strong application value.
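The random walk with restart described in the abstract above can be sketched in a few lines. This is a generic version under stated assumptions: the input trust weights are taken as given (the bipartite projection step is not reproduced), the restart probability of 0.15 and the convergence tolerance are arbitrary choices, and the stopping rule is a simple L1 change threshold rather than the entropy criterion the abstract mentions.

```python
def random_walk_with_restart(trust, restart=0.15, tol=1e-10, max_iter=1000):
    """Return one steady-state trust distribution per source user.

    `trust` is a square list of lists of non-negative user-to-user weights.
    """
    n = len(trust)
    # Row-normalize the weights into transition probabilities.
    # A user with no outgoing trust gets a self-loop so each row sums to 1.
    transition = []
    for i, row in enumerate(trust):
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 if j == i else 0.0 for j in range(n)])

    result = []
    for source in range(n):
        p = [1.0 if j == source else 0.0 for j in range(n)]
        for _ in range(max_iter):
            # One walk step, damped by (1 - restart), plus a jump back to the source.
            nxt = [
                (1 - restart) * sum(p[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
            nxt[source] += restart
            delta = sum(abs(a - b) for a, b in zip(nxt, p))
            p = nxt
            if delta < tol:  # distribution has stopped changing
                break
        result.append(p)
    return result


# Toy trust weights: user 0 trusts users 1 and 2 equally, and both trust user 0.
toy_trust = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
final_trust = random_walk_with_restart(toy_trust)
```

Each row of `final_trust` is a probability distribution over users, so rows can be compared directly or used as neighbor weights when predicting ratings.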
[Objective] To improve identification accuracy, this paper presents a method that combines rules with Conditional Random Fields (CRFs) to identify coordinate structures in Chinese patent literature, based on the characteristics of those structures. [Methods] Rules derived from the characteristics of coordinate structures are used to extract symmetrical coordinate structures. The extracted structures are bundled, and CRFs are used to identify non-nested coordinate structures. An error-driven method is then applied to these results to obtain the final identification results. [Results] The experimental results show that the method identifies non-nested coordinate structures in patent literature effectively, with an F value of 76.57%. [Limitations] The rules used in the experiments can be further improved, as their application directly affects the identification results. [Conclusions] Combining rules and CRFs is effective for identifying non-nested coordinate structures in Chinese patent literature.
[Objective] To remedy the defects of traditional methods for mining potential cooperation relationships and improve the mining results. [Methods] After analyzing the flaws of the simple calculation method, the minimum value calculation method, and the traditional TFIDF algorithm, the paper proposes an improved TFIDF algorithm and applies it to mining potential cooperation relationships. [Results] The sample data are papers applying information science research methods, published over the most recent five years in 19 Library and Information Science journals listed in the "Chinese Core Journal of Peking University Directory (2012 Edition)". On these data, the simple calculation method and the minimum value calculation method are strongly influenced by author productivity, the results of the traditional TFIDF algorithm are difficult to convert from potential into actual cooperation, and the improved TFIDF algorithm performs markedly better. [Limitations] The improved TFIDF algorithm does not consider the influence of author order among potential collaborators. [Conclusions] A comparison and evaluation of the four mining results shows that the improved TFIDF algorithm is more sound and has greater advantages and practical value than the traditional methods.
[Objective] To propose a feature extraction method for detecting spam among regular product reviews in e-commerce and improving the recognition rate. [Methods] Based on the idea of quantitative evaluation, features are extracted comprehensively from the reviews' intrinsic characteristics, such as the number of evaluation sentences, sentiment tendency, topic words, and text structure. The number of evaluation sentences, detected with Part-Of-Speech (POS) path matching templates, is the key feature for distinguishing spam from regular product reviews, and a custom dictionary is imported to improve the recognition rate of evaluation sentences. [Results] Experimental results show that spam recognition precision reaches 97.96% and the F-measure reaches 88.48%. [Limitations] The method is mainly intended for Chinese review spam and is not suitable for English product reviews. [Conclusions] Review spam can be detected effectively and accurately with the proposed features. These features can also be applied to evaluating and ranking regular product reviews and to other related applications.
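The abstract above reports precision and F-measure but not recall. The sketch below shows how the three quantities relate, assuming the reported F-measure is the balanced F1 (the harmonic mean of precision and recall); the abstract does not confirm this. The implied recall it computes is a derived figure under that assumption, not a result reported by the authors.

```python
def f_measure(precision, recall, beta=1.0):
    """General F-beta score; beta = 1 gives the balanced F1."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


# Figures from the abstract, expressed as fractions.
reported_precision = 0.9796
reported_f = 0.8848

# Recall implied by those figures IF the F-measure is the balanced F1 (an assumption).
recall_estimate = implied_recall(reported_precision, reported_f)
```

Under that assumption the implied recall comes out at roughly 0.81, noticeably lower than the precision, which suggests the features favor avoiding false alarms over catching every spam review.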