Volume 31, Issue 2
    Implementation of the Framework for Converting Web-data to RDF (W2R)
    Chen Tao, Zhang Yongjuan, Chen Heng
    2015, 31 (2): 1-6.  DOI: 10.11925/infotech.1003-3513.2015.02.01

    [Objective] This paper builds the W2R framework for converting Web data to RDF. [Methods] The underlying infrastructure of the framework is built on the W2R vocabulary. Web data are converted to RDF with a mapping file that consists of a system Ontology and Web page elements extracted with XPath expressions, and the Virtuoso database serves as persistent storage for the RDF data. [Results] The W2R framework makes it convenient to convert Web data to RDF, merge data from different sources, store them in named graphs, and perform simple inferences, all without changing any source code. [Limitations] The system Ontology currently covers only the public namespaces that describe bibliographies, and RDF data can only be stored in the Virtuoso database. [Conclusions] The W2R framework provides a new way to generate standardized RDF data for Semantic Web and Linked Data applications.
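The mapping-file idea above can be sketched in a few lines of Python. This is a hypothetical miniature, not the authors' actual W2R vocabulary or Virtuoso pipeline: XPath-style expressions (the subset supported by the standard library's ElementTree) pull values out of a page, and each ontology property in the mapping turns a value into an N-Triples-style statement.

```python
# Minimal Web-to-RDF sketch: mapping file = ontology property -> XPath.
import xml.etree.ElementTree as ET

PAGE = """<html><body>
  <h1 class="title">Linked Data Basics</h1>
  <span class="author">A. Writer</span>
</body></html>"""

# Hypothetical mapping stand-in (ElementTree's limited XPath subset).
MAPPING = {
    "dc:title":   ".//h1[@class='title']",
    "dc:creator": ".//span[@class='author']",
}

def web_to_rdf(html, mapping, subject="<http://example.org/doc/1>"):
    """Emit one N-Triples-style string per mapped page element."""
    root = ET.fromstring(html)
    triples = []
    for prop, xpath in mapping.items():
        for node in root.findall(xpath):
            triples.append(f'{subject} {prop} "{node.text}" .')
    return triples

triples = web_to_rdf(PAGE, MAPPING)
```

Changing what gets extracted then means editing only the mapping, not the code, which is the "without changing any source code" property the abstract claims.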

    Research on Semantic Mining for Large-scale Oracle Bone Inscriptions Foundation Data
    Xiong Jing, Gao Feng, Wu Qinxia
    2015, 31 (2): 7-14.  DOI: 10.11925/infotech.1003-3513.2015.02.02

    [Objective] This paper identifies semantic relations in large-scale Oracle Bone Inscription (OBI) data in order to provide semantic analysis functions for OBI research. [Methods] Building on text mining combined with Semantic Web technology, semantic search is implemented over a data set of RDF-based entities and their relationships, and Ontology relations and Ontology reasoning are used to extract explicit and implicit semantic relationships among RDF objects. [Results] Experimental results show that the F-measure reaches 74.49% on OBI literature semantic mining and 70.61% on OBI semantic mining, which satisfies the needs of OBI information processing. [Limitations] Semantic mining is based on three separate Ontologies rather than an integrated one. [Conclusions] RDF provides a structured semantic description, and the LarKC system is suitable for large-scale OBI semantic processing.
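The explicit-versus-implicit distinction above can be illustrated with plain triple pattern matching plus one inference rule. The data and the rule are invented for illustration; the paper's actual ontologies and LarKC pipeline are not reproduced here.

```python
# Toy RDF store: triple patterns (None = wildcard) find explicit relations;
# one hypothetical rule derives an implicit one.
TRIPLES = {
    ("oracle:piece_12", "obi:excavatedAt", "place:Yinxu"),
    ("place:Yinxu", "geo:locatedIn", "place:Anyang"),
}

def match(triples, s=None, p=None, o=None):
    """Return every triple matching the (s, p, o) pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def infer(triples):
    """Rule: excavatedAt(x, y) & locatedIn(y, z) => excavatedAt(x, z)."""
    derived = set(triples)
    for s, _, y in match(triples, p="obi:excavatedAt"):
        for _, _, z in match(triples, s=y, p="geo:locatedIn"):
            derived.add((s, "obi:excavatedAt", z))
    return derived

inferred = infer(TRIPLES)
```

A real reasoner iterates rules to a fixed point over an ontology; this single pass only shows the shape of the computation.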

    Feature Analysis and Automatic Identification of Query Specificity
    Tang Xiangbin, Lu Wei, Zhang Xiaojuan, Huang Shihao
    2015, 31 (2): 15-23.  DOI: 10.11925/infotech.1003-3513.2015.02.03

    [Objective] This paper constructs a human-annotated collection from Sogou query logs for feature analysis and automatic identification of query specificity, and evaluates and compares the identification results. [Methods] Basic and content features of queries are selected and analyzed, and decision tree, SVM and Naive Bayes classifiers are built and trained to classify query specificity automatically. [Results] With these features, query specificity is identified effectively: the macro-averaged F-measures are all above 0.8. [Limitations] Users' clickthrough information is not included among the features, and the effect of violating the conditional independence assumption of the Naive Bayes classifier in this experiment should be further verified. [Conclusions] The queries' basic and content features, by themselves, can distinguish broad, medium and specific queries well.
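The macro-averaged F-measure reported above is the unweighted mean of the per-class F1 scores, which can be computed in plain Python. The three-way broad/medium/specific labels below are toy data, not the Sogou annotations.

```python
# Macro-averaged F1: per-class precision/recall/F1, then an unweighted mean.
def macro_f1(gold, pred):
    labels = set(gold) | set(pred)
    scores = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if p == lab and g != lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec  = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

gold = ["broad", "medium", "specific", "specific"]
pred = ["broad", "medium", "specific", "medium"]
score = macro_f1(gold, pred)
```

Macro averaging gives each specificity class equal weight regardless of its frequency, which matters when broad, medium and specific queries are unevenly distributed in the log.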

    Acquisition of Synonym from Patent Query Logs
    Gu Wei, Li Chaofan, Wang Hongjun, Xiao Shibin, Shi Shuicai
    2015, 31 (2): 24-30.  DOI: 10.11925/infotech.1003-3513.2015.02.04

    [Objective] This paper studies the acquisition of synonyms from patent query logs. [Methods] A method based on user behavior analysis is proposed: a logic expression parser generates candidate synonym pairs, and features such as pinyin, Chinese character patterns, abbreviations, and traditional versus simplified forms are combined to build a synonym dictionary. [Results] Experiments show that the precision reaches 74.5%; the method generates 17 495 synonym pairs, a dictionary larger than those produced by some existing methods. [Limitations] The method is only feasible for library and information retrieval scenarios with complex query expressions. [Conclusions] This research provides a useful reference for log-based knowledge acquisition.
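One of the features mentioned above, the abbreviation check, is easy to sketch: a short query is a candidate abbreviation of a longer one when its characters appear in the longer query in order. The parser, pinyin and traditional/simplified features are not reproduced; the queries below are invented examples.

```python
# Candidate synonym pairs via a subsequence ("abbreviation") test.
def is_abbreviation(short, long_):
    """True if every character of `short` occurs in `long_` in order."""
    it = iter(long_)
    return all(ch in it for ch in short)

def candidate_pairs(session_queries):
    """Pair up queries from one log session where one abbreviates the other."""
    pairs = []
    qs = sorted(set(session_queries), key=len)
    for i, a in enumerate(qs):
        for b in qs[i + 1:]:
            if is_abbreviation(a, b):
                pairs.append((a, b))
    return pairs

pairs = candidate_pairs(["北大", "北京大学", "图书馆"])
```

In a full system such candidates would then be filtered by the remaining features before entering the dictionary.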

    Short-text Classification Based on HowNet and Domain Keyword Set Extension
    Li Xiangdong, Cao Huan, Ding Cong, Huang Li
    2015, 31 (2): 31-38.  DOI: 10.11925/infotech.1003-3513.2015.02.05

    [Objective] This paper implements feature extension for short texts to improve short-text classification performance. [Methods] At two feature granularities, words and latent topics, the high-frequency words and topic core words of each class in the training set are extracted as the domain keyword set. The topic probability distribution of a test text is derived with an LDA model, and whenever a topic's probability exceeds a threshold, that topic's keywords are added to the test text. The semantic similarity between the test text and the domain keyword set of each class is then calculated with HowNet. [Results] Compared with a short-text classification method based on the LDA model, the proposed algorithm increases Macro F1 on the Fudan, Sogou and Micro-blog corpora by 4.9%, 5.9% and 4.2% on average, and Micro F1 by 4.6%, 6.2% and 4.6%. Compared with a VSM-based method, it improves the F-measure by more than 13% on all three corpora. Experiments also show that extension combining high-frequency words and topic core words classifies better than extension using either alone. [Limitations] Many words are not covered by HowNet, so their similarity cannot be computed with it, which affects classification results. [Conclusions] The proposed method effectively improves short-text classification performance.
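The threshold-based extension step described above can be sketched directly; the LDA inference itself is mocked here with fixed probabilities, and the topic keyword lists are invented.

```python
# Extend a short text with the keywords of every sufficiently probable topic.
TOPIC_KEYWORDS = {0: ["economy", "market"], 1: ["match", "league"]}

def extend_text(tokens, topic_probs, threshold=0.3):
    """Append keywords of each topic whose probability exceeds the threshold."""
    extended = list(tokens)
    for topic, prob in topic_probs.items():
        if prob > threshold:
            extended.extend(TOPIC_KEYWORDS[topic])
    return extended

# Hypothetical LDA output for a two-token test text.
out = extend_text(["stocks", "rise"], {0: 0.72, 1: 0.08})
```

The extended token list, rather than the sparse original, is then compared against each class's domain keyword set, which is what gives short texts enough features to classify.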

    Research on Chinese Text Categorization Based on Semantic Similarity of HowNet
    Liu Huailiang, Du Kun, Qin Chunxiu
    2015, 31 (2): 39-45.  DOI: 10.11925/infotech.1003-3513.2015.02.06

    [Objective] This paper proposes an algorithm that calculates the similarity between Chinese texts more accurately in order to improve the precision of Chinese text classification. [Methods] With the TF-IDF algorithm computing term weights and HowNet analyzing the semantic relationships between lexical items, a text similarity weighting algorithm based on HowNet semantic similarity is proposed and tested in a Chinese text classification experiment. [Results] The experimental results show that the proposed method improves text categorization performance compared with traditional methods. [Limitations] The algorithm has high time complexity, and its classification speed needs to be improved. [Conclusions] Analyzing the semantic relationships between feature items proves effective for enhancing the accuracy of Chinese text classification.
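The combination described above, TF-IDF term weights plus a term-level semantic similarity, can be sketched as a "soft" cosine in which the numerator pairs every term of one text with every term of the other. `term_sim` below is a toy stand-in for HowNet's sememe-based similarity, and the documents are invented; this also makes the quadratic cost behind the time-complexity limitation visible.

```python
# TF-IDF weighting plus a pluggable term-similarity function.
import math
from collections import Counter

def tfidf(docs):
    """Per-document dict of term -> tf * idf."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: (c / len(d)) * math.log(n / df[t])
             for t, c in Counter(d).items()} for d in docs]

def term_sim(a, b):
    """Toy stand-in for HowNet semantic similarity between two terms."""
    return 1.0 if a == b else (0.8 if {a, b} == {"car", "automobile"} else 0.0)

def weighted_sim(w1, w2):
    """Soft cosine: every term pair contributes via term_sim (O(|w1|*|w2|))."""
    num = sum(x * y * term_sim(t1, t2)
              for t1, x in w1.items() for t2, y in w2.items())
    den = math.sqrt(sum(x * x for x in w1.values())) * \
          math.sqrt(sum(y * y for y in w2.values()))
    return num / den if den else 0.0

docs = [["car", "fast"], ["automobile", "fast"], ["dog"]]
w = tfidf(docs)
```

A plain cosine would score the first two documents on "fast" alone; the semantic term pairing also credits the car/automobile match, which is the effect the paper relies on.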

    Parallel Implementing Bursty Events Detection Using MapReduce
    Zhuo Keqiu, Yu Wei, Su Xinning
    2015, 31 (2): 46-54.  DOI: 10.11925/infotech.1003-3513.2015.02.07

    [Objective] This paper aims to detect bursty events accurately and quickly from text streams in a big data environment. [Methods] Using Kleinberg burst detection and the LDA topic model, the method is extended to the MapReduce framework to parallelize corpus preprocessing, bursty word detection, bursty document filtering and topic extraction. [Results] Simulation experiments on a news text stream show that the parallel method detects bursty events in specific areas with a precision of 87.50%, recall of 77.78% and F-measure of 82.35%. [Limitations] The MapReduce parallel method has difficulty achieving online, real-time detection of bursty events in large-scale dynamic text streams. [Conclusions] Compared with traditional serial burst detection, the distributed parallel method preserves detection accuracy while offering good scalability.
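The map/reduce decomposition of the bursty-word step can be simulated in-process: map emits ((word, time window), 1) pairs, reduce aggregates them, and a word is flagged bursty when one window holds most of its total mass. The ratio test below is a deliberately simplified stand-in for Kleinberg's two-state automaton, and the documents are toy data.

```python
# In-process simulation of MapReduce bursty-word detection.
from collections import defaultdict

def map_phase(docs):
    """Map: emit ((word, window), 1) for every token occurrence."""
    for window, text in docs:
        for word in text.split():
            yield (word, window), 1

def reduce_phase(pairs):
    """Reduce: aggregate counts per (word, window) key."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts

def bursty_words(counts, ratio=0.8, min_total=2):
    """Flag words whose peak window holds >= `ratio` of all their occurrences."""
    totals, peak = defaultdict(int), {}
    for (word, window), c in counts.items():
        totals[word] += c
        peak[word] = max(peak.get(word, 0), c)
    return {w for w in totals
            if totals[w] >= min_total and peak[w] / totals[w] >= ratio}

docs = [(1, "quake quake quake news"), (2, "news sports"), (3, "news")]
hot = bursty_words(reduce_phase(map_phase(docs)))
```

Because map emissions for different documents are independent and reduce only sums per key, both phases shard cleanly across machines, which is what the Hadoop version exploits.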

    Credibility Research on Chinese Online Customer Reviews
    Hao Mei, Yang Xiaoyuan
    2015, 31 (2): 55-63.  DOI: 10.11925/infotech.1003-3513.2015.02.08

    [Objective] This paper proposes a review credibility ranking model to help customers make better shopping decisions. [Methods] The review credibility indexes are adjusted and optimized on the Visual Studio development platform, index scores are obtained through a questionnaire, and the credibility ranking model is constructed with the Fuzzy Analytic Hierarchy Process. [Results] Experiments show that the new ranking is more scientific and reasonable than the original Web ordering of reviews. Reviews without "helpful" votes are not necessarily unreliable, so the "helpful" vote is important to review credibility but is not the only determining factor. [Limitations] People weight the factors differently, so future work should pay more attention to expert rating of the factors. [Conclusions] The ranking model synthesizes several indexes and adjustment methods, providing a new credibility ranking method for Chinese online customer reviews that considers both objective information and semantic features.
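The weighting-and-ranking core of an AHP-based model can be sketched as follows: derive index weights from a pairwise comparison matrix via the geometric-mean method, then rank reviews by their weighted credibility scores. The fuzzy extension of AHP used in the paper is omitted, and the matrix, index names and scores below are invented.

```python
# Geometric-mean AHP weights, then a weighted credibility score per review.
import math

def ahp_weights(matrix):
    """Normalized geometric means of the rows of a pairwise comparison matrix."""
    gm = [math.prod(row) ** (1 / len(row)) for row in matrix]
    s = sum(gm)
    return [g / s for g in gm]

# Hypothetical pairwise preferences among 3 indexes:
# helpfulness votes, reviewer history, text quality.
M = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w = ahp_weights(M)

def credibility(scores, weights=w):
    """Weighted sum of per-index scores in [0, 1]."""
    return sum(s * wt for s, wt in zip(scores, weights))

ranked = sorted([("rev_a", credibility([0.9, 0.4, 0.7])),
                 ("rev_b", credibility([0.2, 0.9, 0.3]))],
                key=lambda t: t[1], reverse=True)
```

Because the vote index is only one of several weighted factors, a review with no "helpful" votes can still rank high, matching the abstract's observation.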

    A Centralized Identity Authentication in the Cloud Service of Public Culture Digital Resources
    Gu Jiawei, Wang Shengqing, Zhao Danqun, Chen Wenguang
    2015, 31 (2): 64-71.  DOI: 10.11925/infotech.1003-3513.2015.02.09

    [Objective] A centralized identity authentication model is proposed to solve the user identity management problem. [Context] In the National Public Culture Digital Platform, identity authentication must consider the platform's topological structure and the autonomy of users from member libraries. [Methods] The model uses an implicit or explicit global identity and mappings to the autonomous identities in order to unify the identities managed by member libraries. [Results] With this model, users do not need to remember multiple identities, member libraries can share user information and provide user-centered services, and new member libraries can join easily. [Conclusions] The model is feasible, but issues such as efficiency, identity disambiguation and security remain, and it should be tested and adjusted during implementation.
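The mapping structure at the heart of the model can be sketched as a two-way table: one global identity linked to each member library's autonomous local identity. All names below are invented, and authentication, disambiguation and security are deliberately out of scope.

```python
# Global <-> local identity mapping for federated member libraries.
class IdentityHub:
    def __init__(self):
        self._to_global = {}   # (library, local_id) -> global_id
        self._to_local = {}    # global_id -> {library: local_id}

    def link(self, global_id, library, local_id):
        """Register a member library's local identity under a global one."""
        self._to_global[(library, local_id)] = global_id
        self._to_local.setdefault(global_id, {})[library] = local_id

    def resolve(self, library, local_id):
        """Map a local login back to the global identity (None if unlinked)."""
        return self._to_global.get((library, local_id))

    def identities(self, global_id):
        """All local identities unified under one global identity."""
        return dict(self._to_local.get(global_id, {}))

hub = IdentityHub()
hub.link("u-001", "city_lib", "alice01")
hub.link("u-001", "univ_lib", "a.chen")
```

A new member library joins by adding `link` entries for its users, without the other libraries changing anything, which is the "join easily" property claimed above.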

    Practice on Institutional Repository of Chinese Academy of Agricultural Sciences
    Zhao Ruixue, Du Ruopeng
    2015, 31 (2): 72-77.  DOI: 10.11925/infotech.1003-3513.2015.02.10

    [Objective] The Institutional Repository of the Chinese Academy of Agricultural Sciences (CAAS-IR) is built to promote the preservation, dissemination and utilization of digital assets. [Context] With the rapid development of IR construction at home and abroad and the open access movement, CAAS-IR will become an important knowledge infrastructure of the Chinese Academy of Agricultural Sciences. [Methods] CAAS-IR uses DSpace as its prototype system, optimized with Java programming and the application of Solr. [Results] On top of the DSpace core, the CAAS-IR platform adds faceted search, retrieval, statistical analysis and other functions. [Conclusions] The CAAS-IR practice raises awareness of IRs among CAAS researchers and science and technology managers. IR construction involves technology, resource building, management and service; effective incentive mechanisms and value-added services will help its implementation.

    A Parallel Naive Bayesian Network Public Opinion Fast Classification Algorithm Based on Hadoop Platform
    Ma Bin, Yin Lifeng
    2015, 31 (2): 78-84.  DOI: 10.11925/infotech.1003-3513.2015.02.11

    [Objective] A Network Public Opinion (NPO) classification method based on a parallel Naive Bayesian Classification Algorithm (NBCA) in a Hadoop environment is proposed. [Context] NPO data are high-volume, highly distributed and highly varied information assets, so accurate and fast classification is difficult to achieve. [Methods] Exploiting the distributed storage and parallel processing features of the Hadoop platform, the NBCA is encapsulated for parallel execution; NPO documents are stored locally under HDFS and classified in parallel by MapReduce. [Results] Tests of the MapReduce-packaged parallel NBCA show that its execution efficiency improves by 82% over the centralized method, with a classification accuracy above 85%. [Conclusions] The proposed algorithm effectively improves NPO classification efficiency and capability.
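The classifier being parallelized above is a multinomial Naive Bayes; a serial sketch shows why it parallelizes so well. Training is phrased as map (emit per-class word occurrences) and reduce (aggregate counts), the shape MapReduce distributes; the documents and class names are toy data, not NPO text.

```python
# Serial multinomial Naive Bayes with Laplace smoothing.
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (class_label, text). Returns (priors, counts, vocab)."""
    class_docs = Counter(c for c, _ in docs)
    word_counts = defaultdict(Counter)
    for c, text in docs:              # "map": per-doc (class, word) emissions
        word_counts[c].update(text.split())   # "reduce": aggregation per class
    vocab = {w for c in word_counts for w in word_counts[c]}
    return class_docs, word_counts, vocab

def classify(text, model):
    """Argmax over classes of log prior + smoothed log likelihoods."""
    class_docs, word_counts, vocab = model
    n = sum(class_docs.values())
    best, best_lp = None, -math.inf
    for c in class_docs:
        lp = math.log(class_docs[c] / n)
        total = sum(word_counts[c].values())
        for w in text.split():
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = train([("sports", "match goal score"),
               ("finance", "stock market price"),
               ("sports", "goal match win")])
```

Since the word counts are simple sums, each HDFS shard can count its own documents and a reduce step merges the partial counts, which is exactly the encapsulation the paper performs.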

    Application of Location Mapping Technology in Book Positioning and Navigation
    Sun Wei, Hao Aiyu, Lv Qiang
    2015, 31 (2): 85-90.  DOI: 10.11925/infotech.1003-3513.2015.02.12

    [Objective] To improve the efficiency of finding books in the library, this article presents a smartphone-based book location and navigation system. [Context] Readers often look for books inefficiently and need a new method for fast book positioning and navigation. [Methods] A landmark system is set up and a mapping table between book call numbers and their locations is created; users search for books and their locations on a mobile device, and the system computes a navigation path with the HEAA algorithm. [Results] Readers can find a book's location in half the time needed before. [Conclusions] The system outperforms alternatives in cost, ease of deployment and convenience, and it locates and navigates with good accuracy.
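The two pieces described above, a call-number-to-landmark mapping table and a path search over landmark connections, can be sketched together. Plain BFS stands in here for the paper's HEAA routing algorithm, and the call numbers, landmarks and floor graph are invented.

```python
# Call number -> landmark lookup, then shortest-path navigation (BFS stand-in).
from collections import deque

SHELF_OF = {"TP391/123": "shelf_B2", "I247/58": "shelf_C1"}
EDGES = {"entrance": ["aisle_1"], "aisle_1": ["shelf_B2", "aisle_2"],
         "aisle_2": ["shelf_C1"], "shelf_B2": [], "shelf_C1": []}

def route(start, call_number):
    """Return the landmark sequence from `start` to the book's shelf."""
    goal = SHELF_OF[call_number]
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in EDGES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = route("entrance", "TP391/123")
```

The mapping table is what makes the approach cheap to deploy: updating it when shelves move requires no changes to the routing code.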

    Design and Implementation of Professional Digital Library Toolbar Combining Besttoolbar and JavaScript
    Liu Zexun
    2015, 31 (2): 91-96.  DOI: 10.11925/infotech.1003-3513.2015.02.13

    [Objective] This article designs a browser toolbar that meets the needs of professional digital library users. [Context] Based on the B/S-mode digital library of Chengdu Aircraft Design & Research Institute, the toolbar helps users make effective use of the library's large collection of professional resources. [Methods] Besttoolbar is used as the development tool to construct the basic structure of the toolbar, and JavaScript is used to implement further functions. [Results] An embedded IE browser toolbar is implemented with functions such as wizard-style service, zoned word search and online consultation with discipline librarians. [Conclusions] The toolbar enhances the user experience of the professional digital library, simplifies user operations and effectively improves the utilization of information resources.

    Using Responsive Web Design to Build a Library Mobile Portal: Taking Yunnan University Library as an Example
    Bi Jian, Liu Xiaoyan, Zhang Yu
    2015, 31 (2): 97-102.  DOI: 10.11925/infotech.1003-3513.2015.02.14

    [Objective] To solve the problem that library portals cannot automatically adapt to different devices such as PCs and mobile terminals. [Context] Besides PCs, access devices now include smartphones and tablets with different resolutions and operating systems, which brings great challenges to library portal development. [Methods] Responsive Web design is used to build the Yunnan University Library mobile portal on the Drupal platform, combined with HTML5, CSS3 and JavaScript. [Results] Once developed, the site automatically adapts for normal use on PCs, tablets and smartphones. It performs well on IE7+, Chrome 5+ and Firefox 3.6+, and adapts automatically to Apple, Samsung, Mi and other mobile devices. [Conclusions] The official website runs stably with low maintenance costs, is popular with users, and scales well to future devices.

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938