Data Analysis and Knowledge Discovery

An Experimental Study on Collaborative Information Seeking Behavior in Community Environment

Wu Dan, Xiang Xue

New Technology of Library and Information Service. 2014, 30(12): 1-9. https://doi.org/10.11925/infotech.1003-3513.2014.12.01

[Objective] This paper explores how community type and task difficulty influence collaborative information seeking behavior. [Methods] Data were collected through questionnaires, Web log analysis and semi-structured interviews, and analyzed with statistical analysis and content analysis. [Results] Regarding community type, compared with non-community groups, community groups show more Recommend behaviors, do not depend on the retrieval system when inputting queries, use more diverse collaborative approaches, and have an advantage in awareness of task knowledge. However, the profession community shows no obvious differences from the interest community in any aspect. Task difficulty influences only the methods of issuing queries and the confidence in completing the tasks. [Limitations] This paper studies only real communities; future work should investigate virtual communities and more types of communities. [Conclusions] Community type and task difficulty influence different aspects of collaborative information seeking behavior to different degrees, and the community factor has more influence than the task factor. The differences between community and non-community groups are obvious, whereas the differences between the profession community and the interest community are not significant.

Analysis for the Search Behavior of Web Users

Chen Yong, Li Honglian, Lv Xueqiang

New Technology of Library and Information Service. 2014, 30(12): 10-17. https://doi.org/10.11925/infotech.1003-3513.2014.12.02

[Objective] To collect statistics on and analyze Web users' search behavior data, providing a basis for further improving search engine performance. [Methods] The characteristics of users' queries and of the results that the search engine returns for those queries are analyzed. The concept of entropy is introduced to quantify the interaction between users and search engines. [Results] Across all user records, queries without spaces account for 93.66%, 83.59% of users use longer queries, users' click certainty reaches 64.26%, and 71.26% of users view the first three returned results. [Limitations] The size of the users' query data may affect the analysis results to a certain extent. [Conclusions] The results show that the reliability of users' clicks is closely related to their certainty, and that search engines have some defects in handling long query terms.
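
The abstract does not state the entropy formula, so the following is only a minimal sketch, assuming Shannon entropy over the distribution of results clicked for a single query. The function name `click_entropy` and the sample data are hypothetical. Under this assumption, low entropy means clicks concentrate on a few results (high certainty), while high entropy means clicks are spread across many results.

```python
import math
from collections import Counter


def click_entropy(clicked_results):
    """Shannon entropy (in bits) of the clicks recorded for one query.

    `clicked_results` is a list of result identifiers, one per click.
    This is an assumed formulation, not the paper's stated definition.
    """
    counts = Counter(clicked_results)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        # p * log2(1/p) is the contribution of one clicked result
        entropy += p * math.log2(1 / p)
    return entropy


# Hypothetical usage: every click lands on the same result vs. an even split
focused = click_entropy(["url1", "url1", "url1"])
split = click_entropy(["url1", "url2"])
```

A query whose clicks all land on one result scores zero, which matches the intuition of a fully "certain" click pattern.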

Comparisons of Common Altmetrics Tools

Wang Rui, Hu Wenjing, Guo Wei

New Technology of Library and Information Service. 2014, 30(12): 18-26. https://doi.org/10.11925/infotech.1003-3513.2014.12.03

[Objective] To compare four popular Altmetrics tools and discuss their advantages and weaknesses. [Context] Altmetrics is an emerging metric for evaluating the impact of academic publications, based on interactions on social networks. Altmetrics tools can track large-scale activities around academic products in online tools and social networks. [Methods] This paper analyzes the features of Altmetrics tools, including evaluation strategy, data sources, indexing, data collection and publication. [Results] The four existing Altmetrics tools target different user groups and differ in the evaluation objects they support, their choice of source data and their indexing. Users may select the appropriate tool based on their needs. [Conclusions] This study helps researchers understand existing Altmetrics tools and provides some guidance for using Altmetrics.

Research on Ontology-based Cloud Services Semantic Retrieval System

Tang Shouli, Xu Baoxiang

New Technology of Library and Information Service. 2014, 30(12): 27-35. https://doi.org/10.11925/infotech.1003-3513.2014.12.04

[Objective] As the number of available cloud services increases exponentially, the problem of cloud service discovery and selection arises. [Methods] Semantic retrieval technology, which draws on information retrieval, semantic analysis and information fusion, can improve retrieval efficiency. Combining it with Ontology technology can ensure the accuracy and consistency of the search process and realize cloud service discovery and selection. [Results] This paper semantically represents and annotates cloud services. Based on the extracted semantic annotation terms, it uses vector values to create a semantic index. A semantic search engine then calculates the vector space value between the query sentence and the index data to obtain document similarity. [Limitations] Some algorithms involved in semantic retrieval systems are still under development. This paper studies the semantic retrieval system as a whole; each module applies only basic algorithms, and algorithm improvement is not addressed. [Conclusions] Empirical research shows that Ontology technology applied in a semantic retrieval system achieves good results. It is especially suitable for retrieving unstructured information when changes between the Ontology and the semantics need to be kept consistent.
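
The abstract does not specify how the vector space value is computed. As a rough illustration only, the sketch below assumes raw term-frequency vectors and cosine similarity, a common choice in vector space retrieval. The names `term_vector`, `cosine_similarity` and `rank_services`, and the sample annotations, are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * vec_b.get(term, 0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_services(query, annotated_services):
    """Rank services by similarity between the query and their annotation terms."""
    query_vec = term_vector(query)
    scored = [
        (name, cosine_similarity(query_vec, term_vector(terms)))
        for name, terms in annotated_services.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical semantic annotations for two cloud services
services = {
    "service_a": "cloud storage backup",
    "service_b": "video streaming",
}
ranking = rank_services("cloud storage", services)
```

In an Ontology-based system the annotation terms would come from Ontology concepts rather than raw text, but the ranking step would look similar.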

Automatic Acquisition of Domain Parallel Corpora from Internet

Shao Jian, Zhang Chengzhi

New Technology of Library and Information Service. 2014, 30(12): 36-43. https://doi.org/10.11925/infotech.1003-3513.2014.12.05

[Objective] To automatically obtain domain parallel corpora through bilingual corpus classification and sentence alignment. [Methods] Bilingual corpora are classified using text classification technology, and a sentence alignment tool then aligns the classified bilingual corpora based on bilingual sentence length information and a bilingual dictionary. Manually aligned bilingual corpora are used to calculate the length parameters. [Results] The method aligns 95.45% of sentences correctly. The length mean is 1.7777 and the variance is 1.2640. [Limitations] Because the degree of initial alignment of the bilingual corpus is already satisfactory, the alignment results are not universally representative. [Conclusions] The results show that the proposed method is effective and can acquire high-quality domain parallel corpora.
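
The abstract does not name the length model. As a hedged sketch, the code below assumes the reported mean (1.7777) and variance (1.2640) play the roles of the expected length ratio and its variance in a Gale-Church style length statistic. Which language is the "source" and the function name `length_delta` are assumptions made for illustration.

```python
import math

# Values reported in the abstract; their exact role in the paper's
# model is assumed here, not confirmed.
LENGTH_MEAN = 1.7777
LENGTH_VARIANCE = 1.2640


def length_delta(source_len, target_len,
                 mean=LENGTH_MEAN, variance=LENGTH_VARIANCE):
    """Gale-Church style normalized length difference.

    Values near zero mean the target length is close to what the
    length ratio predicts, so the sentence pair is a plausible match.
    """
    if source_len <= 0:
        raise ValueError("source_len must be positive")
    return (target_len - source_len * mean) / math.sqrt(source_len * variance)


# Hypothetical lengths: a well-matched pair and a poorly matched pair
good_pair = length_delta(10, 18)
bad_pair = length_delta(10, 40)
```

A full aligner would turn this statistic into a match cost and combine it with bilingual dictionary evidence, as the abstract describes.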

Using Dependency Parsing Pattern to Extract Product Feature Tags

Nie Hui, Du Jiazhong

New Technology of Library and Information Service. 2014, 30(12): 44-50. https://doi.org/10.11925/infotech.1003-3513.2014.12.06

[Objective] This paper investigates a method for recognizing associations between product features and the relevant opinions in order to extract feature tags and summarize user-generated online reviews, which helps Web users access useful information effectively, especially since online information varies greatly in quality. [Methods] Dependency parsing is used to obtain extraction templates, and a template library is constructed through classification, filtering and generalization. Using the templates and the corresponding external lexicons, feature tags are extracted and screened according to filtering rules. [Results] The experimental results indicate that the method outperforms similar methods based only on template filtering or generalization. By adjusting the corresponding parameters, the F-measure reaches 56.5% and the accuracy can reach 65%. [Limitations] No filtering strategy for improving the quality of the review data is applied in this research. Building the feature lexicon automatically and adding more syntactic relations should be considered to extend the template library and further improve extraction accuracy. [Conclusions] Better performance can be achieved by finding the most appropriate values for template-specific parameters, such as template length, or by adopting an effective filtering window strategy to detect noise templates.

The Comparative Analysis of Natural Language Processing Research at Home and Abroad Based on Knowledge Mapping

Qiu Junping, Fang Guoping

New Technology of Library and Information Service. 2014, 30(12): 51-61. https://doi.org/10.11925/infotech.1003-3513.2014.12.07

[Objective] This paper makes a multi-angle comparative analysis of the development of natural language processing at home and abroad. [Methods] The literature comes from CNKI (5,582), Web of Science (10,348) and major international conferences on natural language processing (5,573). Word frequency statistics and co-occurrence analysis are the main research methods, and knowledge maps are used to present the statistical results. [Results] The results show that natural language processing research at home and abroad is highly similar. Research focuses on information extraction, artificial intelligence, information retrieval, machine translation, machine learning and so on. [Limitations] The limitations of this paper include the choice of subject terms and errors resulting from subjectivity in data cleaning. [Conclusions] Based on the results, several recommendations are made for the development of natural language processing.
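
Word frequency statistics and co-occurrence analysis can be sketched roughly as follows, assuming each record is represented by its list of keywords. The function name `keyword_statistics` and the sample records are hypothetical; the paper's actual tooling is not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def keyword_statistics(records):
    """Count keyword frequency and pairwise co-occurrence.

    `records` is an iterable of keyword lists, one list per paper.
    A pair co-occurs once for every paper that lists both keywords.
    """
    frequency = Counter()
    cooccurrence = Counter()
    for keywords in records:
        # Deduplicate and sort so each pair has one canonical order
        unique = sorted(set(keywords))
        frequency.update(unique)
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence


# Hypothetical keyword lists for three papers
records = [
    ["machine translation", "corpus"],
    ["corpus", "machine translation", "parsing"],
    ["parsing"],
]
frequency, cooccurrence = keyword_statistics(records)
```

The co-occurrence counts form the edge weights of the network that a knowledge map visualizes.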

Research and Application of Science Intelligence Analysis Integrated Services Architecture Using OSGi

Qian Li, Zhang Xiaolin, Li Chunwang, Wang Xiaomei, Yang Liying, Chen Ting, Zhang Zhixiong

New Technology of Library and Information Service. 2014, 30(12): 62-70. https://doi.org/10.11925/infotech.1003-3513.2014.12.08

[Objective] To effectively integrate multiple information analysis tools and processes. [Context] Complex information analysis often uses multiple, distributed and heterogeneous analytic tools and data resources, so reliable and flexible mechanisms are needed for their seamless integration. [Methods] An analysis service framework based on OSGi (Open Service Gateway initiative) technology is designed with a plug-in service model, a plug-in service configuration technique and a plug-in service integration model. [Results] An international Research & Development (R&D) monitoring service platform is built on this model and can manage multiple analytic tools modularly and dynamically. [Conclusions] The OSGi-based integration framework for information analytic services can flexibly configure information analysis services and supports wrapping and integrating third-party analytic tools and algorithms.

The Automatic Identification of Chinese Names in Query Logs

Zeng Zhen, Lv Xueqiang, Li Zhuo

New Technology of Library and Information Service. 2014, 30(12): 71-77. https://doi.org/10.11925/infotech.1003-3513.2014.12.09

[Objective] Many person names appear in query logs, and recognizing them can improve search engine performance. [Methods] This paper presents a method for identifying names in query logs. Based on the internal structural characteristics of names and their context information, seven features are extracted, a suitable feature template is chosen, and a conditional random field (CRF) model is applied for preliminary identification of person names. For query strings in which the CRF cannot mark the names, a Bayesian conditional probability formula is designed to select more names. [Results] In experiments on Sogou Web query logs, the precision of name recognition reaches 95%, and the F-measure of the machine learning method is 91%. [Limitations] A certain amount of manually annotated training corpus is required. [Conclusions] The results validate the effectiveness of this name recognition method and show that it has a positive impact on name recognition.

Evolution Model of Microblog Public Opinion Considering the Influence of Next-nearest Neighbors

Yang Liu, Zhu Hengmin, Ma Jing

New Technology of Library and Information Service. 2014, 30(12): 78-84. https://doi.org/10.11925/infotech.1003-3513.2014.12.10

[Objective] To study an evolution model of microblog public opinion that considers the influence of next-nearest neighbors. [Methods] A directed BA scale-free network is used to simulate the network formed by users' follow relationships in microblog, and an iterative rule is designed in which the nearest and the next-nearest neighbors jointly influence the evolution of microblog views. The view evolution of microblog public opinion is then simulated with and without the influence of next-nearest neighbors, and under different review probabilities and different forwarding probabilities. [Results] When the influence of next-nearest neighbors on view evolution is considered, microblog users take less time to reach agreement. The experiments show that review behavior increases the relaxation time of opinion evolution, whereas forwarding behavior shortens it. [Limitations] The model highlights the effect of next-nearest neighbors on the evolution of microblog public opinion and does not account for other factors such as the social environment. [Conclusions] The evolution model that considers the influence of next-nearest neighbors can characterize the evolution of microblog public opinion more realistically. The simulation results show that microblog aggregates public opinion within a short time period and easily creates public pressure.
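
The abstract does not give the iterative rule itself. The sketch below is a purely illustrative assumption: each user's new view is a weighted mix of the average view of the users they follow (nearest neighbors) and the average view of users two hops away (next-nearest neighbors). The weight `alpha`, the function names and the tiny network are hypothetical, and the sketch neither generates a BA scale-free network nor models review or forwarding probabilities.

```python
def next_nearest(follows, user):
    """Users exactly two follow-hops away, excluding direct followees and the user."""
    direct = set(follows.get(user, ()))
    second = set()
    for neighbor in direct:
        second.update(follows.get(neighbor, ()))
    return second - direct - {user}


def step(follows, opinions, alpha=0.5):
    """One synchronous update of all opinions (hypothetical rule).

    alpha weights nearest neighbors; (1 - alpha) weights next-nearest ones.
    Users who follow nobody keep their current opinion.
    """
    updated = {}
    for user, own in opinions.items():
        direct = list(follows.get(user, ()))
        if not direct:
            updated[user] = own
            continue
        near = sum(opinions[n] for n in direct) / len(direct)
        far_nodes = next_nearest(follows, user)
        if far_nodes:
            far = sum(opinions[n] for n in far_nodes) / len(far_nodes)
            updated[user] = alpha * near + (1 - alpha) * far
        else:
            updated[user] = near
    return updated


# Hypothetical chain: a follows b, b follows c
follows = {"a": ["b"], "b": ["c"], "c": []}
opinions = {"a": 0.0, "b": 1.0, "c": -1.0}
after_one_step = step(follows, opinions)
```

Setting `alpha=1.0` switches off the next-nearest influence, which gives a baseline for the with/without comparison the abstract mentions.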

Situation of Library Report Service by WeChat

Li Bangqun

New Technology of Library and Information Service. 2014, 30(12): 85-91. https://doi.org/10.11925/infotech.1003-3513.2014.12.11

[Objective] To report the library's status to readers and librarians in real time through WeChat. [Context] It is difficult for readers to know the library's status before entering the library, and librarians' resource maintenance tasks are increasingly onerous. WeChat receives a great deal of attention from readers and has become an important channel for reporting library status. [Methods] WeChat platform interfaces and simulated HTTP requests are used to implement reader identity authentication and scheduled message sending, connect the library's various service systems and platforms, and send the information to readers and administrators. [Results] The WeChat public account developed in this paper provides readers with the library's status, detail pages of data statistics and the availability status of network resources. [Conclusions] The WeChat-based library status report service is convenient for users and increases the efficiency and quality of library management and service.

Research on Correspondence Between Keyword and Chinese Library Classification Based on Latent Semantic Analysis

Xia Dong, Xiao Xiaodan, Li Guolei, Chen Xianlai

New Technology of Library and Information Service. 2014, 30(12): 92-96. https://doi.org/10.11925/infotech.1003-3513.2014.12.12

[Objective] This paper explores the relationship between keywords and the Chinese Library Classification (CLC) to lay a foundation for a comparison system. [Context] The aim is to help authors unfamiliar with CLC with indexing and to help users perform more precise retrieval by combining keywords with related CLC. [Methods] A constructed Keywords-CLC matrix is decomposed with SVD (Singular Value Decomposition) to obtain three-dimensional semantic coordinates for keywords and CLC. Then, based on the vector representation of a query and the CLC coordinates, the correspondence is calculated and sorted in descending order. [Results] Compared with one keyword or three or more keywords, two keywords achieve better correspondence accuracy with CLC. Among 100 phrases containing two keywords, 91 can determine at least one associated CLC, an accuracy rate of 91%. [Conclusions] The correspondence between two-keyword phrases and a single CLC is positive and lays a good foundation for constructing the comparison system.
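
The abstract does not spell out the formulas. Under the standard latent semantic analysis formulation, and assuming a matrix $A$ with keywords as rows and CLC classes as columns (the orientation is an assumption), the steps can be written as a rank-3 truncated SVD, a fold-in of the query vector $q$ over keywords, and a cosine comparison against each CLC coordinate $v_j$ (the $j$-th row of $V_k$):

```latex
\begin{align}
A &\approx A_k = U_k \Sigma_k V_k^{\mathsf{T}}, \qquad k = 3 \\
\hat{q} &= \Sigma_k^{-1} U_k^{\mathsf{T}} q \\
\operatorname{sim}(q, c_j) &= \frac{\hat{q} \cdot v_j}{\lVert \hat{q} \rVert \, \lVert v_j \rVert}
\end{align}
```

Sorting the CLC classes $c_j$ by $\operatorname{sim}(q, c_j)$ in descending order gives the ranked correspondence. Whether the paper uses cosine similarity or another measure is not stated in the abstract.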

Research on WeChat and Library Business and Application System Integration

Li Dan, Li Juan

New Technology of Library and Information Service. 2014, 30(12): 97-104. https://doi.org/10.11925/infotech.1003-3513.2014.12.13

[Objective] To address the problems of integrating the WeChat public platform with library business systems and the mobile library. [Context] Because a single library business service system has a limited audience and lacks openness, traditional information push services cannot satisfy the demands of all readers. [Methods] The WeChat public platform API is combined with Java programs to achieve seamless data integration among WeChat, library business systems and the mobile library. [Results] The WeChat service platform successfully implements graphic push, news browsing, integrated search, reader identity binding, reader information query and real-time reference. [Conclusions] The integrated application enriches the resources and functions of the WeChat service platform and also increases reader visits.

25 December 2014, Volume 30 Issue 12
