Data Analysis and Knowledge Discovery

M-library: From Devices to People - A Comprehensive Review of the 5th International M-libraries Conference

Yao Fei, Jiang Airong

New Technology of Library and Information Service. 2015, 31(1): 1-8. https://doi.org/10.11925/infotech.1003-3513.2015.01.01

[Objective] This paper reviews the 5th International M-libraries Conference and discusses the current state and development trends of mobile libraries. [Coverage] The 40 presentations given at the conference are the main objects of study. [Methods] Around the conference theme, "M-libraries: From Devices to People", the paper analyzes and discusses in depth the challenges and strategies involved in embracing mobile innovation in libraries, practical uses of mobile technologies in libraries, wearable devices and augmented reality, the tight coupling of mobile technologies with teaching and research, the development of mobile technologies in China, and mobile technologies that enhance information access for all. [Results] The paper stresses the importance of mobile strategies, the differences in attributes of mobile libraries and the imbalance of development, and summarizes the main research progress and existing problems. [Limitations] Because the review is based on the conference presentations, it may not cover wider practice cases and research results. [Conclusions] Libraries should face and anticipate the construction of user-centred m-libraries that provide a ubiquitous service.

Purdue University Research Repository and Scientific Data Management Services Based on PURR

Wang Hui, Michael Witt, Dou Tianfang

New Technology of Library and Information Service. 2015, 31(1): 9-16. https://doi.org/10.11925/infotech.1003-3513.2015.01.02

[Objective] To conduct a comprehensive analysis of the Purdue University Research Repository (PURR) as a case study. [Methods] This article analyzes the case from many aspects, including the background to the construction of the PURR platform, preservation policy, preservation strategies, workflow, reference standards, development platform, metadata, DataCite, backup, working mechanism and the campus scientific data management services supported by PURR. [Results] PURR draws on many standards to support its data management services, but it still needs to improve in user experience, metadata support and other aspects. [Conclusions] As a pioneering data management tool, PURR offers experience gained in development and promotion that provides a valuable reference for data practice in China.

Research on the Themes Dynamic Evolutions of the Patent Analysis Papers from WoS Database

Zhang Yun, Hua Weina, Yuan Shunbo, Su Baoduo

New Technology of Library and Information Service. 2015, 31(1): 17-23. https://doi.org/10.11925/infotech.1003-3513.2015.01.03

[Objective] This paper uses SciMAT to trace the dynamic evolution of themes in a specific research area. [Methods] Records on patent analysis from the WoS databases are analyzed, and visual maps are drawn with SciMAT and examined to explore how patent analysis research has evolved. [Results] The most important topics in patent analysis papers from the WoS databases are knowledge management, patent analysis technologies, and how patents can promote the development of enterprises and industries. New hot topics include intellectual property, knowledge transfer, and how to judge evolution trends. [Conclusions] SciMAT can effectively reveal theme evolution from different views by combining indicators that reflect quality characteristics with a variety of maps.

Hierarchical Filtering Method for Patent Term Extraction

Hou Ting, Lv Xueqiang, Li Zhuo

New Technology of Library and Information Service. 2015, 31(1): 24-30. https://doi.org/10.11925/infotech.1003-3513.2015.01.04

[Objective] Patent terms are a core part of patent documents, and extracting them is the basis of further research on patents. [Methods] A hierarchical filtering method is presented to extract terms. Based on a suffix array, the method takes repeated strings as candidate words and divides invalid strings into three classes (broken strings, redundant strings and common words) according to their features in the candidate set. Patent terms are obtained by removing these invalid strings. The authors propose an independence calculation method, a relative activity calculation method and a word segmentation error correction method to filter broken strings and redundant strings. [Results] Experiments show that the proposed method works well for Chinese patent term extraction, with an average precision of 90.54% and an average recall of 87.33%. [Limitations] The method applies only to repeated strings and cannot identify terms that occur only once. [Conclusions] The method is effective for patent term extraction.
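
The candidate-generation step described in this abstract (repeated strings found through a suffix array) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `repeated_strings`, the `min_len` parameter and the naive sort-based suffix array are all hypothetical, and the filtering of broken strings, redundant strings and common words is not shown.

```python
def repeated_strings(text, min_len=2):
    """Return {substring: count} for every substring of at least
    min_len characters that occurs two or more times in text."""
    n = len(text)
    # Naive suffix array: suffix start positions sorted by suffix text.
    # (Fine for a short example; real systems use faster constructions.)
    sa = sorted(range(n), key=lambda i: text[i:])

    def lcp(a, b):
        # Length of the longest common prefix of text[a:] and text[b:].
        k = 0
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        return k

    counts = {}
    for j in range(n - 1):
        h = lcp(sa[j], sa[j + 1])
        # Each adjacent pair sharing a prefix of length L adds one more
        # occurrence of that prefix (the first occurrence counts as 1).
        for length in range(min_len, h + 1):
            s = text[sa[j]:sa[j] + length]
            counts[s] = counts.get(s, 1) + 1
    return counts


candidates = repeated_strings("abcabcab")
```

Strings that occur only once never share a prefix with a neighbouring suffix, so they never enter the result, which matches the limitation the abstract states.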

Authorship Identification in English Translations of Chinese Classics

Qi Ruihua, Huo Yuehong, Guo Xu, Liu Caihong

New Technology of Library and Information Service. 2015, 31(1): 31-37. https://doi.org/10.11925/infotech.1003-3513.2015.01.05

[Objective] This paper analyzes the key issues of authorship identification in English translations of Chinese classics and proposes an effective way to identify authorship from incomplete data. [Methods] A stylistic feature vector space model for poetry translation texts is built from stylistic features at the vocabulary, sentence and discourse levels. To handle the imbalance of the poetry corpus, a Weighted Naïve Credal Classifier is proposed. [Results] Comparative experiments verify the effectiveness of the Weighted Naïve Credal Classifier. [Limitations] The size of the data set and the number of authors should be expanded so that the efficiency and accuracy of authorship identification on large data sets can be improved. [Conclusions] The proposed method shows good accuracy and applicability on poetry translation collections.

Semi-supervised Micro-blog Sentiment Classification Method Combining Active Learning and Co-training

Bi Qiumin, Li Ming, Zeng Zhiyong

New Technology of Library and Information Service. 2015, 31(1): 38-44. https://doi.org/10.11925/infotech.1003-3513.2015.01.06

[Objective] A novel method is proposed for micro-blog sentiment classification, where labeled data are scarce and unlabeled samples are plentiful. [Methods] Active learning is introduced into co-training: the method selects the most valuable of the low-confidence samples, labels them, adds them to the training set and retrains the classifiers. [Results] Experiments show that the classifiers perform better with this method and that accuracy improves markedly; in particular, when labeled data reach 40%, accuracy increases by about 5%. [Limitations] In the collaborative process, random feature subspace generation cannot build two strong classifiers, so the underlying hypotheses are not fulfilled. [Conclusions] Introducing active learning remedies the defects of co-training and improves the performance and accuracy of the classifiers.
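
One round of the sample-selection idea in this abstract might look like the sketch below. It is only an illustration under assumptions the abstract does not confirm: two binary classifiers that each output a positive-class probability, a hypothetical `high` confidence threshold, and "most valuable" read as "least confident". The names `split_by_confidence`, `proba_a`, `proba_b` and `n_query` are invented for the example.

```python
def split_by_confidence(proba_a, proba_b, high=0.9, n_query=1):
    """Split unlabeled samples into auto-labeled ones and ones to query.

    proba_a, proba_b: positive-class probabilities from two classifiers
    (one per co-training view), aligned by sample index.
    Returns (auto, query): auto is a list of (index, label) pairs that
    both classifiers agree on with high confidence; query is a list of
    indices to send to a human annotator (the active-learning step).
    """
    auto, uncertain = [], []
    for i, (pa, pb) in enumerate(zip(proba_a, proba_b)):
        label_a, label_b = int(pa >= 0.5), int(pb >= 0.5)
        # Joint confidence: the weaker of the two classifiers' confidences.
        conf = min(max(pa, 1 - pa), max(pb, 1 - pb))
        if label_a == label_b and conf >= high:
            auto.append((i, label_a))
        else:
            uncertain.append((conf, i))
    # Assumption: the least confident samples are the most valuable.
    uncertain.sort()
    query = [i for _, i in uncertain[:n_query]]
    return auto, query


auto, query = split_by_confidence([0.95, 0.55, 0.05, 0.7],
                                  [0.92, 0.40, 0.02, 0.8])
```

Both lists would then be added to the training set before the classifiers are retrained, with the queried samples carrying human labels rather than predicted ones.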

Collaborative Filtering Recommendation Model Based on Rough User Clustering

Wang Xiaoyun, Qian Lu, Huang Shiyou

New Technology of Library and Information Service. 2015, 31(1): 45-51. https://doi.org/10.11925/infotech.1003-3513.2015.01.07

[Objective] To improve recommendation quality, rough sets are introduced into collaborative filtering based on user clustering. [Methods] This paper proposes a collaborative filtering recommendation model based on rough user clustering. Off-line, it clusters all users with a rough K-means user clustering algorithm, which assigns each user to the upper or lower approximation of a cluster based on similarity and thus generates the user's initial neighbors. On-line, the model searches for the nearest neighbors starting from the target user's initial neighbors, predicts the user's ratings and makes recommendations. [Results] Experiments show that the proposed model decreases the Mean Absolute Error (MAE) by about 14% compared with traditional and item-based collaborative filtering, and by about 10% compared with collaborative filtering based on user clustering. [Limitations] While the paper considers the importance of the upper and lower approximations in adjusting the cluster centroids, it ignores the impact of the number of user clusters and of the threshold on the number of nearest neighbors. [Conclusions] The model can effectively improve recommendation accuracy and is feasible and practical.
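
The assignment of a user to a lower or upper approximation can be illustrated with the sketch below. It follows the general idea of rough K-means rather than the paper's exact rule: the difference threshold `epsilon`, the function name `rough_assign` and the use of precomputed user-to-centroid similarities are assumptions made for the example.

```python
def rough_assign(similarities, epsilon=0.1):
    """Assign one user to cluster approximations.

    similarities: similarity of the user to each cluster centroid.
    If no other cluster is within epsilon of the best one, the user
    belongs to the lower (and hence upper) approximation of the best
    cluster. Otherwise the user is a boundary case and belongs only to
    the upper approximations of all the close clusters.
    """
    best = max(range(len(similarities)), key=lambda k: similarities[k])
    close = [k for k, s in enumerate(similarities)
             if k != best and similarities[best] - s <= epsilon]
    if close:
        return {"lower": None, "upper": sorted([best] + close)}
    return {"lower": best, "upper": [best]}


clear_case = rough_assign([0.9, 0.3, 0.2])      # clearly in cluster 0
boundary_case = rough_assign([0.9, 0.85, 0.2])  # between clusters 0 and 1
```

Under this reading, a boundary user's initial neighbors would be drawn from every cluster in the `upper` list, which is what lets the on-line search start from a wider but still bounded neighborhood.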

Research on Discovering Micro-blog User Interests

Shi Weijie, Xu Yabin

New Technology of Library and Information Service. 2015, 31(1): 52-58. https://doi.org/10.11925/infotech.1003-3513.2015.01.08

[Objective] Discovering micro-blog users' interests plays an important role in personalized recommendation on micro-blog social networks and helps improve user satisfaction. [Methods] In addition to mining a user's own micro-blogs, the paper analyzes the micro-blogs of the accounts the user follows, as well as the social correlation among them. User interests are further uncovered by computing the similarity between their micro-blogs and their intimacy. The results from these two aspects are then combined to obtain the user's interest set. [Results] In experiments on a dataset collected from Sina Micro-blog, both precision and recall rise by more than 15% compared with the traditional method. [Limitations] The stop word list used in preprocessing is incomplete because it is not learned automatically, and user interest sets must be tagged manually to calculate precision and recall. [Conclusions] The experimental results show that the method discovers user interests more effectively and accurately than the traditional method.

Friend Recommendation in Social Network

Wu Hao, Liu Dongsu

New Technology of Library and Information Service. 2015, 31(1): 59-65. https://doi.org/10.11925/infotech.1003-3513.2015.01.09

[Objective] To use the friends and historical behavior of users in a social network to recommend potential friends to target users. [Methods] The proportion of common friends and the proportion of interaction are used as indicators of the closeness of a relationship in a social network graph. Relationships are scored according to sociality interest and interest similarity, and the Top-k users with the highest scores are recommended to the target user. [Results] Experiments show that precision and recall improve significantly compared with traditional methods. [Limitations] Abnormal interactions are not identified or handled, which may affect the accuracy of the recommendations. [Conclusions] By considering more factors, including the proportion of interaction, the improved friend recommendation method outperforms traditional single-factor methods.
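
A minimal sketch of Top-k scoring from the two closeness indicators named in this abstract is given below. The exact formulas are assumptions: the proportion of common friends is taken as a Jaccard ratio, the proportion of interaction as the candidate's share of the target's interactions, and the two are mixed with a hypothetical weight `alpha`. The paper's sociality interest and interest similarity scores are not reproduced here.

```python
def recommend_friends(target, friends, interactions, k=2, alpha=0.5):
    """Rank non-friends of target and return the top k.

    friends: dict mapping each user to the set of that user's friends.
    interactions: dict mapping (source, other) pairs to interaction counts.
    alpha: weight of the common-friend ratio versus the interaction ratio.
    """
    total = sum(c for (u, _), c in interactions.items() if u == target)
    scores = {}
    for user in friends:
        if user == target or user in friends[target]:
            continue  # only recommend users who are not already friends
        union = friends[target] | friends[user]
        common = friends[target] & friends[user]
        common_ratio = len(common) / len(union) if union else 0.0
        interaction_ratio = (interactions.get((target, user), 0) / total
                             if total else 0.0)
        scores[user] = alpha * common_ratio + (1 - alpha) * interaction_ratio
    # Highest score first; ties broken by user name for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


friends = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d", "e"},
    "d": {"b", "c"},
    "e": {"c"},
}
interactions = {("a", "d"): 2, ("a", "e"): 2}
top = recommend_friends("a", friends, interactions, k=2)
```

In this toy graph, "d" shares both of "a"'s friends while "e" shares only one, so "d" ranks first even though the two candidates interact with "a" equally often.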

Study on Automatic Classification of Patents Oriented to TRIZ

Hu Zhengyin, Fang Shu, Wen Yi, Zhang Xian, Liang Tian

New Technology of Library and Information Service. 2015, 31(1): 66-74. https://doi.org/10.11925/infotech.1003-3513.2015.01.10

[Objective] This paper proposes an approach to automatically classifying patents for TRIZ applications based on a personalized classification system. [Methods] A personalized classification system is constructed at the micro, macro and meso levels using a topic model. An appropriate feature and classifier are then chosen to classify patents preliminarily. The classifier is optimized by smoothing unbalanced data and reducing feature dimensions. [Results] The approach constructs a personalized classification semi-automatically and classifies patents for TRIZ applications automatically. On medium-sized data, it classifies patents with an F-measure of 90.2%. [Limitations] The approach does not work on small data sets and has not been verified on large ones. [Conclusions] The approach can classify patents for TRIZ applications on medium-sized data sets.

Application of DROID About Format Identification in Long-term Preservation System

Wang Yuju, Wu Zhenxin, Kong Beibei, Fu Honghu

New Technology of Library and Information Service. 2015, 31(1): 75-81. https://doi.org/10.11925/infotech.1003-3513.2015.01.11

[Objective] To integrate an open source file-format identification tool into the Digital Preservation System (DPS) to obtain format information for complex objects. [Context] To meet practical requirements, the DPS needs to choose appropriate existing open source tools for application integration. [Methods] Several open source file-format identification tools are analyzed and compared, and DROID is chosen for the DPS according to the practical requirements. To meet the efficiency requirements of the DPS, batch format identification of complex objects with DROID is proposed. [Results] A batch format processing module integrated with DROID completes format identification and technical metadata extraction for complex objects. [Conclusions] DROID is an excellent open source tool whose automatic batch processing can meet the requirements of the DPS.

Research and Implementation of Textual Clustering in Distributed Environment

Zhao Huaming

New Technology of Library and Information Service. 2015, 31(1): 82-88. https://doi.org/10.11925/infotech.1003-3513.2015.01.12

[Objective] To implement textual clustering and classification in a distributed environment with open-source tools. [Methods] Exploiting the convergence of words in large volumes of text, this paper classifies texts based on word-clustering: texts are preprocessed with an open-source tokenizer, clusters are computed with Mahout, and a test text is classified by computing its similarity to each word cluster. [Results] Textual clustering based on word-clustering in a distributed environment effectively removes the bottleneck of word-clustering for massive texts. Word-clustering results are ideal when the training set contains more than 100 texts and the iterative convergence threshold is 0.01. [Limitations] The data are limited to the news domain; word-clustering in other domains needs further testing, optimization and adjustment. [Conclusions] This study describes the build process and key steps of textual clustering and classification in a distributed environment to help readers understand them in depth.

A Duplicate Removal Algorithm of Cross-database Search Based on Sci-tech Novelty Retrieval

Hao Hui

New Technology of Library and Information Service. 2015, 31(1): 89-95. https://doi.org/10.11925/infotech.1003-3513.2015.01.13

[Objective] To remove redundant data from cross-database searches in sci-tech novelty retrieval and improve retrieval efficiency. [Methods] Paper titles, serial titles, publication dates and first authors are taken from search records in different databases, and character strings are built from them with a modified comparison algorithm related to I-Match to serve as the evidence for duplicate removal. [Results] The duplicate removal algorithm can improve retrieval efficiency by analyzing and deduplicating the retrieval results from different databases. The experiment suggests that the precision of the algorithm is high, while its recall could be improved by modifying database records. [Limitations] The effect depends on the four fields extracted from database search records, and a different feature extraction model has to be customized for each database because search results differ between databases. [Conclusions] The experiment suggests that the algorithm achieves decent precision and processing efficiency in duplicate removal, which meets the requirements of sci-tech novelty retrieval.
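
The idea of building one comparison string per record from four fields can be sketched as below. This is not the paper's modified I-Match algorithm: it is a plain normalize-and-hash fingerprint, and the field names (`title`, `journal`, `date`, `first_author`), the normalization rule and the function names are assumptions chosen for illustration.

```python
import hashlib
import re

FIELDS = ("title", "journal", "date", "first_author")  # hypothetical names


def fingerprint(record):
    """Hash the four fields after lower-casing and stripping every
    character that is not a letter or digit, so that differences in
    case, spacing and punctuation do not prevent a match."""
    parts = [re.sub(r"[\W_]+", "", record.get(f, "").lower()) for f in FIELDS]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record seen for each fingerprint."""
    seen, unique = set(), []
    for record in records:
        key = fingerprint(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"title": "Deep Learning for Patents", "journal": "J. Info Sci",
     "date": "2014-05", "first_author": "Li Ming"},
    {"title": "deep learning for patents.", "journal": "J Info Sci",
     "date": "2014-05", "first_author": "LI Ming"},
    {"title": "Another Paper", "journal": "J. Info Sci",
     "date": "2014-06", "first_author": "Li Ming"},
]
unique = deduplicate(records)
```

An exact-match fingerprint like this favors precision over recall: records whose fields differ by more than formatting (for example an abbreviated journal name) are kept as distinct, which is consistent with the abstract's remark that recall depends on how the database records are written.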

Construction and Research of Library WeChat Public Platform

Luo Tao

New Technology of Library and Information Service. 2015, 31(1): 96-100. https://doi.org/10.11925/infotech.1003-3513.2015.01.14

[Objective] To build a WeChat public platform through which readers can easily access library information and services, increasing the attention the library receives. [Context] As the public influence of the WeChat platform expands, using it for mobile library services is becoming a trend. [Methods] The local server uses the message interface of the WeChat public platform to receive messages from readers, validates and classifies each message, and returns the results to the reader. [Results] By sending messages in a specific format to the WeChat public platform through WeChat, readers can access library information, their personal borrowing records, FAQs, and catalogue and literature information. [Conclusions] As a new mode of mobile library service, this application attracts readers' attention and strengthens communication between readers and the library.

25 January 2015, Volume 31 Issue 1