[Objective] This paper analyzes the policies governing Institutional Repositories and Research Data Repositories, and explores the related rights and obligations, in order to recommend data repositories to librarians. [Methods] Policy elements are summarized and refined through a literature review and Internet research. [Results] The policy elements comprise the rights and obligations of managers (establishing an audit mechanism, setting data identification standards, issuing regulations on dissemination and use), the rights of submitters (free storage, updating metadata, setting embargoes) and their obligations (ensuring reliable data sources, abiding by the data repository's policy, avoiding intellectual property disputes), and the rights and obligations of users (use free of charge, following the reference rules). [Limitations] Policies of Special Research Data Repositories are not studied; future work can address them to establish a complete policy framework. [Conclusions] A complete Research Data Repository policy can balance the interests of all parties and thereby promote research data sharing.
[Objective] This paper addresses problems in traditional collaborative filtering recommendation algorithms, such as data sparsity and the equal treatment of user interests from different time periods. [Methods] A collaborative filtering algorithm based on fuzzy clustering of user interests is proposed. In the algorithm, the user interest model consists of a stable interest and a current interest. Users are clustered by fuzzy clustering according to their stable interests, which yields the nearest neighbours and an initial recommendation list. The final recommendation list is generated by sorting the items of the initial list by their similarity to the user's current interest. [Results] On the MovieLens dataset, the Mean Absolute Error (MAE) of the proposed method is nearly 10% lower than that of the traditional method. [Limitations] All item categories are included in the stable interest model without special treatment such as merging or deletion. [Conclusions] The experimental results indicate that the proposed approach achieves higher recommendation accuracy than the traditional recommendation algorithm.
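The evaluation metric named in the abstract above can be illustrated with a minimal sketch. The function names `mean_absolute_error` and `relative_reduction` and the sample ratings are hypothetical, introduced only for illustration; the abstract does not give the authors' evaluation code or their exact MAE values.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def relative_reduction(baseline_mae, proposed_mae):
    """Fraction by which the proposed MAE is lower than the baseline MAE."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - proposed_mae) / baseline_mae


# Hypothetical ratings on a 1-5 scale, not data from the paper.
predicted = [4.0, 3.0, 5.0]
actual = [5.0, 3.0, 3.0]
print(mean_absolute_error(predicted, actual))
```

A lower MAE means predicted ratings are closer to the true ones, so a "nearly 10% reduction" corresponds to `relative_reduction` returning a value close to 0.1.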
[Objective] This paper proposes a multi-strategy Word Sense Disambiguation (WSD) method based on Wikipedia that makes full use of the latent knowledge in Wikipedia. [Methods] Three indicators are designed: category commonness, content relatedness, and word sense importance. They are combined through an entropy-based dynamic linear fusion, together with re-disambiguation, to choose the best sense of an ambiguous term in its context. [Results] The experiments show an average precision of 74.82%, which supports the feasibility and effectiveness of the method. [Limitations] The method mainly targets English WSD with fine-grained candidate senses and does not readily generalize to other languages. [Conclusions] By drawing on Wikipedia, the method provides more semantic knowledge and background information, which improves the precision of disambiguation tasks.
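One plausible reading of "entropy-based dynamic linear fusion" is the common entropy weight method, sketched below. This is an assumption: the abstract does not specify the fusion formula, and the function names `entropy_weights` and `fuse` and the indicator keys are hypothetical. An indicator whose scores are spread evenly over the candidate senses has high entropy and receives a low weight, because it discriminates poorly between senses.

```python
import math


def entropy_weights(scores):
    """scores: dict mapping indicator name -> non-negative scores, one per candidate sense.

    Returns one weight per indicator (summing to 1), recomputed for each
    ambiguous term, which is what makes the fusion "dynamic" in this sketch.
    """
    divergences = {}
    for name, values in scores.items():
        total = sum(values)
        if total == 0 or len(values) < 2:
            divergences[name] = 0.0  # indicator carries no information here
            continue
        probs = [v / total for v in values]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        divergences[name] = 1.0 - entropy / math.log(len(values))
    norm = sum(divergences.values())
    if norm == 0:
        # Fall back to equal weights when no indicator discriminates.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: d / norm for name, d in divergences.items()}


def fuse(scores):
    """Weighted linear combination of the indicator scores for each sense."""
    weights = entropy_weights(scores)
    n_senses = len(next(iter(scores.values())))
    return [sum(weights[k] * scores[k][i] for k in scores) for i in range(n_senses)]


# Hypothetical indicator scores for three candidate senses.
scores = {
    "category_commonness": [0.6, 0.3, 0.1],
    "content_relatedness": [0.4, 0.4, 0.2],
    "sense_importance": [0.9, 0.05, 0.05],
}
fused = fuse(scores)
best_sense = max(range(len(fused)), key=fused.__getitem__)
```

The re-disambiguation step mentioned in the abstract is not modelled here.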
[Objective] This paper aims to calculate feature weights more accurately in order to improve the accuracy of text similarity calculation. [Methods] Semantic associations among features are used to construct text complex networks and select features. An improved feature weighting method is proposed, which defines a category correlation coefficient and combines it with the feature selection results; the method is evaluated in a Chinese text classification experiment. [Results] The experimental results show that the proposed Chinese text classification method outperforms the TFIDF algorithm. [Limitations] The parameters of the feature selection evaluation function must be set in advance. [Conclusions] Compared with the traditional TFIDF algorithm, the new algorithm represents feature weights more accurately.
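For reference, the TFIDF baseline that the abstract above compares against can be sketched as follows. This shows only one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency), chosen here as an assumption; the abstract does not state which variant was used, and the improved weighting with the category correlation coefficient is not reproduced. The function name `tfidf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


# Hypothetical pre-segmented documents.
docs = [["price", "phone"], ["price", "book"]]
weights = tfidf(docs)
```

A term that occurs in every document, such as "price" here, gets weight 0 under this variant, which illustrates why TFIDF ignores semantic associations and category information that the proposed method tries to capture.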
[Objective] This paper extracts hot topics from e-commerce microblogs in social marketing. [Methods] An integrated model, EM-LDA (E-commerce Microblog-LDA), is proposed to extract hot topics from e-commerce microblogs. It contains two submodels, ET-LDA and IT-LDA. The former extracts hot topics from e-commerce microblogs with hashtags, and the latter from those without hashtags. [Results] After the number of topics is determined, both the standard LDA model and the EM-LDA model are used to extract hot topics from e-commerce microblog text. Compared with the standard LDA model, EM-LDA extracts hot topics more accurately and effectively, and also improves interpretability. [Limitations] The ET-LDA model does not consider the relationships between microblog contacts, that is, user features are neglected. The IT-LDA model does not handle e-commerce microblogs that are both conversations and retweets. [Conclusions] By exploiting the special features of e-commerce microblog text, the EM-LDA model improves on the standard LDA model and raises the accuracy of hot topic extraction from e-commerce microblogs.
[Objective] To address collusive sales inflation fraud in e-commerce promotion, this paper presents a method for detecting collusive product sales fraud based on users' information search behavior. [Methods] First, to describe users' information search behavior in online shopping, a keyword-based model of that behavior and a method for calculating the similarity between users' search behaviors are proposed. Second, a mining algorithm for suspicious sales inflation fraud is proposed; it is based on hierarchical clustering and relies on the similarity between users' information search behaviors. Finally, a statistical analysis method is proposed to identify inflated sales in the sales records of illegal vendors. [Results] The experimental results show that the recall and precision of the method are 88.6% and 90.1% respectively on the improved data set. [Limitations] The threshold for judging whether a behavior is “scalping” is predetermined and fixed. [Conclusions] The method is effective for detecting collusive sales inflation fraud based on users' information search behavior templates.
[Objective] This paper aims to discover the topics of multimedia content, such as images or videos, in microblogs. [Context] The text of a multimedia microblog is usually brief, and its topic is generally contained in the multimedia content such as images or videos, so traditional text mining methods may not apply. [Methods] The text space of the multimedia microblog is extended with hot comments. The LDA topic model is then used to infer the classification and mine the topic features. Finally, the topic features of the multimedia microblog are expressed in the form ‘topic tag - feature words’. [Results] The training set consists of 99 823 Sina microblogs collected with a crawler tool set, and the test set consists of 151 hot multimedia microblogs with all their comments. The results show that the classification directory built in this paper is complete, the topic tag inference accuracy is 88.6%, and the feature word mining accuracy is 76.0%. [Conclusions] The experimental results show that the new algorithm can effectively discover the topic features of multimedia microblogs.
[Objective] This paper studies the inner rules of public opinion evolution by building a dissemination model of Internet public opinion that includes a discussion mechanism. [Methods] A new dissemination model with a discussion mechanism, named SIaIbR, is presented, and the impact of media on public opinion is expressed through the concepts of Enhanced Degree and Divergence. The equilibrium point and stability of the model are proved from the dynamics equations. [Results] The simulation results show that Divergence has a greater impact than Enhanced Degree on the dissemination of Internet public opinion. When Divergence is lower than 0.5, the government has a great impact on calming public opinion. [Limitations] The model is not validated against real dissemination cases. [Conclusions] The results can help the government take measures when facing Internet public opinion propagation, and provide a reference for further research on Internet public opinion.
[Objective] From the perspective of regional differences, this paper analyzes the similarity between the organizational structure of website categories and users' subjective cognition, in order to support website personalization. [Methods] Combining mental model theory with Web log mining, this paper uses website log data to capture users' cognition, and uses multidimensional scaling to analyze the mental models of the expected website category hierarchy held by users from different regions. [Results] A case study of a Chinese e-commerce website verifies that the mental models of users from different regions differ. [Limitations] The test data set is relatively small, and the new method needs to be verified with more data. [Conclusions] Users' mental models of the expected website category hierarchy differ across regions. A personalized category hierarchy can be set up for users of each region, which better matches their usage habits and improves customer satisfaction.
[Objective] This research investigates whether user readership data in Mendeley is reliable and useful for evaluating scientific literature and whether it can reveal high-quality articles, in order to validate Altmetrics indicators in scientific evaluation. [Methods] A number of articles are selected; their citations in Web of Science (WoS) and Google Scholar (GS) and their user readership data in Mendeley are collected; and statistical and correlation analyses are performed. [Results] Mendeley has accumulated much more user data than before. Articles' user readership data are strongly related to their citations in WoS and GS. However, the relationship between user counts and citations is relatively weaker for the articles with the highest citations in WoS. [Limitations] The articles come from a small number of journals in a specific field, which may limit the representativeness and comprehensiveness of the sample. [Conclusions] User readership data could be a useful supplement to existing scientific evaluation indicators.
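A correlation analysis of the kind described in the abstract above might look like the following sketch. The use of Spearman rank correlation is an assumption (the abstract says only "correlational analyses"); it is a common choice for skewed count data such as citations and readership. The function names and the sample counts are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average position of the tied group, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-article counts.
mendeley_readers = [12, 40, 7, 95, 33]
wos_citations = [5, 22, 3, 60, 30]
rho = spearman(mendeley_readers, wos_citations)
```

Computing the coefficient separately for the most highly cited articles and for the full sample would be one way to examine the weaker relationship reported for top-cited papers.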
[Objective] This paper collects and visualizes the sentiment information in bullet-screen comments to extract the emotional features and trends of online videos. [Context] The visualized information of bullet-screen comments can be regarded as sentiment tags. Based on these tags, a new retrieval model for online videos that focuses on comment emotion can be proposed. [Methods] Building on sentence-level sentiment analysis, a sentiment analysis model for bullet-screen comments is developed, which includes constructing a sentiment word dictionary, extracting sentiment words, and calculating the weights of comments along a time series. [Results] Radar charts, tag clouds and trend curves are used to present the results. [Conclusions] Sentiment analysis and visualization of bullet-screen comments can provide a new approach to retrieving online videos.
[Objective] This paper builds a management system for KVM-based cloud computing virtualization, drawing on mainstream private cloud management software and Drupal. [Context] The KVM application operated by Shenzhen University is currently managed through the system's own management tool, which is inefficient and offers weak data security. [Methods] Custom modules are developed and combined with PHP-SSH2 and the API of the KVM middleware Libvirt to build the KVM system. [Results] The KVM system is implemented, and it overcomes two limitations of mainstream private cloud software: its high server requirements and its inability to manage existing servers. [Conclusions] The system enables systematic management, can manage existing virtual machines, and offers excellent extensibility and independence.
[Objective] To improve the effectiveness of information display with multidimensional information visualization technology. [Context] The “Classic Reading” platform requires the continuous application of new technology to improve user experience and attract more readers. [Methods] Image-based and animation-based multidimensional information visualization technologies are combined, and the chessboard, rotating-shelf and waterfall-flow display modes are mixed. [Results] Page views of book details decreased, while page views of book reports increased by 65% per month on average. [Conclusions] Readers pay more attention to book reports, and the visualization function can improve the teaching quality of “Classic Reading”.
[Objective] To develop new functions on the WeChat platform that enhance interactivity between users and libraries. [Context] With the popularity of the WeChat platform, more and more libraries offer services on it. [Methods] More interactive functions are developed using the Apache+PHP+MySQL architecture and new WeChat Platform interfaces. [Results] Three interactive functions, “Happy Quiz”, “Music Appreciation” and “Picture Wall”, are implemented. [Conclusions] Libraries should use the WeChat Platform to offer readers more new interactive services.