[Objective] This study investigated the advantages of desktop search for the purpose of improving mobile search services. [Methods] We analyzed the differences between mobile and desktop search behaviors through a user search experiment. [Results] Mobile and desktop search behaviors differ in search platform, type of searched information, search situation, search process, user experience, search result precision, and user satisfaction. [Limitations] The size and diversity of the experimental population were limited; further studies are needed to generalize our findings to a larger population. [Conclusions] Both mobile search and desktop search have advantages and disadvantages; however, desktop search has more advantages than its mobile counterpart.
[Objective] This study tries to extract domain terms more accurately and conveniently. [Methods] First, we proposed a method using the CBOW model to build word vectors for each component of the terms. Then, we applied cosine similarity to calculate the internal correlation degree among each term’s components. To obtain more representative terms, we used the PageRank algorithm to rank the candidates. [Results] We obtained high recall and precision rates using paper abstracts in the field of natural language processing as the training pool. [Limitations] The training pool was relatively small, which might influence the results. [Conclusions] This study shows that the CBOW model is an appropriate method to extract terminologies.
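The pipeline above can be sketched in miniature. The following is a minimal pure-Python illustration, not the authors' implementation: it assumes the CBOW component vectors have already been trained (the toy two-dimensional vectors below are invented for demonstration), scores a term's internal correlation as the average pairwise cosine similarity of its components, and ranks candidate terms with a basic PageRank over an assumed co-occurrence graph.

```python
import math

def cosine(u, v):
    # cosine similarity between two word vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def internal_correlation(components, vectors):
    # average pairwise cosine similarity among a term's components
    pairs = [(a, b) for i, a in enumerate(components) for b in components[i + 1:]]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

def pagerank(graph, damping=0.85, iters=50):
    # graph: {node: [out-neighbors]}; every node is assumed to have out-links
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        rank = {n: (1 - damping) / len(nodes)
                   + damping * sum(rank[m] / len(graph[m])
                                   for m in nodes if n in graph[m])
                for n in nodes}
    return rank

# toy "CBOW" vectors for the components of "natural language processing"
vectors = {"natural": [1.0, 0.1], "language": [0.9, 0.2], "processing": [0.8, 0.3]}
score = internal_correlation(["natural", "language", "processing"], vectors)

# hypothetical candidate-term graph linked by shared components
graph = {"natural language": ["language model"],
         "language model": ["natural language", "word vector"],
         "word vector": ["natural language"]}
ranks = pagerank(graph)
```

In practice the vectors would come from a CBOW model trained on the abstract corpus (e.g. via a word2vec toolkit), and the graph edges from component sharing or co-occurrence statistics.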
[Objective] This study proposed a confidence ranking model to extract product features and user opinions from Chinese online reviews. [Methods] Examining the semantic and association relations between candidate words, we built a confidence ranking model based on an improved HITS algorithm, and then retrieved the feature and opinion words. [Results] Compared with the reference model, our method showed better recall and precision rates when extracting feature and opinion words from a Chinese corpus. [Limitations] We only extracted the explicit feature and opinion words and did not try to identify the implicit ones. [Conclusions] We could effectively extract feature and opinion words using their mutual reinforcement and semantic relations. Filtering by semantic polarity could further improve the precision of the extracted opinion words.
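The mutual-reinforcement idea can be illustrated with the classic (unimproved) HITS iteration on a bipartite graph of feature–opinion co-occurrence pairs; the review pairs below are invented for demonstration and the sketch omits the paper's semantic-relation weighting and polarity filtering.

```python
import math

def hits(edges, iters=30):
    # edges: (feature, opinion) co-occurrence pairs from reviews;
    # feature and opinion confidences reinforce each other, HITS-style
    feats = sorted({f for f, _ in edges})
    ops = sorted({o for _, o in edges})
    f_conf = {f: 1.0 for f in feats}
    o_conf = {o: 1.0 for o in ops}
    for _ in range(iters):
        f_conf = {f: sum(o_conf[o] for ff, o in edges if ff == f) for f in feats}
        norm = math.sqrt(sum(v * v for v in f_conf.values()))
        f_conf = {f: v / norm for f, v in f_conf.items()}
        o_conf = {o: sum(f_conf[f] for f, oo in edges if oo == o) for o in ops}
        norm = math.sqrt(sum(v * v for v in o_conf.values()))
        o_conf = {o: v / norm for o, v in o_conf.items()}
    return f_conf, o_conf

# toy review pairs: "screen" co-occurs with more opinion words than "battery"
pairs = [("screen", "clear"), ("screen", "bright"),
         ("screen", "sharp"), ("battery", "durable")]
f_conf, o_conf = hits(pairs)
```

A feature word connected to many confident opinion words gains confidence, and vice versa, which is the reinforcement the abstract describes.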
[Objective] This study proposes a new approach to identify terminologies from search engine query logs, aiming to improve on traditional technology. [Methods] First, we used a four-partite graph to represent the query logs. Then, we ranked the candidate terminologies with the manifold ranking algorithm; the top-ranked ones were domain-specific. [Results] We tested the proposed method with real search engine query logs and found the precision rates were about 20% higher than those of the standard approach. [Limitations] The coverage of the identified terminologies relies on the initial domain-specific queries manually chosen by experts. [Conclusions] The proposed approach could build a high-quality domain thesaurus without a pre-defined large domain corpus or annotations. Thus, the new method is more practical for real-world applications.
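The manifold ranking step can be sketched as the standard iteration f ← αSf + (1−α)y over a normalized affinity matrix, where y marks the expert-chosen seed queries. This is a generic illustration on an invented four-node affinity matrix, not the paper's four-partite graph construction.

```python
import math

def manifold_rank(W, seeds, alpha=0.9, iters=200):
    # W: symmetric affinity matrix over candidate terms (list of lists)
    # seeds: indices of the expert-chosen domain-specific queries
    n = len(W)
    deg = [sum(row) for row in W]
    # symmetric normalization S = D^(-1/2) W D^(-1/2)
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] and deg[j] else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

# nodes: 0 = seed query, 1 = strongly linked candidate,
#        2 = bridge node, 3 = weakly linked candidate
W = [[0.0, 1.0, 0.2, 0.0],
     [1.0, 0.0, 0.2, 0.0],
     [0.2, 0.2, 0.0, 0.1],
     [0.0, 0.0, 0.1, 0.0]]
scores = manifold_rank(W, seeds={0})
```

Candidates strongly connected to the seeds receive high scores, so the top-ranked terms inherit the seeds' domain specificity.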
[Objective] This study aims to improve the precision, recall and user experience of the search engine. [Methods] We proposed an automatic query correction model based on statistical characteristics. First, we established a model to generate the confusion set for the users’ search terms. Then, we created a ranking algorithm for the confusion set and chose the best match for the original query. [Results] Our new model improved the search engine’s performance: the precision and recall rates were 92.2% and 95% on a 110k testing set, which were 13.6% and 8.3% higher than those of the N-gram model. [Limitations] Our model only generated four types of words for the confusion set, and the training process required substantial computation. [Conclusions] The new model can improve the precision, recall and user experience of the search engine.
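The generate-then-rank scheme can be illustrated with a simple stand-in: the four classic edit operations (deletion, substitution, insertion, transposition) generate candidates, and corpus frequency serves as a placeholder ranking score. The paper's actual four candidate types and ranking algorithm may differ; the vocabulary and counts below are invented.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def confusion_set(word):
    # candidates one edit away: deletions, substitutions, insertions, transpositions
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    subs = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    trans = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    return set(deletes + subs + inserts + trans)

def correct(query, freq):
    # rank the confusion set by corpus frequency and return the best match
    candidates = [w for w in confusion_set(query) if w in freq] or [query]
    return max(candidates, key=lambda w: freq[w])

# hypothetical query-log frequencies
freq = Counter({"search": 50, "engine": 30, "query": 20})
```

For example, `correct("serch", freq)` returns "search", while an already-correct query is returned unchanged.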
[Objective] This study tried to describe customers’ characteristics effectively. [Methods] The proposed framework explores the personal and social relationships among customers and their friends on the microblog platform. We described the customers’ characteristics using self-defined tags, and then created segmentations with the help of text clustering and non-negative matrix factorization technologies. [Results] The method based on non-negative matrix factorization achieved an average ASW index of approximately 86.130%, outperforming traditional methods based on K-means and hierarchical clustering. [Limitations] A customer’s characteristics cannot be fully described by the self-defined tags of the customer and his/her friends on the microblog platform. [Conclusions] The proposed framework could improve the effectiveness of characteristic description, evaluation and visualization for microblog customer segmentation.
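The NMF-based segmentation can be sketched with the standard multiplicative-update rules of Lee and Seung, factorizing a customer-by-tag frequency matrix and assigning each customer to the factor with the largest weight. The matrix below is a tiny invented example with two obvious interest groups; it is not the paper's data or its exact algorithm.

```python
import random

def nmf(V, k, iters=300, seed=1):
    # multiplicative-update NMF: V (n x m) ~ W (n x k) . H (k x m)
    rnd = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rnd.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rnd.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        WH = [[sum(W[i][a] * H[a][j] for a in range(k)) for j in range(m)]
              for i in range(n)]
        for a in range(k):          # H <- H * (W^T V) / (W^T W H)
            for j in range(m):
                num = sum(W[i][a] * V[i][j] for i in range(n))
                den = sum(W[i][a] * WH[i][j] for i in range(n)) + 1e-9
                H[a][j] *= num / den
        WH = [[sum(W[i][a] * H[a][j] for a in range(k)) for j in range(m)]
              for i in range(n)]
        for i in range(n):          # W <- W * (V H^T) / (W H H^T)
            for a in range(k):
                num = sum(H[a][j] * V[i][j] for j in range(m))
                den = sum(H[a][j] * WH[i][j] for j in range(m)) + 1e-9
                W[i][a] *= num / den
    return W, H

def segment(W):
    # assign each customer to the factor with the largest weight
    return [max(range(len(row)), key=row.__getitem__) for row in W]

# toy customer-by-tag frequency matrix: two clear interest groups
V = [[4, 2, 0, 0],
     [2, 1, 0, 0],
     [0, 0, 4, 2],
     [0, 0, 2, 1]]
W, H = nmf(V, k=2)
labels = segment(W)
```

Each row of H can then be read as a "topic" over tags, which supports the characteristic description and visualization mentioned in the conclusions.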
[Objective] This study classified Online to Offline (O2O) service users accurately, which could lead to more appropriate service strategies for different user groups. [Methods] We first designed an O2O user classification model based on Latent Class Analysis (LCA). Then, we classified catering O2O service customers to examine the simplicity and efficiency of the new model. [Results] We grouped the users into four categories and identified their latent classes, which helped the O2O service providers develop different marketing strategies. [Limitations] Applying the proposed method to classify users might involve some subjective factors. [Conclusions] The LCA model could help us better categorize and target O2O service users, which expands the applicable scope of this model.
[Objective] This paper aims to solve the feature mismatch problem caused by different document types and improve the performance of automatic classification technology. [Methods] We proposed a new method that extends semantic features with the third-party resource HowNet, using documents of various types as the corpus to bridge the gap between categorized and un-categorized documents. [Results] Compared with the non-feature-extension classification method, the proposed method increased the F-measure by 1.2% to 11.0% in our classification experiment. The four document types used in our study were webpages, books, non-academic periodicals and academic journals. [Limitations] Not every type of document was tested with a publicly accessible corpus; thus, more tests are needed to examine the generalizability and objectiveness of the new method. [Conclusions] Our study showed that the proposed method was feasible. It could effectively eliminate the semantic differences among various types of collections and improve the performance of automatic text classification through corpus construction and feature extension.
[Objective] This study aims to build a sentiment analysis dictionary for Chinese book reviews. [Methods] We first divided users’ sentiments into seven categories, which were used to create the Chinese book review emotional word list. Then, we chose seed terms from that list with the help of a basic sentiment analysis lexicon. Finally, we used the improved SO-PMI algorithm and a synonym expansion method to classify target terms from real book reviews. [Results] With this new book review sentiment analysis dictionary, the average precision, recall and F1 rates were 0.90, 0.83 and 0.85 respectively. [Limitations] The test corpus is relatively small, which might influence our results. [Conclusions] The proposed method was an effective and reliable way to conduct sentiment analysis for Chinese book reviews.
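The SO-PMI step can be illustrated with its textbook form, SO-PMI(w) = Σ_p PMI(w, p) − Σ_n PMI(w, n), computed from review-level co-occurrence counts. The reviews and seed words below are invented English stand-ins, and the paper's "improved" variant and seven-category scheme are not reproduced here.

```python
import math

def so_pmi(word, docs, pos_seeds, neg_seeds):
    # SO-PMI(w) = sum_p PMI(w, p) - sum_n PMI(w, n),
    # with review-level co-occurrence counts and add-0.01 smoothing
    n = len(docs)
    def count(*words):
        return sum(all(w in doc for w in words) for doc in docs) + 0.01
    def pmi(a, b):
        return math.log2(count(a, b) * n / (count(a) * count(b)))
    return (sum(pmi(word, s) for s in pos_seeds)
            - sum(pmi(word, s) for s in neg_seeds))

# toy tokenized book reviews and one seed word per polarity
docs = [["good", "wonderful"], ["good", "excellent"],
        ["bad", "boring"], ["bad", "terrible"]]
pos, neg = ["good"], ["bad"]
```

A word that co-occurs with positive seeds gets a positive score and is added to the positive side of the dictionary; synonym expansion would then propagate the label to near-synonyms.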
[Objective] This study created a new model to improve the quality and maintenance efficiency of personal name authority data in China. [Methods] To prove the feasibility of using open semantic resources to enrich name authority data, this study analyzed the number and types of semantic resources, evaluation metrics, automation and maintenance speed, as well as the credibility of the open resources. FOAF was used as an example to implement the schema. [Results] This study set restriction conditions, interface modes and harvest rules for obtaining the semantic resources. It created RDF predicates and two implementation approaches, such as SDKs and software tools, to discover and integrate resources. It also designed an automatic multi-matching algorithm and a mapping table to automatically enrich name authority data. [Limitations] We only created the schema, which has not been put into practice. The semantic resource storage model and the extraction and processing methods are also at the initial framework stage, and no detailed implementation technology was discussed. [Conclusions] The proposed method could automatically match open semantic resources of individual names to enrich local personal name authority data.
[Objective] This study developed a disease prediction model based on the support vector machine, using electronic medical records of severe acute pancreatitis patients. [Methods] We first adjusted the kernel type and parameter values of the support vector machine to obtain an optimized prediction model. Then, we combined it with univariable and multivariable logistic regression analyses to select the feature variables. Finally, we proposed a simplified early warning model for severe acute pancreatitis. [Results] The new model’s prediction accuracy rate was 70.37%. Variables used by this model include white blood cell count, serum calcium, serum lipase, systolic blood pressure, diastolic blood pressure and pleural effusion. [Limitations] Because of the small sample size, we only used the support vector machine method to develop the new disease prediction model. In the future, we will try to establish a larger examination system for clinical trials. [Conclusions] The support vector machine can help us develop an optimal disease prediction model. A new system based on this model could support our clinical decision making.
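As a minimal illustration of SVM training, the following sketch uses the Pegasos sub-gradient solver for a linear SVM on invented, standardized two-feature samples (loosely evoking indicators such as white blood cell count and serum calcium). The paper tuned kernel types and parameters on real medical records; none of that is reproduced here.

```python
import random

def pegasos_train(X, y, lam=0.01, epochs=100, seed=0):
    # Pegasos sub-gradient solver for a linear SVM; labels y in {-1, +1};
    # each sample carries a trailing constant 1.0 so the bias is learned in w
    rnd = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rnd.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]      # regularization shrink
            if margin < 1:                              # hinge-loss sub-gradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# toy standardized indicators + bias term; y = 1 means "severe" in this sketch
X = [[2.0, 1.0, 1.0], [1.5, 2.0, 1.0], [-2.0, -1.0, 1.0], [-1.5, -2.0, 1.0]]
y = [1, 1, -1, -1]
w = pegasos_train(X, y)
```

Swapping in an RBF or polynomial kernel, as the study did, requires a kernelized solver; the linear case is shown only because it fits in a few lines.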
[Objective] This study aims to retrieve trending events from the micro-blog platform with the help of data mining algorithms. [Methods] First, we collected micro-blog messages with geographic coordinates from the most popular platform (Sina Weibo) using its API service. Then, we used the K-means, KNN and decision tree algorithms to construct the geographical patterns of the collected posts; the number of published posts, re-tweets and comments, as well as user activity and movement strength, were also examined. Third, we compared these geographical patterns with the daily regional micro-blog data to identify breaking news in each area. [Results] We analyzed data collected on April 15 and 16, 2015 with the proposed model, and found the trending event of the “Beijing Sandstorm”. [Limitations] The sample size was small, which might influence the results. [Conclusions] Geographic coordinates could help us detect trending events on Sina Weibo, and this new method will also support the government’s crisis management strategy and decision-making process.
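Of the three algorithms named above, the spatial grouping step can be sketched with a naive K-means over (longitude, latitude) pairs, followed by a simple volume-spike check against the daily baseline. The coordinates below are invented points near Beijing and Shanghai; the KNN and decision-tree components, and the real Weibo API collection, are omitted.

```python
def kmeans(points, k, iters=50):
    # naive k-means on (lon, lat) pairs; the first k points seed the centers
    centers = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        centers = [(sum(x for x, _ in cl) / len(cl),
                    sum(y for _, y in cl) / len(cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

def burst_score(today, baseline):
    # ratio of today's regional post volume to the daily baseline;
    # a large ratio flags a potential breaking event in that region
    return today / baseline if baseline else float("inf")

# toy geotagged posts (lon, lat) around two cities
posts = [(116.40, 39.90), (121.47, 31.23), (116.41, 39.91),
         (116.39, 39.89), (121.48, 31.22), (121.46, 31.24)]
centers, clusters = kmeans(posts, k=2)
```

A region whose cluster suddenly contains several times its usual post volume (e.g. `burst_score(500, 100)` giving 5.0) would be surfaced for event inspection, as with the sandstorm case.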
[Objective] This study tests a new system for building a smart library which provides instant services to readers based on their locations. [Context] Readers’ service needs vary with their locations, and readers at the same location might expect different services. Our new system can predict readers’ real-time needs and provide services proactively. [Methods] We propose the framework of an active differential information service system based on location awareness technology. The system detects readers’ precise locations with the help of WiFi and GPS, and then generates personalized services for each individual reader. [Results] After installing the library’s App, readers can receive different information services at various locations in the library. [Conclusions] Location-based service helps the library predict readers’ real-time needs and improve their user experience.