[Objective] This research aims to identify shopping tasks from product search queries, and then analyze the characteristics of multi-task sessions. [Methods] Using the product classification of Taobao and a list of manually selected product terms, we identified online shopping tasks based on query terms from 19 704 search sessions by 2 754 users. [Results] First, the number of queries per shopping task is influenced by product characteristics, the number of available products, and the difficulty of describing the product category with query terms. Second, we found that in sessions with a major task, the shopping tasks are more closely related to one another. [Limitations] The task identification method based on query terms cannot completely describe complex consumer shopping behaviors. [Conclusions] This study provides an exploratory understanding of the relationships among various shopping tasks, and may be used to improve product recommendation algorithms, as well as to predict shopping behaviors.
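The term-based task identification described above can be illustrated with a minimal sketch: consecutive queries of a session are grouped into tasks by the product category their terms map to. The term list and category names below are hypothetical examples, not the paper's Taobao classification.

```python
# Illustrative sketch of query-term-based shopping task identification.
# PRODUCT_CATEGORIES stands in for a real product-term list; its entries
# are hypothetical.
PRODUCT_CATEGORIES = {
    "iphone": "electronics",
    "sneakers": "apparel",
    "running shoes": "apparel",
    "rice cooker": "home appliances",
}

def identify_tasks(session_queries):
    """Group consecutive queries of one session into shopping tasks,
    starting a new task whenever the matched product category changes."""
    tasks = []
    current_category = None
    for query in session_queries:
        category = next(
            (cat for term, cat in PRODUCT_CATEGORIES.items()
             if term in query.lower()),
            None,
        )
        if category is None:
            continue  # query mentions no known product term
        if category != current_category:
            tasks.append({"category": category, "queries": []})
            current_category = category
        tasks[-1]["queries"].append(query)
    return tasks
```

For example, the session ["cheap iPhone case", "iPhone 6 price", "running shoes men"] yields two tasks, one electronics task with two queries and one apparel task.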
[Objective] This study aims to identify the key nodes of public opinion spread and evolution based on a semantic social network model. [Methods] We first built a model of the Weibo semantic social network with the help of hypernetwork theory, and then used an emotion ontology and the LDA model to quantify the nodes. Finally, we established a hyperedge ranking algorithm to identify the key nodes. [Results] The proposed model could effectively and accurately quantify nodes from real Weibo data. [Limitations] We did not explore the proposed method's real-time performance, or new ways of guiding public opinion after the key nodes are identified. [Conclusions] This study provides a solution for the government to identify key nodes in social network systems, and then reduce the impact of negative content on the healthy development of the Internet.
[Objective] This study aims to explore the relationship between user tags and microblog post topics, with the purpose of improving subject identification and automatic tag recommendation services. [Methods] We first used crawlers to retrieve user profiles and posts in the field of “natural language processing” from Sina Weibo. Second, we extracted words from the posts and semantically extended the user tags. Finally, we matched the tags and posts with the edit distance algorithm. [Results] There was a correlation between user tags and posts in the natural language processing field. [Limitations] We only studied one academic field and one platform (Sina Weibo); more research is needed to generalize the results. [Conclusions] The tag recommendation system can use microblog posts as an important source to provide more personalized services, which in turn will improve microblog content analysis.
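The edit-distance matching step above can be sketched with the standard Levenshtein dynamic program; the `match_tag` helper and its distance threshold are hypothetical illustrations, not the authors' implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (rolling 1-D array)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,                       # deletion
                dp[j - 1] + 1,                   # insertion
                prev + (a[i - 1] != b[j - 1]),   # substitution (0 if equal)
            )
    return dp[n]

def match_tag(tag, post_words, max_dist=1):
    """Hypothetical matcher: a tag matches a post if some word of the
    post is within max_dist edit operations of the tag."""
    return any(edit_distance(tag, w) <= max_dist for w in post_words)
```

For instance, `edit_distance("kitten", "sitting")` is 3 (two substitutions and one insertion).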
[Objective] This study aims to generate hierarchical semantic paths for texts from Wikipedia. [Methods] We first established article concept vectors for Chinese texts from Wikipedia through explicit semantic analysis. Then, we mapped the vectors to the category nodes of a hierarchical-tree-like graph. Finally, we generated the hierarchical paths with the help of seed node information diffusion and top-down path selection, as well as optimization technology. [Results] The average relevance degree of the first generated hierarchical path was 54.10% on the test dataset, and the top 20 paths were sorted by relevance in descending order. [Limitations] We did not analyze the effect of using different numbers of explicit concept vectors on the quality of the generated paths. [Conclusions] The hierarchical paths generated from Wikipedia can reflect the main semantic meaning of the given texts.
[Objective] This study aims to identify research trends (RT) with the patent bibliographic coupling method and similarity algorithms. [Methods] We first established two types of patent similarity matrices with two similarity algorithms: observed value (OV-BCA) and cosine distance (CD-BCA). We then used social network analysis to extract the RT of Brain-Computer Interface (BCI) patents. [Results] Six BCI research trend clusters were retrieved by the OV-BCA algorithm, while the CD-BCA algorithm retrieved nine RT clusters. The two algorithms' family ID coincidence rate was 43%. [Limitations] We focused on the comparison of results, including the number, content and coincidence degree of clusters. More research is needed to study the characteristics of these algorithms. [Conclusions] RT can be retrieved by the bibliographic coupling method with the help of the proposed algorithms. Specifically, the cosine distance algorithm can find more detailed research trends than the observed value algorithm.
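The two similarity measures can be sketched as follows: the observed-value coupling of two patents is the raw count of references they share, while the cosine variant normalizes that count by the sizes of the two reference lists. This is a generic sketch of bibliographic coupling, not the paper's exact OV-BCA/CD-BCA implementation.

```python
import math

def coupling_strength(refs_a, refs_b):
    """Observed-value coupling: number of cited documents shared by two
    patents (their reference lists given as iterables of identifiers)."""
    return len(set(refs_a) & set(refs_b))

def cosine_coupling(refs_a, refs_b):
    """Cosine-normalized coupling over binary citation vectors: shared
    references divided by the geometric mean of the reference-set sizes."""
    if not refs_a or not refs_b:
        return 0.0
    shared = coupling_strength(refs_a, refs_b)
    return shared / math.sqrt(len(set(refs_a)) * len(set(refs_b)))
```

For example, patents citing {1, 2, 3} and {2, 3, 4} have an observed-value coupling of 2 and a cosine coupling of 2/3; the normalization lets patents with short reference lists compare fairly against heavily citing ones.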
[Objective] This paper studied linked data from the Web, which is machine-readable, semantically meaningful and relationally descriptive. We examined the effectiveness of these data in improving the information organization of academic resource websites (ARWs), with the purpose of retrieving more similar documents. [Methods] We first calculated the similarity of documents published in the ARWs with the help of the Latent Semantic Analysis (LSA) method. Then, we chose documents with high similarities by the hierarchical clustering method, and created a document relation matrix. Finally, we used dynamic document technology to generate a linked data index for searching the ARWs. [Results] We built a preliminary ARW linked data index, which helped us find similar documents more effectively from the ARWs. [Limitations] We investigated similar-document retrieval technology from the perspective of statistical analysis. Therefore, further research is needed to locate similar documents from various subject areas with the support of deep learning technology. [Conclusions] We computed documents' similarity using the LSA method to discover documents related to specific articles. The linked data could help us find more similar documents, while reducing the waiting time for similarity calculation.
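The clustering step above, which groups highly similar documents, can be sketched as single-linkage agglomerative clustering on a precomputed similarity matrix (e.g. LSA cosine similarities). The linkage criterion and stopping threshold below are illustrative assumptions, since the abstract does not specify them.

```python
def hierarchical_cluster(sim, threshold):
    """Single-linkage agglomerative clustering on a symmetric document
    similarity matrix: repeatedly merge the most similar pair of clusters
    while their linkage similarity exceeds the threshold."""
    clusters = [{i} for i in range(len(sim))]

    def link(c1, c2):
        # single linkage: similarity of the closest pair across clusters
        return max(sim[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        pair, best = None, threshold
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    pair, best = (x, y), s
        if pair is None:
            break  # no remaining pair is similar enough to merge
        a, b = pair
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters
```

With similarities [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]] and a threshold of 0.5, documents 0 and 1 merge while document 2 stays separate; the resulting clusters would then feed the document relation matrix.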
[Objective] This study aims to extract knowledge for clinical decisions from electronic medical records through semantic analysis. [Methods] We first extracted clinical terms from the training samples with a word segmentation algorithm, aided by a custom dictionary and statistical methods. Then, we used latent semantic analysis to find the potential correlations between clinical terms and treatment plans. Finally, we established a latent semantic model to support gastric cancer treatments. [Results] We successfully extracted 605 treatment plans from 1000 test samples based on the discharge summary texts. [Limitations] Only discharge record texts were examined in this study. [Conclusions] Latent semantic analysis could effectively process electronic medical records to assist doctors' clinical decision-making, which has positive effects on the development of electronic medical record applications.
[Objective] This study aims to utilize the knowledge sharing and constant updating advantages of the question answering community Baidu Zhidao, which helps us reduce the cost of maintaining a large geographical relationship resource and find complete location information. [Methods] First, we mapped incomplete location information to approximate area names retrieved from Baidu Zhidao. Second, we extracted each area's features and calculated the scores of related geographic entities. Finally, we constructed feature vectors for the areas from those geographic entities, which helped us identify the geographic locations of the posts. [Results] The proposed method retrieved accurate geographic information from 92.51% of City Complaints posts on the Micro-blog platform. [Limitations] The proposed method could not analyze posts without any geographic location information. [Conclusions] Our study found an effective and feasible way to locate missing geographic information.
[Objective] This study aims to identify research object attribute instances from paper titles. With the help of limited labeled samples, we could maximize the accuracy of research object recognition. [Methods] We first analyzed the grammatical features of scientific research objects based on the conditional random field sequence labeling algorithm. Second, we recognized and extracted research objects using a small number of samples. Finally, we introduced an active learning iterative labeling system that uses unlabeled data to improve the research object recognition accuracy. [Results] The results showed that the proposed method could efficiently use the unlabeled data, and increase the accuracy of research object recognition to 78.3%. [Limitations] The proposed algorithm needs to be further optimized to improve its efficiency. [Conclusions] The proposed method performed well on research object attribute identification, which is the foundation for further mining the knowledge system and the structure of science and technology literature.
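The active learning iteration above can be sketched as uncertainty sampling: in each round, the samples the current model is least confident about are sent to an annotator and folded into the labeled set for retraining. The CRF model itself is omitted here; the function names and the confidence/oracle callables are hypothetical stand-ins.

```python
def active_learning_round(model_confidence, labeled, unlabeled, oracle, k=10):
    """One round of uncertainty sampling (illustrative sketch).
    model_confidence(x) -> score from the current model (lower = less sure);
    oracle(x) -> label supplied by a human annotator.
    Returns the grown labeled set and the remaining unlabeled pool."""
    ranked = sorted(unlabeled, key=model_confidence)  # least confident first
    to_label = ranked[:k]
    labeled.extend((x, oracle(x)) for x in to_label)  # annotate the hard cases
    remaining = ranked[k:]
    return labeled, remaining
```

After each round the model would be retrained on the enlarged labeled set, so annotation effort concentrates on the titles the labeler finds hardest.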
[Objective] This study aims to build a knowledge requirement model for online outsourcing tasks. [Context] The proposed model could help us find proper personnel for each task. [Methods] We first designed an expert system framework and built a descriptive model for each task. Then, we analyzed the tasks based on inference rules and text analysis technology, with the purpose of quantifying the knowledge requirements of each task. [Results] The proposed framework successfully established the knowledge requirement model. [Conclusions] The new model laid the foundation for the task-talent matching system of online outsourcing services.
[Objective] This study aims to effectively store, manage and reuse scientific data with the help of a specialized data repository management system, TeamDR, for research teams. [Context] TeamDR is a Web tool that helps scientific research team members organize, store, manage and share data. It was developed in Java and offers cloud-based and standalone deployments. [Methods] We first designed a dynamic metadata template to organize and manage scientific research data. MongoDB was then adopted to improve data storage capacity and query performance. [Results] TeamDR stores and manages scientific research data effectively with the support of dynamic metadata templates, categorized sharing control, and full-text search of metadata. Users' feedback shows that TeamDR meets the demands of scientific data storage and management. [Conclusions] TeamDR effectively addresses the issues of scientific data storage and management, data sharing and collaboration, and data discovery and linking. However, the system's usability, completeness and extensibility could be further improved.
[Objective] This study aims to promote the creation, management and use of an academic library's digital resources. [Context] The development of IPv6 and 10 Gigabit networks created difficulties in network data acquisition. [Methods] We used port mirroring technology on network devices. Data from the IPv4 and IPv6 networks were filtered before being collected for the digital resource usage analysis system. [Results] We built a practical digital resource usage analysis system. [Conclusions] The proposed method helped the academic library establish a digital resource analysis system for the IPv4/IPv6 dual-stack, high-speed campus network environment.
[Objective] This study aims to build a bridge between online readers and the library, which significantly improves the library's services and user experience. [Context] Near-field services were previously provided by traditional electronic display panels and in-person introductions, and there were few guide resources online. [Methods] We designed a new service model for the Beijing Administrative College Library with the help of the WeChat platform, iBeacon technology and HTML5. [Results] The new model provided different near-field library services for different scenarios, which attracted more online users. The library's WeChat Public Account also gained more followers. [Conclusions] The new near-field services completely changed the library's user experience. Readers are more willing to learn about and interactively use the library's resources.