New Technology of Library and Information Service  2015, Vol. 31 Issue (9): 31-37    DOI: 10.11925/infotech.1003-3513.2015.09.05
A Method of Keywords Annotation Based on Linked Triples
Xu Deshan1, Li Hui2, Zhang Yunliang1
1 Institute of Scientific & Technical Information of China, Beijing 100038, China;
2 Beijing Institute of Science and Technology Information, Beijing 100048, China
Abstract  

[Objective] To build an automatic indexing system for Chinese scientific and technical literature that combines triple acquisition with natural language processing (NLP), based on an ontology management and service platform. [Methods] The system merges ontology knowledge bases and vocabularies through Web services. It identifies terms and unlisted (out-of-vocabulary) words through vocabulary matching and word combination, and links them to the triples in the knowledge bases to build a conceptual relational network. [Results] The system processes 86 articles per second, with a recall of 65% and a precision of 69%. [Limitations] Term matching is time-consuming because no index is built. The performance of Chinese word segmentation and POS tagging is affected by noisy data such as spaces and line breaks. [Conclusions] Data cleaning and optimization of the keyword selection algorithm need further study to support deep mining and improve the efficiency of the system.
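
The matching and linking steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, the triples, and the function names `match_terms`, `link_triples`, and `precision_recall` are all hypothetical, and forward maximum matching is assumed here as one simple way to match a vocabulary against unsegmented Chinese text. Recognition of unlisted words by word combination is not covered.

```python
# Hypothetical vocabulary and triples, invented purely for illustration.
VOCABULARY = {"知识库", "本体", "语义标注"}
TRIPLES = [
    ("本体", "usedFor", "语义标注"),
    ("知识库", "contains", "本体"),
    ("本体", "broader", "知识组织"),
]


def match_terms(text, vocabulary):
    """Forward maximum matching: at each position, take the longest
    vocabulary term that starts there; otherwise advance one character."""
    max_len = max(map(len, vocabulary), default=0)
    found = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                found.append(candidate)
                i += size
                break
        else:
            # No vocabulary term starts at position i.
            i += 1
    return found


def link_triples(terms, triples):
    """Keep the triples whose subject and object both occur in the text."""
    present = set(terms)
    return [t for t in triples if t[0] in present and t[2] in present]


def precision_recall(predicted, gold):
    """Precision and recall of predicted keywords against a gold set."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    terms = match_terms("基于本体的语义标注依赖知识库", VOCABULARY)
    print(terms)
    print(link_triples(terms, TRIPLES))
    print(precision_recall(terms, ["本体", "知识库", "关键词", "标引"]))
```

Storing the vocabulary in a hash set makes each candidate lookup constant time on average, which is one possible response to the limitation noted above that matching is slow when no index is built.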

Received: 26 January 2015      Published: 06 April 2016
CLC Number: TP391.1

Cite this article:

Xu Deshan, Li Hui, Zhang Yunliang. A Method of Keywords Annotation Based on Linked Triples. New Technology of Library and Information Service, 2015, 31(9): 31-37.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.09.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I9/31

