New Technology of Library and Information Service  2015, Vol. 31 Issue (9): 31-37    DOI: 10.11925/infotech.1003-3513.2015.09.05
A Method of Keywords Annotation Based on Linked Triples
Xu Deshan¹, Li Hui², Zhang Yunliang¹
¹ Institute of Scientific & Technical Information of China, Beijing 100038, China;
² Beijing Institute of Science and Technology Information, Beijing 100048, China
Abstract  

[Objective] To build an automatic indexing system for Chinese scientific and technical literature on top of an ontology management and service platform, using triple acquisition and natural language processing (NLP). [Methods] The system merges ontology knowledge bases and vocabularies through Web services. It identifies terms and unlisted (out-of-vocabulary) words by vocabulary matching and word combination, and links them to triples in the knowledge bases to build a conceptual relation network. [Results] The system processes 86 articles per second, with a recall of 65% and a precision of 69%. [Limitations] Term matching is time-consuming because no index is built. The performance of Chinese word segmentation and part-of-speech (POS) tagging is degraded by noisy data such as spaces and line breaks. [Conclusions] Data cleaning and the optimization of the keyword selection algorithm require further study to support deep mining and to improve the system's efficiency.
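
The abstract only outlines the pipeline, so the Python sketch below illustrates one possible way the steps could fit together; it is not the authors' implementation. Everything in it is an assumption for illustration: the toy vocabulary `VOCAB`, the triples `TRIPLES`, the pre-tagged input `SAMPLE_TAGGED` (a tag set in which noun tags begin with `n`), the longest-match strategy over adjacent words, the noun-run heuristic for unlisted words, and the hypothetical helpers `clean`, `annotate`, `link` and `precision_recall`. A Python `set` acts as a hash index over the vocabulary, one way to address the missing-index limitation noted above, and precision/recall are computed set-wise over predicted versus gold keywords.

```python
import re
from collections import defaultdict

# Hypothetical toy data standing in for the merged ontology knowledge bases.
VOCAB = {"本体", "语义标注", "知识库", "关键词"}
TRIPLES = [
    ("语义标注", "uses", "本体"),
    ("本体", "storedIn", "知识库"),
    ("关键词", "relatedTo", "语义标注"),
]
# Assumed output of a segmenter/POS tagger: (word, tag) pairs,
# where noun tags begin with "n".
SAMPLE_TAGGED = [("语义", "n"), ("标注", "vn"), ("使用", "v"), ("本体", "n"),
                 ("和", "c"), ("科技", "n"), ("文献", "n")]


def clean(text):
    """Remove spaces, line breaks and other whitespace before segmentation.
    Assumes Chinese text, where words are not space-delimited."""
    return re.sub(r"\s+", "", text)


def annotate(tagged, vocab, max_span=4):
    """Return (listed_terms, unlisted_candidates) from (word, tag) pairs."""
    listed, unlisted = [], []
    i, n = 0, len(tagged)
    while i < n:
        # Longest match first: join up to max_span adjacent words and look
        # the combination up in the vocabulary (a set, i.e. a hash index).
        for j in range(min(n, i + max_span), i, -1):
            candidate = "".join(word for word, _ in tagged[i:j])
            if candidate in vocab:
                listed.append(candidate)
                i = j
                break
        else:
            # Hypothetical heuristic: two or more adjacent nouns that do not
            # match the vocabulary form a candidate unlisted term.
            j = i
            while j < n and tagged[j][1].startswith("n"):
                j += 1
            if j - i >= 2:
                unlisted.append("".join(word for word, _ in tagged[i:j]))
                i = j
            else:
                i += 1
    return listed, unlisted


def link(terms, triples):
    """Map each recognised term to the triples where it is subject or
    object; together these edges form a small concept network."""
    terms = set(terms)
    network = defaultdict(list)
    for s, p, o in triples:
        for node in {s, o}:
            if node in terms:
                network[node].append((s, p, o))
    return dict(network)


def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted vs. gold keywords."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    print(clean("语义 标注\n本体"))  # 语义标注本体
    listed, unlisted = annotate(SAMPLE_TAGGED, VOCAB)
    print(listed, unlisted)  # ['语义标注', '本体'] ['科技文献']
    print(link(listed, TRIPLES))
    gold = ["语义标注", "本体", "科技文献", "关键词"]
    print(precision_recall(listed + unlisted, gold))  # (1.0, 0.75)
```

In a real system the tagged input would come from a Chinese segmenter and POS tagger run on cleaned text, and the vocabulary and triples from the ontology platform's Web services; the greedy noun-run rule is deliberately simple and can swallow a listed term that begins inside the run.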

Received: 26 January 2015; Published: 06 April 2016
CLC Number: TP391.1

Cite this article:

Xu Deshan, Li Hui, Zhang Yunliang. A Method of Keywords Annotation Based on Linked Triples. New Technology of Library and Information Service, 2015, 31(9): 31-37.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.09.05 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I9/31

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn