New Technology of Library and Information Service, 2015, Vol. 31, Issue 10: 81-87     https://doi.org/10.11925/infotech.1003-3513.2015.10.11
Research Paper
Automatic Annotation of Bibliographical References in Chinese Patent Documents
Jiang Chuntao
Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China;
Patent Information and Service Center of Jiangsu Province, Nanjing 210008, China

Abstract

[Objective] This paper aims to automatically annotate four types of bibliographical references embedded in the text of Chinese patent documents: patents, standards, academic papers, and other monographs. [Methods] A pattern matching approach annotates references to patents, standards, and other monographs. A two-phase machine learning approach annotates references to academic papers: it first automatically detects the sentences that contain citation information, and then extracts six categories of bibliographic features from those sentences. [Results] Ten-fold cross-validation shows that precision and recall for annotating patent references are both 100%; precision and recall for standards are 92% and 94%, respectively; and precision and recall for other monographs are 80% and 71%, respectively. For academic-paper references, precision and recall are 95.7% and 96.0% in phase one and 95.3% and 94.9% in phase two. [Limitations] The pattern matching approach requires manually analyzing a large number of patent documents, and the training data used by the machine learning approach is relatively small. [Conclusions] Pattern matching annotates patent and standard references with precision and recall of at least 92%, and the machine learning approach annotates academic-paper references with precision and recall of about 95% on average.
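The abstract does not publish the author's actual patterns, so the following is only a minimal sketch of what pattern-matching annotation of patent and standard references, plus the precision/recall scoring used to report results, could look like. It assumes simplified formats for Chinese patent publication numbers (such as `CN101234567A`) and standard numbers (such as `GB/T 12345-2008`). The regular expressions, the names `annotate` and `precision_recall`, and the exact-span matching criterion are all hypothetical illustrations, not the paper's method.

```python
import re
from typing import List, Set, Tuple

# Hypothetical, simplified pattern for Chinese patent publication numbers such as
# "CN101234567A": "CN", 7-9 digits, then a kind-code letter. Assumed, not the paper's rule.
# ASCII-only lookarounds are used instead of \b because Python treats CJK characters
# as word characters, so \b would not fire between a Chinese character and "C".
PATENT_RE = re.compile(r"(?<![A-Za-z0-9])CN\d{7,9}[ABCUYS](?![A-Za-z0-9])")

# Hypothetical, simplified pattern for Chinese standard numbers such as
# "GB 4943-2001", "GB/T 12345-2008" or "JB/T 1234-2005". Also an assumption.
STANDARD_RE = re.compile(
    r"(?<![A-Za-z])(?:GB(?:/[TZ])?|[A-Z]{2}/T)\s?\d{1,5}(?:\.\d+)?-(?:19|20)\d{2}(?!\d)"
)

Span = Tuple[str, int, int]  # (reference type, start offset, end offset)


def annotate(text: str) -> List[Span]:
    """Return (type, start, end) spans for patent and standard references, in text order."""
    spans: List[Span] = []
    for label, pattern in (("patent", PATENT_RE), ("standard", STANDARD_RE)):
        for m in pattern.finditer(text):
            spans.append((label, m.start(), m.end()))
    return sorted(spans, key=lambda s: s[1])


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall: a predicted span counts only if type and offsets agree."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "本发明参考了专利CN101234567A以及标准GB/T 12345-2008的规定。"
    for label, start, end in annotate(text):
        print(label, text[start:end])
    # prints:
    # patent CN101234567A
    # standard GB/T 12345-2008
```

Exact span matching is the strictest scoring choice; an evaluation could instead credit partial overlaps, which would raise the reported numbers. Real patent texts also cite foreign patents (for example US or JP numbers) and older Chinese numbering schemes, which these illustrative patterns do not cover.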

Received: 2015-04-14      Published: 2016-04-06
CLC number: TP393
Corresponding author: Jiang Chuntao, ORCID: 0000-0001-8332-7858, E-mail: spring_surge@126.com
Cite this article:
Jiang Chuntao. Automatic Annotation of Bibliographical References in Chinese Patent Documents [J]. New Technology of Library and Information Service, 2015, 31(10): 81-87.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.10.11      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I10/81


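Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010) 82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn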