Data Analysis and Knowledge Discovery, 2023, Vol. 7 Issue (11): 158-171     https://doi.org/10.11925/infotech.2096-3467.2022.1026
Research Paper
Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN
Li Yingqun1,2,Li Yafei1,2(),Pei Lei1,2,Hu Zhiwei1,2,Song Ningyuan1,2
1School of Information Management, Nanjing University, Nanjing 210023, China
2Laboratory of Data Intelligence and Cross Innovation, Nanjing University, Nanjing 210023, China
Abstract

[Objective] This paper aims to accurately identify and extract figures and tables from academic journal literature, promoting the dissemination and exchange of academic figures and tables. [Methods] First, we introduced the ECA channel attention module into the YOLOv5 algorithm and replaced its PAN module with BiFPN. Then, we randomly sampled 1,300 academic journal articles from 13 subject categories as experimental data and converted them into high-quality images with poppler-0.68.0. Finally, we evaluated the new algorithm on this dataset. [Results] Compared with the suboptimal baseline, the F1 score of the new model improved by 1.99 percentage points, reaching 99.88%. [Limitations] The scope and quantity of data annotation need to be expanded to cover more scenarios. [Conclusions] The proposed YOLOv5-ECA-BiFPN method effectively improves the identification and extraction of figures and tables from academic journal literature, particularly in special scenarios.
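The paper states only that poppler-0.68.0 was used to render each PDF page as a high-quality image and does not show the invocation. The following is a minimal sketch of that step using pdf2image, a thin Python wrapper around poppler's pdftoppm; the pdf2image dependency, the 300 dpi setting, and the file paths are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the PDF-to-image step: render each page of a journal PDF
# as a PNG using pdf2image, which calls poppler's pdftoppm under the hood.
# Assumptions not stated in the paper: pdf2image is installed, poppler
# binaries (e.g. poppler-0.68.0) are on PATH, and 300 dpi is used.
from pathlib import Path
from pdf2image import convert_from_path

def pdf_to_images(pdf_path, out_dir, dpi=300):
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL.Image per page
    out_files = []
    for i, page in enumerate(pages, start=1):
        out_file = Path(out_dir) / f"{Path(pdf_path).stem}_page{i:03d}.png"
        page.save(out_file, "PNG")
        out_files.append(str(out_file))
    return out_files

if __name__ == "__main__":
    # Hypothetical input file; replace with an actual journal article PDF.
    print(pdf_to_images("sample_article.pdf", "page_images"))
```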

Key words: Academic Journal Literature    YOLOv5-ECA-BiFPN    Academic Figures and Tables
Received: 2022-09-08      Published: 2023-03-22
CLC Number: TP391 G256
Funding: One of the research outcomes of a 2022 project of the Laboratory of Data Intelligence and Cross Innovation, Nanjing University
Corresponding author: Li Yafei, ORCID: 0000-0003-1754-2300, E-mail: dg20140013@smail.nju.edu.cn.
Cite this article:
Li Yingqun, Li Yafei, Pei Lei, Hu Zhiwei, Song Ningyuan. Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN. Data Analysis and Knowledge Discovery, 2023, 7(11): 158-171.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1026      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I11/158
Fig.1  Technical framework
Fig.2  Network structure of the YOLOv5 algorithm[23]
Fig.3  Slice operation of the Focus module
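As a reading aid for Fig.3, here is a minimal PyTorch sketch of the slice operation in YOLOv5's Focus module, based on the public YOLOv5 design[23] rather than code released with this paper: the input is split into four interleaved spatial slices that are concatenated along the channel dimension before a convolution.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Sketch of YOLOv5's Focus slice operation: (B, C, H, W) -> (B, 4C, H/2, W/2),
    followed by a convolution. Based on the public YOLOv5 implementation."""
    def __init__(self, in_channels, out_channels, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * in_channels, out_channels, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
        )

    def forward(self, x):
        # Four interleaved spatial slices, concatenated on the channel axis.
        patches = torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1
        )
        return self.conv(patches)

# Example: a 640x640 RGB page image becomes a (1, 32, 320, 320) feature map.
# print(Focus(3, 32)(torch.randn(1, 3, 640, 640)).shape)
```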
Fig.4  ECA module[26]
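Fig.4 refers to the ECA channel attention module of Wang et al.[26]. Below is a minimal PyTorch sketch of that mechanism (global average pooling, a 1D convolution across channels, and a sigmoid gate); the adaptive kernel-size rule follows the ECA-Net paper, but the code is an illustrative reimplementation, not the authors' released code.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Sketch of Efficient Channel Attention (Wang et al., CVPR 2020[26]).
    Channel weights come from a 1D convolution over the pooled channel descriptor."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size: nearest odd value derived from the channel count.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # (B, C, H, W) -> (B, C, 1, 1) channel descriptor.
        y = self.pool(x)
        # 1D convolution across channels models local cross-channel interaction.
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        return x * self.sigmoid(y)  # re-weight each channel of the input
```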
Fig.5  Structures of FPN, PAN, and BiFPN[27]
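Fig.5 contrasts FPN, PAN, and BiFPN[27]. The distinctive BiFPN ingredient is fast normalized (weighted) feature fusion; the sketch below shows only that fusion step, as an illustration of what replacing PAN with BiFPN adds, and is not the paper's released code.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Sketch of BiFPN's fast normalized fusion (Tan et al., EfficientDet[27]):
    inputs are combined with learnable, ReLU-clamped weights normalized to sum
    to one, so the network learns how much each feature level contributes."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # All inputs are assumed to share the same shape (after resizing / 1x1 conv).
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * xi for wi, xi in zip(w, inputs))

# Example: fuse a top-down feature with the same-level lateral feature.
# fused = FastNormalizedFusion(2)([p4_td, p4_in])
```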
Fig.6  Dataset for academic figure and table identification and extraction
Fig.7  LabelImage annotation example
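Fig.7 shows an annotation example from the labeling tool. Assuming the boxes were exported in the YOLO txt format that YOLOv5 consumes (one "class x_center y_center width height" line per object, normalized to [0, 1]), the sketch below converts such a line back to pixel coordinates; the class mapping and example values are hypothetical, not taken from the paper's dataset.

```python
# Minimal sketch: decode one YOLO-format label line back to pixel coordinates.
# Assumption (not shown in the paper): annotations are stored in YOLO txt format,
# i.e. "class x_center y_center width height" with values normalized to [0, 1].
CLASS_NAMES = {0: "figure", 1: "table"}  # hypothetical class mapping

def yolo_line_to_box(line, img_w, img_h):
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    x1, y1 = xc - w / 2, yc - h / 2
    return CLASS_NAMES[int(cls)], (round(x1), round(y1), round(x1 + w), round(y1 + h))

# Hypothetical label line for a page image rendered at 1654x2339 pixels:
print(yolo_line_to_box("0 0.5 0.3 0.6 0.25", 1654, 2339))
```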
Fig.8  Information distribution of the labeled dataset
Category  Parameter
Operating system  Windows 10
CPU  Intel Core i7 9700K
GPU  NVIDIA GeForce RTX2080Ti
SSD  500GB
Python  Python 3.8
CUDA  CUDA 11.3
PyTorch  PyTorch 1.11
OpenCV  OpenCV 4.3.2
Table 1  Experimental environment configuration
Fig.9  Loss function curves
Fig.10  Model training results
Algorithm mAP/% F1/% Precision/% Recall/%
Faster R-CNN 92.35 93.24 92.83 93.65
SSD 95.28 97.89 96.58 99.23
YOLOv3 93.04 93.70 93.16 94.25
YOLOv4 93.25 94.29 93.87 94.72
YOLOv5-ECA-BiFPN 99.47 99.88 99.84 99.93
Table 3  Performance comparison with baseline algorithms
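The F1 values in Table 3 follow from the reported precision and recall via the harmonic mean F1 = 2PR/(P+R); the short check below reproduces the 99.88% figure for YOLOv5-ECA-BiFPN and the 1.99-percentage-point gap to the next best baseline (SSD).

```python
# Sanity check of Table 3: F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(99.84, 99.93), 2))          # ~99.88 (YOLOv5-ECA-BiFPN)
print(round(f1(99.84, 99.93) - 97.89, 2))  # ~1.99 points above SSD's F1
```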
Fig.11  Recognition results under noisy image interference
Fig.12  Recognition results on low-resolution images
Fig.13  Recognition results on unevenly distributed images
Fig.14  Recognition results on diverse images
Fig.15  Extraction results for multiple types of images
Fig.16  Extraction results for figures and tables with semantic deviation
Fig.17  Comparison of recognition results on vector images
Fig.18  Table recognition results
[1] Ding Pei. The Technical Framework and Research Progress of Knowledge Discovery in Academic Figures and Tables[J]. Library and Information Service, 2021, 65(23): 136-148.
doi: 10.13266/j.issn.0252-3116.2021.23.015
[2] Liu Y L, Si C K, Jin K, et al. FCENet: An Instance Segmentation Model for Extracting Figures and Captions From Material Documents[J]. IEEE Access, 2020, 9: 551-564.
doi: 10.1109/Access.6287639
[3] Clark C, Divvala S. PDFFigures 2.0: Mining Figures from Research Papers[C]// Proceedings of 2016 IEEE/ACM Joint Conference on Digital Libraries. 2016: 143-152.
[4] Yu Fengchang, Lu Wei. Constructing Data Set for Location Annotations of Academic Literature Figures and Tables[J]. Data Analysis and Knowledge Discovery, 2020, 4(6): 35-42.
[5] Glyph & Cog. Xpdf[EB/OL]. [2022-09-13]. http://www.xpdfreader.com.
[6] Choudhury S R, Giles C L. An Architecture for Information Extraction from Figures in Digital Libraries[C]// Proceedings of the 24th International Conference on World Wide Web. 2015: 667-672.
[7] Simon A, Pret J C, Johnson A P. A Fast Algorithm for Bottom-Up Document Layout Analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(3): 273-277.
doi: 10.1109/34.584106
[8] Apache Software Foundation. Apache PDFBox[EB/OL]. [2022-05-13]. https://pdfbox.apache.org.
[9] Yusuke S. PDFMiner[EB/OL]. [2022-09-13]. https://github.com/euske/pdfminer.
[10] Hassan T. Object-Level Document Analysis of PDF Files[C]// Proceedings of the 9th ACM Symposium on Document Engineering. 2009: 47-55.
[11] Yu Fengchang, Cheng Qikai, Lu Wei. Locating Academic Literature Figures and Tables with Geometric Object Clustering[J]. Data Analysis and Knowledge Discovery, 2021, 5(1): 140-149.
[12] Praczyk P A, Nogueras-Iso J. Automatic Extraction of Figures from Scientific Publications in High-Energy Physics[J]. Information Technology and Libraries, 2013, 32(4): 25-52.
doi: 10.6017/ital.v32i4.3670
[13] Li P Y, Jiang X Y, Shatkay H. Figure and Caption Extraction from Biomedical Documents[J]. Bioinformatics, 2019, 35(21): 4381-4388.
doi: 10.1093/bioinformatics/btz228 pmid: 30949681
[14] Siegel N, Lourie N, Power R, et al. Extracting Scientific Figures with Distantly Supervised Neural Networks[C]// Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries. 2018: 223-232.
[15] Li P Y, Jiang X Y, Shatkay H. Figure and Caption Extraction from Biomedical Documents[J]. Bioinformatics, 2019, 35(21): 4381-4388.
doi: 10.1093/bioinformatics/btz228 pmid: 30949681
[16] Chen K, Seuret M, Liwicki M, et al. Page Segmentation of Historical Document Images with Convolutional Autoencoders[C]// Proceedings of the 13th International Conference on Document Analysis and Recognition. 2015: 1011-1015.
[17] Amin A, Shiu R. Page Segmentation and Classification Utilizing Bottom-Up Approach[J]. International Journal of Image and Graphics, 2001, 1(2): 345-361.
doi: 10.1142/S0219467801000219
[18] Mehri M, Héroux P, Gomez-Krämer P, et al. Texture Feature Benchmarking and Evaluation for Historical Document Image Analysis[J]. International Journal on Document Analysis and Recognition, 2017, 20(1): 1-35.
doi: 10.1007/s10032-016-0278-y
[19] Ha J, Haralick R M, Phillips I T. Recursive X-Y Cut Using Bounding Boxes of Connected Components[C]// Proceedings of the 3rd International Conference on Document Analysis and Recognition. 1995: 952-955.
[20] Zhang Jiandong, Chen Shiji, Xu Xiaoting, et al. Extracting PDF Tables Based on Word Vectors[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 34-44.
[21] Hassan T, Baumgartner R. Table Recognition and Understanding from PDF Files[C]// Proceedings of the 9th International Conference on Document Analysis and Recognition. 2007: 1143-1147.
[22] Tang Rui, Deng Jianxin, Ye Zhixing, et al. Survey of Table Extraction in PDF Documents[J]. Computer Applications and Software, 2021, 38(7): 1-7.
[23] Ultralytics. YOLOv5[OL].[2022-11-12]. https://github.com/ultralytics/yolov5.
[24] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 936-944.
[25] Liu S, Qi L, Qin H F, et al. Path Aggregation Network for Instance Segmentation[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 8759-8768.
[26] Wang Q L, Wu B G, Zhu P F, et al. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11531-11539.
[27] Tan M X, Pang R M, Le Q V. EfficientDet: Scalable and Efficient Object Detection[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10778-10787.