Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (11): 158-171    DOI: 10.11925/infotech.2096-3467.2022.1026
Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN
Li Yingqun1,2, Li Yafei1,2, Pei Lei1,2, Hu Zhiwei1,2, Song Ningyuan1,2
1School of Information Management, Nanjing University, Nanjing 210023, China
2Laboratory of Data Intelligence and Cross Innovation, Nanjing University, Nanjing 210023, China
Abstract  

[Objective] This paper aims to accurately identify and extract figures and tables from academic literature, thereby promoting the dissemination of academic achievements. [Methods] First, we introduced the ECA channel attention module into the YOLOv5 algorithm and replaced its PAN module with BiFPN. Then, we randomly selected 1,300 scholarly articles from thirteen subjects as experimental data and converted them to high-quality images using poppler-0.68.0. Finally, we evaluated the performance of the new algorithm on this dataset. [Results] Compared with the second-best algorithm (SSD), the new model improved the F1 value by 1.99 percentage points, to 99.88%, on this dataset. [Limitations] The scope and quantity of the annotated data need to be expanded to cover more scenarios. [Conclusions] YOLOv5-ECA-BiFPN can effectively improve the recognition of figures and tables in academic journals.
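
The conversion step described in the Methods could be sketched roughly as follows, by calling poppler's `pdftoppm` command-line tool from Python. The resolution (300 dpi), the PNG output format, and the function and path names are assumptions made for illustration; the abstract does not state the settings the authors actually used.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, out_prefix, dpi=300):
    """Build a pdftoppm command that renders every page of a PDF to PNG.

    dpi=300 is an assumed value; the abstract only says "high-quality images".
    """
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def convert_pdf_to_images(pdf_path, out_dir, dpi=300):
    """Render a PDF to one PNG per page and return the sorted image paths."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm (poppler) was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem
    # pdftoppm appends "-<page number>.png" to the given prefix.
    subprocess.run(build_pdftoppm_command(pdf_path, out_dir / stem, dpi), check=True)
    return sorted(out_dir.glob(f"{stem}-*.png"))
```

Calling `convert_pdf_to_images("paper.pdf", "pages")` (hypothetical file names) would write one image per page into `pages/`, ready to be annotated or passed to a detector.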

Key words: Academic Journal Literature; YOLOv5-ECA-BiFPN; Academic Figures and Tables
Received: 08 September 2022      Published: 22 March 2023
CLC Number (ZTFLH): TP391; G256
Fund: Research Project of Laboratory of Data Intelligence and Cross Innovation of Nanjing University
Corresponding Author: Li Yafei, ORCID: 0000-0003-1754-2300, E-mail: dg20140013@smail.nju.edu.cn

Cite this article:

Li Yingqun, Li Yafei, Pei Lei, Hu Zhiwei, Song Ningyuan. Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN. Data Analysis and Knowledge Discovery, 2023, 7(11): 158-171.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1026     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I11/158

Technical Framework
YOLOv5 Algorithm[23]
Slicing Operation of Focus Module
ECA Module[26]
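
For readers unfamiliar with the ECA module, the sketch below illustrates the idea from the cited ECA-Net paper [26]: per-channel global averages are passed through a small 1D convolution across neighbouring channels, then a sigmoid, to produce per-channel weights. The defaults `gamma=2` and `b=1` follow the ECA-Net paper; the convolution kernel is learned in practice, so the fixed kernel passed in here is purely hypothetical, and this page does not say which settings the article used.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size from ECA-Net: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_weights(channel_means, kernel):
    """Compute per-channel attention weights.

    channel_means: global average of each channel's feature map.
    kernel: odd-length 1D convolution weights (hypothetical values here).
    """
    k = len(kernel)
    pad = k // 2
    # Zero-pad so that every channel has a full neighbourhood.
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        s = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid
    return weights
```

Each feature map would then be multiplied by its weight, so that informative channels are emphasised at very small parameter cost.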
FPN, PAN and BiFPN[27]
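
One ingredient that distinguishes BiFPN from FPN and PAN in the cited EfficientDet paper [27] is its weighted feature fusion, where each input feature receives a learnable non-negative weight and the weights are normalized by their sum. The sketch below shows this "fast normalized fusion" on plain Python lists. The function name is hypothetical, the weights would be learned during training, and `eps=1e-4` follows the EfficientDet paper rather than anything stated on this page.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors as sum(w_i * f_i) / (eps + sum(w_i)).

    Weights are clipped at zero (ReLU) so that each one stays non-negative.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    relu_w = [max(0.0, w) for w in weights]
    total = sum(relu_w) + eps
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(relu_w, features)) / total
        for i in range(length)
    ]
```

With equal weights the result approaches a plain average of the inputs; unequal weights let the network favour one resolution level over another.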
Data Set
Example of Annotation
Distribution of Label Dataset
Category           Parameter
Operating system   Windows 10
CPU                Intel Core i7 9700K
GPU                NVIDIA GeForce RTX2080Ti
Solid-state drive  500GB
Python             Python 3.8
CUDA               CUDA 11.3
PyTorch            PyTorch 1.11
OpenCV             OpenCV 4.3.2
Experimental Environment
Loss Function Curve
The Training Results
Algorithm         mAP/%  F1/%   Precision/%  Recall/%
Faster R-CNN      92.35  93.24  92.83        93.65
SSD               95.28  97.89  96.58        99.23
YOLOv3            93.04  93.70  93.16        94.25
YOLOv4            93.25  94.29  93.87        94.72
YOLOv5-ECA-BiFPN  99.47  99.88  99.84        99.93
Performance Comparison with Baseline Algorithms
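
As a quick consistency check on the table above, F1 can be recomputed as the harmonic mean of precision and recall. The snippet below is a minimal sketch using the reported percentages; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the comparison table.
reported = {
    "Faster R-CNN": (92.83, 93.65, 93.24),
    "SSD": (96.58, 99.23, 97.89),
    "YOLOv3": (93.16, 94.25, 93.70),
    "YOLOv4": (93.87, 94.72, 94.29),
    "YOLOv5-ECA-BiFPN": (99.84, 99.93, 99.88),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}, reported F1 = {f1:.2f}")
```

The 1.99-point gain quoted in the abstract corresponds to the difference between the reported F1 of YOLOv5-ECA-BiFPN (99.88) and that of SSD (97.89).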
Recognition Effect Under Noise Image Interference
Recognition Effect of Low-Pixel Image
Recognition Effect of Uneven Distribution Image
Recognition Effect of Diversified Image
Extraction Results of Multi-Type Images
Recognition Effect of Semantic Deviation Image
Recognition Effect of Vector Image
Recognition Effect of Table
[1] Ding Pei. The Technical Framework and Research Progress of Knowledge Discovery in Academic Figures and Tables[J]. Library and Information Service, 2021, 65(23): 136-148. (in Chinese)
doi: 10.13266/j.issn.0252-3116.2021.23.015
[2] Liu Y L, Si C K, Jin K, et al. FCENet: An Instance Segmentation Model for Extracting Figures and Captions From Material Documents[J]. IEEE Access, 2020, 9: 551-564.
doi: 10.1109/Access.6287639
[3] Clark C, Divvala S. PDFFigures 2.0: Mining Figures from Research Papers[C]// Proceedings of 2016 IEEE/ACM Joint Conference on Digital Libraries. 2016: 143-152.
[4] Yu Fengchang, Lu Wei. Constructing Data Set for Location Annotations of Academic Literature Figures and Tables[J]. Data Analysis and Knowledge Discovery, 2020, 4(6): 35-42. (in Chinese)
[5] Glyph & Cog. Xpdf[EB/OL]. [2022-09-13]. http://www.xpdfreader.com.
[6] Choudhury S R, Giles C L. An Architecture for Information Extraction from Figures in Digital Libraries[C]// Proceedings of the 24th International Conference on World Wide Web. 2015: 667-672.
[7] Simon A, Pret J C, Johnson A P. A Fast Algorithm for Bottom-Up Document Layout Analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(3): 273-277.
doi: 10.1109/34.584106
[8] Apache Software Foundation. Apache PDFBox[EB/OL]. [2022-05-13]. https://pdfbox.apache.org.
[9] Yusuke S. PDFMiner[EB/OL]. [2022-09-13]. https://github.com/euske/pdfminer.
[10] Hassan T. Object-Level Document Analysis of PDF Files[C]// Proceedings of the 9th ACM Symposium on Document Engineering. 2009: 47-55.
[11] Yu Fengchang, Cheng Qikai, Lu Wei. Locating Academic Literature Figures and Tables with Geometric Object Clustering[J]. Data Analysis and Knowledge Discovery, 2021, 5(1): 140-149. (in Chinese)
[12] Praczyk P A, Nogueras-Iso J. Automatic Extraction of Figures from Scientific Publications in High-Energy Physics[J]. Information Technology and Libraries, 2013, 32(4): 25-52.
doi: 10.6017/ital.v32i4.3670
[13] Li P Y, Jiang X Y, Shatkay H. Figure and Caption Extraction from Biomedical Documents[J]. Bioinformatics, 2019, 35(21): 4381-4388.
doi: 10.1093/bioinformatics/btz228 pmid: 30949681
[14] Siegel N, Lourie N, Power R, et al. Extracting Scientific Figures with Distantly Supervised Neural Networks[C]// Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries. 2018: 223-232.
[15] Li P Y, Jiang X Y, Shatkay H. Figure and Caption Extraction from Biomedical Documents[J]. Bioinformatics, 2019, 35(21): 4381-4388.
doi: 10.1093/bioinformatics/btz228 pmid: 30949681
[16] Chen K, Seuret M, Liwicki M, et al. Page Segmentation of Historical Document Images with Convolutional Autoencoders[C]// Proceedings of the 13th International Conference on Document Analysis and Recognition. 2015: 1011-1015.
[17] Amin A, Shiu R. Page Segmentation and Classification Utilizing Bottom-Up Approach[J]. International Journal of Image and Graphics, 2001, 1(2): 345-361.
doi: 10.1142/S0219467801000219
[18] Mehri M, Héroux P, Gomez-Krämer P, et al. Texture Feature Benchmarking and Evaluation for Historical Document Image Analysis[J]. International Journal on Document Analysis and Recognition, 2017, 20(1): 1-35.
doi: 10.1007/s10032-016-0278-y
[19] Ha J, Haralick R M, Phillips I T. Recursive X-Y Cut Using Bounding Boxes of Connected Components[C]// Proceedings of the 3rd International Conference on Document Analysis and Recognition. 1995: 952-955.
[20] Zhang Jiandong, Chen Shiji, Xu Xiaoting, et al. Extracting PDF Tables Based on Word Vectors[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 34-44. (in Chinese)
[21] Hassan T, Baumgartner R. Table Recognition and Understanding from PDF Files[C]// Proceedings of the 9th International Conference on Document Analysis and Recognition. 2007: 1143-1147.
[22] Tang Rui, Deng Jianxin, Ye Zhixing, et al. Survey of Table Extraction in PDF Documents[J]. Computer Applications and Software, 2021, 38(7): 1-7. (in Chinese)
[23] Ultralytics. YOLOv5[OL]. [2022-11-12]. https://github.com/ultralytics/yolov5.
[24] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 936-944.
[25] Liu S, Qi L, Qin H F, et al. Path Aggregation Network for Instance Segmentation[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 8759-8768.
[26] Wang Q L, Wu B G, Zhu P F, et al. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11531-11539.
[27] Tan M X, Pang R M, Le Q V. EfficientDet: Scalable and Efficient Object Detection[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10778-10787.