Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN
Li Yingqun1,2, Li Yafei1,2, Pei Lei1,2, Hu Zhiwei1,2, Song Ningyuan1,2
1 School of Information Management, Nanjing University, Nanjing 210023, China
2 Laboratory of Data Intelligence and Cross Innovation, Nanjing University, Nanjing 210023, China
[Objective] This paper aims to accurately identify and extract figures and tables from academic literature in order to promote the dissemination of academic achievements.
[Methods] First, we introduced the ECA (Efficient Channel Attention) module into the YOLOv5 algorithm and replaced its PAN (Path Aggregation Network) module with BiFPN. Then, we randomly selected 1,300 scholarly articles from 13 disciplines as experimental data and converted them into high-quality images using poppler 0.68.0. Finally, we evaluated the performance of the new algorithm on this dataset.
[Results] On this dataset, the F1 score of the new model improved by 1.99% over the second-best algorithm, reaching 99.88%.
[Limitations] The scope and quantity of annotated data need to be expanded to cover more scenarios.
[Conclusions] YOLOv5-ECA-BiFPN can effectively improve the recognition of figures and tables in academic journal articles.
Li Yingqun, Li Yafei, Pei Lei, Hu Zhiwei, Song Ningyuan. Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN. Data Analysis and Knowledge Discovery, 2023, 7(11): 158-171.
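The ECA module named in [Methods] computes channel attention with a cheap 1D convolution over pooled channel descriptors instead of fully connected layers. The sketch below follows the formulation published with ECA-Net (global average pooling, a 1D convolution across channels whose kernel size k adapts to the channel count C, a sigmoid gate, then channel-wise rescaling), written in plain Python so it runs without a deep-learning framework. It is not the authors' code: this excerpt does not state the kernel size or where the module sits in YOLOv5, and the uniform convolution weights are a hypothetical placeholder for learned ones.

```python
import math
from typing import List, Optional, Tuple

FeatureMap = List[List[List[float]]]  # C channels, each an H x W grid


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |(log2(C) + b) / gamma|, forced to be odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map: FeatureMap,
        conv_weights: Optional[List[float]] = None,
        gamma: int = 2, b: int = 1) -> Tuple[FeatureMap, List[float]]:
    """Rescale each channel by a weight computed from its k neighbouring channels."""
    c = len(feature_map)
    k = eca_kernel_size(c, gamma, b)
    if conv_weights is None:
        # Hypothetical placeholder: uniform weights stand in for learned ones.
        conv_weights = [1.0 / k] * k
    if len(conv_weights) != k:
        raise ValueError(f"expected {k} conv weights, got {len(conv_weights)}")

    # 1. Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]

    # 2. 1D convolution across the channel axis (zero padding, no bias),
    #    so each channel only interacts with its k nearest neighbours.
    pad = (k - 1) // 2
    attn = []
    for i in range(c):
        s = sum(conv_weights[j] * pooled[i + j - pad]
                for j in range(k) if 0 <= i + j - pad < c)
        # 3. Sigmoid gate maps the response into (0, 1).
        attn.append(1.0 / (1.0 + math.exp(-s)))

    # 4. Channel-wise rescaling of the input feature map.
    out = [[[v * attn[i] for v in row] for row in ch]
           for i, ch in enumerate(feature_map)]
    return out, attn


if __name__ == "__main__":
    print(eca_kernel_size(16), eca_kernel_size(256))  # prints "3 5"
    fm = [[[2.0, 2.0], [2.0, 2.0]] for _ in range(16)]
    _, weights = eca(fm)
    print([round(w, 3) for w in weights[:3]])
```

Because k grows only logarithmically with C, the module adds just k parameters per insertion point, which is why it is attractive for a real-time detector such as YOLOv5.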
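BiFPN, which replaces PAN in the proposed model, keeps PAN's top-down plus bottom-up fusion but adds extra same-level connections and fuses each node's inputs with learnable, normalized weights. A minimal sketch of the "fast normalized fusion" rule from EfficientDet, O = Σᵢ wᵢ·Iᵢ / (ε + Σⱼ wⱼ) with each wᵢ kept non-negative by a ReLU, is shown below. Feature maps are flattened to plain lists for illustration; a real BiFPN node also passes the fused result through a depthwise separable convolution, batch normalization and an activation, which are omitted. The names `p6_in`, `p7_in` and the weight values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]  # a feature map flattened to 1D for illustration


def fast_normalized_fusion(inputs: Sequence[Vector], weights: Sequence[float],
                           eps: float = 1e-4) -> Vector:
    """Weighted average of same-shaped inputs with ReLU-clipped, normalized weights."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    length = len(inputs[0])
    if any(len(x) != length for x in inputs):
        raise ValueError("inputs must be resized to the same shape before fusion")
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps every weight >= 0
    denom = eps + sum(w)                  # eps avoids division by zero
    return [sum(wi * x[n] for wi, x in zip(w, inputs)) / denom
            for n in range(length)]


def upsample_nearest_2x(x: Vector) -> Vector:
    """Nearest-neighbour 2x upsampling of a 1D feature (stand-in for Resize)."""
    return [v for v in x for _ in range(2)]


if __name__ == "__main__":
    p6_in = [1.0, 2.0, 3.0, 4.0]
    p7_in = [10.0, 20.0]
    # Top-down node: fuse P6 with the upsampled coarser level P7.
    p6_td = fast_normalized_fusion([p6_in, upsample_nearest_2x(p7_in)], [1.0, 0.5])
    print([round(v, 3) for v in p6_td])
```

EfficientDet chose this normalization over a softmax across the weights because it behaves similarly while being cheaper to compute.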
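The data-preparation step converts each article into page images with poppler, whose `pdftoppm` tool rasterizes PDF pages. The helper below only assembles a `pdftoppm` command line (`-r` sets the resolution in DPI, `-png` selects PNG output, `-f` and `-l` limit the first and last page) and hands it to `subprocess.run`. The function names and the 300 DPI default are assumptions for illustration; the excerpt states only that poppler 0.68.0 was used to produce high-quality images.

```python
import shlex
import subprocess
from typing import List, Optional


def build_pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 300,
                           first_page: Optional[int] = None,
                           last_page: Optional[int] = None) -> List[str]:
    """Argument list for poppler's pdftoppm: one PNG per page at the given DPI."""
    cmd = ["pdftoppm", "-r", str(dpi), "-png"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, out_prefix]  # input PDF, then the output file-name root
    return cmd


def convert_pdf_to_images(pdf_path: str, out_prefix: str, dpi: int = 300) -> None:
    """Run pdftoppm (requires poppler on PATH); raises CalledProcessError on failure."""
    subprocess.run(build_pdftoppm_command(pdf_path, out_prefix, dpi), check=True)


if __name__ == "__main__":
    # Show the command without running it.
    print(shlex.join(build_pdftoppm_command("paper.pdf", "pages/paper")))
    # prints "pdftoppm -r 300 -png paper.pdf pages/paper"
```

Passing an argument list rather than a shell string avoids quoting problems with article file names that contain spaces.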
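The F1 scores in [Results] measure detection quality, so a predicted box must first be matched to a ground-truth box before it counts as a true positive; F1 is then 2PR / (P + R) for precision P and recall R. The sketch below uses one common protocol: predictions are visited in descending confidence, and each is matched to the unmatched ground-truth box of the same class with the highest IoU, provided that IoU is at least 0.5. The threshold, the matching rule and the labels "figure" and "table" are assumptions; the excerpt does not state the authors' exact evaluation protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds: List[Tuple[str, float, Box]],
                 gts: List[Tuple[str, Box]],
                 iou_thr: float = 0.5) -> Tuple[float, float, float]:
    """Greedy matching; returns (precision, recall, F1)."""
    matched = [False] * len(gts)
    tp = 0
    for label, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_j = 0.0, -1
        for j, (g_label, g_box) in enumerate(gts):
            if matched[j] or g_label != label:
                continue
            overlap = iou(box, g_box)
            if overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= iou_thr:
            matched[best_j] = True
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gts = [("figure", (0, 0, 10, 10)), ("table", (20, 20, 40, 30))]
    preds = [("figure", 0.9, (0, 0, 10, 10)),
             ("table", 0.8, (20, 20, 40, 30)),
             ("figure", 0.3, (50, 50, 60, 60))]  # one false positive
    print(detection_f1(preds, gts))
```

With two correct detections and one spurious box, precision is 2/3 and recall is 1, giving F1 = 0.8; a score of 99.88% therefore leaves very little room for either missed or spurious figures and tables.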