Constructing Data Set for Location Annotations of Academic Literature Figures and Tables
Yu Fengchang, Lu Wei
School of Information Management, Wuhan University, Wuhan 430072, China
|
Abstract [Objective] This study proposes a size-adaptive template matching algorithm to quickly construct a large-scale data set of figure and table positions in academic literature. [Methods] First, we used the PubMed Open Access database to retrieve documents containing figure/table images and parsed their contents. Then, we matched the document pages with the figure/table images and extracted their features. Finally, we identified the figure/table positions from the matched feature points. [Results] The proposed method's precision and F1 score reached 98.87% and 97.44%, respectively. [Limitations] We only used simple keywords to match literature pages with figure/table images. [Conclusions] The proposed algorithm can quickly construct data sets of figure and table positions in academic literature.
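The final step, turning matched feature points into a figure/table position, can be illustrated with a short sketch. This is not the authors' implementation: it assumes the feature points have already been extracted and matched (for example with SIFT keypoints), assumes the figure appears on the page only scaled and shifted (no rotation), and the function names `estimate_scale_translation` and `locate_figure` are hypothetical. Fitting a single scale factor by least squares absorbs the size difference between the stored image and its rendering on the page, which is one plausible reading of "size-adaptive".

```python
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_scale_translation(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of dst = s * src + (tx, ty): uniform scale plus shift, no rotation."""
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    # Centroids of the matched points in the figure image and on the page.
    msx = sum(x for x, _ in src) / n
    msy = sum(y for _, y in src) / n
    mdx = sum(x for x, _ in dst) / n
    mdy = sum(y for _, y in dst) / n
    num = 0.0
    den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        num += (sx - msx) * (dx - mdx) + (sy - msy) * (dy - mdy)
        den += (sx - msx) ** 2 + (sy - msy) ** 2
    if den == 0.0:
        raise ValueError("figure-side points are all identical")
    s = num / den
    # With the scale fixed, the best offset aligns the two centroids.
    return s, mdx - s * msx, mdy - s * msy


def locate_figure(src: List[Point], dst: List[Point],
                  fig_w: float, fig_h: float) -> Tuple[float, float, float, float]:
    """Return the figure's bounding box (x0, y0, x1, y1) in page coordinates."""
    s, tx, ty = estimate_scale_translation(src, dst)
    # Map the image corners (0, 0) and (fig_w, fig_h) onto the page.
    return (tx, ty, tx + s * fig_w, ty + s * fig_h)


if __name__ == "__main__":
    # Hypothetical example: a 400x300 figure rendered at half size at page offset (100, 200).
    fig_pts = [(0, 0), (400, 0), (200, 150), (400, 300)]
    page_pts = [(100, 200), (300, 200), (200, 275), (300, 350)]
    print(locate_figure(fig_pts, page_pts, 400, 300))  # (100.0, 200.0, 300.0, 350.0)
```

Real feature matches contain outliers, so in practice a robust estimator such as RANSAC would be applied before, or instead of, this plain least-squares fit.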
Received: 13 December 2019
Published: 23 April 2020
|
Corresponding Author:
Lu Wei
E-mail: weilu@whu.edu.cn