Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (10): 21-31    DOI: 10.11925/infotech.2096-3467.2017.0491
Original Article
Summarizing Figures of Chinese Scholarly Articles of Library and Information Science
Bao Chuhan1, Jia Danping1, He Lin1,2, Ma Xiaowen1, Ai Yuxi1
1College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
2Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095, China
Abstract  

[Objective] This paper studies the figures of Chinese articles in the field of library and information science (LIS), aiming to establish new principles to summarize them. [Methods] We proposed the framework and rules for figure summarization based on manual indexing and features of LIS papers. Then, we evaluated the performance of the new system with the help of SPSS. [Results] Compared with the existing figure-text model, our method could more effectively process information from the figures. [Limitations] We need to extract more information from the figures, analyze the influences of different charts, and add automatic indexing functions to the new system. [Conclusions] The proposed method could effectively summarize figures from the scholarly articles.

Key words: Figure Indexing; Abstract in Chinese; Likert Scale
Received: 31 May 2017      Published: 08 November 2017
ZTFLH:  G25  

Cite this article:

Bao Chuhan,Jia Danping,He Lin,Ma Xiaowen,Ai Yuxi. Summarizing Figures of Chinese Scholarly Articles of Library and Information Science. Data Analysis and Knowledge Discovery, 2017, 1(10): 21-31.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0491     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I10/21

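| Figure-text combination mode | Information comprehension | Figure comprehension efficiency | Information coverage | Confidence |
|---|---|---|---|---|
| Figure + title | | | | |
| Figure + title + abstract | | | | |
| Figure + full text | | | | |
| Figure + figure summary | | | | |
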
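Dependent variables: overall rating and confidence

| Source | Overall rating: Type III sum of squares | df | Mean square | F | Sig. | Confidence: Type III sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Corrected model | 1200.014 | 3 | 400.005 | 318.681 | 0 | 163.569 | 3 | 54.523 | 20.365 | 0 |
| Intercept | 57355.779 | 1 | 57355.779 | 45694.903 | 0 | 64489.341 | 1 | 64489.341 | 24087.159 | 0 |
| Type | 1200.014 | 3 | 400.005 | 318.681 | 0 | 163.569 | 3 | 54.523 | 20.365 | 0 |
| Error | 1501.207 | 1196 | 1.255 | | | 3202.090 | 1196 | 2.677 | | |
| Total | 60057.000 | 1200 | | | | 67855.000 | 1200 | | | |
| Corrected total | 2701.221 | 1199 | | | | 3365.659 | 1199 | | | |
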
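As a quick arithmetic check on the between-subjects effects table above: each mean square is a sum of squares divided by its degrees of freedom, and F is the effect mean square divided by the error mean square. The Python sketch below recomputes F from the reported sums of squares. The helper name `f_statistic` is hypothetical and not from the paper, which ran its analysis in SPSS. Differences in the last digit are expected because the table values are rounded.

```python
def f_statistic(ss_effect, df_effect, ss_error, df_error):
    """Return (mean square of effect, mean square of error, F ratio)."""
    ms_effect = ss_effect / df_effect   # e.g. 1200.014 / 3
    ms_error = ss_error / df_error      # e.g. 1501.207 / 1196
    return ms_effect, ms_error, ms_effect / ms_error


# Sums of squares and degrees of freedom copied from the table above.
overall = f_statistic(1200.014, 3, 1501.207, 1196)
confidence = f_statistic(163.569, 3, 3202.090, 1196)

for name, (ms_effect, ms_error, f_value) in [("overall rating", overall),
                                             ("confidence", confidence)]:
    print(f"{name}: MS_effect={ms_effect:.3f}, "
          f"MS_error={ms_error:.3f}, F={f_value:.3f}")
```

The recomputed ratios agree with the reported F values of 318.681 and 20.365 to within rounding.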
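| Figure-text combination mode | Overall rating (mean ± SD) | Confidence (mean ± SD) |
|---|---|---|
| Figure + title | 5.38±1.25 | 6.88±1.65 |
| Figure + title + abstract | 6.68±1.13 | 7.33±1.58 |
| Figure + figure summary (constructed in this study) | 7.62±1.18 | 7.18±1.79 |
| Figure + full text | 7.96±0.89 | 7.37±1.68 |
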
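| (I) Type | (J) Type | Mean difference (I-J) | Std. error | Sig. |
|---|---|---|---|---|
| Figure + title | Figure + title + abstract | -1.297 | 0.091 | 0.001 |
| | Figure + full text | -2.239 | 0.091 | 0 |
| | Figure + figure summary | -2.580 | 0.091 | 0 |
| Figure + title + abstract | Figure + title | 1.297 | 0.091 | 0.001 |
| | Figure + full text | -0.942 | 0.091 | 0.006 |
| | Figure + figure summary | -1.283 | 0.091 | 0 |
| Figure + full text | Figure + title | 2.239 | 0.091 | 0 |
| | Figure + title + abstract | 0.942 | 0.091 | 0.006 |
| | Figure + figure summary | -0.341 | 0.091 | 1.000 |
| Figure + figure summary | Figure + title | 2.580 | 0.091 | 0 |
| | Figure + title + abstract | 1.283 | 0.091 | 0 |
| | Figure + full text | 0.341 | 0.091 | 1.000 |
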
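| (I) Type | (J) Type | Information comprehension: mean difference (I-J) | Std. error | Sig. | Information comprehension efficiency: mean difference (I-J) | Std. error | Sig. | Information coverage: mean difference (I-J) | Std. error | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Figure + title | Figure + title + abstract | -1.597 | 0.124 | 0 | -0.937 | 0.130 | 0.639 | -1.417 | 0.112 | 0 |
| | Figure + full text | -3.127 | 0.124 | 0 | -0.323 | 0.130 | 1.000 | -4.293 | 0.112 | 0 |
| | Figure + figure summary | -2.697 | 0.124 | 0 | -2.260 | 0.130 | 0 | -2.987 | 0.112 | 0 |
| Figure + title + abstract | Figure + title | 1.597 | 0.124 | 0 | 0.937 | 0.130 | 0.639 | 1.417 | 0.112 | 0 |
| | Figure + full text | -1.530 | 0.124 | 0 | 0.613 | 0.130 | 0.639 | -2.877 | 0.112 | 0 |
| | Figure + figure summary | -1.100 | 0.124 | 0.061 | -1.323 | 0.130 | 0.029 | -1.570 | 0.112 | 0.024 |
| Figure + full text | Figure + title | 3.127 | 0.124 | 0 | 0.323 | 0.130 | 1.000 | 4.293 | 0.112 | 0 |
| | Figure + title + abstract | 1.530 | 0.124 | 0 | -0.613 | 0.130 | 0.639 | 2.877 | 0.112 | 0 |
| | Figure + figure summary | 0.430 | 0.124 | 0.139 | -1.937 | 0.130 | 0 | 1.307 | 0.112 | 0 |
| Figure + figure summary | Figure + title | 2.697 | 0.124 | 0 | 2.260 | 0.130 | 0 | 2.987 | 0.112 | 0 |
| | Figure + title + abstract | 1.100 | 0.124 | 0.061 | 1.323 | 0.130 | 0.029 | 1.570 | 0.112 | 0.024 |
| | Figure + full text | -0.430 | 0.124 | 0.139 | 1.937 | 0.130 | 0 | -1.307 | 0.112 | 0 |
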