Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (1): 21-28    DOI: 10.11925/infotech.2096-3467.2017.1091
Original Article
Identifying Actual Value of Numerical Indicator from Scientific Paper
Guo Shaoqing1,2, Le Xiaoqiu1
1 (National Science Library, Chinese Academy of Sciences, Beijing 100190, China)
2 (University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract  

[Objective] This paper aims to identify the actual values of numerical indicators in scientific literature. [Methods] First, we analyzed the shortest-path tree between the indicator and the numerical entities. Second, we used distant supervision to learn the syntactic and descriptive characteristics of sentences that contain numerical indicators. Third, we created templates for four types of value relationship: “more than”, “less than”, “equal” and “times”. Finally, we obtained the actual values of the indicators. [Results] We evaluated the proposed method in the fields of climate change and astronomy. The F-values were 82.35% and 77.55%, which were above the average of related studies. [Limitations] We did not investigate indicator values that span multiple sentences. [Conclusions] The proposed method can effectively obtain the actual values of numerical indicators.

Key words: Numerical Indicator; Actual Value; Template Recognition; Distant Supervision
Received: 05 November 2017      Published: 05 February 2018
ZTFLH:  G250.76  

Cite this article:

Guo Shaoqing, Le Xiaoqiu. Identifying Actual Value of Numerical Indicator from Scientific Paper. Data Analysis and Knowledge Discovery, 2018, 2(1): 21-28.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1091     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I1/21

Value relationship type    Example
Equal to                   …annual precipitation measured in this study is 734mm…
Greater than               …temperature has risen by about 5 ℃ above yesterday…
Less than                  …CO2 concentration is 5% lower than the PM10 concentration…
Multiple                   …capacity of this bottle is 2/3 of the other one…

JJR (POS tag)    RBR (POS tag)    as…as (phrase)    Of NN (POS tag + phrase)
Above            Over             Below             Under
Twice            Thrice           Half              More
Before           Behind           Ahead             ……
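
Type                 Unit of the value                        Conversion
Greater than         Multiplicative units such as %, times    Baseline entity × ( 1 + value unit )
Greater than         Other units                              ( Baseline entity + value ) unit
Less than            Multiplicative units such as %, times    Baseline entity × ( 1 - value unit )
Less than            Other units                              ( Baseline entity - value ) unit
Multiple/fraction    All units                                Baseline entity × value [%]
Equal to             All units                                Value unit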
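
The conversion rules in the table above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `actual_value`, the relation labels, the `RATIO_UNITS` mapping and the baseline numbers in the usage lines are all assumptions introduced here, and "value unit" for % is read as value / 100.

```python
from typing import Optional

# Units the table treats as multiplicative. The numeric factor attached to
# each unit is an assumption made for this sketch.
RATIO_UNITS = {"%": 0.01, "times": 1.0}


def actual_value(relation: str, value: float, unit: str,
                 baseline: Optional[float] = None) -> float:
    """Apply one row of the conversion table to a stated value."""
    if relation == "equal":
        # Equal type: the stated value is already the actual value.
        return value
    if baseline is None:
        raise ValueError("this relation type needs a baseline entity value")
    if relation == "multiple":
        # Multiple/fraction type: baseline x value, scaled when the unit is %.
        return baseline * value * RATIO_UNITS.get(unit, 1.0)
    if relation in ("greater", "less"):
        sign = 1.0 if relation == "greater" else -1.0
        if unit in RATIO_UNITS:
            # Multiplicative unit: baseline x (1 +/- value unit).
            return baseline * (1.0 + sign * value * RATIO_UNITS[unit])
        # Any other unit: baseline +/- value, expressed in the same unit.
        return baseline + sign * value
    raise ValueError(f"unknown relation type: {relation}")


# Usage with made-up baselines (the source sentences do not give them).
print(actual_value("equal", 734, "mm"))
print(actual_value("greater", 5, "℃", baseline=20))
print(actual_value("less", 5, "%", baseline=200))
print(actual_value("multiple", 2 / 3, "", baseline=3))
```

Keeping the multiplicative units in one mapping means a new ratio unit only needs a new entry, not a new branch.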

Indicator               Unit      Indicator        Unit
Mass median diameter    mm        Survival rate    %
Vehicle speed           km·h⁻¹    Total weight     kg
Scattering angle        °         ……

Template                   Frequency    Support    Template                   Frequency    Support
NN|NP|PP[of]|NP|CD         1591         65.61%     NN|NP|VP[be]|PP|NP|CD      766          53.75%
NN|NP|PP[between]|NP|CD    228          9.41%      NN|NP|VP|PP[from]|NP|CD    510          35.79%

Pipeline                                  Precision    Recall    F-value
(1) Original recognition pipeline         78.15%       81.21%    79.11%
(2) (1) with clause judgment added        85.31%       75.62%    80.18%
(3) (1)(2) with common templates added    84.01%       80.76%    82.35%
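
The page does not define the F-value. Assuming it is the usual harmonic mean of precision and recall, the sketch below reproduces the figure reported for pipeline (3); the function name `f_value` is introduced here for illustration, and it is not confirmed that the other rows were computed in the same way.

```python
def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of pipeline (3) from the table above.
print(round(f_value(84.01, 80.76), 2))  # prints 82.35
```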