New Technology of Library and Information Service  2013, Vol. Issue (5): 1-20    DOI: 10.11925/infotech.1003-3513.2013.05.01
The Conundrum of Sharing Research Data
Christine L. Borgman
UCLA Department of Information Studies, Los Angeles, CA 90095, USA
Abstract  

Researchers are producing an unprecedented deluge of data by using new methods and instrumentation. Others may wish to mine these data for new discoveries and innovations. However, research data are not readily available, as sharing is common in only a few fields such as astronomy and genomics. Data sharing practices in other fields vary widely. Moreover, research data take many forms, are handled in many ways, using many approaches, and often are difficult to interpret once removed from their initial context. Data sharing is thus a conundrum. Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: (1) to reproduce or to verify research, (2) to make results of publicly funded research available to the public, (3) to enable others to ask new questions of extant data, and (4) to advance the state of research and innovation. These rationales differ by the arguments for sharing, by beneficiaries, and by the motivations and incentives of the many stakeholders involved. The challenges are to understand which data might be shared, by whom, with whom, under what conditions, why, and to what effects. Answers will inform data policy and practice.

Received: 24 April 2013      Published: 03 July 2013
:  G250  

Cite this article:

Christine L. Borgman. The Conundrum of Sharing Research Data. New Technology of Library and Information Service, 2013, (5): 1-20.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2013.05.01     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2013/V/I5/1

