Data at Work: Supporting Sharing in Science and Engineering (Birnholtz & Bietz, 2003)
Adam Worrall
LIS 6269 Seminar in Information Science
3/30/2010

Data and data sharing
Information science needs “a better understanding of the use of data in practice” (p. 339)
Data fundamentally “different from documents” (p. 339)
Data sharing important (p )
– “Openness” of scientific process
    Confirm findings, replicate results
    Build on previous work
– Large data sets require distributed collaboration
    Collaboratories, e-science

Data sharing problems
Collaborating and sharing of data should be encouraged
– But it “is not easy” to do so (p. 340)
Why?
– Lack of willingness to share, trust others
    Competition for “revenue” (p. 345)
    Restrictions imposed by commercial interests
    Trust of sources
    Trust of others; will they use data well? (see also Van House, 2003)

Data sharing problems
Reasons (continued)
– Problems with finding shared data
    Negotiate access
– Difficulties interpreting and using shared data
    How collected? How analyzed? What format?
    Metadata: format, encoding, controlled vocabularies, etc.
    Data quality (see also Stvilia et al., 2008; Wand & Wang, 1996)
    “Tacit” knowledge of data (p. 340)

Methodology
Three disciplines
– Earthquake engineering
– HIV / AIDS research
– Space physics
Observation and interviews in all three, surveys of earthquake engineers
Inductive, grounded approach
– Claimed they made “no assumptions about the purpose of data” (p. 340)

Data dimensions
Two dimensions identified (p. 341)
– “news” vs. “confirmation”
    Confirm existing or expected results
    Something unexpected needing further exploration
    Something not fitting expected / prevailing model
– “streams” vs. “events”
    Longitudinal vs. cross-sectional
    Context for data may change
    Rate of data different
Different disciplines, different data use

Data’s role in scientific communities
Defines boundaries between communities
– Experimental, deductive
    More possessive of data
– Theoretical, inductive
    More interested in sharing data
    More interested in using shared data
– Increasing blurring of boundaries in some fields
Provides gateway into communities
– Access to data, knowledge about data is “valuable resource” (p. 343)
– Those who control data and knowledge, and access to it, act as “gatekeepers of the field” (p. 343)

Data’s role in scientific communities
Indicates status in community
– Using one’s own data “seen as ‘better’” than using public data (p. 344)
    “Analyzing somebody else’s data … arguably ‘counts’ for less” (p. 344)
– Higher quality data means better reputation
    For researchers, research groups, and institutions
Enables indoctrination into community
– Students often work on collecting, managing data
– Degree of sharing of responsibilities differs between fields, sometimes by seniority in field

Categories of data uses (p. 345)
Identified with an eye to “revenue” from use
– Benefits: reputation, publications, funding, etc.
1. “A scientist’s data set is her [or his] castle”
– Researcher wants to and is able to use data to solve a particular problem or question
– Will increase revenue
2. “With a little help from my friends”
– Researcher wants to use data, but needs to collaborate with others in order to do so successfully
– Data can be shared privately
    Limited risk (but still some risk)
– Will increase revenue

Categories of data uses (p. 345)
3. “One scientist’s junk is another one’s treasure”
– Researcher has no interest in using the data for a particular problem, but others do have interest
– Sharing data will slightly increase revenue
– May not be worth risk of losing other revenues
4. “D’oh!”
– Researcher has not thought of a use, but it would be relevant to them and help them with a problem or question
– Sharing data could be embarrassing, decrease revenue

Categories of data use
Researchers will be less willing to share data unless incentives high, risks low
Data sharing follows social networks
Provide facilities for communication around abstractions of data sets
– Encourage sharing and collaboration (category 2)
    Extend researcher’s social network
– Reduce risks of embarrassment (category 4)
    Preliminary abstractions allow questions / comments before they are embarrassing
– Increase incentives and benefits (categories 2 & 3)
    Beyond boundaries of researcher’s community

Recommendations and conclusions
Efforts to support “social interaction around data abstractions and the data themselves” should be made (p. 346)
Metadata should be augmented through “the sharing of supplementary materials” (i.e. abstractions) (p. 346)
Consideration of the “social and scientific roles of data” and how to support them necessary in future research (p. 346)
Better understanding of data abstractions needed (p. 347)

Issues with study and article
Bias towards natural sciences
– Social scientists may use, share data differently
Only 3 disciplines studied, others may differ further
Generally coherent, but some parts hard to follow
– Indoctrination examples appeared similar, despite what authors termed “critical” distinction (p. 344)
– Promised “three aspects of the way data are used” but only discussed two dimensions (p. 341)
Limitations only discussed briefly

Questions, comments?