Using DCO Data (Infrastructure, Management, Analysis, Visualization, …) Peter (Marshall Ma) and the Data Science


1 Using DCO Data (Infrastructure, Management, Analysis, Visualization, …) Peter Fox @taswegian, pfox@cs.rpi.edu (Marshall Ma) and the Data Science Team, Tetherless World Constellation, Rensselaer Polytechnic Institute. DCO Summer School, July 14, 2014, Big Sky, MT. Data Science https://deepcarbon.net/group/dco-summer-school-2014

2 Deep Carbon Observatory: Global community of ‘Carbon’ scientists (~1000 from ~40 countries) contributing to a Deep Earth Computer (data legacy) comprising: Global Earth Mineral Laboratory; Global Census of Deep Fluids; Global Volcano Gas Emissions; Global Census of Deep Microbial Life; Global State of High Pressure and Temperature Carbon and Related Materials; Global Inventory of Diamonds with Inclusions; …

3 Data Science is … Doing science with someone else’s data … – across datasets – with models – multi-dimensional, multi-scale, multi-mode – complex data-types – needing new analytic and visual approaches Especially in multiple “dimensions” (functional) – E.g. Detection/ attribution methods/ algorithms – Visual exploration Data Science

4 You may see many diagrams like

5 Physical quantity versus measured as quantity: Value and units? Reference frame? Reference units? Value and units?
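
One way to keep the questions on this slide attached to the numbers themselves is to store each value together with its units and reference. The sketch below is only an illustration: the `Measurement` class, its field names, and the example values are assumptions, not part of any DCO schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """A measured value that carries its own context (hypothetical field names)."""
    quantity: str                    # the physical quantity, e.g. "depth"
    value: float                     # the number as recorded
    units: str                       # units of the recorded value
    reference: Optional[str] = None  # reference frame or datum, if one applies

    def describe(self) -> str:
        # Only mention the reference when one was recorded.
        ref = f" relative to {self.reference}" if self.reference else ""
        return f"{self.quantity} = {self.value} {self.units}{ref}"


# Made-up example values, for illustration only.
depth = Measurement("depth", 1250.0, "m", reference="seafloor")
print(depth.describe())
```

A bare number such as 1250.0 answers none of the slide's questions; the record above answers three of them in one place.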

6 Use case: How DCO Finds Out About Data (Fleischer, 2011). Diagram labels: A scientist bringing new data (Spreadsheet, Diagram, Digital Map, Report); A data manager transforming data; Transformed data ready for import; Repository staff/ Data librarian; Importing tool; A data repository; Internet

7 Data-Information-Knowledge “Ecosystem”: Data, Information, Knowledge; Producers, Consumers; Context, Presentation, Organization, Integration, Conversation, Creation, Gathering, Experience

8 Producers, Consumers; Quality Control, Fitness for Purpose, Fitness for Use, Quality Assessment; Trustee, Trustor

9 Spreadsheets E.g. Excel – import data

10 Documentation?

11 Substantial metadata – how to visualize THIS? Census of Deep Life

12 Bias: To incline to one side; to give a particular direction to; to influence; to prejudice; to prepossess. [1913 Webster] A partiality that prevents objective consideration of an issue or situation [syn: prejudice, preconception]. For acquisition – sampling bias is your enemy. Cognitive bias is (due to) YOU!

13 Provenance* Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility –Internal –External

14 How you find DCO data…? http://deepcarbon.net/dco_datasets – Will soon be a window into community-based sources http://metpetdb.rpi.edu http://earthchem.org/ http://www.earthchem.org/petdb http://vamps.mbl.edu/portals/deep_carbon/cdl.php …

15 Browser

16 All information is linked and traceable!

17

18 E.g. Deep Life (CoDL) New tools: R (statistics, visualization, modeling), D3.js (visualization) NOT just of the data, but of all types of information, knowledge! iPython Notebooks?

19 When You Use Data – Science 2.0. Versioning/ subsetting and converting to a format you are familiar with is very common but mysterious – Take notes – document – provenance. Software – what did you use and how? Derived products – what did you create, how, why, etc. Use the metadata every chance you get, e.g. filenames! Place them in a Web-accessible folder, consider getting an identifier. Use social media, blogs, etc. to discuss it.
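
The "take notes" advice on this slide can be as simple as a small machine-readable note saved next to each derived product. The following is a minimal sketch; the `provenance_note` helper, its field names, and the file names are all hypothetical, and it does not follow any particular provenance standard.

```python
import json
import platform
from datetime import datetime, timezone


def provenance_note(source, derived, steps, software=None):
    """Build a plain-dict provenance note for a derived product (illustrative fields)."""
    return {
        "source": source,      # what you started from
        "derived": derived,    # what you created
        "steps": list(steps),  # how you got there, in order
        # Default to the running Python version if no software is named.
        "software": software or f"Python {platform.python_version()}",
        "recorded": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical file names and steps, for illustration only.
note = provenance_note(
    source="original_dataset.xls",
    derived="subset_2014.csv",
    steps=["exported sheet 1 to CSV", "kept rows with depth > 1000 m"],
)
print(json.dumps(note, indent=2))
```

Writing the note as JSON keeps it readable by people and by tools, so it can travel with the file into a Web-accessible folder.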

20 4 R’s … Goble and others

21

22 Exercise 1 Search for and access a dataset that you are not familiar with: Can you read it? Can you make sense of it? Can you assess quality, uncertainty? Any sources of bias? What would you need to do to make it useful?
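
A first pass at "Can you read it?" and "Can you assess quality?" might be to count how many values each column actually contains. The sketch below assumes the dataset is a CSV with a header row; the `summarize` helper and the sample data are made up for illustration.

```python
import csv
import io


def summarize(csv_text):
    """Report, per column, how many rows exist and how many values are blank."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in (rows[0].keys() if rows else []):
        values = [row[name] for row in rows]
        # DictReader yields None for fields missing from short rows.
        missing = sum(1 for v in values if v is None or v.strip() == "")
        summary[name] = {"rows": len(values), "missing": missing}
    return summary


# Made-up sample with two blank cells.
sample = "sample_id,depth_m,temperature_c\nA1,1200,85\nA2,,90\nA3,1500,\n"
report = summarize(sample)
for column, counts in report.items():
    print(column, counts)
```

Blank counts say nothing about uncertainty or bias, but they show quickly where the documentation needs to explain gaps.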

23 When You Generate Data – Science 2.0. How the data was generated, why, for what, when and in what format – Take notes – document – provenance. Software – what did you use and how? Derived products – what did you create, how, why, etc. Use the metadata every chance you get, e.g. filenames! Place them in a Web-accessible folder, consider getting an identifier. Use social media, blogs, etc. to discuss it.
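
As one illustration of putting metadata into filenames, the sketch below builds a name from a few descriptive parts. The `data_filename` helper, the choice and order of parts, and the example values are assumptions and not a DCO naming convention.

```python
import re
from datetime import date


def data_filename(project, site, variable, collected, version, ext="csv"):
    """Compose a filename from descriptive parts (hypothetical convention)."""
    def clean(part):
        # Replace runs of non-alphanumeric characters with a single hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower()

    parts = [clean(project), clean(site), clean(variable), collected.isoformat()]
    return "_".join(parts) + f"_v{version}.{ext}"


# Made-up example values, for illustration only.
name = data_filename("DCO", "Big Sky", "CO2 flux", date(2014, 7, 14), 1)
print(name)
```

Using an ISO date and an explicit version number means the files sort sensibly and later revisions do not silently overwrite earlier ones.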

24 Make it visible to DCO (can be private) https://deepcarbon.net/dco/dco-open-access-and-data-policies https://deepcarbon.net/page/submit-community-data You get an identifier! DCO-ID, can be cited, rewarded and much more… Share…

25 DCO checklist: what people have to do (courtesy UC3): Your data management plan; Funding agency requirements; Creating your data; Organizing your data; Managing your data; Sharing your data. Roles: Domain Scientist, Data manager, Repository staff, Data Scientist. Curation Services & Tools. Domain scientists often also take up these two roles, which is neither efficient nor effective (i.e., the 80-20 rule).

26 DCO checklist: a service & tool perspective Your data management plan AP Sloan requirements+ Creating your data Organizing your data Managing your data Sharing your data e.g., NSF New Proposal and Award Policies and Procedures Guide (effective January 14, 2013) Object Modeling Identity Services Storage Services Ingest Services Discovery Service Characterization Services Access Services CKAN, community Faceted search and Drupal etc. DCO-ID (Handle+DOI) + Linked Data, community Schema.org, etc. Use cases, info. model

27 Exercise 2 Begin with a recent dataset that you generated or were involved in generating: Can someone else read it? Can someone make sense of it? Have you asserted quality, uncertainty? Have you described known sources of bias? What else would you now do to make it more useful?

28 Further reading Data Science course at RPI: http://tw.rpi.edu/web/Courses/DataScience/2013 Fourth Paradigm: http://research.microsoft.com/en-us/collaboration/fourthparadigm/ Data Management Planning tools: – http://tw.rpi.edu/web/project/DCO-DS/WorkingGroups/DMP – http://www.iedadata.org/compliance/plan – https://dmp.cdlib.org/

29 Breakout Session Today Exercises 1 and 2 Discussion

30 Friday Marshall (Xiaogang) Ma will round out the data discussion DCO goal for data: in the interim, – help you become data scientists (as well as your specialty) Then, in time… – you can drop “data” because you will handle data as easily as you do field work, use instruments, etc…

