Presentation transcript: Dataset Classes

1 Dataset Classes
A dataset class tells us:
– How to handle a particular type of dataset
– Exactly how to put it into manual delivery (it specifies the API for manual delivery)
– How to put it in the database (resource XML)
– How to process it in the workflow (graph XML)
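To make these responsibilities concrete, here is a minimal Python sketch of a dataset class as a plain record. The `DatasetClass` name, its fields, and the `missing_files` check are hypothetical illustrations (the dataset classes in this system appear to be defined in XML, not Python). The check shows what "conforming to the manual delivery API" could mean in practice, using the standard `SNPs.gff` file name from the Manual Delivery slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetClass:
    """Hypothetical record of what a dataset class specifies (illustration only)."""
    name: str
    final_files: tuple[str, ...]  # manual delivery API: standard file names in final/
    resource_xml: str  # how to put the dataset in the database
    graph_xml: str  # how to process the dataset in the workflow

    def missing_files(self, present: set[str]) -> list[str]:
        """Return required final/ files that a delivery does not contain."""
        return [f for f in self.final_files if f not in present]


snps = DatasetClass(
    name="SNPs",
    final_files=("SNPs.gff",),
    resource_xml="<!-- resource XML template -->",
    graph_xml="<!-- graph XML template -->",
)
print(snps.missing_files({"SNPs.gff"}))  # []
print(snps.missing_files(set()))  # ['SNPs.gff']
```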

2 Human Roles
Dataset Integrator
– Puts datasets into manual delivery (conforming to the dataset class API)
– Provides a specification of each dataset for the workflow
Workflow Pilot
– Configures the workflow
– Runs the workflow
Workflow Developer
– Writes dataset classes
– Writes graph files
– Writes step classes
– Writes plugins
ReFlow Developer
– Develops underlying workflow system

3 Organism Abbrev
Throughout the workflow system, we use a unique, stable “identifier” for an organism: its organism abbrev. We do not use things like taxon IDs, scientific names, etc.
Examples:
– tgonME49
– pfal3D7
– ncanLIV
It always includes:
– One letter for the genus
– Three letters for the species
– The strain
Once it is set, it does not change, even if we adjust the name of the organism.
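The abbrev rule (one genus letter, three species letters, then the strain) can be sketched in Python as below. The function names and the regular expression are assumptions for illustration, not part of the workflow system; the lowercase-letter pattern is inferred only from the three examples above.

```python
import re

# One lowercase genus letter, three lowercase species letters, then a non-empty strain.
# This pattern is an assumption inferred from the examples tgonME49, pfal3D7, ncanLIV.
ABBREV_RE = re.compile(r"^([a-z])([a-z]{3})(.+)$")


def make_abbrev(genus: str, species: str, strain: str) -> str:
    """Build a new organism abbrev, e.g. Toxoplasma gondii ME49 -> tgonME49."""
    return genus[0].lower() + species[:3].lower() + strain


def split_abbrev(abbrev: str) -> tuple[str, str, str]:
    """Split an abbrev into (genus letter, species letters, strain)."""
    m = ABBREV_RE.match(abbrev)
    if m is None:
        raise ValueError(f"not a valid organism abbrev: {abbrev!r}")
    return m.group(1), m.group(2), m.group(3)


print(make_abbrev("Toxoplasma", "gondii", "ME49"))  # tgonME49
print(split_abbrev("pfal3D7"))  # ('p', 'fal', '3D7')
```

Because an abbrev never changes once set, something like `make_abbrev` would only be used when an organism is first registered; a later change to the organism's name should not regenerate it.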

4 Manual Delivery
Manual delivery has a very specific structure:
manualDelivery/
  project/
    organismAbbrev/
      category/
        datasetName/
          datasetVersion/
            final/
            fromProvider/
            workspace/
            README
final/ contains files with standard names that conform to the dataset class API
– e.g., SNPs.gff
– They never include the name of the provider or any other dataset-specific information
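The directory layout maps directly onto path construction. This Python sketch uses hypothetical function names, plus a hypothetical `SNP` category and `mySnpDataset` dataset name, to build a datasetVersion directory, the entries expected inside it, and the path of a standard-named file in `final/`.

```python
from pathlib import PurePosixPath

# Entries expected inside each datasetVersion/ directory.
VERSION_DIR_ENTRIES = ("final", "fromProvider", "workspace", "README")


def delivery_dir(root: str, project: str, organism_abbrev: str,
                 category: str, dataset_name: str,
                 dataset_version: str) -> PurePosixPath:
    """Return root/project/organismAbbrev/category/datasetName/datasetVersion."""
    return PurePosixPath(root, project, organism_abbrev, category,
                         dataset_name, dataset_version)


def expected_layout(version_dir: PurePosixPath) -> list[PurePosixPath]:
    """Paths of final/, fromProvider/, workspace/ and README under a version dir."""
    return [version_dir / entry for entry in VERSION_DIR_ENTRIES]


def final_file(version_dir: PurePosixPath, standard_name: str) -> PurePosixPath:
    """Path of a standard-named file (e.g. SNPs.gff) inside final/."""
    return version_dir / "final" / standard_name


d = delivery_dir("manualDelivery", "ToxoDB", "tgonME49", "SNP", "mySnpDataset", "1.0")
print(final_file(d, "SNPs.gff"))
# manualDelivery/ToxoDB/tgonME49/SNP/mySnpDataset/1.0/final/SNPs.gff
```

Keeping provider-specific names out of `final/` is what lets a path like this be computed from the dataset class alone.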

5 [Diagram: Workflow Plan. Inputs: Datasets (example entry: myOrg, uniprot, 2.0) and Dataset Classes (example template: <subgraph name=“${orgAbbrev}_${name}_dbxrefs” xmlFile=“loadResources.xml”>, inside a “for..” loop). A code generator turns them into generated files: Resources and a Workflow Graph (a Top Level Graph plus other graphs). Files shown: myOrg.xml, classes.xml, dbXRefs.xml, myOrg/dbXRefs.xml.]
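The dataset class template `<subgraph name="${orgAbbrev}_${name}_dbxrefs" xmlFile="loadResources.xml">` uses `${...}` placeholders that a code generator fills in for each dataset. Python's `string.Template` happens to use the same placeholder syntax, so it can sketch the substitution step. This is an illustration under that assumption, not the actual code generator; the `expand` helper is hypothetical, and the element is closed with `/>` here only to keep the string well-formed.

```python
from string import Template

# Hypothetical dataset-class template, modeled on the subgraph line in the diagram.
DBXREFS_TEMPLATE = Template(
    '<subgraph name="${orgAbbrev}_${name}_dbxrefs" xmlFile="loadResources.xml"/>'
)


def expand(template: Template, datasets: list[dict[str, str]]) -> list[str]:
    """Instantiate the template once per dataset; unused keys are ignored."""
    return [template.substitute(ds) for ds in datasets]


datasets = [{"orgAbbrev": "myOrg", "name": "uniprot", "version": "2.0"}]
for line in expand(DBXREFS_TEMPLATE, datasets):
    print(line)
# <subgraph name="myOrg_uniprot_dbxrefs" xmlFile="loadResources.xml"/>
```

`substitute` raises `KeyError` if a dataset lacks a placeholder value, which is a reasonable failure mode for a generator: a dataset that does not supply what its class needs is caught before any graph is written.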

6 [Diagram: Dataset Files generate Graph Files and Resource Files. Files shown (some appear in more than one column): ToxoDB.xml, ToxoDB/tgonME49.xml, ToxoDB/tgonME49/Einstein.xml, ToxoDB/project.xml, ToxoDB/tgonME49/ESTs.xml, ToxoDB/tgonME49/Einstein/chipChipSamples.xml, ToxoDB/tgonME49/dbXRefs.xml, ToxoDB/tgonME49/arrayStudies.xml, ToxoDB/tgonME49/SNPs.xml.]

7 DataSource
We store simple meta information in the database about each dataset:
– Provider contact info
– Descriptions
– Display names
– References to WDK searches, tables and attributes that use the data
The information is stored in two tables:
– DataSource: pulled right from the …
– DataSourceInfo: provided by a specific file after data loading is completed
It is also available in the WDK as a DataSource record:
– The search and record pages (e.g., Gene) can access this info for display purposes
– Soon we will support searches for these, e.g., find all searches that involve a certain dataset
It makes no sense to have two names:
– …
– DataSource table and Perl objects
So, either:
– Rename … to … (this is a pain to transition to in our code), or
– Rename DataSource to DataResource and keep … as is
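A minimal Python sketch of the meta information as one record per dataset, and of the planned "find all searches that involve a certain dataset" query. It flattens the DataSource and DataSourceInfo tables into a single hypothetical `DataSourceRecord`; the field names, the `(kind, name)` encoding of WDK references, and the example values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceRecord:
    """Hypothetical flattened view of one dataset's meta information."""
    name: str
    display_name: str
    description: str
    contact: str
    # (kind, name) pairs, where kind is "search", "table" or "attribute"
    wdk_references: list[tuple[str, str]] = field(default_factory=list)


def searches_involving(records: list[DataSourceRecord], dataset: str) -> list[str]:
    """Return the names of WDK searches that reference the given dataset."""
    return [
        ref_name
        for rec in records
        if rec.name == dataset
        for kind, ref_name in rec.wdk_references
        if kind == "search"
    ]


records = [
    DataSourceRecord(
        name="exampleSnps",
        display_name="Example SNPs",
        description="Hypothetical SNP dataset",
        contact="provider@example.org",
        wdk_references=[("search", "GenesBySnps"), ("table", "SnpSummary")],
    )
]
print(searches_involving(records, "exampleSnps"))  # ['GenesBySnps']
```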

8 DataResource?
It makes no sense to have two names:
– …
– DataSource table, Perl objects, and WDK record
So, either:
– Rename … to … (this is a pain to transition to in our code), or
– Rename DataSource to DataResource and keep … as is

9 DataResourceInfo
DatasetClasses do not include meta info about the dataset:
– Contact info
– Description
– Mapping to WDK searches and records
DatasetClasses describe how to load the data.
But, we can have DatasetClass