Loading Data into GERMINATE: How data is loaded into the GERMINATE tables.


1 Loading Data into GERMINATE: How data is loaded into the GERMINATE tables.

2 Loading Molecular Marker Data

DATA
       Ac1   Ac2   Ac3   Ac4   Ac5
M1     +     -     0     +     +
M2     +     +     +     +     +
M3     +/-   +     +     +     +
M4     -     -     -     0     -
M5     +     -     -     +/-   +
M6     0     -     -     -     +
M7     -     -     -     +/-   -
M8     +     -     -     +     +
M9     -     -     -     -     -
M10    +     +     +     +/-   -

Box A

Accessions
accession_id   accenumb   instcode_id
81             Ac1        76 ..
86             Ac2        76 ..
92             Ac3        76 ..
45             Ac4        76 ..
63             Ac5        76 ..

Accession Order
1 Ac1
2 Ac2
3 Ac3
4 Ac4
5 Ac5

ReferenceData: Accession metadataset [dataset 1, dimension 0] (dataset 2)
dataset_id   index_id   reference_id   table_id
2            1          81             5
2            2          86             5
2            3          92             5
2            4          45             5
2            5          63             5
table_id 5 -> Accessions table; reference_id = accession_id

Box B

StringData: Marker metadataset [dataset 1, dimension 1] (dataset 3)
dataset_id   index_id   string_data
3            1          M1
3            2          M2
3            3          M3
3            4          M4
3            5          M5
3            6          M6
3            7          M7
3            8          M8
3            9          M9
3            10         M10

Box C

Unique Allelic States: +, -, 0, +/-

EnumUnits - ArraysText
unit_id   enum_index   text[ ]
7         1            [+]
7         2            [-]
7         3            [0]
7         4            [+,-]

AlleleIndex
enum_index   allele index array
1            [1]
2            [2]
3            [3]
4            [1,2]

Data Order
1 +
2 -
3 0
4 +
...
48 +
49 +/-
50 -

IntegerData (dataset 1)
dataset_id   index_id   integer_data (enum_index)
1            1          1
1            2          2
1            3          3
1            4          1
...
1            48         1
1            49         4
1            50         2

Box D

Datasets
dataset_id   method_id   experiment_id   data_type_id   dimension count   dataset_discription
1            1           1               2              2                 data
2            1           1               1                                Accession array for data
3            1           1               4                                Marker array for data

Metadatasets
metadataset_id   dimension   dataset_id   size
2                0           1            5
3                1           1            10

The DATA table represents a sample of how molecular marker data are typically submitted: a set of markers analyzed in a set of accessions. The arrows in the figure show the flow of information as it is inserted into the database. Black arrows indicate that data are being held temporarily, green arrows indicate insertion into the database, and blue arrows indicate that data already inserted are being used to insert information into another table. In the latter case, IDs assigned by the database are used to trace back to the original data.

The colours in the tables follow the dataset and metadatasets through the process of being inserted into the database. The peach colour denotes the Accession metadataset, green denotes the Marker metadataset, and purple denotes the allele data.

Box A represents the Accession data and metadata inserted into GERMINATE. On entry, each accession is assigned an accession_id that is unique in the database, and this ID is used to reference the appropriate accession in the Accession metadataset. Neither the order nor the values of the accession_ids influence the order of accessions in the metadataset: the ReferenceData table uses a data index to track the correct order of the accession_ids.

Box B indicates where the marker information is inserted into the database, again retaining the order of the original dataset through the data index value.

Box C demonstrates how the allelic state of each accession-by-marker combination is translated into an integer ID (enum_index). This ID is stored in the appropriate order in the IntegerData table. The enum_index can then be used to translate back to the actual allele value, or to an allele index if only the relative allele states between accessions are required in a query. The AlleleIndex table was created to speed up queries where the technology is unimportant and the relative allele values suffice to answer the question.

Box D displays the metadata recorded in the database that is required to recreate the dataset. This includes the number of dimensions of the dataset and relates the metadatasets to the dataset.
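The translation described for Box C can be sketched in a few lines of Python. This is only an illustration under assumptions: plain lists and dictionaries stand in for the EnumUnits and IntegerData tables, the function names `encode_marker_data` and `decode_marker_data` are hypothetical, and enum_index values are assumed to be assigned in order of first appearance, which happens to reproduce the numbering on the slide. It is not GERMINATE's actual loader.

```python
# Hypothetical in-memory sketch of the Box C translation; the real
# GERMINATE loader works against database tables, not Python lists.

# First three marker rows of the DATA table (accession order Ac1..Ac5).
data = [
    ["+", "-", "0", "+", "+"],    # M1
    ["+", "+", "+", "+", "+"],    # M2
    ["+/-", "+", "+", "+", "+"],  # M3
]


def encode_marker_data(rows):
    """Flatten the matrix marker by marker and replace each allelic
    state with an integer enum_index."""
    enum_units = {}    # allelic state -> enum_index (assumed: first-seen order)
    integer_data = []  # (index_id, enum_index), index_id starts at 1
    for row in rows:
        for state in row:
            enum_index = enum_units.setdefault(state, len(enum_units) + 1)
            integer_data.append((len(integer_data) + 1, enum_index))
    return enum_units, integer_data


def decode_marker_data(enum_units, integer_data, n_accessions):
    """Translate the enum_index values back into a marker-by-accession matrix."""
    state_of = {index: state for state, index in enum_units.items()}
    flat = [state_of[enum_index] for _, enum_index in sorted(integer_data)]
    return [flat[i:i + n_accessions] for i in range(0, len(flat), n_accessions)]


enum_units, integer_data = encode_marker_data(data)
restored = decode_marker_data(enum_units, integer_data, n_accessions=5)
```

Because the order of the values is carried by index_id alone, the original matrix can be rebuilt from the integer column plus the size of one dimension, which is the role the Metadatasets size column plays in Box D.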

3 Genetic Map Data

3 sets of data
– Population data: stored in the Pedigree table, with references to individuals in the reference table, which links the population to the dataset.
– Data used to create the linkage map: stored similarly to genetic data.
– Genetic linkage map data

4 Genetic Map Data

Original Data
Locus   Linkage Group   Position
L1      1               0.0
L2      1               9.7
L3      1               12.5
L4      1               17.5
L5      2               0.0
L6      2               5.2
L7      2               9.8
L8      2               14.2
L9      2               18.3
L10     2               23.5
L11     3               0.0
L12     3               2.1
L13     3               7.5
L14     3               15.6
L15     3               19.7
L16     3               25.6

Real Data: Positions
dataset_id   index_id   real_data
1            1          0.0
1            2          9.7
1            3          12.5
1            4          17.5
1            5          0.0
1            6          5.2
1            x          #
1            16         25.6

String Data: Loci
dataset_id   index_id   string_data
2            1          L1
2            2          L2
2            3          L3
2            4          L4
2            5          L5
2            6          L6
2            x          name
2            16         L16

String Data: Linkage Groups
dataset_id   index_id   string_id
3            1          1
3            2          1
3            3          1
3            4          1
3            5          2
3            6          2
3            x          #
3            16         3

Datasets
dataset_id   method_id   experiment_id   data_type_id   dimension count   dataset_discription
1            1           1               3              2                 position
2            1           1               4                                loci
3            1           1               4                                Linkage group

Metadatasets
metadataset_id   dimension   dataset_id   size
2                0           1            16
3                1           1

The positions of the loci in cM (indicated by the method, not shown here) are the primary dataset. The Linkage Groups and Loci are added as metadatasets for this dataset. Any additional information users may wish to store can be added as further dimensions of the dataset. The primary dataset is then linked, through the linking table, to the populations and genetic data used to create the maps. The grey boxes are database-assigned IDs.
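The split of one map table into a primary dataset plus two metadatasets can be illustrated with a short Python sketch. It assumes that in-memory lists of tuples stand in for the Real Data and String Data tables and that the dataset IDs 1, 2 and 3 from the slide are simply passed in; the function names `split_map` and `rebuild_map` are hypothetical and not part of GERMINATE.

```python
# Hypothetical sketch: lists of tuples stand in for the database tables.

# First six rows of the original map data: (locus, linkage group, position in cM).
rows = [
    ("L1", 1, 0.0), ("L2", 1, 9.7), ("L3", 1, 12.5),
    ("L4", 1, 17.5), ("L5", 2, 0.0), ("L6", 2, 5.2),
]


def split_map(rows, positions_id=1, loci_id=2, groups_id=3):
    """Split the map into three index-aligned arrays that share index_id."""
    real_data, loci, groups = [], [], []
    for index_id, (locus, group, position) in enumerate(rows, start=1):
        real_data.append((positions_id, index_id, position))  # primary dataset
        loci.append((loci_id, index_id, locus))                # metadataset
        groups.append((groups_id, index_id, group))            # metadataset
    return real_data, loci, groups


def rebuild_map(real_data, loci, groups):
    """Rejoin the three arrays on index_id to recover the original rows."""
    locus_at = {index_id: name for _, index_id, name in loci}
    group_at = {index_id: group for _, index_id, group in groups}
    ordered = sorted(real_data, key=lambda row: row[1])
    return [(locus_at[i], group_at[i], position) for _, i, position in ordered]


real_data, loci, groups = split_map(rows)
rebuilt = rebuild_map(real_data, loci, groups)
```

Keeping the three arrays aligned on index_id is what allows additional information to be attached later as another array without touching the primary dataset.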

5 Trait Data

Original Data
Headers: M-0002, M-002, E-0008, E-0139, E-0142
Entries: CGN029059=, CGN029069=, CGN029071=, CGN033539, CGN101421, CGN101439, CGN101639, CGN101659, CGN101669, CGN101679, CGN101689, CGN101699

Units
unit_id   name          abbreviation   description
1         State:M-002

Methods
method_id   name    unit_id   description
1           M-002   1

Experiments
experiment_id   name     date         author_id   description
1               E-0008   1985-01-01   1
2               E-0139   1988-01-01   1
3               E-0142   1989-01-01   1

Datasets
dataset_id   method_id   experiment_id   data_type_id   description
1            1           1               1              Data for E-0008, M-0002
2            1           1               2              Accessions for E-0008, M-0002
3            1           2               1              Data for E-0139, M-0002
4            1           2               2              Accessions for E-0139, M-0002
5            1           3               1              Data for E-0142, M-0002
6            1           3               2              Accessions for E-0142, M-0002

Metadatasets
metadataset_id   dimension   dataset_id   size
2                0           1            4
4                0           3            3
6                0           5            5

All of these trait data use the same method, but they come from three different experiments. Each experiment therefore has two datasets: the data values and the accessions. The colours follow the loading of each experiment into the database. The IDs (method_id, dataset_id, etc.) are assigned by the database.

6 Trait Data

EnumUnits
unit_id   enum_index   enum_value
1         1            9=
1         2            1=
1         3            9
1         4            1

AlleleIndex
enum_index   allele_index
1            [1]
2            [2]
3            [3]
4            [4]

IntegerData
dataset_id   data_index   integer_data
1            1            1
1            2            1
1            3            2
1            4            3
3            1            3
3            2            4
3            3            4
5            1            4
5            2            4
5            3            4
5            4            4
5            5            4

ReferenceData
dataset_id   data_index   reference_id   table_id
2            1            1025           4
2            2            1026           4
2            3            1029           4
2            4            1030           4
4            1            2584           4
4            2            1028           4
4            3            1035           4
6            1            1045           4
6            2            1046           4
6            3            2685           4
6            4            1050           4
6            5            1051           4

The data values are translated to integers using the EnumUnits table, and the integers are loaded into the database. This is done because, for large datasets, searching an integer table is faster than searching a string table. The reference_id here corresponds to the ID of the accession in the original data entry.
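Translating the stored integers back into trait values amounts to a join between IntegerData and EnumUnits. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for GERMINATE's own database. The table and column names follow the slide, but the column types, the `decode_dataset` helper, and passing unit_id directly (rather than resolving it through the Datasets and Methods tables) are assumptions made to keep the example short.

```python
import sqlite3

# In-memory stand-in for the two tables; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EnumUnits (unit_id INTEGER, enum_index INTEGER, enum_value TEXT);
    CREATE TABLE IntegerData (dataset_id INTEGER, data_index INTEGER, integer_data INTEGER);
    """
)
# Rows copied from the EnumUnits table and from dataset 1 of IntegerData.
conn.executemany(
    "INSERT INTO EnumUnits VALUES (?, ?, ?)",
    [(1, 1, "9="), (1, 2, "1="), (1, 3, "9"), (1, 4, "1")],
)
conn.executemany(
    "INSERT INTO IntegerData VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 3, 2), (1, 4, 3)],
)


def decode_dataset(conn, dataset_id, unit_id):
    """Return (data_index, enum_value) pairs for one dataset, in data order."""
    query = """
        SELECT d.data_index, e.enum_value
        FROM IntegerData AS d
        JOIN EnumUnits AS e ON e.enum_index = d.integer_data
        WHERE d.dataset_id = ? AND e.unit_id = ?
        ORDER BY d.data_index
    """
    return conn.execute(query, (dataset_id, unit_id)).fetchall()


decoded = decode_dataset(conn, dataset_id=1, unit_id=1)
```

Filtering and comparisons can then run on the integer column alone, with the join to EnumUnits needed only when the original text value has to be displayed.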

