Genome Database Comparative Genomics Phylogenomics Variation GrameneMart (BioMart) Discovery Environment Josh Stein Cold Spring Harbor Laboratory

Presentation transcript:

1 Genome Database Comparative Genomics Phylogenomics Variation GrameneMart (BioMart) Discovery Environment Josh Stein Cold Spring Harbor Laboratory

2 Exploring Plant Genomes Browse Search Upload personal data Analysis tools

3 Gramene’s Key Strengths
Comparative genomics
– Complete reference genomes for 11 plant species including A. thaliana & A. lyrata
– Whole genome alignments
– Phylogenetic gene trees
Ability to upload and share data
Data mining using Gramene Mart
Extensive variation data sets for Arabidopsis
Integration with Pathways databases

4

5 Quick entry points

6 Browser tracks Whole genome alignments Synteny views Location-based variation

7 Gene sequence Splice variants Gene centered variation Phylogenetic trees Cross-reference to external databases

8 Transcript & protein sequences Protein structure Transcript & protein based variation GO and other ontologies

9 Location View Browser Tracks
– TAIR10 Annotation
– EST/cDNA alignments
– Array probes
– Variation
– Genome alignments (cross-species browsing)
– Repeats

10 Configuring Tracks

11 Standard Analysis & Visualization
– InterPro domain & GO functional annotation
– Cross-reference to external IDs
– Whole Genome Alignment (Blastz-chain-net)
– Phylogenetic Gene Trees (Compara)
– Synteny Analysis
– Consequences of SNP

12 InterPro/dbXref/GO
Structural prediction: Pfam, PIRSF, PRINTS, PROSITE, SMART, SUPERFAMILY, TIGRFAM, TMHMM, SignalP
Cross-reference genes to 3rd-party identifiers: Entrez Gene, PlantGDB, PUTs, RefSeq, Gene Index, UniGene, UniProtKB/Swiss-Prot, NASC, IPI, WikiGene
Gene Ontology, Plant Ontology

13 Alignment View Pairwise BLASTZ-CHAIN-NET whole genome alignment: Arabidopsis lyrata, Poplar, Grapevine, Rice, Brachypodium, Sorghum, Physcomitrella

14 Multi-species View A. lyrata Arabidopsis Grapevine Poplar

15 Conserved non-coding regions

16 View Sequence Alignment

17 Phylogenetic Analysis Tools

18 Compara Gene Trees
Reconstructing evolutionary histories
– Gene Trees for 11 plants plus human, Ciona, fly, worm, & yeast
– Infers orthologs and paralogs by reconciling gene tree with input species tree
– Taxonomic dating
http://useast.ensembl.org/info/docs/compara/homology_method.html
Vilella A.J., et al. (2008). Genome Res. Pre-print: doi:10.1101/gr.073585.107
~35,000 trees
– ~24,500 plant specific
– ~10,000 containing Arabidopsis
– 1059 specific to Arabidopsis genus
– 79 specific to A. thaliana
– 527 specific to A. lyrata

19 Tree Viewer Speciation node = ortholog Duplication node = paralog
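The rule on the Tree Viewer slide can be sketched in Python: for any two genes in a reconciled tree, the type of their most recent common ancestor node decides the relationship. This is a minimal illustration, not Compara's implementation; the `Node` class, the gene names, and the tiny example tree are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A node in a reconciled gene tree (hypothetical minimal model)."""
    name: str = ""
    kind: str = "leaf"  # "leaf", "speciation", or "duplication"
    children: list = field(default_factory=list)


def leaves(node):
    """Return the gene names found under a node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def classify_pairs(node, out=None):
    """Label every gene pair by the kind of its most recent common ancestor."""
    if out is None:
        out = {}
    if not node.children:
        return out
    label = "ortholog" if node.kind == "speciation" else "paralog"
    # Genes drawn from two different children first meet at this node.
    for left, right in combinations(node.children, 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = label
    for child in node.children:
        classify_pairs(child, out)
    return out


# Made-up example: a duplication in the A. lyrata lineage after the
# split from A. thaliana.
tree = Node(kind="speciation", children=[
    Node(name="Atha_gene1"),
    Node(kind="duplication", children=[
        Node(name="Alyr_gene1"),
        Node(name="Alyr_gene2"),
    ]),
])
pairs = classify_pairs(tree)
```

In this toy tree both A. lyrata copies are orthologs of the A. thaliana gene, and paralogs of each other.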

20 Newick Tree & Alignment
(((ENSCINP00000002474_Cint_:0.0000, R10D12.12_Cele_:3.4477):0.7716, FBpp0084782_Dmel_:0.8566):0.0000, (((((BRADI3G43170.1_Bdis_:0.0615, BRADI2G38000.1_Bdis_:0.1536):0.0214, ((LOC_Os02g26814.1_Osat_:0.0000, BGIOSGA008178-PA_Oind_:0.0000):0.0000, ORGLA02G0140900.1_Ogla_:0.0000):0.0938):0.0231, (((GRMZM2G050705_P02_Zmay_:0.0099, GRMZM2G124671_P01_Zmay_:0.0745):0.0043, Sb08g016480.1_Sbic_:0.0348):0.0000, (GRMZM2G022470_P01_Zmay_:0.0475, Sb04g017490.1_Sbic_:0.1037):0.0000):0.0917):0.1118, (((POPTR_0005s03870.1_Ptri_:0.0420, POPTR_0013s02650.1_Ptri_:0.0427):0.0918, (GSVIVT01006266001_Vvin_:0.0342, GSVIVT01000019001_Vvin_:0.0817):0.1210):0.0363, ((scaffold_702792.1_Alyr_:0.0043, scaffold_603852.1_Alyr_:0.0632):0.0277, AT4G16710.1_Atha_:0.0204):0.2813):0.1261):0.5081, E_GW1.232.43.1_Ppat_:0.3698):0.3605):0.0000;
ORGLA02G0140900.1_Ogla_    VFVTVGTTCF DALVKAVDSP QVKEALLEKG YTDLIIQMGR GTY-------
BRADI2G38000.1_Bdis_       VFVTVGTTCF DALVKAVDSE EVKQALLRKG YTDLLIQMGR GTY-------
GRMZM2G050705_P02_Zmay_    VFVTVGTTCF DALVMAVDSP EVKKALLQKG YSNLLIQMGR GTY-------
POPTR_0005s03870.1_Ptri_   VFVTVGTTLF DALVRTVDTK EVKQELLRNG YTHLIIQMGR GSY-------
GRMZM2G022470_P01_Zmay_    VFVTVGTTCF DALVMAVDSP EVKKTLLQKG YSNLLIQMGR GTY-------
BRADI3G43170.1_Bdis_       VFVTVGTTCF DALVKKVDSP QVKEALWQKG YTDLFIQMGR GTY-------
GSVIVT01006266001_Vvin_    VFVTVGTTCF DALVKAVDTQ EFKKELSARG YTHLLIQMGR GSY-------
Sb08g016480.1_Sbic_        ---------- ----MAVDSP EVKMALLQKG YSNLLIQMGR GTY-------
GRMZM2G124671_P01_Zmay_    VFVTVGTTCF DALVMAVDSP EVKKALLQKG YSNLLIQMGR GTY-------
Sb04g017490.1_Sbic_        ---------- ----MAVASP EVKKALLQKG YSNLVIQMGR GTY-------
BGIOSGA008178-PA_Oind_     ---------- ---------- ---------- ---------- ----------
E_GW1.232.43.1_Ppat_       VLVTVGTTLF DALVREASSQ PCRQVLADFG YSSLVIQRGK GSF-------
scaffold_702792.1_Alyr_    VFVTVGTTSF DALVKAVVSE DVKDELQKRG FTHLLIQMGR GIF-------
R10D12.12_Cele_            ---------- ---------- ---------- ---------- ---NQDVIDR
ENSCINP00000002474_Cint_   IFVTVGTTSF DELTETITSK PVQKVLQSQG YDKVTIQYGR GKH-------
scaffold_603852.1_Alyr_    VFVTVGTTSF DALVKAVVSE DVKDELQKRG FTHLLIQMGR GNF-------
AT4G16710.1_Atha_          VFVTVGTTSF DALVKAVVSQ NVKDELQKRG FTHLLIQMGR GIF-------
LOC_Os02g26814.1_Osat_     VFVTVGTTCF DALVKAVDSP QVKEALLEKG YTDLIIQMGR GTY-------
GSVIVT01000019001_Vvin_    VFVTVGTTCF DALVKAVDTH EFKRELFARG YTHLLIQMGR GSY-------
FBpp0084782_Dmel_          VYITVGTTKF DALISTASTE PALKALQNRK CTKLVIQHGN SQP-------
POPTR_0013s02650.1_Ptri_   VFVTVGTTLF DALVRTVDTK EVKQELLRKG YTDLVIQMGR GSY-------
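A downloaded Newick string like the one on this slide can be read with a few lines of Python. The sketch below is a small hand-written parser, not the Ensembl/Gramene tooling, and it handles only the plain `label:length` form shown here (no quoted labels or comments). The input is the Arabidopsis subtree copied from the slide; `parse_newick`, `leaf_names`, and `species_tag` are illustrative names.

```python
import re
from collections import Counter

# Arabidopsis clade copied from the Newick string on this slide.
NEWICK = (
    "((scaffold_702792.1_Alyr_:0.0043, scaffold_603852.1_Alyr_:0.0632):0.0277, "
    "AT4G16710.1_Atha_:0.0204):0.2813;"
)

PUNCT = set("(),:;")


def parse_newick(text):
    """Parse Newick text into nested (name, branch_length, children) tuples."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        name = ""
        if tokens[pos] not in PUNCT:
            name = tokens[pos]
            pos += 1
        length = None
        if tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (name, length, children)

    return node()


def leaf_names(tree):
    """Collect leaf labels in left-to-right order."""
    name, _, children = tree
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]


def species_tag(gene_id):
    """Take the trailing species code, e.g. 'AT4G16710.1_Atha_' -> 'Atha'."""
    return gene_id.rstrip("_").rsplit("_", 1)[-1]


tree = parse_newick(NEWICK)
genes = leaf_names(tree)
per_species = Counter(species_tag(g) for g in genes)
```

Counting the species tags is a quick way to see which lineages carry extra copies, here two A. lyrata genes against one A. thaliana gene.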

21 Orthologs & Paralogs

22 Gene-Centered Synteny Build
Oryza sativa Japonica      O.jap
Brachypodium distachyon    YES   B.dis
Sorghum bicolor            YES   S.bic
Arabidopsis thaliana       ---   A.tha
Arabidopsis lyrata         ---   YES   A.lyr
Vitis vinifera             ---   YES   V.vin
Poplar trichocarpa         ---   YES   P.tri
Compara Orthologs
Collinear mappings (DAGchainer)
“in-range” mappings near collinear anchors
Map

23 Synteny View Available for A. lyrata, grapevine, & poplar Navigate to other genome Ortholog browser Link to multi-species view

24 Browse across duplicated regions from polyploidy
– Chr 1 vs Poplar
– Chr 1 vs Grapevine
– Switch reference to grape

25 Some Applications …

26 Distinguish “Real” Genes From Transposons
– FAR1/FHY3 transcription factor family functions in light sensing
– Evolved from Mu-related transposases
– Cannot distinguish by BLAST
– “Rule-in” functioning genes
Figure labels: FHY3; Missing annotation in A. lyrata?; Domesticated TE

27 Enrich Annotations in Other Species
– Arabidopsis and Rice orthologs both show one gene
– Arabidopsis ortholog in correct syntenic context
– Putative mis-annotated Grape gene

28 Adding Custom Tracks

29 Custom Tracks
Salk T-DNA lines
– Uploaded from my laptop
– GFF file format
EST alignments from non-model plants
– DAS: Distributed Annotation System
– Protocol for sharing 3rd-party data
– DAS Registry
Methylome (Ecker)
– Uploaded from a URL
– BED file format

30 Upload Your Data
chr1 SALK T-DNA 1066 1097 7e-07 - . ID=SALK_082138.17.20.x
chr1 SALK T-DNA 1066 1097 6e-07 + . ID=SALK_114475.16.50.x
chr1 SALK T-DNA 1067 1093 3e-06 - . ID=SALK_065399.25.40.x
chr1 SALK T-DNA 1073 1097 6e-05 - . ID=SALK_117416.15.55.n
chr1 SALK T-DNA 1075 1099 6e-05 - . ID=SALK_132061.15.90.x
chr1 SALK T-DNA 1076 1100 6e-05 - . ID=SALK_117013.15.75.n
chr1 SALK T-DNA 1676 2070 0.0 - . ID=SALK_047276.52.80.x
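Before uploading a GFF file as a custom track, it can help to check that every record parses. The following Python sketch assumes the standard nine-column, tab-separated GFF layout (sequence, source, feature, start, end, score, strand, phase, attributes); the two sample records are taken from this slide, and the `GffRecord` and `parse_gff_line` names are illustrative.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqid: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: Optional[float]
    strand: str
    phase: str
    attributes: dict


def parse_gff_line(line):
    """Parse one tab-separated GFF record into a GffRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    # Attributes are semicolon-separated key=value pairs.
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    score = None if cols[5] == "." else float(cols[5])
    return GffRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     score, cols[6], cols[7], attrs)


# Two records from the slide, written out with explicit tabs.
sample = [
    "\t".join(["chr1", "SALK", "T-DNA", "1066", "1097", "7e-07", "-", ".",
               "ID=SALK_082138.17.20.x"]),
    "\t".join(["chr1", "SALK", "T-DNA", "1676", "2070", "0.0", "-", ".",
               "ID=SALK_047276.52.80.x"]),
]
records = [parse_gff_line(line) for line in sample]
insert_lengths = [r.end - r.start + 1 for r in records]
```

A malformed line (for example, strand and phase run together into one column) raises `ValueError` here instead of silently producing a broken track.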

31 Attach From Remote File
track name="mCIP col/met1 BU" color=darkgreen description="Methylation" useScore=3 visibility=2 height=30
chr1 25 49 mCIP_col/met1_BU 13.4997
chr1 60 84 mCIP_col/met1_BU 7.54671
chr1 113 137 mCIP_col/met1_BU 0.0145213
chr1 154 178 mCIP_col/met1_BU 0.15643
chr1 185 209 mCIP_col/met1_BU 0.000386254
chr1 219 243 mCIP_col/met1_BU 0.000218226
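The remote file on this slide has two parts: a `track` line of key=value settings, then BED records (chromosome, start, end, name, score). A minimal Python sketch of reading both is shown below, using lines copied from the slide. The helper names are illustrative, and the parser covers only the five columns used here.

```python
import shlex

TRACK_LINE = ('track name="mCIP col/met1 BU" color=darkgreen '
              'description="Methylation" useScore=3 visibility=2 height=30')

BED_LINES = [
    "chr1 25 49 mCIP_col/met1_BU 13.4997",
    "chr1 60 84 mCIP_col/met1_BU 7.54671",
    "chr1 113 137 mCIP_col/met1_BU 0.0145213",
]


def parse_track_line(line):
    """Split a track line into a settings dict, honouring quoted values."""
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    return dict(token.split("=", 1) for token in tokens[1:])


def parse_bed_line(line):
    """Parse the first five BED columns; BED starts are 0-based, ends exclusive."""
    chrom, start, end, name, score = line.split()[:5]
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "name": name,
        "score": float(score),
    }


track = parse_track_line(TRACK_LINE)
features = [parse_bed_line(line) for line in BED_LINES]
max_score = max(f["score"] for f in features)
```

`shlex.split` is used so that the quoted track name, which contains spaces, stays in one piece.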

32 Add DAS: Distributed Annotation System
Protocol for sharing 3rd-party data via a DAS registry
www.dasregistry.org
www.gramene.org/gramenedas/das/sources

33 Manage Custom Tracks

34 Turn On/Off Custom Tracks

35 GrameneMart
– Orthologs in lyrata, grape, poplar, rice, Brachypodium, sorghum, maize, & moss
– Custom queries for bulk downloads
– Powerful tool for data mining

36 BioMart Use Cases All transmembrane-targeted genes, showing InterPro domains, GO terms, and AFFY IDs
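BioMart queries built in the web interface can also be expressed as an XML document with one `Dataset` element holding `Filter` and `Attribute` children. The Python sketch below assembles such a document for the use case above. Every dataset, filter, and attribute name in it is a placeholder assumption rather than a confirmed GrameneMart identifier; the real names should be taken from the mart's own configuration.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string.

    filters maps a filter name to its XML attributes, for example
    {"value": "..."} for a value filter or {"excluded": "0"} for a boolean one.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, settings in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, **settings})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are hypothetical placeholders.
xml_query = build_biomart_query(
    dataset="example_arabidopsis_gene_dataset",
    filters={"example_with_transmembrane_domain": {"excluded": "0"}},
    attributes=["example_gene_id", "example_interpro_id",
                "example_go_id", "example_affy_id"],
)
```

The resulting string is what would be submitted to a mart service endpoint; the submission step is left out because the endpoint is not given in the slides.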

37 BioMart Use Case Evolution of cyclin genes: Taxon of origin for paralog pairs of cyclin-domain genes that have an ortholog in Physcomitrella

38 BioMart Use Cases Mine germplasm for loss of function alleles in diversity populations: All Myb-domain genes with “STOP_GAINED” SNP allele

39 Additional Data Access
– FTP: Data files, SQL dump, Software
– Read-only Public MySQL
– Web Services

40 HELP!

41 Contact Us

