Published by Thomasina Boyd. Modified over 9 years ago.
1
VISTA family of computational tools for comparative genomics
How can we leverage genome sequences from many species to learn about genome function?
Microbial applications
Inna Dubchak, Genomics Division LBNL, JGI
ildubchak@lbl.gov vista@lbl.gov
2
Human Genome Annotation
Only 1–2% coding
Efficient identification of regulatory sequences?
Figure label: Gene A
3
Sequence conservation implies function
Two sequences diverge from their Last Common Ancestor over 80 million years
Divergence = non-functional region
Conservation = functional region
[Figure: two aligned sequences that differ in most positions but share the block CTATAAATGC, marked as a potential functional region]
4
Comparative Genomics Introduction
Species shown: Human, Chimp, Mouse, Drosophila, Urchin
Topics: Similar Genes, Synteny, Sequence Alignment
5
http://genome.lbl.gov/vista
VISTA is an integrated system for global sequence alignment and visualization for comparative genomic analysis
6
How does VISTA Work: Global Genomic Alignments
Algorithm: Feature
AVID*: can handle draft sequence
LAGAN**: produces true multiple alignments
Shuffle-LAGAN**: handles rearrangements (inversions, translocations)
* Lior Pachter, UC Berkeley
** Michael Brudno, U. Toronto
Alignment steps for sequence 1 and sequence 2:
1- anchoring: identify regions of strong similarity
2- chaining: join regions of weak or no similarity
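The anchoring and chaining steps named on this slide can be illustrated with a toy example. This is a minimal sketch, not the AVID or LAGAN implementation: it assumes anchors are exact shared k-mers and that chaining means picking the longest set of anchors that appear in the same order in both sequences. The function names and the value of `k` are hypothetical.

```python
from collections import defaultdict


def find_anchors(seq1, seq2, k=8):
    """Return sorted (pos1, pos2) pairs where the same k-mer occurs in both sequences.

    Assumption: an 'anchor' is an exact k-mer match. Real aligners use
    more tolerant seeds; this only illustrates the idea.
    """
    index = defaultdict(list)
    for i in range(len(seq1) - k + 1):
        index[seq1[i:i + k]].append(i)
    anchors = []
    for j in range(len(seq2) - k + 1):
        for i in index.get(seq2[j:j + k], ()):
            anchors.append((i, j))
    return sorted(anchors)


def chain_anchors(anchors, k=8):
    """Pick the longest chain of non-overlapping anchors that increases in both sequences.

    Simple O(n^2) dynamic programming over anchors sorted by position.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    best = [1] * n      # best[b] = length of the longest chain ending at anchor b
    prev = [-1] * n     # back-pointer to the previous anchor in that chain
    for b in range(n):
        for a in range(b):
            fits = (anchors[a][0] + k <= anchors[b][0]
                    and anchors[a][1] + k <= anchors[b][1])
            if fits and best[a] + 1 > best[b]:
                best[b] = best[a] + 1
                prev[b] = a
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    s1 = "ACGGATTTTTCCTGA"
    s2 = "ACGGAGGGGGCCTGA"
    anchors = find_anchors(s1, s2, k=5)
    print(chain_anchors(anchors, k=5))
```

In this sketch, the stretches between consecutive chained anchors are the regions of weak or no similarity that a global aligner would still have to align.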
7
Global Genomic Aligner Output
[Figure: base-by-base pairwise alignment listing in 60-column blocks. Sequence 1 positions 104670599–104671304 are aligned to sequence 2 positions 052328645–052329299; identical bases are marked with "|" and gaps with "-".]
8
VISTA visualization
[Alignment excerpt: sequence 1 positions 104637349–104637408 aligned to sequence 2 positions 052290302–052290360]
A “sliding window” measures sequence conservation (default window size 100 bp)
Graphical presentation of sequence conservation as a “peaks-and-valleys” curve
Plot axes: base sequence coordinates (x), % identity (y); regions with >70% identity are marked
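The sliding-window conservation measure can be sketched in a few lines. This is an illustration under stated assumptions, not VISTA's own code: the window here slides over alignment columns (VISTA reports coordinates on the base sequence), gaps are counted as mismatches, and the function names are hypothetical. The defaults of 100 bp and 70% come from the slide.

```python
def window_identity(aln1, aln2, window=100):
    """Percent identity for every window of `window` alignment columns.

    aln1 and aln2 are the two rows of a pairwise alignment (same length,
    '-' for gaps). Assumption: a gap column counts as a mismatch.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-") for a, b in zip(aln1, aln2)]
    if len(matches) < window:
        return []
    total = sum(matches[:window])
    scores = [100.0 * total / window]
    # Slide one column at a time, updating the running match count.
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        scores.append(100.0 * total / window)
    return scores


def conserved_regions(scores, threshold=70.0):
    """Return (start, end) ranges of window indices, end exclusive, scoring above threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    scores = window_identity("ACGTACGT", "ACGTTTTT", window=4)
    print(scores)
    print(conserved_regions(scores))
```

Plotting `scores` against position gives the peaks-and-valleys curve; `conserved_regions` picks out the peaks above the threshold.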
9
VISTA homepage: http://genome.lbl.gov/vista VISTA Servers (submit your own data) VISTA Browsers (precomputed alignments) Other VISTA-related Projects Access servers, browsers, other information
10
VISTA Servers
mVISTA: Align and compare sequences
wgVISTA: Align and compare sequences, including microbial assemblies
rVISTA: Search for TFBS combined with a comparative sequence analysis
GenomeVISTA: Align DNA sequence to a genome
11
Precomputed Alignments
VISTA Browser: Browse through pre-computed whole-genome alignments
Whole Genome rVISTA: Whole genome analysis for conserved TFBS over-represented in upstream regions of genes
VISTA-Point: Browse and obtain sequence and alignment data
12
VISTA Browser: Access
13
VISTA Browser: Input Menu
Genome: choose “base” genome
Position: select location
Visualization: determine visualization preference (VISTA Browser, VISTA tracks on UCSC Browser, VISTA-Point)
Java 2, if needed
14
VISTA Browser: Alignment Details
Figure labels: gene, exon, direction, repeats, alignment, SNPs
15
VISTA Browser: Result
Figure labels: Position on chromosome, Control Panel, Menu & Icons, Cursor Info, Color Legend, Graphical display of genome alignments, Curve annotation (species), 1 row
16
VISTA Browser: Zooming vs. rhesus vs. dog
17
VISTA browser
18
VISTA Point: Access Overview
19
VISTA Point: Graphics Table
20
VISTA Point: Alignments Table
Figure label: sequence
22
Google map-like Dot-Plot
24
BlockView – Synteny Plot tool
27
Principal components
RegTransBase (experimental data): manually curated database of regulatory interactions captured from literature; 6000 papers. NAR database issue, 2007
RegPrecise (computational predictions): manually curated database of regulons inferred by comparative genomics approach. NAR database issue, 2010; Featured Article
RegPredict (web tool for regulon inference): integrated system for fast and accurate inference of regulons by comparative genomics. NAR Web Server issue, 2010; Featured Article
28
mVISTA: Access
29
mVISTA: Interface
Align up to 100 sequences
Our example will show 3 sequences
30
mVISTA: Input of Sequences
Provide your email address
Upload your sequences or enter a GenBank ID
31
mVISTA: Input Parameters
AVID: multiple pairwise alignments; accepts finished or draft sequences
LAGAN: true multiple alignments
Shuffle-LAGAN: multiple pairwise alignments; detects sequence rearrangements and inversions
32
mVISTA: Results
PDF, VISTA Browser, VISTA-Point
33
wgVISTA: Microbial Assemblies Comparison
wgVISTA: whole genome VISTA
Compares 2 sequences (up to 10 Mb)
Draft or finished microbial assembly sequences can be used
34
rVISTA: Access
35
Regulatory VISTA (rVISTA): prediction of transcription factor binding sites
Simultaneous searches of the major transcription factor binding site database (TRANSFAC) and the use of global sequence alignment to sieve through the data
rVISTA search is automatically run when submitting: mVISTA, GenomeVISTA
36
Regulatory VISTA (rVISTA):
1. Identify potential transcription factor binding sites for each sequence using library of matrices (TRANSFAC)
2. Identify aligned sites using VISTA
3. Identify conserved sites using dynamic shifting window (20 bp, >80% ID)
Example multiple alignment:
Human TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACAAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTGTCTCTCCCTTCCCCTCTG
Mouse TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTCTCTCTTCCTCCCCCTCCA
Dog TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCGATTTTCTACCTACGACCTCACTTTCTGTTGCGCTCACTCCCTTCCCCTGCA
Rat TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGTTCTCTCTTCCTCCCCCTCCA
Cow TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCGTTCTCTCCCTTCCCCTCCT
Rabbit TGATTTCTCGGCAGCCAGGGAGGGCCCCACGAC-AAGCCATTCAAAATCCCAGAAGTGATTTTCTACTTACGACCTCACTTTCTGTTG----CTCTCTCCTTCCCTCCA
Site labels on the alignment: Ikaros-2, Ikaros-2, NFAT, Ikaros-2
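Step 3 can be illustrated with a small check on a pairwise alignment. This is only a sketch of one plausible reading of the "dynamic shifting window": a predicted site is kept if at least one 20-column window that fully contains it reaches the identity cutoff. The exact rVISTA procedure may differ, the cutoff is taken as "at least 80%", and the function names and coordinates are hypothetical.

```python
def percent_identity(aln1, aln2, start, end):
    """Percent identity over alignment columns [start, end); gaps count as mismatches."""
    columns = zip(aln1[start:end], aln2[start:end])
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / (end - start)


def site_is_conserved(aln1, aln2, site_start, site_end, window=20, min_identity=80.0):
    """Return True if some window covering the site reaches min_identity.

    site_start and site_end are alignment-column coordinates, end exclusive.
    Assumption: every window of `window` columns that fully contains the
    site is tried, shifting one column at a time.
    """
    n = len(aln1)
    if len(aln2) != n:
        raise ValueError("aligned sequences must have equal length")
    if not 0 <= site_start < site_end <= n:
        raise ValueError("site coordinates out of range")
    if site_end - site_start > window or n < window:
        # Site longer than the window (or alignment too short): score the site itself.
        return percent_identity(aln1, aln2, site_start, site_end) >= min_identity
    first = max(0, site_end - window)      # leftmost window that still covers the site
    last = min(site_start, n - window)     # rightmost window that still covers the site
    return any(
        percent_identity(aln1, aln2, w, w + window) >= min_identity
        for w in range(first, last + 1)
    )


if __name__ == "__main__":
    a = "A" * 20 + "C" * 20
    b = "A" * 20 + "G" * 20
    print(site_is_conserved(a, b, 5, 10))    # site inside the identical half
    print(site_is_conserved(a, b, 25, 30))   # site inside the divergent half
```

Shifting the window rather than centering it lets a site near the edge of a conserved block still qualify, which is the design choice this sketch is meant to show.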
37
rVISTA: Interface
rVISTA sequence submission: submit email address, sequences, and set parameters
Key step: click the box for “Find potential transcription factors”
Figure labels: your email, sequences, set number
38
rVISTA: Select TRANSFAC Matrices
39
rVISTA: Mailed Results
Emailed results will provide a link
Choose which binding site matrices to display
You can then choose visualization options
40
rVISTA: Results Graphic
Blue: all transcription factor (TF) binding sites
Red: TF sites which are aligned in both sequences
Green: TF sites which are aligned and in conserved regions
Figure labels: sequences, sites
41
Whole Genome rVISTA: Access
42
Whole Genome rVISTA: Select Alignment
Figure labels: IDs or symbols, upstream range
43
Whole Genome rVISTA: Results
Figure labels: sites found, view genes
44
Examples of VISTA usage
Non-coding regulatory regions, for example enhancers
Genes from the same gene families
Alternative splicing
Transcriptional regulation
Genetic studies
References collected are available through the Publications link at the VISTA home page: http://genome.lbl.gov/vista
45
VISTA-related Publications
46
http://www.openhelix.com
47
VISTA thanks
Genomics Division, LBNL, led by Dr. Edward Rubin
Biology: Dario Boffelli, Kelly Frazer, Gaby Loots, Len Pennacchio, Marcelo Nobrega, Axel Visel
Bioinformatics: Michael Brudno, Olivier Couronne, Simon Minovitsky, Igor Ratner, Alexander Poliakov, Lior Pachter (UCB), Shyam Prabhakar, Dmitriy Ryaboy, Nameeta Shah, Inna Dubchak