Visualization of genomic data Genome browsers
UCSC browser Ensembl browser Others ? Survey
UCSC genome browser
Basic functionalities used in the exercise:
  Finding a gene by name or by sequence
  Gene structure
  Orthologues, i.e. the functional homolog in other organisms
  SNPs (Single Nucleotide Polymorphisms)
Several other functionalities:
  Gene Sorter: sort genes according to expression, homology, or in situ images of genes in different tissues
  Custom tracks: upload your own data
Genome browsers: Visualization of a gene
Flat files / tab files, e.g. raw sequence:
>chr5:
ATGAAGTTATGGGATGTCGTGGCTGTCTGCCTGGTGCTGCTCCACACCGC
GTCCGCCTTCCCGCTGCCCGCCGGTAAGAGGCCTCCCGAGGCGCCCGCCG
AAGACCGCTCCCTCGGCCGCCGCCGCGCGCCCTTCGCGCTGAGCAGTGAC
TGTAAGAACCGTTCCCTCCCCGCGGGGGGGCCGCCGGCGGACCCCCTCGC
ACCCCCACCCGCAGCCAGCCCCGCACGTACCCCAAGCCAGCCTGATGGCT
GTGTGGCCTACCGACCCGTGGGCAAGGGGTGCGGGTGCTGAAGCCCCCAG
GGGTGCCTGGCTGCCCACTGCTGCCCGCACGCCTGGCCTGAAAGTGACAC
GCGCTGGTTTGCCCAGCACAGAGGGGATGGAATTTTTATGCTGCTCCTTT
AGCATTCTGATGAACAAATATCCTCCCCACCAGCACCACCACCTCAGTAA
Open Reading Frame (ORF): from start codon to stop codon
(Figure labels: Chr, Exon, Intron)
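The "from start to stop codon" definition of an ORF can be sketched in a few lines of Python. This is a minimal illustration, not how a genome browser finds genes: the function name `find_first_orf` and the toy sequence are assumptions made for this example, only the first ATG is tried, and introns are ignored (a genomic sequence would have to be spliced first).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_first_orf(seq):
    """Return the ORF that begins at the first ATG and ends at the
    first in-frame stop codon (inclusive), or None if there is none."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    # Walk codon by codon in the reading frame set by the start codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None

# Toy sequence (hypothetical, not taken from the slide).
toy = "CCATGAAATGGTAGCC"
print(find_first_orf(toy))  # prints ATGAAATGGTAG
```

A graphical browser shows the same information (start, stop, exons, introns) as annotated tracks instead of requiring a scan through raw sequence text.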
Genome browsers: Why a graphic display?
Why is a graphic display better than flat files / tab files?
  A graphic display is compact
  Metadata is available, i.e. supporting information about a gene:
    Experimental evidence such as ESTs
    Predicted gene structures
    SNP information
  Links to many databases
In short, much of the data about a gene is gathered in one place and can be viewed easily.
Genome browsers Visualization of a gene (Ensembl)
Genome browsers Visualization of a gene (UCSC) Exon Intron UTR
Genome browsers
UCSC genome browser
  Easy to use
  Updated often, but not as often as Ensembl
  Upload of personal tracks
Ensembl browser
  Less easy to use
  Maintained/updated by several people
GBrowse
BLAT: BLAST-Like Alignment Tool
BLAT (2002)
  Very fast searches (MySQL database)
  Handles introns in RNA/DNA alignments
  Checks that the donor/acceptor rules are followed
  Data for more than 30 genomes (human, mouse, rat…)
(Figure: Exon | Intron | Exon, with splice sites marked: donor site GT at the start of the intron, acceptor site AG at its end)
BLAT genome Browser
BLAT genome Browser: Using a search term or a position, e.g. chr1:10,234-11,567
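A position string of this form is simply chromosome, start and end, with optional thousands separators. The sketch below shows one way to split it into its parts. The helper name `parse_position` is hypothetical and is not part of any browser API.

```python
import re

def parse_position(text):
    """Split a position such as 'chr1:10,234-11,567' into
    (chromosome, start, end), with the coordinates as integers."""
    match = re.fullmatch(r"\s*(\w+):([\d,]+)-([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom = match.group(1)
    # Drop the thousands separators before converting to int.
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not be greater than end")
    return chrom, start, end

print(parse_position("chr1:10,234-11,567"))  # prints ('chr1', 10234, 11567)
```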
BLAT genome Browser Using a protein or DNA sequence
BLAT genome Browser ”Details” Correct splice site ?
Logo Plot: Information Content
$IC = -H(p) + \log_2(4) = \sum_a p_a \log_2 p_a + 2$ (DNA, 4 letters; for proteins, replace $\log_2(4)$ with $\log_2(20) = 4.32$)
The information content is calculated from a multiple sequence alignment. The result is a graphical visualization of sequence conservation where:
  Total height at a position is the information content
  Height of a single letter is proportional to the frequency of that letter
Multiple alignment of 3 protein sequences:
  Seq1: A L R K P Q R T
  Seq2: A V R H I L L I
  Seq3: A I K V H N N T
Pos1: $I = [1 \cdot \log_2(1)] + \log_2(20) = 4.32$
Pos2: $I = [\tfrac{1}{3}\log_2(\tfrac{1}{3}) + \tfrac{1}{3}\log_2(\tfrac{1}{3}) + \tfrac{1}{3}\log_2(\tfrac{1}{3})] + \log_2(20) = 2.74$
Pos3: $I = [\tfrac{2}{3}\log_2(\tfrac{2}{3}) + \tfrac{1}{3}\log_2(\tfrac{1}{3})] + \log_2(20) = 3.40$
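The per-position calculation can be checked with a short Python sketch. It uses the three-sequence alignment from the slide and the plain formula with no small-sample correction. The function name `information_content` and the `alphabet_size` parameter are assumptions made for this example.

```python
import math
from collections import Counter

def information_content(column, alphabet_size=20):
    """Information content (bits) of one alignment column:
    log2(alphabet_size) + sum_a p_a * log2(p_a)."""
    counts = Counter(column)
    n = len(column)
    # Letters that do not occur contribute 0, so only observed letters are summed.
    entropy_term = sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) + entropy_term

alignment = ["ALRKPQRT", "AVRHILLI", "AIKVHNNT"]

# zip(*alignment) yields the alignment column by column.
for pos, column in enumerate(zip(*alignment), start=1):
    print(pos, "".join(column), round(information_content(column), 2))
```

For DNA, pass `alphabet_size=4`, so that a fully conserved position reaches the maximum of 2 bits.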
Logo Plot Exon
BLAT genome Browser ”Details” Donor site | Acceptor site exon.... G | GT...intron...AG | exon...
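The donor/acceptor rule in the diagram (intron begins with GT, ends with AG) is easy to test on an aligned intron. The sketch below is an illustration only and not BLAT's own check. The function name and the toy sequences are assumptions made for this example.

```python
def follows_gt_ag_rule(intron):
    """True if the intron starts with the donor dinucleotide GT
    and ends with the acceptor dinucleotide AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

# Toy example (hypothetical sequences): exon | intron | exon
exon1, intron, exon2 = "ATGGCCAAG", "GTAAGTCCCTTTCAG", "GCTGAA"
print(follows_gt_ag_rule(intron))             # prints True
print(follows_gt_ag_rule("GCAAGTCCCTTTCAG"))  # prints False
```

If the check fails at an alignment boundary, the "Details" view is the place to see whether the exon edge has been placed a few bases off.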
BLAT genome Browser ”Browser” Base, Center & Zoom Known genes Predictions RNA EST Conservation Expression
BLAT genome Browser Center & zoom
Forward/reverse direction Selected number of tracks
BLAT genome Browser Sequence Orthologs
“click”
SNPs
Single Nucleotide Polymorphism (SNP)
  SNPs can be located anywhere in the genome
  Non-synonymous SNP (nsSNP): the amino acid is changed
  Synonymous SNP: does not affect the protein
  An amino acid is coded by 3 nucleotides, e.g. Valine (V): GTC
  (Figure: amino acids V, I, T, P)
Humans are diploid: cells have 2 homologous copies of each chromosome, i.e. 2*23 chromosomes. Haploid cells (sex cells) have only 23 chromosomes.
Diploid organism (most mammals)
An example of two homologous copies of e.g. chromosome 9 within a cell: one chromosome from the mother and one chromosome from the father
If the red strand is the plus strand: C;T (or T;C, but we write it alphabetically)
If the green strand is the minus strand: G;A, but we write it alphabetically as A;G
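The strand convention can be made concrete: the alleles seen on the other strand are the complements, and the pair is then written in alphabetical order. A minimal sketch follows. The function name `alleles_on_other_strand` is an assumption made for this example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def alleles_on_other_strand(alleles):
    """Complement each allele and return the pair in alphabetical order."""
    return tuple(sorted(COMPLEMENT[a] for a in alleles))

print(alleles_on_other_strand(("C", "T")))  # prints ('A', 'G')
```

This is why the same SNP may be reported as C;T in one resource and A;G in another, depending on which strand is used as reference.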
SNP nomenclature
SNPs within a coding region of DNA, i.e. within an exon, might cause a change in the translated protein. SNPs at intron/exon boundaries can also have an effect on the protein product.
  nsSNP (non-synonymous SNP)
  cSNP (coding SNP)
  Missense SNPs or mutations: nsSNP and cSNP
  Nonsense SNPs are those that result in a stop codon
SNPs within an exon region that do NOT change the protein product:
  sSNP (synonymous SNP)
(Figure labels: ATG, 5')
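The nomenclature can be tried out by translating a codon before and after a single-base change. The sketch below uses the standard genetic code. The function name `classify_snp` and its 0-based `position` argument are assumptions made for this example, and it handles only a single codon, not splice-site effects.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_snp(codon, position, new_base):
    """Classify a single-base change in a sense codon as
    'synonymous', 'missense' or 'nonsense' (position is 0, 1 or 2)."""
    codon = codon.upper()
    if CODON_TABLE[codon] == "*":
        raise ValueError("expected a codon that encodes an amino acid")
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"

print(classify_snp("GTC", 2, "T"))  # GTC (V) -> GTT (V): prints synonymous
print(classify_snp("GTC", 0, "A"))  # GTC (V) -> ATC (I): prints missense
print(classify_snp("TGG", 2, "A"))  # TGG (W) -> TGA (stop): prints nonsense
```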
Exercise
1. Basic understanding of the graphics
2. Effect of Single Nucleotide Polymorphisms (SNPs)
3. Finding orthologous genes
4. Identifying the chromosomal locus for a gene