Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes
Kendra Baughman, York Marahrens’ Lab, UCLA

Overview: Goal, Background, Prior Studies, Strategy, Results, Remaining Tasks, Future Directions

Goal: Determine whether there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression

Background – Alu Elements
- Repetitive sequence
- Transposons (DNA sequences that make copies of themselves and insert elsewhere in the genome)
- Over 1 million in the human genome
- ~50 subfamilies, categorized by sequence differences

Prior Studies
- “Repetitive sequence environment distinguishes housekeeping genes,” Eller, Daniel et al., submitted
- “Alu abundance positively correlates with gene expression level,” C.D. Eller et al., submitted

Higher Alu concentration near widely expressed genes

Higher Alu concentration near highly expressed genes

Alu Subfamilies
[Table: subfamily and number of Alu in the subfamily]

Data
- Human gene expression levels from microarray data (Stan Nelson’s lab, UCLA)
- Alu information from the UCSC Genome Browser, RepeatMasker tracks

Goal, reiterated: Determine whether there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression

Strategy
- Find Alu “near” high- and low-expression genes (within 20 kb)
- Perform multiple sequence alignment on the Alu sequences
- Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

Screening the genes…
- Used Perl scripts to extract information from MySQL databases
- Grouped genes by expression level in R
- Chose genes in the top and bottom 20%
[Figure: genes plotted against expression level]
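The top/bottom 20% selection can be sketched in a few lines. This is a minimal illustration in Python, not the Perl/R code used in the project; the function name `split_by_expression`, the gene names, and the expression values are all hypothetical.

```python
def split_by_expression(expression, fraction=0.2):
    """Return (high, low): gene names in the top and bottom `fraction` by expression.

    `expression` maps gene name -> expression level (hypothetical input format).
    """
    ranked = sorted(expression, key=expression.get)  # ascending by expression level
    k = int(len(ranked) * fraction)                  # number of genes in each tail
    low = ranked[:k]
    high = ranked[len(ranked) - k:]                  # safe even when k == 0
    return high, low


# Hypothetical expression levels, for illustration only.
expression = {
    "geneA": 12.0, "geneB": 850.0, "geneC": 95.0, "geneD": 3.5, "geneE": 400.0,
    "geneF": 60.0, "geneG": 1500.0, "geneH": 20.0, "geneI": 7.0, "geneJ": 220.0,
}
high, low = split_by_expression(expression)
print("HI genes:", high)
print("LO genes:", low)
```

Genes in the middle 60% are simply left out of both groups, which matches the idea of contrasting only the extremes.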

Screening the Alu…
- Used MySQL queries to determine the flanking region
- Used Perl scripts to screen for Alu located within 20 kb of genes
- Omitted Alu in overlapping flanking regions
[Diagram: a HI-gene and a LO-gene with nearby Alu labeled HI-Alu, ??-Alu, and LO-Alu]

Percentages of Alu thrown out
Flanking region   Chrom19 (1st 20 Mb)   Chrom10   Chrom1 (1st 20 Mb)
50 kb             50%                   11%       17%
20 kb             28%                   7%
10 kb             20%                   6%        3%
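The screening rule can be sketched as follows. This is an illustration under stated assumptions, not the original Perl/MySQL code: the flanking window is taken to be the gene body extended by 20 kb on each side, and an Alu is treated as being "in overlapping flanking regions" when it falls in the window of more than one gene. The `Gene` and `Alu` records, the function names, and the coordinates are hypothetical.

```python
from collections import namedtuple

# Hypothetical record types; `level` is "HI" or "LO".
Gene = namedtuple("Gene", "name chrom start end level")
Alu = namedtuple("Alu", "name chrom start end")

FLANK = 20_000  # flanking distance in bases (20 kb)


def genes_near(alu, genes, flank=FLANK):
    """Genes whose window (gene body plus `flank` on each side) overlaps the Alu."""
    return [g for g in genes
            if g.chrom == alu.chrom
            and alu.end >= g.start - flank
            and alu.start <= g.end + flank]


def screen_alus(alus, genes, flank=FLANK):
    """Assign each Alu to HI or LO; omit Alu that sit in more than one gene's window."""
    kept = {"HI": [], "LO": []}
    omitted = []
    for alu in alus:
        hits = genes_near(alu, genes, flank)
        if len(hits) == 1:
            kept[hits[0].level].append(alu.name)
        elif len(hits) > 1:
            omitted.append(alu.name)  # ambiguous: overlapping flanking regions
    return kept, omitted


# Hypothetical coordinates, for illustration only.
genes = [Gene("g_hi", "chr1", 100_000, 110_000, "HI"),
         Gene("g_lo", "chr1", 140_000, 150_000, "LO")]
alus = [Alu("alu1", "chr1", 85_000, 85_300),    # near g_hi only
        Alu("alu2", "chr1", 125_000, 125_300),  # inside both windows
        Alu("alu3", "chr1", 165_000, 165_300),  # near g_lo only
        Alu("alu4", "chr1", 300_000, 300_300)]  # near no gene
kept, omitted = screen_alus(alus, genes)
print(kept, omitted)
```

Widening `flank` makes more windows overlap, so more Alu are discarded, which is the trade-off the percentages on this slide describe.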

Alignment Process…
- First alignment tool: ClustalW
  - Slow, inaccurate
- Second alignment tool: T-COFFEE
  - Can’t handle hundreds of sequences
- Third alignment tool: MUSCLE
- Aligning thousands of sequences = big gaps and processing limitations
- Chose to analyze by subfamily (S, Sp/q)
  - Aligned elements around highly expressed genes
  - Aligned elements around poorly expressed genes
  - Profile high/low alignment
  - Consensus sequence alignment

Alignment viewed in Jalview

Alignments of AluSp/q and AluS Elements
[Figure: alignment views for AluS and AluSp/q (High Alu), shaded from high to low conservation]

Frequency of consensus base: AluS and AluSp/q
[Figure: for each subfamily, per-position plots of the frequency of the consensus base, shown for "All Alu" and for "Alu w/ a base", with asterisks marking selected positions; the numeric values are not recoverable. Sequence rows as extracted:]

AluS
High Alu: TATCCACGCCTGCAAAATCTCAGCCACTCCCAAAGTTGCTGCG
(row label unclear: Low Alu or Alu consensus sequence): CANCC-CGCCT-CGTAATCCCAA AATGTT--TG-G

AluSp/q
High Alu: TGCTCAGAAATTTCTCGGCTCACTGCAACCTCCGTATCACCCC
Low Alu: CG---A-AA CTCCGT--T---CT
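The per-position "frequency of consensus base" comparison can be sketched as below. This is an illustration only, with two assumptions that are not confirmed by the slides: "All Alu" is read as the frequency over every sequence in the alignment, and "Alu w/ a base" as the frequency over sequences that are not gapped at that position. The function names, the threshold `min_diff`, and the toy alignments are hypothetical, and the two alignments are assumed to share the same columns.

```python
from collections import Counter

GAP = "-"


def consensus(column):
    """Return (consensus base, frequency over all sequences, frequency over non-gap sequences)."""
    bases = [b for b in column if b != GAP]
    if not bases:
        return GAP, 0.0, 0.0
    base, count = Counter(bases).most_common(1)[0]
    return base, count / len(column), count / len(bases)


def base_frequency(column, base):
    """Frequency of `base` among the non-gap entries of a column."""
    bases = [b for b in column if b != GAP]
    return bases.count(base) / len(bases) if bases else 0.0


def conserved_in_high(high, low, min_diff=0.3):
    """Columns whose high-group consensus base is at least `min_diff` rarer in the low group."""
    hits = []
    for i, (hcol, lcol) in enumerate(zip(zip(*high), zip(*low))):
        base, _, freq_high = consensus(hcol)
        if base != GAP and freq_high - base_frequency(lcol, base) >= min_diff:
            hits.append((i, base))
    return hits


# Toy alignments (equal length, shared columns), for illustration only.
high = ["ACGT-A", "ACGTTA", "ACGT-A", "ACCTTA"]
low = ["ACGT-A", "TCGATA", "AGGC-A", "ACTTTA"]
print(conserved_in_high(high, low))
```

A run of adjacent columns flagged this way would be a candidate motif; a single flagged column on its own is weak evidence.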

Remaining Tasks
- Analyze the remaining subfamilies
- Determine whether identified motifs agree across subfamilies
- BLAST motifs against all Alu sequences and correlate alignment scores with expression level
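The last task, correlating alignment scores with expression level, could be approached with a plain Pearson correlation. The sketch below assumes one alignment score per Alu and the expression level of its nearby gene; the scores and expression values are hypothetical, and a rank-based correlation might be preferable if the relationship is not linear.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: motif alignment score per Alu, expression of the nearby gene.
scores = [42.0, 38.5, 30.0, 21.0, 15.5]
expression = [1500.0, 900.0, 400.0, 120.0, 60.0]
print(round(pearson(scores, expression), 3))
```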

Future Directions
- Cluster alignments into a relationship tree to see if HI and LO Alu groups cluster differently from each other
  - Create a matrix of pairwise alignments and cluster these into a tree using nearest neighbour clustering
- Use Hidden Markov Models or Gibbs sampling to identify sequence motifs (a motif-finding method that does not rely on multiple sequence alignment)
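The nearest neighbour clustering idea can be sketched as single-linkage agglomerative clustering over a pairwise distance matrix. As a stand-in for real pairwise alignment scores, this sketch uses the fraction of mismatched positions between equal-length sequences, which is an assumption made only to keep the example self-contained; the function names and sequences are hypothetical.

```python
def mismatch_fraction(a, b):
    """Fraction of positions that differ between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(names, dist):
    """Nearest-neighbour (single-linkage) clustering; returns a nested-tuple tree."""
    # cluster id -> (subtree, indices of member sequences)
    clusters = {i: (names[i], [i]) for i in range(len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i][j] for i in clusters[a][1] for j in clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical sequences, for illustration only.
seqs = {"hi1": "ACGTAC", "hi2": "ACGTAA", "lo1": "TTGTCC", "lo2": "TTGTCA"}
names = list(seqs)
dist = [[mismatch_fraction(seqs[a], seqs[b]) for b in names] for a in names]
tree = single_linkage(names, dist)
print(tree)
```

If HI and LO Alu really differ, they should end up in separate subtrees, as they do in this toy example.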

Acknowledgements: Danny Eller, York Marahrens, Marc Suchard, Chiara Sabatti, SoCalBSI, NIH/NSF