Sequence Comparison and Genome Alignment in the Human Genome
Jian Ma
PowerPoint: Casey Hanson


Introduction
The goals of this lab are as follows:
1. Gain experience using BLAST and genome browsers by looking at repeat families in the VHL gene.
2. Become familiar with BLAT and the UCSC website by discovering the identity of a mystery sequence.
3. Visualize pairwise whole-genome alignments and chromosomal rearrangements.
4. View a phylogeny-based multi-genome alignment.
5. Use UCSC tools and Galaxy to intersect annotated functional regions between human and other placental mammals.

Step 0: Shared Desktop Directory
For viewing and manipulating files on the classroom computers, we provide a shared directory in the following folder on the desktop:
classes/mayo
In today's lab, we will be using the following folder in the shared directory:
classes/mayo/ma

BLAST & Genome Browser
In this exercise, we will use BLAST (Basic Local Alignment Search Tool) to search for significant occurrences of a class of transposable elements (TEs) called Short INterspersed Elements (SINEs), specifically of the ALU family, in the well-known VHL tumor suppressor gene. The goal of this exercise is to gain experience using BLAST, particularly blastn, and the UCSC Genome Browser to answer biologically relevant questions.

Step 1A: BLAST VHL in ALU Database
Go to the following web page:
Click nucleotide blast.
In the Enter Query Sequence box, paste the accession # for VHL: AF
In the Database drop-down list, select the following: Human ALU repeat elements (alu_repeats)
Click the BLAST button.

Step 1B: BLAST VHL in ALU Database

Step 2A: Interpreting BLAST Results
[Figure: BLAST graphic summary. The horizontal axis gives the coordinates of the VHL gene; color indicates quality of match (very good, good, or okay matches).]
A match is a significant similarity between a region of the query and a region of a database sequence. Lines between boxes indicate 'gaps' between matches in the query sequence. (The next slide has a legend for interpretation.)

Step 2B: Interpreting BLAST Results
Note the following legend for interpreting a match: Excellent Match, Good Match, Okay Match.
[Figure: BLAST matches shown against the gene structure (exon, intron, exon, intron).]
Exonic regions are less likely to have ALU repeats. Matches like those shown are likely to be located in intronic regions.

Step 3A: Examine VHL in UCSC Browser
Let's look at the structure of the VHL gene in a genome browser to verify that ALU elements are confined to the introns.
Go to the following web page:
Click Genome Browser.
In the search term box, type VHL.
Click submit.
Click the 2nd link: VHL (uc003bvd.3) at chr3:

Step 3B: Examine VHL in UCSC Browser
Enter chr3:10,177,301-10,201,372 into the input box and click go.
Right click on tracks NOT shown below and hide them.
Right click on the RepeatMasker track and click full. It is dense by default.
Adjust the zoom until you get a view you are comfortable with.
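The position string typed into the browser in Step 3B packs a chromosome name and two comma-formatted coordinates into one field. As a minimal sketch, assuming you want to handle such strings outside the browser, the Python below splits one into its parts. The function name parse_position is hypothetical and not part of any UCSC tool.

```python
import re

def parse_position(text):
    """Split a browser position such as 'chr3:10,177,301-10,201,372'
    into (chromosome, start, end), removing the thousands separators."""
    match = re.fullmatch(r"\s*(\w+):([\d,]+)-([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"not a valid position: {text!r}")
    chrom, start, end = match.groups()
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end

chrom, start, end = parse_position("chr3:10,177,301-10,201,372")
# Positions typed into the browser are 1-based and inclusive,
# so the number of bases shown is end - start + 1.
span = end - start + 1
print(chrom, start, end, span)
```

For the region in Step 3B this gives a window of 24,072 bases on chr3.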

Step 3C: Examine VHL in UCSC Browser
The repeat elements lie 3' to the gene, 5' to the gene, or in the intronic regions. This supports our hypothesis.
ALUs are not the only family of SINEs located in the intronic regions. What other SINE families does VHL have? What about TE classes other than SINEs?
(Answers provided in separate pdf)

BLAT
In this exercise, we will use BLAT (BLAST-Like Alignment Tool) to search for the identity of a mystery gene annotated in the human genome. The goal of this exercise is to gain experience using BLAT and the UCSC Genome Browser to answer biologically relevant questions.

BLAST v. BLAT
BLAST
Pro: Can find matches to a query in any set of GenBank sequences.
Pro: Not limited to a given k-mer size.
Con: Consumes a lot of memory.
Con: Slow compared to BLAT.
BLAT
Con: Limited to matches to a query in a particular reference genome.
Con: Limited to non-overlapping 11-mers for DNA.
Pro: Can fit an entire genome in memory (< 1 GB of RAM).
Pro: Fast compared to BLAST.
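To make the "non-overlapping 11-mers" point concrete, here is a minimal Python sketch of the seeding idea only: index the non-overlapping k-mers of a reference, then look up every k-mer of the query. It is not BLAT's implementation, which also groups and extends hits. The function names, the toy sequence, and k=4 (chosen so the toy example stays readable) are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(genome, k=11):
    """Map each non-overlapping k-mer of the genome to its start offsets."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index

def find_seeds(query, index, k=11):
    """Look up every (overlapping) k-mer of the query in the index and
    return (query_offset, genome_offset) pairs for exact k-mer hits."""
    seeds = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            seeds.append((q, g))
    return seeds

# Toy data: a made-up "genome" and a query copied from offsets 20..39.
genome = "ACGTACGTTAGCCGATAGGCTTAACCGGTTAACGTAGCTAGGATCCA"
query = genome[20:40]
index = build_index(genome, k=4)
seeds = find_seeds(query, index, k=4)
print(seeds)
```

Because only every k-th reference k-mer is stored, the index is roughly k times smaller than one holding all overlapping k-mers, which is what lets a whole genome fit in memory. The price is that a match shorter than about 2k - 1 bases may contain no indexed k-mer at all and can be missed.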

Step 1A: BLAT the Mystery Sequence
Go to the following web page:
Click BLAT.
Open our mystery sequence, located below, in Notepad.
classes/mayo/ma/mystery_sequence.txt
Paste the sequence into the text area.
Click submit.

Step 1B: BLAT the Mystery Sequence
[Figure: screenshot of the web form for BLAT.]

Step 2A: Identify Mystery Sequence
BLAT will return a list of significant matches in the genome.
Investigate the matches in the list by clicking browser for each match.
For example, click the first browser link here.

Step 2B: Identify Mystery Sequence
The screenshot below shows UCSC and RefSeq genes aligned to the mystery sequence, in particular CYP2A13.
Examine the other matches on the previous slide in the genome browser. Keep in mind 2 questions (answers provided at the end of the document):
A. How many potential genes does the mystery sequence come from?
B. What is the relationship among these genes?

Pairwise Whole Genome Alignments
In this exercise, we will use the UCSC Genome Browser to view whole genome alignments, computed by LASTZ, of each of the following genomes individually to human: orangutan, mouse, dog, and opossum. We will investigate these alignments to see if we can discover chromosomal rearrangements.

Step 1: Create a Custom UCSC Track
Go to the UCSC Genome Browser:
Under the My Data tab, click Create Custom Tracks.
In the Paste URLs textbox, paste the following and click submit: (no commas) chr
On the next page, click Go to Genome Browser.

Step 2A: Track Addition
The track should look similar to what is below:

Step 2B: Track Addition and Removal
To get 'Pairwise Alignments' we need to turn a few tracks on and one track off. Specifically, we need to select: Primate Chain/Net, Placental Chain/Net, and Vertebrate Chain/Net.
Underneath the Comparative Genomics tab, turn these tracks to dense.
Additionally, set Conservation to hide and click refresh.

Step 2C: Track Addition
The resulting view should look like the figure below. There is one problem: our species of interest are not being displayed.

Step 2D: Species Selection
To select the correct species, go back to the Comparative Genomics tab.
Click on the Primate Chain/Net link.
In the resulting window, set Chains to hide and make sure only Orangutan is selected.
Click submit.

Step 2E: Species Selection Continued
Repeat Step 2D for the other two tracks: Placental Chain/Net and Vertebrate Chain/Net.
Make sure your configuration resembles the screenshots below:
[Figures: Placental Chain/Net settings; Vertebrate Chain/Net settings.]

Step 2F: Expand Tracks
On the tracks for each species, right click and select Full.
The resulting Genome Browser view (after moving the tracks to the top) should look like the following:

Step 3: Whole Genome Alignment Analysis
Investigate the tracks for each species and answer the following questions.
A. Are the sequence counterparts co-linear with respect to human? If not, is there evidence of genomic rearrangements in this region? Which kind?
B. Can you infer when these rearrangements happened evolutionarily on the diagram to the right?
(Answers provided in separate pdf)
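The co-linearity question in Step 3 can also be stated programmatically. The Python below is a rough sketch under simplifying assumptions: each aligned block is reduced to a hypothetical (human_start, other_start, strand) tuple, a minus-strand block is treated as a candidate inversion, and a forward block that starts earlier in the other genome than the preceding forward block is flagged as out of order. The block values are invented for illustration and do not come from the tracks in this lab.

```python
def classify_blocks(blocks):
    """Label aligned blocks, visited in human coordinate order.

    blocks: iterable of (human_start, other_start, strand) tuples,
            where strand is '+' or '-'.
    Returns one label per block: 'co-linear', 'inverted', or 'out of order'.
    """
    labels = []
    previous_other = None  # other-genome start of the last forward block
    for human_start, other_start, strand in sorted(blocks):
        if strand == "-":
            labels.append("inverted")
            continue
        if previous_other is not None and other_start < previous_other:
            labels.append("out of order")
        else:
            labels.append("co-linear")
        previous_other = other_start
    return labels

# Invented example: five blocks along a human region.
blocks = [
    (100, 5000, "+"),
    (300, 5200, "+"),
    (500, 5900, "-"),
    (700, 5600, "+"),
    (900, 1000, "+"),
]
print(classify_blocks(blocks))
```

A heuristic like this only flags candidates. Deciding whether a flagged block reflects an inversion, a translocation, or an alignment artifact still requires looking at the net tracks for each species.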

Phylogeny Based Whole Genome Alignment
In this exercise, we will use the UCSC Genome Browser to view a refined whole genome alignment of the orangutan, mouse, dog, and opossum genomes to human. This alignment is produced by Multiz, a program that takes pairwise whole genome alignments of many species and uses a phylogenetic tree to improve the alignment.

Step 1: Setup Multiz Visualization
Go to the UCSC Genome Browser:
Upload the following as a Custom Track and go to the genome browser, as in the previous exercise: (no commas) chr
Under the Comparative Genomics tab in the genome browser, click on Conservation.
Ensure the following settings are in place on the next 2 pages:

Step 1B: Setup Multiz Visualization

Step 1C: Setup Multiz Visualization
Once your configuration resembles the last 2 figures, click submit.

Step 2: Multiz Visualization Analysis
After rearranging tracks, the genome browser should resemble the figure below:
Investigate the tracks for each species and answer the following questions:
A. Is this region highly conserved in mammals?
B. Look closely at the Multiz track. Do you see anything strange in the human sequence compared to the other species? What could be the reason for this discrepancy?
(Answers provided in separate pdf)

Intersection of Annotated Regulatory Regions in Human and Placental Mammals
In this exercise, we will use Galaxy to intersect annotated regulatory regions in human with annotated regions in other placental mammals. We will then view the intersection in the UCSC Genome Browser.

Step 1A: Place Regulatory Data in Galaxy
Log in to Galaxy:
Upload the file of predicted regulatory regions in hg19 to Galaxy:
classes/mayo/ma/PRe_Mod_hg19.bed
Make sure to identify hg19 as your reference genome.
Acquire all conserved regions in placental mammals from the UCSC Main Table Browser in Galaxy:

Step 1B: Place Regulatory Data in Galaxy
Select Comparative Genomics for group.
Select Mammal E1: phastConsElements45wayPlacental for table.
Select Genome for region.
Select Galaxy for send output to.
Click Get Output.
On the next screen, click Send Query to Galaxy.

Step 2: Intersect Datasets
Go to Operate on Genomic Intervals in Galaxy and select Intersect.
Select the parameters below and click Execute.
When finished, click display at UCSC in the history pane.
[Figure: UCSC results, chr19 regulatory regions.]
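For readers who want to see what an interval intersection computes, the following is a minimal Python sketch and not Galaxy's implementation. It assumes intervals are (chrom, start, end) tuples in BED-style coordinates (0-based start, end exclusive) and returns only the overlapping pieces. The function name and the example intervals are invented for illustration and are not taken from PRe_Mod_hg19.bed or the phastCons table.

```python
def intersect(first, second):
    """Return the overlapping pieces between two interval lists.

    Each interval is (chrom, start, end) with a 0-based start and an
    exclusive end, as in BED files. This compares every pair of
    intervals, which is fine for a small example but slow at genome scale.
    """
    pieces = []
    for chrom_a, start_a, end_a in first:
        for chrom_b, start_b, end_b in second:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # a non-empty overlap
                pieces.append((chrom_a, start, end))
    return sorted(pieces)

# Invented example intervals.
regulatory = [("chr19", 100, 200), ("chr19", 500, 650), ("chr3", 10, 50)]
conserved = [("chr19", 150, 520), ("chr3", 60, 90)]
print(intersect(regulatory, conserved))
```

Whether you want the overlapping pieces, as here, or the whole original intervals that overlap anything is a choice to make when setting the tool's parameters.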

Step 3: Predicted Modules Overlap with PAX5 Regulators

Exploratory Exercise
Pick a gene of interest (VHL, CMYC, ETS1, TBP, USF2, GATA-1, …).
Visualize the intersected intervals in the UCSC Genome Browser.
Compare the region with results from ENCODE to assess the functional roles of the intersected intervals.
We will come around to help.