UCSC Genome Browser Tutorial

Presentation transcript:

UCSC Genome Browser Tutorial
http://genome.ucsc.edu/
http://genome-test.cse.ucsc.edu/
The UCSC Toolset & Portal to the Human Genome: Genome Browser, Table Browser
“I was blind and now I can see”
http://cs273a.stanford.edu

UCSC Genome Browser [version9a] http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

The UCSC Homepage: http://genome.ucsc.edu. Navigate; general information; specific information (new features, current status, etc.). Shown here is the homepage for the UCSC Genome Bioinformatics site. When you first arrive, you will see a page that looks like this. First, there is a section that contains general information about the site. Second, there is a NEWS section covering new features, software or data changes, and the current state of the available data. It is worth a quick check whenever you visit, in case something has changed since your last visit. But the real substance of the site, the data and tools, is accessible in a couple of ways from this page. The navigation bars at the top and left side give you access to all of the available features, and you will begin your work with the UCSC database by navigating from these blue areas. To start searching the database, you have several options: you can search by text, such as a gene name, gene symbol, keyword, or ID. To do this we will use the Genomes or the Genome Browser link. I have circled the Genome Browser link, which you can click to reach the search page.

The Genome Browser Gateway start page choices, December 2006. Make your Gateway choices: (1) Select clade. (2) Select species: search 1 species at a time. (3) Assembly: the official backbone DNA sequence.
Practically speaking, there is no such thing as a genome; there is only a genome assembly. Assemblies are updated, frequently. Think of the genome as a moving target.

Everything in Genomics is a Moving Target: the genomes, their annotations, the portals, and our understanding of biology. Conclusion: write code that can be run... and rerun.
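One way to act on that conclusion is to store the assembly together with the coordinates every time you save a view, so the same query can be rerun later against the same backbone sequence. The sketch below is a hypothetical helper, not part of any UCSC tool. It assumes that the Genome Browser's `hgTracks` page accepts `db` (assembly) and `position` query parameters, and it uses `hg18` as the name of the March 2006 human assembly.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class BrowserView:
    """Hypothetical record of a Genome Browser view with the assembly pinned."""

    db: str        # assembly name, e.g. "hg18" for the March 2006 human assembly
    position: str  # a search term, or a region such as "chr17:7500000-7520000"

    def url(self, host: str = "https://genome.ucsc.edu") -> str:
        # Naming the assembly explicitly keeps the link pointing at the same
        # backbone sequence even after newer assemblies are released.
        query = urlencode({"db": self.db, "position": self.position})
        return f"{host}/cgi-bin/hgTracks?{query}"


view = BrowserView(db="hg18", position="TP53")
print(view.url())
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=TP53
```

Because the values are URL-encoded, a region such as `chr17:7500000-7520000` is sent as `chr17%3A7500000-7520000`. The dataclass is frozen so that a saved view cannot be silently edited in place.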


The Genome Browser Gateway start page choices, December 2006. Make your Gateway choices: (1) Select clade. (2) Select species: search 1 species at a time. (3) Assembly: the official backbone DNA sequence. (4) Position: location in the genome to examine. (5) Image width: how many pixels in the display window; 5000 max. (6) Configure: make fonts bigger, plus other choices. Here we focus on the options you have for searching a genome from the Gateway page. This screenshot isolates that part of the page so we can look at each option in turn. The first option is the clade, and the second is the genome, or species. At one time all of the species were in a single list, but there are now so many species that they have been reorganized into these 2 menus. You search one species at a time in the Genome Browser. Use the pulldown menus to select the species you want to search. Next, you have to choose an ASSEMBLY. The assembly is the official backbone genomic sequence that provides the framework on which all the other data hang. Backbone data come from the “official” groups who release genome sequence. UCSC obtains the official assembly and then generates the annotation tracks for that genome. The date on which UCSC obtains that sequence is what appears in the Assembly menu. Usually you will want the most current assembly, but sometimes you may want to look back at older data, and older assemblies remain available here for a while. Even older data are kept in the UCSC archives if you need them. “Position or search term” is the 4th option. This is where you enter the symbol, keyword, or ID for the place in the genome you want to examine. Image width is the number of pixels used to draw the genome viewer window. You can set it from around 300 up to 5000 pixels, if your monitor is that big! The last thing I’ll point out here is the button for configuring tracks and display.
You can change the display here, such as the font sizes and feature appearance; later I’ll show you a couple of other places where you can reach these settings as well.
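If you script your own queries, it can help to bundle the Gateway choices into one object and check them before use. The sketch below is hypothetical: the class and field names are invented for illustration, the clade and assembly labels are example values, and the 300 to 5000 pixel limits are the approximate figures quoted in this talk rather than confirmed browser limits.

```python
from dataclasses import dataclass

# Approximate limits quoted in the talk ("around 300 up to 5000 pixels");
# treat them as assumptions, since the real limits may differ by release.
MIN_IMAGE_WIDTH = 300
MAX_IMAGE_WIDTH = 5000


@dataclass
class GatewayChoices:
    """Hypothetical container for the Gateway choices listed on the slide."""

    clade: str
    species: str      # one species per search
    assembly: str     # the official backbone sequence, e.g. "Mar. 2006"
    position: str     # symbol, keyword, ID, or chromosome region
    image_width: int  # pixels in the viewer window

    def __post_init__(self) -> None:
        if not self.position.strip():
            raise ValueError("a position or search term is required")
        if not MIN_IMAGE_WIDTH <= self.image_width <= MAX_IMAGE_WIDTH:
            raise ValueError(
                f"image width must be {MIN_IMAGE_WIDTH}-{MAX_IMAGE_WIDTH} pixels, "
                f"got {self.image_width}"
            )


choices = GatewayChoices("Vertebrate", "Human", "Mar. 2006", "TP53", image_width=1000)
print(choices)
```

Validating at construction time means a bad width or an empty search term fails immediately, rather than producing a confusing page later.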

The Genome Browser Gateway start page, basic search: text/ID searches. Helpful search examples and suggestions appear below. Use this Gateway to search by: gene names and symbols; chromosome number (chr7) or region (chr11:1038475-1075482); keywords (kinase, receptor); IDs (NP, NM, OMIM, and more). See the lower part of the page for help with the format. Shown here is a portion of the Genome Browser Gateway page. By default the search is set to Human when you first arrive, but you can change the species. We will begin with the text search feature on this Gateway page. You can do a text search for items such as gene names, a chromosome number, a chromosome region, the identification number (ID) of your favorite gene or marker, a GenBank submitter name, and more. You can also use a keyword to find records. Examples of the kinds of searches you can do are shown on the lower part of this page, listing sample requests and the responses you can expect from the Genome Browser. Come back to this section for reminders of the correct query format when you do your own searches later on. Next we will go a little deeper into the search options on this Gateway, taking each option and exploring what you can expect from a given search.
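The query styles listed on this slide can be told apart by their shape, which is handy if you prepare search terms in a script. The classifier below is a simplified, hypothetical sketch: it recognizes a chromosome region, a bare chromosome, and RefSeq-style accessions (NM_, NP_, and a few related prefixes), and treats everything else, including gene symbols, keywords, and OMIM numbers, as free text. The browser's own search understands many more forms.

```python
import re

# Simplified, hypothetical patterns for the query styles listed on the slide.
REGION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$", re.IGNORECASE)
CHROMOSOME = re.compile(r"^chr\w+$", re.IGNORECASE)
REFSEQ_ID = re.compile(r"^(NM|NP|NR|XM|XP)_\d+(\.\d+)?$", re.IGNORECASE)


def classify_query(term: str) -> tuple:
    term = term.strip()
    match = REGION.match(term)
    if match:
        chrom = match.group(1)
        # Commas are allowed as thousands separators, e.g. "chr11:1,038,475-1,075,482".
        start = int(match.group(2).replace(",", ""))
        end = int(match.group(3).replace(",", ""))
        if start > end:
            raise ValueError(f"start {start} is after end {end}")
        return ("region", chrom, start, end)
    if CHROMOSOME.match(term):
        return ("chromosome", term)
    if REFSEQ_ID.match(term):
        return ("refseq_id", term.upper())
    # Gene symbols, keywords, OMIM numbers, etc. are passed on as free text.
    return ("text", term)


for query in ["TP53", "chr7", "chr11:1038475-1075482", "NM_000546", "kinase"]:
    print(query, "->", classify_query(query))
```

The region pattern is tried first because a region also starts with a chromosome name.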

The Genome Browser Gateway: sample search for Human TP53. Sample search: human, March 2006 assembly, tp53; select from the results list. An ID search may go straight to a viewer page, if the ID is unique. Now that we have examined the search options, let’s perform a sample search of this database. The search I’ll demonstrate uses the HUMAN genome, March 2006 assembly. If you are viewing these slides when a later assembly is available, things might look slightly different. For this example, I’m going to use the human TP53 gene, an important and medically relevant gene that has been implicated in many cancers. It is a well-characterized gene, which makes it a good example. Once you have made your selections and entered your position or search text, click the SUBMIT button and wait for your results, which we see below. Here I show part of the results page for the text search for TP53. That text appears in a number of different records, so you have to select the one you want from this results page. Sometimes you can go directly to the browser; that can happen if you use a specific ID. With text searches, however, you will often have to select from the records. Usually I choose a record that appears to have the correct gene symbol or name. And if there are multiple entries that are likely to be splice variants, I may select the longest of them (as indicated by the nucleotide range at the end of the link). For my example here, I will choose the link at the top that says NM_000546, tumor protein p53. Click that link to go to the TP53 position in the genome.
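The rule of thumb above, picking the longest of several likely splice variants by their nucleotide range, is easy to express in code. The sketch below assumes each result has already been reduced to a label and a range string such as `chr17:1,000-2,000`; the labels and coordinates in the usage lines are made-up toy values, not real TP53 records, and the displayed range is treated as inclusive.

```python
import re
from typing import List, NamedTuple

RANGE = re.compile(r"(chr\w+):([\d,]+)-([\d,]+)")


class Hit(NamedTuple):
    """One entry from a results list: a label plus its genomic range."""

    label: str
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # The displayed range is treated as inclusive, hence the +1.
        return self.end - self.start + 1


def parse_hit(label: str, range_text: str) -> Hit:
    match = RANGE.search(range_text)
    if match is None:
        raise ValueError(f"no chromosome range in {range_text!r}")
    start, end = (int(g.replace(",", "")) for g in match.group(2, 3))
    return Hit(label, match.group(1), start, end)


def longest_hit(hits: List[Hit]) -> Hit:
    """Pick the entry that spans the most bases."""
    return max(hits, key=lambda hit: hit.length)


# Toy values for illustration only.
hits = [
    parse_hit("variant_a", "chr17:1,000-2,000"),
    parse_hit("variant_b", "chr17:900-2,500"),
]
print(longest_hit(hits).label)  # variant_b
```

Choosing the longest span is only a heuristic; as the talk says, check that the record's gene symbol or name is the one you want.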

Overview of the whole Genome Browser page (mature release). Genome viewer section; groups of data: Mapping and Sequencing Tracks, Genes and Gene Prediction Tracks, mRNA and EST Tracks, Expression and Regulation, Comparative Genomics, Variation and Repeats, ENCODE Tracks. Shown here is an overview of the page you reach by clicking the link in our results list. I use this slide to illustrate the major organizational concepts of the Genome Browser. At the top of the page is the Genome Viewer section, a diagrammatic representation of the genome and the annotation track features in this region. Soon we will examine these data and visual cues in more detail. At the bottom of the page are the controls you use to turn the data in the viewer on or off. The data are organized into GROUPS so you can quickly find data of interest. These are groups of similar data, such as Mapping and Sequencing Tracks, Genes and Gene Prediction Tracks, and so on. Each GROUP contains individual TRACKS, or lines of annotation. Each group at the bottom of the page corresponds to a section in the viewer. Here I illustrate that the data from the Mapping and Sequencing Tracks group are displayed in the uppermost part of the viewer, and the Genes and Gene Prediction Tracks are in the next section down. In the viewer, the separate GROUPs are marked by the color change, from gray to blue, along the left side of the image area. Understanding this Group and Track organization will also help you understand the Table Browser functions we’ll discuss later. This is a Genome Browser page at a mature stage of this assembly, so you can see many track and image controls at the bottom of the page. At the very beginning of a release, only a CORE set of tracks is available. Further tracks are added to the browser over time, so the track options you see will accumulate.
Tracks take time to create, both within UCSC and by other contributors all over the world. So on the first day of a new release the SNPs may not be there, but they will be added over time. A key point: the official sequence that forms the framework of this assembly remains frozen. The data in the annotation tracks, however, may change. They may be updated periodically; for example, new EST and mRNA data are downloaded from GenBank every week. New data types may be added, or tracks updated, at any time. So although the official sequence stays the same, the annotation track data may change.
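The group and track organization described above maps naturally onto an ordered mapping from group to tracks: group order gives the vertical sections of the viewer, and track order within a group gives the rows inside each section. The sketch below uses an illustrative subset of group and track names mentioned in this tutorial; the real browser has many more, and the set differs by species and release.

```python
# Illustrative subset of groups and tracks named in this tutorial.
# Python dicts keep insertion order, which stands in for viewer order here.
TRACK_GROUPS = {
    "Mapping and Sequencing Tracks": ["Base Position", "STS Markers", "FISH Clones"],
    "Genes and Gene Prediction Tracks": ["Known Genes", "RefSeq Genes"],
    "mRNA and EST Tracks": ["Human mRNAs", "Human ESTs"],
    "Comparative Genomics": ["Conservation"],
    "Variation and Repeats": ["SNPs", "RepeatMasker"],
}


def group_of(track: str) -> str:
    """Return the group (viewer section) that a track belongs to."""
    for group, tracks in TRACK_GROUPS.items():
        if track in tracks:
            return group
    raise KeyError(f"unknown track: {track}")


def viewer_order(visible: set) -> list:
    """Visible tracks from top to bottom: group order, then order within the group."""
    return [t for tracks in TRACK_GROUPS.values() for t in tracks if t in visible]


print(group_of("Human ESTs"))  # mRNA and EST Tracks
print(viewer_order({"SNPs", "Known Genes", "STS Markers"}))
# ['STS Markers', 'Known Genes', 'SNPs']
```

The same grouping is what the Table Browser exposes when you pick a group and then a track, which is why the talk says this organization will help later.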

Different species, different tracks, same software. Species may have different data tracks; layout, software, and functions are the same. Another point to make at this time: the UCSC Genome Browser offers genome browsers for dozens of species. Here are images of a few of them. As you can see at a quick glance, the interface and display are very similar, and the software works the same way. Although we are focusing on the human genome in today’s slides, all of these species share the software functionality we will be talking about. Different species will, however, have different annotation tracks. A track you see in the human browser is not necessarily available in the Fugu browser, for example. Similarly, there may be data in yeast that are not available in the human genome browser.

Sample Genome Viewer image, TP53 region. Labeled tracks: base position, STS markers, Known Genes, RefSeq genes, GenBank sequences, repeats, 17 species compared, SNPs, single species compared. At this time, let’s focus on the viewer section of the Genome Browser. This is the default view after a search for TP53. I want to quickly orient you to what you see in the default setup of the genome viewer. First, you can confirm that you are at the expected position in the genome by looking at the label beside the Known Genes track, which indicates the TP53 gene location (highlighted here in RED). Notice that one of the TP53 symbols is highlighted in black: that is the specific one we clicked in our results list to arrive here. At the very top of the image is a track called BASE POSITION, which I have been calling the genome backbone. It gives the position of every single nucleotide of the genome backbone. As you can see, we are on chromosome 17, somewhere past base number 7 million. The viewer displays base numbers unless you are zoomed all the way in, at which point you see the individual nucleotide letters A, T, G and C themselves. As you look down the viewer, you will see that many different data types are represented: STS markers, known genes, gene predictions, mRNAs, ESTs, evolutionary relationships, and repeats. This is just the default view, though; other data types are available for you to display. Right from the viewer, you have a lot of information about the TP53 region. Let’s talk a little more about how features are displayed in the viewer.

Visual Cues on the Genome Browser. Tick marks: a single location (STS, SNP). Intron lines carry arrowheads (<<< or >>>) showing the direction of transcription; exons are drawn as boxes, with thinner boxes for the 5' UTR and 3' UTR. Track colors may have meaning; for example, in the Known Genes track: a corresponding PDB entry = black; a corresponding NCBI Reviewed sequence = dark blue; a corresponding NCBI Provisional sequence = light blue. For some tracks, a taller bar indicates an increased likelihood of an evolutionary relationship (conservation track). Various data objects are represented differently in the Genome Browser. Some objects are single locations or very short stretches of sequence. For example, STSs (sequence tagged sites) and SNPs (single nucleotide polymorphisms) are indicated by vertical tick marks. If several are close together they may look like a broader bar, but each still marks a small, single location. The Known Genes track provides several cues. Coding exons are the tallest boxes. Half-height boxes indicate the exon portions that make up the 5’ and 3’ untranslated regions, or UTRs. You can also tell the direction of transcription from the little arrowheads on the intron lines, which point to the left or to the right. In the example diagram here, the arrowheads point to the left, indicating that this gene is transcribed from the 5’ UTR on the right side to the 3’ UTR on the left. For some tracks, colors carry important meaning. For example, in the Known Genes track, BLACK indicates that there is a PDB (Protein Data Bank) structure entry for this transcript. Shades of blue indicate its NCBI status, which may be reviewed or provisional, for example. Check the documentation for the specific color codes of each track. Another track with important color codes is the SNP track, where SNPs can be colored to represent different characteristics.
[NCBI = National Center for Biotechnology Information; status is its state of review; see RefSeq for a description of review status.] Some data types are represented as a histogram. For example, some of the Comparative Genomics data in the Conservation track are displayed as bars of varying height; tall bars indicate an increased likelihood of an evolutionary relationship in that region. This kind of track is sometimes called a “wiggle” track. Different tracks use different colors, shapes, and so on. If you have a question about a specific representation, check the track documentation for an explanation of its significance. Understanding these representations will help you quickly grasp many of the features in any genomic region.
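Two of the visual cues above can be written down as small lookup rules: the Known Genes color scheme and the relationship between strand and display direction. This is a hedged sketch of the rules as stated on the slide, not the browser's code. The precedence of a PDB entry over RefSeq status is an assumption, and any other colors should be checked in the track documentation.

```python
from typing import Optional


def known_gene_color(has_pdb: bool, refseq_status: Optional[str]) -> str:
    """Known Genes color as described on the slide.

    Assumption: a PDB entry takes precedence over RefSeq review status.
    """
    if has_pdb:
        return "black"
    if refseq_status == "reviewed":
        return "dark blue"
    if refseq_status == "provisional":
        return "light blue"
    return "see track documentation"


def utr_sides(strand: str) -> dict:
    """Where each UTR sits in the left-to-right display for a given strand."""
    if strand == "+":
        return {"5' UTR": "left", "3' UTR": "right", "arrowheads": ">>>"}
    if strand == "-":
        # Arrowheads point left: transcription runs from right to left.
        return {"5' UTR": "right", "3' UTR": "left", "arrowheads": "<<<"}
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


print(known_gene_color(True, "provisional"))  # black
print(utr_sides("-"))
```

The minus-strand case matches the example diagram on the slide: left-pointing arrowheads, with the 5' UTR on the right.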

Options for Changing Images: Upper Section. Walk left or right; zoom in; zoom out; specify a position; configure fonts, window, and more; click to zoom 3x and re-center. In addition to the view of the genome you see when you first arrive, you can make many changes to the area you are viewing. Here I show the upper section of the Genome Viewer page, with several controls for adjusting your view. Use the buttons with arrowheads to walk left or right along the chromosome. You can take big steps (triple arrowhead), medium steps, or little steps (single arrowhead). These are handy if you are interested in what’s going on near your search region. You can magnify the image area with the ZOOM IN buttons, by a little or by up to 10-fold. Or you can choose BASE to zoom straight down to the nucleotide level. Similarly, you can ZOOM OUT with a different set of buttons. Alternatively, you can type a specific genome coordinate range into the POSITION box. For example, to see more of the possible promoter or downstream regions, you could subtract 1000 from the start (left) coordinate and add 1000 to the end (right) coordinate, which brings that extra flanking sequence into view. You can also use this box just like the search box on the Gateway page: enter text and hit JUMP to search for it. Another handy feature is automatic zoom and recenter: if you click on the base position track at the very top, the browser recenters the image where you clicked and zooms in 3-fold. Finally, you can change the way your viewer looks with the CONFIGURE button, which opens a page of choices about how the viewer should look, including font and graphics sizes.
Those are the controls in the upper part of the page. They mostly move you horizontally along the genome or change the position, affecting the entire viewing area. In the next few slides we’ll talk about controlling the individual annotation tracks lower on the Genome Viewer page with the track controls. Change your view or location with the controls at the top. Use “base” to get right down to the nucleotides. Configure: change font, window size, and more.
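The navigation controls above are simple coordinate arithmetic, which is useful when you build positions for the POSITION box yourself. The sketch below is a hypothetical illustration: it treats coordinates as 1-based and inclusive, clamps at base 1, and does not know the real step sizes behind the big, medium, and little walk buttons, so `walk` takes the fraction of the window as a parameter. `add_flanks` is strand-aware, because for a minus-strand gene the 5' (upstream) side is on the right.

```python
from typing import NamedTuple, Optional


class Window(NamedTuple):
    """A displayed region; coordinates treated as 1-based and inclusive."""

    chrom: str
    start: int
    end: int

    @property
    def width(self) -> int:
        return self.end - self.start + 1


def zoom(w: Window, factor: float, center: Optional[int] = None) -> Window:
    """factor > 1 zooms in, factor < 1 zooms out; optionally re-centers."""
    if center is None:
        center = (w.start + w.end) // 2
    new_width = max(1, round(w.width / factor))
    start = max(1, center - new_width // 2)
    return Window(w.chrom, start, start + new_width - 1)


def walk(w: Window, fraction: float) -> Window:
    """Shift by a fraction of the window width (negative walks left)."""
    start = max(1, w.start + round(w.width * fraction))
    return Window(w.chrom, start, start + w.width - 1)


def add_flanks(w: Window, upstream: int, downstream: int, strand: str = "+") -> Window:
    """Widen the window; upstream (5') is on the right for '-' strand genes."""
    left, right = (upstream, downstream) if strand == "+" else (downstream, upstream)
    return Window(w.chrom, max(1, w.start - left), w.end + right)


w = Window("chr17", 1001, 4000)
print(zoom(w, 3))               # Window(chrom='chr17', start=2000, end=2999)
print(zoom(w, 3, center=1500))  # click-to-zoom: 3x around the clicked base
print(walk(w, 0.5))             # step right by half a window
print(add_flanks(w, 1000, 1000))
```

Clicking the base position track corresponds to `zoom(w, 3, center=clicked_base)`, where `clicked_base` is a hypothetical name for the position under the mouse.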

Annotation Track display options. Enforce changes; links to info and/or filters; change track view; some data are ON or OFF by default. At the bottom of every Genome Viewer page are the controls for the annotation tracks. This slide shows just part of that section, focusing on one category: MAPPING AND SEQUENCING TRACKS. The pulldown menu options, however, are the same for all annotation tracks. The first important point: when you arrive at a fresh Genome Browser, some tracks are ON by default and others are HIDDEN by default. For example, note that the display menu for STS Markers says DENSE, while the menu for FISH Clones says HIDE and is grey. So when you first arrive at the Genome Browser, you see only the DEFAULT tracks, which are already turned on. Some annotation track names are pretty clear, such as Known Genes or Human ESTs. Others are less obvious. If you aren’t sure what type of data a track contains, just click the hyperlink above its menu. That link opens a page of information about the track: a description of the data, the source of the data, any filters available for the data, and, where available, publications about the data. There are many data types, and new ones are added all the time, yet these links make it easy to learn about each annotation track. Once you have found the data types you want to show or hide, use the pulldown menus here to turn any individual annotation track ON or OFF. There are several options, which I’ll define in the next slide. In the center of the page are some handy buttons that I will describe in more detail later. For now, note that after you change any of the menus, you must click REFRESH to apply those changes and actually see them in the viewer.
Menu links lead to information about the tracks: content and methods. You change the view with the pulldown menus. After making changes, click REFRESH to apply them.

Annotation Track options, defined. Hide: removes a track from view. Dense: all items collapsed into a single line. Squish: each item on a separate line, but at 50% height and packed. Pack: each item separate, but efficiently stacked (full height). Full: each item on a separate line. Here I illustrate the different appearances of the menu selections, using the Human ESTs (expressed sequence tags) track as an example. I show the same region of human chromosome 17 as our TP53 gene, in the Human ESTs section of the viewer, with each menu option. Hide: completely removes the data from your image. Dense: all items are collapsed into a single line, fusing all the rows of data into one. Here that means you can see where there is EST coverage, but you learn nothing about individual ESTs. Squish: each item is on a separate line, but the graphics are only 50% of their regular height. Here you can see more information about individual ESTs. Pack: each item is separate, but efficiently stacked like sardines. Unlike squish, the diagrams are full height. Here you can see the GenBank accession numbers for the ESTs, which may be useful. Full: each item is on its own separate line, all the way down the viewer, up to a certain number of rows. If there are more than a couple of hundred items, the browser can become overloaded, so it automatically reverts to the more efficient Pack view. To choose any of these options, just select it in the pulldown menu. To make the changes appear, you must click the REFRESH button in the middle of the Genome Browser page. We’ll spend a little more time on the mid-page controls now.
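The difference between the display modes comes down to how features are assigned to rows. The sketch below is a hypothetical illustration, not the browser's implementation. It uses a greedy first-fit layout for pack (and squish, which uses the same rows drawn at half height): each feature, taken in start order, goes into the first row where it does not overlap. The `full_limit` of 200 stands in for the talk's "couple of hundred items" and is an assumption. The feature coordinates are toy values.

```python
from typing import List, Tuple

Item = Tuple[int, int]  # (start, end) of one feature, inclusive
MODES = {"hide", "dense", "squish", "pack", "full"}


def pack_rows(items: List[Item]) -> List[List[Item]]:
    """Greedy first-fit: put each feature in the first row where it fits."""
    rows: List[List[Item]] = []
    row_ends: List[int] = []
    for start, end in sorted(items):
        for i, last_end in enumerate(row_ends):
            if start > last_end:  # no overlap with the row's last feature
                rows[i].append((start, end))
                row_ends[i] = end
                break
        else:  # no existing row had room, so open a new one
            rows.append([(start, end)])
            row_ends.append(end)
    return rows


def layout(items: List[Item], mode: str, full_limit: int = 200) -> List[List[Item]]:
    """Rows drawn for each display mode (squish draws its rows at half height)."""
    if mode not in MODES:
        raise ValueError(f"unknown display mode: {mode!r}")
    if mode == "hide":
        return []
    if mode == "dense":
        return [sorted(items)] if items else []
    if mode == "full" and len(items) <= full_limit:
        return [[item] for item in sorted(items)]
    # pack, squish, and full with too many items all use the packed layout
    return pack_rows(items)


ests = [(1, 10), (5, 15), (12, 20), (16, 30)]  # toy features
for mode in ("hide", "dense", "squish", "pack", "full"):
    print(mode, len(layout(ests, mode)))
```

For these four toy features, dense gives one row, pack and squish give two, and full gives four, which mirrors the progression of screenshots on the slide.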

Reset, Hide, Configure or Refresh to change settings. Enforce any changes (hide, full, squish…); reset, back to defaults; start from scratch. The final features I want to mention for controlling the Genome Viewer image are illustrated in this slide, a screenshot of the area around the middle of the Genome Browser page. First, the control buttons. The DEFAULT TRACKS button returns you to the default settings; it is an escape hatch if you have made a lot of changes to the image and want to start over. The HIDE ALL button is useful when you want a display with only the annotation tracks you choose: it lets you build a customized view containing only the things you care about. We will talk about the custom tracks button in another tutorial. CONFIGURE: this is the same button that appears higher up on this page and on the Gateway page. It opens a large web page where you can make all sorts of changes to the viewer. You can change the font and graphics size, and the window width (again in pixels). You can also make broad changes to the track menus, which are grouped together on this page so you can change entire sections at once. REFRESH: you must click this button to apply any changes you made to the pulldown menus of the annotation tracks; changes in those menus are NOT applied automatically. You control the views: use the pulldown menus and the Configure options page.
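The key behavior here, that menu changes sit unapplied until REFRESH, can be modeled as two dictionaries: pending menu choices and the settings the viewer actually draws. The class below is a toy, hypothetical model for illustration only. It assumes that HIDE ALL and DEFAULT TRACKS take effect immediately, as pressing those buttons reloads the page, and the track names and defaults in the usage lines are example values.

```python
from typing import Dict


class TrackControls:
    """Toy model of the mid-page buttons: menu edits wait for REFRESH."""

    def __init__(self, defaults: Dict[str, str]) -> None:
        self.defaults = dict(defaults)
        self.applied = dict(defaults)      # what the viewer currently draws
        self.pending: Dict[str, str] = {}  # menu choices not yet refreshed

    def choose(self, track: str, mode: str) -> None:
        self.pending[track] = mode         # changing a menu alone draws nothing

    def refresh(self) -> None:
        self.applied.update(self.pending)
        self.pending.clear()

    def hide_all(self) -> None:
        # Assumed to take effect immediately, like pressing the button.
        self.applied = {track: "hide" for track in self.applied}
        self.pending.clear()

    def default_tracks(self) -> None:
        self.applied = dict(self.defaults)
        self.pending.clear()


controls = TrackControls({"STS Markers": "dense", "FISH Clones": "hide"})
controls.choose("FISH Clones", "pack")
print(controls.applied["FISH Clones"])  # hide (menu changed, not refreshed)
controls.refresh()
print(controls.applied["FISH Clones"])  # pack
```

Keeping pending and applied settings separate makes the "forgot to click REFRESH" situation explicit: anything still in `pending` has not reached the viewer.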

Annotation Track options, if altered…: important point: the browser remembers! Session information (the position you were examining), track choices (squish, pack, full, etc.) and filter parameters (if you changed the colors of any items, or the subset to be displayed) are all saved. When you come back in a couple of days to use the browser again, these will still be set. You may, or may not, intend this. To clear your “cart” of parameters, click DEFAULT TRACKS, OR use the reset link on the Gateway page.
One thing that is important to know about the changes you have made to the viewer: the browser REMEMBERS your changes until you clear them. A cookie stored in your web browser lets the site remember where you were looking in the genome and whether you changed those menus. As we have discussed, there are a number of changes you can make: to the position, the track displays, and the filter options (which we didn’t really cover here, but which are covered in the exercises). These settings stay with the web browser on the computer you are using. This may be great: you may always want to look at the data the same way, and as you move from one tool at this site to another, you “carry” your position with you. But it may not be great if you have forgotten that you filtered something out, or turned off a track. And if you use a shared computer in the lab or a library, you don’t know whether someone has made changes since you last used the browser. The UCSC team refers to these settings as being stored in your “cart”. There are a couple of ways to clear your cart: choose the DEFAULT TRACKS button from the viewer controls, or click the link that says “Click here to reset” on the Gateway starting page. If you ever find that your browser isn’t behaving quite as you expect, try clearing your cart and starting again.

Saved Sessions

Click Any Viewer Object for Details: click the item, and a new web page opens with many details and links to more data about TP53. Example: click your mouse anywhere on the TP53 line.
We have spent a great deal of time on the Genome Viewer image, which offers a lot of visual information about the genome data and annotation tracks. But there is much more data available to you still. Here I’ve shown just the small area of the annotation track image that has been our focus, the upper section in the TP53 region. Focus on the UCSC Known Genes area with our likely TP53 splice variants. You will remember that the one with the black highlight around the gene symbol is the one we selected in our original search. The black color of that line indicates that this entry corresponds to an entry in the PDB, or Protein Data Bank. We want to know more about that item specifically. To learn more, all you need to do is put your mouse on that line and click it. When you do so, a new web page opens; here I show just the upper section of the TP53 gene description page for this item. You will find many important details about the object you clicked just one level down from the viewer. The point is that one click away, on any item in the Genome Viewer, there is a LOT more information available to you. Let’s look at an entire sample page.

Click annotation track item for details pages: informative description; other resource links; microarray data; mRNA secondary structure; links to sequences; protein domains/structure; homologs in other species; Gene Ontology™ descriptions; mRNA descriptions; pathways. Not all genes have this much detail. Different annotation tracks carry different data.
As I showed on the previous slide, one level down there are information pages that contain a great deal of extra information about each gene (or predicted gene, SNP, or other item) in the viewer. I’m going to show just one sample here: the detailed information on the human TP53 Known Gene page. But the other types of data also have lots of additional information one layer down. This page is actually quite huge, and I know that you won’t be able to see all the details right now, but later you should go and see for yourself. There is extensive information about this gene, and links to many other resources as well: practically one-stop shopping for known genes! One thing to know: not all genes will have this level of detail, and not every species will have all this information. I specifically chose a well-known gene for our example. Some genes won’t have protein structures, some won’t have pathway information, and some won’t have microarray data. But whatever data is available will be shown on these detail pages. Other pages carry different types of data, of course; I attached a small part of a SNP detail page here (position, sequence, validation status, function, and so on). Different data types have different details pages, and you only have to click any item in the viewer to get to them.

Get DNA, with Extended Case/Color Options: use the DNA link at the top; plain or extended options; change colors, fonts, etc.
So far, we have seen visual cues and lots of text-based data. But one frequently asked question at this point is “where is the sequence data?” I want to spend a couple of slides on that topic so that you know how to get to the sequence-level data. From the viewer, there are 2 handy ways to get the sequence. First, from your TP53 viewer section, you can simply click the DNA link in the blue navigation bar at the top of the page. The link brings you to a new GET DNA in Window page, shown in the center. The position you were looking at in the viewer is carried over and specified in the position box, so you get whatever you were examining in your viewer window. On this page you have several options to format the sequence: you can add some bases upstream or downstream; you can get the sequence in upper or lower case; you can mask repeats and low-complexity regions; or you can get the reverse strand. You can simply click the GET DNA button to get the sequence on a new web page, in FASTA format. The second button offers even more ways to customize the output. If you click the EXTENDED CASE/COLOR OPTIONS button, you get a new page that lets you change the case of individual features, change their colors, underline them, and so on. The choices you see in the list are based on the tracks shown in the Genome Viewer window you were looking at; if that’s too much, go back and turn off some tracks. This is a really distinctive way to look at your sequence of interest! In the sample output, different features stand out by color, case, or underlining. The two options I just described get the whole region of DNA from your viewer.
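If you already have a stretch of sequence on hand, a few of the Get DNA options above are easy to mimic locally. The Python sketch below is hypothetical and is not how the UCSC page works internally: it assumes repeats are soft-masked as lowercase letters (as in UCSC’s soft-masked genome files), and the helper names are our own.

```python
# Hypothetical helpers mimicking a few "Get DNA" options on a sequence you already have.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse strand: complement each base and reverse the order (case is kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def mask_lowercase(seq, mask_char="N"):
    """Assumes repeats are soft-masked (lowercase) and replaces them with N."""
    return "".join(mask_char if base.islower() else base for base in seq)


def to_fasta(header, seq, width=60):
    """Format one record as FASTA, wrapping the sequence every `width` letters."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


region = "ACGTacgtTTGA"  # toy sequence; lowercase marks a pretend repeat
print(to_fasta("example_region", mask_lowercase(reverse_complement(region))))
```

Keeping case through the reverse complement is what lets masking still work after the strand is flipped.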
But you have another option—you can get just the sequence you want from an annotation track; that’s what we’ll look at in the next slide.

Get Sequence from Details Pages: click the line of a track item, then go to the sequence section of its details page.
In this second example of how to get sequence data, I’m showing a screen shot of the TP53 annotation track in the KNOWN GENES section. As before, we CLICK ON THE LINE to get to the TP53 details page, and from the details page you can get the specific sequence for that item. Here I’m showing part of that details page: the sequence section, which you will find by scrolling down. It has links to the genomic, mRNA, and protein sequences. You can use these links to get that specific sequence, with additional options if you choose the genomic sequence, which is great for promoter studies, intron studies, and so on. So the sequence of the items in your viewer is just a couple of clicks away: use the DNA link at the top to get the whole window, or the links on the information pages to get the sequence of specific items.
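The genomic-sequence options for a gene amount to bookkeeping over its exon coordinates. Here is a small Python sketch of that idea, similar in spirit to showing exons in upper case and everything else in lower case. It assumes 0-based, half-open genome coordinates (the convention of UCSC tables), a plus-strand gene, and made-up toy data; the function names are hypothetical.

```python
def exon_case(genomic, region_start, exon_starts, exon_ends):
    """Upper-case exon bases and lower-case everything else.

    Coordinates are 0-based, half-open genome positions; `genomic` is the
    plus-strand sequence that begins at region_start.
    """
    chars = list(genomic.lower())
    for start, end in zip(exon_starts, exon_ends):
        for pos in range(start, end):
            chars[pos - region_start] = chars[pos - region_start].upper()
    return "".join(chars)


def spliced_sequence(genomic, region_start, exon_starts, exon_ends):
    """Concatenate exon sequences in genome order (plus strand only)."""
    return "".join(genomic[s - region_start:e - region_start]
                   for s, e in zip(exon_starts, exon_ends))


genomic = "AAAACCCCGGGGTTTT"          # toy sequence starting at genome position 1000
starts, ends = [1000, 1008], [1004, 1012]
print(exon_case(genomic, 1000, starts, ends))         # AAAAccccGGGGtttt
print(spliced_sequence(genomic, 1000, starts, ends))  # AAAAGGGG
```

For a minus-strand gene, such as TP53, the mRNA would be the reverse complement of the spliced plus-strand sequence.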

Accessing the BLAT tool: BLAT = BLAST-like Alignment Tool. In the UCSC browser, the tool you will use for sequence searching is called BLAT. Many of you will be familiar with the alignment tool called BLAST® or BLAST2; BLAST stands for Basic Local Alignment Search Tool. If you have used the NCBI databases and searched for similar sequences, you have probably used BLAST. BLAT, the BLAST-like alignment tool, searches the database differently. BLAT requires an index of the sequences in the database, something like the index at the back of a biochemistry textbook. The BLAT index consists of all the non-overlapping 11-mers (11-base oligomers) in the genome, or 4-mers for protein searches. Just as you can quickly scan a book index to find the right word, BLAT looks up the 11-mers of your query in the index to find matching positions, and then builds the rest of the alignment out from there. It is a very fast way to search the sequences. BLAST does it the other way around: it indexes your query and then scans everything in the database against that small index. That’s the essential difference in the algorithm. The outcome is still a pair of sequences lined up with each other so you can compare the matches. BLAT works best with sequences that have high identity and are longer than 21 bases, but don’t let that scare you: you can find more distant matches as well. Directly from the UCSC documentation: “On DNA queries, BLAT is designed to quickly find sequences with 95% or greater similarity of length 40 bases or more….On protein queries, BLAT rapidly locates genomic sequences with 80% or greater similarity of length 20 amino acids or more. In general, gene family members that arose within the last 350 million years can generally be detected.” For many people it will be enough to know that there is a way to search for your region of interest in the database by starting with a sequence!
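The index idea above can be sketched in a few lines of Python. This is a toy illustration of the principle only, not Jim Kent’s implementation: real BLAT also clusters hits, extends them, handles gaps and introns, and filters over-used k-mers. The synthetic random “genome” and the function names are our own assumptions.

```python
import random
from collections import defaultdict


def build_index(genome, k=11):
    """Index the target's non-overlapping k-mers: k-mer -> list of start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k=11):
    """Look up every overlapping k-mer of the query; return (query_pos, target_pos) hits."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, tpos))
    return hits


def best_diagonal(hits):
    """Hits sharing a diagonal (target_pos - query_pos) suggest one ungapped match."""
    counts = defaultdict(int)
    for qpos, tpos in hits:
        counts[tpos - qpos] += 1
    return max(counts, key=counts.get) if counts else None


random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(2000))  # synthetic target
query = genome[37:97]                 # a 60-base piece of the "genome"
index = build_index(genome)
hits = find_seeds(query, index)
print(best_diagonal(hits))            # offset where the query sits in the genome
```

Because only every 11th position is indexed, the index stays small; scanning all overlapping query k-mers guarantees that any perfect 22-base stretch contains at least one indexed 11-mer.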
For the more casual BLAT user, the documentation at the UCSC web site explains a little more about how BLAT works, without tremendous amounts of mathematics. For the more mathematically inclined, the publication by Jim Kent describes BLAT in more detail. So now we know a little about the BLAT tool. How do we get to it? Let’s start at the UCSC Genome Browser home page. As for most UCSC tools, you can use the navigation bars at the top or at the side of the home page: select the BLAT link to get started. Rapid searches by INDEXING the entire genome. Works best with high-similarity matches. See documentation and publication for details: Kent, WJ. Genome Res. 2002. 12:656.
BLAST (original paper): Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=2231712&dopt=Abstract
BLAT (original paper): Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002;12(4):656-64.

BLAT tool overview: www.openhelix.com/sampleseqs.html. Make choices. Paste one or more sequences, or upload them. Limits: DNA 25,000 bases; protein 10,000 aa; 25 total sequences. Then submit.
Shown here is the interface for BLAT. We will work our way down this page. Along the top there are a few parameters you can change, which are choices you have to make. First, you must choose one species to search: this tool searches one species at a time. Then you choose an assembly, which we have seen before in the basic search section. Next, you can let the BLAT tool guess whether you have entered nucleotides or amino acids, or you can tell it which you are using. Sort output, at its default setting here, lists the best-scoring matches first. Output type specifies whether you want the output displayed in the browser or in a form you can save and use later. Hyperlink, the default, displays the results in the browser, and that’s what I’ll use for this example. The other types, the PSL output styles, are useful for people who want a differently structured, text-based output that can be used for a variety of purposes. There is a big text box where you can paste your sequence. You can paste one or more sequences, but there are limits on how much you can submit, because BLAT searches are a large burden on the servers: up to 25,000 bases or 10,000 amino acids, and a total of up to 25 sequences. If you need to do more BLAT searching, UCSC asks that you download the program and run a local copy; instructions for this can be found in the documentation. I have displayed 2 partial mRNA sequences here that I will use in my example. They are in the common FASTA format, which you must use if you submit multiple sequences. There is also an option to upload your sequence (or sequences) if you keep a file of them. Finally, you click SUBMIT to send your query to the database. There is also a special new button, the “I’m Feeling Lucky” button. If you click that, just as in Google, you will be taken straight to the position of your best match in the Genome Viewer. But I’ll be demonstrating the plain old SUBMIT button right now.
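Before pasting several sequences, you might check them locally. The Python sketch below splits pasted FASTA text into records and compares them against the limits quoted on the slide. Two things here are our own assumptions, not BLAT’s rules: we treat the limits as totals across all pasted sequences, and the DNA-versus-protein guess is a crude letter-set heuristic.

```python
MAX_SEQUENCES = 25          # limits as stated on the slide; treated as totals (assumption)
MAX_DNA_BASES = 25_000
MAX_PROTEIN_LETTERS = 10_000


def parse_fasta(text):
    """Split pasted FASTA text into (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            if name is None:            # a single sequence pasted without a header
                name = "query"
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def looks_like_dna(seq):
    """Crude guess: only nucleotide letters (and N) present."""
    return set(seq.upper()) <= set("ACGTUN")


def check_limits(records):
    """Return a list of problems; an empty list means within the assumed limits."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences (limit {MAX_SEQUENCES})")
    total = sum(len(seq) for _, seq in records)
    if all(looks_like_dna(seq) for _, seq in records):
        limit = MAX_DNA_BASES
    else:
        limit = MAX_PROTEIN_LETTERS
    if total > limit:
        problems.append(f"{total} letters (limit {limit})")
    return problems


pasted = ">mrna1 partial\nACGTACGTAC\nGGTTAA\n>mrna2\nTTGACCA\n"   # toy input
print(parse_fasta(pasted), check_limits(parse_fasta(pasted)))
```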

BLAT results, with links: go to browser/viewer; go to alignment detail; sorting. Results with demo sequences, default settings; sort = Query, Score. Score is a count of matches: higher number, better match. Click browser to go to the Genome Browser image location (next slide). Click details to see the alignment to genomic sequence (2nd slide).
Here we see the results of a BLAT search against the human genome, using the sample human mRNA sequences I showed. As you can see, the list is sorted by query and score [long red boxes]. There is a really high-scoring match at the top. The matches after it look weaker, probably covering only small regions. Now, remember that we asked for hyperlinked results in our setup: there are 2 columns of links, one labeled BROWSER and one labeled DETAILS. First I will click the BROWSER link for a match. This takes me to the position of this match in the Genome Viewer; I will show a sample of that on the next slide. Later we will click the DETAILS link for the best match, which gives us a new page with sequence information, as you’ll see a couple of slides from now.
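If you choose PSL output instead of hyperlinks, each hit is one line of 21 tab-separated columns. The Python sketch below parses such a line and computes a DNA score following the formula UCSC’s FAQ gives for the web BLAT score, as we understand it: matches plus half the repeat matches, minus mismatches and gap openings in query and target. Treat the formula as an assumption (protein queries are weighted differently), and note that the example line is made up.

```python
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}


def parse_psl_line(line):
    """Turn one tab-separated PSL line into a dict with integer fields."""
    record = dict(zip(PSL_FIELDS, line.rstrip("\n").split("\t")))
    for key, value in list(record.items()):
        if key in LIST_FIELDS:
            record[key] = [int(x) for x in value.split(",") if x]  # trailing comma is common
        elif key not in TEXT_FIELDS:
            record[key] = int(value)
    return record


def psl_score(record):
    """DNA-query score in the spirit of UCSC's web BLAT score (assumed formula)."""
    return (record["matches"] + (record["repMatches"] >> 1)
            - record["misMatches"] - record["qNumInsert"] - record["tNumInsert"])


example = "\t".join([  # invented hit: 3 blocks, 1 query gap, 2 target gaps
    "90", "2", "0", "0", "1", "5", "2", "305", "-", "myQuery", "100", "3", "100",
    "chr4", "191000000", "1000", "1397", "3", "30,40,22,", "3,33,78,", "1000,1130,1375,",
])
rec = parse_psl_line(example)
print(rec["tName"], rec["tStart"], rec["blockSizes"], psl_score(rec))
```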

BLAT results, browser link: from the BLAT results, click browser. A new line with “Your Sequence from BLAT Search” appears! Watch out for reading frame! Click - - - > to flip frame. Base position = full, and zoomed in enough to see amino acids.
When you link from the BLAT results to the BROWSER, a special track appears in the viewer! Just below the top there is a new line labeled YOUR SEQUENCE FROM BLAT SEARCH, with the name of my query sequence listed on the left. If you look at the known genes, or RefSeq genes, you can see that we have matched the CXCL5 gene, which is what I expected from the BLAT query. In the known genes, because we are zoomed in to a small region, you can see the methionines indicated in green. Also note the direction of this gene: it is on the negative strand, so it runs from right to left. But beware: the 3-frame translation at the top runs the other way. If you want to compare the methionines or other amino acids, you have to flip the frame translation. To flip the sequence and see the opposite strand, click the tiny arrow at the upper left of the viewer. So we have used a sequence as a starting point to search the genome, and by clicking the Browser link in our BLAT results we can see the location of our match directly on the Genome Browser. BLAT is another good place to start searching for your genes of interest in the UCSC Genome Browser. One special tip: when you are zoomed in far enough on any genomic sequence, you can see the amino acids in the display if you have set the Base Position track to “full”. Zoom in further to see the single-letter amino acid codes right on the sequence.
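Why does the frame need flipping? A minus-strand gene’s ATG (methionine) codons only appear when you read the opposite strand. This toy Python sketch finds ATG codons in each of the three frames of a sequence and of its reverse complement. Numbering frames from the left edge of the window is our own convention, and the sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """The opposite strand, read 5' to 3' (what 'flipping' the display shows)."""
    return seq.translate(COMPLEMENT)[::-1]


def atg_positions_by_frame(seq):
    """Map reading frame (0, 1, 2) to start positions of ATG codons in that frame."""
    seq = seq.upper()
    frames = {0: [], 1: [], 2: []}
    for pos in range(len(seq) - 2):
        if seq[pos:pos + 3] == "ATG":
            frames[pos % 3].append(pos)
    return frames


window = "CCATGAAACAT"                                      # toy sequence, plus strand
print(atg_positions_by_frame(window))                        # {0: [], 1: [], 2: [2]}
print(atg_positions_by_frame(reverse_complement(window)))    # {0: [0], 1: [7], 2: []}
```

The minus strand of this window carries two ATG codons that are invisible in the plus-strand frames, which is the situation the flip arrow addresses.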

BLAT results, alignment details: your query; genomic match, with color cues; side-by-side alignment (yours above, genomic below).
Here I show what you get if you click the DETAILS link on the BLAT results page. I know it’s impossible to see the whole alignment page clearly; even with my short query sequence it is a large web page. The page is divided into several parts. The top part shows the query sequence you entered (in this case our human CXCL5 mRNA sequence). The middle part shows the genomic sequence, with the bases that match your query in blue capital letters and the unmatched bases in black. If you have used an mRNA as I have, this gives you a quick look at the possible exon/intron structure: the blue capitals are the likely exons, and the black text is the likely introns. The bottom part shows the actual nucleotide-for-nucleotide matches, which may look more like the BLAST results you are used to seeing. I magnified the top of the side-by-side alignment so you can see where my query sequence, on the top (starting with number …001), lines up with the genomic sequence. You can judge the quality of the match yourself in this section. So you can search the UCSC Genome Browser data starting with a sequence, and view the results either in the Genome Viewer or at the level of alignment detail shown here.
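The side-by-side section is easy to reproduce for any pair of aligned strings. This Python sketch draws a match line between them and computes a simple percent identity (matches divided by non-gap columns). That identity definition is our own simplification, not the formula BLAT reports, and the aligned strings are invented.

```python
def match_line(query, target):
    """'|' under positions where aligned letters agree (ignoring case), else a space."""
    if len(query) != len(target):
        raise ValueError("aligned strings must have the same length")
    return "".join(
        "|" if q != "-" and q.upper() == t.upper() else " "
        for q, t in zip(query, target)
    )


def percent_identity(query, target):
    """Matches divided by aligned (non-gap) columns; a simple, not UCSC, definition."""
    aligned = sum(1 for q, t in zip(query, target) if q != "-" and t != "-")
    return 100.0 * match_line(query, target).count("|") / aligned if aligned else 0.0


q, t = "ACGT-A", "acctta"   # toy alignment: one mismatch, one gap in the query
print(q)
print(match_line(q, t))
print(t)
print(percent_identity(q, t))   # 80.0
```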

Understand BLAT’s Limitations: BLAT was designed to rapidly align sequence from one genome back to itself (e.g., EST/cDNA data). It can, and at times does, miss clear hits. BLAT does allow for a single mismatch, but it also removes k-mers with excessive counts for efficiency. Not suitable for cross-species mapping.
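One way a clear hit can be missed: if the differences between query and target are spread evenly, no exact 11-base seed may survive. The toy Python sketch below puts one substitution every 10 bases, giving 90% identity but zero identical 11-mers at the same offset. With one substitution every 12 bases, seeds reappear. The sequence is synthetic, and the sketch only counts seeds along the true alignment; the standalone blat program documents options such as `-tileSize` and `-oneOff` for trading speed against this kind of sensitivity (check its usage text).

```python
def mutate_every(seq, step):
    """Substitute the base at positions step-1, 2*step-1, ... (a toy mutation model)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    return "".join(swap[b] if i % step == step - 1 else b for i, b in enumerate(seq))


def same_offset_seeds(a, b, k=11):
    """Count offsets where a and b share an identical k-mer along the same diagonal."""
    return sum(1 for i in range(min(len(a), len(b)) - k + 1) if a[i:i + k] == b[i:i + k])


def identity(a, b):
    """Fraction of positions with identical bases (equal-length, ungapped)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


target = "ACGGTCATTG" * 20            # 200 bases of toy sequence
every10 = mutate_every(target, 10)
every12 = mutate_every(target, 12)
print(identity(target, every10), same_offset_seeds(target, every10))   # 0.9 0
print(identity(target, every12), same_offset_seeds(target, every12))   # 0.92 16
```

Any window of 11 consecutive bases must include a position that is a multiple of 10 minus one, so the 90% identical pair shares no perfect 11-mer at all.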

Bunch More Goodies – Click Around

Bibliography: http://genome.ucsc.edu/goldenPath/pubs.html The UCSC Genome Browser Database: update 2008, update 2007, and earlier. UCSC Genome Browser Tutorial UCSC Genome Browser: Deep support for molecular biomedical research The UCSC Known Genes, 2006. The UCSC Gene Sorter, 2007. Piloting the Zebrafish Genome Browser, 2006.

UCSC Genome Browser [version9a]

Genome Browser Database: search & download; visualize. Underlying Database (MySQL): primary table (positions, names, etc.); auxiliary table (related data).
The Genome Browser database is composed of “tables” of data. Some tables are primary and contain positional information. One example is the “knownGene” table, which includes data such as chromosomal position, gene name, exon and intron sizes and the like. Other tables contain ‘auxiliary’ data; some of these have positional information and others do not, like the example here that relates knownGene IDs to the corresponding Ensembl gene IDs. Together these tables make up the database, and the UCSC Genome Browser is a great way to visualize this genomic data in context with many other data types. Each annotation track is based on a main table that contains genomic positions, item names, and sometimes additional information; the Genome Browser queries this table to build the graphical view of the track and the details pages you see. Some tracks also have one or more auxiliary tables, related to the main table by a shared column of data, that contain even more detailed information about the track’s items. The Table Browser lets you access these same tables directly, filtering, manipulating and downloading the data in a customized and flexible way that is not possible with the Genome Browser. For example, you may want a list of all the SNPs in a given gene, as in the sample we show here, or you may want to know all the known genes on chromosome 21. You could also run a narrow search for only those SNPs that are non-synonymous coding variations found in disease genes. The possibilities for customized searching and data retrieval with the Table Browser are nearly endless.
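The primary/auxiliary relationship is an ordinary SQL join on a shared column. The sketch below uses Python’s built-in sqlite3 so that it runs anywhere; the table names are modeled on UCSC’s knownGene and knownToEnsembl tables, but the columns are heavily simplified and the rows and IDs are invented. The real tables live in UCSC’s MySQL database and have many more columns.

```python
import sqlite3


def build_demo_db():
    """In-memory toy database with a simplified primary table and an auxiliary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE knownGene (name TEXT, chrom TEXT, strand TEXT,
                                txStart INTEGER, txEnd INTEGER);
        CREATE TABLE knownToEnsembl (name TEXT, value TEXT);
    """)
    conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?, ?)", [
        ("geneA.1", "chr17", "-", 1000, 5000),   # made-up rows
        ("geneB.1", "chr21", "+", 8000, 9000),
    ])
    conn.executemany("INSERT INTO knownToEnsembl VALUES (?, ?)", [
        ("geneA.1", "ENSEMBL_ID_A"),             # made-up ID
    ])
    return conn


def genes_with_ensembl(conn):
    """Join the primary table to the auxiliary table on their shared 'name' column."""
    return conn.execute("""
        SELECT g.name, g.chrom, g.txStart, g.txEnd, e.value
        FROM knownGene AS g
        LEFT JOIN knownToEnsembl AS e ON g.name = e.name
        ORDER BY g.name
    """).fetchall()


for row in genes_with_ensembl(build_demo_db()):
    print(row)
```

The LEFT JOIN keeps genes that have no auxiliary entry, which show up with `None` in the last column.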

The Table Browser: http://genome.ucsc.edu/ Open browser. To start our advanced search, we go to the Table Browser by clicking either of the Table Browser links on the home page we introduced earlier, in the top navigation bar or the left navigation bar.

Table Browser: Choose Genome. In the Human genome (hg16), search for simple repeats on a chromosome 4 location with copy number more than 10 and download the sequence.
This is the Table Browser interface you come to when you click that home page link (or any other ‘Table Browser’ link on other pages). To introduce you to advanced searching, we are going to perform a series of tasks, each building on the previous one, that let us use and learn different features of the Table Browser. Our first task will be to download the sequence of all simple repeats with a copy number of more than 10 from a position on human chromosome 4. Simple repeats are, as the name implies, short repeats of a simple sequence such as cag-cag-cag-cag; they are found throughout the genome, though less commonly within the exons of genes. There are several choices and options here, and we will go through them step by step. As in the other interfaces of the Genome Browser and related tools, you must choose the genome and assembly you wish to search. Here we choose “Vertebrates” from the clade menu, “Human” from the genome menu and “July 2003” from the assembly menu. We’re choosing an earlier assembly just as an example; later assemblies have somewhat different data, so results might not be the same. There might be times when you want an earlier assembly.
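To make “copy number” concrete, this Python sketch counts exact, back-to-back copies of a repeat unit such as CAG. It is a deliberate simplification: the simpleRepeat track is generated by Tandem Repeats Finder, which tolerates imperfect copies and reports fractional copy numbers, so this exact-match count is only an illustration, and the function names are hypothetical.

```python
def tandem_copies(seq, unit, start):
    """Number of exact, back-to-back copies of `unit` beginning at `start`."""
    copies, pos = 0, start
    while seq.startswith(unit, pos):
        copies += 1
        pos += len(unit)
    return copies


def max_tandem_copies(seq, unit):
    """Largest exact copy number of `unit` anywhere in `seq` (case-insensitive)."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    return max((tandem_copies(seq, unit, i) for i in range(len(seq))), default=0)


print(max_tandem_copies("ggCAGCAGCAGtt", "cag"))               # 3
print(max_tandem_copies("A" + "CAG" * 12 + "T", "CAG") > 10)   # True: passes our filter
```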

Table Browser: Choose Table to Search: Choose Data Table. The next choice is which data table you wish to search. Tracks are organized into larger groups, as they are in the Genome Browser. When you choose a group (for our example, “Variation and Repeats”), the track menu changes to show the tracks in that group. When you choose a track (“Simple Repeats” in our example), the tables menu changes to show the tables in that track. Sometimes there is only one table for a track: the “Simple Repeats” track has just one table, “simpleRepeat.” When there are multiple tables for a track, the "main" table, which includes genomic positions, appears first in the tables menu, followed by the auxiliary tables. Auxiliary tables *usually* contain purely descriptive information, not genomic positions. When you choose a table, the querying options below the tables menu may change, because some output format and filtering options are appropriate only for certain types of tables (more about that later).

Table Browser: Describe Table If you have any questions about the type of data in the selected table, you can click the "describe table schema" button which leads to a page, part of which we show here. This description page shows the field names, an example value, data type, and definition of each field of the database table. If the database contains other tables which are related to the selected table by a field with shared values, those related tables are listed. If the selected table is the main table for a track, then the track description text is included.

Table Browser: Choose Region to Search. The next choice is which positions you want data from: the entire genome, an ENCODE region, or a specific chromosomal location. ENCODE is a project of the National Human Genome Research Institute (NHGRI) to identify and characterize all functional elements in human genome sequence. If you know a gene name but not its chromosomal location, the “look up” button will find the location for you. We are going to enter chromosome 4, from 3 to 4 million base pairs: “chr4:3000000-4000000”.
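Position strings like the one above use the browser’s 1-based, inclusive numbering, while UCSC tables store 0-based, half-open coordinates. This Python sketch parses a position string (commas allowed) and converts it to table coordinates; the function name and the accepted pattern are our own assumptions, not the browser’s parser.

```python
import re

POSITION_RE = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_position(text):
    """'chr4:3,000,000-4,000,000' (1-based, inclusive) -> ('chr4', 2999999, 4000000)."""
    match = POSITION_RE.match(text.strip())
    if not match:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start - 1, end      # 0-based start, exclusive end


print(parse_position("chr4:3,000,000-4,000,000"))
```

Only the start shifts by one: the end number is the same in both conventions, and `end - start` gives the number of bases covered.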

Table Browser: Upload Locations to Search Paste Upload Alternatively, you can copy and paste in a list of names or accession numbers or upload a file of them.

Table Browser: Filter to Refine Search: Create Filter; Submit Filter. One of the powerful features of the Table Browser is the ability to filter the data table on various criteria. The filter is a form-based SQL query that lets you filter on the values of the different fields in the table. Clicking the filter “create” button opens a form where you can build a filter. In this form we could, for example, filter on the number of cytosines or guanines a simple repeat contains, or on a match to a specific sequence. Our task is to download simple repeats, but only those with a copy number over 10. So here we filter for simple repeats with a copy number greater than 10 by setting the pull-down menu for “copyNum” to the greater-than sign and typing 10 into the box. If you know the SQL [pronounced “see quill”] query language, you can use the last box, called “free-form query”, to type in your own custom filter. However, knowing SQL is not essential to using this filter form. If the track is based on or related to other tables, those also show up on this page, and you can filter on them too.
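For readers who know SQL, the region plus the copy-number filter amount to a WHERE clause. Here is a minimal, self-contained sketch using Python’s sqlite3 with a toy simpleRepeat table; the four columns are a simplified subset (the real table has more), the rows are invented, and the overlap test is our own way of expressing “items in this region”.

```python
import sqlite3


def filtered_repeats(conn, chrom, start, end, min_copies):
    """Items overlapping [start, end) on chrom whose copyNum exceeds min_copies."""
    return conn.execute(
        """SELECT chrom, chromStart, chromEnd, copyNum FROM simpleRepeat
           WHERE chrom = ? AND chromStart < ? AND chromEnd > ? AND copyNum > ?
           ORDER BY chromStart""",
        (chrom, end, start, min_copies),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simpleRepeat (chrom TEXT, chromStart INTEGER, "
             "chromEnd INTEGER, copyNum REAL)")
conn.executemany("INSERT INTO simpleRepeat VALUES (?, ?, ?, ?)", [  # made-up rows
    ("chr4", 3_100_000, 3_100_060, 20.0),
    ("chr4", 3_200_000, 3_200_030, 5.5),    # too few copies
    ("chr4", 4_500_000, 4_500_040, 14.0),   # outside the region
    ("chr5", 3_100_000, 3_100_050, 30.0),   # wrong chromosome
])
# chr4:3,000,000-4,000,000 in 0-based, half-open table coordinates:
rows = filtered_repeats(conn, "chr4", 2_999_999, 4_000_000, 10)
print(rows)
```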

Table Browser: Output Data. Once you submit, the filter buttons change to “edit” and “clear”, indicating that you have made filter choices. As in the Genome Browser, changes and choices you make in the Table Browser are “remembered” until you change them. We are going to leave “intersection” and “correlation” for later. The next step in our task is to output the data. You have several choices here: output format, type of file downloaded, and others. We are going to output our data as sequence, but we’ll show you some examples of the other output formats and choices here.

Table Browser: Output Formats: Text Fields; Output formats. There are 7 different output formats. The first two get all the fields of data from the primary table, or selected fields from the primary and related tables. Both download as a tab-delimited text file that can later be used in a word-processing or spreadsheet program.
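Tab-delimited output is straightforward to load programmatically as well as into a spreadsheet. This Python sketch reads such text into a list of dictionaries. It assumes the first line is a header whose column names follow a leading “#”, and the sample row is invented.

```python
import csv


def read_tab_delimited(text):
    """Parse Table Browser-style text output into a list of dicts.

    Assumes the first non-blank line is a header whose names follow a leading '#'.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")
    return list(csv.DictReader(lines[1:], fieldnames=header, delimiter="\t"))


sample = "#chrom\tchromStart\tchromEnd\tcopyNum\nchr4\t3100000\t3100060\t20.0\n"  # made-up row
rows = read_tab_delimited(sample)
print(rows[0]["chrom"], int(rows[0]["chromEnd"]) - int(rows[0]["chromStart"]))   # chr4 60
```

All values come back as strings, so numeric columns need converting before arithmetic, as in the last line.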

Table Browser: Fasta Sequence Output. The next format lets you obtain the DNA or protein sequence of the items in the table in FASTA format.

Table Browser: Database Format Outputs The next two are database formats to use in other programs and databases. The two formats are either the Gene Transfer Format (GTF) or the Browser Extensible Data format (BED). BED is the format used by the Genome Browser database and we will be looking closer at this later in this tutorial.
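The main practical difference between the two database formats is coordinates: BED starts are 0-based with exclusive ends, while GTF is 1-based with inclusive ends. This Python sketch converts a simple single-block BED line into one GTF line. The `source` and `feature` values are placeholders we chose, blocks in 12-column BED are ignored, and the attribute layout is a minimal assumption.

```python
def bed_to_gtf(bed_line, source="tableBrowser", feature="exon"):
    """Convert a simple (single-block) BED line to one GTF line.

    BED: 0-based start, exclusive end. GTF: 1-based start, inclusive end,
    so only the start coordinate changes. source/feature are placeholders.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else "item"
    score = fields[4] if len(fields) > 4 else "."
    strand = fields[5] if len(fields) > 5 else "."
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    return "\t".join([chrom, source, feature, str(start + 1), str(end),
                      score, strand, ".", attributes])


print(bed_to_gtf("chr4\t3100000\t3100060\trepeat1\t0\t+"))
```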

Table Browser: Custom Track Output The custom track output creates an annotation track of your query in the Genome Browser and the Table Browser for further study. This newly created annotation track can be viewed and searched just as any other annotation track. The next section of this tutorial will go into more detail on how to create a custom track from your advanced Table Browser search or your own data.

Table Browser: Hyperlinks Output. Lastly, you can also get a list of hyperlinks to the data positions in the Genome Browser.
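You can also build similar links yourself. This Python sketch turns a table item’s 0-based coordinates back into a 1-based position and adds it to an hgTracks URL. It assumes the `db` and `position` query parameters commonly seen in Genome Browser links; the helper name is hypothetical, and the Table Browser’s own links may carry extra parameters.

```python
from urllib.parse import urlencode

HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_link(db, chrom, start0, end):
    """Link to a table item, converting 0-based half-open coordinates to a 1-based position."""
    position = f"{chrom}:{start0 + 1}-{end}"
    return f"{HGTRACKS}?{urlencode({'db': db, 'position': position})}"


print(browser_link("hg16", "chr4", 3100000, 3100060))
```

`urlencode` escapes the colon as `%3A`; web servers decode it back before the browser reads the position.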

Table Browser: Obtaining Output. Adding a name creates a file on the desktop; leaving it blank creates output in the browser (exception: custom track). Data Summary.
You can obtain a summary of the specified data by clicking the “summary/statistics” button. This gives you a general idea of the kind and amount of data you are about to output. For example, here we have 121 simple repeats with a copy number greater than 10 in this region of chromosome 4. We can also learn such things as the smallest and largest item in base pairs, and other statistics about our results. Before obtaining the data output, you need to decide whether you want the data saved as a file on your computer or displayed in your browser. Adding a name in the output file box creates a file on your desktop; leaving the box blank creates output in the browser. The exception is the custom track output, which always sends you to a separate browser page, no matter what is in the file box.
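Statistics like these are simple to compute on any list of intervals you have downloaded. This Python sketch reports the count, the smallest and largest item, the mean size, and the total bases. It is our own simplified version: overlapping items are not merged, so the total can double-count, and the Table Browser’s summary page reports its own set of figures.

```python
def summarize(items):
    """Simple statistics over (chrom, start, end) items, sizes in bases.

    Overlapping items are not merged, so 'total_bases' can double-count.
    """
    sizes = [end - start for _, start, end in items]
    if not sizes:
        return {"count": 0}
    return {
        "count": len(sizes),
        "smallest": min(sizes),
        "largest": max(sizes),
        "mean": sum(sizes) / len(sizes),
        "total_bases": sum(sizes),
    }


toy = [("chr4", 0, 60), ("chr4", 100, 130), ("chr4", 200, 290)]   # made-up items
print(summarize(toy))
```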

Table Browser: Output configuration: Sequence Format; Get Sequence.
We are going to choose sequence as the output type. For some output types, including sequence, clicking the "get output" button takes you to a page with additional formatting options. For example, our choice of sequence download asks us to decide which sequence to obtain and how to format it. Since these are simple repeats, it only asks whether we want upstream or downstream base pairs. For known genes or other types of sequence, the choices would be more extensive, such as whether we want introns, 5’ [pronounced “five prime”] UTR regions and the like. Once you click “get sequence”, you obtain your output: in this example, a FASTA file of all 121 simple repeat sequences, with no upstream or downstream base pairs included. Already we have learned that the Table Browser can be a powerful tool. We can use it to obtain lots of data quickly, in formats we can use elsewhere, and we can filter that data for exactly what we are looking for in many highly customizable ways. Now we will look at yet another powerful aspect of the Table Browser: the intersection.
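Adding upstream or downstream bases is just arithmetic on the item’s coordinates, with one twist: “upstream” depends on the strand. This Python sketch widens a 0-based, half-open interval accordingly and clips it to the chromosome. The parameter names and the clipping behavior are our own assumptions, not the Table Browser’s implementation.

```python
def add_flanks(start, end, strand, upstream=0, downstream=0, chrom_size=None):
    """Widen a 0-based, half-open interval by upstream/downstream bases.

    'Upstream' is toward lower coordinates for '+' items and toward higher
    coordinates for '-' items. Results are clipped to the chromosome.
    """
    if strand == "-":
        new_start, new_end = start - downstream, end + upstream
    else:
        new_start, new_end = start - upstream, end + downstream
    new_start = max(0, new_start)
    if chrom_size is not None:
        new_end = min(chrom_size, new_end)
    return new_start, new_end


print(add_flanks(1000, 1100, "+", upstream=50, downstream=10))  # (950, 1110)
print(add_flanks(1000, 1100, "-", upstream=50, downstream=10))  # (990, 1150)
```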

Table Browser: Intersecting Data: 2nd Table; Any Overlap; Intersect; Submit. Find simple repeats (copy number > 10) within known genes and download the sequence.
Not only can you filter tables of data, you can also study the overlap between sets of data using “intersection.” The intersection tool lets you find whether two datasets overlap. For example, you can ask whether any chromosomal locations are shared between the “known genes” dataset and the “simple repeats” dataset, and then download only the data where they overlap. Here we look at the question: find simple repeats with a copy number over ten (the filter we just created) within known genes, and download their sequences. If you return to the Table Browser, your previous search and filter should still be in place. Clicking the "intersection" create button takes you to an intersection page. Here, you choose the group, annotation track and table that you wish to intersect with the table you selected on the main page. Only tables that include genomic positions are shown, which usually excludes any auxiliary tables associated with a track. Here we intersect our simple repeats with known genes to find which of our 121 filtered repeats lie in known genes. You can choose any, complete, no, or percentage overlap, as an intersection or a union. Here we choose any overlap. You also have even more control over how you see the data with base-pairwise intersections and complements; to learn more about those choices, see the UCSC help documentation. For our purposes, and most others, the simple intersection will suffice. Once you’ve completed your choices, click submit.
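The “any overlap” and “percentage overlap” choices can be sketched with a simple interval comparison. In this Python illustration, an item passes if it shares at least one base with an item in the second set, or, with `min_fraction`, if a single overlapping item covers at least that fraction of it. The percentage rule is our own simplification (the Table Browser may measure coverage differently), and the data are made up.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two 0-based, half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def intersect(items, others, min_fraction=None):
    """Keep items that overlap something in `others`.

    With min_fraction (e.g. 0.5), an item is kept only if a single overlapping
    item covers at least that fraction of it. This pairwise scan is quadratic
    and meant only as an illustration.
    """
    kept = []
    for chrom, start, end in items:
        for o_chrom, o_start, o_end in others:
            shared = overlap(start, end, o_start, o_end) if chrom == o_chrom else 0
            if shared and (min_fraction is None or shared >= min_fraction * (end - start)):
                kept.append((chrom, start, end))
                break
    return kept


repeats = [("chr4", 100, 160), ("chr4", 500, 530), ("chr4", 900, 960)]   # toy data
genes = [("chr4", 150, 600)]
print(intersect(repeats, genes))                     # any overlap
print(intersect(repeats, genes, min_fraction=0.5))   # at least 50% covered
```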

Table Browser: Intersecting Data Narrows Search Filtered simple repeats Filtered simple repeats, intersected (overlapping) w/ known genes Summary Once you click “Submit,” you will find that the “intersection” choice changes to “edit” and “clear” and text appears that shows you the intersection. If we look at the summary as we had earlier, you will see that by intersecting our filtered repeats with known genes in our specified region, we’ve narrowed our search from the previous 121 to just 3.

Table Browser: Downloading Sequence Data Sequence Format Get Sequence Obtain the sequence as before: click “get output,” choose the formatting options and click “Get Sequence.” The result shows the sequences of the three simple repeats.
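If you save the “Get Sequence” output to a file, it can be read back for further work. The sketch below assumes the output is in FASTA format (a `>` header line followed by one or more sequence lines); the function name, headers and sequences in `sample` are made up for illustration.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)           # sequence may wrap across lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical output for two of the repeats
sample = """>rep1
CAGCAGCAGCAG
CAGCAG
>rep3
ATATATAT
"""

for header, seq in parse_fasta(sample):
    print(header, len(seq))   # prints "rep1 18", then "rep3 8"
```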

Table Browser: Correlating Data Tables Get Results Correlate 2 Datasets Another feature of the Table Browser is the correlation tool, which we will only briefly introduce here. The correlation feature was added to the Table Browser in August 2005 and is still under development. It lets you see what, if any, correlation exists between two datasets. It is available for data tables that contain genomic positions, and it computes a simple linear regression on the scores in the two datasets. If a dataset does not contain a score for each base position, the Table Browser assigns a score of 1 to each position covered by an item in the table, and 0 otherwise. The Table Browser computes the linear regression quickly and then displays several graphs for visualizing the correlation, along with summary statistics including the correlation coefficient “r”. When datasets and parameters are chosen with some forethought, the correlation feature is a powerful tool for quickly gauging the relationship between two datasets. For example, you might want to determine whether GC content correlates with chromosome structure, or whether certain types of genes correlate with repeats. Keep an eye on this feature as it develops.
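To make the scoring rule above concrete, here is a minimal sketch of the unscored case: each dataset becomes a per-base vector of 1s (covered) and 0s (not covered), and an ordinary least-squares fit gives the slope, intercept and Pearson r. The function names and intervals are hypothetical, and this is not how the Table Browser is implemented internally; for scored datasets you would fill the vectors with the scores instead.

```python
import math
from typing import List, Sequence, Tuple


def coverage_scores(items: Sequence[Tuple[int, int]],
                    region_start: int, region_end: int) -> List[int]:
    """Per-base scores over [region_start, region_end): 1 if any item
    covers the base, else 0. Items are half-open (start, end) pairs."""
    scores = [0] * (region_end - region_start)
    for start, end in items:
        for pos in range(max(start, region_start), min(end, region_end)):
            scores[pos - region_start] = 1
    return scores


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares slope and intercept of y on x, plus Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a dataset has constant scores")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical unscored datasets over a 10-base region
a = coverage_scores([(0, 4)], 0, 10)   # [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
b = coverage_scores([(2, 6)], 0, 10)   # [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
slope, intercept, r = linear_fit(a, b)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
# slope=0.167 intercept=0.333 r=0.167
```

A low r here simply reflects that only half of each dataset's covered bases coincide; with real data, the choice of region and datasets matters as much as the arithmetic.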

Custom Tracks: Table Browser Searches Create Track Get Output Custom tracks can be created from Table Browser search queries. Here we create one from the data we just filtered and intersected in the Table Browser. Instead of choosing “sequence” as our output format, we choose “custom track” and then click “Get Output,” which lets us set up our new custom annotation track.

Custom Tracks: Name and Configure Track Name Track: SRepeatKGenes Describe Track: Intersection … Choose default view in browser Download track file to desktop This is the page where we set up our new track. First give it a name; here we call it “SRepeatKGenes”, but you could keep the default name the browser selected for you. Then give it a description (or keep the default) that reminds you exactly what the search entailed; the description is shown in the browser. Next choose a default visibility (full, pack, squish, dense or hide), which can be changed later in the Genome Browser. If you have created a web page with more information about your annotation track, you can enter its URL; we have not, so we leave this blank. Then choose how you want each record to be created. Since these are simple repeats, you can have one record per repeat (the page labels this option “gene”, though “item” would be a better word), or you can add upstream and downstream bases to each record. Once you’ve made those choices, you can download the track as a file to upload and view in the browser later (we’ll show you where later in this tutorial). This is important because, unlike other parameters and changes you might make to the browser, custom tracks persist for only 8 hours. Any Table Browser search you created a custom track from would have to be redone after 8 hours if you haven’t downloaded the file for later use. There are two other choices: you can view your new track immediately in the Table Browser, or you can view it in the Genome Browser. We will click “Get custom track in Genome Browser” to open the Genome Browser page and view our new track. In Genome Browser
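The settings on this page map naturally onto a custom track “track” line (its `name`, `description`, `visibility` and `url` attributes), and the upstream/downstream option amounts to widening each record. The helpers below are a hypothetical sketch of both ideas, not the Table Browser's own output code; in particular, treating “upstream” relative to the item's strand is an assumption.

```python
from typing import Optional, Tuple

VISIBILITIES = {"hide", "dense", "squish", "pack", "full"}


def track_line(name: str, description: str, visibility: str = "pack",
               url: Optional[str] = None) -> str:
    """Build a custom-track 'track' line from the settings on this page."""
    if visibility not in VISIBILITIES:
        raise ValueError(f"visibility must be one of {sorted(VISIBILITIES)}")
    parts = [f'track name="{name}"', f'description="{description}"',
             f"visibility={visibility}"]
    if url:
        parts.append(f"url={url}")
    return " ".join(parts)


def pad_record(chrom: str, start: int, end: int, strand: str,
               upstream: int = 0, downstream: int = 0,
               chrom_size: Optional[int] = None) -> Tuple[str, int, int, str]:
    """Widen a BED interval by flanking bases (hypothetical helper).

    Assumes upstream is toward lower coordinates on '+' and toward higher
    coordinates on '-'. The result is clamped to the chromosome bounds.
    """
    left, right = (downstream, upstream) if strand == "-" else (upstream, downstream)
    new_start = max(0, start - left)
    new_end = end + right
    if chrom_size is not None:
        new_end = min(chrom_size, new_end)
    return chrom, new_start, new_end, strand


print(track_line("SRepeatKGenes", "Filtered simple repeats in known genes"))
print(pad_record("chr4", 100, 160, "+", upstream=20, downstream=5))  # ('chr4', 80, 165, '+')
print(pad_record("chr4", 100, 160, "-", upstream=20, downstream=5))  # ('chr4', 95, 180, '-')
```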

Custom Tracks: Open Track in Genome Browser Open Details Compare The custom track will appear in the Genome Browser, as you see here. Each item in the track (the three simple repeats with copy number over ten that overlap known genes) has some basic properties (chromosome position, sequence) that you can view by clicking on the item, as you would for any other annotation track. You can now view and compare the results of this specialized search with other annotated data in the Genome Browser. As you see here, our search wasn’t in vain: this particular simple repeat is in the first exon of a disease gene, the Huntington’s disease gene. It was discovered years ago that this repeat causes the disease and that it expands and contracts from generation to generation. As the repeat expands in copy number, the disease becomes more severe and has an earlier onset. Of the 400 or so simple repeats in this region, three meet our criteria, and one of those is known to cause disease. Perhaps the others that fit these criteria are also of interest. Advanced searching in the Table Browser, together with custom tracks, offers a powerful tool for analysis and discovery, and there is more you can do with custom tracks created from your advanced searches. “…caused by an expanded, unstable trinucleotide repeat…”

Custom Tracks: Track in Table Browser Custom tracks also are available for filtering and intersections on the Table Browser You can also view the track in the Table Browser for further filtering, intersecting and searching. This will allow you to do some very specialized searches and discoveries, narrowing down your data to exactly the information you need.

Custom Tracks: User-generated Data in Track Custom Tracks Link Custom Track How-to You can also create custom tracks from your own data. To find the custom tracks page, return to the home page by clicking the “Home” navigation link; you will see the “Custom Tracks” link near the bottom of the left navigation bar. That page links to directions for creating a custom track from your own data, “Displaying Your Own Annotations in the Genome Browser.” We are going to look briefly at these instructions for a quick overview of how to get your own data into the browser and view it.

Custom Tracks: Four Steps to Create Track Four steps to create a custom track Define track characteristics Define browser characteristics Format your data Upload and view your track After clicking that link, you will be taken to a page with instructions on how to create a custom annotation track from your own data. There is a lot you can do with a custom track file, but the basics are quite straightforward. It is a four-step process, and you can type the information into any text editor or spreadsheet program. You will then either save the file and upload it, or copy and paste its contents into a form we will show in the last step. The four steps are: first, define how you want the annotation track to appear by default (color, visibility, etc.); second, define the view you want the Genome Browser to open to (chromosomal location, pixel size, etc.); third, format your data; and fourth, upload the file you created and view your track.
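The four steps can be sketched as a small script that assembles a custom track file. It assumes the BED12 data format (whose `blockSizes` and `blockStarts` columns describe each item's blocks) and a green track with two blocks per clone, echoing the display described in the following slides; the helper name, clone names, scores and coordinates are hypothetical. Step 4, uploading, still happens through the web form.

```python
from typing import List


def bed12_line(chrom: str, start: int, end: int, name: str, score: int,
               strand: str, block_sizes: List[int], block_starts: List[int]) -> str:
    """Format one BED12 record; block_starts are offsets from start."""
    if len(block_sizes) != len(block_starts):
        raise ValueError("need exactly one start offset per block")
    if block_starts[0] != 0 or block_starts[-1] + block_sizes[-1] != end - start:
        raise ValueError("blocks must begin at start and end at end")
    fields = [chrom, start, end, name, score, strand,
              start, end,   # thickStart/thickEnd: draw the whole item thick
              0,            # itemRgb: ignored unless the track line sets itemRgb="On"
              len(block_sizes),
              ",".join(map(str, block_sizes)) + ",",
              ",".join(map(str, block_starts)) + ","]
    return "\t".join(map(str, fields))


lines = [
    # Step 2: browser lines set where the Genome Browser opens and other tracks' views
    "browser position chr22:1000-10000",
    "browser full gap",
    # Step 1: the track line sets this track's defaults (green, pack)
    'track name=clones description="Hypothetical clones" visibility=pack color=0,128,0',
    # Step 3: the data, one BED12 record per clone, two blocks each
    bed12_line("chr22", 1000, 5000, "cloneA", 900, "+", [500, 400], [0, 3600]),
    bed12_line("chr22", 2000, 6000, "cloneB", 900, "-", [400, 450], [0, 3550]),
]

# Step 4 happens in the web form: upload this text as a file or paste it in.
track_text = "\n".join(lines) + "\n"
print(track_text)
```

Browser lines come first, followed by the track line and then the data records; the block check in `bed12_line` catches the most common BED12 mistake, blocks that do not span the item exactly.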

Custom Tracks: Submit Track Submit File Alternatively, you can copy the contents of your custom track file and paste them into the form, as we’ve done here with the custom track from the previous slides. Once you’ve uploaded or pasted in your custom track file, click “Submit” and you will be taken to the Genome Browser. Copy and paste small or simple tracks http://genome.ucsc.edu/FAQ/FAQformat

Custom Tracks: Track Appears in Genome Browser As you see here, the browser opens to chromosome 22 at base positions 1,000 to 10,000, and our new track shows both ‘clones’. The track is drawn in green, each clone has two blocks, and the orientation (strand) is shown, as are all the other attributes we set for the track.
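Each clone is drawn as two blocks because of the BED12 `blockCount`, `blockSizes` and `blockStarts` columns, in which block starts are offsets from `chromStart`. A minimal sketch for converting one record into absolute block coordinates follows; the function name and record are hypothetical, and the parser assumes whitespace-separated fields with no spaces inside the name.

```python
from typing import List, Tuple


def block_intervals(record: str) -> List[Tuple[str, int, int]]:
    """Absolute (chrom, start, end) of each block in one BED12 record."""
    fields = record.split()               # custom tracks may use tabs or spaces
    chrom, chrom_start = fields[0], int(fields[1])
    block_count = int(fields[9])
    sizes = [int(s) for s in fields[10].rstrip(",").split(",")]
    offsets = [int(s) for s in fields[11].rstrip(",").split(",")]
    if not (len(sizes) == len(offsets) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    return [(chrom, chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


# Hypothetical two-block record, whitespace-separated
clone = "chr22 1000 5000 cloneA 900 + 1000 5000 0 2 500,400, 0,3600,"
print(block_intervals(clone))   # [('chr22', 1000, 1500), ('chr22', 4600, 5000)]
```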

Custom Tracks: Track Characteristics Default view of custom track is “pack” Default view of other tracks set Additionally, if you scroll down the page, you’ll find the new track listed among the track controls, where you can change its visibility from the default we set. Also notice that the “gap” annotation track is set to full, the default we specified. This track will remain available in the browser for 8 hours.

Custom Tracks: Track Appears in Table Browser Custom Track also appears in Table Browser The track will also now appear in the Table Browser, allowing you to filter it, intersect it with other tracks and do further analysis. Custom tracks offer a simple and flexible way to view and analyze your own data alongside any other annotated data in the Genome Browser. Custom tracks built from Table Browser searches also give the researcher a powerful way to search the data in a highly specialized manner. Together, the Genome Browser, the Table Browser and custom tracks make a very powerful and flexible tool for analysis and discovery.

Custom Tracks from Outside Sources Custom Tracks Link Contributed Track We have shown you how to create your own track. Members of the scientific community have created tracks like this and made them available for others to use. You can find these from the UCSC Genome homepage. Clicking the custom tracks link in the left menu of the home page (remember, clicking the “Home” link at the top of any page returns you to the home page) takes you to the same page we viewed earlier. If you scroll down a bit, you will see that many labs have submitted data as customized annotation tracks for the community to view. Clicking the link for any of these submitted tracks opens the Genome Browser with that track included for viewing. Here you see the Stanford Promoters track we just clicked on. It will also now appear in the Table Browser for searching. I suggest you take a look at these custom tracks and experiment with creating your own; they are powerful community resources at your fingertips.
