Published by Arlene Morrison. Modified over 9 years ago.
1
Fission Yeast Computing Workshop -1- Searching, querying, browsing, downloading and analysing data using PomBase

Basic PomBase Features
Gene Page Overview
Configuring Browser
"Others" links
QuickGO browser

Homepage links
Pombelist list
Advanced search
Genome regions (quicklinks to centromeres etc.)
GO slim
Genome Browser
Genome statistics
Gene characterization
FAQ

Simple data mining and analysis
Create user-defined gene sets and download gene sets in various formats
Combine (union, intersect and subtract) to make and refine user-defined lists
"GO slimming"
"GO enrichment"
2
Fission Yeast Computing Workshop -2- Using the "Advanced Query Tool" to create and download some gene sets

http://www.pombase.org (Advanced search under the "Find" menu)

The "Query Results" tab allows you to browse and download your search results.

Exercise 1: Create a protein data set and import a gene list
Select Gene Filters, then Genes by type, "protein coding".
3
Fission Yeast Computing Workshop -3- Query history

The "Query History" tab stores previous searches and allows you to union, intersect and subtract them.

From your protein coding gene set:
1) Subtract the union of "Annotation status" "dubious" and "transposon". This gives you the set of protein coding genes for fission yeast.
2) Subtract annotation status "published" to give you the "unpublished" protein coding gene set.
3) Intersect these results with the GO term "nucleus".
4) Intersect these results with phenotype "viable".
5) Go to the "Gene List Search" in the FIND menu and import a list of your own, or the list provided here: ftp/pub/yeast/pombe/EMBO/test_list
The imported list will appear in the query history, which enables you to perform these intersections on any user-defined list.
6) Intersect your uploaded list with the output of 5 to find the genes in your "user defined list" which are unstudied in fission yeast and annotated as nuclear.
7) Download the list for the slimming exercise later (you only need the systematic identifiers in column 1).
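The query-history operations above are ordinary set operations, so the same chain can be sketched with Python sets. This is only an illustration: the gene identifiers and set contents below are made-up placeholders, not real PomBase query results, and the sketch assumes that step 6 is meant to intersect the uploaded list with the result of step 4.

```python
# Toy gene sets. All identifiers are hypothetical placeholders,
# not output from the PomBase Advanced Query Tool.
protein_coding = {"gene01", "gene02", "gene03", "gene04", "gene05"}
dubious = {"gene02"}
transposon = {"gene05"}
published = {"gene01"}
nucleus = {"gene01", "gene03"}
viable = {"gene03", "gene04"}
user_list = {"gene03", "gene99"}  # stands in for an imported gene list

# 1) Subtract the union of "dubious" and "transposon".
step1 = protein_coding - (dubious | transposon)
# 2) Subtract "published" to leave the unpublished set.
step2 = step1 - published
# 3) Intersect with the GO term "nucleus".
step3 = step2 & nucleus
# 4) Intersect with the phenotype "viable".
step4 = step3 & viable
# 6) Intersect the imported list with the refined set
#    (assumption: the result of step 4 is the intended operand).
step6 = user_list & step4

for gene_id in sorted(step6):
    print(gene_id)
```

Because each intermediate result keeps its own name, any earlier step can be reused in a different combination, which mirrors how the "Query History" tab lets you recombine stored searches.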
4
Fission Yeast Computing Workshop -4- Exercise 2: Creating defined gene sets

This "gene characterization" data is available on the PomBase website: http://www.pombase.org/status/gene-characterisation (previous totals are under "characterization history" in the left-hand margin).

You can recreate this data in the Advanced Query tool using queries for "Annotation status".
Select the Annotation status "Conserved Unknown".
To drill down to the "species distribution", intersect your "conserved unknown" list with "conserved in vertebrates" or "conserved in fungi only" (under "Other Vocabularies", "Conserved in…..").

Try some more queries:
How many proteins have an identified Pfam domain or family assignment?
How many non-coding RNAs are there between bases 100,000 and 200,000 on chromosome 1?
How many proteins on the left arm of chromosome 3 are longer than 1000 amino acids? Are any of these "conserved unknown"?
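The range and length queries above amount to filtering feature records and then intersecting the result with another gene set. A minimal sketch follows, assuming a hypothetical `Feature` record and invented toy coordinates; it is not the PomBase data model. It also assumes "between bases" means the feature lies entirely inside the range, and it filters by whole chromosome only, since restricting to one arm would need the centromere coordinate, which is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical record type used only for this illustration."""
    systematic_id: str
    feature_type: str
    chromosome: int
    start: int
    end: int
    protein_length: Optional[int] = None  # None for non-coding features


# Toy data: identifiers and coordinates are invented.
FEATURES = [
    Feature("toy_ncRNA_1", "ncRNA", 1, 120_000, 121_500),
    Feature("toy_ncRNA_2", "ncRNA", 1, 199_000, 201_000),  # crosses the upper bound
    Feature("toy_ncRNA_3", "ncRNA", 2, 150_000, 151_000),
    Feature("toy_prot_1", "protein coding", 3, 50_000, 54_000, 1200),
    Feature("toy_prot_2", "protein coding", 3, 60_000, 61_000, 300),
    Feature("toy_prot_3", "protein coding", 1, 10_000, 14_000, 1300),
]


def in_region(features, feature_type, chromosome, lo, hi):
    """Features of one type lying entirely within [lo, hi] on a chromosome."""
    return [
        f for f in features
        if f.feature_type == feature_type
        and f.chromosome == chromosome
        and lo <= f.start
        and f.end <= hi
    ]


def longer_than(features, chromosome, min_length):
    """Protein-coding features on a chromosome longer than min_length residues."""
    return [
        f for f in features
        if f.protein_length is not None
        and f.chromosome == chromosome
        and f.protein_length > min_length
    ]


ncrnas = in_region(FEATURES, "ncRNA", 1, 100_000, 200_000)
long_chr3 = longer_than(FEATURES, 3, 1000)

# Intersect with a (toy) "conserved unknown" set, as in the last question.
conserved_unknown = {"toy_prot_1"}
long_unknown = [f.systematic_id for f in long_chr3 if f.systematic_id in conserved_unknown]

print(len(ncrnas), len(long_chr3), long_unknown)
```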
5
Fission Yeast Computing Workshop -5- GO "slimming"

A "slim" is a high-level view of GO: genes annotated to granular terms are mapped to higher-level terms.
It allows users to group genes into broader categories to assess their distribution, for genome-wide analyses or for smaller gene sets.
Different annotation groups (organism databases) have created specific GO slims, which are available at GO's FTP site. Fission yeast now has an "official GO slim", which gives good coverage of high-level processes.
You can also create and use your own GO slim with high-level terms of interest.

A fission yeast GO slim has been created for process terms: http://www.pombase.org/browse-curation/fission-yeast-go-slim-terms
This slim gives good coverage of annotated proteins (most annotated proteins are mapped to the slim). It should be suitable for general purposes, but to slim experimental results you may want to change the terms in the slim slightly to best represent your dataset. There are some guidelines for creating user-defined slims here: http://www.pombase.org/browse-curation/fission-yeast-go-slimming-tips

Note: a slim distribution is not a gene product count, because gene products have multiple annotations; this means that it doesn't make sense to display this information as a pie chart.

For most purposes this slim would be inadequate (the terms are very broad), but it demonstrates two categories: "unknown" (unannotated) and "other" (annotated to a term not covered by the slim).

There are usually many more annotations than genes (e.g. 8454 here), because many genes are annotated to multiple high-level terms. A pie chart therefore does not show the percentage of the genome involved in a particular process, although it is often used and interpreted that way. Histograms with absolute numbers on the axis rather than percentages are much more meaningful.
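The point about annotation counts versus gene counts can be checked on a small example. The sketch below uses invented genes and slim assignments (not PomBase data) to show how a pie-chart share, which divides by the number of annotations, differs from the fraction of genes actually annotated to a term.

```python
from collections import Counter

# Toy slim assignments: gene -> set of slim terms. All values are invented.
slim_annotations = {
    "geneA": {"cell cycle", "DNA metabolic process"},
    "geneB": {"cell cycle"},
    "geneC": {"transport", "cell cycle", "signaling"},
    "geneD": set(),  # unannotated ("unknown")
}

# Count how many genes carry each slim term.
term_counts = Counter(
    term for terms in slim_annotations.values() for term in terms
)

total_genes = len(slim_annotations)
total_annotations = sum(term_counts.values())

# Share of a pie chart built from annotation counts.
pie_share = term_counts["cell cycle"] / total_annotations
# Fraction of genes actually annotated to the term.
gene_fraction = term_counts["cell cycle"] / total_genes

print(f"genes: {total_genes}, annotations: {total_annotations}")
print(f"cell cycle pie share: {pie_share:.2f}, gene fraction: {gene_fraction:.2f}")
```

In this toy set three of the four genes are annotated to "cell cycle", yet a pie chart built from the six annotations would give that term only half of the circle. Plotting the absolute per-term gene counts as a histogram avoids that misreading.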
6
Fission Yeast Computing Workshop -6- Exercise 3: "GO Slimming"

This exercise uses the generic "GO slim mapper" at Princeton to create a GO slim distribution from our gene set of interest.

Go to http://go.princeton.edu/cgi-bin/GOTermMapper
(This implementation is always up to date for the ontology and the annotation. It supplies a list of "unknown" genes, which map to the root node, and a list of genes that are annotated to a non-root process but not covered by the slim.)

1. Upload the protein coding gene list from Exercise 1 and select the PomBase GO Slim.
2. User-defined GO slim (in the advanced options). For example, if you wanted to create a "slim" set for component terms, you might begin with:
GO:0005634 nucleus
GO:0005739 mitochondrion
GO:0005886 plasma membrane
GO:0005794 Golgi apparatus
GO:0005730 nucleolus
Try this option with your list.
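To make the slim-mapping idea concrete, here is a minimal sketch of how annotations can be rolled up to slim terms through parent links, with the "unknown" (root only) and "not covered by the slim" buckets described above. The tiny ontology, the slim and the gene annotations are all invented for illustration; this is not the Princeton tool's code, and real GO has many more terms and relationship types.

```python
# Toy ontology: term -> list of parent terms. Invented for illustration only.
PARENTS = {
    "process": [],  # root node
    "cell cycle": ["process"],
    "mitotic cell cycle": ["cell cycle"],
    "transport": ["process"],
    "ion transport": ["transport"],
    "signaling": ["process"],
}
ROOT = "process"
SLIM = {"cell cycle", "transport"}  # hypothetical user-defined slim


def ancestors(term):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def map_to_slim(annotations):
    """Split genes into slimmed, unknown (root only) and other (not in slim)."""
    slimmed, unknown, other = {}, [], []
    for gene, terms in annotations.items():
        reachable = set()
        for term in terms:
            reachable |= ancestors(term)
        hits = reachable & SLIM
        if hits:
            slimmed[gene] = hits
        elif reachable <= {ROOT}:
            unknown.append(gene)  # annotated to the root only, or not at all
        else:
            other.append(gene)  # annotated, but no slim term covers it
    return slimmed, unknown, other


annotations = {
    "gene1": ["mitotic cell cycle"],
    "gene2": ["ion transport", "mitotic cell cycle"],
    "gene3": ["signaling"],
    "gene4": ["process"],
}

slimmed, unknown, other = map_to_slim(annotations)
print(slimmed)
print("unknown:", unknown)
print("other:", other)
```

A gene can land in more than one slim category (gene2 here), which is why the per-term counts add up to more than the number of genes.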
7
Fission Yeast Computing Workshop -7- Exercise 4: "GO Term Enrichment"

This exercise uses the generic "GO term finder" tool at Princeton to provide an enrichment analysis (significant shared terms) for a gene set of interest.

Go to http://go.princeton.edu/cgi-bin/GOTermFinder

1. Upload your gene list from Exercise 1.
2. Select the process ontology.
3. Choose the PomBase association file (annotations).
4. Repeat with the Cellular Component ontology.

The results will show the most significant terms in your gene set, in order of significance. The % in your gene set compared to the % in the genome as a whole is provided, in addition to the P-value.
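For intuition about the reported P-value, enrichment tools of this kind commonly use a hypergeometric test: given how many genes in the genome carry a term, how likely is it to see at least this many in a gene set of this size by chance? The sketch below assumes that model and uses invented numbers; it does not reproduce the GO Term Finder's exact procedure, and it omits the multiple-testing correction that a real analysis needs. It requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k term-carrying genes in a set of n,
    drawn without replacement from N genes of which K carry the term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Invented numbers for illustration only.
genome_size = 5000      # N: genes in the background
genome_with_term = 250  # K: background genes annotated to the term
set_size = 40           # n: genes in the uploaded list
set_with_term = 10      # k: genes in the list annotated to the term

pct_in_set = 100 * set_with_term / set_size
pct_in_genome = 100 * genome_with_term / genome_size
p_value = hypergeom_pvalue(set_with_term, set_size, genome_with_term, genome_size)

print(f"{pct_in_set:.1f}% of the set vs {pct_in_genome:.1f}% of the genome")
print(f"P-value (uncorrected): {p_value:.3g}")
```

The two percentages correspond to the "% in your gene set" and "% in the genome" columns described above; the P-value summarises how surprising the difference between them is for a set of that size.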