Genome Browser The Plot Deepak Purushotham Hamid Reza Hassanzadeh Haozheng Tian Juliette Zerick Lavanya Rishishwar Piyush Ranjan Lu Wang.

1 Genome Browser The Plot Deepak Purushotham Hamid Reza Hassanzadeh Haozheng Tian Juliette Zerick Lavanya Rishishwar Piyush Ranjan Lu Wang

2 The Outline The Need & The Requirement The Options The Chosen One The New Age

3 THE NEED Why one should develop a Genome Browser

4 Why A Genome Browser? I want to analyze this organism

5 Why A Genome Browser? I want to analyze this organism Gene Functions Protein Domains Metabolic Pathways Comparative Analysis Synteny

6 THE REQUIREMENT What is expected out of a Genome Browser

7 A Genome Browser? I want something manageable

8 A Genome Browser!

9 The Genome Browser
“Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context of genomic DNA sequences.” Melissa S Cline & James W Kent, 2009
Genome browsers aggregate data
Taken from Andy Conley’s slides without permission

10 THE OPTIONS A Short Survey of the Available Genome Browser Modules

11 A Brief Time Travel
FlyBase, SGD, MGD, and WormBase
Setting up a model organism database (MOD) is expensive and time-consuming.
The four MODs agreed in the fall of 2000 to pool their resources and to make reusable components available to the community free of charge under an open source license.
The goal of this NIH-funded project, christened GMOD, is “…to generate a model organism database construction set that would allow a new model organism to be assembled by mixing and matching various components.”

12 GMOD

13 Who uses GMOD?

14 GMOD Components

15 Visualization - GBrowse

16 Visualization

17 JBrowse

18 GBrowse Synteny

19 CMAP

20 DATA MANAGEMENT

21 Chado

22 Tripal (http://www.cacaogenomedb.org/)

23 TableEdit

24 BioMart

25 InterMine

26 ANNOTATION

27 MAKER

28 DIYA

29 Galaxy

30 Ergatis

31 Apollo

32 REALLY EXCITING OPTION!

33 JBrowse Smooth, fast navigation (think Google Maps for genomes)

34 JBrowse
Smooth, fast navigation (think Google Maps for genomes)
Supports BED, GFF, Bio::DB::*, Chado, WIG, BAM, UCSC (intron/exon structure, name lookups, quantitative plots)
Relies on pre-indexing to minimize security exposure and runtime bandwidth/CPU load on the server (future versions more likely to do some server work at runtime)
Has an API for customized track/glyph extensions
Is stably funded by NHGRI, with many interesting innovations implemented & pending integration

35 Smoother UI

36 Most Genome browsers

37 How is JBrowse different?

40 First look: Live Demo A couple of JBrowses around the web http://intron.ccam.uchc.edu/JBrowse/Dmel/ http://jbrowse.org/ucsc/hg19/

42 Types of Tracks

43 Pros
Fast and smooth!
User friendly
Works nicely on an iPad/iPhone too

44 Cons
No support for user-uploaded data
Slow for large numbers of reference sequences (e.g. 5,000 annotated contigs)
Few glyph options, feature tracks are limited by the facts of

45 What to pick?

46 ? Tried and tested Fancy concept

47 THE CHOSEN ONE GBrowse and Its Features

48 GBrowse
Most popular web-based genome browser
Visualize genome features along a reference sequence
Open Source
Highly customizable
Excellent usability
Rich set of “glyphs”
– Genome features
– Quantitative Data
– Sequence Alignments

49 GBrowse Header Main Browser Window Track Menu

50 Under The Hood Client-Server Architecture GBrowse Architecture Installation Issues Input Data Configuration File Customization

51 Client Server Architecture 1. The user types in the URL: browser2012.biology.gatech.edu

52 Client Server Architecture 2. Browser interprets and sends the request to HTTP Server

53 Client Server Architecture 3. Web Server receives the request and “serves” the client, i.e., starts GBrowse

54 Client Server Architecture 4. In case of success, the relevant hypertext and multimedia are generated by accessing the database

55 Client Server Architecture 5. The output traverses the same path back

57 Client Server Architecture 6. The whole process repeats whenever the user interacts with the browser
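The six steps above can be sketched as one request/response function. This is only an illustration in Python, not how GBrowse itself is implemented (GBrowse is a Perl application); the `FEATURE_DB` dictionary, the `handle_request` function and the `name` query parameter are hypothetical names invented for the example.

```python
from html import escape
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for the feature database queried in step 4.
FEATURE_DB = {
    "geneA": {"ref": "contig1", "start": 100, "end": 400},
}


def handle_request(url):
    """Steps 2-5: parse the request, query the database, return (status, HTML)."""
    query = parse_qs(urlparse(url).query)
    name = query.get("name", [""])[0]
    feature = FEATURE_DB.get(name)
    if feature is None:
        # Failure case: nothing to render for this name.
        return 404, "<p>No feature named '%s'</p>" % escape(name)
    html = "<p>%s: %s:%d..%d</p>" % (
        escape(name), feature["ref"], feature["start"], feature["end"])
    return 200, html


# Step 6: every user interaction issues a fresh request like this one.
status, page = handle_request(
    "http://browser2012.biology.gatech.edu/?name=geneA")
print(status, page)
```

Keeping the handler a pure function of the URL mirrors the stateless round trip described on the slides: each click or scroll goes through the same path again.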

58 How you see what you see Juxtaposed Images

59 How are so many images generated?

60 How you see what you see + Hyper Text files

61 How you see what you see Multimedia files + Hyper Text

62 GBrowse Architecture
Stein L D et al. Genome Res. 2002;12:1599-1610. ©2002 by Cold Spring Harbor Laboratory Press

63 The Bio::DB::SeqFeature database Schema

64 [Schema diagram: tables Attribute, Attribute List, Feature, Name, Type List, Location List and Parent2Child, connected by one-to-many (1:n) and many-to-many (n:n) relationships]

65 Data file (.gff3)
1. Reference Sequence (Chr/Clone/Contig)
2. Source (e.g. Prodigal/Glimmer)
3. Type (sequence ontology (SO) terms)
4. Start
5. End
6. Score (e.g. E-value)
7. Strand
8. Phase (0/1/2)
9. Attributes (format: tag=value)
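As a minimal sketch of how the nine columns map onto a record, the Python below splits one tab-separated GFF3 line into typed fields. The `Gff3Record` type, the `parse_gff3_line` function and the sample line are illustrative assumptions, not part of GBrowse; a "." in the score or phase column is treated as "no value".

```python
from typing import NamedTuple, Optional


class Gff3Record(NamedTuple):
    seqid: str               # reference sequence (chromosome/clone/contig)
    source: str              # e.g. Prodigal or Glimmer
    feature_type: str        # sequence ontology (SO) term
    start: int
    end: int
    score: Optional[float]   # e.g. an E-value; None when the column is "."
    strand: str
    phase: Optional[int]     # 0, 1 or 2; None when the column is "."
    attributes: str          # raw tag=value pairs


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    return Gff3Record(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attrs,
    )


# Hypothetical example line, made up for illustration.
record = parse_gff3_line(
    "contig1\tProdigal\tCDS\t100\t400\t1e-20\t+\t0\tID=cds1;Name=geneA")
print(record.feature_type, record.start, record.end)
```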

66 Attributes (Data file)
Different tags have predefined meanings:
ID: Gives the feature a unique identifier. Useful when grouping features together (such as all the exons in a transcript).
Name: Display name for the feature. This is the name to be displayed to the user.
Alias: A secondary name for the feature. It is suggested that this tag be used whenever a secondary identifier for the feature is needed, such as locus names and accession numbers.
Note: A descriptive note to be attached to the feature. This will be displayed as the feature's description.
Alias and Note fields can have multiple values separated by commas. For example: Alias=M19211,gna-12,GAMMA-GLOBULIN
Other good stuff can go into the attributes field.
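A short sketch of reading the attributes column, assuming the usual GFF3 layout of semicolon-separated tag=value pairs with comma-separated multiple values and percent-encoded reserved characters. The function name `parse_attributes` and the Note text are made up for the example.

```python
from urllib.parse import unquote


def parse_attributes(column):
    """Turn 'tag=value;tag=v1,v2' into {tag: [values]}."""
    attrs = {}
    for pair in column.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # Split on commas first, then decode escapes such as %2C (a comma).
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs


attrs = parse_attributes(
    "ID=gene1;Name=gna-12;Alias=M19211,gna-12,GAMMA-GLOBULIN;"
    "Note=example%2C with an escaped comma")
print(attrs["Alias"])
```

Every tag maps to a list, so single-valued tags such as ID and multi-valued tags such as Alias can be handled the same way downstream.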

67 GBrowse Configuration File
Global Website Settings
Additional HTML Pages
JavaScript
jQuery
Global Database Settings
Data Source Definitions

68 Customizations

69 Configuration file (.conf)

70 Making a new Track
    ### TRACK CONFIGURATION ###
    [ExampleFeatures]
    feature  = remark
    glyph    = generic
    stranded = 1
    bgcolor  = orange
    height   = 10
    key      = Example Features
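The track stanza is INI-like: a bracketed track name followed by option = value lines. As a rough illustration, Python's standard `configparser` can read a simple stanza such as this one. This is an assumption that holds only for plain stanzas; real GBrowse configuration files may use constructs that `configparser` does not understand, so this is not a general GBrowse config reader.

```python
import configparser

# The stanza from the slide, reproduced as a string.
TRACK_CONF = """\
### TRACK CONFIGURATION ###
[ExampleFeatures]
feature  = remark
glyph    = generic
stranded = 1
bgcolor  = orange
height   = 10
key      = Example Features
"""

parser = configparser.ConfigParser()
parser.read_string(TRACK_CONF)  # lines starting with '#' are skipped as comments

track = parser["ExampleFeatures"]
print(track["glyph"], track.getint("height"), track["key"])
```

Read this way, the stanza says: draw every feature of type `remark` with the `generic` glyph, 10 pixels high, filled in orange, under the track label "Example Features".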

71 Adding Multiple Tracks Data: Configuration: Result UI: Searchable Links Popup balloons with links

72 Searching for Features
Gene symbols
Gene IDs
Sequence IDs
Genetic markers
Relative nucleotide coordinates
Absolute nucleotide coordinates
etc.

73 Viewing Multiple Tracks Low Magnification

74 Viewing Multiple Tracks High Magnification

75 In short…
Main features (determination of protein-coding and non-coding regions, …)
Quantitative data (E-value, identity percentage)
Other evidence (InterPro, COGs, etc.)
GC content and other useful measurements
Protein and DNA sequences
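GC content, one of the measurements listed above, is the fraction of G and C bases in a sequence. The sketch below computes it for a whole sequence and for fixed windows, as would be needed for a quantitative track. The function names and the choice to ignore ambiguous bases such as N are assumptions for this example.

```python
def gc_content(seq):
    """Fraction of G/C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # no unambiguous bases, e.g. a run of N
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step):
    """(start offset, GC fraction) for each full window along seq."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


print(gc_content("ATGC"))
print(windowed_gc("GGGGAAAA", window=4, step=4))
```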

76 THE NEW AGE Value-Added Additions

77 RICHER ANNOTATION What’s New

78 INCREASED ANNOTATION INFO Richer Annotation

80 INTEGRATED QUALITY SCORE Richer Annotation

81 Origin of Database Matches

82 Quality Value Integration

83 Quality Scores Origin of Database Matches

84 Different E-values shown with different shades of colors
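One way to turn an E-value into a shade is to scale -log10(E) into a fixed range and interpolate between a pale and a saturated color. The slides do not say how the browser computes its shades, so the blue palette, the cap of 100 and the function name `evalue_to_hex` are assumptions made for this sketch.

```python
import math


def evalue_to_hex(evalue, cap=100.0):
    """Map an E-value to a hex color: pale blue (weak hit) to pure blue (strong hit)."""
    if evalue <= 0:
        strength = 1.0  # an E-value reported as 0 counts as the strongest hit
    else:
        # -log10(E) grows as the hit improves; clamp it to [0, cap].
        strength = min(max(-math.log10(evalue), 0.0), cap) / cap
    # Red and green channels fall from 230 (pale) to 0 (saturated).
    level = round(230 * (1.0 - strength))
    return "#%02x%02xff" % (level, level)


for e in (1.0, 1e-50, 0.0):
    print(e, evalue_to_hex(e))
```

The log scale matters here: E-values span many orders of magnitude, so a linear mapping would render almost every significant hit in the same shade.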

85 What’s New MORE LINK-OUTS

86 COGs KEGG ID

87 PATHWAYS What’s New

88 KEGG ID KEGG Genes KEGG Compound KEGG Pathway

89 ORGANISM SPECIFIC PAGES Synthesis!

90 Organism Summary Page
At this point in the course, we have gathered a lot of information about the strains we are dealing with.
Not all of this information can be represented inside the genome browser.
We propose a separate section in the browser containing summarized information for each strain.

91 Organism Summary Page
Conceptually, the page could contain:
– Biological information
– Assembly information: genome size, number of contigs, N50, sequencing platform
– Gene prediction information: number of protein-coding and non-protein-coding genes, links to the 16S rRNA gene
– Annotation information: percent annotation, function distribution pie chart
– Comparative information: unique protein clusters, etc.
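The assembly figures in the list above can be derived directly from the contig lengths. The sketch below uses the common definition of N50: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size. The function names and the example lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Contig length at which the cumulative sum reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def assembly_summary(contig_lengths):
    """Genome size, contig count and N50 for one strain."""
    return {
        "genome_size": sum(contig_lengths),
        "num_contigs": len(contig_lengths),
        "n50": n50(contig_lengths),
    }


print(assembly_summary([2, 2, 2, 3, 3, 4, 8, 8]))
```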

92 Organism Summary Page

93 OPERONS Adding more values

94 Operons
An operon “…is a functioning unit of genomic DNA containing a cluster of genes under the control of a single regulatory signal or promoter”
~70% of the genes have been assigned a unique OperonID
The OperonID will give biologists an additional browsing mechanism, connecting co-transcribed and co-regulated genes.
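Browsing by OperonID amounts to grouping genes that share an identifier. A minimal sketch, assuming each gene carries an optional operon ID; the gene and operon identifiers below are hypothetical, and genes without an assignment are simply skipped.

```python
from collections import defaultdict


def group_by_operon(genes):
    """Map each OperonID to the list of gene IDs assigned to it."""
    operons = defaultdict(list)
    for gene_id, operon_id in genes:
        if operon_id is not None:  # some genes have no operon assigned
            operons[operon_id].append(gene_id)
    return dict(operons)


# Hypothetical (gene ID, OperonID) pairs.
genes = [
    ("gene1", "OP001"),
    ("gene2", "OP001"),
    ("gene3", None),
    ("gene4", "OP002"),
]
print(group_by_operon(genes))
```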

95 Operons

96 Incorporating Operon Information

97 BRIG PATTERN More with Comparison

98 BRIG Patterns Concept: either generate BRIG images at run time or load static images when the user requests the BRIG pattern between two species

99 BRIG Patterns

100 That’s All Folks! Questions? Comments? Concerns? If you have any suggestions, we would love to hear from you! (There is a page on Wiki for it!)

