Presentation is loading. Please wait.

Presentation is loading. Please wait.

Intro to Biocoductor and GeneGraph by Kyrylo Bessonov 9 Oct 2012.

Similar presentations


Presentation on theme: "Intro to Biocoductor and GeneGraph by Kyrylo Bessonov 9 Oct 2012."— Presentation transcript:

1 Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012

2 Bioconductor repository Is the repository with extensions and libraries for R-language found at http://www.bioconductor.org/ http://www.bioconductor.org/ Bioconductor libraries cover Micorarray analysis Genetic variants analysis (SNPs) Sequence analysis (FASTA, RNA-seq) Annotation (pathways, genes) High-troughput assays (Mass Spec) All libraries are free to use and contain documentation (i.e. vignettes) Vignettes are short and require some previous knowledge of R and/or other defendant libraries 2

3 Configuring R for Bioconductor To install Bioconductor libraries the R environment needs to be configured source("http://bioconductor.org/biocLite.R") To download any Bioconductor library use biocLite( " package_name " ) function biocLite("GenomeGraphs") Load the downloaded library functions via the library("lib_name") function library("GenomeGraphs") Read help file(s) via the ?? or browseVignettes() ??library_name (e.g. ??GenomeGraphs) browseVignettes(package = "GenomeGraphs") 3

4 Genome Data Display and Plotting The GenomeGraphs library Authors: Steen Durinck and James Bullard (Bioconductor library)

5 Intro to GenomeGraphs library Allows to retrieve data from Ensembl comprehensive DB on genomes intron/exon locations sequence genetic variation data protein properties (pI, domains, motifs) GenomeGraphs Bioc library allows to display data on: Gene Expression (expression / location) Comparative genomic hybridization (CGH) Sequencing (e.g. variations/introns/exons) 5

6 GenomeGraphs: Plotting with gdPlot gdPlot main plotting function gdPlot(gdObjects, minBase, maxBase...) gdObjects = any objects created by GenomeGraphs a) BaseTrack ; b) Gene ; c) GenericArray ; d) RectangleOverlay minBase the lowest nt position to be plotted (optional) maxBase the largest nt to display (optional) Objects are data structures that hold/organize a set of variables 1) Create an object to plot using makeBaseTrack() function ObjBaseT = makeBaseTrack(1:100, rnorm(1:100),strand = "+") 2) Plot the newly created object gdPlot(ObjBaseT) Display object structure / properties (e.g. of BaseTrack ) attributes(ObjBaseT) $ sign next to the name represents object variables Change or view individual object variables attr(ObjBaseT, “strand") 6

7 Manipulating GenomeGraphs class graphical properties Changing values of variables (classical way) attr(Object, "variable/parameter") = value attr(ObjBaseT, "strand") = "-“ If an array, to access individual elements use [element number 1..n] attr(ObjBaseT, "variable/parameter")[number] = new_value attr(ObjBaseT, "base")[1] = 0 Changing graphical parameters such as color showDisplayOptions(ObjBaseT) alpha = 1 lty = solid color = orange lwd = 1 size = 5 type = p getPar(ObjBaseT, "color") [1] "orange" setPar(ObjBaseT, "color", "blue") 7

8 gdPlot composite/group plots gdPlot(…) can take any number of gdObjects to plot Let’s plot two BaseTrack objects a and b a = makeBaseTrack(1:100, rlnorm(100), strand = "+") b = makeBaseTrack(1:100, rnorm(100), strand = "-") ab=list(a,b) gdplot(ab) 8

9 Manipulating values of the grouped gdObjects ab is list of objects a and b To access individual object within the list use [ ] ab[1] will display object a ab[2] will display object b To modify the grouped object use double [[ ]] To change color of b to red getPar(ab[[2]], "color") setPar(ab[[2]], "color", "red") -OR- (re-create object) b = makeBaseTrack(1:100, rnorm(100), strand = "-", dp=DisplayPars(color="red")) ab=list(a,b) gdPlot(ab) 9

10 Plotting with labels and legend gdPlot does not provide functions to label axis Trick = “use labeled / tagged” objects "label" = GenomeGraph object "+ strand"= makeBaseTrack(1:100, rlnorm(100), strand = "+") To display legend use makeLegend("text","color") ab=list("+"=a, "-"=b, makeGenomeAxis(), makeLegend(c("+", "-"),c("orange", "red")) ) 10

11 Retrieving and Displaying Data from Public Database Combining capabilities of biomaRt and GenomeGraph libraries

12 Welcome to biomaRt biomaRt library allows to retrieve data from public DBs ensemblENSEMBL GENES 68 (SANGER UK) snpENSEMBL VARIATION 68 (SANGER UK) unimartUNIPROT (EBI UK) bacteria_mart_14ENSEMBL BACTERIA 14 (EBI UK) ***Use listMarts() to see all available databases*** Let’s retrieve gene data of the Bacillus subtilis strain useMart(database,dataset) allows to connect to specified database and dataset within this database db=useMart("bacteria_mart_14") listDatasets(db) DatasetDescriptionversion ……… bac_6_geneBacillus subtilis genes (EB 2 b_subtilis) EB 2 b_subtilis db=useMart("bacteria_mart_14", "bac_6_gene") 12

13 Exploring biomaRt object listAttributes() shows all prop. of the biomaRt obj. The db object has total of 4175 genes Use getBM(attribute, filter, value, biomaRt_obj) to extract values belonging to specified attributes attribute: general term such as gene name/chromosome # / strand (+ or - ) filter: parameter applied on attribute such as genomic region to consider (i.e. start and end in nt) value: actual values of the applied filter(s) getBM(c("external_gene_id", "description","start_position", "end_position", "strand"), filters = c("start", "end"), values = list(1,10000), db) IDdescription start(nt) end(nt) strand metSMethionyl-tRNA synthetase 45633476271 ftsHCell division protease ftsH homolog 76984788971 hslO33 kDa chaperonin79880807551 DgkDeoxyguanosine kinase2314623769-1 13

14 Plotting the selected “Genome Region” Create an object with makeGeneRegion() function makeGeneRegion(start,end,chromosome name, strand, biomaRt object, plotting options) Find notation used for the chromosome naming getBM("chromosome_name","","",db) chromosome_name 1 Chromosome gRegion = makeGeneRegion(1, 10000, chr = "Chromosome", strand = "+", biomart = db, dp = DisplayPars(plotId = TRUE, idRotation = 90, cex = 0.8, idColor = "black")) gdPlot(list(gRegion, makeGenomeAxis(), makeTitle("Position(nt)",cex=3,"black",0.1))) 14

15 Bacillus subtilis genome region (intron / exon) 15 ensembl_gene_id

16 Mapping Expression data RNAseq GenomeArray()

17 Intro into RNA-seq data HT sequencing technologies allow to sequence mRNA in a series of short contigs of 50-200 bp In addition to gene expression analysis it possible to Map location of introns (UTRs) / exons Principal: one searches for a rapid changes in abundance of the RNA-Seq signal (contigs) Integration of sequence + expression information Ugrappa Nagalakshmi et. al. 2008 had used this strategy to accurately map yeast genome 1 Task: Display part of the seqDataEx dataset having both abundance of mRNA (cDNA) transcripts and annotated yeast genome 17 1 Ugrappa Nagalakshmi et. al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008

18 Plotting RNA-seq data with GenomeArray() Need to get mRNA contig abundance data data("seqDataEx") rnaSeqAb=seqDataEx$david Create GeneticArray object with makeGenericArray(intensity, probeStart, probeEnd, trackOverlay, dp = NULL) intensity either microaray or RNAseq transcript abundance signal probStart start position of the probe (location in nt) Plot contig abundance w.r.t. to genomic location gdPlot(list("abun"=makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb [, "location"]), makeGenomeAxis() ) ) 18

19 Adding genomic annotation to the prev plot Need to get annotated yeast genome data annotGen = useMart("ensembl", "scerevisiae_gene_ensembl") Create mRNA contig abundance mRNAabun = makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb [, "location"]) Create annotated seq. covering the mRNA contigs location annotSeq=makeGeneRegion(start = min(rnaSeqAb[, "location"]), end = max(rnaSeqAb[, "location"]), chr = "IV", strand = "+", biomart = annotGen, dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = 0.85, idColor = "black", size=0.5)) Combine objects and plot them gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq), 1299000,1312000) 19

20 Resulting plot: Transcript abundance w.r.t. to location 20

21 Overlays of Basic Shapes and Custom Text To overlay rectangle use makeRectangleOverlay() makeRectangleOverlay(start, end, region = NULL, coords = c("genomic", "absolute"), dp) rectOver = makeRectangleOverlay( 1301500, 1302200, region=c(1,2), "genomic", DisplayPars(alpha = 0.5)) To overlay text use makeTextOverlay() tOver = makeTextOverlay("Ribosomal Large subunit", 1302000, 0.95, region = c(1,1), dp = DisplayPars(color = "red")) Combine all overlay objects into one vector with c(v1,v2) gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq), overlays=c(rectOver, tOver) ) 21

22 Overlay of rectangle and text 22

23 Alternative splicing of transcript makeTranscript(id, type, biomart, dp = NULL)

24 Displaying alternative splicing of a gene mRNA coming from the same ORF could be spliced in many ways E.g. case of VDR genes of IgG Given biomaRt object, the makeTranscript() will extract splicing information for given id (i.e. gene) Download human genome databse hGenome <- useMart("ensembl", "hsapiens_gene_ensembl") Select Ensembl ID to look at head(getBM(c("ensembl_gene_id", "description"),"","", hGenome)) Get splicing data from biomaRt object (hGenome) spliceObj = makeTranscript("ENSG00000168309", "ensembl_gene_id",hGenome) Plot the object with gdPlot gdPlot(list(makeTitle("Transcript ID: ENSG00000168309"),splicingObj, makeGenomeAxis()) ) 24

25 The final result 25

26 Conclusion GeneGraphs provides a wide range to plot genomic data Can use external databases through biomaRt Main useful features identifies exons/introns allows to cross-reference expression / genome data flexible albeit complex plotting capabilities allows to overlay graphical objects and text ability to create custom legends annotation capabilities provided by powerful biomaRt 26

27 Thank you for your patience! & Happy Bioconductor/R Exploration!


Download ppt "Intro to Biocoductor and GeneGraph by Kyrylo Bessonov 9 Oct 2012."

Similar presentations


Ads by Google