Next Generation R tidyr, dplyr, ggplot2

1 Next Generation R tidyr, dplyr, ggplot2
R-based advanced methods for deep sequencing analysis, Weizmann Institute of Science, 2017
Gil Hornung, Bioinformatics Analyst, G-INCPM, Weizmann

2 The “tidyverse”
Makes data manipulation and visualization easier to write and read; written by data scientists, for data scientists.
A set of self-consistent packages (tidyr, dplyr, ggplot2, tibble, purrr, readr, lubridate…).
Open source, but mainly developed and maintained by Hadley Wickham.

3 Talk outline
Data manipulation:
  Preliminary concepts: tidy data, pipe
  Examples (the best!)
Data visualization:
  Preliminary concept: grammar of graphics
  Examples

4 (Un)tidy data
Same data, different representations:

Patient        Year 1   Year 2   Year 3
Jon Snow       119/76   116/79   122/81
Theon Greyjoy  116/75   120/81   166/87
Eddard Stark   118/78   NA       NA

Patient   Jon Snow   Theon Greyjoy   Eddard Stark
Year 1    119/76     120/75          118/78
Year 2    116/79     125/81          NA
Year 3    122/80     166/87          NA

Patient        Systolic      Diastolic
Jon Snow       119,116,122   76,79,81
Theon Greyjoy  116,120,166   75,81,87
Eddard Stark   118,NA,NA     78,NA,NA
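As a sketch of how the first (wide) table could be tidied, assuming tidyr 1.0 or later (where pivot_longer() supersedes gather()): pivot_longer() produces one row per patient and year, and separate() splits each "systolic/diastolic" string into two numeric columns. The object names bp_wide and bp_tidy are illustrative only.

```r
library(dplyr)
library(tidyr)

# First table from the slide: one column per year,
# each cell packs "systolic/diastolic" into a single string
bp_wide <- tibble(
  Patient  = c("Jon Snow", "Theon Greyjoy", "Eddard Stark"),
  `Year 1` = c("119/76", "116/75", "118/78"),
  `Year 2` = c("116/79", "120/81", NA),
  `Year 3` = c("122/81", "166/87", NA)
)

bp_tidy <- bp_wide %>%
  # Wide -> long: one row per (Patient, Year); strip the "Year " prefix
  pivot_longer(-Patient, names_to = "Year", values_to = "BP",
               names_prefix = "Year ") %>%
  mutate(Year = as.integer(Year)) %>%
  # Split "119/76" into two integer columns; a missing reading stays NA in both
  separate(BP, into = c("Systolic", "Diastolic"), sep = "/", convert = TRUE)

bp_tidy
```

The result has the shape of the tidy table on the next slide: Patient and Year are the key columns, and each reading sits in its own column.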

5 Tidy data
Each row is identifiable by a combination of “key” columns
Values of observations appear in columns

Patient        Year  Systolic  Diastolic
Jon Snow       1     120       80
Jon Snow       2     125       83
Jon Snow       3     122       …
Theon Greyjoy  1     …         75
Theon Greyjoy  2     …         81
Theon Greyjoy  3     166       87
Eddard Stark   1     118       78

Each observation forms a row
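A quick way to check the “key columns” property with dplyr is to count each key combination and keep any that occur more than once; for a tidy table the result has zero rows. This sketch uses only the rows of the table whose values are all known, and the names bp, duplicated_keys and bad are illustrative.

```r
library(dplyr)

# Rows of the tidy table whose values are all known
bp <- tibble(
  Patient   = c("Jon Snow", "Jon Snow", "Theon Greyjoy", "Eddard Stark"),
  Year      = c(1L, 2L, 3L, 1L),
  Systolic  = c(120L, 125L, 166L, 118L),
  Diastolic = c(80L, 83L, 87L, 78L)
)

# Every (Patient, Year) pair should identify exactly one row
duplicated_keys <- bp %>%
  count(Patient, Year) %>%
  filter(n > 1)

nrow(duplicated_keys)  # 0: (Patient, Year) is a valid key

# Appending a copy of the first row breaks the key, and the same check catches it
bad <- bind_rows(bp, bp[1, ]) %>%
  count(Patient, Year) %>%
  filter(n > 1)
```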

6 Tidy data
Easy to add new observations (here, two new columns: Age and House)

Patient        Year  Systolic  Diastolic  Age  House
Jon Snow       1     120       80         16   Stark
Jon Snow       2     125       83         17   Stark
Jon Snow       3     122       …          18   Stark
Theon Greyjoy  1     …         75         …    Greyjoy
Theon Greyjoy  2     …         81         …    Greyjoy
Theon Greyjoy  3     166       87         19   Greyjoy
Eddard Stark   1     118       78         35   Stark
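In dplyr, adding a per-patient variable such as House is a join on the key column, and adding a new observation is appending a row; both keep the table tidy. In this sketch the lookup table houses and the appended row (Eddard Stark, Year 2, readings missing, as in the untidy tables) are illustrative assumptions.

```r
library(dplyr)

bp <- tibble(
  Patient   = c("Jon Snow", "Jon Snow", "Eddard Stark"),
  Year      = c(1L, 2L, 1L),
  Systolic  = c(120L, 125L, 118L),
  Diastolic = c(80L, 83L, 78L)
)

# New variable: House is a per-patient attribute, so join it on the key "Patient"
houses <- tibble(
  Patient = c("Jon Snow", "Theon Greyjoy", "Eddard Stark"),
  House   = c("Stark", "Greyjoy", "Stark")
)

bp <- bp %>%
  left_join(houses, by = "Patient") %>%
  # New observation: one more row; columns not supplied are filled with NA
  bind_rows(tibble(Patient = "Eddard Stark", Year = 2L, House = "Stark"))

bp
```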


8 Pipe
A simple way to connect the output of one function to the input of the next function.
Instead of:
> round(scale(abs(log2(x)), center = TRUE, scale = FALSE), 2)
Use a pipe:
> x %>% log2 %>% abs %>% scale(., center = TRUE, scale = FALSE) %>% round(., 2)
%>%: the pipe operator (RStudio shortcut: Ctrl+Shift+M)
.: a dot represents where the output of the previous step should be used as input
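A runnable check that the nested call and the piped versions compute the same thing; x is an arbitrary positive vector chosen for illustration (log2 needs positive input). %>% comes from magrittr, and dplyr re-exports it. When the piped value goes into the first argument, the dot can be left out.

```r
library(magrittr)  # provides %>%; dplyr re-exports it

x <- c(0.5, 2, 8, 32)  # illustrative positive values

# Nested: read from the inside out
nested <- round(scale(abs(log2(x)), center = TRUE, scale = FALSE), 2)

# Piped, with an explicit dot marking where the previous result goes
dotted <- x %>% log2 %>% abs %>% scale(., center = TRUE, scale = FALSE) %>% round(., 2)

# Piped without the dot: the previous result becomes the first argument
piped <- x %>% log2() %>% abs() %>% scale(center = TRUE, scale = FALSE) %>% round(2)

identical(nested, dotted)  # TRUE
identical(nested, piped)   # TRUE
```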



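11 Joins
Tables A and B share one or more key columns:
A %>% inner_join(B): rows whose key appears in both A and B
A %>% left_join(B): all rows of A, with the matching columns from B (NA where B has no match)
A %>% right_join(B): all rows of B, with the matching columns from A (NA where A has no match)
A %>% full_join(B): all rows of both A and B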
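A small sketch of the four joins with two hypothetical toy tables sharing the key column gene (the logFC values are made up for illustration). Passing by = "gene" makes the key explicit; without it, dplyr joins on all shared column names and prints a message saying which ones it used.

```r
library(dplyr)

# Hypothetical toy tables; "gene" is the shared key column
A <- tibble(gene = c("EGFR", "TP53", "MYC"), logFC = c(-2.1, 0.3, 1.4))
B <- tibble(gene = c("TP53", "MYC", "KRAS"), chrom = c("17", "8", "12"))

inner <- A %>% inner_join(B, by = "gene")  # keys in both: TP53, MYC
left  <- A %>% left_join(B, by = "gene")   # all rows of A; chrom is NA for EGFR
right <- A %>% right_join(B, by = "gene")  # all rows of B; logFC is NA for KRAS
full  <- A %>% full_join(B, by = "gene")   # all keys from A and B
```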

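12 “Not really” joins
A %>% semi_join(B): keeps the rows of A whose key values appear in B (“intersect” values of keys)
A %>% anti_join(B): keeps the rows of A whose key values do not appear in B (“subtract” values of keys)
Neither adds any columns from B.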
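A sketch with hypothetical toy tables sharing the key column gene: semi_join() keeps the rows of A whose key appears in B, anti_join() keeps those whose key does not, and in both cases the result has only A's columns.

```r
library(dplyr)

# Hypothetical toy tables; "gene" is the shared key column
A <- tibble(gene = c("EGFR", "TP53", "MYC"), logFC = c(-2.1, 0.3, 1.4))
B <- tibble(gene = c("TP53", "MYC", "KRAS"), chrom = c("17", "8", "12"))

kept    <- A %>% semi_join(B, by = "gene")  # "intersect": TP53, MYC
removed <- A %>% anti_join(B, by = "gene")  # "subtract": EGFR

names(kept)  # "gene" "logFC": only A's columns, nothing added from B
```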

13 tidyr example 2 – MSigDB gene set format
Hint: extract() can capture multiple values with one regular expression: "(.*?)\t(.*?)"
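A sketch of the hint, assuming the usual MSigDB .gmt layout (one gene set per line: set name, a description field, then the member genes, all tab-separated). The two input lines are hypothetical; for a real file, readLines() would supply the same kind of character vector. The pattern is anchored with ^...$ and its last group is greedy because a trailing lazy group such as (.*?) matches as little as possible, which is the empty string. tidyr::extract is called with its namespace because magrittr also exports a function named extract.

```r
library(dplyr)
library(tidyr)

# Two hypothetical lines in .gmt layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_lines <- c(
  "SET_A\tdescription of set A\tEGFR\tKRAS\tMYC",
  "SET_B\tdescription of set B\tTP53\tMDM2"
)

gene_sets <- tibble(line = gmt_lines) %>%
  # One regex with three capture groups -> three new columns
  tidyr::extract(line, into = c("set", "description", "genes"),
                 regex = "^(.*?)\t(.*?)\t(.*)$") %>%
  # The genes field still holds tab-separated symbols: one row per gene
  separate_rows(genes, sep = "\t") %>%
  rename(gene = genes)

gene_sets
```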


15 Grammar of graphics
An established philosophy (Wilkinson, 2005) behind ggplot2. It decomposes a plot into the following main components:

Component          ggplot2 syntax   Description
Data                                A tidy data frame
Aesthetic mapping  aes()            Mapping of variables to aesthetic attributes (e.g. which column is the x position? which column is the color?)
Geometries         geom_*           The geometric shapes that will represent the data (e.g. bars, lines, points…)

More components: scale_*, stat_*, coord_*, facet_*, theme()
Layers: add data, then a geom, then another geom, then a stat / facet, etc.

16 Grammar of graphics
ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_point()
data = mpg: the data frame
aes(x = hwy, y = cty): aesthetic mapping from columns to positions
geom_point(): the geometric elements to use
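Building on the slide's example, a sketch that stacks more of the grammar-of-graphics components as layers: an extra aesthetic (colour), a second geom backed by a smoothing stat, facets, labels and a theme. mpg ships with ggplot2; the particular choices (colour by class, facets by drv, a linear smoother) are only for illustration.

```r
library(ggplot2)  # mpg is a data set bundled with ggplot2

p <- ggplot(data = mpg, aes(x = hwy, y = cty)) +   # data + position mapping
  geom_point(aes(colour = class)) +                # layer 1: points, coloured by class
  geom_smooth(method = "lm", formula = y ~ x) +    # layer 2: linear fit within each panel
  facet_wrap(~ drv) +                              # one panel per drive type
  labs(x = "Highway MPG", y = "City MPG") +
  theme_bw()

p  # printing the object draws the plot
```

Each `+` adds one component, so layers can be added or removed without touching the rest of the call.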



19 ggplot2 example 1 – Expression of a single gene
PRJEB18261: RNA-seq of A431 cell line after gefitinib treatment
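As a sketch only, assuming expression values are kept in a tidy long table with hypothetical columns gene, condition, replicate and expression (the numbers below are invented placeholders, not data from PRJEB18261): plotting a single gene is a filter() piped straight into ggplot().

```r
library(dplyr)
library(ggplot2)

# Hypothetical tidy expression table: one row per (gene, sample); values are placeholders
expr <- tibble(
  gene       = rep(c("EGFR", "MYC"), each = 6),
  condition  = rep(rep(c("control", "gefitinib"), each = 3), times = 2),
  replicate  = rep(1:3, times = 4),
  expression = c(10.2, 10.0, 10.4, 9.1, 9.3, 8.9,
                 7.5, 7.8, 7.6, 6.2, 6.0, 6.4)
)

p <- expr %>%
  filter(gene == "EGFR") %>%                                      # keep a single gene
  ggplot(aes(x = condition, y = expression)) +
  geom_point(size = 3) +                                          # one point per replicate
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +  # mean +/- SE per condition
  labs(title = "EGFR", y = "expression (arbitrary units)")

p
```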

20 ggplot2 example 2 – Somatic mutation rates
MS Lawrence et al. (2013) doi: /nature12213

21 ggplot2 example 2 – Somatic mutation rates
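A sketch of the general plotting technique behind this kind of figure: per-sample mutation rates grouped by tumour type, groups ordered by their median, on a log scale. The data are simulated with rlnorm() purely so the code runs; the tumour-type labels and rates are placeholders, not values from Lawrence et al.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Simulated placeholder data: 30 samples for each of four hypothetical tumour types
rates <- tibble(
  tumour_type      = rep(c("type_A", "type_B", "type_C", "type_D"), each = 30),
  mutations_per_mb = rlnorm(120, meanlog = rep(c(2, -1, 0.5, 1), each = 30), sdlog = 1)
)

# Order tumour types by their median rate so the plot reads from low to high
rates_ordered <- rates %>%
  mutate(tumour_type = reorder(tumour_type, mutations_per_mb, FUN = median))

p <- ggplot(rates_ordered, aes(x = tumour_type, y = mutations_per_mb)) +
  geom_jitter(width = 0.2, alpha = 0.6) +   # one point per sample
  scale_y_log10() +                          # rates span orders of magnitude
  labs(x = NULL, y = "Somatic mutations per Mb")

p
```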

22 Next Generation R knitr

23 knitr
Dynamic report generation: R code is evaluated during the generation of the report.
Based on Markdown, an easy-to-read, easy-to-write plain-text markup language.
Supports output as HTML, PDF…
Open source, but mainly developed and maintained by Yihui Xie.
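A minimal illustration that knitr evaluates R code while producing the report: knitr::knit() accepts the Markdown source as a string through its text argument and returns the rendered Markdown, with the chunk's printed output (prefixed by ## by default) and the inline `r ...` result filled in. The document content is an illustrative example; turning the Markdown into HTML or PDF is a separate step (for example with rmarkdown).

```r
library(knitr)

# Markdown source with one R chunk and one inline R expression
rmd <- c(
  "# Blood pressure summary",
  "",
  "```{r}",
  "systolic <- c(119, 116, 122)",
  "mean(systolic)",
  "```",
  "",
  "Mean systolic pressure: `r mean(systolic)`."
)

# Evaluates the code and returns the resulting Markdown as text
md <- knit(text = rmd, quiet = TRUE)
cat(md)
```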

24 knitr

