Published by Clara Rosalyn Reed. Modified over 9 years ago.
1
Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data
Kai Ye
k.ye@lumc.nl
2
Data collection for osteoarthritis, cardiovascular disease and longevity
Serum parameters
Cellular characteristics (biobank)
Skin ageing
Glycosylation
Metabonomic
Transcriptomic
Genetic (GWAS/sequence)
Epigenetic
Data integration
3
Genetic & epigenetic analyses; biochem analyses; expression analysis; metabonomic analysis; glycosylation; cell responses
Joost Kok, Erik vd Akker, Kai Ye
Statistical analysis
4
About me
1995 – 2003: B.S. and M.S. in biology and pharmaceutical science
2004 – 2008: PhD cum laude at Leiden University. Thesis title: Novel algorithms for protein sequence analysis
2008 – 2009: Postdoc at the European Bioinformatics Institute, collaborating with scientists at the Sanger Institute
Currently: assistant professor at MolEpi
5
A Pindel approach for identifying indels in next-gen sequencing data
Paired-end reads in next-gen sequencing
Indel detection algorithms
Pindel
Cancer genome project
1000 genomes project
6
Paired-end reads in Next Generation sequencing ~ insert size
7
Mapping paired-end reads
SNP
CNVs: copy number variations; INDELs: insertions and deletions; SVs: structural variations
8
Gapped alignment for small indels
ATCCGTATCACGGTCA-CAGATCAGTCCAGT
ATCCGTATCACGGTCAGCAGATCAGTCCAGT
indel at the gapped position
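As a minimal sketch of how a gapped alignment exposes a small indel, the snippet below scans two already-aligned strings and reports each run of gap characters. The function name `find_gaps` and the tuple layout are illustrative assumptions, and the slide does not say which row is the read and which is the reference, so the rows are simply called "first" and "second".

```python
def find_gaps(aligned_a, aligned_b):
    """List (column, length, gapped_row) for each run of '-' in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    events = []
    i, n = 0, len(aligned_a)
    while i < n:
        # Decide which row, if any, carries a gap at this column.
        if aligned_a[i] == "-":
            row, seq = "first", aligned_a
        elif aligned_b[i] == "-":
            row, seq = "second", aligned_b
        else:
            i += 1
            continue
        # Extend over the whole run of gap characters in that row.
        j = i
        while j < n and seq[j] == "-":
            j += 1
        events.append((i, j - i, row))
        i = j
    return events


# The two rows from the slide: a single-base gap at 0-based column 16.
print(find_gaps("ATCCGTATCACGGTCA-CAGATCAGTCCAGT",
                "ATCCGTATCACGGTCAGCAGATCAGTCCAGT"))
# → [(16, 1, 'first')]
```

Whether the gap is called an insertion or a deletion depends on which row is the reference; the sketch leaves that decision to the caller.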
9
Read-depth for CNVs
10
Read-pair approach for SVs
Three cases, each comparing sample against reference: no indel, deletion, insertion
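The read-pair idea can be sketched as a comparison between the distance at which a pair maps on the reference and the expected insert size. The function `classify_pair`, its parameters, and the cutoff of three standard deviations are assumptions made for illustration; the slides do not specify a threshold.

```python
def classify_pair(mapped_span, mean_insert, sd_insert, n_sd=3.0):
    """Classify one read pair by how far apart its ends map on the reference.

    mapped_span: distance between the two mapped ends on the reference.
    mean_insert, sd_insert: insert-size statistics of the library.
    n_sd: assumed cutoff in standard deviations (hypothetical default).
    """
    if mapped_span > mean_insert + n_sd * sd_insert:
        # Ends map too far apart: the sample lacks sequence the reference has.
        return "deletion"
    if mapped_span < mean_insert - n_sd * sd_insert:
        # Ends map too close together: the sample carries extra sequence.
        return "insertion"
    return "no indel"


for span in (200, 500, 100):
    print(span, classify_pair(span, mean_insert=200, sd_insert=10))
```

A single discordant pair is weak evidence; in practice several pairs supporting the same event would be clustered before a call is made.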
11
Mapping paired-end reads: read-pairs, read-depth, SNP or small indel
13
Pindel: Deletions
test; ref
1 base - 1 million bases
14
18 May 2015
Pindel: Deletions
ref; Anchor
15
Pindel: Deletions
ref; Anchor
2 x average distance
16
Pindel: Deletions
ref; Anchor
2 x average distance
Expected maximum deletion size + read length (36)
17
Pindel: Deletions
reference; sample
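The deletion slides can be read as a split-read search: the mapped read serves as an anchor, one end of the unmapped mate is looked up within 2 x average distance of the anchor, and the other end within the expected maximum deletion size plus one read length beyond it. The sketch below is a toy version of that idea under stated assumptions: it uses plain exact substring search (`str.find`) rather than Pindel's own search procedure, assumes the mate lies downstream of the anchor on the forward strand, and does not check that matches are unique. The function name `find_deletion` and the parameter `min_part` are hypothetical.

```python
def find_deletion(reference, read, anchor_pos, avg_distance, max_deletion, min_part=4):
    """Return (deletion_start, deletion_length) or None for one unmapped read.

    The read is split into a left and a right part at every allowed position,
    longest left part first. Toy sketch only: exact matching, forward strand.
    """
    # Window for the first part: within 2 x average distance of the anchor.
    win_start = anchor_pos
    win_end = min(len(reference), anchor_pos + 2 * avg_distance)
    for k in range(len(read) - min_part, min_part - 1, -1):
        left, right = read[:k], read[k:]
        left_pos = reference.find(left, win_start, win_end)
        if left_pos == -1:
            continue
        left_end = left_pos + k
        # Window for the second part: maximum deletion size + read length.
        far_end = min(len(reference), left_end + max_deletion + len(read))
        # Start one base past left_end so the reported deletion is at least 1 bp.
        right_pos = reference.find(right, left_end + 1, far_end)
        if right_pos == -1:
            continue
        return left_end, right_pos - left_end
    return None


# Toy reference: bases 20..29 ("TTTTTCCCCC") are missing from the sample read.
reference = "CCGGAATTCA" + "GATTACAGGC" + "TTTTTCCCCC" + "AGTCCATGAA"
read = "ACAGGC" + "AGTCCA"  # spans the deletion breakpoint
print(find_deletion(reference, read, anchor_pos=0, avg_distance=15, max_deletion=20))
# → (20, 10)
```

Restricting both searches to windows tied to the anchor is what keeps the lookup local instead of genome-wide; a real implementation would also require each part to match uniquely within its window.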
18
African male: NA18507 (Bentley et al., Nature 2008)
135 Gb of sequence, ~4 billion paired 35-base reads
After preprocessing: 56,161,333 pairs of one-end mapped reads
Pindel:
– 142,908 1-16 bp insertions
– 162,068 1 bp-10 kb deletions
19
Deletion size distribution
20
Applications
Cancer genome project
1000 genomes project
21
Cancer genome: COLO-829 cells
Normal: ~30x, paired-end 100 bp reads
Tumor: ~40x, paired-end 100 bp reads
Search for somatic (tumor-specific) indels
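Searching for tumor-specific indels amounts to keeping the calls made in the tumor sample that are absent from the matched normal. A minimal sketch follows, assuming each call is a hashable tuple of (chromosome, position, type, length); that representation and the function name `somatic_indels` are illustrative and not taken from the slides.

```python
def somatic_indels(tumor_calls, normal_calls):
    """Keep tumor indel calls that do not appear among the normal calls."""
    normal = set(normal_calls)
    # Preserve the order of the tumor calls while filtering out shared ones.
    return [call for call in tumor_calls if call not in normal]


tumor = [("chr1", 1000, "DEL", 5), ("chr2", 2500, "INS", 2), ("chr3", 40, "DEL", 120)]
normal = [("chr1", 1000, "DEL", 5)]
print(somatic_indels(tumor, normal))
# → [('chr2', 2500, 'INS', 2), ('chr3', 40, 'DEL', 120)]
```

Exact tuple matching is the simplest possible comparison; breakpoints that shift by a base or two between samples would need a tolerance that this sketch does not model.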
23
1000 Genomes Project
Pilot 1: 180 people from 3 major geographic groups (YRI, CEU, CHB and JPT) at low coverage (~4x)
Pilot 2: the genomes of two families (CEU and YRI; both parents and an adult child) at deep coverage (20x per genome)
Pilot 3: sequencing the coding regions (exons) of 1,000 genes in 1,000 people at deep coverage (20x)
24
www.ebi.ac.uk/~kye/pindel k.ye@lumc.nl