Computational Methods to study Sequencing data -Meenakshi Sharma.

1 Computational Methods to study Sequencing data -Meenakshi Sharma

2 Outline
Bioinformatics
Genomics
Motivation
Challenges
Next-Generation-Sequencing Pipeline
– Sequencing
– Mapping
– Assembly
– Blast

3 Introduction
[Diagram labels: Biology; Computer Science (Data Mining, Statistics, Applied Mathematics); Applied Sciences (Applied Chemistry, Applied Physics); Bioinformatics]

4 Definition
Definition of bioinformatics by the Bioinformatics Definition Committee, National Institutes of Health (NIH), released on July 17, 2000:
“Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”

5 Genomics
Determine the complete DNA sequence of all genetic material contained in an organism
Analyze and compare the entire genomes of one or more species
Genome: set of all genes possessed by an organism

6 Genome

7 Motivation
Gene and genome organization
Study protein structure and functions
Study metabolic pathways
Study ecology and environment
Find potential pathogens

8 Challenges

10 Knowledge acquisition and knowledge management
Methods for Information and Knowledge Processing
– Information retrieval
– Statistical data analysis
– High-performance and large-scale computing
– Applications of new devices and emerging hardware technologies
– Visualization of data and knowledge
Legal issues, policy issues, history, ethics

11 Next-Generation-Sequencing Pipeline
– Sequencing (sample preparation). Output: reads
– Quality analysis (statistics). Output: quality plots
– Assembly. Output: contigs
– Mapping. Output: coverage
– Blast. Output: list of organisms matched

12 Sequencing
Healthy tissue and infected tissue, then library preparation, then Illumina sequencer
Output: reads from healthy sample and reads from infected sample
Example reads: ATGCGACTC ACCATGGCG ACTAGGGCA ATTATGTAG ATGGGTGAA TTCATGCGG ACTTCGCGT ATGATCCGA

13 Mapping
Reference sequences (both labeled NC_000018 on the slide):
– ATGATGATGATGATGCGACTCTACCGGCGTA
– ATGATGATGATGATACTTCGCGTTCTCGCGTA
Example: read ATGCGACTC aligns to ATGATGATGATGATGCGACTCTACCGGCGTA
Reads from healthy sample and reads from infected sample: ATGCGACTC ACCATGGCG ACTAGGGCA ATTATGTAG ATGGGTGAA TTCATGCGG ACTTCGCGT ATGATCCGA
[Figure: per-position coverage values along the reference for each sample]
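The mapping step can be sketched in a few lines of Python. This is only a minimal illustration that assumes exact matching: each read is searched for in the reference, and every reference position covered by a match has its coverage count incremented. The function name `coverage_from_exact_matches` is hypothetical; production read mappers use indexed search and tolerate mismatches, which this sketch does not attempt.

```python
def coverage_from_exact_matches(reference, reads):
    """Return per-position coverage of `reference` by exact matches of `reads`.

    Assumption: reads map only where they match the reference exactly.
    """
    coverage = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        while start != -1:
            # Every position spanned by this match is covered once more.
            for pos in range(start, start + len(read)):
                coverage[pos] += 1
            # Continue searching for further (possibly overlapping) matches.
            start = reference.find(read, start + 1)
    return coverage


# Reference and reads taken from the slide.
reference = "ATGATGATGATGATGCGACTCTACCGGCGTA"
reads = ["ATGCGACTC", "ACCATGGCG", "ACTAGGGCA", "ATTATGTAG",
         "ATGGGTGAA", "TTCATGCGG", "ACTTCGCGT", "ATGATCCGA"]
coverage = coverage_from_exact_matches(reference, reads)
```

With the slide's data, only the read ATGCGACTC matches this reference exactly, so the nine positions it spans receive coverage 1 and all others stay at 0.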

14 Comparing coverages in 2 samples
[Plots: coverage value for healthy tissue and coverage value for infected tissue]
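One simple way to compare coverage between the two samples is a log2 ratio of mean coverage. The sketch below is an assumption about how such a comparison could be done, not the method used in the presentation. The names `mean_coverage` and `log2_fold_change` and the pseudocount are hypothetical, and the numbers in the example are illustrative only. A real analysis would also normalize for the total number of reads per sample.

```python
import math


def mean_coverage(coverage):
    """Average coverage over all positions (0.0 for an empty list)."""
    return sum(coverage) / len(coverage) if coverage else 0.0


def log2_fold_change(healthy_cov, infected_cov, pseudocount=1.0):
    """log2 of (infected mean + pseudocount) / (healthy mean + pseudocount).

    The pseudocount avoids division by zero when a reference has no
    coverage in one sample. Positive values mean higher coverage in
    the infected sample.
    """
    healthy = mean_coverage(healthy_cov) + pseudocount
    infected = mean_coverage(infected_cov) + pseudocount
    return math.log2(infected / healthy)


# Illustrative coverage values for one reference in each sample.
healthy_cov = [1, 1, 1, 1]
infected_cov = [3, 3, 3, 3]
change = log2_fold_change(healthy_cov, infected_cov)
```

A reference whose coverage is much higher in the infected sample than in the healthy one is a candidate for closer inspection.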

15 Assembly
6-mers: ATGCGA TGCGAG TGCGAT TGCGAG ATGAAA TGAAAA GAAATA
Reads: ATGCGACTC ACCATGGCG ACTAGGGCA ATTATGTAG … ATGGGTTTA TTCATGTCG ACTTGTCAG ATGATCTAA …
Contigs: ATGCGAACCATG ACTAGATTATGTTTCGCGA ACTCCCTATCGA GATTATGTTTCGCGA ATGTTTCGCGAGGTGT … ATGGGTATTCATG TCTTTGTATGATCTA ATGGGTAATG GTGTGTATGATCTA …
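The idea of merging overlapping reads into contigs can be sketched with a greedy overlap merge. This is a toy stand-in chosen for brevity, not necessarily the approach the slide depicts (the 6-mers on the slide suggest a k-mer based method). The names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, and the input reads are made up for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Made-up reads that overlap end to end.
contigs = greedy_assemble(["ATGCGA", "GCGAGT", "GAGTTC"], min_len=3)
```

The sketch ignores sequencing errors, reverse complements, and repeats, all of which real assemblers have to handle.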

16 Blast
Contigs: ATGCGAACCATG ACTAGATTATGTTTCGCGA ACTCCCTATCGA GATTATGTTTCGCGA ATGTTTCGCGAGGTGT … ATGGGTATTCATG TCTTTGTATGATCTA ATGGGTAATG GTGTGTATGATCTA …
Contigs with matched genes and organisms:
– ATGCGAACCATG | papilloma virus
– ACTAGATTATGTTTCGCGA | E. coli
– ACTCCCTATCGA | human mitochondria
– GATTATGTTTCGCGA | human chr 12
– ATGTTTCGCGAGGTGT | polio virus
– …
– ATGGGTATTCATG | smallpox virus
– TCTTTGTATGATCTA | human chr 21
– ATGGGTAATG | growth factor gene
– GTGTGTATGATCTA | human mitochondria
– …
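Matching contigs against a database of known sequences can be illustrated with a toy k-mer lookup. This is not the BLAST algorithm itself, only a sketch of the seed-lookup idea: index every k-mer of each database sequence, then rank database entries by how many k-mers they share with the query. The names `build_index` and `best_hits`, the value of k, and the database entries `ref_a` and `ref_b` are all invented for illustration.

```python
from collections import defaultdict


def build_index(database, k):
    """Map each k-mer to the set of database entry names that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def best_hits(query, index, k):
    """Rank database entries by the number of query k-mers they contain."""
    counts = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            counts[name] += 1
    # Most shared k-mers first; ties broken alphabetically by name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented toy database; real searches run against curated sequence databases.
database = {"ref_a": "ATGCGAACCATGTT", "ref_b": "GGGTTTCCCAAA"}
index = build_index(database, k=6)
hits = best_hits("ATGCGAACCATG", index, k=6)
```

A real search would extend each seed match into a scored alignment and report its statistical significance; counting shared k-mers only gives a rough ranking.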

17 [Overview figure of the pipeline]
Stages: Sequencing; Assembly; Mapping; Blast; Coverage Analysis
Outputs shown: sequencing reads; assembled contigs; coverage values; differential coverage; matched genes and organisms
Reference shown for mapping: NC_989231 ATGTAATCTAGTAGATGAGATGATAGATCGCAT

18 References
1) Gibas, C. and Jambeck, P., Developing Bioinformatics Computer Skills, April 2001, O'Reilly & Associates, Inc. Web. 13 February 2012.
2) Kahn, Scott D., On the Future of Genomic Data, Science 331, 728 (2011); DOI: 10.1126/science.1197891
3) Wetterstrand KA., DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program. Available at: www.genome.gov/sequencingcosts. 13 February 2012.

19 Thank you!

