Published by Clare Parrish. Modified over 9 years ago.
1
Practically Genomic
A hands-on bioinformatics IAP
Course Materials: http://rous.mit.edu/index.php/IAP_2012
Instructors: Paola Favaretto, Sebastian Hoersch, Charlie Whittaker and Courtney Crummett
KI for Integrative Cancer Research at MIT and MIT Libraries
Students - Wide range of experience levels
  Unix account access information will be provided
Evaluations - Please send comments to charliew@mit.edu
2
Turning Biologists into Bioinformaticists - A practical approach
The teaching material should:
  be modular and practical
  have obvious contextual relevance
  serve as readily accessible and easily used reference material
The students should:
  become aware of the contents of a basic bioinformatics toolkit
  learn how to find instructions covering tools and methods
  experiment with the different methods covered in classes
  gain familiarity and comfort with command-line computing
The target audience is KI biologists.
3
Turning Biologists into Bioinformaticists - A practical approach – the specifics
1. Theory - Core Bioinformatics Concepts
   Important principles required to use bioinformatics
2. Tools - A Basic Bioinformatics Toolkit
   The software of bioinformatics
3. Tasks - Bioinformatics Methods
   Data analysis with bioinformatics
Under Development! http://rous.mit.edu/index.php/Teaching
4
IAP 2012 Agenda (subject to change)
1-23-12
  Introduction
  Getting more from Excel
  Unix Introduction
1-25-12
  Next Generation Sequence Analysis with Unix and Galaxy
1-27-12
  Visualization and Analysis of Genomics Data
rous.mit.edu
5
Theory – Genomic Data
All kinds of genomics data are described using at least 4 pieces of information:
1) The name of a DNA sequence
2) A position on that sequence
3) A feature that exists at that position
4) The genome assembly version

Sequence     Position  Feature
Chromosome1  1314      Mutation

The sequence (item 1) is a long block of sequence arranged by a process called genome assembly. The assembly version is critical because the first 3 pieces of information are only meaningful for one specific assembly version. A new version of the genome will probably not have this mutation at position 1314; it would be located elsewhere.
File formats: BED, GFF, GTF
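As a rough illustration of the four pieces of information, the sketch below stores the slide's example mutation together with an assembly label and writes it out as a BED-style line. The class name `GenomicFeature` and the assembly label "assembly_v1" are hypothetical; the slide does not name an assembly. BED uses 0-based, half-open intervals, so the 1-based position 1314 becomes start 1313 and end 1314. A BED data line does not itself record the assembly, which is one reason to keep the version alongside the coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicFeature:
    """One genomic observation: the four pieces of information from the slide."""
    assembly: str   # genome assembly version ("assembly_v1" below is a placeholder)
    sequence: str   # name of the DNA sequence, e.g. a chromosome
    position: int   # 1-based position on that sequence
    feature: str    # what exists at that position

    def to_bed(self) -> str:
        """Return a tab-separated BED-style line (0-based start, exclusive end)."""
        start = self.position - 1
        end = self.position
        return f"{self.sequence}\t{start}\t{end}\t{self.feature}"


# The example row from the slide, tagged with a hypothetical assembly label.
mutation = GenomicFeature("assembly_v1", "Chromosome1", 1314, "Mutation")
print(mutation.assembly, mutation.to_bed(), sep="\t")
```

Two records with the same sequence name and position but different `assembly` values should be treated as different locations unless the coordinates have been converted between assemblies.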
6
Theory – Microarray Data
1. Target features created on a surface
2. Labeled material hybridized
3. Image analysis

ProbeID    Sample1  Sample2  Sample3  Sample4
1007_s_at  10.93    11.44    11.19    11.64
1053_at    8.28     7.54     8.06     7.32
117_at     3.31     3.41     3.13     3.13
121_at     4.42     4.32     4.46     4.63
1255_g_at  1.8      1.7      1.75     1.81

Used for:
  Gene expression analysis
  Polymorphism detection
  Copy number analysis
  DNA binding studies
Data is gathered about the features present on the array.
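A table like the one on this slide is typically exchanged as tab-delimited text, one probe per row and one sample per column. The sketch below, assuming that layout, reads the slide's values into a dictionary keyed by probe ID. The function name `read_expression` is hypothetical, and the slide does not say what units or scale the values are in.

```python
import csv
import io

# The slide's table as tab-delimited text (layout assumed; values from the slide).
TABLE = (
    "ProbeID\tSample1\tSample2\tSample3\tSample4\n"
    "1007_s_at\t10.93\t11.44\t11.19\t11.64\n"
    "1053_at\t8.28\t7.54\t8.06\t7.32\n"
    "117_at\t3.31\t3.41\t3.13\t3.13\n"
    "121_at\t4.42\t4.32\t4.46\t4.63\n"
    "1255_g_at\t1.8\t1.7\t1.75\t1.81\n"
)


def read_expression(handle):
    """Return (sample names, {probe ID: list of values}) from a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]          # first column is the probe ID
    values = {}
    for row in reader:
        if not row:               # skip blank lines
            continue
        values[row[0]] = [float(x) for x in row[1:]]
    return samples, values


samples, values = read_expression(io.StringIO(TABLE))
for probe, row in values.items():
    # Highest and lowest value per probe across the four samples.
    print(probe, max(row), min(row))
```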
7
Theory – Next Generation Sequencing (NGS)
1. Generate DNA fragments
2. Attach to surface and amplify in situ
3. Subject surface to cycles of imaging/chemistry
4. Image analysis to call base sequences and qualities

Used for:
  Gene expression analysis
  Polymorphism/mutation detection
  Copy number analysis
  Mixture quantification
  DNA or RNA binding studies
  others…
200+ million clusters per experiment
Data is gathered about everything in the input mixture.
8
Theory – NGS Alignment Files
SAM Format

Query          Flag  Reference  Position  MapQual  CIGAR  Sequence       Base Quality
2:75:1538:897  16    chr1       8291      0        60M    AGGCCAGGCCCTC  HHHHHGGH@HGHHHHH
4:31:101:1130  16    chr1       8328      1        60M    CACCTACTTGCCA  ################

Each line has a lot of information (not all columns are shown).
One experiment = millions of lines = many Gb of data.
The scale of the data causes problems with Excel and similar tools.
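Because a SAM file can run to millions of lines, it is usually read one line at a time rather than opened whole. The sketch below is a minimal streaming reader for the eight columns named on the slide, taken from their positions in the standard 11-column SAM layout. The names `Alignment` and `read_sam` are hypothetical, and the demonstration record is a placeholder: it reuses the first slide row's name, flag, reference, position and CIGAR, but the 60-base sequence and quality string are made up, since the slide shows them truncated. Flag bit 0x10 marks a read aligned to the reverse strand, and SAM base qualities are Phred scores encoded as ASCII code minus 33.

```python
import io
import re
from typing import Iterator, List, NamedTuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


class Alignment(NamedTuple):
    query: str
    flag: int
    reference: str
    position: int   # 1-based leftmost aligned position
    mapq: int
    cigar: str
    sequence: str
    quality: str

    @property
    def is_reverse(self) -> bool:
        """True when flag bit 0x10 (reverse strand) is set."""
        return bool(self.flag & 0x10)

    @property
    def end(self) -> int:
        """1-based last reference position covered by the alignment."""
        span = sum(int(n) for n, op in CIGAR_OP.findall(self.cigar)
                   if op in REF_CONSUMING)
        return self.position + span - 1

    def phred(self) -> List[int]:
        """Decode the quality string into Phred scores (ASCII code minus 33)."""
        return [ord(c) - 33 for c in self.quality]


def read_sam(handle) -> Iterator[Alignment]:
    """Yield alignments one at a time, skipping '@' header lines."""
    for line in handle:
        if line.startswith("@") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        yield Alignment(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                        f[5], f[9], f[10])


# Placeholder record: columns 7-9 use the SAM "unavailable" values (*, 0, 0);
# the sequence and quality strings are invented to match the 60M CIGAR.
record = "\t".join(["2:75:1538:897", "16", "chr1", "8291", "0", "60M",
                    "*", "0", "0", "ACGT" * 15, "H" * 60])
sam_text = "@HD\tVN:1.0\n" + record + "\n"

for aln in read_sam(io.StringIO(sam_text)):
    print(aln.query, aln.reference, aln.position, aln.end, aln.is_reverse)
```

Passing an open file object instead of the `io.StringIO` wrapper keeps memory use flat regardless of file size, because only one line is held at a time.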