Applied Bioinformatics Dr. Jens Allmer Week 1 (Introduction)



Your Instructor Education –BSc: University of Münster 1996 –MSc: University of Münster 2002 –PhD: University of Münster 2006 Worked at –Izmir Institute of Technology (since 2008) –Izmir University of Economics, Turkey (Feb 2007 – Aug 2008) –University of Muenster, Germany (Jan 2006 – Feb 2007) –University of Pennsylvania, USA (Jan 2004 – Dec 2005) –University of Jena, Germany (Nov 2002 – Dec 2003)

Areas of Interest Bioinformatics –Sequences –Alignments Mass Spectrometry –De novo sequencing –Pattern matching Annotation –Integration –Automatic assessments General Automation and Productivity

Course Rules Attendance –Is essential and will be monitored strictly –if(absence > 12h) Then NA; Make-up Work –None

Course Rules Lecture starts on time –if late, enter QUIETLY –if more than 5 min late, DO NOT ENTER; wait for the break Breaks are 10 min max –if late after a break, enter QUIETLY –if more than 5 min late, DO NOT ENTER; wait for the next break Early leave –Announce before the course and leave if granted

Course Rules Project –Parts to be performed are published on the website and/or as slides –Deadline 6pm on the day before the next class (you may submit early of course) –No extension –No make-up –No extra work Must be electronically submitted to: –Must be named ????_first_last.eee or will not be accepted –Formats include: doc, ppt, odx, txt, html,... –Not allowed are formats that I cannot edit, like pdf, and similar formats that are not widespread –Must be significantly different from your classmates' work –Otherwise everyone involved will receive zero for that assignment

Grading All information available on class website Grading individualized –Quizzes 15% –Mind Maps 10% –Midterm 1 25% –Midterm 2 25% –Project 25%

Project –Group Formation 0% ( :00), Group Size: 4 –First Draft 25% ( :00) –Results 15% ( :00) –Second Draft 20% ( :00) –Presentation 10% ( :00) –Final Version 25% ( :00)

Grading I am responsible for evaluating you –I am not responsible for passing everyone or giving great grades Make it easy for me 1. Show up and participate 2. Do homework and pre-course preparations 3. Midterm and Final will be easy for you if you adhere to 1. and 2.

Course Structure –Start –10 min quiz –35 min lecture – 5 min mind mapping –10 min break –50 min practice –10 min break –40-50 min lecture –10 min break –30 min practice

Textbooks Primary audience Junior bio majors Course home page: ISBN: jens-allmer/tanim.asp?sid=GUFFOI44R7FJ9CIR6STU

Textbooks Everything you currently need to know about Applied Bioinformatics in regard to practical problems you will encounter during everyday research.

Mathematics Statistics Computer Science Informatics Biology Molecular biology Medicine Chemistry Physics Bioinformatics

Bioinformatics is Multidisciplinary Computer Science Math Statistics Structural Biology Phylogenetics Drug Design Genomics Molecular Life Sciences

The Pyramid of Life (2000) 30,000 Genes 3,000 Enzymes 1400 Chemicals Metabolomics Proteomics Genomics B I O I N F O R M A T I C S

The Pyramid of Life 100,000 Proteins 30,000 Genes 1400 Chemicals Protein Interactions?

Bioinformatics (or Computational Biology) Not just the study of DNA or protein sequence data Inclusive definition – concerns the storage, display, reduction, management, analysis, extraction, simulation, modeling, fitting or prediction of biological, medical or pharmaceutical data

Basis of molecular life sciences Hierarchy of relationships (some exceptions): Genome → Genes (Gene 1, Gene 2, Gene 3, ... Gene X) → Proteins (Protein 1, Protein 2, Protein 3, ... Protein X) → Functions (Function 1, Function 2, Function 3, ... Function X)

How can one use bioinformatics to link diseases to genes? Positional cloning of genes 1. Find genetic markers associated with disease 2. Sequence DNA next to the markers 3. Compare DNA from afflicted individuals to DNA of normal individuals (database) 4. Find abnormalities 5. Predict gene function from sequence information Disease → Map → Gene → Function
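Steps 3 and 4 above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the two sequences are already aligned and of equal length; the function name `find_differences` and both toy sequences are made up for this example and are not part of the course material.

```python
def find_differences(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are already aligned and equally long.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Toy data (hypothetical): a 'normal' and an 'afflicted' sequence.
reference = "ACGGTAGTATGT"
sample = "ACGGTCGTATGT"
print(find_differences(reference, sample))
```

Real comparisons need a proper alignment first, because insertions and deletions shift all downstream positions.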

Bioinformatics in the old days Close to Molecular Biology: –(Statistical) analysis of protein and nucleotide structure –Protein folding problem –Protein-protein and protein-nucleotide interaction Many essential methods were created early on –Protein sequence analysis (pairwise and multiple alignment) –Protein structure prediction (secondary, tertiary structure) Evolution was studied and methods created –Phylogenetic reconstruction (clustering – e.g., Neighbor Joining (NJ) method) –Nowadays also part of data mining

But then the big bang….

The Human Genome - 26 June 2000 Dr. Craig Venter Celera Genomics -- Shotgun method Francis Collins (USA)/Sir John Sulston (UK) Human Genome Project

Human DNA There are at least 3 bn (3 × 10^9) nucleotides in the nucleus of almost all of the trillions (3.2 × 10^12) of cells of a human body (an exception is, for example, red blood cells, which have no nucleus and therefore no DNA) – a total of ~10^22 nucleotides! Many DNA regions code for proteins, and are called genes (1 gene codes for 1 protein as a base rule, but the reality is a lot more complicated) –Name examples Human DNA may contain ~27,000 expressed genes –Problems? Deoxyribonucleic acid (DNA) comprises 4 different types of nucleotides: adenine (A), thymine (T), cytosine (C) and guanine (G). These nucleotides are sometimes also called bases –Ambiguities?

Y-Chromosome 50% of the sequence consists of NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN Not very meaningful –Explanation.... Same as in the X chromosome –What about the N’s in chr 1?
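How much of a sequence consists of N placeholders is easy to check yourself. The following is a minimal sketch; the function name `n_fraction` and the toy sequence are assumptions for illustration, and a real chromosome would be read from a FASTA file rather than typed in.

```python
def n_fraction(sequence):
    """Fraction of positions that are the placeholder base N (case-insensitive)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


# Toy sequence (hypothetical), not real chromosome data.
toy = "ACGTNNNNNNACGTNNNN"
print(f"{n_fraction(toy):.1%} of the toy sequence is N")
```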

Human DNA (Cont.) All people are different, but the DNA of different people varies by only 0.2% or less. So, only up to 2 letters in 1000 are expected to be different. Evidence from current genomics studies (Single Nucleotide Polymorphisms or SNPs) implies that on average only 1 letter out of 1400 is different between individuals. Over the whole genome, this means that 2 to 3 million letters would differ between individuals.

Modern bioinformatics is closely associated with genomics The aim is to solve the genomics information problem Ultimately, this should lead to a biological understanding of how all parts fit (DNA, RNA, proteins, metabolites) and how they interact (gene regulation, gene expression, protein interaction, metabolic pathways, protein signaling, etc.)
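The numbers on this slide can be checked with a short calculation. This is a back-of-the-envelope sketch, assuming a genome size of 3 × 10^9 nucleotides as stated earlier in the lecture; the variable names are made up for this example.

```python
GENOME_SIZE = 3_000_000_000  # nucleotides, assumed from "at least 3 bn"

# Upper bound: 0.2% of all letters differ (2 letters in 1000).
upper_bound = GENOME_SIZE * 2 // 1000

# SNP-based estimate: on average 1 letter out of 1400 differs.
snp_estimate = GENOME_SIZE // 1400

print(f"upper bound:  {upper_bound:,} differing letters")
print(f"SNP estimate: {snp_estimate:,} differing letters")
```

The SNP-based estimate comes out at a little over 2 million letters, which agrees with the range quoted on the slide, while the 0.2% figure is a looser upper bound.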

TERTIARY STRUCTURE (fold) Genome Expressome Proteome Metabolome Functional Genomics From gene to function Interactome?

Unknown Function How much of the genome is defined?

What is bioinformatics? E.g. Process the spots on a microarray, determine which genes are differentially expressed, link spots to sequence via a database, analyze the sequence using predictive tools, link the genes to related genes to form a network Comp sci Bio Math Stats Machine learning Database systems Data mining Image processing Modeling Graph theory Statistical analysis Sequence Structure Interactions Regulation Genomes Evolution Physics English Bioinformatics Chem

What is a bioinformatician? Somebody who knows everything

What is a bioinformatician? A facilitator –Typically has background in biology or CS, but is comfortable with concepts from other disciplines –Bring together ideas (or researchers) from different domains to solve a biological problem Conceptualize the problem –Use language appropriate to the domain Identify potential solutions –Understanding of different fields helps to identify possible approaches at a broad level Guide the development process –Create in-house or find potential collaborators to work on approaches in-depth Integrate results into overall solution –Software/method, results of biological analysis

How is Bioinformatics Used? Experimental proof is still the “Gold Standard”. Bioinformatics isn’t going to replace lab work anytime soon Bioinformatics is used to help “focus” the scientist on the bench top experiments

Bioinformatics Is the application of computational tools in biology bioinformatics? Not really! In this course, however, we will only rarely go into algorithmic details (like today ;)

Mind Mapping Have you ever studied a subject or brainstormed an idea, only to find yourself with pages of information, but no clear view of how the pieces fit together? Mind mapping –Helps you learn more effectively –Improves memorization –Enhances creativity –Speeds up analyses –Gives structure to complex ideas –Records information for future use Source:

An Example Mind Map for MicroRNAs

How to Mind Map 1. Identify the central topic and write it in the center 2. Write major parts of the topic on lines in all directions 3. Repeat 2. with ever finer level of detail until satisfied Source:

Note Taking with Mind Maps Capture ideas organized into topics –What if the central topic which I chose is not the central topic? –Make a new mind map which captures the topic correctly Use Cases –Note taking in class –Recapitulation after lecture –Analysis of a new topic –Structuring of any intended writing When –During acquisition of new knowledge (faster than writing) –For review 5m, 1h, 6h, 1d, 7d, 1m after note taking

Mind Mapping Tips 1. Use single words or very short phrases 2. Write clearly and legibly 3. Use color! 4. Separate ideas (color, lines, shading) 5. Draw symbols and images 6. Draw links among elements

A More Elaborate Mind Map Source:

At the Heart of Bioinformatics
Genomic:
>scaffold_1152
GGTGCGGCCGTCCTCCAGCTGCTTGCCGGCGAAGATCAGGCGCTGCTGGT
CCGGGGGGATGCCTGCATCCGGTGAGGAAACGCTCGTGTCAGACAAAGTG
GGTGGGCGCAGGAAGCAGCAATCAACACAGCCCAGTGCAGCTGCAAAGCG
CCCGCCTTACCACTGACCCGCCTGGCCACCCACCCCTACCCCCCGTAAGG
AAAGAGCCCCGACTCACCCTCCTTGTCCTGAATCTTGGCCTTCACGTTCT
CAATGGTGTCCGAAGACTCCACCTCGAGCGTGATGGTCTTGCCCGTCAGG
GTCTTGACGAAGATCTGCATGCCACCGCGCAGGCGCAGCACCAGGTGCAG
…
Translated:
>RF1_scaffold_1152
GAAVLQLLAGEDQALLVRGDACIR$GNARVRQSGWAQEAAINTAQCSC
KAPALPLTRLATHPYPP$GKSPDSPSLS$ILARDVAHDFAKSSPR$YA
PLIPQNLRC$SIEMKQPASLLSPIGEGACASHLQCLEKCLLP$GAIVY
MIS$GSGRR$TSWVGIGGCNDGTEKRSEVDSRRGGKGNIHD
>RF2_scaffold_1152
VRPSSSCLPAKIRRCWSGGMPASGEETLVS AATAAKPQTWSPTAWEF
KVGGRRKQQSTQPSAAAKRPPYH$PAWPPTPTPRKERAPTHPPCPESW
SRSQWCPKTPPRA$WSCPSGS$RRSACHRAGAAPGAGSTPSGCCSQPG
CGRPPAACRRRSGAAGPGGCLCVGGGGEGACASHLQCLEGE
…
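The step from the genomic sequence to the translated reading frames can be sketched as follows. This is a minimal illustration, assuming the standard genetic code and using `$` for stop codons, as the translated frames on the slide appear to do; the names `CODON_TABLE` and `translate` are made up for this example, and only the forward strand is handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# '$' marks a stop codon (an assumption matching the slide's notation).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY$$CC$W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate the forward strand starting at offset `frame` (0, 1 or 2).

    Codons containing characters other than A, C, G, T (e.g. N) become 'X'.
    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


start_of_scaffold = "GGTGCGGCCGTCCTCCAGCTG"  # first 21 bases of scaffold_1152
for frame in range(3):
    print(f"frame {frame}: {translate(start_of_scaffold, frame)}")
```

Frames 0 and 1 of this sketch reproduce the beginnings of RF1 and RF2 on the slide. A full six-frame translation would additionally translate the reverse complement.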

Your Task You may only compare 1 character at a time You may create helpful structures You should find the location of the Pattern in the Sequence with a minimal number of comparisons Try it for yourself Sequence: ACGGTAGTATGTGATGTATGATCGCGAAAGAGG Pattern: TGATGT

Brute Force Approach ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 1

Brute Force Approach ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 2

Brute Force Approach ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 3

Brute Force Approach ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 4

Brute Force Approach ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 6

Brute Force Approach ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 7-16

Brute Force Approach ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 17-22
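The brute force walk-through above can be reproduced with a short program. This is a minimal sketch of naive pattern matching that counts single-character comparisons; the function name `brute_force_search` is made up for this example. The exact count depends on how comparisons are tallied, so it need not match the numbers on the slides.

```python
def brute_force_search(sequence, pattern):
    """Naive search: try every start position, compare left to right.

    Returns (position, comparisons), where position is the 0-based start of
    the first match or -1 if there is none, and comparisons is the number of
    single-character comparisons performed.
    """
    comparisons = 0
    n, m = len(sequence), len(pattern)
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if sequence[start + j] != pattern[j]:
                break  # mismatch: shift the pattern by one position
            j += 1
        if j == m:  # all m characters matched
            return start, comparisons
    return -1, comparisons


SEQUENCE = "ACGGTAGTATGTGATGTATGATCGCGAAAGAGG"
PATTERN = "TGATGT"
position, comparisons = brute_force_search(SEQUENCE, PATTERN)
print(f"match at index {position} after {comparisons} comparisons")
```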

Boyer-Moore Algorithm ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 1 Preprocessing Good suffix matrix (m+1) Bad character matrix (m+1)

Boyer-Moore Algorithm ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 2

Boyer-Moore Algorithm ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 3-7

Boyer-Moore Algorithm ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 8

Boyer-Moore Algorithm ACGGTAGTATGTGATGTATGATCGCGAAAGAGG TGATGT Comparisons: 9-15
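The idea of comparing from the right end of the pattern and skipping ahead can be sketched with a simplified variant. Note that this is the Boyer-Moore-Horspool simplification, which uses only a bad character table and omits the good suffix table mentioned on the slide, so it is not the full Boyer-Moore algorithm. The function name `horspool_search` is made up for this example, and its comparison count need not match the slides.

```python
def horspool_search(sequence, pattern):
    """Boyer-Moore-Horspool search (bad character rule only).

    Returns (position, comparisons): the 0-based start of the first match
    (or -1) and the number of single-character comparisons performed.
    """
    n, m = len(sequence), len(pattern)
    if m == 0:
        return 0, 0
    # Preprocessing: for each character in pattern[:-1], the distance from
    # its rightmost occurrence to the end of the pattern.
    shift = {char: m - 1 - i for i, char in enumerate(pattern[:-1])}
    comparisons = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0:  # compare right to left
            comparisons += 1
            if sequence[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Shift according to the text character under the pattern's last
        # position; characters absent from the table allow a full shift.
        pos += shift.get(sequence[pos + m - 1], m)
    return -1, comparisons


SEQUENCE = "ACGGTAGTATGTGATGTATGATCGCGAAAGAGG"
PATTERN = "TGATGT"
position, comparisons = horspool_search(SEQUENCE, PATTERN)
print(f"match at index {position} after {comparisons} comparisons")
```

Even this reduced variant needs fewer comparisons on the example than the brute force approach, because a mismatch often allows the pattern to jump several positions at once.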

Questions

Define Algorithm

Website Slides Homework Additional materials and challenges Grades

Website To see your grades you need to login Some material may need login as well Currently –UserID = StudentID –Password = StudentID Change now –UserID = working address –Password = whatever you will remember

Login to mbg305.allmer.de We will now assist you to log in and to add your address and change your password.

Assignments –Research about Mind Maps E.g.: IYTE library –Make sure to read the lecture notes for next week (Available online on Wednesday) –Read Chapters 1 and 2 from our textbook