COT6930 High Performance Computing and Bioinformatics Course overview, Introduction Instructors: Xingquan (Hill) Zhu and Imad Mahgoub http://www.cse.fau.edu/~xqzhu/

COT6930 High Performance Computing and Bioinformatics Course overview, Introduction Instructors: Xingquan (Hill) Zhu and Imad Mahgoub http://www.cse.fau.edu/~xqzhu/ http://www.cse.fau.edu/~imad/ 11/26/2018

About COT 6930
- Meeting time: T R (Tuesday/Thursday) 11:00AM-12:20PM
- Room: GS109
- Course home page: http://www.cse.fau.edu/~xqzhu/courses/cot6930.html
- Course lectures, homework, and solutions will be posted online. Make sure you check the course website regularly.

About COT 6930
- Instructors: Prof. Imad Mahgoub and Prof. Xingquan Zhu
- Emails: {imad,xqzhu}@cse.fau.edu
- Offices: S&E 406 (Dr. Mahgoub), S&E 366 (Dr. Zhu)
- Office hours:
  - Dr. Zhu: T & R 1:00 – 4:00 pm or by appointment
  - Dr. Mahgoub: T & R 12:25 – 3:25 pm or by appointment

Introduce Yourself
- Your name
- Your major and class
- Your background
- Your research interests
- Why you study HPC & Bioinformatics
- Your expectations from this course
- Other

Expected Background
- Data structures and programming
- Statistics: good if you've had at least one course, but not required; we will cover the necessary statistics background
- Molecular biology: no knowledge assumed, but you should be interested in learning some basic molecular biology concepts

Course Objective
The course deals with high performance computing in Bioinformatics research:
- Bioinformatics basic concepts
- Cell biology and molecular biology review
- From DNA to RNA to proteins; protein structure and function prediction
- Biological networks and DNA microarrays
- Bioinformatics databases and tools
- Bioinformatics algorithms: pairwise sequence alignment, multiple sequence alignment
- Bioinformatics classification algorithms
- Parallel architectures and parallel programming paradigms
- Bioinformatics program analysis & parallelization
- Parallel computation in biological classification and sequence analysis
- FLOPS: floating-point operations per second
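To make the FLOPS unit concrete, here is a minimal sketch of the usual back-of-the-envelope estimate of a machine's theoretical peak: nodes × cores per node × clock rate × floating-point operations each core can complete per cycle. The function names and the machine parameters are hypothetical, not the specification of any system used in this course; real programs typically sustain only a fraction of peak, which the `efficiency` helper expresses.

```python
def peak_flops(nodes: int, cores_per_node: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core completes its maximum FLOPs on every cycle."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


def efficiency(measured_flops: float, peak: float) -> float:
    """Fraction of the theoretical peak that a program actually sustains."""
    return measured_flops / peak


# Hypothetical cluster: 4 nodes x 8 cores, 2.5 GHz, 16 FLOPs per cycle per core.
peak = peak_flops(4, 8, 2.5e9, 16)
print(f"peak = {peak / 1e12:.2f} TFLOPS")  # peak = 1.28 TFLOPS
# Hypothetical measurement: a program sustaining 0.32 TFLOPS on that cluster.
print(f"efficiency = {efficiency(3.2e11, peak):.0%}")  # efficiency = 25%
```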

Textbook
No required textbook.
Supplementary recommended reading:
- Bioinformatics and Functional Genomics, by Jonathan Pevsner (Wiley, 2003), http://www.bioinfbook.org. This site has 1100 URLs, organized by chapter.
- An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel A. Pevzner (MIT Press, 2004)
- Developing Bioinformatics Computer Skills, by C. Gibas and P. Jambeck (O'Reilly, 2001)
- Parallel Computing for Bioinformatics and Computational Biology, by Albert Zomaya (Wiley, 2006)
Some reading assignments may be in the form of papers.

Grading Policy
- Homework: 20%
- Course projects: 30%
- Presentation: 15%
- Participation: 10%
- Final: 25%

Grading Policy
Cutoffs for grades (roughly):
- A: 85 – 100
- B: 70 – 84

Lecture 1: Introduction to Bioinformatics

Bioinformatics - Definition
NIH definition: "Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data."
- The definition varies and is somewhat vague
- Typically taken to include algorithms, databases, and data analysis
- Mathematical modeling or simulation of biological systems is typically excluded
A simpler definition: Computers + Biology = Bioinformatics (the title of an O'Reilly book)

The need for bioinformatics
- The number of entries in biological databases is increasing exponentially.
- Bioinformatics is needed to understand and use this information.
GenBank growth:
- As of August 2004, GenBank contained 41.8 billion nucleotide bases from 37.3 million sequences.
- In 2004 alone, more than 7.9 million new sequences were added to GenBank.
[Figure: GenBank growth by year, 1982-2004]
Source: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=540017

What is Bioinformatics
- Representation, storage, retrieval, and analysis of biological data concerning sequences, structures, and functions
- Sometimes used synonymously with computational biology or computational molecular biology
- Highly interdisciplinary nature: biology, mathematics, statistics, computer science, biochemistry, physics, chemistry, medicine, …

Bioinformatics Requires Interdisciplinary Research
From the "Bioinformatics: Building Bridges" Symposium (April 13, 2006), http://www.binf.umn.edu/bisymp06/

Promises of Bioinformatics
SmartMoney ranked bioinformatics #1 among the next hot jobs (June 2002):
"The fusion of biology and computer science is the hottest of the hot in science right now, and it's going to heat up even more. Bioinformaticians, …, use computer modeling to predict which drugs will work on which illnesses, shaving the time and cost of getting new products to market."

Promises of Bioinformatics
Medicine:
- Knowledge of protein structure facilitates drug design
  - How do drugs work? They generally work by interacting with receptors on the surface of cells or with enzymes (which regulate the rate of chemical reactions) within cells, blocking, inhibiting, or stimulating protein functions.
- Understanding of genomic variation allows the tailoring of medical treatment to the individual's genetic make-up
- Genome analysis allows the targeting of genetic diseases
- The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
The same techniques can be applied to biotechnology, crop and livestock improvement, etc.

Challenges in bioinformatics
- Explosion of information
  - Need for faster, automated analysis to process large amounts of data
  - Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
  - Need for "smarter" software to identify interesting relationships in very large data sets
- Lack of "bioinformaticians"
  - Software needs to be easier to access, use, and understand
  - Biologists need to learn about the software, its limitations, and how to interpret its results

Bioinformatics Topics
- Sequence alignments: find similarity between DNA / protein (amino acid) sequences
- Genome assembly: combine genomic fragments to form the whole genome
- Gene identification & annotation: identify and classify genes on the genome (the functions of 90% of human genes are unknown, although their sequence information is available)
- Microarrays & gene expression analysis: use DNA microarrays (gene chips) to measure mRNA levels
- Protein folding: compute 3D protein structure ↔ protein sequence
- Phylogenetic analysis: find genetic relationships between sequences / species
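Genome assembly combines overlapping fragments into a longer sequence. The sketch below uses a simple greedy strategy (repeatedly merge the two fragments with the longest suffix/prefix overlap), assuming error-free reads from a single strand and no repeats; production assemblers use overlap graphs or de Bruijn graphs instead. The reads and the `min_len` threshold are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the largest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep input order
    # A read contained inside another read adds no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            length = overlap(a, b, min_len)
            if length > best_len:
                best_len, best_pair = length, (a, b)
        if best_pair is None:  # nothing overlaps any more: return several contigs
            break
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])  # append only the non-overlapping tail of b
    return contigs


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTAG"]))  # ['ATGGCGTACGTTAG']
```

Greedy merging is fast and easy to parallelize per pair, but it can collapse repeated regions, which is one reason graph-based assemblers dominate in practice.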

Topics Covered in this class
- Introduction to Bioinformatics & Molecular Biology: a very brief introduction to molecular biology (DNA, RNA, proteins; gene expression from DNA to protein; the central dogma of molecular biology & bioinformatics)
- Bioinformatics databases and tools: BLAST, GenBank, Protein Data Bank, etc.
- Sequence alignment
- Multiple sequence alignment
- Protein structure
- Protein structure prediction
- Gene expression & data analysis
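The central dogma (DNA → RNA → protein) maps directly onto two string operations. A minimal sketch, assuming the input is the DNA coding strand written 5'→3' (so the mRNA is the same sequence with T replaced by U) and the standard genetic code; translation starts at the first AUG and stops at the first stop codon. The example fragment is hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in U, C, A, G order ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand (5'->3') to mRNA: the same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # step through complete codons only
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("GCCATGGCTTGGTAAGC")  # hypothetical coding-strand fragment
print(mrna, translate(mrna))  # GCCAUGGCUUGGUAAGC MAW
```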

Molecular biology databases
- Genomic sequence database
- Gene expression database
- Protein sequence database
- Protein structure database
- Protein family database

Sequence Alignment
- Pairwise sequence alignment is the most fundamental operation of bioinformatics
- Compare two (pairwise) or more (multiple) sequences
  - DNA: 4-letter alphabet; protein: 20-letter alphabet
- Useful for discovering functional, structural, and evolutionary information in biological sequences
- Assumptions: similar sequences may have the same function, or two similar sequences from different organisms may have a common ancestor sequence (i.e., they are homologous)
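Pairwise alignment is usually computed by dynamic programming. Below is a minimal sketch of global alignment (the Needleman-Wunsch algorithm) with a linear gap penalty; the scores (match +1, mismatch -1, gap -2) are illustrative assumptions, not values prescribed by the course, and real protein alignments use a substitution matrix such as BLOSUM62 and affine gap penalties. Time and memory are O(nm) for sequences of lengths n and m.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Each anti-diagonal of the `score` table depends only on the previous ones, which is what parallel alignment implementations typically exploit.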

Sequence alignment: DNA sequences can be aligned to reveal similarities between genes from different sources

768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
    ||     ||   ||  |  | ||| |  |||| |||||    |||  |||
87  TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
    |  | |              |  |||||| |   ||||  | || |   |
136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
    ||| | ||| || || |||   |   |||||||||  ||   |||||| |
173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216

| = match, blank = mismatch, . = gap
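The match bars in an alignment display are easy to regenerate, and counting them gives the percent identity that tools such as BLAST report. A minimal sketch, assuming both rows are already aligned to equal length and that "-" or "." marks a gap; it is applied here to the first block of the slide's alignment (positions 768-813 and 87-135).

```python
GAP_CHARS = {"-", "."}


def match_line(top: str, bottom: str) -> str:
    """'|' where both rows have the same residue, ' ' for a mismatch or a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "|" if x == y and x not in GAP_CHARS else " " for x, y in zip(top, bottom)
    )


def identity(top: str, bottom: str) -> tuple[int, int, int]:
    """(identical columns, gap columns, alignment length) of a pairwise alignment."""
    matches = match_line(top, bottom).count("|")
    gaps = sum(1 for x, y in zip(top, bottom) if x in GAP_CHARS or y in GAP_CHARS)
    return matches, gaps, len(top)


top = "TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG"
bottom = "TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG"
matches, gaps, length = identity(top, bottom)
print(match_line(top, bottom))
# Identities = 27/50 (54%), Gaps = 5/50
print(f"Identities = {matches}/{length} ({matches / length:.0%}), Gaps = {gaps}/{length}")
```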

Database similarity searching: the BLAST program allows rapid comparison of a new gene sequence with the hundreds of thousands of gene sequences in databases.

Sequences producing significant alignments:                      (bits)  Value
gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae]     112  7e-26
gi|603258 (U18795) Prb1p: vacuolar protease B [Saccharomyces ce...  106  5e-24
gnl|PID|e264388 (X59720) YCR045c, len:491 [Saccharomyces cerevi...   69  7e-13
gnl|PID|e239708 (Z71514) ORF YNL238w [Saccharomyces cerevisiae]      30  0.66
gnl|PID|e239572 (Z71603) ORF YNL327w [Saccharomyces cerevisiae]      29  1.1
gnl|PID|e239737 (Z71554) ORF YNL278w [Saccharomyces cerevisiae]      29  1.5

gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae]
Length = 478
Score = 112 bits (278), Expect = 7e-26
Identities = 85/259 (32%), Positives = 117/259 (44%), Gaps = 32/259 (12%)

Query: 2   QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV 50
           +  PWG+ RV        G           G GV   VLDTGI T H D   R    + +
Sbjct: 174 EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI 233

Query: 51  PGEPSTQDGNGHGTHVAGTIAALNNSIGVLGVAPSAELYXXXXXXXXXXXXXXXXXQGLE 110
           P      D NGHGTH AG I + +      GVA + ++                  +G+E
Sbjct: 234 PANDEASDLNGHGTHCAGIIGSKH-----FGVAKNTKIVAVKVLRSNGEGTVSDVIKGIE 288
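BLAST is fast because it does not run full dynamic programming against every database entry: it first finds short exact or high-scoring word matches ("seeds") and extends only the promising ones. The sketch below shows only that seeding step, assuming exact word matches of length `w` (as in nucleotide searches; protein BLAST also accepts high-scoring neighbourhood words) and a hypothetical two-sequence database. Seed extension and E-value statistics are omitted.

```python
from collections import Counter, defaultdict


def build_index(database: dict[str, str], w: int) -> dict[str, list[tuple[str, int]]]:
    """Map every length-w word to the (sequence name, offset) pairs where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - w + 1):
            index[seq[pos:pos + w]].append((name, pos))
    return index


def find_seeds(query: str, index: dict[str, list[tuple[str, int]]], w: int) -> list[tuple[str, int, int]]:
    """Exact word hits as (sequence name, query offset, database offset)."""
    hits = []
    for qpos in range(len(query) - w + 1):
        for name, spos in index.get(query[qpos:qpos + w], []):
            hits.append((name, qpos, spos))
    return hits


def hits_per_diagonal(hits: list[tuple[str, int, int]]) -> Counter:
    """Seeds on the same diagonal (database offset - query offset) suggest one ungapped alignment."""
    return Counter((name, spos - qpos) for name, qpos, spos in hits)


db = {"seq1": "ACGTACGTGA", "seq2": "TTTTGGGGCC"}  # hypothetical database
seeds = find_seeds("CGTACG", build_index(db, 4), 4)
print(seeds)  # [('seq1', 0, 1), ('seq1', 1, 2), ('seq1', 2, 3)]
print(hits_per_diagonal(seeds))  # Counter({('seq1', 1): 3})
```

The index is built once per database and reused for every query, so each search costs roughly one dictionary lookup per query word.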

Multiple sequence alignment: Sequences of proteins from different organisms can be aligned to see similarities and differences
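One common way to score a multiple sequence alignment is the sum-of-pairs (SP) score: every pair of rows is scored in every column and the results are added. A minimal sketch with toy scores (match +1, mismatch -1, residue-gap -2, gap-gap 0) on hypothetical DNA rows; these values are assumptions for illustration, and protein alignments would normally use a substitution matrix such as BLOSUM62.

```python
from itertools import combinations


def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Toy score for one pair of characters in a column ('-' is a gap)."""
    if x == "-" and y == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(msa: list[str]) -> int:
    """Sum-of-pairs score: add pair_score over every pair of rows in every column."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_score(x, y)
        for column in zip(*msa)
        for x, y in combinations(column, 2)
    )


print(sum_of_pairs(["ACG-T", "AC-GT", "ACGGT"]))  # 3
```

For k sequences each column costs k(k-1)/2 pair comparisons, and exact SP-optimal alignment grows exponentially with k, which is why practical MSA tools rely on heuristics such as progressive alignment.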

Protein structure
- Proteins perform various functions in cells.
- The 3-D structure of a protein determines its function.
- One of the major goals of bioinformatics is to understand the relationship between amino acid sequence and 3-D structure in proteins.
- In theory, the structure of a protein could be reliably predicted from the amino acid sequence.

Protein Structure/Function
>1NLG:_ NADP-LINKED GLYCERALDEHYDE-3-PHOSPHATE
EKKIRVAINGFGRIGRNFLRCWHGRQNTLLDVVAINDSGGVKQASHLLKYDSTLGTFAAD
VKIVDDSHISVDGKQIKIVSSRDPLQLPWKEMNIDLVIEGTGVFIDKVGAGKHIQAGASK
VLITAPAKDKDIPTFVVGVNEGDYKHEYPIISNASCTTNCLAPFVKVLEQKFGIVKGTMT
TTHSYTGDQRLLDASHRDLRRARAAALNIVPTTTGAAKAVSLVLPSLKGKLNGIALRVPT
PTVSVVDLVVQVEKKTFAEEVNAAFREAANGPMKGVLHVEDAPLVSIDFKCTDQSTSIDA
SLTMVMGDDMVKVVAWYDNEWGYSQRVVDLAEVTAKKWVA
Amino acid sequence → 3-D structure → protein function
Classification: Gene Transfer
EC Number: 1.2.1.13
Computational challenges:
- Determine structure from sequence
- Determine function from sequence/3-D structure

Gene expression and data analysis
Microarrays:
- High-throughput approaches based on the hybridization principle, developed recently
- Generate terabytes of information that overwhelm conventional methods of biological analysis; the analysis is different from sequence analysis
- Allow biologists to study genome-wide patterns of gene expression in any given cell type, at any given time, and under any given set of conditions
- Applications include cancer classification
- Microarray data clustering and classification
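Clustering groups genes whose expression profiles move together across conditions. A minimal sketch of k-means, assuming rows are genes, columns are conditions, and Euclidean distance; to keep the result deterministic it seeds the centroids by farthest-first traversal rather than at random. The expression values are hypothetical log-ratios, not data from any real microarray.

```python
import math


def farthest_first_centroids(profiles: list[list[float]], k: int) -> list[list[float]]:
    """Start from the first profile, then repeatedly add the profile farthest from all chosen centroids."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(profiles: list[list[float]], k: int, max_iter: int = 100):
    """Cluster expression profiles; returns (labels, centroids)."""
    centroids = farthest_first_centroids(profiles, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in profiles]
        if new_labels == labels:
            break  # converged: no gene changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean profile of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels, centroids


# Hypothetical log-ratio expression values: 4 genes x 3 conditions.
profiles = [[2.1, 2.3, 1.9], [2.0, 2.2, 2.1], [-1.8, -2.0, -2.2], [-2.1, -1.9, -2.0]]
labels, _ = kmeans(profiles, k=2)
print(labels)  # [0, 0, 1, 1]
```

k-means only reaches a local optimum and needs k in advance; hierarchical clustering is a common alternative for expression data.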

Gene expression and data analysis
Microarray analysis:
- Clustering
- Classification
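Classification, unlike clustering, learns from samples whose class is already known (for example tumor versus normal tissue) and then labels new samples. A minimal sketch of one of the simplest classifiers, nearest centroid: average the training profiles of each class, then assign a new sample to the closest class mean. The marker-gene values and labels are hypothetical.

```python
import math
from collections import defaultdict


def train_nearest_centroid(samples: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """One centroid (mean expression profile) per class label."""
    groups: dict[str, list[list[float]]] = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {y: [sum(v) / len(xs) for v in zip(*xs)] for y, xs in groups.items()}


def predict(centroids: dict[str, list[float]], sample: list[float]) -> str:
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(sample, centroids[y]))


# Hypothetical training data: expression of 2 marker genes in 4 labelled samples.
train_x = [[5.0, 1.0], [4.8, 1.2], [1.1, 4.9], [0.9, 5.2]]
train_y = ["tumor", "tumor", "normal", "normal"]
model = train_nearest_centroid(train_x, train_y)
print(predict(model, [4.5, 1.5]), predict(model, [1.2, 4.0]))  # tumor normal
```

Real microarray studies measure thousands of genes on few samples, so gene selection and cross-validation matter far more than in this two-gene toy.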

High Performance Computing
- Increase available computation power
  - Today's supercomputers will become our desktops in the next 10 years
- Exploit parallelism
  - Use multiple processors in parallel
  - The application must be parallelized
- Exploit locality
  - Processors are faster than memory and the network
  - Keep data in cache → avoid memory latency
  - Keep data on the processor → avoid network latency
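A database search is a natural first target for parallelism because each database entry can be scored independently. The sketch below shows static data decomposition: split the database into contiguous chunks, score each chunk in a separate worker process, then gather and rank the partial results. The toy score (number of shared 3-letter words) is a hypothetical stand-in for a real alignment score; `ProcessPoolExecutor` from Python's standard library is used because CPython threads cannot execute Python code on several cores at once.

```python
from concurrent.futures import ProcessPoolExecutor

K = 3  # word length used by the toy similarity score (an assumption)


def kmer_set(seq: str, k: int = K) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_chunk(task: tuple[str, list[tuple[str, str]]]) -> list[tuple[str, int]]:
    """Worker: score one chunk of the database against the query (shared k-mer count)."""
    query, chunk = task
    query_words = kmer_set(query)
    return [(name, len(query_words & kmer_set(seq))) for name, seq in chunk]


def split(items: list, n_chunks: int) -> list[list]:
    """Static block decomposition into n_chunks contiguous, nearly equal pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for c in range(n_chunks):
        end = start + size + (1 if c < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def search(query: str, database: dict[str, str], n_chunks: int = 4, mapper=map) -> list[tuple[str, int]]:
    """Scatter chunks to workers via `mapper`, gather partial results, rank by score."""
    tasks = [(query, chunk) for chunk in split(list(database.items()), n_chunks)]
    results = [hit for part in mapper(score_chunk, tasks) for hit in part]
    return sorted(results, key=lambda hit: (-hit[1], hit[0]))


def parallel_search(query: str, database: dict[str, str], workers: int = 4) -> list[tuple[str, int]]:
    """The same search, with each chunk scored in its own process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return search(query, database, n_chunks=workers, mapper=pool.map)


db = {"s1": "ACGTACGT", "s2": "TTTTTTTT", "s3": "GTACGA"}  # hypothetical database
print(search("ACGTAC", db, n_chunks=2))  # serial run: [('s1', 4), ('s3', 3), ('s2', 0)]
```

Call `parallel_search` from under an `if __name__ == "__main__":` guard, since on some platforms worker processes re-import the main module. Passing the built-in `map` as `mapper` runs exactly the same decomposition serially, which makes it easy to check that the parallel version returns the same ranking.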

HPC Topics
Architectures:
- Shared-memory multiprocessors
- Distributed-memory multiprocessors
- Clusters

HPC Topics
Parallel programming:
- Shared memory paradigm
- Distributed memory paradigm
- Languages
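The two paradigms differ in how workers combine results. The sketch below counts G and C bases both ways: in the shared-memory style every worker updates one shared total under a lock; in the distributed-memory (message-passing) style workers share nothing and send their partial counts to a root, which performs the reduction. Both are emulated with Python threads and a `queue.Queue` standing in for a network channel, so this illustrates the programming models only (no speedup is expected under CPython's GIL); real codes would typically use threads or OpenMP for the first model and MPI for the second. The sequence is hypothetical.

```python
import queue
import threading


def count_gc(part: str) -> int:
    return sum(1 for base in part if base in "GC")


def gc_shared_memory(seq: str, n_workers: int = 4) -> int:
    """Shared-memory model: all workers update one shared total, protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(part: str) -> None:
        nonlocal total
        local = count_gc(part)  # compute privately, then touch shared state once
        with lock:
            total += local

    parts = [seq[i::n_workers] for i in range(n_workers)]  # cyclic decomposition
    threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def gc_message_passing(seq: str, n_workers: int = 4) -> int:
    """Message-passing model: workers share nothing and send partial counts to a root."""
    inbox: queue.Queue = queue.Queue()  # stands in for a communication channel

    def worker(rank: int, part: str) -> None:
        inbox.put((rank, count_gc(part)))  # "send" (rank, partial result)

    parts = [seq[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(r, p)) for r, p in enumerate(parts)]
    for t in threads:
        t.start()
    total = sum(inbox.get()[1] for _ in range(n_workers))  # root "receives" and reduces
    for t in threads:
        t.join()
    return total


seq = "ATGCGCGATTACA"  # hypothetical sequence
print(gc_shared_memory(seq), gc_message_passing(seq))  # 6 6
```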

HPC Topics
- Compilers
  - Program analysis
  - Program transformations
  - Locality optimizations
  - Parallelism optimizations
- Run-time systems
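Loop tiling (blocking) is a classic locality optimization that compilers, or programmers, apply to nested loops: the iteration space is cut into small blocks so that the data each block touches stays in cache while it is reused. The sketch shows the transformation on matrix multiplication and checks that it computes the same result; the `tile` size is an arbitrary assumption, and in pure Python interpreter overhead hides any cache benefit, so the point here is the loop structure rather than speed.

```python
def matmul_naive(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """C = A x B with the i-k-j loop order."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a_ik = A[i][k]
            for j in range(p):
                C[i][j] += a_ik * B[k][j]
    return C


def matmul_tiled(A: list[list[float]], B: list[list[float]], tile: int = 2) -> list[list[float]]:
    """Same computation after loop tiling: iterate over tile x tile blocks of the matrices."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # The inner loops touch only one block of A, B and C at a time.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = A[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            C[i][j] += a_ik * B[k][j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(A, B))  # [[58.0, 64.0], [139.0, 154.0]]
```

Because each C[i][j] still receives its contributions in increasing k order, the tiled version performs exactly the same additions as the naive one, only in a different interleaving across entries.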

Course's Main Point
Gain a Bioinformatics knowledge set, as well as parallel programming experience for Bioinformatics research.
Timeline:
- Month 1: Bioinformatics knowledge set
- Month 2: Parallel programming
- Month 3: HPC and Bioinformatics
- Remaining time: Student presentations

Reading assignment
L. Hunter, "Molecular Biology for Computer Scientists," in Artificial Intelligence and Molecular Biology, L. Hunter, Ed., pp. 1-46, AAAI Press, 1993. (Online download: http://compbio.uchsc.edu/Hunter/01-Hunter.pdf)