Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018 Anthony Gitter gitter@biostat.wisc.edu www.biostat.wisc.edu/bmi776/

Slides:



Advertisements
Similar presentations
CS6501: Text Mining Course Policy
Advertisements

BCH364C/391L Systems Biology/Bioinformatics (course # 54995/55095) Spring 2015 Tues/Thurs 11 – 12:30 PM BUR 212 Edward Marcotte/Univ. of Texas/BCH391L/Spring.
CSE 5522: Survey of Artificial Intelligence II: Advanced Techniques Instructor: Alan Ritter TA: Fan Yang.
Bioinformatics Dr. Aladdin HamwiehKhalid Al-shamaa Abdulqader Jighly Lecture 1 Introduction Aleppo University Faculty of technical engineering.
OBJECT ORIENTED PROGRAMMING I LECTURE 1 GEORGE KOUTSOGIANNAKIS
EECS 395/495 Algorithmic Techniques for Bioinformatics General Introduction 9/27/2012 Ming-Yang Kao 19/27/2012.
Introduction to Programming Environments for Secondary Education CS 1140 Dr. Ben Schafer Department of Computer Science.
BIO337 Systems Biology/Bioinformatics (course # 50524) Spring 2014 Tues/Thurs 11 – 12:30 PM BUR 212 Edward Marcotte/Univ. of Texas/BIO337/Spring 2014.
COP4020/CGS5426 Programming languages Syllabus. Instructor Xin Yuan Office: 168 LOV Office hours: T, H 10:00am – 11:30am Class website:
CS6501 Information Retrieval Course Policy Hongning Wang
© 2004 Goodrich, Tamassia CS2210 Data Structures and Algorithms Lecture 1: Course Overview Instructor: Olga Veksler.
CSCI 347 – Data Mining Lecture 01 – Course Overview.
Cpt S 471/571: Computational Genomics Spring 2015, 3 cr. Where: Sloan 9 When: M WF 11:10-12:00 Instructor weekly office hour for Spring 2015: Tuesdays.
CSE 501N Fall ‘09 00: Introduction 27 August 2009 Nick Leidenfrost.
CS6501 Information Retrieval Course Policy Hongning Wang
Object Oriented Programming (OOP) Design Lecture 1 : Course Overview Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang.
Introduction to Bioinformatics Biostatistics & Medical Informatics 576 Computer Sciences 576 Fall 2015 Colin Dewey
Object Oriented Programming (OOP) Design Lecture 1 : Course Overview Bong-Soo Sohn Associate Professor School of Computer Science and Engineering Chung-Ang.
CS397-CXZ Algorithms in Bioinformatics ChengXiang (“Cheng”) Zhai, Robert Skeel (Department of Computer Science) Nick Sahinidis (Department of Chemical.
Intelligent systems in bioinformatics Introduction to the course.
Programming In Perl CSCI-2230 Thursday, 2pm-3:50pm Paul Lalli - Instructor.
Introduction to Bioinformatics Biostatistics & Medical Informatics 576 Computer Sciences 576 Fall 2008 Colin Dewey Dept. of Biostatistics & Medical Informatics.
Overview of Bioinformatics 1 Module Denis Manley..
1 YORK UNIVERSITY Department of Biology Faculty of Science and Engineering Course outline Immunobiology (SC/BIOL ) W2011 Prerequisites: SC/BIOL2020.
AdvancedBioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2002 Mark Craven Dept. of Biostatistics & Medical Informatics.
CS Welcome to CS 5383, Topics in Software Assurance, Toward Zero-defect Programming Spring 2007.
EB3233 Bioinformatics Introduction to Bioinformatics.
Data Structures and Algorithms in Java AlaaEddin 2012.
BCH339N Systems Biology/Bioinformatics (course # 54040) Spring 2016 Tues/Thurs 11 – 12:30 PM BUR 212.
BNFO 615 Fall 2016 Usman Roshan NJIT. Outline Machine learning for bioinformatics – Basic machine learning algorithms – Applications to bioinformatics.
CSCI 1730: C++ and System Programming
Welcome to CS 4390/CS5381: Introduction to Formal Methods
Computer Engineering Department Islamic University of Gaza
CS6501 Advanced Topics in Information Retrieval Course Policy
CSE 102/ISE 102 Introduction to Web Design and Programming
CS101 Computer Programming I
CMSC 471 Principles of Artificial Intelligence Course Overview
Computer Networks CNT5106C
Spring 2017 Tues/Thurs 11 – 12:30 PM GDC 4.302
It’s called “wifi”! Source: Somewhere on the Internet!
CMSC 471 Introduction to Artificial Intelligence section 1 Course Overview Spring 2017.
Syllabus Overview CSE 4309 – Machine Learning Vassilis Athitsos
CS598CXZ (CS510) Advanced Topics in Information Retrieval (Fall 2016)
Computer Science 102 Data Structures CSCI-UA
Learning Sequence Motif Models Using Expectation Maximization (EM)
Inferring Models of cis-Regulatory Modules using Information Theory
CS410: Text Information Systems (Spring 2018)
Cpt S 471/571: Computational Genomics
Computer Networks CNT5106C
CS7280: Special Topics in Data Mining Information/Social Networks
Eukaryotic Gene Finding
CS510 (Fall 2018) Advanced Topics in Information Retrieval
Introduction to Programming Using C++
Cpt S 471/571: Computational Genomics
EE422C Software Design and Implementation II
PHYS 202 Intro Physics II Catalog description: A continuation of PHYS 201 covering the topics of electricity and magnetism, light, and modern physics.
Accelerated Introduction to Computer Science
Topics in Applied Microbiology
CS4501: Information Retrieval Course Policy
Lecture 1a- Introduction
Dept. of Computer Science University of Liverpool
Analysis of Algorithms
CS 3950 Introduction to Computer Science Research
Topics in Applied Microbiology
BCH339N Systems Biology/Bioinformatics (course # 54510)
Lecture 1a- Introduction
New Student Orientation
Course Introduction Data Visualization & Exploration – COMPSCI 590
CS201 – Course Expectations
Course overview Lecture : Juan Carlos Niebles and Ranjay Krishna
Presentation transcript:

Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018 Anthony Gitter gitter@biostat.wisc.edu www.biostat.wisc.edu/bmi776/ These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter

Agenda Today Introductions Course information Overview of topics

Course Web Site www.biostat.wisc.edu/bmi776/ Syllabus and policies Readings Tentative schedule Lecture slides (draft posted before lecture) Announcements Homework Project information Link to Piazza discussion board

Your Instructor: Anthony Gitter Email: gitter@biostat.wisc.edu Office: room 3268, Discovery Building Assistant professor, Biostatistics & Medical Informatics Affiliate faculty, Computer Sciences Investigator, Morgridge Institute for Research Research interests: biological networks, time series analysis, applied machine learning

Your TA: Fangzhou Mu Email: fmu2@wisc.edu Office: WID 3365C Graduate student, Pharmacy

Office Hours Instructor: Tuesday and Thursday, 2:30-3:30 PM Immediately after class TA: Monday, 1:00-2:00 PM

Finding Our Offices: Discovery Building Engineering Hall Computer Sciences Union South 3rd floor has restricted access See Piazza before 1/28 if you need building access Stop at visitor desk to call my office if card does not work

Finding Our Offices: Discovery Building my office TA office

You So that we can all get to know each other better, please tell us your name major or graduate program research interests and/or topics you’re especially interested in learning about favorite programming language

Course Requirements 4 or 5 homework assignments: ~40% Written exercises Programming (Python) Computational experiments (e.g. measure the effect of varying parameter x in algorithm y) Five late days permitted Project: ~25% Midterm: ~15% Final exam: ~15% Class participation: ~5%

Exams Midterm: March 13, in class Final: Wednesday May 9, 12:25-2:25 PM Let me know immediately if you have a conflict with either of these exam times

Computing Resources for the Class Linux servers in Dept. of Biostatistics & Medical Informatics No “lab”, must log in remotely (use WiscVPN) Will create accounts for everyone on course roster Two machines mi1.biostat.wisc.edu mi2.biostat.wisc.edu HW0 tests your access to these machines Homework must be able to run on these machines CS department usually offers Unix orientation sessions at beginning of semester

Programming Assignments All programming assignments require Python Project can be in any language Have a Python 3 environment on biostat servers Permitted packages on course website Can request others HW0 will be Python introduction Use Piazza for Python discussion If you know Python, please help answer questions

Project Design and implement a new computational method for a task in molecular biology Improve an existing method Perform an evaluation of several existing methods Run on real biological data Suggestions will be provided Not directly related to your existing research Can email me now to discuss ideas

Participation Do the assigned readings before class Show up to class No one will have the perfect background Ask questions about computational or biological concepts Correct me when I am wrong Seriously, it will happen Piazza discussion board Questions and answers

Piazza Discussion Board Instead of a mailing list http://piazza.com/wisc/spring2018/bmics776/home Post your questions to Piazza instead of emailing the instructor or TA Unless it is a private issue or project-related Answer your classmates’ questions Announcements will also be posted to Piazza Supplementary material for lecture topics

Course Readings Mostly articles from the primary literature Must be using a campus IP address to download some of the articles (can use WiscVPN from off campus) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Cambridge University Press, 1998.

Prerequisites BMI/CS 576 or equivalent Knowledge of basic biology and methods from that course will be assumed May want to go over the material on the 576 website to refresh http://www.biostat.wisc.edu/bmi576/

What you should get out of this course An understanding of some of the major problems in computational molecular biology Familiarity with the algorithms and statistical techniques for addressing these problems How to think about different data types At the end you should be able to Read the bioinformatics literature Apply the methods you have learned to other problems both within and outside of bioinformatics Write a short bioinformatics research paper

Major Topics to be Covered (the algorithms perspective) Expectation Maximization Gibbs sampling Mutual information Multiple hypothesis testing correction Convolutional neural networks Linear programming Interpolated Markov models Duration modeling and semi-Markov models Tries and suffix trees Markov random fields Stochastic context free grammars

Major Topics to be Covered (the task perspective) Modeling of motifs and cis-regulatory modules Identification of transcription factor binding sites Genotype analysis and association studies Regulatory information in epigenomic data Transcriptome quantification Mass spectrometry peptide and protein identification Pathways in cellular networks Gene finding Large-scale sequence alignment RNA sequence and structure modeling

Motif Modeling What sequence motif do these promoter regions have in common?

cis-Regulatory Modules What configuration of sequence motifs do these promoter regions have in common? RNAP …accgcgctgaaaaaattttccgatgagtttagaagagtcaccaaaaaattttcatacagcctactggtgttctctgtgtgtgctaccactggctgtcatcatggttgta… …caaaattattcaagaaaaaaagaaatgttacaatgaatgcaaaagatgggcgatgagataaaagcgagagataaaaatttttgagcttaaatgatctggcatgagcagt… …gagctggaaaaaaaaaaaatttcaaaagaaaacgcgatgagcatactaatgctaaaaatttttgaggtataaagtaacgaattggggaaaggccatcaatatgaagtcg… RNAP RNAP

Genome-wide Association Studies Which genes are involved in diabetes?

Noncoding Genetic Variants How do genetic variants outside protein coding regions impact phenotypes? Patton and Harrington, Nature, 2013

Transcriptome Analysis with RNA-Seq What genes are expressed and at what levels?

Proteomic Analysis with Mass Spectrometry What proteins are expressed and at what levels? Steen and Mann, Nature Reviews Molecular Cell Biology, 2004

Identifying Signaling Pathways How do proteins coordinate to transmit information? Yeger-Lotem et al., Nature Genetics, 2009

Gene Finding Where are the genes and functional elements in a genome?

Large Scale Sequence Alignment What is the best alignment of these 6 genomes?

RNA Sequence and Structure Modeling How can we identify sequences that encode this RNA structure?

Other Topics Many topics we aren’t covering Protein structure prediction Protein function annotation Modeling long reads Metagenomics Metabolomics Sequence compression Graph genomes Single-cell sequencing Pseudo- and quasi-alignment Text mining Others?

Reading Groups Computational Systems Biology Reading Group http://lists.discovery.wisc.edu/mailman/listinfo/compsysbiojc AI Reading Group http://lists.cs.wisc.edu/mailman/listinfo/airg ComBEE Python Study Group https://combee-uw-madison.github.io/studyGroup/ Many relevant seminars on campus