Advanced Algorithms and Models for Computational Biology -- a machine learning approach Introduction to cell biology, genomics, development, and probability.

Slides:



Advertisements
Similar presentations
Probabilistic models Haixu Tang School of Informatics.
Advertisements

CS433: Modeling and Simulation
Recombinant DNA Technology
Machine Learning /15-781, Spring 2008 Tutorial on Basic Probability Eric Xing Lecture 2, September 15, 2006 Reading: Chap. 1&2, CB & Chap 5,6, TM.
Review of Basic Probability and Statistics
Probability Theory Part 2: Random Variables. Random Variables  The Notion of a Random Variable The outcome is not always a number Assign a numerical.
Probability Densities
G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Statistical Data Analysis: Lecture 2 1Probability, Bayes’ theorem 2Random variables and.
Chapter 6 Continuous Random Variables and Probability Distributions
Statistical Background
1 Engineering Computation Part 5. 2 Some Concepts Previous to Probability RANDOM EXPERIMENT A random experiment or trial can be thought of as any activity.
A random variable that has the following pmf is said to be a binomial random variable with parameters n, p The Binomial random variable.
CHAPTER 6 Statistical Analysis of Experimental Data
Probability Distributions Random Variables: Finite and Continuous A review MAT174, Spring 2004.
Probability Distributions Random Variables: Finite and Continuous Distribution Functions Expected value April 3 – 10, 2003.
3-1 Introduction Experiment Random Random experiment.
Continuous Random Variables and Probability Distributions
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
QMS 6351 Statistics and Research Methods Probability and Probability distributions Chapter 4, page 161 Chapter 5 (5.1) Chapter 6 (6.2) Prof. Vera Adamchik.
Lecture 6: Descriptive Statistics: Probability, Distribution, Univariate Data.
Lecture II-2: Probability Review
Continuous Probability Distributions A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
Chapter 21 Random Variables Discrete: Bernoulli, Binomial, Geometric, Poisson Continuous: Uniform, Exponential, Gamma, Normal Expectation & Variance, Joint.
Joint Distribution of two or More Random Variables
Review of Probability.
Recitation 1 Probability Review
Prof. SankarReview of Random Process1 Probability Sample Space (S) –Collection of all possible outcomes of a random experiment Sample Point –Each outcome.
Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli.
QA in Finance/ Ch 3 Probability in Finance Probability.
Advanced Algorithms and Models for Computational Biology Class Overview Eric Xing & Ziv Bar-Joseph Lecture 1, January 18, 2005 Reading: Chap. 1, DTM book.
Genetics of Cancer.
Probability theory 2 Tron Anders Moger September 13th 2006.
Dept of Bioenvironmental Systems Engineering National Taiwan University Lab for Remote Sensing Hydrology and Spatial Modeling STATISTICS Random Variables.
Random Variables & Probability Distributions Outcomes of experiments are, in part, random E.g. Let X 7 be the gender of the 7 th randomly selected student.
Modeling and Simulation CS 313
PROBABILITY & STATISTICAL INFERENCE LECTURE 3 MSc in Computing (Data Analytics)
Estimating parameters in a statistical model Likelihood and Maximum likelihood estimation Bayesian point estimates Maximum a posteriori point.
Theory of Probability Statistics for Business and Economics.
Random Variables Numerical Quantities whose values are determine by the outcome of a random experiment.
CPSC 531: Probability Review1 CPSC 531:Probability & Statistics: Review II Instructor: Anirban Mahanti Office: ICT 745
Normal distribution and intro to continuous probability density functions...
Review of Probability Concepts ECON 6002 Econometrics Memorial University of Newfoundland Adapted from Vera Tabakova’s notes.
ENGR 610 Applied Statistics Fall Week 3 Marshall University CITE Jack Smith.
IT College Introduction to Computer Statistical Packages Eng. Heba Hamad 2009.
STA347 - week 51 More on Distribution Function The distribution of a random variable X can be determined directly from its cumulative distribution function.
STA347 - week 31 Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5’s in the 6 rolls. Let X = number of.
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.
Lecture V Probability theory. Lecture questions Classical definition of probability Frequency probability Discrete variable and probability distribution.
Random Variables Presentation 6.. Random Variables A random variable assigns a number (or symbol) to each outcome of a random circumstance. A random variable.
Chapter 2 Statistical Background. 2.3 Random Variables and Probability Distributions A variable X is said to be a random variable (rv) if for every real.
Business Statistics (BUSA 3101). Dr.Lari H. Arjomand Continus Probability.
Stats Probability Theory Summary. The sample Space, S The sample space, S, for a random phenomena is the set of all possible outcomes.
Exam 2: Rules Section 2.1 Bring a cheat sheet. One page 2 sides. Bring a calculator. Bring your book to use the tables in the back.
Review of Probability. Important Topics 1 Random Variables and Probability Distributions 2 Expected Values, Mean, and Variance 3 Two Random Variables.
CONTINUOUS RANDOM VARIABLES
Random Variables. Numerical Outcomes Consider associating a numerical value with each sample point in a sample space. (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
Random Variables A random variable is a rule that assigns exactly one value to each point in a sample space for an experiment. A random variable can be.
Review of Probability Concepts Prepared by Vera Tabakova, East Carolina University.
Continuous Random Variables and Probability Distributions
Statistics Sampling Distributions and Point Estimation of Parameters Contents, figures, and exercises come from the textbook: Applied Statistics and Probability.
Chapter 2: Probability. Section 2.1: Basic Ideas Definition: An experiment is a process that results in an outcome that cannot be predicted in advance.
Lecture 8: Measurement Errors 1. Objectives List some sources of measurement errors. Classify measurement errors into systematic and random errors. Study.
Statistical NLP: Lecture 4 Mathematical Foundations I: Probability Theory (Ch2)
Random Variables By: 1.
Theoretical distributions: the Normal distribution.
MECH 373 Instrumentation and Measurements
Genomes and Their Evolution
Statistical NLP: Lecture 4
Random Variables A random variable is a rule that assigns exactly one value to each point in a sample space for an experiment. A random variable can be.
Continuous Random Variables: Basics
Presentation transcript:

Advanced Algorithms and Models for Computational Biology -- a machine learning approach Introduction to cell biology, genomics, development, and probability Eric Xing Lecture 2, January 23, 2006 Reading: Chap. 1, DTM book

Introduction to cell biology, functional genomics, development, etc.

Model Organisms

Bacterial Phage: T4

Bacteria: E. Coli

The Budding Yeast: Saccharomyces cerevisiae

The Fission Yeast: Schizosaccharomyces pombe

The Nematode: Caenorhabditis elegans

The Fruit Fly: Drosophila Melanogaster

The Mouse transgenic for human growth hormone

Prokaryotic and Eukaryotic Cells

A Close Look of a Eukaryotic Cell The structure: The information flow:

Cell Cycle

Signal Transduction A variety of plasma membrane receptor proteins bind extracellular signaling molecules and transmit signals across the membrane to the cell interior

Signal Transduction Pathway

Functional Genomics and X-omics

A Multi-resolution View of the Chromosome

DNA Content of Representative Types of Cells

Functional Genomics The various genome projects have yielded the complete DNA sequences of many organisms. E.g. human, mouse, yeast, fruitfly, etc. Human: 3 billion base-pairs, thousand genes. Challenge: go from sequence to function, i.e., define the role of each gene and understand how the genome functions as a whole.

motif Regulatory Machinery of Gene Expression

Free DNA probe * * Protein-DNA complex Advantage: sensitiveDisadvantage: requires stable complex; little “structural” information about which protein is binding Classical Analysis of Transcription Regulation Interactions “Gel shift”: electorphoretic mobility shift assay (“EMSA”) for DNA-binding proteins

Modern Analysis of Transcription Regulation Interactions Genome-wide Location Analysis (ChIP-chip) Advantage: High throughputDisadvantage: Inaccurate

Gene Regulatory Network

Gene Expression networks Regulatory networks Protein-protein Interaction networks Metabolic networks Biological Networks and Systems Biology Systems Biology: understanding cellular event under a system- level context Genome + proteome + lipome + …

Gene Regulatory Functions in Development

Temporal-spatial Gene Regulation and Regulatory Artifacts A normal fly Hopeful monster? 

Microarray or Whole-body ISH?

Gene Regulation and Carcinogenesis PCNA (not cycle specific) G 0 or G 1 M G2 G2 S G1 G1 E A B + PCNA Gadd45 DNA repair Rb E2F Rb P Cycli n Cdk Phosphorylation of + - Apoptosis Fas TNF TGF- ... p53 Promotes oncogenetic stimuli (ie. Ras) extracellular stimuli (TGF-  ) Inhibits activates p16 p15 p53 p14 transcriptional activation p21 activates cell damage time required for DNA repairsevere DNA damage      Cancer !  

NormalBCH CIS DYS SCC The Pathogenesis of Cancer

Genetic Engineering: Manipulating the Genome Restriction Enzymes, naturally occurring in bacteria, that cut DNA at very specific places.

Recombinant DNA

Transformation

Formation of Cell Colony

How was Dolly cloned? Dolly is claimed to be an exact genetic replica of another sheep. Is it exactly "exact"?

Definitions Recombinant DNA: Two or more segments of DNA that have been combined by humans into a sequence that does not exist in nature. Cloning: Making an exact genetic copy. A clone is one of the exact genetic copies. Cloning vector: Self-replicating agents that serve as vehicles to transfer and replicate genetic material.

Software and Databases NCBI/NLM Databases Genbank, PubMed, PDB DNA Protein Protein 3D Literature

Introduction to Probability

Basic Probability Theory Concepts A sample space S is the set of all possible outcomes of a conceptual or physical, repeatable experiment. ( S can be finite or infinite.) E.g., S may be the set of all possible nucleotides of a DNA site: A random variable is a function that associates a unique numerical value (a token) with every outcome of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated) E.g., seeing an "A" at a site  X =1, o/w X =0. This describes the true or false outcome a random event. Can we describe richer outcomes in the same way? (i.e., X=1, 2, 3, 4, for being A, C, G, T) --- think about what would happen if we take expectation of X. Unit-Base Random vector X i =[X iA, X iT, X iG, X iC ] T, X i =[0,0,1,0] T  seeing a "G" at site i  S X()X()

Basic Prob. Theory Concepts, ctd (In the discrete case), a probability distribution P on S (and hence on the domain of X ) is an assignment of a non-negative real number P( s ) to each s  S (or each valid value of x ) such that  s  S P( s )=1. (0  P( s )  1) intuitively, P( s ) corresponds to the frequency (or the likelihood) of getting s in the experiments, if repeated many times call  s = P( s ) the parameters in a discrete probability distribution A probability distribution on a sample space is sometimes called a probability model, in particular if several different distributions are under consideration write models as M 1, M 2, probabilities as P( X | M 1 ), P( X | M 2 ) e.g., M 1 may be the appropriate prob. dist. if X is from "splice site", M 2 is for the "background". M is usually a two-tuple of {dist. family, dist. parameters}

Bernoulli distribution: Ber( p ) Multinomial distribution: Mult(1,  ) Multinomial (indicator) variable: Multinomial distribution: Mult( n,  ) Count variable: Discrete Distributions

Basic Prob. Theory Concepts, ctd A continuous random variable X can assume any value in an interval on the real line or in a region in a high dimensional space X usually corresponds to a real-valued measurements of some property, e.g., length, position, … It is not possible to talk about the probability of the random variable assuming a particular value --- P( x ) = 0 Instead, we talk about the probability of the random variable assuming a value within a given interval, or half interval The probability of the random variable assuming a value within some given interval from x 1 to x 2 is defined to be the area under the graph of the probability density function between x 1 and x 2. Probability mass: note that Cumulative distribution function (CDF): Probability density function (PDF):

Uniform Probability Density Function Normal Probability Density Function The distribution is symmetric, and is often illustrated as a bell-shaped curve. Two parameters,  (mean) and  (standard deviation), determine the location and shape of the distribution. The highest point on the normal curve is at the mean, which is also the median and mode. The mean can be any numerical value: negative, zero, or positive. Exponential Probability Distribution Continuous Distributions

Expectation: the center of mass, mean value, first moment): Sample mean: Variance: the spreadness: Sample variance Statistical Characterizations

Basic Prob. Theory Concepts, ctd Joint probability: For events E (i.e. X=x) and H (say, Y=y), the probability of both events are true: P(E and H) := P( x, y ) Conditional probability The probability of E is true given outcome of H P(E and H) := P( x | y ) Marginal probability The probability of E is true regardless of the outcome of H P(E) := P( x )=  x P( x,y ) Putting everything together: P( x | y ) = P( x,y )/P( y )

Independence and Conditional Independence Recall that for events E (i.e. X=x) and H (say, Y=y), the conditional probability of E given H, written as P(E|H), is P(E and H)/P(H) (= the probability of both E and H are true, given H is true) E and H are (statistically) independent if P(E) = P(E|H) (i.e., prob. E is true doesn't depend on whether H is true); or equivalently P(E and H)=P(E)P(H). E and F are conditionally independent given H if P(E|H,F) = P(E|H) or equivalently P(E,F|H) = P(E|H)P(F|H)

X1X1 X2X2 X3X3 X4X4 X5X5 X6X6 p(X 6 | X 2, X 5 ) p(X1)p(X1) p(X 5 | X 4 ) p(X 4 | X 1 ) p(X 2 | X 1 ) p(X 3 | X 2 ) P(X 1, X 2, X 3, X 4, X 5, X 6 ) = P(X 1 ) P(X 2 | X 1 ) P(X 3 | X 2 ) P(X 4 | X 1 ) P(X 5 | X 4 ) P(X 6 | X 2, X 5 ) Representing multivariate dist. Joint probability dist. on multiple variables: If X i 's are independent: (P( X i |·)= P( X i )) If X i 's are conditionally independent, the joint can be factored to simpler products, e.g., The Graphical Model representation

The Bayesian Theory The Bayesian Theory: (e.g., for date D and model M) P(M|D) = P(D|M)P(M)/P(D) the posterior equals to the likelihood times the prior, up to a constant. This allows us to capture uncertainty about the model in a principled way