Anand Sastry, Jonathan Monk, Hanna Tegel, Mathias Uhlén,

Machine Learning in Computational Biology to Accelerate High-Throughput Protein Expression
Anand Sastry, Jonathan Monk, Hanna Tegel, Mathias Uhlén, Bernhard O. Palsson, Johan Rockberg, and Elizabeth Brunk
Presentation by Alex Joens

Background
The Human Protein Atlas (HPA), started in 2003, is a project aimed at exploring the entire human proteome.
Proteome: the entire set of proteins in a specific system (a cell, a tissue, or an organism) in a certain environment.
Example: the proteome of a cell when exposed to a certain hormone. An entire organism’s proteome can also be analyzed.
Specifically, the HPA attempts to store information on the expression and spatial location of the products of all protein-coding genes.
As of 2017, its knowledge base encompassed 84% of the human proteome.
Adding to this base requires expressing and sampling a very large number of recombinant protein fragments, which can be difficult.

Background
The current process is to introduce fragments of human DNA into E. coli bacteria and attempt to have them express large amounts of heterologous protein fragments.
Heterologous expression: expression of a gene or gene fragment in a host organism that does not naturally have that gene or gene fragment.
Though not complete proteins, these fragments are still useful.
This is even harder than it sounds, because not all fragments show the high expression that is required for analysis.
Current approaches rely heavily on trial and error.
New idea: use machine learning techniques to predict a protein’s expression, based on the protein structure itself and other factors.

Machine Learning
Machine learning gives “computers the ability to learn without being explicitly programmed.” -- Arthur Samuel, 1959
These algorithms are designed to look for patterns in data and can use these patterns to make predictions.
Different algorithms perform differently on different problems, so picking an algorithm is not an exact science!
For this paper, four algorithms were selected:
Linear Regression
Random Forest Decision Trees
Support Vector Machines (SVMs)
Neural Networks
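Because no single algorithm wins on every problem, candidates are usually compared on held-out data. The sketch below shows that comparison with k-fold cross-validation. It is only an illustration: the data are synthetic, and the two models (a majority-class baseline and a depth-1 decision tree, or "stump") are stand-ins chosen to keep the example dependency-free, not the models used in the paper. All function names are hypothetical.

```python
import random

def majority_fit(X, y):
    """Baseline: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda row: label

def stump_fit(X, y):
    """Depth-1 decision tree: pick the single feature/threshold split
    with the highest training accuracy."""
    best = None  # (accuracy, feature index, threshold, label predicted above threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                preds = [above if row[j] > t else 1 - above for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, above)
    _, j, t, above = best
    return lambda row: above if row[j] > t else 1 - above

def cross_val_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy over k folds (fold i holds out samples i, i+k, i+2k, ...)."""
    scores = []
    for i in range(k):
        test_idx = set(range(i, len(X), k))
        X_train = [x for n, x in enumerate(X) if n not in test_idx]
        y_train = [v for n, v in enumerate(y) if n not in test_idx]
        model = fit(X_train, y_train)
        hits = sum(model(X[n]) == y[n] for n in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k

# Synthetic toy data (NOT from the paper): the label is 1 when feature 0 exceeds 0.5.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [1 if row[0] > 0.5 else 0 for row in X]

results = {name: cross_val_accuracy(fit, X, y)
           for name, fit in [("majority", majority_fit), ("stump", stump_fit)]}
print(results)
```

The same harness accepts any function that takes training data and returns a predictor, so other model families can be compared by adding them to the list.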

Converting to a Machine Learning Problem
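The graphic for this slide did not survive in the transcript. As a rough illustration of what such a conversion involves, the sketch below turns a protein fragment sequence into a numeric feature vector paired with a binary label (expressed or not). The three features (length, aromatic fraction, and mean Kyte-Doolittle hydropathy) are assumptions chosen for the example, not necessarily the feature set used by the authors; the function names and the sample sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (standard published values per amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")

def sequence_features(seq):
    """Compute simple sequence-derived features (illustrative choice only)."""
    seq = seq.upper()
    n = len(seq)
    return {
        "length": n,
        "aromaticity": sum(aa in AROMATIC for aa in seq) / n,  # fraction of F, W, Y
        "gravy": sum(KD[aa] for aa in seq) / n,                # mean hydropathy
    }

def to_example(seq, expressed):
    """Return (feature vector, label) for one fragment; label is 1 if it expressed well."""
    f = sequence_features(seq)
    return [f["length"], f["aromaticity"], f["gravy"]], int(expressed)

# Hypothetical fragment and outcome, for illustration only.
x, label = to_example("MKWFA", expressed=True)
print(x, label)
```

A table of such vectors and labels, one row per fragment, is the form of input that standard classifiers expect.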

Results: Expression

Results: Solubility

Results: PrEST Selection

Results
Protein expression prediction accuracy is on par with state-of-the-art algorithms.
Surprisingly, mRNA features had very little predictive power for expression.
Protein solubility prediction is significantly better than previous state-of-the-art algorithms.
Using the learned classifiers reduced the number of experiments needed by 39%.
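To make the last point concrete, the sketch below shows one way a trained classifier could cut the experimental workload: only fragments whose predicted probability of success clears a threshold are sent to the lab, and the reduction is the share of candidates skipped. The scores, fragment names, and the 0.5 threshold are made up for illustration; this does not reproduce the authors' selection procedure or the 39% figure.

```python
def filter_candidates(scores, threshold=0.5):
    """Keep only fragments whose predicted probability of success meets the threshold."""
    return {name: p for name, p in scores.items() if p >= threshold}

def experiment_reduction(n_total, n_selected):
    """Fraction of experiments avoided by testing only the selected fragments."""
    return 1 - n_selected / n_total

# Hypothetical classifier outputs (made-up values, not results from the paper).
scores = {"frag_a": 0.91, "frag_b": 0.34, "frag_c": 0.72, "frag_d": 0.18, "frag_e": 0.55}

selected = filter_candidates(scores)
reduction = experiment_reduction(len(scores), len(selected))
print(sorted(selected), round(reduction, 2))
```

Raising the threshold skips more experiments at the cost of discarding some fragments that would have worked, so the cutoff is a trade-off rather than a fixed value.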

Additional Results
The authors developed a simple, easy-to-use framework for running their experiments, freely available online as IPython notebooks.
It is designed to work with any “omics” data, so it can be applied to a wider variety of bioinformatics problems.
