Application of Unstructured Learning in Computational Biology
Tony C Smith
Department of Computer Science, University of Waikato
Computability
Before computers were built, mathematicians knew what they could do: arithmetic (e.g. missile trajectories), search (e.g. keys for secret codes), sort (e.g. census information), … anything with a mathematical algorithm.
Artificial Intelligence
Computers do things that otherwise only human brains can do.
[Figure labels: expert, expert system, learning system]
Machine learning
What is machine learning? Creating computer programs that get better with experience, learn how to make expert judgments, and discover previously hidden, potentially useful information (data mining).
How does it work? The user provides the learning system with examples of the concept to be learned; an induction algorithm infers a characteristic model of the examples; the model is used to predict whether or not future novel instances are also examples, and it does this very consistently and very, very quickly!
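The three steps (examples in, model inferred, novel instances judged) can be illustrated with a very small concept learner. This is only a sketch: it uses the classic Find-S procedure, which the talk does not name, and the attributes and example values (colour, shape, size) are hypothetical.

```python
def find_s(positive_examples):
    """Infer the most specific conjunctive hypothesis covering every example."""
    hypothesis = list(positive_examples[0])
    for example in positive_examples[1:]:
        for i, value in enumerate(example):
            if hypothesis[i] != value:
                hypothesis[i] = "?"  # generalize: any value is acceptable here
    return tuple(hypothesis)


def matches(hypothesis, instance):
    """Predict whether a novel instance is also an example of the concept."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))


# Hypothetical examples of the concept, described as (colour, shape, size).
examples = [
    ("red", "round", "small"),
    ("red", "round", "large"),
]

model = find_s(examples)                       # induction step
is_member = matches(model, ("red", "round", "medium"))
is_other = matches(model, ("green", "round", "small"))
print(model, is_member, is_other)
```

The learned model keeps the attribute values shared by all examples and ignores the rest, so it accepts a red, round object of any size and rejects a green one.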
Structured learning
Mushroom data

Weight   Damage   Dirt    Firmness   Quality
heavy    high     mild    hard       poor
heavy    high     mild    soft       poor
normal   high     mild    hard       good
light    medium   mild    hard       good
light    clear    clean   hard       good
normal   clear    clean   soft       poor
heavy    medium   mild    hard       poor
...

[Figure: decision tree that tests weight (heavy, light, normal), dirt (mild, clean) and firmness (hard, soft) to predict quality (good or poor)]
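As a rough illustration of how a learner might choose which field to test first, the sketch below scores each attribute of the seven mushroom rows listed on the slide with a one-rule (1R-style) error count. This is an assumed stand-in, not necessarily the method used to build the tree in the talk; the function and variable names are made up for the example.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["weight", "damage", "dirt", "firmness"]

# The seven rows shown on the slide; the last field is the class (quality).
ROWS = [
    ("heavy", "high", "mild", "hard", "poor"),
    ("heavy", "high", "mild", "soft", "poor"),
    ("normal", "high", "mild", "hard", "good"),
    ("light", "medium", "mild", "hard", "good"),
    ("light", "clear", "clean", "hard", "good"),
    ("normal", "clear", "clean", "soft", "poor"),
    ("heavy", "medium", "mild", "hard", "poor"),
]


def one_rule(rows, attr_index):
    """Map each attribute value to its majority class; return (rule, errors)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr_index]][row[-1]] += 1
    rule, errors = {}, 0
    for value, classes in counts.items():
        label, hits = classes.most_common(1)[0]
        rule[value] = label
        errors += sum(classes.values()) - hits  # rows the rule gets wrong
    return rule, errors


results = {name: one_rule(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = min(results, key=lambda name: results[name][1])
print(best, results[best])
```

On these rows, weight misclassifies the fewest examples, which is consistent with a tree that tests weight first.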
Unstructured learning
Data does not have fixed fields with specific values. Examples: images, continuous signals, expression data, text. Learning proceeds by correlating the presence or absence of any and all salient attributes.
Document classification
Given examples of documents covering some topic, learn a semantic model that can recognize whether or not other documents are relevant; prioritize them, i.e. quantify “how relevant” documents are to the topic; not limited to keywords (nor is it misled by them); adapts to the user’s needs (ephemeral or long-term).
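One simple way to quantify "how relevant" a document is would be a bag-of-words log-likelihood ratio in the style of naive Bayes. The sketch below assumes that model purely for illustration (the talk does not say which model its system uses), and all example documents are invented.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case a document and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(docs):
    """Count word frequencies over a list of example documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokens(doc))
    return counts


def relevance(text, relevant, other):
    """Log-likelihood ratio with add-one smoothing; above 0 leans relevant."""
    vocab = set(relevant) | set(other)
    n_rel = sum(relevant.values()) + len(vocab)
    n_oth = sum(other.values()) + len(vocab)
    score = 0.0
    for word in tokens(text):
        if word in vocab:  # words never seen in training carry no evidence
            score += math.log((relevant[word] + 1) / n_rel)
            score -= math.log((other[word] + 1) / n_oth)
    return score


# Hypothetical training documents: on-topic versus everything else.
on_topic = train([
    "protein folding and protein structure prediction",
    "gene expression and protein function",
])
off_topic = train([
    "football match results and league tables",
    "stock market prices fell sharply",
])

score_bio = relevance("protein function in yeast", on_topic, off_topic)
score_sport = relevance("league football results", on_topic, off_topic)
print(score_bio, score_sport)
```

Sorting unseen documents by this score gives a ranking rather than a yes/no answer, and retraining on a new set of examples adapts the ranking to a different topic.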
Document classification demo
Bioinformatics
Finding genes, determining gene roles, determining protein functions.
Empirical tests, sequence similarity comparison, literature.
GO-KDS demo
Amino acid
[Figure labels: amino group, carboxyl group, R group]
Amino acid
[Figure: glycine, tyrosine]
DNA encodes amino acids
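The encoding works three bases at a time: each codon in the DNA names one amino acid, and a stop codon ends the protein. A minimal sketch follows, using only a deliberately partial slice of the standard genetic code (start, glycine, tyrosine and the stop codons); the input sequence is invented for the example.

```python
# Partial codon table (coding-strand DNA). The full standard code has 64 codons;
# only the entries needed for this example are listed.
CODON_TABLE = {
    "ATG": "Met",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "TAT": "Tyr", "TAC": "Tyr",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def translate(dna):
    """Translate DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        residue = CODON_TABLE.get(codon, "???")  # codon outside the partial table
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


peptide = translate("ATGGGTTACTAA")
print(peptide)  # ['Met', 'Gly', 'Tyr']
```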
RasMol demo
Biotechnology
Biologists know proteins; computer scientists know machine learning. Together, they can find out a lot of hidden information about genes and proteins. Biotechnology is a multi-billion dollar industry and one of the best-funded areas of scientific research.