Machine Learning, Bio-informatics and Weka

A Lecture by Dr. Gursel Serpen
Associate Professor & Director of Artificial Intelligence Lab
Electrical Engineering and Computer Science, University of Toledo
February 2017

What is Machine Learning?

Big Data

Learning from Data
The world is driven by data.
  Germany’s climate research centre generates 10 petabytes per year.
  Google processes 24 petabytes per day.
  The Large Hadron Collider produces 60 gigabytes per minute (~12 DVDs).
  There are over 50 million credit card transactions a day in the US alone.

Learning from Data
Data is recorded from some real-world phenomenon. What might we want to do with that data?
  Prediction: what can we predict about this phenomenon?
  Description: how can we describe/understand this phenomenon in a new way?

Learning from Data
  How can we extract knowledge from data to help humans take decisions?
  How can we automate decisions from data?
  How can we adapt systems dynamically to enable better user experiences?
There are two ways to approach these tasks:
  Write code to explicitly do the above tasks, or
  Write code to make the computer learn how to do the tasks.

Machine Learning: Where does it fit? What is it not?
[Diagram labels: Artificial Intelligence, Statistics / Mathematics, Data Mining, Machine Learning, Computer Vision, Robotics]
(No definition of a field is perfect – the diagram above is just one interpretation, mine ;-)

[Diagram labels: Specialist Domain Knowledge, Coding Skills, Maths/Statistics Knowledge, Machine Learning, Data Science, Software Engineer, Statistician]

Many applications are immensely hard to program directly. These almost always turn out to be “pattern recognition” tasks. There are two options:
Option A:
  1. Program the computer to do the pattern recognition task directly.
Option B:
  1. Program the computer to be able to learn from examples.
  2. Provide “training” data.
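The contrast between programming a task directly and learning it from examples can be sketched on a toy problem. The data, the threshold of 2.5, and the function names below are illustrative assumptions, not material from the lecture: one rule has its decision threshold fixed by the programmer, the other picks the threshold that misclassifies the fewest training examples.

```python
# Hypothetical toy data: (measurement, label) pairs invented for illustration.
training_data = [(1.0, 0), (1.5, 0), (2.0, 0), (3.0, 1), (3.5, 1), (4.0, 1)]


def hand_coded_rule(x):
    """Direct programming: the programmer chooses the threshold by hand."""
    return 1 if x > 2.5 else 0


def learn_threshold(examples):
    """Learning from examples: pick the threshold with the fewest mistakes."""
    values = sorted(x for x, _ in examples)
    # Candidate thresholds lie midway between neighbouring training values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((1 if x > threshold else 0) != y for x, y in examples)

    return min(candidates, key=errors)


threshold = learn_threshold(training_data)


def learned_rule(x):
    """Same form as the hand-coded rule, but the threshold came from data."""
    return 1 if x > threshold else 0
```

If the training data changes, only the learned rule adapts without anyone editing the code, which is the point of the second option.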

Definition of Machine Learning
  Self-configuring data structures that allow a computer to do things that would be called “intelligent” if a human did them.
  “Making computers behave like they do in the movies.”

A Bit of History Arthur Samuel (1959) wrote a program that learnt to play draughts (“checkers” if you’re American).

1940s Human reasoning / logic first studied as a formal subject within mathematics (Claude Shannon, Kurt Gödel et al.).
1950s The “Turing Test” is proposed: a test for true machine intelligence, expected to be passed by year 2000. Various game-playing programs built.
1956 “Dartmouth conference” coins the phrase “artificial intelligence”.
1960s A.I. funding increased (mainly military). Famous quote: “Within a generation ... the problem of creating ‘artificial intelligence’ will substantially be solved.”

1970s A.I. “winter”. Funding dries up as people realise it’s hard. Limited computing power and dead-end frameworks.
1980s Revival through bio-inspired algorithms: neural networks, genetic algorithms. A.I. promises the world – lots of commercial investment – mostly fails. Rule-based “expert systems” used in medical / legal professions.
1990s AI diverges into separate fields: Computer Vision, Automated Reasoning, Planning Systems, Natural Language Processing, Machine Learning… Machine Learning begins to overlap with statistics / probability theory.

2000s ML merging with statistics continues. Other subfields continue in parallel. First commercial-strength applications: Google, Amazon, computer games, route-finding, credit card fraud detection, etc. Tools adopted as standard by other fields, e.g. biology.
2010s IBM Watson, Siri, a computer wins at Go, robotics, etc.

Types of Learning
  Supervised (inductive) learning: training data includes desired outputs.
  Unsupervised learning: training data does not include desired outputs.
  Semi-supervised learning: training data includes a few desired outputs.
  Reinforcement learning: rewards from a sequence of actions.

Leading ML Algorithms
Supervised learning
  Decision tree induction
  Rule induction
  Instance-based learning
  Bayesian learning
  Neural networks
  Support vector machines
  Model ensembles
  Learning theory
Unsupervised learning
  Clustering
  Dimensionality reduction

Supervised Learning
Pattern recognition
  Given a set of input vectors and corresponding target vectors.
  The model tunes itself to minimize the error measured by an objective function. For instance, the objective function could be the number of misclassifications.
  Used in prediction problems (classification, regression).
  The model maps input vectors to target vectors. When the targets are discrete, the task is classification.
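As a rough illustration of a model that "tunes itself" against target values, here is a minimal perceptron sketch. The perceptron is chosen only for illustration (the lecture does not name it here), and the six 2-D points and their labels are invented. The misclassification count is tracked after every pass over the data as the error measure; the perceptron does not minimize that count in general, but on linearly separable data such as this toy set it reaches zero mistakes.

```python
def predict(weights, bias, x):
    """Linear classifier: +1 if the weighted sum is positive, otherwise -1."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


def misclassifications(weights, bias, inputs, targets):
    """Error measure: how many inputs receive the wrong label."""
    return sum(predict(weights, bias, x) != t for x, t in zip(inputs, targets))


def train_perceptron(inputs, targets, max_epochs=100):
    """Adjust weights on each mistake; record the error count per epoch."""
    weights, bias = [0.0] * len(inputs[0]), 0.0
    history = []
    for _ in range(max_epochs):
        for x, t in zip(inputs, targets):
            if predict(weights, bias, x) != t:
                # Move the decision boundary towards the misclassified point.
                weights = [w + t * xi for w, xi in zip(weights, x)]
                bias += t
        errors = misclassifications(weights, bias, inputs, targets)
        history.append(errors)
        if errors == 0:
            break
    return weights, bias, history


# Hypothetical, linearly separable toy data (input vectors and target labels).
inputs = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
targets = [-1, -1, -1, 1, 1, 1]

weights, bias, history = train_perceptron(inputs, targets)
```

The `history` list shows how the number of misclassifications changes from one pass to the next, ending at zero for this data.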

Unsupervised Learning
Pattern detection
  Given a set of input vectors, with no target vectors.
Uses
  Clustering: partition related data into clusters by similarity, with low variance inside clusters and large variance between clusters. In sequence analysis, clustering is used to group homologous sequences into gene families.
  Density estimation: estimate the probability density function of a random variable. Given some data about a sample of a population, kernel density estimation makes it possible to extrapolate the data to the entire population.
  Visualization: reduce dimensions to turn high-dimensional data into a human-readable form.
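Two of these uses can be sketched in a few lines each. The code below is an illustrative sketch, not the lecture's method: it uses k-means (Lloyd's algorithm) as one common clustering technique and a Gaussian kernel for density estimation. The points, the starting centroids, and the bandwidth are assumptions made for the example.

```python
import math


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def gaussian_kde(samples, bandwidth):
    """Return a density estimate built from one Gaussian bump per sample."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                   for s in samples) / norm

    return density


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])

# Hypothetical 1-D sample; density(x) can be evaluated at unseen values of x.
density = gaussian_kde([1.0, 1.2, 0.8, 3.0], bandwidth=0.5)
```

No target labels appear anywhere in this code: the groups and the density are derived from the input vectors alone.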

Example: Handwriting
  Classification: recognize each number
  Clustering: group same numbers together

Some real examples! Too many to list here!
  Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data
  Unsupervised Clustering of Bioinformatics Data
  Hidden Markov Models for Detecting Remote Protein Homologies
  Essential latent knowledge for protein-protein interactions: analysis by an unsupervised learning approach
  Inference of Genetic Regulatory Networks with Recurrent Neural Network Models Using Particle Swarm Optimization
  Finding genes in DNA using HMM

Uses in Bioinformatics
  Sequence to structure
  Protein structure, protein function, etc.
  DNA binding sites
  Evolutionary information
  Gene expression analysis
  Visualization
  Clustering: figure out which genes are expressed to the same result
  Inference of gene networks: what affects what
  Many more…

General Example (secondary structure)
  1. Create a dataset with input sequences and output secondary structure labels.
  2. Train a neural network on a portion of the dataset (training dataset).
  3. Test on a new portion of the dataset (test dataset) to estimate the generalization performance.
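One common way to turn a labelled sequence into input/output pairs is a sliding window with one-hot encoding. The sketch below assumes that encoding; the lecture does not specify one. The window size of 5, the example sequence, and its H/E/C labels (helix, strand, coil) are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue):
    """20-element 0/1 vector; padding or unknown residues map to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def windowed_examples(sequence, labels, window=5):
    """Return one (features, label) pair per residue, using a centred window."""
    assert len(sequence) == len(labels)
    half = window // 2
    # Pad both ends so that every residue has a full window around it.
    padded = "-" * half + sequence + "-" * half
    examples = []
    for i, label in enumerate(labels):
        features = []
        for residue in padded[i:i + window]:
            features.extend(one_hot(residue))
        examples.append((features, label))
    return examples


# Hypothetical sequence and per-residue labels, invented for illustration.
sequence = "MKTAYIAKQR"
labels = "CCHHHHHHCC"
dataset = windowed_examples(sequence, labels)
```

Each residue becomes one training example: a vector of 5 x 20 = 100 binary inputs describing its neighbourhood, paired with the secondary structure label of the central residue.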

Create a Dataset
  Download proteins from the Protein Data Bank.
  Remove proteins with redundant patterns (an unbiased training set is needed).
  Annotate each protein sequence with its secondary structure.
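The redundancy-removal step can be sketched as a greedy filter: keep a sequence only if it is not too similar to any sequence already kept. This is a simplified stand-in. Real pipelines typically measure similarity with sequence alignment, whereas the sketch below uses `difflib.SequenceMatcher` from the Python standard library as a crude similarity score. The 0.8 cut-off and the three sequences are assumptions for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundant(sequences, max_similarity=0.8):
    """Greedily keep sequences that are dissimilar to everything kept so far."""
    kept = []
    for seq in sequences:
        if all(similarity(seq, other) < max_similarity for other in kept):
            kept.append(seq)
    return kept


# Hypothetical sequences: the second differs from the first by one residue.
sequences = ["MKTAYIAKQR", "MKTAYIAKQK", "GSSPLLVWEE"]
non_redundant = remove_redundant(sequences)
```

Without this step, near-duplicate proteins could land on both sides of the train/test split and make the measured performance look better than it is.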

Train and test
  Partition the dataset created in the previous slide into train and test sections.
  Use the training set to build the neural net model.
  Use the test set to evaluate performance.
  If results are satisfactory, we now have a method to turn sequences into secondary structure.
Emphasis: we have automatically created a model that predicts secondary structure from primary structure. This is a big deal.
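The partition-and-evaluate procedure can be sketched independently of the model. In the code below, a majority-class predictor stands in for the neural network purely to keep the example short; it is not a neural net. The 80/20 split, the fixed seed, and the toy examples are assumptions. Any model offering the same `fit` and `predict` methods could be substituted.

```python
import random
from collections import Counter


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


class MajorityClassModel:
    """Placeholder model: always predicts the most common training label."""

    def fit(self, examples):
        counts = Counter(label for _, label in examples)
        self.label = counts.most_common(1)[0][0]
        return self

    def predict(self, features):
        return self.label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(model.predict(x) == y for x, y in examples)
    return correct / len(examples)


# Hypothetical (features, label) pairs standing in for the real dataset.
examples = [([i], "H" if i % 3 else "C") for i in range(10)]
train, test = train_test_split(examples)
model = MajorityClassModel().fit(train)
test_accuracy = accuracy(model, test)
```

The model never sees the test examples during fitting, so `test_accuracy` is an estimate of generalization performance and not a measure of how well the training data was memorized.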

Remember to read this article!