Biological data representation and data mining
Xin Chen


Biology will never be the same again
Accumulation of data
High-throughput experiments
Scattered and layered knowledge
Challenges in representing and integrating the data and knowledge
– Faithful and full representation of the observations, not only the conclusions
– Connecting heterogeneous types of data
– Building a computational framework
You can never understand what an elephant is by looking at its hairs

The crown jewel of biology
My personal opinion:
Data analysis in general
– BLAST: homology analysis
– HMM: the concept of “families”
– Structure analysis, clinical trials, orthogonal experimental design, etc.
– Statistics adapted to biology
Data mining in particular
– Analysis of relationships between entities

Data mining flavors
Representation of data:
– A sample (or tuple) is represented by a fixed or variable number of elements (features, which are categorical or continuous values)
Binary relationships
– Unsupervised learning
   Looking for structures in the samples, assuming a “similarity” that makes biological sense
   Examples: K-means, hierarchical clustering
– Supervised learning
   Looking for a function that describes the relationship between features of samples (typically from input features to a known label or response)
   Examples: support vector machine, neural network, Bayesian network, regression
Network relationships:
– Which assumed network structure/parameters best describe the observations
– Confidence over the network and confidence over the network elements
– Examples: probabilistic network (Bayesian network), neural network
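To make the two "binary relationship" flavors concrete, here is a minimal sketch in plain Python. It is not taken from the slides: the function names `kmeans` and `fit_line`, the toy samples, and the choice to seed the centroids with the first k samples are all assumptions made for illustration. K-means stands in for unsupervised learning, and a one-feature least-squares regression stands in for supervised learning.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(samples, k, n_iter=20):
    """Unsupervised: group samples by similarity (here, Euclidean distance)."""
    # Assumption: seed centroids with the first k samples. This keeps the
    # sketch deterministic but is not a robust initialization in general.
    centroids = [tuple(s) for s in samples[:k]]
    labels = [0] * len(samples)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(s, centroids[c]))
                  for s in samples]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


def fit_line(xs, ys):
    """Supervised: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical samples, each a tuple of two continuous features.
samples = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.1, 0.9), (4.9, 5.0)]
labels, centroids = kmeans(samples, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]

# Hypothetical feature/response pairs lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

The unsupervised function sees only the features and has to assume that distance reflects a meaningful similarity. The supervised function is given a known response for every sample and fits a function to it.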

When you get hands-on …
Pre-processing
– Clean the data (outliers, missing values, dependencies…)
– Feel the data (structure, relevance…): the most important, most difficult, and most underestimated step
Data mining
– Choose an algorithm (adapt it if necessary)
– Run the analysis
Post-processing (interpretation)
– What is expected and what is unexpected
– Connecting results with knowledge and discoveries
Biology is the key
– Where to look is always more important than how to look
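The "clean the data" step can be sketched as follows, using only the Python standard library. The slides do not prescribe a method: median imputation, the modified z-score based on the median absolute deviation, the 3.5 cutoff, and the names `impute_missing` and `flag_outliers` are assumptions chosen as one common rule of thumb.

```python
from statistics import median


def impute_missing(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def flag_outliers(column, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    The score uses the median absolute deviation (MAD), so the outliers
    themselves barely influence the estimate of spread. The 0.6745 factor
    and the 3.5 threshold are a conventional choice, assumed here.
    """
    med = median(column)
    mad = median(abs(v - med) for v in column)
    if mad == 0:
        # No spread around the median: nothing can be flagged this way.
        return [False] * len(column)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in column]


# Hypothetical measurements for one feature, with one missing value.
raw = [2.1, 2.4, None, 2.2, 9.8, 2.3]
filled = impute_missing(raw)
flags = flag_outliers(filled)
print(filled)  # [2.1, 2.4, 2.3, 2.2, 9.8, 2.3]
print(flags)   # [False, False, False, False, True, False]
```

Whether a flagged value is noise or a real biological signal is a judgment the code cannot make. That is the "feel the data" part.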

In a nutshell
Results obtained with unbiased (knowledge-independent) approaches, if they correspond to existing knowledge, are evidence for both your analysis approach and the validity of your discoveries.
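One simple way to check whether knowledge-independent results "correspond to existing knowledge" is to compare cluster assignments against known annotations. The sketch below computes cluster purity. The metric, the function name `purity`, and the example annotations are assumptions for illustration, not something the slides specify.

```python
from collections import Counter


def purity(cluster_labels, known_labels):
    """Fraction of samples that share their cluster's majority annotation."""
    by_cluster = {}
    for cluster, known in zip(cluster_labels, known_labels):
        by_cluster.setdefault(cluster, Counter())[known] += 1
    # For each cluster, count the samples carrying its most common annotation.
    majority = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority / len(cluster_labels)


# Hypothetical example: clusters found without using the annotations,
# compared afterwards with (made-up) protein family labels.
clusters = [0, 0, 0, 1, 1, 1]
families = ["kinase", "kinase", "protease", "protease", "protease", "protease"]
score = purity(clusters, families)
print(round(score, 3))  # 0.833
```

A purity near 1 means the clusters largely recover the known grouping. Purity rises trivially as the number of clusters grows, so it is only a first sanity check.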

The course format
Each of you will give at least one paper presentation in class and finish a toy data mining project (with a written report) using datasets from the UCI Machine Learning Repository. Your grade will be based 70% on your paper presentation and class activity, and 30% on your course project report.