Published by Marshall Edwards. Modified over 8 years ago.
1
Biological data representation and data mining
Xin Chen, xinchen@zju.edu.cn
2
Biology will never be the same again
Accumulation of data
High-throughput experiments
Scattered and layered knowledge
Challenges in representing and integrating the data and knowledge
– Faithful and full representation of the observations, not only the conclusions
– Connecting heterogeneous types of data
– Building a computational framework
You can never understand what an elephant is by looking at its hairs
3
The crown jewel of biology
My personal opinion:
Data analysis in general
– BLAST: homology analysis
– HMM: the concept of “families”
– Structure analysis, clinical trials, orthogonal experimental design, etc.
– Statistics adapted to biology
Data mining in particular
– Analysis of relationships between entities
4
Data mining flavors
Representation of data:
– A sample (or tuple) is represented by a fixed or variable number of elements (features, which are categorical or continuous)
Binary relationships:
– Unsupervised learning
    Looking for structure in the samples, assuming a “similarity” measure that makes biological sense
    Examples: K-means, hierarchical clustering
– Supervised learning
    Looking for a function that describes the relationship between features of the samples
    Examples: support vector machine, neural network, Bayesian network, regression
Network relationships:
– Which assumed network structure and parameters best describe the observations
– Confidence in the network as a whole and in its individual elements
– Examples: probabilistic network (Bayesian network), neural network
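To make the two "binary relationship" flavors concrete, here is a minimal sketch in plain Python. It is not taken from the lecture: the toy samples, the function names (`nearest`, `kmeans`, `fit_line`), and the choice of squared Euclidean distance as the "similarity" are all assumptions made for illustration. K-means groups unlabeled samples, while the least-squares line learns a function from a feature to a known target.

```python
import random

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))

def kmeans(points, k, iters=50, seed=0):
    """Unsupervised: group samples by similarity; no labels are used."""
    centers = random.Random(seed).sample(points, k)  # start from k data points
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of its assigned samples.
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return [nearest(p, centers) for p in points]

def fit_line(xs, ys):
    """Supervised: least-squares line y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical toy data: two well-separated groups of 2-feature samples.
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
cluster_labels = kmeans(samples, k=2)

# Hypothetical feature/target pairs that lie on a straight line.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(cluster_labels, slope, intercept)
```

Which cluster receives label 0 or 1 depends on the random initialization, so only the grouping itself is meaningful, not the label values.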
5
When you get hands-on …
Pre-processing
– Clean the data (outliers, missing values, dependencies …)
– Get a feel for the data (structure, relevance …): the most important, most difficult, and most underestimated step
Data mining
– Choose an algorithm (adapt it if necessary)
– Run the analysis
Post-processing (interpretation)
– What is expected and what is unexpected
– Connecting results with knowledge and discoveries
Biology is the key
– Where to look is always more important than how to look
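The cleaning step can be sketched as follows. This is only one possible approach under stated assumptions, not the lecturer's method: missing values are encoded as `None` and filled with the median, and outliers are flagged with the modified z-score based on the median absolute deviation (MAD), using the commonly quoted cutoff of 3.5. The measurements and function names are hypothetical.

```python
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be flagged
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - m) / mad) > threshold]

# Hypothetical measurements with one missing value and one suspicious reading.
raw = [2.1, 2.4, None, 2.2, 9.8, 2.3]
cleaned = impute_median(raw)
flagged = mad_outliers(cleaned)
print(cleaned, flagged)
```

Flagging is deliberately separate from removal: whether the flagged reading is an error or a genuine biological signal is exactly the "feel the data" judgment the slide refers to.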
6
In a nutshell
Results obtained with unbiased (knowledge-independent) approaches, if they correspond to existing knowledge, are proof of both your analysis approach and the validity of your discoveries.
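One way to quantify how well an unbiased result corresponds to existing knowledge is to compare cluster assignments with known annotations. The sketch below uses the Rand index (the fraction of sample pairs on which two groupings agree). The choice of this measure, the function name `rand_index`, and the family labels are assumptions made for illustration only.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of sample pairs that both labelings treat the same way
    (together in both, or apart in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical known annotation for six proteins.
known = ["kinase", "kinase", "kinase", "protease", "protease", "protease"]

# Two hypothetical clustering results obtained without using the annotation.
matching = [0, 0, 0, 1, 1, 1]
partly_matching = [0, 0, 1, 1, 1, 1]

score_matching = rand_index(matching, known)
score_partial = rand_index(partly_matching, known)
print(score_matching, score_partial)
```

A score of 1.0 means the clustering reproduces the known families exactly; lower scores indicate either a weaker analysis or something unexpected in the data that is worth a closer look.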
7
The course format
Each of you will give at least one paper presentation in class and complete a toy data mining project (with a written report) using datasets from the UCI Machine Learning Repository.
Evaluation: 70% for your paper presentation and class activity, 30% for your course project report.