The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.

Mining Biological Data KU EECS 800, Luke Huan, Fall’06 slide2 8/21/2006 Introduction Administrative Register for 3 hours of credit

Slide 3: Me
  Luke Huan, assistant prof. in Electrical Engineering & Computer Science
  Homepage:
  Office: 2304 Eaton Hall
  Office hours: 10:00 – 11:00am Monday and Wednesday

Slide 4: My Lecture Style
  I may tend to talk fast, especially when excited
  Class materials are highly interdisciplinary
  Use your questions to slow me down
    Ask for clarification, repetition of a strange phrase, or an explanation of jargon
  “If in doubt, speak it out”

Slide 5: You
  Introduction:
    Who you are
    What department you are in
    Why you are taking the course

Slide 6: Outline for Today
  What is mining biological data?
  What is this course about?
  Course home page
  Course references
  Paper presentation
  Final project
  Grading
  Forward class reviewing

Slide 7: What is Mining Biological Data
  Goal: understanding the structure of biological data
    Patterns
    Descriptive models
    Predictive models
  Challenges:
    What is the nature of the data?
    What are the computational tasks?
    How to break a task into a group of computational components?
    How to evaluate the computational results?
  Applications
    Experimental design and hypothesis generation
    Synthesis of novel proteins
    Drug design
    …

Slide 8: What is this Course About?
  Learning…
    Problems in mining biological data
    Available techniques, their pros and cons
    How to combine techniques together
    Enough perception to avoid pitfalls
  Practicing…
    To present recent papers on a selected topic
    To work on a project that may involve
      A domain expert,
      A driving biological problem, and
      The development of new data mining techniques

Slide 9: Class Information
  Class Homepage:
  Meeting time: 9:00 – 9:45 Monday, Wednesday, Friday
  Meeting place: Eaton Hall 2001
  Prerequisite: none

Slide 10: Textbook & References
  Textbook: none
  References
    Data Mining: Concepts and Techniques, by Han and Kamber, Morgan Kaufmann, (ISBN: )
    The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, (ISBN: )
    Bioinformatics: Genes, Proteins, and Computers, edited by Christine Orengo, David Jones, Janet Thornton, Bios Scientific Publishers, (ISBN: )

Slide 11: Paper Presentation
  One per student
  Research paper(s)
    List of recommendations will be posted at the class webpage a week from now
    Your own pick (upon approval)
  Three parts
    Review the goal of the paper(s)
    Discuss the research challenges
    Present the techniques and comment on their pros and cons
  Questions and comments from audience
    Extra credit for active participants of class discussions
  Order of presentation: first come first pick
    Please send in your choice of paper by September 1st.

Slide 12: Final Project
  Project (due Nov. 27th)
    One project
    I will post some suggestions at class website.
    I am soliciting projects from researchers on campus
    You are welcome to propose your own
    Discuss with me before you start
  Checkpoints
    Proposal: title and goal (due Sep. 8th)
    Background and related work (due Sep. 29th)
    Outline of approach (due Oct. 20th)
    Implementation & Evaluation (due Nov. 10th)
    Class demo (due Nov. 27th)

Slide 13: Grading
  Grading scheme
    No homework
    No exam
    Paper presentation and discussion: 45%
    Project: 45%
    Attendance and participation: 10%

Slide 14: Forward Class Reviewing
  This is for overview, not content
    Don’t worry if you do not understand some of the words; that’s why you want to take this class.
  Gives an idea of what is coming
  Order of presentation might be shuffled to accommodate everyone’s schedule
  Topics may be adjusted as the class progresses

Slide 15: Week 1: Pattern Mining
  Frequent patterns: finding regularities in data
    Frequent patterns (sets of items) are ones that occur frequently in a data set
    Can we automatically profile customers?
    What products are often purchased together?
  Customer shopping baskets:
    ID    Items bought
    100   f, a, c, d, g, I, m, p
    200   a, b, c, f, l, m, o
    300   b, f, h, j, o
    400   b, c, k, s, p
    500   a, f, c, e, l, p, m, n
  One hypothesis: {a, c} → {m}
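The hypothesis {a, c} → {m} can be checked against the five baskets by computing its support and confidence. The sketch below is only an illustration: the function names `support` and `confidence` are assumptions rather than anything given in the slides, and the item written "I" in basket 100 is entered in lowercase.

```python
# Shopping baskets from the slide, keyed by transaction ID.
# Assumption: the item shown as "I" in basket 100 is written "i" here.
transactions = {
    100: {"f", "a", "c", "d", "g", "i", "m", "p"},
    200: {"a", "b", "c", "f", "l", "m", "o"},
    300: {"b", "f", "h", "j", "o"},
    400: {"b", "c", "k", "s", "p"},
    500: {"a", "f", "c", "e", "l", "p", "m", "n"},
}

def support(itemset, transactions):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of its left-hand side."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

rule_support = support({"a", "c", "m"}, transactions)
rule_confidence = confidence({"a", "c"}, {"m"}, transactions)
print(rule_support, rule_confidence)
```

On these baskets, {a, c} occurs in baskets 100, 200, and 500, and each of them also contains m, so the rule has support 3/5 and confidence 1.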

Slide 16: Week 2: Advanced Pattern Mining
  Reducing number of patterns
    Maximal patterns and closed patterns
    Constraint-based mining
  Patterns with concept hierarchy
  Patterns in quantitative data
  Correlation vs. association
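To see how closed and maximal patterns reduce the number of patterns, the sketch below enumerates frequent itemsets by brute force on the Week 1 shopping baskets and then filters them. It is a hedged illustration, not an efficient miner: the minimum support count of 3 and all names are assumptions, and item "I" is again written in lowercase.

```python
from itertools import combinations

# Week 1 shopping baskets (assumption: item "I" written as "i").
transactions = [
    {"f", "a", "c", "d", "g", "i", "m", "p"},
    {"a", "b", "c", "f", "l", "m", "o"},
    {"b", "f", "h", "j", "o"},
    {"b", "c", "k", "s", "p"},
    {"a", "f", "c", "e", "l", "p", "m", "n"},
]
MIN_COUNT = 3  # hypothetical minimum support count

def count(itemset):
    """Number of baskets containing every item of itemset."""
    return sum(1 for basket in transactions if itemset <= basket)

# Brute-force enumeration by size; stop once a size yields nothing frequent
# (no larger itemset can be frequent then).
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    found = False
    for combo in combinations(items, size):
        candidate = frozenset(combo)
        c = count(candidate)
        if c >= MIN_COUNT:
            frequent[candidate] = c
            found = True
    if not found:
        break

# Closed: no proper superset has the same support count.
closed = [s for s in frequent
          if not any(s < t and frequent[t] == frequent[s] for t in frequent)]
# Maximal: no proper superset is frequent at all.
maximal = [s for s in frequent if not any(s < t for t in frequent)]

print(len(frequent), len(closed), len(maximal))
```

With this threshold the 18 frequent itemsets collapse to 5 closed patterns ({f}, {c}, {b}, {c, p}, {f, a, c, m}) and 3 maximal patterns ({b}, {c, p}, {f, a, c, m}).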

Slide 17: Week 3: Mining Microarray Data
  [Table: microarray data excerpt with columns CH1I, CH1B, CH1D, CH2I, CH2B and row labels (truncated in extraction) CTFC, VPS, EFB, SSA, FUN, SP, MDM, CYS, DEP, NTG; the numeric values did not survive]
  From: Spellman, P. T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D. and Futcher, B. (1998), “Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization”, Molecular Biology of the Cell, 9,

Slide 18: Week 4: Patterns in Sequences, Trees, and Graphs
  [Figure: three labeled graphs G1, G2, and G3 (labels a, b, c, d, x, y) and the subgraph patterns P1 to P6 found in them, each annotated with its frequency f (3/3 or 2/3); a threshold of 2/3 is shown]

Slide 19: Week 5: Pattern Discovery in Biomolecules
  Protein
    A sequence from 20 amino acids
    Adopts a stable 3D structure that can be measured experimentally
  [Figure: residues Lys, Gly, Leu, Val, Ala, His; atom key: Oxygen, Nitrogen, Carbon, Sulfur; structure renderings: Ribbon, Space filling, Cartoon, Surface]

Slide 20: Week 6: Descriptive Models
  Group objects into clusters
    Ones in the same cluster are similar
    Ones in different clusters are dissimilar
  Unsupervised learning: no predefined classes
  [Figure labels: Cluster 1, Cluster 2, Outliers]
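The slide does not name a clustering algorithm. As one common example of grouping similar objects without predefined classes, the sketch below runs a few rounds of k-means on made-up 2D points; the data, the starting centers, and the function name `kmeans` are all assumptions for illustration.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of its assigned points. Repeats a fixed number
    of times; the returned clusters are the assignments from the last round."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center  # keep an empty cluster's center in place
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(clusters)
```

Here no labels are supplied; the two groups emerge only from distances between points, which is what "unsupervised" means on the slide.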

Slide 21: Week 7: Subspace Clustering
  [Figure: two viewer-by-movie rating tables (five Viewers in rows, Movie 1 to Movie 7 in columns); almost all rating values were lost in extraction]

Slide 22: Week 7: Subspace Clustering

Slide 23: Week 8: Mining Microarray (II)
  Apply subspace clustering to microarray analysis
    Find groups of genes that are co-regulated
    May integrate data from protein sequences and functional description of genes
  Applying subgraph mining to microarray analysis

Slide 24: Week 9: Predictive Models
  Two-class version:
    Using “training data” from Class +1 and Class -1
    Develop a “rule” for assigning new data to a Class
  Slides from J.S. Marron in Statistics at UNC
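The slide leaves the "rule" unspecified. One of the simplest possibilities, shown here purely as an illustration, is a nearest-centroid rule: average the training points of each class and assign new data to the class with the closer average. The data and the names `train` and `classify` are assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def train(samples, labels):
    """Learn one centroid per class label (+1 or -1) from the training data."""
    return {
        label: centroid([s for s, l in zip(samples, labels) if l == label])
        for label in set(labels)
    }

def classify(x, centroids):
    """The rule: assign x to the class whose centroid is nearest."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))

# Hypothetical training data for Class +1 and Class -1.
samples = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0),
           (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
labels = [+1, +1, +1, -1, -1, -1]

model = train(samples, labels)
print(classify((1.5, 1.5), model), classify((6.5, 6.5), model))
```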

Slide 25: Week 10: Classification Algorithms and Applications
  Decision tree
  Fisher’s linear discriminant method
  Kernel methods
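As a rough illustration of Fisher's method, the sketch below computes the discriminant direction w = Sw⁻¹(m₊ − m₋) for two classes of 2D points, where Sw is the sum of the two within-class scatter matrices. The data are made up, the code is restricted to two dimensions, Sw is assumed to be invertible, and placing the threshold at the projected midpoint of the class means is an additional assumption.

```python
def mean(vectors):
    """Mean of a list of 2D points."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

def scatter(vectors, m):
    """2x2 scatter matrix: sum over points of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = (v[0] - m[0], v[1] - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher(class_pos, class_neg):
    """Return the Fisher direction w and a midpoint threshold."""
    m_pos, m_neg = mean(class_pos), mean(class_neg)
    s_pos, s_neg = scatter(class_pos, m_pos), scatter(class_neg, m_neg)
    # Within-class scatter Sw and its explicit 2x2 inverse.
    sw = [[s_pos[i][j] + s_neg[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (m_pos[0] - m_neg[0], m_pos[1] - m_neg[1])
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    midpoint = [(m_pos[i] + m_neg[i]) / 2 for i in range(2)]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold

def classify(x, w, threshold):
    """Project x onto w and compare with the threshold."""
    return +1 if w[0] * x[0] + w[1] * x[1] > threshold else -1

# Hypothetical training data.
class_pos = [(2.0, 2.0), (3.0, 3.0), (3.0, 2.0), (2.0, 3.0)]
class_neg = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
w, threshold = fisher(class_pos, class_neg)
print(w, threshold, classify((3.0, 3.0), w, threshold))
```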

Slide 26: Week 11: Text Mining, Gene Ontology, Data Management
  Ontology seeks to describe or posit the basic categories and relationships of being or existence to define entities and types of entities within its framework. Ontology can be said to study conceptions of reality (Wikipedia).
  GO is a database of terms for genes
    Terms are connected as a directed acyclic graph
    Levels represent specificity of the terms (not normalized)
  GO contains three different sub-ontologies:
    Molecular function
    Biological process
    Cellular component
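Because terms form a directed acyclic graph rather than a tree, a term can have several parents and can sit at more than one depth, which is one way to read "not normalized". The sketch below uses a toy graph with hypothetical term names (not real GO identifiers) to collect a term's ancestors and the lengths of all its paths to the root.

```python
# Toy term graph: each term maps to its parent terms.
# All term names are hypothetical placeholders, not real GO terms.
parents = {
    "term_root": [],
    "term_a": ["term_root"],
    "term_b": ["term_root"],
    "term_c": ["term_a", "term_b"],     # two parents: a DAG, not a tree
    "term_d": ["term_c", "term_root"],  # reachable by a short and a long path
}

def ancestors(term, parents):
    """All terms reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def path_lengths(term, parents):
    """Lengths of every distinct path from term up to a root."""
    if not parents[term]:
        return {0}
    return {n + 1 for p in parents[term] for n in path_lengths(p, parents)}

print(ancestors("term_c", parents))
print(path_lengths("term_d", parents))
```

In this toy graph `term_d` is both one step and three steps from the root, so its "level" alone does not pin down how specific it is.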

Slide 27: Week 12: Systems Biology & Proteomics
  Part of the biological system in a cell at the molecular level
    Source:
  A proteome is the set of all proteins in an organism

Slide 28: Week 13: Analyzing Biological Networks
  Biological networks pose serious challenges and opportunities for the data mining research in computer science
    Large volume of data
    Heterogeneous data types
  [Figure: Protein-protein interaction in yeast. Gary D. Bader & Christopher W.V. Hogue, Nature Biotechnology 20, (2002)]
  [Figure: Growth of Known Structures in Protein Data Bank (PDB); axes: Year vs. # of structures, with an axis tick at 35,000]

Slide 29: Week 14: bio-Data Integration
  Data are collected from many different sources
  Each piece of data describes part of a complicated (and not directly observable) biological process
  Combine data together to achieve better understanding and better prediction

Slide 30: Week 15, 16: Project Presentation
  Check what you have learned from the class
  Celebrate the hard work!

Slide 31: Further References
  Data mining
    Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
    Journals: Data Mining and Knowledge Discovery, IEEE-TKDD
  Bioinformatics
    Conferences: ISMB, RECOMB, PSB, CSB, BIBE, etc.
    Journals: Bioinformatics, J. of Computational Biology, etc.

Slide 32: Further References
  AI & Machine Learning
    Conferences: Machine learning (ICML), AAAI, IJCAI, etc.
    Journals: Machine Learning, Artificial Intelligence, etc.
  Statistics
    Conferences: Joint Stat. Meeting, etc.
    Journals: Annals of Statistics, etc.
  Database systems
    Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT
    Journals: ACM-TODS, IEEE-TKDE, etc.
  Visualization
    Conference proceedings: IEEE Visualization, ACM-SIGGraph, etc.
    Journals: IEEE Trans. Visualization and Computer Graphics, etc.

Slide 33: Mining Protein Structure Space
  [Figure: Growth of the Protein Folds in the Structural Classification of Proteins Database (SCOP); axes: Year vs. # of folds]
  [Figure: Growth of Known Structures in Protein Data Bank (PDB); axes: Year vs. # of structures]