Methodological Foundations of Biomedical Informatics (BMSC-GA 4449) Kelly Ruggles & David Fenyo.


Choose a Data Set

Examples:
  Encyclopedia of DNA Elements (ENCODE)
  1000 Genomes Project
  The Cancer Genome Atlas (TCGA)
  International Cancer Genome Consortium (ICGC)
  Clinical Proteomic Tumor Analysis Consortium (CPTAC)
  Human Microbiome Project (HMP)
  Epigenomics Roadmap
  Cancer Cell Line Encyclopedia (CCLE)
  Library of Integrated Network-Based Cellular Signatures (LINCS)
  Global Initiative for Sharing All Influenza Data (GISAID)
  The Genotype-Tissue Expression Project (GTEx)
  A 3D Map of the Human Genome
  Central Line-Associated Bloodstream Infections (CLABSI)
  The JGI Genome Portal
  Chromosome Scrambling in S. cerevisiae

First Homework (Sept 8)

Using the data set you have chosen:
  Write a database schema in SQL*.
  Implement the schema in, for example, MySQL.

*You don't need to capture everything in your database schema; just select an interesting subset.

Later homework on the data sets:
  Write a Python script for loading the data.
  Write a Python script to query the data.
  Make a web interface that supports simple queries.
  Perform statistical calculations on the data (both directly in Python and by calling R from Python).
  Make a static visualization of some aspect of the data via the web interface (e.g. call R to make a heatmap).
  Make an interactive visualization of some aspect of the data via the web interface.
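As a rough illustration of the loading and querying scripts, here is a minimal sketch. It makes several assumptions that are not part of the assignment: it uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server, and the `sample` table, its columns, and the CSV contents are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a downloaded data file (columns are made up).
CSV_TEXT = """sample_id,tissue,expression
S1,breast,2.5
S2,breast,3.5
S3,lung,1.0
"""

def load_samples(conn, csv_file):
    """Create the (hypothetical) sample table and insert one row per CSV record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample ("
        " sample_id TEXT PRIMARY KEY,"
        " tissue TEXT NOT NULL,"
        " expression REAL)"
    )
    rows = [
        (r["sample_id"], r["tissue"], float(r["expression"]))
        for r in csv.DictReader(csv_file)
    ]
    # The ? placeholders let the driver handle quoting of the values.
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def mean_expression(conn, tissue):
    """Return the mean expression for one tissue, or None if no rows match."""
    cur = conn.execute(
        "SELECT AVG(expression) FROM sample WHERE tissue = ?", (tissue,)
    )
    return cur.fetchone()[0]

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on exit
n_loaded = load_samples(conn, io.StringIO(CSV_TEXT))
breast_mean = mean_expression(conn, "breast")
print(n_loaded, breast_mean)
```

With a MySQL driver the overall shape would be similar, but the connection call and the placeholder style depend on the driver, so treat the details above as specific to sqlite3.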

Database Design

  Determine the purpose of the database
  Find and organize the information required
  Divide the information into tables
  Turn information items into columns
  Specify primary keys
  Set up the table relationships
  Refine the design
  Apply the normalization rules
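To make the "divide the information into tables" and normalization steps concrete, the sketch below splits a flat list of records into two tables so that each project description is stored only once. The records, field names, and the `normalize` helper are hypothetical, and plain Python containers stand in for database tables.

```python
# Flat, denormalized records: the project description repeats on every row.
flat_rows = [
    {"exp_id": 10, "exp_desc": "Pilot run",
     "project_id": 1, "project_desc": "Breast cancer proteomics"},
    {"exp_id": 11, "exp_desc": "Replicate run",
     "project_id": 1, "project_desc": "Breast cancer proteomics"},
    {"exp_id": 20, "exp_desc": "Time course",
     "project_id": 2, "project_desc": "Yeast stress response"},
]

def normalize(rows):
    """Split flat rows into a project table and an experiment table.

    project_id serves as the primary key of the project table and as the
    column that links each experiment back to its project.
    """
    projects = {}     # project_id -> description (one entry per project)
    experiments = []  # (exp_id, project_id, description)
    for row in rows:
        projects[row["project_id"]] = row["project_desc"]
        experiments.append((row["exp_id"], row["project_id"], row["exp_desc"]))
    return projects, experiments

projects, experiments = normalize(flat_rows)
print(projects)
print(experiments)
```

After the split, correcting a project description means changing one entry instead of every experiment row that mentions it.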

Example Database Schema: Proteomics

  Project
  Experiment
  Sample
  Analysis
  Spectrum
  Peptide
  Amino_Acid
  Modification
  Protein

Example Database Schema: One-to-Many

CREATE TABLE Project (
    project_id int,
    description varchar(2000),
    …
);

CREATE TABLE Experiment (
    exp_id int,
    project_id int,
    description varchar(2000),
    …
);
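The one-to-many relationship between Project and Experiment can be tried out with the sketch below. It is an illustration under assumptions, not the course's schema: it runs on sqlite3 instead of MySQL, it omits the columns elided in the slide, and the PRIMARY KEY and REFERENCES constraints, as well as the example rows, are additions for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Project (
    project_id  INTEGER PRIMARY KEY,      -- assumed key, not in the slide
    description VARCHAR(2000)
);
CREATE TABLE Experiment (
    exp_id      INTEGER PRIMARY KEY,      -- assumed key, not in the slide
    project_id  INTEGER NOT NULL REFERENCES Project(project_id),
    description VARCHAR(2000)
);
""")

# One project with two experiments (made-up rows).
conn.execute("INSERT INTO Project VALUES (1, 'Breast cancer proteomics')")
conn.executemany(
    "INSERT INTO Experiment VALUES (?, ?, ?)",
    [(10, 1, "Pilot run"), (11, 1, "Replicate run")],
)

# All experiments that belong to project 1.
experiments = conn.execute(
    "SELECT e.exp_id, e.description FROM Experiment e "
    "JOIN Project p ON p.project_id = e.project_id "
    "WHERE p.project_id = ? ORDER BY e.exp_id",
    (1,),
).fetchall()
print(experiments)

# An experiment pointing at a project that does not exist is rejected.
try:
    conn.execute("INSERT INTO Experiment VALUES (12, 99, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Each experiment row carries exactly one project_id, while any number of experiment rows may share the same value; that is what makes the relationship one-to-many.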

Example Database Schema: Many-to-Many

CREATE TABLE Experiment (
    exp_id int,
    project_id int,
    description varchar(2000),
    …
);

CREATE TABLE Sample (
    sample_id int,
    description varchar(2000),
    …
);

CREATE TABLE Aliquot (
    aliquot_id int,
    exp_id int,
    sample_id int,
    …
);
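In the many-to-many case, the Aliquot table links experiments and samples: each aliquot row pairs one experiment with one sample. The sketch below shows how such a linking table might be queried in both directions. As assumptions for illustration, it uses sqlite3 instead of MySQL, leaves out the elided columns, adds key constraints that the slide does not show, and uses made-up rows and helper function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key checks in SQLite

conn.executescript("""
CREATE TABLE Experiment (
    exp_id      INTEGER PRIMARY KEY,
    project_id  INTEGER,
    description VARCHAR(2000)
);
CREATE TABLE Sample (
    sample_id   INTEGER PRIMARY KEY,
    description VARCHAR(2000)
);
CREATE TABLE Aliquot (
    aliquot_id  INTEGER PRIMARY KEY,
    exp_id      INTEGER NOT NULL REFERENCES Experiment(exp_id),
    sample_id   INTEGER NOT NULL REFERENCES Sample(sample_id)
);
""")

conn.executemany(
    "INSERT INTO Experiment VALUES (?, ?, ?)",
    [(1, 1, "Global proteome"), (2, 1, "Phosphoproteome")],
)
conn.executemany(
    "INSERT INTO Sample VALUES (?, ?)",
    [(100, "Tumor A"), (101, "Tumor B")],
)
# Sample 100 is used in both experiments; experiment 1 uses both samples.
conn.executemany(
    "INSERT INTO Aliquot VALUES (?, ?, ?)",
    [(1000, 1, 100), (1001, 1, 101), (1002, 2, 100)],
)

def samples_in_experiment(conn, exp_id):
    """Descriptions of all samples with an aliquot in the given experiment."""
    rows = conn.execute(
        "SELECT s.description FROM Sample s "
        "JOIN Aliquot a ON a.sample_id = s.sample_id "
        "WHERE a.exp_id = ? ORDER BY s.sample_id",
        (exp_id,),
    ).fetchall()
    return [r[0] for r in rows]

def experiments_using_sample(conn, sample_id):
    """Descriptions of all experiments that received an aliquot of the sample."""
    rows = conn.execute(
        "SELECT e.description FROM Experiment e "
        "JOIN Aliquot a ON a.exp_id = e.exp_id "
        "WHERE a.sample_id = ? ORDER BY e.exp_id",
        (sample_id,),
    ).fetchall()
    return [r[0] for r in rows]

print(samples_in_experiment(conn, 1))
print(experiments_using_sample(conn, 100))
```

Neither Experiment nor Sample stores a reference to the other; all pairings live in Aliquot, so adding a new pairing is a single insert into that table.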

Methodological Foundations of Biomedical Informatics (BMSC-GA 4449)

  Lecture 1 Introduction (September 1, 2014, TRB 718, 5pm)
  Lecture 2 Scientific Programming (September 8, 2014, TRB 718, 5pm)
  Lecture 3 Algorithms (September 15, 2014, TRB 718, 5pm)
  Lecture 4 Statistics (September 22, 2014, TRB 718, 5pm)
  Lecture 5 Linear Algebra (September 29, 2014, TRB 718, 5pm)
  Lecture 6 Optimization (October 6, 2014, TRB 718, 5pm)
  Lecture 7 Data Visualization (October 13, 2014, TRB 718, 5pm)
  Lecture 8 Experimental Design (October 20, 2014, TRB 718, 5pm)
  Lecture 9 Machine Learning (October 27, 2014, TRB 718, 5pm)
  Lecture 10 Information Retrieval (November 3, 2014, TRB 718, 5pm)
  Lecture 11 Signal Processing (November 10, 2014, TRB 718, 5pm)
  Lecture 12 Pathways and Networks (November 17, 2014, TRB 718, 5pm)
  Lecture 13 Modeling and Simulation (November 24, 2014, TRB 718, 5pm)
  Lecture 14 Project Presentation (December 15, 2014, TRB 718, 5pm)

Lecture 2 Scientific Programming

  Programming Languages: Python, R, MATLAB and JavaScript
  Databases: SQL and NoSQL
  High-Performance Computing
  Version control: Git

Lecture 3 Algorithms

Lecture 4 Statistics

Lecture 5 Linear Algebra

Lecture 6 Optimization

Lecture 7 Data Visualization

The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumors. Nature 490(7418):61-70.

Lecture 8 Experimental Design Experimental Design by Christine Ambrosino

Lecture 9 Machine Learning

Lecture 10 Information Retrieval

Lecture 11 Signal Processing

Lecture 12 Pathways and Networks

Lecture 13 Modeling and Simulation

[Figure: Experiment vs. Simulation; axis: Chromosomal coordinate (kb)]