Data mining and statistical learning: lecture 1a Statistics and computer science for a data-rich world.

2020 Computing: Everything everywhere
Declan Butler, Nature, Vol 440, Issue no. 7083, 23 March 2006
  Computing is getting exponentially cheaper.
  Tiny computers that constantly monitor ecosystems, buildings and even human bodies could turn science on its head.

2020 Computing: Everything everywhere
Declan Butler, Nature, Vol 440, Issue no. 7083, 23 March 2006
  Science of the future: researchers can keep a constant eye on the flow of a Norwegian glacier by tracking miniature sensors buried beneath the ice.

Examples of huge databases
  Transaction databases
  Customer relations databases
  Electronic health records (patient information)
  Records of phone calls and website visits
  Security information
  Weather and climate data
  Astrophysics data
  Particle accelerator data

Emerging Database Infrastructure
  2001: The National Virtual Observatory project gets under way in the United States, developing methods for mining huge astronomical data sets.
  2001: The US National Institutes of Health launches the Biomedical Informatics Research Network (BIRN), a grid of supercomputers designed to let multiple institutions share data.
  2007: INSPIRE (The INfrastructure for SPatial InfoRmation in Europe). The INSPIRE initiative intends to trigger the creation of a European spatial information infrastructure that delivers integrated spatial information services to its users.
  2007: CERN's Large Hadron Collider in Switzerland, the world's largest particle accelerator, is slated to come online. The flood of data it delivers will demand more processing power than ever before.

The future of scientific computing
Nature, Vol 440, Issue no. 7083, 23 March 2006
  Science will increasingly be done directly in the database, finding relationships among existing data, while someone else performs the data-collecting role.
  This means that scientists will have to understand computer science much the same way as they previously had to understand mathematics: as a basic tool with which to do their jobs.

2020 Computing: Everything everywhere
Declan Butler, Nature, Vol 440, Issue no. 7083, 23 March 2006
  In the medical sciences, researchers will be able to mine up-to-the-minute databases instead of painstakingly collecting their own data.
  The understanding of diseases and the efficacy of treatments will be dissected by ceaselessly monitoring huge clinical populations.
  It will be a very different way of thinking, sifting through the data to find patterns.

A two-way street to science’s future
Ian Foster, Nature, Vol 440, Issue no. 7083, 23 March 2006
  Science is increasingly about information: its collection, organization and transformation.
  George Djorgovski: “Applied computer science is now playing the role which mathematics did from the seventeenth through the twentieth centuries: providing an orderly, formal framework and exploratory apparatus for other sciences.”
  Science is becoming less reductionist and more integrative.

Science in an exponential world
Alexander Szalay and Jim Gray, Nature, Vol 440, Issue no. 7083, 23 March 2006
  Increasingly, scientists are analysing complex systems that require data to be combined from several groups and even several disciplines.
  Important discoveries are made by scientists and teams who combine different skill sets: not just biologists, physicists and chemists, but also computer scientists, statisticians and data-visualization experts.

Exceeding human limits
Stephen H. Muggleton, Nature, Vol 440, Issue no. 7083, 23 March 2006
  A single high-throughput experiment in biology can easily generate more than a gigabyte of data per day.
  It is clear that the future of science involves the expansion of automation in all its aspects: data collection, storage of information, hypothesis formation and experimentation.
  We are seeing a range of techniques from mathematics, statistics and computer science being used to create scientific models from empirical data in an increasingly automated way.
  But there is a severe danger that increases in the speed and volume of data generation could lead to decreases in comprehensibility!

Visual Analytics
  Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse.
  The design of the tools and the techniques is based on cognitive, design, and perceptual principles.
  Illuminating the Path: The Research and Development Agenda for Visual Analytics

Organizing Undergraduate and Graduate Training
  It is important to realize that today’s graduate students need formal training in areas beyond their central discipline: they need to know some data management, computational concepts and statistical techniques.

Key competences
  Artificial intelligence and machine learning
  Databases and data warehousing
  Statistics for prediction, classification, and assessment of data quality
  Visual analytics
  Scientific computing

The science of statistics in a data-rich world
  Decreasing interest                Increasing interest
  Hypothesis testing                 Description and visualization
                                     Prediction and classification
  Theoretically derived estimators   Resampling techniques
                                     Simulation (MC, MCMC)
  Classical linear models            Generalized linear models
                                     Generalized additive models
                                     Neural networks
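
The contrast between theoretically derived estimators and resampling techniques can be made concrete with a small sketch. The example below is not from the lecture: the simulated data, the function names and the choice of 2000 resamples are all assumptions made for illustration. It compares the textbook formula for the standard error of a mean with a bootstrap estimate, and then applies the same bootstrap routine to the median, for which no equally simple formula is available.

```python
import math
import random
import statistics


def theoretical_se_of_mean(sample):
    """Classical, theoretically derived standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def bootstrap_se(sample, estimator, n_resamples=2000, seed=0):
    """Bootstrap standard error of an arbitrary estimator.

    Draws n_resamples samples of the same size with replacement and
    returns the standard deviation of the estimator across them.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(estimates)


# Hypothetical data: 200 draws from a normal distribution (illustration only).
data_rng = random.Random(42)
data = [data_rng.gauss(10.0, 2.0) for _ in range(200)]

se_formula = theoretical_se_of_mean(data)
se_boot_mean = bootstrap_se(data, statistics.mean)
se_boot_median = bootstrap_se(data, statistics.median)

print(f"SE of mean, formula:     {se_formula:.4f}")
print(f"SE of mean, bootstrap:   {se_boot_mean:.4f}")
print(f"SE of median, bootstrap: {se_boot_median:.4f}")
```

For the mean, the two approaches should give similar numbers. The practical appeal of resampling is that the same routine works unchanged for the median or any other statistic, at the cost of more computation.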