Peter Fox Data Science – ITEC/CSCI/ERTH-6961-01 Week 6, October 5, 2010 Introduction to Data Mining.

Presentation transcript:

Contents
Data Mining definitions: what it is, is not, types
Distributed applications – modern data mining
Science example
A specific toolkit set of examples (next week)
–Classifier
–Image analysis – clouds

Data Mining – What it is
Extracting knowledge from large amounts of data
Motivation
–Our ability to collect data has expanded rapidly
–It is impossible to analyze all of the data manually
–Data contains valuable information that can aid in decision making
Uses techniques from:
–Pattern Recognition
–Machine Learning
–Statistics
–High Performance Database Systems
–OLAP
Plus techniques unique to data mining (Association rules)
Data mining methods must be efficient and scalable

Data Mining – What it isn’t
Small Scale
–Data mining methods are designed for large data sets
–Scale is one of the characteristics that distinguishes data mining applications from traditional machine learning applications
Foolproof
–Data mining techniques will discover patterns in any data
–The patterns discovered may be meaningless
–It is up to the user to determine how to interpret the results
–“Make it foolproof and they’ll just invent a better fool”
Magic
–Data mining techniques cannot generate information that is not present in the data
–They can only find the patterns that are already there
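The "Foolproof" point above can be illustrated with a minimal sketch: purely random series, searched exhaustively, will still yield a strongly correlated pair by chance. The function names, the number of series, their length and the seed are all illustrative assumptions, not part of the lecture.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def strongest_chance_correlation(n_series=50, length=10, seed=0):
    """Search all pairs of random series for the largest |r|.

    The series are independent noise, so any strong correlation
    found here is a meaningless pattern (illustrative parameters).
    """
    rng = random.Random(seed)
    series = [[rng.random() for _ in range(length)] for _ in range(n_series)]
    best = 0.0
    for i in range(n_series):
        for j in range(i + 1, n_series):
            r = pearson(series[i], series[j])
            if abs(r) > abs(best):
                best = r
    return best

if __name__ == "__main__":
    print("strongest correlation found in pure noise:",
          round(strongest_chance_correlation(), 3))
```

With 50 short series there are 1225 pairs to test, so a large |r| is expected even though no real relationship exists. Interpreting such a result is left to the user, as the slide notes.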

Data Mining – Types of Mining
Classification (Supervised Learning)
–Classifiers are created using labeled training samples
–Training samples created by ground truth / experts
–Classifier later used to classify unknown samples
Clustering (Unsupervised Learning)
–Grouping objects into classes so that similar objects are in the same class and dissimilar objects are in different classes
–Discover overall distribution patterns and relationships between attributes
Association Rule Mining
–Initially developed for market basket analysis
–Goal is to discover relationships between attributes
–Uses include decision support, classification and clustering
Other Types of Mining
–Outlier Analysis
–Concept / Class Description
–Time Series Analysis
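As a rough illustration of association rule mining on market-basket data, the sketch below computes support and confidence for single-item rules. The transactions, thresholds and function names are made up for the example. It is a brute-force enumeration, not an efficient algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue  # the pair is too rare to be interesting
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

if __name__ == "__main__":
    for lhs, rhs, s, c in pair_rules(transactions):
        print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Support filters out rare item combinations; confidence measures how often the right-hand item appears when the left-hand item does. Enumerating all pairs does not scale, which is one reason the slides stress efficient and scalable methods.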

Data Mining in the ‘new’ Distributed Data/Services Paradigm

Science Motivation
Study the impact of natural iron fertilization processes, such as dust storms, on plankton growth and subsequent DMS production
–Plankton plays an important role in the carbon cycle
–Plankton growth is strongly influenced by nutrient availability (Fe/Ph)
–Dust deposition is an important source of Fe over the ocean
–Satellite data is an effective tool for monitoring the effects of dust fertilization

Hypothesis
In remote ocean locations there is a positive correlation between the area-averaged atmospheric aerosol loading and oceanic chlorophyll concentration
There is a time lag between oceanic dust deposition and the photosynthetic activity
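One way such a hypothesis could be tested is a lagged correlation between two area-averaged time series. The sketch below is only an illustration: the aerosol (AOT) and chlorophyll values are synthetic, the two-step lag is built in by construction, and the function names are assumptions rather than the method used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(aot, chl, max_lag):
    """Correlate aot[t] with chl[t + lag] for each lag in 0..max_lag."""
    if len(aot) != len(chl):
        raise ValueError("series must have the same length")
    return {
        lag: pearson(aot[:len(aot) - lag], chl[lag:])
        for lag in range(max_lag + 1)
    }

# Synthetic area-averaged aerosol optical thickness (illustrative values).
aot = [0.1, 0.5, 0.2, 0.9, 0.3, 0.7, 0.2, 0.4, 0.8, 0.1, 0.6, 0.3]
# Synthetic chlorophyll that responds to AOT two time steps later;
# the first two values are arbitrary because no earlier AOT exists.
chl = [0.3, 0.3] + [0.2 + 0.5 * a for a in aot[:-2]]

correlations = lagged_correlation(aot, chl, max_lag=4)
best_lag = max(correlations, key=lambda lag: correlations[lag])

if __name__ == "__main__":
    for lag, r in correlations.items():
        print(f"lag {lag}: r = {r:+.3f}")
    print("lag with the strongest positive correlation:", best_lag)
```

On real satellite series the peak would be much less clean, and a high correlation at some lag would not by itself establish that dust causes the chlorophyll response.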

Primary sources of ocean nutrients: wind-blown dust (Sahara), sediments from rivers, ocean upwelling

Factors modulating dust-ocean photosynthetic effect: Sahara dust, SST, clouds, nutrients, chlorophyll

Objectives
Use satellite data to determine whether atmospheric dust loading and phytoplankton photosynthetic activity are correlated
Determine the physical processes responsible for the observed relationship

Data and Method
Data sets obtained from SeaWiFS and MODIS during 2000 – 2006 are employed
MODIS derived AOT
[Figure labels: SeaWiFS, MODIS AOT]

The areas of study
1-Tropical North Atlantic Ocean
2-West coast of Central Africa
3-Patagonia
4-South Atlantic Ocean
5-South Coast of Australia
6-Middle East
7-Coast of China
8-Arctic Ocean
*Figure: annual SeaWiFS chlorophyll image for 2001

Tropical North Atlantic Ocean → dust from Sahara Desert
[Figure panels: Chlorophyll, AOT]

Arabian Sea → dust from Middle East
[Figure panels: Chlorophyll, AOT]

Summary …
Dust impacts ocean photosynthetic activity: positive correlations in some areas, NEGATIVE correlation in other areas, especially in the Saharan basin
Hypothesis for explaining observations of negative correlation: in areas that are not nutrient limited, dust reduces photosynthetic activity
But also need to consider the effect of clouds and ocean currents
Also need to isolate the effects of dust: the MODIS AOT product includes contributions from dust, DMS, biomass burning, etc.

What is next
Next week
–Remainder of Data mining
Reading
–Baker, Barton, Peterson and Fox