Machine Learning Applications in Biological Classification of River Water Quality
Saso Dzeroski, Jasna Grbovic and William J. Walley
98419-548 조 동 연

Presentation transcript:

Machine Learning Applications in Biological Classification of River Water Quality
Saso Dzeroski, Jasna Grbovic and William J. Walley
조 동 연

Contents
- Introduction
- Learning Rules for Biological Classification of British Rivers
  - The Data
  - The Experiment
- Analysis of Data about Slovenian Rivers
  - The Influence of Physical and Chemical Parameters on Selected Organisms
  - Biological Classification
- Discussion

Introduction
- Indicator Organisms (Bioindicators)
  - Given a biological sample, information on the presence and density of all indicator organisms in the sample is usually combined to derive a biological index that reflects the quality of the water at the site where the sample was taken.
- Saprobic Index
  - The main problem: subjectivity
- The subjectivity introduced at intermediate levels can and should be minimized.

Learning Rules for Biological Classification of British Rivers
Data
- 292 samples of 80 benthic macroinvertebrate families
- Abundance of animals, coded per family:
  - 0: no members of the particular family
  - 1: 1-2
  - 2: 3-9
  - 3:
  - 4:
  - 5:
  - 6: more than 1000
- Sparse matrix
- Five classes

Experiment 1
- Modified CN2 algorithm
  - Measure the relative information score
  - Use the m-estimate instead of the Laplace estimate
- The rules were required to be highly significant (99%).
- 15 different values of m were tried (0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024).
- Criteria
  - Information score
  - Accuracy
  - Smaller value of the parameter m
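The difference between the two probability estimates named above can be sketched as follows. This is a minimal illustration using the standard textbook definitions of the Laplace estimate and the m-estimate, not the modified CN2 code used in the study; the function names and the example counts (a rule covering 25 examples, 20 of them in the predicted class, with a class prior of 0.2) are assumptions chosen only for illustration.

```python
# Values of m listed on the slide.
M_VALUES = [0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]


def laplace_estimate(n_class, n_total, n_classes):
    """Laplace estimate: add one virtual example per class."""
    return (n_class + 1) / (n_total + n_classes)


def m_estimate(n_class, n_total, prior, m):
    """m-estimate: add m virtual examples distributed according to the prior."""
    if n_total + m == 0:
        # No real or virtual evidence: fall back to the prior.
        return prior
    return (n_class + m * prior) / (n_total + m)


if __name__ == "__main__":
    # Hypothetical rule: covers 25 examples, 20 in the predicted class,
    # five classes, prior probability of the predicted class 0.2.
    covered, correct, prior, k = 25, 20, 0.2, 5
    print("Laplace:", laplace_estimate(correct, covered, k))
    for m in M_VALUES:
        print(f"m = {m}: {m_estimate(correct, covered, prior, m):.3f}")
```

With m = 0 the m-estimate reduces to the relative frequency, and larger m pulls the estimate toward the prior. When the prior is uniform and m equals the number of classes, the m-estimate coincides with the Laplace estimate.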

Result 1
- 12 rules, m = 32
- 83% accuracy on the training set, 75% information content
- Each rule covered 25 examples and contained 5 conditions.
- The expert's conclusions confirmed the rules.

Experiment 2
- The main criticism was that the rules use only a small number of taxa, whereas the expert takes into account the whole community.
- Six additional attributes
  - MoreThan0, MoreThan1, …, MoreThan5
  - reflect the number of families

Result 2
- 13 rules, m = 64
- Accuracy 84%, information content 80%
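One way the six additional attributes could be derived from a sample is sketched below. The slide only says that they "reflect the number of families", so the interpretation used here (MoreThanK counts the families whose abundance level in the sample is greater than K) is an assumption, as are the function name and the example sample.

```python
def more_than_attributes(abundances, n_attributes=6):
    """Count, for K = 0 .. n_attributes - 1, the families whose abundance
    level (0-6) in one sample is greater than K.

    Assumed interpretation of MoreThan0 .. MoreThan5; not confirmed by the slides.
    """
    return {
        f"MoreThan{k}": sum(1 for level in abundances if level > k)
        for k in range(n_attributes)
    }


if __name__ == "__main__":
    # Hypothetical sample with six families instead of the full 80.
    sample = [0, 3, 1, 0, 6, 2]
    print(more_than_attributes(sample))
```

Attributes of this kind summarise the whole community in a sample, which is the point of the criticism this experiment responds to.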

Experiment 3
- 195 training examples, 97 test examples
- Obvious performance improvement from the original to the extended problem.

Analysis of Data about Slovenian Rivers
Data
- 4 years ( )
- Biological samples are taken twice a year (summer, winter).
- Physical and chemical analyses are performed several times a year for each sampling site.
- 698 water examples
- Training (70% of cases), test (30% of cases)

The Influence of Physical and Chemical Parameters on Selected Organisms
- From an ecological and water quality point of view, this is an important research topic.
- Binary classification: Present / Absent
- Attributes
  - Plants: Hardness, NO2, NO3, NH4, PO4, SiO2, Fe, Detergents, COD, BOD
  - Animals: Temperature, pH, O2, Saturation, COD, BOD

Result
- Accuracy: 66% - 85%
- Information score: 23% - 50%
- rules for each taxon
- The average rule length was less than 5 conditions.
- Average rule coverage was 15 to 45 examples.

Nitzschia palea Elmis sp.

Biological Classification
- 13 physical and chemical parameters
- 27 bioindicators
- 7 classes
- The majority class comprises 339 of the 698 examples, thus the default accuracy is 48.6%.
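The default accuracy quoted above is the accuracy of a classifier that always predicts the majority class. A minimal sketch of that calculation follows; the majority count (339 of 698) comes from the slide, while the way the remaining 359 examples are spread over the other six classes is hypothetical and only there to make the example runnable.

```python
from collections import Counter


def default_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


if __name__ == "__main__":
    # Class 0 is the majority class (339 examples, from the slide).
    # The other 359 examples are spread evenly over classes 1-6 (hypothetical).
    labels = [0] * 339 + [(i % 6) + 1 for i in range(359)]
    print(f"{100 * default_accuracy(labels):.1f}%")
```

Any induced rule set is only useful for this task if its accuracy on unseen examples exceeds this baseline.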

Discussion
We have described several applications of rule induction in the domain of biological water quality classification.
- The produced rules are transparent and can be easily understood by experts.
- The induced rules contained valuable knowledge about the domain studied.
- Machine learning techniques can be useful tools for classification and data analysis in the domain of river water quality and other ecological domains.