Published by Ronald Morrison. Modified over 9 years ago.
1
Machine Learning Applications in Biological Classification of River Water Quality
Saso Dzeroski, Jasna Grbovic and William J. Walley
98419-548 조 동 연
2
Contents
  Introduction
  Learning Rules for Biological Classification of British Rivers
    The Data
    The Experiment
  Analysis of Data about Slovenian Rivers
    The Influence of Physical and Chemical Parameters on Selected Organisms
    Biological Classification
  Discussion
3
Introduction
  Indicator organisms (bioindicators)
    Given a biological sample, information on the presence and density of all indicator organisms in the sample is usually combined to derive a biological index that reflects the quality of the water at the site where the sample was taken.
    Saprobic Index
  The main problem: subjectivity
    The subjectivity introduced at intermediate levels can and should be minimized.
4
Learning Rules for Biological Classification of British Rivers
Data
  292 samples of 80 families of benthic macroinvertebrates
  Abundance of animals, coded on a seven-level scale
    0: no members of the particular family
    1: 1-2
    2: 3-9
    3: 10-49
    4: 50-99
    5: 100-999
    6: more than 1000
  Sparse matrix
  Five classes
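The abundance coding above can be sketched in a few lines of Python. This is only an illustration: the function name `abundance_level` and the list `LEVEL_LOWER_BOUNDS` are hypothetical, and because the slide leaves a count of exactly 1000 unassigned, the sketch assumes it belongs to level 6.

```python
from bisect import bisect_right

# Lower bounds of abundance levels 1..6, read off the scale on the slide.
# Assumption: a count of exactly 1000 is treated as level 6.
LEVEL_LOWER_BOUNDS = [1, 3, 10, 50, 100, 1000]


def abundance_level(count: int) -> int:
    """Map a raw count of animals in one family to an abundance level 0..6."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # bisect_right returns how many lower bounds are <= count,
    # which is exactly the level index.
    return bisect_right(LEVEL_LOWER_BOUNDS, count)


# Hypothetical sample: raw counts for five families.
sample_counts = [0, 2, 7, 120, 4000]
sample_levels = [abundance_level(c) for c in sample_counts]
```

Since most families are absent from any given sample, most coded values are 0, which is why the data form a sparse matrix.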
5
Experiment 1
  Modified CN2 algorithm
    Measure the relative information score
    Use the m-estimate instead of the Laplace estimate
  The rules were required to be highly significant (99%).
  15 different values of m were tried (0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024).
  Criteria
    Information score
    Accuracy
    Smaller value of the parameter m
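For readers unfamiliar with the two estimates named above, here is a minimal sketch of the standard textbook definitions. The slides do not give the formulas, so the function names and the example counts are assumptions made for illustration and are not the authors' implementation.

```python
def laplace_estimate(n_class: int, n_covered: int, n_classes: int) -> float:
    """Laplace estimate of rule accuracy: (n_c + 1) / (n + k)."""
    return (n_class + 1) / (n_covered + n_classes)


def m_estimate(n_class: int, n_covered: int, prior: float, m: float) -> float:
    """m-estimate of rule accuracy: (n_c + m * p_a) / (n + m).

    prior is the prior probability p_a of the predicted class;
    m controls how strongly the estimate is pulled towards that prior.
    """
    if n_covered + m == 0:
        return prior  # no evidence and no smoothing: fall back to the prior
    return (n_class + m * prior) / (n_covered + m)


# The 15 values of m listed on the slide.
M_VALUES = [0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# Hypothetical rule: covers 25 examples, 20 of them in the predicted class,
# with a class prior of 0.2 (five equally likely classes).
estimates = {m: m_estimate(20, 25, 0.2, m) for m in M_VALUES}
```

With m = 0 the estimate is the plain relative frequency (0.8 here); larger values of m shrink it towards the prior, which penalizes rules that cover few examples. With a uniform prior and m equal to the number of classes, the m-estimate coincides with the Laplace estimate.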
6
Result 1
  12 rules, m = 32
  83% accuracy on the training set, 75% information content
  Each rule covered 25 examples and contained 5 conditions.
  The expert’s conclusions confirmed the rules.
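The slides do not define the information content figure. Assuming it refers to the relative information score of Kononenko and Bratko, which is commonly used with CN2, a rough sketch looks like this. All function names and the toy numbers are hypothetical.

```python
import math


def information_score(prior: float, posterior: float) -> float:
    """Information (in bits) gained on one example.

    prior is the prior probability of the example's true class and
    posterior is the probability the classifier assigned to it.
    Assumes 0 < prior < 1 and 0 <= posterior <= 1.
    """
    if posterior >= prior:
        return -math.log2(prior) + math.log2(posterior)
    # The classifier lowered the probability of the true class: negative score.
    return -(-math.log2(1 - prior) + math.log2(1 - posterior))


def prior_entropy(priors: list[float]) -> float:
    """Entropy (in bits) of the prior class distribution."""
    return -sum(p * math.log2(p) for p in priors if p > 0)


def relative_information_score(scores: list[float], priors: list[float]) -> float:
    """Average information score as a percentage of the prior entropy."""
    return 100 * (sum(scores) / len(scores)) / prior_entropy(priors)


# Toy two-class problem with equal priors and three classified examples.
toy_priors = [0.5, 0.5]
toy_scores = [information_score(0.5, p) for p in (1.0, 0.5, 1.0)]
toy_relative = relative_information_score(toy_scores, toy_priors)
```

Unlike plain accuracy, this measure rewards a classifier only for what it adds beyond the prior class distribution, so always predicting the majority class scores zero.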
7
Experiment 2
  The main criticism was that the rules use only a small number of taxa, whereas the expert takes into account the whole community.
  Six additional attributes
    MoreThan0, MoreThan1, …, MoreThan5 reflect the number of families
Result 2
  13 rules, m = 64
  Accuracy 84%, information content 80%
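One plausible reading of the six additional attributes is that MoreThanK counts the families whose abundance level in a sample exceeds K. The slide does not spell this out, so the sketch below rests on that assumption, and the helper name `more_than_attributes` is hypothetical.

```python
def more_than_attributes(levels: list[int]) -> dict[str, int]:
    """Derive MoreThan0..MoreThan5 from one sample's abundance levels.

    Assumption: MoreThanK is the number of families whose abundance
    level (0..6) is strictly greater than K.
    """
    return {
        f"MoreThan{k}": sum(1 for level in levels if level > k)
        for k in range(6)
    }


# Hypothetical sample with six families (a real sample would have 80).
example_levels = [0, 0, 1, 3, 6, 2]
example_attributes = more_than_attributes(example_levels)
```

Attributes of this kind summarize the whole community in a sample, which addresses the criticism that the original rules looked at only a few taxa.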
8
Experiment 3
  195 training examples, 97 test examples
  Obvious performance improvement from the original to the extended problem.
9
Analysis of Data about Slovenian Rivers
Data
  4 years (1990 - 1993)
  Biological samples are taken twice a year (summer, winter).
  Physical and chemical analyses are performed several times a year for each sampling site.
  698 water examples
    Training (70% - 489 cases), test (30% - 209 cases)
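The split sizes above follow from rounding 70% of 698 to the nearest whole case. A minimal sketch of such a split is shown below, assuming a simple random shuffle. The slides do not say how the cases were actually assigned, and `split_examples` is a hypothetical helper.

```python
import random


def split_examples(examples, train_fraction=0.7, seed=0):
    """Shuffle the examples and split them into training and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the 698 Slovenian examples: just their indices.
train_set, test_set = split_examples(range(698))
```

Here `len(train_set)` is 489 and `len(test_set)` is 209, matching the counts on the slide.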
10
The Influence of Physical and Chemical Parameters on Selected Organisms
  From an ecological and water quality point of view, this is an important research topic.
  Binary classification: present / absent
  Attributes
    Plants: Hardness, NO2, NO3, NH4, PO4, SiO2, Fe, Detergents, COD, BOD
    Animals: Temperature, pH, O2, Saturation, COD, BOD
12
Result
  Accuracy: 66% - 85%
  Information score: 23% - 50%
  10 - 20 rules for each taxon
  The average rule length was less than 5 conditions.
  Average rule coverage was 15 to 45 examples.
13
Nitzschia palea
Elmis sp.
14
Biological Classification
  13 physical and chemical parameters
  27 bioindicators
  7 classes
  The majority class comprises 339 of the 698 examples, thus the default accuracy is 48.6%.
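The default accuracy quoted above is the accuracy of always predicting the majority class. As a quick check of the arithmetic, here is a small sketch. The helper name is hypothetical, and the sizes of the six non-majority classes are invented for illustration, since the slide gives only the majority count.

```python
from collections import Counter


def default_accuracy(labels) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)


# 339 examples in the majority class (labelled 0 here); the remaining 359
# are spread over six other classes in made-up proportions.
labels = [0] * 339 + [1 + i % 6 for i in range(359)]
baseline_percent = round(100 * default_accuracy(labels), 1)
```

This gives 339 / 698, or about 48.6%, which is the baseline any induced rule set has to beat.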
16
Discussion
  We have described several applications of rule induction in the domain of biological water quality classification.
  The produced rules are transparent and can be easily understood by experts.
  The induced rules contained valuable knowledge about the domain studied.
  Machine learning techniques can be useful tools for classification and data analysis in the domain of river water quality and other ecological domains.