Machine Learning Applications in Biological Classification of River Water Quality Saso Dzeroski, Jasna Grobovic and William J. Walley 98419-548 조 동 연.


1 Machine Learning Applications in Biological Classification of River Water Quality Saso Dzeroski, Jasna Grobovic and William J. Walley 98419-548 조 동 연

2 Contents
Introduction
Learning Rules for Biological Classification of British Rivers
- The Data
- The Experiment
Analysis of Data about Slovenian Rivers
- The Influence of Physical and Chemical Parameters on Selected Organisms
- Biological Classification
Discussion

3 Introduction
Indicator Organisms (Bioindicators)
- Given a biological sample, information on the presence and density of all indicator organisms in the sample is usually combined to derive a biological index that reflects the quality of the water at the site where the sample was taken.
Saprobic Index
- The main problem: subjectivity
The subjectivity introduced at intermediate levels can and should be minimized.

4 Learning Rules for Biological Classification of British Rivers
Data
- 292 samples covering 80 families of benthic macroinvertebrates
- Abundance of animals, coded per family:
  - 0: no members of the particular family
  - 1: 1-2
  - 2: 3-9
  - 3: 10-49
  - 4: 50-99
  - 5: 100-999
  - 6: more than 1000
- Sparse matrix
- Five classes
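
The abundance coding above can be sketched as a small lookup. This is only an illustration of the scale on the slide, not code from the study. The function name and the sample counts are hypothetical, and the treatment of a count of exactly 1000 (coded here as level 6) is an assumption, since the slide lists "100-999" and "more than 1000".

```python
def abundance_level(count: int) -> int:
    """Map a raw count of animals in one family to the 0-6 abundance code."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Inclusive upper bounds for levels 0..5, as listed on the slide.
    upper_bounds = [0, 2, 9, 49, 99, 999]
    for level, bound in enumerate(upper_bounds):
        if count <= bound:
            return level
    # Assumption: exactly 1000 is also coded as level 6.
    return 6


# Hypothetical raw counts for three families in one sample.
sample_counts = {"family_a": 0, "family_b": 7, "family_c": 150}
coded = {name: abundance_level(n) for name, n in sample_counts.items()}
print(coded)
```

Because most families are absent from any given sample, most coded values are 0, which is why the slide describes the data as a sparse matrix.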

5 Experiment 1
- Modified CN2 algorithm
- Measure the relative information score
- Use the m-estimate instead of the Laplace estimate
- The rules were required to be highly significant (99%).
- 15 different values of m were tried (0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024).
- Criteria for choosing among the resulting rule sets:
  - Information score
  - Accuracy
  - Smaller value of the parameter m
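
To make the difference between the two estimates concrete, here is a minimal sketch of the standard Laplace estimate and m-estimate of a rule's class probability. The function names and the example numbers (a rule covering 25 examples, 20 of them in the predicted class, and a uniform prior of 0.2 over five classes) are assumptions for illustration. The slides do not give the actual class priors or the details of the CN2 modification.

```python
def laplace_estimate(n_class: int, n_covered: int, n_classes: int) -> float:
    """Laplace estimate: (n_c + 1) / (n + k) for k classes."""
    return (n_class + 1) / (n_covered + n_classes)


def m_estimate(n_class: int, n_covered: int, prior: float, m: float) -> float:
    """m-estimate: (n_c + m * prior) / (n + m).

    m controls how strongly the estimate is pulled toward the prior.
    """
    if n_covered + m == 0:
        raise ValueError("n_covered + m must be positive")
    return (n_class + m * prior) / (n_covered + m)


# Hypothetical rule: covers 25 examples, 20 of them in the predicted class.
n_class, n_covered = 20, 25
prior = 0.2  # assumed uniform prior over five classes

# The 15 values of m listed on the slide.
for m in (0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024):
    print(m, round(m_estimate(n_class, n_covered, prior, m), 3))
```

With m = 0 the m-estimate reduces to the relative frequency (20/25 here), and as m grows it approaches the prior. With a uniform prior of 1/k and m = k it coincides with the Laplace estimate, so the m-estimate can be read as a generalization that allows non-uniform priors and a tunable amount of smoothing.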

6 Result 1
- 12 rules, m = 32
- 83% accuracy on the training set, 75% information content
- Each rule covered 25 examples and contained 5 conditions.
- The expert's conclusions confirmed the rules.

7 Experiment 2
- The main criticism was that the rules use only a small number of taxa, whereas the expert takes into account the whole community.
- Six additional attributes
  - MoreThan0, MoreThan1, …, MoreThan5
  - They reflect the number of families.
Result 2
- 13 rules, m = 64
- Accuracy 84%, information content 80%
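
One plausible reading of the six additional attributes, sketched below, is that MoreThanK counts the families in a sample whose abundance level is greater than K. The slide only says that the attributes "reflect the number of families", so this interpretation, the function name, and the sample vector are all assumptions.

```python
from typing import Dict, Sequence


def more_than_attributes(abundances: Sequence[int]) -> Dict[str, int]:
    """Count families whose abundance level exceeds each threshold 0..5.

    Assumed interpretation: MoreThanK = number of families with level > K.
    """
    return {
        f"MoreThan{k}": sum(1 for level in abundances if level > k)
        for k in range(6)
    }


# Hypothetical sample: abundance levels (0-6) for seven families.
sample = [0, 0, 1, 3, 6, 2, 0]
print(more_than_attributes(sample))
```

Under this reading, MoreThan0 is simply the number of families present, so the attributes give the rule learner a summary of the whole community rather than of individual taxa.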

8 Experiment 3
- 195 training examples, 97 test examples
- Obvious performance improvement from the original to the extended problem.

9 Analysis of Data about Slovenian Rivers
Data
- 4 years (1990 - 1993)
- Biological samples are taken twice a year (summer, winter).
- Physical and chemical analyses are performed several times a year for each sampling site.
- 698 water examples
- Training (70% - 489 cases), test (30% - 209 cases)
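
The 70/30 split above can be reproduced in size with a simple shuffled index split. The slides do not say how the examples were actually assigned to the two sets, so the random shuffle, the seed, and the function name are assumptions. Only the sizes (489 and 209 out of 698) come from the slide.

```python
import random
from typing import List, Tuple


def split_indices(
    n_examples: int, train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle example indices and split them into training and test sets."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_examples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(698)
print(len(train_idx), len(test_idx))
```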

10 The Influence of Physical and Chemical Parameters on Selected Organisms
- From an ecological and water quality point of view, this is an important research topic.
- Binary classification: present / absent
- Attributes
  - Plants: hardness, NO2, NO3, NH4, PO4, SiO2, Fe, detergents, COD, BOD
  - Animals: temperature, pH, O2, saturation, COD, BOD

11

12 Result
- Accuracy: 66% - 85%
- Information score: 23% - 50%
- 10 - 20 rules for each taxon
- The average rule length was less than 5 conditions.
- Average rule coverage was 15 to 45 examples.

13 Nitzschia palea Elmis sp.

14 Biological Classification
- 13 physical and chemical parameters
- 27 bioindicators
- 7 classes
- The majority class comprises 339 of the 698 examples, thus the default accuracy is 48.6%.
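
The default accuracy is the accuracy of always predicting the majority class, here 339 / 698. The sketch below checks that arithmetic. Only the majority count and the total come from the slide; the sizes of the other six classes and the class names are made up so that the labels sum to 698.

```python
from collections import Counter
from typing import Sequence


def default_accuracy(labels: Sequence[str]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)


# 339 majority-class examples (from the slide) plus six hypothetical
# minority classes whose sizes are invented to total 698 examples.
class_sizes = {"majority": 339, "c2": 60, "c3": 60, "c4": 60,
               "c5": 60, "c6": 60, "c7": 59}
labels = [name for name, size in class_sizes.items() for _ in range(size)]

print(f"{default_accuracy(labels):.1%}")
```

Any learned rule set for the seven-class problem has to beat this baseline to be useful.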

15

16 Discussion
We have described several applications of rule induction in the domain of biological water quality classification.
- The produced rules are transparent and can be easily understood by experts.
- The induced rules contained valuable knowledge about the domain studied.
- Machine learning techniques can be useful tools for classification and data analysis in the domain of river water quality and other ecological domains.

