1 A fuzzy clustering approach to improve the accuracy of Italian students’ data: an experimental procedure to correct the impact of the outliers on assessment test scores
Claudio Quintano, Rosalia Castellano, Sergio Longobardi
University of Naples “Parthenope”
claudio.quintano@uniparthenope.it, lia.castellano@uniparthenope.it, sergio.longobardi@uniparthenope.it

2 OUTLINE
This work considers data on students’ performance assessments collected by the Italian National Evaluation Institute of the Ministry of Education (INVALSI).
– The problem: OUTLIER UNITS at class level, which bias the distributions of the average scores by class.
– The AIM: to MITIGATE THE IMPACT of outliers and correct the overestimation of children’s ability.
THE INVALSI SURVEY
– 3 AREAS: reading, mathematics and science
– 5 SCHOOL LEVELS: 2nd and 4th year of primary school; 1st year of lower secondary school; 1st and 3rd year of upper secondary school

3 DISTRIBUTIONS OF MEAN SCORES AT CLASS LEVEL (MATHEMATICS ASSESSMENT)
[Figure: distribution of the mathematics class mean score, school year 2004/05, in five panels: II class primary school, IV class primary school, I class lower secondary school, I class upper secondary school, III class upper secondary school.]

4 CLASS MEAN SCORE, II CLASS PRIMARY SCHOOL
[Figure: distribution of the class mean score in six panels: reading, mathematics and science for school years 2004/05 and 2005/06.]

5 STEP I
Deletion of micro units (students) considered as “pseudo non-respondents”: students who have not given the minimum number of answers needed to compute a performance score. The share of these units ranges from 9% to 16%.
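A minimal Python sketch of Step I, assuming each student record carries a list of item responses in which `None` marks an unanswered item. The `Student` class, the function name and the threshold `MIN_ANSWERS = 10` are hypothetical: the slides do not state the minimum number of answers required to compute a score.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical threshold: the slides do not state the minimum number of
# answers required to compute a performance score.
MIN_ANSWERS = 10


@dataclass
class Student:
    student_id: str
    class_id: str
    # One entry per administered item; None marks an unanswered item.
    answers: List[Optional[str]] = field(default_factory=list)


def drop_pseudo_non_respondents(students, min_answers=MIN_ANSWERS):
    """Split students into retained units and pseudo non-respondents."""
    kept, dropped = [], []
    for s in students:
        n_given = sum(a is not None for a in s.answers)
        (kept if n_given >= min_answers else dropped).append(s)
    return kept, dropped
```

The share of dropped units (9% to 16% in the INVALSI data) is then simply `len(dropped) / len(students)`.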

6 COMPUTATION OF CLASS-LEVEL INDICATORS
For each class of students, the following indexes are computed:
– Class mean score: $\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij}$, where $x_{ij}$ is the score of the $i$-th student of the $j$-th class and $n_j$ is the number of respondent students of the $j$-th class.
– Standard deviation of the mean score.
– Class non-response rate: $NR_j = \frac{1}{n_j Q}\sum_{i=1}^{n_j} m_{ij}$, where $m_{ij}$ is the number of both item non-responses and invalid responses of the $i$-th student of the $j$-th class and $Q$ is the number of items administered to the $j$-th class.
– Index of answers’ homogeneity: the mean of the Gini measures of heterogeneity $E_{sj}$ computed for each $s$-th test question administered to the students of the $j$-th class.
SUMMARY
In the first step, the micro units considered as “pseudo non-respondents” are dropped from the dataset; then the following indexes are computed at class level: class mean score, standard deviation of the mean score, class non-response rate and index of answers’ homogeneity.
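The class mean score, standard deviation and non-response rate can be sketched as follows. This is an illustration under assumptions, not the authors’ code: the function name and the `INVALID` marker are hypothetical, and the population standard deviation is used because the slides do not specify which variant was computed. The answers’ homogeneity index is left out of this sketch.

```python
import math

# Hypothetical marker for an invalid response (e.g. two options ticked).
INVALID = "INVALID"


def class_indicators(scores, answers, n_items):
    """Class-level indicators for one class j.

    scores  : performance scores x_ij of the n_j respondent students
    answers : per student, the list of responses
              (None = item non-response, INVALID = invalid response)
    n_items : Q, the number of items administered to the class
    """
    n_j = len(scores)
    mean = sum(scores) / n_j
    # Population standard deviation (assumption: the variant is not stated).
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n_j)
    # Sum over students of m_ij = item non-responses + invalid responses
    missing = sum(
        sum(1 for a in resp if a is None or a == INVALID) for resp in answers
    )
    nr_rate = missing / (n_j * n_items)
    return {"mean": mean, "sd": sd, "non_response_rate": nr_rate}
```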

7 PRINCIPAL COMPONENT ANALYSIS (PCA)
Through PCA, the answering behaviour of each class of students is described by two variables:
– First component, the “OUTLIERS IDENTIFICATION AXIS”: it contrasts (contraposition) the class mean score with the within-class variability of scores and answers.
– Second component, the “INDEX OF CLASS COLLABORATION TO THE SURVEY”: it is associated with the class non-response rate.
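A dependency-free sketch of the PCA step, assuming the four class indicators are standardized (i.e. PCA on the correlation matrix) and the first two eigenvectors are extracted by power iteration with deflation. The slides do not say whether the covariance or the correlation matrix was used, nor which software performed the analysis. The sign of each component is arbitrary, so the “negative” side of the outliers identification axis may need to be flipped to match the slides.

```python
import math
import random


def standardize(rows):
    """Z-scores of each column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column keeps sd = 1 to avoid division by zero
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def top_eigenvectors(mat, k=2, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    p = len(mat)
    m = [row[:] for row in mat]
    vals, vecs = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        vals.append(lam)
        vecs.append(v)
        # Deflation: remove the eigenpair just found before the next search
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vals, vecs


def pca_scores(rows, k=2):
    """Project rows (classes x indicators) on the first k principal components."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / n for b in range(p)] for a in range(p)]
    vals, vecs = top_eigenvectors(corr, k)
    scores = [[sum(r[i] * v[i] for i in range(p)) for v in vecs] for r in z]
    return scores, vals, vecs
```

Each row passed to `pca_scores` would hold the four indicators of one class (mean score, standard deviation, non-response rate, homogeneity index); the returned `scores` are the two factorial coordinates of each class.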

8 PRINCIPAL COMPONENT ANALYSIS (PCA)
The outlier classes of students can be detected graphically.
[Figure: projection of the II class primary school classes onto the plane of the first two factorial axes, with the OUTLIER CLASSES marked.]

9 THE FUZZY K-MEANS APPROACH
On the basis of the two factorial dimensions, the classes of students are classified into 8 clusters by a fuzzy k-means algorithm. The algorithm computes a fuzzy partition matrix in which, for each class of students (rows of the matrix), the degree of membership in each cluster (columns of the matrix) is given.
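The fuzzy k-means step can be sketched with the standard fuzzy c-means updates (Bezdek). The fuzzifier `m = 2`, the random initialization and the stopping rule are assumptions; the slides only state that 8 clusters were formed on the two factorial dimensions.

```python
import math
import random


def fuzzy_kmeans(points, k, m=2.0, iters=300, tol=1e-6, seed=0):
    """Fuzzy k-means (fuzzy c-means). Returns (centroids, u), where
    u[j][c] is the degree of membership of unit j in cluster c."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(k)]
        s = sum(row)
        u.append([x / s for x in row])
    exp = 2.0 / (m - 1.0)
    centroids = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by u^m
        centroids = []
        for c in range(k):
            w = [u[j][c] ** m for j in range(n)]
            sw = sum(w)
            centroids.append([sum(w[j] * points[j][d] for j in range(n)) / sw
                              for d in range(dim)])
        # Membership update: u_jc = 1 / sum_l (d_jc / d_jl)^(2 / (m - 1))
        new_u = []
        for p in points:
            dists = [math.dist(p, cen) for cen in centroids]
            if min(dists) == 0.0:
                # Unit coincides with a centroid: share membership among such centroids
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dc / dl) ** exp for dl in dists)
                              for dc in dists])
        shift = max(abs(a - b) for ru, rn in zip(u, new_u) for a, b in zip(ru, rn))
        u = new_u
        if shift < tol:
            break
    return centroids, u
```

Called as `fuzzy_kmeans(scores, k=8)` on the two factorial coordinates of each class, it would return the 8 centroids and the fuzzy partition matrix, one row per class.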

10 DETECTION OF OUTLIERS
[Figure: projection of the centroids computed by fuzzy k-means onto the factorial plane.]
The OUTLIER CLUSTER is characterized by:
– high negative scores on the “outliers identification axis” (x-axis), which indicate a high class average score and minimal within-class variability of both scores and test answers;
– factorial scores close to zero on the “index of class collaboration to the survey”.
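One way to pick the outlier cluster automatically is to apply the two criteria above literally to the centroids. The slides do not describe an automatic rule (in their run the outlier cluster is cluster n.2), so the function below and its tolerance `max_abs_second` are hypothetical. Because PCA signs are arbitrary, the first axis must be oriented so that negative values mean high scores and low variability.

```python
def find_outlier_cluster(centroids, max_abs_second=0.5):
    """Index a of the centroid with the most negative first-axis score
    among those whose second-axis score lies within +/- max_abs_second
    (a hypothetical tolerance)."""
    candidates = [c for c, (x, y) in enumerate(centroids) if abs(y) <= max_abs_second]
    if not candidates:
        raise ValueError("no centroid close to zero on the second axis")
    return min(candidates, key=lambda c: centroids[c][0])
```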

11 DETECTION OF OUTLIERS
Denoting the outlier cluster by $a$, the degree of membership of the $j$-th class in this cluster is $\mu_{ja}$. Equivalently, it can be interpreted as the “outlier level” of each class, and it is treated as the “outlier probability” of the $j$-th class.

12 CORRECTION PROCEDURE
On the basis of the degree of membership in the outlier cluster, a weighting factor is developed:
$W_j = 1 - \mu_{ja}$
where $W_j$ is the weighting factor and $\mu_{ja}$ is the outlier probability of the $j$-th class. $W_j$ ranges from 0 to 1: a class with a high probability of belonging to the outlier cluster receives a low weight, while a class very far from this cluster receives a weight close to 1.
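Given the fuzzy partition matrix and the outlier-cluster index $a$, the weights follow directly. The weighted mean below, in which each class mean score is weighted by $W_j$ alone and not also by class size, is an assumption about how the adjusted averages are formed; the function names are illustrative.

```python
def correction_weights(u, a):
    """W_j = 1 - mu_ja for each class j.

    u : fuzzy partition matrix, rows = classes, columns = clusters
    a : column index of the outlier cluster
    """
    return [1.0 - row[a] for row in u]


def weighted_mean(values, weights):
    """Mean of the class scores, each weighted by its correction factor W_j."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(v * w for v, w in zip(values, weights)) / total
```

A class with $\mu_{ja} = 1$ gets $W_j = 0$ and drops out of the weighted mean entirely, while classes far from the outlier cluster keep almost full weight.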

13 EFFECTS OF THE CORRECTION PROCEDURE
[Figure: original distribution vs. adjusted distribution.]

14 THE INSPIRATION PRINCIPLE
Go beyond the dichotomous logic (OUTLIER / NOT OUTLIER).
FUZZY APPROACH: compute an “OUTLIER LEVEL” measure for each unit to calibrate the correction.

15 RELATIONSHIP BETWEEN SCHOOL LOCATION AND THE PRESENCE OF OUTLIER CLASSES
[Figure: box plot of the outlier level $\mu_{ja}$, i.e. the degree of membership in the outlier cluster (cluster n.2).]

16 RELATIONSHIP BETWEEN SCHOOL LOCATION AND THE PRESENCE OF OUTLIER CLASSES
[Figure: class average score distributions for the northern and central regions only.]

17 REGIONAL SCORES
[Figure: regional scores, non-weighted average vs. weighted average.]
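Regional non-weighted and weighted averages can be compared with a small grouping routine. The record layout and the region labels used in any call are hypothetical; each weight is $W_j = 1 - \mu_{ja}$, and classes are weighted by $W_j$ only, which is an assumption.

```python
from collections import defaultdict


def regional_averages(records):
    """Non-weighted and weighted regional means of class mean scores.

    records : iterable of (region, class_mean_score, weight) tuples, where
              weight is the correction factor W_j = 1 - mu_ja of the class
    returns : {region: (non_weighted_mean, weighted_mean)}
    """
    # Per region: [sum of scores, number of classes, weighted sum, sum of weights]
    acc = defaultdict(lambda: [0.0, 0, 0.0, 0.0])
    for region, score, w in records:
        a = acc[region]
        a[0] += score
        a[1] += 1
        a[2] += w * score
        a[3] += w
    return {
        region: (s / n, ws / wt if wt > 0 else float("nan"))
        for region, (s, n, ws, wt) in acc.items()
    }
```

Regions where high-scoring classes also have high outlier levels show the largest gap between the two averages.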

18 INDEX OF ANSWERS’ HOMOGENEITY
The index is the mean of the $Q$ Gini indexes $E_{sj}$ computed for each $s$-th test question administered to the students of the $j$-th class:
$H_j = \frac{1}{Q}\sum_{s=1}^{Q} E_{sj}$
where $E_{sj}$ is a Gini measure of heterogeneity:
$E_{sj} = 1 - \sum_{t=1}^{h} p_{tsj}^2$
and $p_{tsj}$ denotes the proportion of students of the $j$-th class who gave the $t$-th answer to the $s$-th question. The Gini measure is equal to zero when all students of the $j$-th class give the same answer to the $s$-th question. It reaches its maximum value, $(h-1)/h$ (where $h$ is the number of alternative answers to the $s$-th question), when the answers to the $s$-th question in the $j$-th class are perfectly heterogeneous.
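The Gini measure and the index translate directly into code. This sketch assumes that only valid answers are passed in; the slides do not say how non-responses enter the proportions $p_{tsj}$. Function names are illustrative.

```python
from collections import Counter


def gini_heterogeneity(responses):
    """E_sj = 1 - sum_t p_tsj^2 for one question s in class j.

    responses : the answers given by the students of class j to question s
    """
    n = len(responses)
    counts = Counter(responses)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def answers_homogeneity_index(answer_matrix):
    """Mean of the Q Gini measures; answer_matrix[i][s] is the answer of
    student i to question s."""
    questions = list(zip(*answer_matrix))
    return sum(gini_heterogeneity(q) for q in questions) / len(questions)
```

For example, four students choosing four different options of a four-option question give $1 - 4 \cdot (1/4)^2 = 0.75 = (4-1)/4$, the maximum value stated above.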

19 EFFECTS OF THE CORRECTION PROCEDURE

                 Original distribution   Adjusted distribution
MEAN                     74.71                  71.67
MODE                    100.00                  68.75
I QUARTILE               64.42                  63.12
MEDIAN                   73.61                  71.09
III QUARTILE             85.94                  80.69
KURTOSIS
SKEWNESS
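The statistics in the table can be recomputed for any set of class mean scores, with or without the weights $W_j$. This is a sketch under assumptions: weighted quantiles are taken from the inverse of the weighted empirical CDF, the mode is the value carrying the largest total weight, and kurtosis is reported as excess kurtosis. The slides do not state which definitions were used, so results may differ slightly from the values above.

```python
def weighted_summary(values, weights=None):
    """Summary statistics of class mean scores, optionally weighted by W_j."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    pairs = sorted(zip(values, weights))

    def moment(k, centre):
        return sum(w * (v - centre) ** k for v, w in pairs) / total

    mean = sum(w * v for v, w in pairs) / total
    m2, m3, m4 = moment(2, mean), moment(3, mean), moment(4, mean)

    # Mode: the value carrying the largest total weight
    mass = {}
    for v, w in pairs:
        mass[v] = mass.get(v, 0.0) + w
    mode = max(mass, key=mass.get)

    def quantile(q):
        # Smallest value whose cumulative weight reaches q * total
        cum = 0.0
        for v, w in pairs:
            cum += w
            if cum >= q * total:
                return v
        return pairs[-1][0]

    return {
        "mean": mean,
        "mode": mode,
        "q1": quantile(0.25),
        "median": quantile(0.5),
        "q3": quantile(0.75),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }
```

Calling it once without weights and once with the $W_j$ reproduces the “original” vs. “adjusted” comparison; a mode of 100 that disappears after weighting is exactly the pattern produced by down-weighting classes with suspiciously perfect scores.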

