
1 Predicting Bullying Victims using Classification Models
Okja Kim, Computer Science Dept., MET College, Boston University
Jae Young Lee, Computer Science Dept., MET College, Boston University
Paper id: 16

2 Agenda
Introduction, Dataset, Preprocessing, Building Models, Conclusion
CSECS 2015, Boston

3 Introduction
Bullying is unwanted, aggressive behavior among school-aged children that involves a real or perceived power imbalance. The behavior is repeated, or has the potential to be repeated, over time. Bullying includes actions such as making threats, spreading rumors, attacking someone physically or verbally, and excluding someone from a group on purpose (The US Department of Health and Human Services) [1].
School bullying is one of the most serious issues facing students, parents, and educators. Bullying affects not only the present emotional and physical health of the affected students but very often their whole lives. One important and effective way of managing the school bullying problem is to identify potential bullying victims. Once a student is identified as a potential bullying victim, school officials and parents can take proactive measures to prevent bullying. The goal of this case study was to build a classification model that predicts bullying victims.

4 Introduction
This study used the report, and its accompanying dataset, prepared for the Institute of Education Sciences (IES) by the Regional Educational Laboratory Northeast and Islands, administered by Education Development Center, Inc. [2]. The goal of the report was to test characteristics of bullying, bullying victims, and bullying victims' schools to determine which were associated with reporting to school officials. The report was based on a nationwide survey, the National Crime Victimization Survey, administered by the U.S. Census Bureau on behalf of the Bureau of Justice Statistics to persons aged 12 to 18 in selected households across the contiguous United States.
Dataset
The initial dataset included 5,621 instances and 389 attributes. Each instance represents a student. All attributes, except a few included for administrative purposes, belong to one of three categories: bullying victimization, bullying victims, and bullying victims' schools.

5 Dataset - Attributes
Some attributes in the bullying victimization category:
- Attributes that represent the type of bullying: threatened; destroyed property; spread rumors; pushed, shoved, tripped, and the like
- Number of bullying incidents experienced
- Location where bullying occurred
Some attributes in the bullying victims category:
- Attributes that represent sociodemographic characteristics: gender, race/ethnicity, current grade, household income
- Attributes that represent school-related experiences and perceptions: the student's academic performance; whether the student skipped classes during the academic year; whether the student had an adult at school who cares about him or her; whether the student has a friend at school to talk to
Some attributes in the bullying victims' schools category:
- Whether the school is public or private
- Whether the school is church-related
- Everyone knows the school rules
- Students know the punishment
- School rules are strictly enforced
- Teachers treat students with respect
- Whether the school has security guards
- Whether the school has metal detectors
- Whether the school has locked doors
- Whether the school has a student code of conduct
The goal was to build a classification model that predicts whether a student will be bullied, using only attributes in the bullying victims category and the bullying victims' schools category.

6 Data Preprocessing – Create a class attribute
The bullying victimization category contains 45 attributes indicating that a student was bullied. Seven of them record the type of bullying the student experienced. A new class label, class, was built from these 7 attributes:
- 1: at least one of the 7 attributes has the value yes
- Of the remaining instances, any instance with a missing value in the 7 attributes was removed (68 instances)
- 0: all remaining instances
The 7 attributes were then removed.
Attribute code | Attribute label | Number of yes tuples | Number of no tuples | Number of missing values
vs071 | made fun of, called names | 1185 | 4378 | 58
vs072 | spread rumors | 1015 | 4538 | 68
vs073 | threatened | 323 | 5240 | 58
vs074 | pushed, shoved, tripped, etc. | 628 | 4934 | 59
vs075 | do things not wanted | 235 | 5331 | 55
vs076 | excluded | 301 | 5260 | 60
vs077 | destroyed property | 231 | 5334 | 56
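
The labeling rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes each instance is a dict, that the seven attributes vs071 to vs077 hold the strings "yes"/"no", and that a missing value is None (the real dataset may encode answers differently). The helper names make_class_label and add_class_attribute are hypothetical.

```python
from typing import Dict, List, Optional

# Attribute codes of the seven bullying-type attributes (from the table above).
BULLYING_TYPE_ATTRS = ["vs071", "vs072", "vs073", "vs074", "vs075", "vs076", "vs077"]


def make_class_label(record: Dict) -> Optional[int]:
    """Return 1 (bullied), 0 (not bullied), or None (instance to be removed).

    Assumption: answers are "yes"/"no" strings and a missing value is None.
    """
    values = [record.get(code) for code in BULLYING_TYPE_ATTRS]
    if any(v == "yes" for v in values):
        return 1  # at least one type of bullying reported
    if any(v is None for v in values):
        return None  # no "yes", but at least one missing answer: drop it
    return 0  # all seven answers are "no"


def add_class_attribute(records: List[Dict]) -> List[Dict]:
    """Attach the class label and drop the seven attributes used to build it."""
    labeled = []
    for record in records:
        label = make_class_label(record)
        if label is None:
            continue
        row = {k: v for k, v in record.items() if k not in BULLYING_TYPE_ATTRS}
        row["class"] = label
        labeled.append(row)
    return labeled
```

Because a single "yes" is enough to decide class 1, an instance with a "yes" and some missing answers is kept; only instances whose label cannot be decided are removed, which is consistent with the 68 removed instances matching the largest missing-value count in the table (vs072).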

7 Data Preprocessing – Remove attributes
Removed attributes:
- All other attributes indicating that bullying occurred (38 attributes)
- One attribute indicating whether the bullying was reported
- Irrelevant attributes (34 attributes)
- Redundant attributes (one from each of 17 pairs of attributes)
- Attributes having the same value for all instances (10 attributes)
Handling missing values, in general:
- Replace with a global constant
- Replace with the attribute mean (numeric attributes)
- Replace with the attribute mode (nominal attributes)
- Remove the instance from the dataset
Our approach:
- Removed all attributes with missing values in more than 20% of instances (111 attributes)
- Left the remaining missing values in the dataset
Result: 5,553 instances and 172 attributes
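
Two of the removal rules above, the 20% missing-value threshold and the constant-attribute rule, can be sketched as follows. This is a minimal sketch assuming the same dict-per-instance representation with None for missing values; the function names are hypothetical, and treating "same value for all instances" as "at most one distinct non-missing value" is an assumption of this sketch.

```python
def drop_sparse_attributes(records, threshold=0.2):
    """Drop attributes missing (None) in more than `threshold` of the
    instances. Missing values in the kept attributes are left as-is."""
    if not records:
        return []
    n = len(records)
    attrs = {a for r in records for a in r}
    keep = {
        a for a in attrs
        if sum(1 for r in records if r.get(a) is None) / n <= threshold
    }
    return [{a: v for a, v in r.items() if a in keep} for r in records]


def drop_constant_attributes(records, protected=("class",)):
    """Drop attributes with at most one distinct non-missing value.
    Attributes listed in `protected` (e.g. the class label) are always kept."""
    attrs = {a for r in records for a in r}
    keep = {
        a for a in attrs
        if a in protected
        or len({r.get(a) for r in records if r.get(a) is not None}) > 1
    }
    return [{a: v for a, v in r.items() if a in keep} for r in records]
```

With five instances, an attribute missing in exactly one of them (20%) is kept, while one missing in three of them (60%) is removed. The mean/mode imputation alternatives listed above are deliberately not applied, matching the stated approach of leaving the remaining missing values in place.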

8 Building Models using All Attributes
Classifier | Accuracy (%) | AUC
Naïve Bayes | 71.9251 | 0.75
J48 Decision Tree | 69.9442 | 0.623
Logistic | 74.2482 | 0.769
Neural Network | 71.3848 | 0.703
Support Vector Machine | 67.1709 | 0.504
Bagging (Logistic) | 74.0501 | 0.767
AdaBoost (Logistic) | 74.2482 | 0.691
Accuracy: the percentage of instances correctly classified.
Area under the curve (AUC): the ROC curve is an effective visual tool for comparing classifier models; it plots the trade-off between the true positive rate (TPR) and the false positive rate (FPR). When the difference between the curves of two classifiers is not visually obvious, their AUC values can be compared instead. The closer the AUC is to 1.0, the better; an AUC of 0.5 corresponds to random guessing.
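
To make the two metrics concrete, here is a small stdlib sketch (not Weka's implementation) that computes accuracy from predicted labels, builds the ROC curve by sweeping a decision threshold over classifier scores, and integrates it with the trapezoidal rule. It assumes labels are 0/1 and that a higher score means "more likely bullied"; the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Percentage of instances whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through every score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Predict "positive" for every instance scoring at or above the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(y_true, scores):
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(y_true, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For example, with y_true = [1, 1, 0, 1, 0, 0] and scores [0.9, 0.8, 0.7, 0.6, 0.4, 0.2], thresholding at 0.5 gives 5 of 6 correct (about 83.3% accuracy), and auc returns 8/9 (about 0.889), because 8 of the 9 positive/negative pairs are ranked correctly. A model that gives every instance the same score gets an AUC of exactly 0.5, which is why the SVM's 0.504 above indicates almost no ranking ability even though its accuracy is 67%.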

9 Building Models using Subset of Attributes
There were still 172 attributes, and some of them may not contribute to classification. We used the attribute selection algorithms in Weka to select subsets of attributes with higher prediction accuracy. If an algorithm automatically determines the number of attributes, we used the attributes it selected. If an algorithm ranks the attributes, we selected the top 24 attributes.
Attribute selection algorithm | Number of attributes | Accuracy (%) | AUC
ChiSquaredAttributeEval | 25 | 73.3297 | 0.759
GainRatioAttributeEval | 25 | 73.1857 | 0.743
InfoGainAttributeEval | 25 | 73.3477 | 0.758
OneRAttributeEval | 25 | 73.1316 | 0.717
CfsSubsetEval, BestFirst | 17 | 73.4198 | 0.766
WrapperSubsetEval (J48), Greedy | 27 | 74.0321 | 0.769
WrapperSubsetEval (J48), BestFirst | 11 | 72.4473 | 0.646
WrapperSubsetEval (NB), Greedy | 18 | 74.014 | 0.726
WrapperSubsetEval (NB), BestFirst | 18 | 74.014 | 0.726
WrapperSubsetEval (Logistic), Greedy | 23 | 75.7428 | 0.772
WrapperSubsetEval (Logistic), BestFirst | 26 | 75.4187 | 0.766
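
The table mixes two families of methods: filters, which score each attribute on its own and keep the top-ranked ones, and wrappers, which search over subsets and score each subset with an actual classifier. The sketch below illustrates one of each under stated assumptions; it is not Weka's code. The filter uses information gain on nominal attributes, treating a missing value (None) as its own category. The wrapper is a plain greedy forward search. Its evaluate callback is a hypothetical stand-in for training and cross-validating a classifier (J48, Naïve Bayes, or logistic regression in the table) on the candidate subset. Weka's InfoGainAttributeEval, WrapperSubsetEval, and search methods differ in details.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(records, attr, target="class"):
    """H(class) - H(class | attr) for a nominal attribute.
    In this sketch a missing value (None) is treated as its own category."""
    groups = {}
    for r in records:
        groups.setdefault(r.get(attr), []).append(r[target])
    n = len(records)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([r[target] for r in records]) - conditional


def rank_top_k(records, attrs, k, target="class"):
    """Filter approach: score each attribute independently, keep the k best."""
    return sorted(attrs, key=lambda a: info_gain(records, a, target), reverse=True)[:k]


def greedy_forward_selection(attrs, evaluate):
    """Wrapper approach: repeatedly add the attribute whose addition gives the
    best `evaluate` score; stop as soon as no addition improves the score."""
    selected, best_score = [], float("-inf")
    remaining = list(attrs)
    while remaining:
        scores = {a: evaluate(selected + [a]) for a in remaining}
        candidate = max(remaining, key=scores.get)
        if scores[candidate] <= best_score:
            break
        selected.append(candidate)
        best_score = scores[candidate]
        remaining.remove(candidate)
    return selected, best_score
```

The design difference explains the cost trade-off: the filter computes one statistic per attribute, while the wrapper calls evaluate (that is, trains a classifier) once for every remaining attribute at every step. In exchange, the wrapper measures how attributes work together with the chosen classifier rather than how each one relates to the class in isolation.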

10 Conclusion
For this dataset, the logistic regression algorithm worked best. Wrapper attribute selection algorithms performed better than filter attribute selection algorithms. Attribute selection increased the accuracy of the classification models, but the increase was small.
School bullying is a serious problem. We showed that a classification model can be built to predict whether a student will be bullied. Used along with other tools, such as teachers' and parents' observations, such a model can be an effective tool for identifying potential bullying victims and preventing bullying.

References
[1] The US Department of Health and Human Services. http://www.stopbullying.gov/what-is-bullying/index.html
[2] A. Petrosino, S. Guckenburg, J. DeVoe, and T. Hanson. What characteristics of bullying, bullying victims, and schools are associated with increased reporting of bullying to school officials? (Issues and Answers Report, REL 2010–No. 092). Washington, DC: US Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast and Islands.
[3] P.-N. Tan et al. Introduction to Data Mining. Addison Wesley, 2005.
[4] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In: Proc. Int'l Conf. on Information and Knowledge Management, pp. 148-155, 1998.
[5] M. A. Hall. Correlation-based feature selection for machine learning. PhD thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand, 1998.
[6] R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97:273-324, 1997.

