College Scorecard & Software Defect Prediction

Presentation transcript:

College Scorecard & Software Defect Prediction
Prepared by: Meetkumar Patel, Srivats Srinivasan
Guidance by: Prof. Meiliu Lu

Agenda
Data Warehousing Project: Background, Introduction, Technologies Explored, Implementation Steps, Future Scope
Data Mining Project: Objective, Algorithm Applied, Demo, Learning Experience
References

Background
Source websites: www.data.gov, http://promise.site.uottawa.ca/SERepository/datasets-page.html
Two datasets: College Scorecard and Software Defect Prediction
College Scorecard dataset: data from 2009-2013; 17 attributes, 37,835 entries
Software Defect Prediction dataset: 22 attributes, 1,100 entries

Introduction
The primary objective of our project is to design a data mart, which we built on a star schema. The data mart answers questions about US universities, and its primary users would be high school students.

Technologies Explored
Data Preprocessing: Microsoft Excel spreadsheet, MySQL Server
Data Mart: MS SQL Server, Java
OLAP Operations: SQL Server queries

Implementation Steps
Data Cleaning and Preprocessing
Data Mart
Querying Tool

Data Cleaning and Preprocessing
The original data had 80,000 rows and 1,700 columns; we trimmed it to 37,835 rows and 17 relevant columns. Missing values were filled in using an SQL script. Because the data covers five years, we added a year column to separate the records by year.
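The three cleaning steps on this slide (trimming columns, filling missing values, adding a year column) can be sketched in a few lines. The slides describe an SQL script; the Python below is only an illustrative stand-in, and the column names, the `UNKNOWN` placeholder, and the sample rows are all assumptions rather than the project's actual data.

```python
# Hypothetical column names; the real College Scorecard headers differ.
KEEP_COLUMNS = ["university_id", "university_name", "state", "sat_avg"]


def clean_rows(rows, year, keep=KEEP_COLUMNS, default="UNKNOWN"):
    """Keep only the selected columns, fill blanks, and tag each row with its year."""
    cleaned = []
    for row in rows:
        trimmed = {}
        for col in keep:
            # Treat missing keys, None, and whitespace-only strings as missing.
            value = (row.get(col) or "").strip()
            trimmed[col] = value if value else default
        trimmed["year"] = year
        cleaned.append(trimmed)
    return cleaned


# Made-up sample rows standing in for one year's raw file.
raw_2013 = [
    {"university_id": "1", "university_name": "Example State",
     "state": "CA", "sat_avg": "1100", "unused": "x"},
    {"university_id": "2", "university_name": "Sample College",
     "state": "", "sat_avg": None},
]
cleaned_2013 = clean_rows(raw_2013, 2013)
```

Running the same function once per yearly file and concatenating the results gives a single table in which the year column separates the records.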

Data Mart
The data mart is implemented on a star schema. It provides the user with university details based on the following attributes: University ID, Programs, Type of Degree, SAT & AWT scores, Region, State.

Star Schema
Fact Table: University_ID, State_ID, Degree_ID, PDegree_ID, Region_ID, Program_ID, Scores
University: University_ID, University_name, Zip, Website
State: State_ID, State_Name
Region: Region_ID, Region_Name
Program: Program_ID, Program_Name
Highest Degree: Degree_ID, Degree_Name
Predominant Degree: PDegree_ID, PDegree_Name
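As a rough illustration of how this star schema could be declared and queried, the sketch below builds it in an in-memory SQLite database. The project itself used SQL Server; SQLite, the column types, the table name `Fact_Table`, and every sample row are assumptions made only to keep the example self-contained.

```python
import sqlite3

# Table and column names follow the star schema slide; types are assumed.
SCHEMA = """
CREATE TABLE University (University_ID INTEGER PRIMARY KEY,
                         University_name TEXT, Zip TEXT, Website TEXT);
CREATE TABLE State (State_ID INTEGER PRIMARY KEY, State_Name TEXT);
CREATE TABLE Region (Region_ID INTEGER PRIMARY KEY, Region_Name TEXT);
CREATE TABLE Program (Program_ID INTEGER PRIMARY KEY, Program_Name TEXT);
CREATE TABLE Highest_Degree (Degree_ID INTEGER PRIMARY KEY, Degree_Name TEXT);
CREATE TABLE Predominant_Degree (PDegree_ID INTEGER PRIMARY KEY,
                                 PDegree_Name TEXT);
CREATE TABLE Fact_Table (
    University_ID INTEGER REFERENCES University(University_ID),
    State_ID      INTEGER REFERENCES State(State_ID),
    Degree_ID     INTEGER REFERENCES Highest_Degree(Degree_ID),
    PDegree_ID    INTEGER REFERENCES Predominant_Degree(PDegree_ID),
    Region_ID     INTEGER REFERENCES Region(Region_ID),
    Program_ID    INTEGER REFERENCES Program(Program_ID),
    Scores        REAL
);
"""


def build_demo_mart():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO State VALUES (?, ?)",
                     [(1, "California"), (2, "Texas")])
    conn.executemany(
        "INSERT INTO University (University_ID, University_name) VALUES (?, ?)",
        [(1, "Example University A"), (2, "Example University B"),
         (3, "Example University C")])
    conn.executemany(
        "INSERT INTO Fact_Table (University_ID, State_ID, Scores) VALUES (?, ?, ?)",
        [(1, 1, 1000.0), (2, 1, 1200.0), (3, 2, 900.0)])
    return conn


def average_score_by_state(conn):
    """Roll the fact table up to the state dimension."""
    query = """
        SELECT s.State_Name, AVG(f.Scores)
        FROM Fact_Table AS f
        JOIN State AS s ON s.State_ID = f.State_ID
        GROUP BY s.State_Name
        ORDER BY s.State_Name
    """
    return conn.execute(query).fetchall()


for state, avg_score in average_score_by_state(build_demo_mart()):
    print(state, avg_score)
```

The fact table holds only keys and the measure, so each question a student might ask (by state, region, program, or degree type) becomes a join to one dimension plus a GROUP BY.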

Future Scope
Allow privileged users to insert new records.
Integrate Google Maps for location and directions.
Develop web-based and mobile-based environments.

Objective
Mine the available data to extract knowledge. Analyze the behavior of different data mining tools. This project focuses on high-performance fault/error predictors based on data mining techniques such as Random Forests and on algorithms based on a new computational intelligence approach.

Data Mining Tools Used
Tools: Weka, Rapid Miner
Classification Algorithms: J48, Random Tree, Logistic

Data Mining
We use attributes such as cyclomatic complexity, essential complexity, design complexity, total number of operators, total number of operands, volume, program length, difficulty, intelligence, effort, and line count. We mine these attributes to study how they affect the quality of the software produced. The final result is a prediction of whether a defect is present or not: {true, false}.
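Several of the attributes listed above (program length, volume, difficulty, effort, intelligence) are Halstead measures that are derived from operator and operand counts. The slides do not spell out the formulas, so the sketch below uses the standard Halstead definitions on the assumption that the dataset follows them; the function name and the sample counts are hypothetical.

```python
import math


def halstead_metrics(distinct_operators, distinct_operands,
                     total_operators, total_operands):
    """Compute standard Halstead measures from operator/operand counts."""
    vocabulary = distinct_operators + distinct_operands
    length = total_operators + total_operands            # program length N
    volume = length * math.log2(vocabulary)              # V = N * log2(n)
    difficulty = (distinct_operators / 2) * (total_operands / distinct_operands)
    effort = difficulty * volume                         # E = D * V
    intelligence = volume / difficulty                   # I = V / D
    return {
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "intelligence": intelligence,
    }


# Made-up counts: 4 distinct operators, 4 distinct operands,
# 10 operator occurrences, 6 operand occurrences.
example = halstead_metrics(4, 4, 10, 6)
```

Because these measures are all functions of the same four counts, they are strongly correlated, which is worth keeping in mind when interpreting which attributes a classifier relies on.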

Data Mining: Preprocessing
The collected data were noisy, inconsistent, and missing useful information. The first step was therefore data preparation: checking the data distribution and outliers, dealing with empty or missing values, enriching the data, and transforming it into analyzable formats. These steps improve data quality and thus enable effective data mining.
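Two of the preparation steps named here, handling missing values and checking for outliers, can be sketched as follows. The slides do not say which methods were used, so median imputation and the 1.5 x IQR rule are assumptions chosen only for illustration, as are the function names and sample values.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Mark values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Made-up line counts for nine code units, one of them extreme.
line_counts = [1, 2, 3, 4, 5, 6, 7, 8, 100]
outlier_flags = flag_outliers(line_counts)
filled = impute_median([1, None, 3])
```

Flagging rather than deleting outliers keeps the decision explicit: in defect data, an unusually large unit may be exactly the kind of record the model needs to see.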

Data Mining: Algorithm Implementation
First, each algorithm was run in WEKA to obtain its root mean square error (RMSE); Rapid Miner was then used to obtain the graphical output. The lower the RMSE, the better the algorithm performs on the particular data set.
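To make the comparison criterion concrete, the sketch below computes RMSE between predicted defect probabilities and the true 0/1 labels for two hypothetical models. It is a simplified illustration, not a reproduction of how WEKA computes the figure internally, and the model names and numbers are invented.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length numeric sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# 1 = defect, 0 = no defect (made-up labels).
actual = [1, 0, 1, 0]
# Predicted probability of a defect from two hypothetical models.
model_a = [0.9, 0.2, 0.6, 0.1]
model_b = [0.5, 0.5, 0.5, 0.5]

scores = {"model_a": rmse(model_a, actual), "model_b": rmse(model_b, actual)}
best = min(scores, key=scores.get)  # lower RMSE is preferred
```

A model that always predicts 0.5 scores an RMSE of 0.5 on binary labels, which gives a useful baseline when reading the WEKA output.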

Data Mining (WEKA) J48

Data Mining (WEKA) Naïve Bayesian

Data Mining (WEKA) Random Tree

Data Mining (Rapid Miner)

Data Mining

Learning Experience
Analytical processing
Learned different data mining tools such as Weka and Rapid Miner
Learned about real-time applications of different data mining algorithms

DEMO

Conclusion
Weka reported the root mean square error, on the basis of which a few algorithms were shortlisted. However, Weka could not present the results graphically in a clear way, so we turned to Rapid Miner, which let us simplify the graphs and predict the probability of a defect with ease.

References
http://www.sciencedirect.com/science/article/pii/S0020025508005173
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4031804&tag=1
http://promise.site.uottawa.ca/SERepository/datasets-page.html
http://recommender-systems.readthedocs.org/en/latest/datamining.html