MongoDB and Python Analysis


MongoDB and Python Analysis Robert, Kevin, Simi

Dataset Selection For this assignment we chose the Young People Survey from kaggle.com Data includes a large amount of 1-5 responses to various questions and answers to questions regarding demographics. https://www.kaggle.com/miroslavsabo/young-people-survey

Loading into Mongo
$ mongoimport -d Assignment3 -c Youngpeepss --type csv --file /Users/robertbrasso/Python/responses.csv --headerline
mongo – enter the shell
show dbs – list databases
use Assignment3 – switch to the Assignment3 database
show collections – list collections
db.Youngpeepss.find() – show documents in the collection

Connect Mongo to Python
import pymongo
from pymongo import MongoClient
client = MongoClient()  # connects to localhost on port 27017 (the MongoDB default)
db = client.Assignment3
collection = db.Youngpeepss

Mongo Queries in Python
# Average weight, age, and height of males, females, and respondents with no gender reported
demographicsbygenderpipeline = [
    {"$group": {"_id": "$Gender",
                "weightaverage": {"$avg": "$Weight"},
                "ageaverage": {"$avg": "$Age"},
                "heightaverage": {"$avg": "$Height"}}}]
print(list(db.Youngpeepss.aggregate(demographicsbygenderpipeline)))
OUTPUT:
[{'_id': '', 'weightaverage': 64.2, 'ageaverage': 22.2, 'heightaverage': 172.0},
 {'_id': 'male', 'weightaverage': 77.08888888888889, 'ageaverage': 20.87286063569682, 'heightaverage': 181.75802469135803},
 {'_id': 'female', 'weightaverage': 58.963793103448275, 'ageaverage': 20.113752122241088, 'heightaverage': 167.77068965517242}]
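To make the $group / $avg stage above easier to follow, here is a minimal plain-Python sketch of the same idea. It is not how MongoDB implements aggregation; the function name group_avg and the sample documents are hypothetical, and the sketch relies only on the documented behaviour that $avg skips non-numeric and missing values.

```python
from collections import defaultdict
from numbers import Number


def group_avg(docs, key, fields):
    """Rough analogue of {"$group": {"_id": "$<key>", f: {"$avg": "$<f>"}}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        group = doc.get(key)  # a missing key groups under None
        buckets[group]  # register the group even if it has no numeric values
        for field in fields:
            value = doc.get(field)
            # Like $avg, skip anything that is not a number (e.g. blank strings)
            if isinstance(value, Number) and not isinstance(value, bool):
                buckets[group][field].append(value)
    result = []
    for group, values in buckets.items():
        row = {"_id": group}
        for field in fields:
            nums = values[field]
            row[field] = sum(nums) / len(nums) if nums else None
        result.append(row)
    return result


# Hypothetical documents, shaped like rows of the survey collection
sample = [
    {"Gender": "male", "Age": 20, "Height": 180},
    {"Gender": "male", "Age": 22, "Height": ""},  # blank height is skipped
    {"Gender": "female", "Age": 19, "Height": 165},
    {"Gender": "", "Age": 21, "Height": 172},  # gender left blank
]
averages = group_avg(sample, "Gender", ["Age", "Height"])
for row in averages:
    print(row)
```

The blank-gender row forms its own group, which is consistent with the '' entry in the output above.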

Queries cont…
# Average age of all survey responders
# A constant _id (here the string "null") puts every document into a single group
agepipeline = [
    {"$group": {"_id": "null", "avgage": {"$avg": "$Age"}}}]
print(list(db.Youngpeepss.aggregate(agepipeline)))
OUTPUT:
[{'_id': 'null', 'avgage': 20.43369890329013}]

Phobia Survey
PHOBIAS (each item is an integer response: Not afraid at all 1-2-3-4-5 Very afraid of)
Flying
Thunder, lightning
Darkness
Heights
Spiders
Snakes
Rats, mice
Ageing
Dangerous dogs
Public speaking

Phobia Query
# Average response for Phobia survey questions by gender
phobiaavgpipeline = [
    {"$group": {"_id": "$Gender",
                "Ageing": {"$avg": "$Ageing"},
                "Dangerous Dogs": {"$avg": "$Dangerous dogs"},
                "Darkness": {"$avg": "$Darkness"},
                "Flying": {"$avg": "$Flying"},
                "Heights": {"$avg": "$Heights"},
                "Public Speaking": {"$avg": "$Fear of public speaking"},
                "Rats": {"$avg": "$Rats"},
                "Snakes": {"$avg": "$Snakes"},
                "Spiders": {"$avg": "$Spiders"},
                "Storms": {"$avg": "$Storm"}}}]
print(list(db.Youngpeepss.aggregate(phobiaavgpipeline)))
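The phobia pipeline repeats the same {"$avg": "$field"} pattern ten times, so it could also be generated from a mapping of output labels to column names. The sketch below is one possible way to do that; the helper name avg_by_group is hypothetical and not part of the original slides, while the labels and column names are copied from the query above.

```python
def avg_by_group(group_field, fields):
    """Build a one-stage pipeline averaging each field per group.

    fields maps the output label to the document field name.
    """
    group_stage = {"_id": f"${group_field}"}
    for label, source in fields.items():
        group_stage[label] = {"$avg": f"${source}"}
    return [{"$group": group_stage}]


# Output label -> column name, as used in the phobia query
phobia_fields = {
    "Ageing": "Ageing",
    "Dangerous Dogs": "Dangerous dogs",
    "Darkness": "Darkness",
    "Flying": "Flying",
    "Heights": "Heights",
    "Public Speaking": "Fear of public speaking",
    "Rats": "Rats",
    "Snakes": "Snakes",
    "Spiders": "Spiders",
    "Storms": "Storm",
}
phobia_pipeline = avg_by_group("Gender", phobia_fields)
print(phobia_pipeline)
```

The resulting list has the same shape as phobiaavgpipeline, so it could be passed to db.Youngpeepss.aggregate(...) in the same way.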

Phobia results
OUTPUT:
[{'_id': '', 'Ageing': 3.3333333333333335, 'Dangerous Dogs': 3.8333333333333335, 'Darkness': 2.8333333333333335, 'Flying': 3.6666666666666665, 'Heights': 3.3333333333333335, 'Public Speaking': 3.5, 'Rats': 2.0, 'Snakes': 2.8333333333333335, 'Spiders': 3.0, 'Storms': 2.3333333333333335},
 {'_id': 'male', 'Ageing': 2.3041362530413627, 'Dangerous Dogs': 2.7031630170316303, 'Darkness': 1.7609756097560976, 'Flying': 1.882640586797066, 'Heights': 2.5990220048899757, 'Public Speaking': 2.604878048780488, 'Rats': 1.9633251833740832, 'Snakes': 2.6155717761557176, 'Spiders': 2.2053789731051343, 'Storms': 1.5571776155717763},
 {'_id': 'female', 'Ageing': 2.7652027027027026, 'Dangerous Dogs': 3.27027027027027, 'Darkness': 2.5844594594594597, 'Flying': 2.168918918918919, 'Heights': 2.6199324324324325, 'Public Speaking': 2.9342327150084317, 'Rats': 2.7212837837837838, 'Snakes': 3.315345699831366, 'Spiders': 3.2542372881355934, 'Storms': 2.258445945945946}]
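One way to read these results is to rank the phobias by the gap between the female and male averages. The sketch below does that with the male and female values from the output above, rounded to four decimals; the variable names are illustrative only, and the ranking is a simple descriptive comparison, not a significance test.

```python
# Averages taken from the phobia query output, rounded to four decimals
male = {
    "Ageing": 2.3041, "Dangerous Dogs": 2.7032, "Darkness": 1.7610,
    "Flying": 1.8826, "Heights": 2.5990, "Public Speaking": 2.6049,
    "Rats": 1.9633, "Snakes": 2.6156, "Spiders": 2.2054, "Storms": 1.5572,
}
female = {
    "Ageing": 2.7652, "Dangerous Dogs": 3.2703, "Darkness": 2.5845,
    "Flying": 2.1689, "Heights": 2.6199, "Public Speaking": 2.9342,
    "Rats": 2.7213, "Snakes": 3.3153, "Spiders": 3.2542, "Storms": 2.2584,
}

# Female average minus male average for each phobia
gaps = {name: female[name] - male[name] for name in male}

# Largest gap first
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
for name, gap in ranked:
    print(f"{name:16s} {gap:+.4f}")
```

On these numbers the female average is higher for every phobia, with the widest gap for Spiders and the narrowest for Heights.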

Visualizations – Stacked Bar Chart

Visualizations – Scatter Plot

Visualization – Normalized Pivot Bar Chart

Visualizations – Linear Regression r-squared: 0.570501715392 p value: 2.87661838931e-88

Visualizations – Linear Regression r-squared: 0.241811258989 p value: 6.63363215029e-15

Visualizations – Linear Regression r-squared: 0.503959458065 p value: 3.66536245603e-66
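The slides report r-squared and a p value for each regression but do not show the code or name the variables involved. As a hedged illustration of what r-squared measures, the sketch below fits a least-squares line by hand and computes r-squared for a pair of columns. The function name linear_fit and the height/weight numbers are hypothetical, and the p value is not computed here; a statistics library would normally supply it.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, r-squared is the squared correlation
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values, NOT taken from the survey data
heights = [160, 165, 170, 175, 180]
weights = [52, 58, 63, 70, 74]

slope, intercept, r_squared = linear_fit(heights, weights)
print(f"slope: {slope:.3f}  intercept: {intercept:.3f}  r-squared: {r_squared:.4f}")
```

An r-squared near 1 means the fitted line explains most of the variation in y; values such as 0.24 on the slides indicate a much weaker linear relationship.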