MongoDB and Python Analysis
Robert, Kevin, Simi
Dataset Selection
For this assignment we chose the Young People Survey from kaggle.com. The data consist largely of responses on a 1-5 scale to various questions, plus answers to demographic questions.
https://www.kaggle.com/miroslavsabo/young-people-survey
Loading into Mongo
$ mongoimport -d Assignment3 -c Youngpeepss --type csv --file /Users/robertbrasso/Python/responses.csv --headerline
mongo – enter shell
show dbs – list databases
use Assignment3 – use Assignment3 database
show collections – list collections
db.Youngpeepss.find() – show documents in collection
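As a rough illustration of what --headerline does, the sketch below reads a small CSV in Python and uses the first row as field names, much as mongoimport does when it builds documents. The sample rows and the rows_to_documents helper are hypothetical. The type handling (numeric strings become numbers, blank cells stay as empty strings) is an assumption about mongoimport's default CSV behavior, not a reproduction of it.

```python
import csv
import io

# Hypothetical sample standing in for responses.csv; not real survey rows.
SAMPLE_CSV = """Age,Height,Weight,Gender
20,163,48,female
19,181,77,male
22,172,64.5,
"""

def parse_value(text):
    """Convert numeric-looking strings to int or float; leave others as-is."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # blank cells stay '' (assumed default, no --ignoreBlanks)

def rows_to_documents(csv_text):
    """Use the header line as field names and return one dict per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{field: parse_value(value) for field, value in row.items()}
            for row in reader]

documents = rows_to_documents(SAMPLE_CSV)
for doc in documents:
    print(doc)
```

The last sample row has no gender, so its Gender field is an empty string. That is consistent with the '' group that appears in the aggregation output later in the deck.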
Connect Mongo to Python
import pymongo
from pymongo import MongoClient
client = MongoClient()  # connects to localhost over port 27017 (Mongo default)
db = client.Assignment3
collection = db.Youngpeepss
Mongo Queries in Python
# Average weight, age, and height of males, females, and respondents with unreported gender
demographicsbygenderpipeline = [
    {"$group": {"_id": "$Gender",
                "weightaverage": {"$avg": "$Weight"},
                "ageaverage": {"$avg": "$Age"},
                "heightaverage": {"$avg": "$Height"}}}]
print(list(db.Youngpeepss.aggregate(demographicsbygenderpipeline)))
OUTPUT:
[{'_id': '', 'weightaverage': 64.2, 'ageaverage': 22.2, 'heightaverage': 172.0},
 {'_id': 'male', 'weightaverage': 77.08888888888889, 'ageaverage': 20.87286063569682, 'heightaverage': 181.75802469135803},
 {'_id': 'female', 'weightaverage': 58.963793103448275, 'ageaverage': 20.113752122241088, 'heightaverage': 167.77068965517242}]
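To make the $group / $avg stage concrete, here is a minimal pure-Python sketch of the same computation on a handful of in-memory documents. The group_average helper and the sample documents are hypothetical. The sketch assumes, as MongoDB's $avg does, that non-numeric values are skipped rather than counted.

```python
from collections import defaultdict

def group_average(documents, key, fields):
    """Average each field per distinct value of `key`, skipping non-numbers."""
    # For every group, keep a [running sum, count] pair per field.
    totals = defaultdict(lambda: {f: [0.0, 0] for f in fields})
    for doc in documents:
        group = totals[doc.get(key)]
        for f in fields:
            value = doc.get(f)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                group[f][0] += value
                group[f][1] += 1
    return {
        group_id: {f: (s / n if n else None) for f, (s, n) in sums.items()}
        for group_id, sums in totals.items()
    }

# Hypothetical documents, not real survey rows.
docs = [
    {"Gender": "male", "Age": 20, "Weight": 80},
    {"Gender": "male", "Age": 22, "Weight": ""},  # blank weight is skipped
    {"Gender": "female", "Age": 19, "Weight": 55},
    {"Gender": "", "Age": 21, "Weight": 64},
]
averages = group_average(docs, "Gender", ["Age", "Weight"])
print(averages)
```

Documents with an empty Gender form their own group here, which mirrors the '' entry in the output above.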
Queries cont…
# Average age of all survey respondents
agepipeline = [
    {"$group": {"_id": "null", "avgage": {"$avg": "$Age"}}}]  # a constant _id puts every document in one group
print(list(db.Youngpeepss.aggregate(agepipeline)))
OUTPUT:
[{'_id': 'null', 'avgage': 20.43369890329013}]
Phobia Survey
PHOBIAS
Flying: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Thunder, lightning: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Darkness: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Heights: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Spiders: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Snakes: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Rats, mice: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Ageing: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Dangerous dogs: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Public speaking: Not afraid at all 1-2-3-4-5 Very afraid of (integer)
Phobia Query
# Average response for Phobia survey questions by gender
phobiaavgpipeline = [
    {"$group": {"_id": "$Gender",
                "Ageing": {"$avg": "$Ageing"},
                "Dangerous Dogs": {"$avg": "$Dangerous dogs"},
                "Darkness": {"$avg": "$Darkness"},
                "Flying": {"$avg": "$Flying"},
                "Heights": {"$avg": "$Heights"},
                "Public Speaking": {"$avg": "$Fear of public speaking"},
                "Rats": {"$avg": "$Rats"},
                "Snakes": {"$avg": "$Snakes"},
                "Spiders": {"$avg": "$Spiders"},
                "Storms": {"$avg": "$Storm"}}}]
print(list(db.Youngpeepss.aggregate(phobiaavgpipeline)))
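Because every entry in the $group stage follows the same pattern, the pipeline could also be generated from a mapping of output labels to field names. The following is a minimal sketch; build_avg_pipeline and PHOBIA_FIELDS are hypothetical names, and the labels and field names are copied from the query on this slide.

```python
def build_avg_pipeline(group_field, label_to_field):
    """Build a one-stage $group pipeline that averages each field per group."""
    group_stage = {"_id": "$" + group_field}
    for label, field in label_to_field.items():
        group_stage[label] = {"$avg": "$" + field}
    return [{"$group": group_stage}]

# Output label -> field name in the collection, as used in the phobia query.
PHOBIA_FIELDS = {
    "Ageing": "Ageing",
    "Dangerous Dogs": "Dangerous dogs",
    "Darkness": "Darkness",
    "Flying": "Flying",
    "Heights": "Heights",
    "Public Speaking": "Fear of public speaking",
    "Rats": "Rats",
    "Snakes": "Snakes",
    "Spiders": "Spiders",
    "Storms": "Storm",
}

phobiaavgpipeline = build_avg_pipeline("Gender", PHOBIA_FIELDS)
print(phobiaavgpipeline)
```

The resulting list can be passed to db.Youngpeepss.aggregate() in the same way as the hand-written pipeline.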
Phobia results
OUTPUT:
[{'_id': '', 'Ageing': 3.3333333333333335, 'Dangerous Dogs': 3.8333333333333335, 'Darkness': 2.8333333333333335, 'Flying': 3.6666666666666665, 'Heights': 3.3333333333333335, 'Public Speaking': 3.5, 'Rats': 2.0, 'Snakes': 2.8333333333333335, 'Spiders': 3.0, 'Storms': 2.3333333333333335},
 {'_id': 'male', 'Ageing': 2.3041362530413627, 'Dangerous Dogs': 2.7031630170316303, 'Darkness': 1.7609756097560976, 'Flying': 1.882640586797066, 'Heights': 2.5990220048899757, 'Public Speaking': 2.604878048780488, 'Rats': 1.9633251833740832, 'Snakes': 2.6155717761557176, 'Spiders': 2.2053789731051343, 'Storms': 1.5571776155717763},
 {'_id': 'female', 'Ageing': 2.7652027027027026, 'Dangerous Dogs': 3.27027027027027, 'Darkness': 2.5844594594594597, 'Flying': 2.168918918918919, 'Heights': 2.6199324324324325, 'Public Speaking': 2.9342327150084317, 'Rats': 2.7212837837837838, 'Snakes': 3.315345699831366, 'Spiders': 3.2542372881355934, 'Storms': 2.258445945945946}]
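One way to read these results is to rank the phobias by the gap between the female and male averages. The sketch below does that with the averages from the output above, rounded to three decimals for brevity. The variable names are hypothetical, and the unreported-gender group is left out.

```python
# Averages from the phobia output, rounded to three decimal places.
male = {
    "Ageing": 2.304, "Dangerous Dogs": 2.703, "Darkness": 1.761,
    "Flying": 1.883, "Heights": 2.599, "Public Speaking": 2.605,
    "Rats": 1.963, "Snakes": 2.616, "Spiders": 2.205, "Storms": 1.557,
}
female = {
    "Ageing": 2.765, "Dangerous Dogs": 3.270, "Darkness": 2.584,
    "Flying": 2.169, "Heights": 2.620, "Public Speaking": 2.934,
    "Rats": 2.721, "Snakes": 3.315, "Spiders": 3.254, "Storms": 2.258,
}

# Female minus male average for each phobia, largest gap first.
gaps = sorted(
    ((name, female[name] - male[name]) for name in male),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, gap in gaps:
    print(f"{name:16s} {gap:+.3f}")
```

With these figures the gap is largest for Spiders and smallest for Heights, and the female average is higher for every item.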
Visualizations – Stacked Bar Chart
Visualizations – Scatter Plot
Visualizations – Normalized Pivot Bar Chart
Visualizations – Linear Regression
r-squared: 0.570501715392
p value: 2.87661838931e-88
Visualizations – Linear Regression
r-squared: 0.241811258989
p value: 6.63363215029e-15
Visualizations – Linear Regression
r-squared: 0.503959458065
p value: 3.66536245603e-66
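The slides report only the r-squared and p value for each fit, not the variables or the code behind them. As a sketch of where an r-squared figure comes from, the block below fits a least-squares line in plain Python. The linear_fit helper and the height/weight pairs are hypothetical. The p value is not computed here; a library routine such as scipy.stats.linregress returns one alongside the correlation coefficient, but the slides do not say which routine was used.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys).

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, r-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical height (cm) and weight (kg) pairs, not survey data.
heights = [160, 165, 170, 175, 180, 185]
weights = [52, 58, 63, 70, 74, 82]

slope, intercept, r_squared = linear_fit(heights, weights)
print("slope:", slope)
print("intercept:", intercept)
print("r-squared:", r_squared)
```

An r-squared near 1 means the line explains most of the variation in the dependent variable; the values reported on these slides range from roughly 0.24 to 0.57.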