Published by Josephine Clark. Modified over 9 years ago.
1
Harnessing Kansas City Open Data to Improve the Lives of Citizens
Samaa Gazzaz, supervised by Dr. Praveen Rao
Dept. of Computer Science Electrical Engineering, School of Computation and Engineering
University of Missouri-Kansas City
2
Introduction SUROP
3
Big data is a general term for datasets that are too large for conventional data-processing applications to handle (e.g. real-world Kansas City crime data). To extract useful information from such datasets, we need an analysis tool that can both handle large volumes of data and draw probabilistic inferences from them (e.g. BayesDB).
4
Objective
We used BayesDB, an application developed by MIT researchers, to understand Kansas City's crime dataset, after first testing BayesDB's scalability. BayesDB uses its built-in Bayesian Query Language (BQL) to analyze a dataset and extract probabilistic information from it.
Research questions: Do relationships exist between the different factors recorded for a KC crime, and if so, what are they? Is BayesDB a practical tool for analyzing big datasets?
5
Materials
BayesDB (probabilistic analysis application)
KCPD_CrimeData2014.CSV
6
Approach
To test the scalability of BayesDB, we measure the time it needs to process and analyze datasets of different sizes. We then explore the relationships between the crime factors using the statistical and probabilistic commands that BayesDB provides, which helps us get more out of the raw data.
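The scalability test described in the approach can be sketched as a small timing harness. This is only an illustration under stated assumptions: `placeholder_analysis` is a hypothetical stand-in for a BayesDB run (it is not BayesDB's API), the rows are synthetic, and the column names are borrowed from the queries shown later in the slides.

```python
import time


def time_on_subsets(rows, sizes, analyze):
    """Run `analyze` on the first `size` rows for each size; return (size, seconds) pairs."""
    timings = []
    for size in sizes:
        subset = rows[:size]
        start = time.perf_counter()
        analyze(subset)
        elapsed = time.perf_counter() - start
        timings.append((len(subset), elapsed))
    return timings


def placeholder_analysis(rows):
    # Hypothetical stand-in for the real analysis step: counts values per column.
    counts = {}
    for row in rows:
        for column, value in row.items():
            column_counts = counts.setdefault(column, {})
            column_counts[value] = column_counts.get(value, 0) + 1
    return counts


# Synthetic rows (assumed values, not taken from KCPD_CrimeData2014.CSV).
rows = [
    {"beat": 100 + i % 50, "involvement": "ARR" if i % 3 == 0 else "OTHER"}
    for i in range(20000)
]

timings = time_on_subsets(rows, [1000, 5000, 20000], placeholder_analysis)
for size, seconds in timings:
    print(f"{size:>6} rows: {seconds:.4f} s")
```

Plotting the resulting pairs against dataset size would show how processing time grows; swapping `placeholder_analysis` for a function that drives the actual tool is left to the reader.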
7
Results
(Figure not recovered in extraction.)
8
Methods
BayesDB features for harnessing Kansas City crime data:
SUMMARIZE: Provide a comprehensive understanding of the dataset.
ANALYZE: Learn the dataset and the dependencies between its attributes.
INFER: Fill in missing data based on the learnt model with a specific certainty.
ESTIMATE: Provide the dependencies within the table and the strength of such dependencies.
PLOT: Draw a simple bar chart showing the dependencies between columns.
SIMULATE: Predict future data entries with respect to a certain value.
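To make the idea behind INFER concrete, here is a minimal sketch of filling in missing values only when a certainty threshold is met. It is a deliberate simplification: it uses the most frequent observed value of a single column as a stand-in for BayesDB's learnt model, and the function name `infer_missing` and the sample values are assumptions made for illustration.

```python
from collections import Counter


def infer_missing(values, confidence):
    """Fill None entries with the most common observed value, but only if
    that value's share of the observed entries reaches `confidence`."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    best, count = Counter(observed).most_common(1)[0]
    certainty = count / len(observed)
    if certainty < confidence:
        # Not certain enough: leave the gaps in place.
        return list(values)
    return [best if v is None else v for v in values]


# Synthetic column (assumed values); "ARR" appears in 3 of 4 observed entries.
column = ["ARR", "ARR", None, "ARR", "OTHER", None]

print(infer_missing(column, confidence=0.7))  # gaps filled, since 0.75 >= 0.7
print(infer_missing(column, confidence=0.9))  # gaps kept, since 0.75 < 0.9
```

Raising the threshold trades coverage for reliability: fewer cells are filled, but those that are filled rest on stronger evidence.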
9
Results
ESTIMATE PAIRWISE DEPENDENCE PROBABILITY FROM KCcrime;
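One way to read a pairwise dependence probability is sketched below. It assumes, as we understand BayesDB's underlying models, that each sampled model partitions the columns into groups and that two columns count as dependent in a model when they share a group; the estimate is then the fraction of models in which that happens. The model assignments here are invented for illustration and are not results from the KC crime data.

```python
from itertools import combinations


def pairwise_dependence(models):
    """For each column pair, return the fraction of models that place
    both columns in the same group."""
    columns = sorted(models[0])
    result = {}
    for a, b in combinations(columns, 2):
        together = sum(1 for model in models if model[a] == model[b])
        result[(a, b)] = together / len(models)
    return result


# Hypothetical sampled models: each maps a column name to a group id.
models = [
    {"beat": 0, "location_1": 0, "involvement": 1},
    {"beat": 0, "location_1": 0, "involvement": 0},
    {"beat": 1, "location_1": 1, "involvement": 0},
    {"beat": 0, "location_1": 1, "involvement": 1},
]

dependence = pairwise_dependence(models)
for pair, probability in dependence.items():
    print(pair, probability)
```

In this toy input, `beat` and `location_1` share a group in three of four models, so their estimated dependence probability is 0.75.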
10
Results From the resulting dependency graph, we can find these relations:
11
Results
SIMULATE location_1,beat FROM KCcrime_demo GIVEN beat=134,involvement=ARR TIMES 10;
+------------------------------------------+------+
| location_1                               | beat |
+------------------------------------------+------+
| (39.06872767300007, -94.51423880099998)  | 134  |
| (39.01528852400003, -94.56239074799998)  | 134  |
| (39.046353786000054, -94.60664335999996) | 134  |
| (39.24470375300007, -94.46695072599994)  | 134  |
BayesDB is also useful in identifying independent factors, which helps save time.
12
Conclusion