Harnessing Kansas City Open Data to Improve the Lives of Citizens SAMAA GAZZAZ – SUPERVISED BY DR. PRAVEEN RAO DEPT. COMPUTER SCIENCE ELECTRICAL ENGINEERING.


1 Harnessing Kansas City Open Data to Improve the Lives of Citizens
SAMAA GAZZAZ, SUPERVISED BY DR. PRAVEEN RAO
DEPT. OF COMPUTER SCIENCE ELECTRICAL ENGINEERING, SCHOOL OF COMPUTING AND ENGINEERING, UNIVERSITY OF MISSOURI-KANSAS CITY

2 Introduction SUROP

3
- Big data is a general term for datasets that are too large for existing data processing applications to handle (e.g., real-life Kansas City crime data).
- To extract useful information from such datasets, we need a robust analysis application that can handle large datasets and also draw probabilistic inferences from the data (e.g., BayesDB).

4 Objective
- We used BayesDB, an application developed by MIT researchers, to understand Kansas City's crime dataset, after first testing BayesDB's scalability. BayesDB uses its built-in Bayesian Query Language (BQL) to analyze a dataset and extract probabilistic information from it.
- Research questions: Are there relationships between the different factors involved in a Kansas City crime, and if so, what are they? Is BayesDB a practical application for analyzing big datasets?

5 Materials
- BayesDB (probabilistic analysis application)
- KCPD_CrimeData2014.CSV

6 Approach
- To test the scalability of BayesDB, we measure the time it needs to process and analyze datasets of different sizes.
- Next, we explore the relationships between the crime factors using the statistical and probabilistic commands that BayesDB provides, which helps us understand the raw data better.
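The scalability test described above can be sketched as a small timing harness. This is only an illustration under stated assumptions, not the script used in the project: `process_dataset` is a hypothetical stand-in for loading and analyzing a table in BayesDB, the synthetic columns are not the real KCPD schema, and the row counts are arbitrary.

```python
import random
import time


def make_rows(n_rows, seed=0):
    """Generate synthetic crime-like rows (hypothetical columns, not the KCPD schema)."""
    rng = random.Random(seed)
    offenses = ["theft", "assault", "burglary"]
    return [
        {"beat": rng.randint(100, 140), "offense": rng.choice(offenses)}
        for _ in range(n_rows)
    ]


def process_dataset(rows):
    """Hypothetical stand-in for the analysis step: count offenses per beat."""
    counts = {}
    for row in rows:
        key = (row["beat"], row["offense"])
        counts[key] = counts.get(key, 0) + 1
    return counts


def time_processing(sizes, workload=process_dataset):
    """Return {size: elapsed_seconds}, timing only the workload, not data generation."""
    timings = {}
    for n in sizes:
        rows = make_rows(n)
        start = time.perf_counter()
        workload(rows)
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    for size, seconds in time_processing([1_000, 10_000, 50_000]).items():
        print(f"{size:>6} rows: {seconds:.4f} s")
```

Swapping `workload` for a function that actually invokes the analysis tool would give the kind of size-versus-time measurements the approach calls for.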

7 Results
[Figure not preserved in the transcript; only the fused data labels "6814717507" survive.]

8 Methods
BayesDB features for harnessing Kansas City crime data:
- SUMMARIZE: Provides a comprehensive overview of the dataset.
- ANALYZE: Learns a model of the dataset and the dependencies between its attributes.
- INFER: Fills in missing data based on the learned model, at a specified level of certainty.
- ESTIMATE: Reports the dependencies within the table and the strength of those dependencies.
- PLOT: Draws a simple bar chart showing the dependencies between columns.
- SIMULATE: Predicts new data entries given a specified value.
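To make the idea behind INFER more concrete, here is a sketch that fills in a missing value only when the estimate is confident enough. It is a deliberately simplified stand-in: BayesDB infers from its learned model, whereas this example uses plain conditional frequencies. The helper `infer_missing`, the column names, and the example codes other than `ARR` are hypothetical.

```python
from collections import Counter


def infer_missing(rows, target, given, confidence):
    """Fill row[target] when it is None, using the most common target value
    among rows that share the same `given` value. A value is filled in only
    if its relative frequency is at least `confidence`. Returns new rows."""
    # Count target values per conditioning value, using complete rows only.
    counts = {}
    for row in rows:
        if row[target] is not None:
            counts.setdefault(row[given], Counter())[row[target]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if row[target] is None and row[given] in counts:
            value, freq = counts[row[given]].most_common(1)[0]
            total = sum(counts[row[given]].values())
            if freq / total >= confidence:
                new_row[target] = value
        filled.append(new_row)
    return filled


# Illustrative data only (not taken from the KCPD dataset).
rows = [
    {"beat": 134, "involvement": "ARR"},
    {"beat": 134, "involvement": "ARR"},
    {"beat": 134, "involvement": "ARR"},
    {"beat": 134, "involvement": "VIC"},
    {"beat": 134, "involvement": None},
    {"beat": 211, "involvement": "VIC"},
    {"beat": 211, "involvement": "SUS"},
    {"beat": 211, "involvement": None},
]
result = infer_missing(rows, "involvement", "beat", confidence=0.7)
```

With a threshold of 0.7, the missing value for beat 134 is filled (3 of 4 known rows agree), while the one for beat 211 is left empty because no value reaches the threshold.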

9 Results
ESTIMATE PAIRWISE DEPENDENCE PROBABILITY FROM KCcrime;
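The query above asks for a dependence score for every pair of columns. As a rough illustration of what such a pairwise table looks like, the sketch below computes empirical mutual information between categorical columns. This is a swapped-in, simpler measure and not what BayesDB computes (BayesDB estimates dependence probabilities from its learned model); the function names and the toy table are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def pairwise_dependence(table):
    """Return {(col_a, col_b): mutual information} for every pair of columns.
    `table` maps a column name to a list of values; all lists share one length."""
    return {
        (a, b): mutual_information(table[a], table[b])
        for a, b in combinations(table, 2)
    }


# Toy table: `zone` is fully determined by `beat`; `coin` is unrelated to both.
table = {
    "beat": [1, 1, 2, 2],
    "zone": ["A", "A", "B", "B"],
    "coin": ["H", "T", "H", "T"],
}
scores = pairwise_dependence(table)
```

Pairs with a score near zero behave as independent in the sample, while larger scores indicate that one column carries information about the other.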

10 Results
- From the resulting dependency graph, we can find these relations:
[Figure: dependency graph not preserved in the transcript.]

11 Results
SIMULATE location_1,beat FROM KCcrime_demo GIVEN beat=134,involvement=ARR TIMES 10;
+------------------------------------------+------+
| location_1                               | beat |
+------------------------------------------+------+
| (39.06872767300007, -94.51423880099998)  | 134  |
| (39.01528852400003, -94.56239074799998)  | 134  |
| (39.046353786000054, -94.60664335999996) | 134  |
| (39.24470375300007, -94.46695072599994)  | 134  |
- BayesDB is also useful in identifying independent factors, which helps save time.
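The shape of a SIMULATE ... GIVEN ... TIMES n query can be approximated with conditional resampling. The sketch below is an assumption-heavy stand-in: it draws from observed rows that match the conditions, so it can only return values already in the data, whereas BayesDB generates values from its learned model. The function `simulate` and the example rows (with rounded, made-up coordinates) are hypothetical.

```python
import random


def simulate(rows, columns, given, times, seed=None):
    """Draw `times` samples of `columns` from the rows that match every
    key/value pair in `given`. Sampling is with replacement."""
    matches = [r for r in rows if all(r[k] == v for k, v in given.items())]
    if not matches:
        raise ValueError("no rows satisfy the GIVEN conditions")
    rng = random.Random(seed)
    samples = []
    for _ in range(times):
        row = rng.choice(matches)  # pick one whole row so columns stay consistent
        samples.append(tuple(row[c] for c in columns))
    return samples


# Illustrative rows only (not real KCPD records).
rows = [
    {"location_1": (39.07, -94.51), "beat": 134, "involvement": "ARR"},
    {"location_1": (39.02, -94.56), "beat": 134, "involvement": "ARR"},
    {"location_1": (39.10, -94.58), "beat": 134, "involvement": "VIC"},
    {"location_1": (39.05, -94.60), "beat": 211, "involvement": "ARR"},
]
samples = simulate(
    rows,
    columns=["location_1", "beat"],
    given={"beat": 134, "involvement": "ARR"},
    times=10,
    seed=1,
)
```

Each returned tuple mirrors one output row of the query: a location paired with the fixed beat value.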

12 Conclusion

