Big Data
Gulriz Kurban
Four Vs of Big Data: Volume, Variety, Velocity, Veracity
Volume
IRS (economic mobility): During fiscal year 2017, the IRS processed more than 245 million tax returns and other forms and issued more than 121 million individual income tax refunds.
Variety
Structured data: databases
Unstructured data: text, images, audio, and videos
Structured Data: Databases
Sales data: Netflix (118M subscribers), Amazon (recommender systems)
Electronic health data: demographic information, diagnosis codes
Genotype data: SNPs (single-nucleotide polymorphisms)
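To make "structured" concrete: every record has the same fixed fields, so the data fits in a table and can be queried directly. This is a minimal sketch under stated assumptions. The `patients` table, its columns, and the four rows are hypothetical, and Python's built-in `sqlite3` module stands in for a production database.

```python
import sqlite3

# Hypothetical structured health records: every row has the same fields
# (id, age, sex, ICD-10 diagnosis code), which is what makes the data "structured".
records = [
    (1, 54, "F", "E11.9"),  # E11.9: type 2 diabetes without complications
    (2, 67, "M", "I10"),    # I10: essential (primary) hypertension
    (3, 45, "F", "I10"),
    (4, 72, "M", "E11.9"),
]

def count_by_diagnosis(rows):
    """Load rows into an in-memory SQLite table and summarize them per diagnosis code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients (id INTEGER PRIMARY KEY, age INTEGER, sex TEXT, dx_code TEXT)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)
    # Because the schema is fixed, one SQL query answers the question.
    result = conn.execute(
        "SELECT dx_code, COUNT(*), AVG(age) FROM patients GROUP BY dx_code ORDER BY dx_code"
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for code, n, mean_age in count_by_diagnosis(records):
        print(f"{code}: {n} patients, mean age {mean_age:.1f}")
```

The same question asked of unstructured doctor's notes would first require extracting the age and diagnosis from free text.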
Unstructured Data: Text (NLP)
Social networks: Twitter, Facebook (topic modeling)
News feeds
Health data: doctors' notes
Scientific publications (drug toxicity)
Books (digital humanities)
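Unstructured text has no fixed fields, so NLP work usually begins by turning it into something countable. This is a minimal sketch, assuming a simple bag-of-words representation, which is a common first step before techniques such as topic modeling. The sample notes and the small stop-word list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"the", "a", "and", "of", "with", "for", "is", "to", "no"}

def bag_of_words(documents):
    """Return total term counts across all documents, ignoring case and stop words."""
    counts = Counter()
    for doc in documents:
        # Lowercase, then keep only runs of letters as tokens.
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

# Hypothetical doctors' notes.
notes = [
    "Patient reports chest pain and shortness of breath.",
    "Chest pain resolved; no shortness of breath today.",
    "Follow-up for chest X-ray.",
]

if __name__ == "__main__":
    print(bag_of_words(notes).most_common(3))
```

A bag of words throws away word order, which keeps it simple but loses meaning such as the negation in "no shortness of breath".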
Unstructured Data: Images
Instagram images
Companies: Shutterstock
Satellite images
Kevin Matzen, Kavita Bala, and Noah Snavely at Cornell University in Ithaca, New York
Unstructured Data: Audio and Videos
Using video content for better organization and search:
process videos frame by frame
analyze gestures
transcribe spoken language
use facial recognition
Examples: YouTube, IBM Cloud Video (personalized sports highlights), Alexa
Veracity: Biases and noise in data
Bias: mining social-network data to predict elections and referendums
Noise: data-entry errors in electronic health records
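Noise from manual data entry can often be caught with simple validity checks before analysis. This is a minimal sketch under stated assumptions. The record fields, the plausible age range of 0 to 120, and the simplified ICD-10-style pattern (one letter, two digits, optional decimal part) are all hypothetical choices for illustration, not a standard validation rule set.

```python
import re

# Simplified ICD-10-style pattern (assumption): a capital letter, two digits,
# then optionally "." followed by 1 to 4 digits or capital letters.
DX_PATTERN = re.compile(r"^[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?$")

def find_entry_errors(records):
    """Return (record id, problem) pairs for implausible or malformed fields."""
    problems = []
    for rec in records:
        age = rec.get("age")
        # Hypothetical plausibility range for age in years.
        if not isinstance(age, int) or not 0 <= age <= 120:
            problems.append((rec["id"], f"implausible age: {age!r}"))
        code = rec.get("dx_code") or ""  # treat a missing code as an empty string
        if not DX_PATTERN.match(code):
            problems.append((rec["id"], f"malformed diagnosis code: {code!r}"))
    return problems

# Hypothetical records containing typical data-entry slips.
records = [
    {"id": 1, "age": 54, "dx_code": "E11.9"},
    {"id": 2, "age": 670, "dx_code": "I10"},    # extra digit typed
    {"id": 3, "age": 45, "dx_code": "l10"},     # lowercase "l" typed instead of "I"
    {"id": 4, "age": None, "dx_code": "E11.9"}, # missing value
]

if __name__ == "__main__":
    for rec_id, problem in find_entry_errors(records):
        print(rec_id, problem)
```

Checks like these flag suspicious entries. They cannot tell which value was intended, so flagged records still need review or an explicit rule for handling them.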
Big Data and Poverty
Low-income communities are among the most surveilled communities in America:
public-benefits programs
child-welfare systems
monitoring programs for domestic-abuse offenders