Hadoop, Hive, JSON, and Data! Oh, my!! TJay Belt 1
Database Administrator at Imagine Learning me Read me Follow 2
Thanks to our Sponsors! Yearly Partners Gold Sponsors
Big Data ecosystem 30,000 feet view of our ecosystem Issues found along the way Overview 4
Json (JavaScript Object Notation) Lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.
Json (JavaScript Object Notation) { "_id": " ", "Revision": 12, "ModelData": { "GradeLevel": "Kindergarten", "FirstLanguage": "English“ }, "SetTheStageData": { "LastSetTheStageLibraryWords": 1, "LastSetTheStageTakeATest": 0 }
Json (JavaScript Object Notation) "TestInstances": [{ "Product": "ILE", "Lesson": "30698aac-5a3d c-16de4ba9db70", "LessonBranch": "Main", "TestType": "PlacementTest", "TimeStarted": " T15:16: :00", "TimeCompleted": " T15:26: :00", "TestInstanceId": "1", "TestSectionInstances": [{ "TestSection": "Letter Recognition", "TestQuestionInstances": [{ "TestQuestion": "q43", "TimeStarted": " T15:17: :00", "TimeCompleted": " T15:17: :00", "TestOptionInstances": [{ "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt256" }, { "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt258" }, { "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt257" }, { "ClickCount": 1, "IsSelected": true, "ResponseLatency": -8467, "TestOption": "opt253" }, { "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt255" }, { "ClickCount": 1, "IsSelected": false, "ResponseLatency": 0, "TestOption": "opt254" }] },
Blob Storage Reliable, cost-effective cloud storage for large amounts of unstructured data Microsoft Azure Cloud
MongoDB MongoDB (from humongous) is a cross-platform document-oriented database. Classified as a NoSQL database that eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas Making the integration of data in certain types of applications easier and faster.
Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
HIVE Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. It supports queries expressed in a language called HiveQL, which automatically translates SQL-like queries into MapReduce jobs executed on Hadoop.
What do we have? 13
Things we tried SQL Server Json procs SlamData PowerQuery DocumentDB MongoDirector SQL Azure
Issues I encountered 16
17
Issues I encountered 18
Issues I encountered 19
Thank You! TJay Belt Cell(801) Bloghttp://tjaybelt.blogspot.comhttp://tjaybelt.blogspot.com Linked Inwww.linkedin.com/in/tjaybeltwww.linkedin.com/in/tjaybelt Skypetjaybelt Google+linklink
Thanks to our Sponsors! Yearly Partners Gold Sponsors