Other formats for data: Linked lists, Hash tables, JSON, Big Data, Hadoop & MapReduce. REST. Parallel processing exercise. Homework: Plans for group sorting.


1 Other formats for data: Linked lists, Hash tables, JSON, Big Data, Hadoop & MapReduce. REST. Parallel processing exercise. Homework: Plans for group sorting. Prepare for RSA talk. Postings.

2 Linked list. Big array for data. Array of arrays: think of rows. Each row has information + one or more pointers to other rows. Various ways: –Forward-pointing list: next item –Forward and back: next and previous item –Tree: first child item and next sibling; or first child, next sibling, parent; or first child, next sibling, parent1, parent2
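
The forward-pointing list can be sketched as follows. This is a minimal illustration, assuming each row is stored as [information, next], where next is the index of the next row and -1 marks the end of the list. The names rows and walk and the sample data are made up for this sketch and do not come from the slides.

```javascript
// Each row: [information, index of the next row]; -1 means "no next item".
// Rows are stored out of order on purpose: the pointers define the order.
const rows = [
  ["cherry", -1], // row 0: last item in the list
  ["apple", 2],   // row 1: first item, points to row 2
  ["banana", 0],  // row 2: points to row 0
];

// Follow the "next" pointers starting from row `start`, collecting the values.
function walk(rows, start) {
  const result = [];
  let i = start;
  while (i !== -1) {
    result.push(rows[i][0]);
    i = rows[i][1];
  }
  return result;
}

console.log(walk(rows, 1)); // apple, banana, cherry
```

A forward-and-back list would add a third column holding the index of the previous row.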

3 Family example: name, a parent, 1st child, next sibling. Esther17 Anne062 Jeanine03 Daniel254 Aviva2 Annika3

4 Exercise: Make your family tree. Each row has a name, parent1 (optionally include a second parent), first child, next sibling. You need to start somewhere. Put down "Not defined" for things not in the table. Put down -1 for cases of no children or no next sibling.
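
One possible way to hold such a table in code is sketched below. The family members, the field names (parent, firstChild, nextSibling) and the helper childrenOf are all hypothetical and chosen only for illustration. The sketch follows the conventions above: undefined for a parent who is not in the table, -1 for no children or no next sibling.

```javascript
// Hypothetical family table. Row numbers are the array indices.
const family = [
  { name: "Grandma", parent: undefined, firstChild: 1, nextSibling: -1 }, // row 0
  { name: "Alice", parent: 0, firstChild: 3, nextSibling: 2 },            // row 1
  { name: "Bob", parent: 0, firstChild: -1, nextSibling: -1 },            // row 2
  { name: "Carol", parent: 1, firstChild: -1, nextSibling: -1 },          // row 3
];

// List the children of one row: start at firstChild, then follow nextSibling.
function childrenOf(table, index) {
  const names = [];
  let child = table[index].firstChild;
  while (child !== -1) {
    names.push(table[child].name);
    child = table[child].nextSibling;
  }
  return names;
}

console.log(childrenOf(family, 0)); // Alice, Bob
```

Storing only the first child and the next sibling keeps every row the same size, no matter how many children a person has.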

5 Hash tables. Problem: how to find elements in a table? –No intrinsic order. If there were, you could use binary search. –Binary search: compare the value (or the key) to the middle value; if less, search the lower half; if greater, search the upper half; keep going… –Aside: Meyer family geography game
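
Binary search as described above can be sketched like this, assuming the values are already sorted in increasing order. The function name binarySearch and the sample numbers are illustrative.

```javascript
// Return the index of `target` in the sorted array, or -1 if it is absent.
function binarySearch(sorted, target) {
  let low = 0;
  let high = sorted.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (sorted[mid] === target) return mid;
    if (target < sorted[mid]) {
      high = mid - 1; // target can only be in the lower half
    } else {
      low = mid + 1;  // target can only be in the upper half
    }
  }
  return -1;
}

console.log(binarySearch([3, 8, 15, 21, 42], 21)); // 3
```

Each comparison halves the remaining range, which is why the method depends on the table having an order.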

6 Hash table approach. Have key-value pairs. Have the task of finding whether the current key is in the table. –Assume there is a hash function that takes the key as input and outputs the hash, which corresponds to a slot in the table. It takes a fixed time to compute the function and go to that spot. If the slot is empty, store the key-value pair there. If it is not empty, compare the keys: if they match, then …. If not, check the next position and continue. –http://en.wikibooks.org/wiki/Data_Structures/Hash_Tables
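
The approach above (often called linear probing) can be sketched as follows. Several details are assumptions made for this sketch: the toy hash function that sums character codes, the fixed table size of 8, and the choice to overwrite the stored value when the keys match. The names SIZE, hash, put and get are illustrative.

```javascript
const SIZE = 8; // fixed number of slots (kept small for illustration)

// Toy hash function: sum of character codes, reduced to a slot number.
function hash(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SIZE;
}

// Store a key-value pair; returns the slot that was used.
function put(table, key, value) {
  let slot = hash(key);
  for (let tries = 0; tries < SIZE; tries++) {
    const entry = table[slot];
    // Empty slot, or same key (assumption: a matching key is overwritten).
    if (entry === undefined || entry.key === key) {
      table[slot] = { key, value };
      return slot;
    }
    slot = (slot + 1) % SIZE; // occupied by another key: check the next position
  }
  throw new Error("table is full");
}

// Look up a key; returns undefined if the key is not in the table.
function get(table, key) {
  let slot = hash(key);
  for (let tries = 0; tries < SIZE; tries++) {
    const entry = table[slot];
    if (entry === undefined) return undefined; // empty slot: key is absent
    if (entry.key === key) return entry.value;
    slot = (slot + 1) % SIZE;
  }
  return undefined;
}

const table = new Array(SIZE);
put(table, "ab", 1);
put(table, "ba", 2); // same hash as "ab", so it lands in the next slot
console.log(get(table, "ba")); // 2
```

The keys "ab" and "ba" collide because the toy hash ignores character order; a practical hash function spreads keys far more evenly.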

7 Associative array. Normal arrays use numeric indices, typically starting with 0. An associative array uses keys (such as names) in place of numeric indices. Consider a set of 4 products: table, desk, chair, lamp. An associative array could be used to store the prices: table=>100, desk=>150, chair=>50, lamp=>20
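
In JavaScript, a plain object can play the role of this associative array. The sketch below uses the products and prices from the slide; the variable name prices and the changed lamp price are illustrative.

```javascript
// Product names are the keys; prices are the values.
const prices = { table: 100, desk: 150, chair: 50, lamp: 20 };

console.log(prices["desk"]); // 150 (look up by key, not by position)

prices.lamp = 25; // modify an existing entry

// Visit every key-value pair.
for (const [product, price] of Object.entries(prices)) {
  console.log(product, price);
}
```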

8 Key-value pairs. So-called key-value pairs are a generalization of the associative array and are used in other systems. At its most general, there can be more than one key-value pair for a given key, and the basic software OR your program needs to take care of this situation.
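
One way a program might take care of several values for the same key is to keep a list per key. This is only a sketch of that one option; the names index and add and the sample data are illustrative.

```javascript
// Map from key to the list of all values stored under that key.
const index = new Map();

function add(index, key, value) {
  if (!index.has(key)) index.set(key, []);
  index.get(key).push(value);
}

add(index, "Marx", "Groucho");
add(index, "Marx", "Harpo");
add(index, "Stooge", "Curly");

console.log(index.get("Marx")); // both values stored under the key "Marx"
```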

9 JSON http://www.json.org/ Format (syntax) for information –smaller than XML –available in many languages. Objects: name/value pairs –create using curly brackets (braces). Use dot notation to access and modify. Arrays –create using square brackets. Use square brackets with indices to access and modify.

10 Example: var course = {"name":"Topics", "teacher": "Jeanine Meyer", "days": "MR"}; course.name => "Topics" course.teacher => "Jeanine Meyer" course.days => "MR"

11 Example: var list = { "class_list": [ {"firstname":"Groucho", "lastname": "Marx"}, {"firstname":"Harpo", "lastname": "Marx"}, {"firstname":"Zeppo", "lastname": "Marx"}, {"firstname":"Curly", "lastname": "Stooge"} ] } ; list.class_list[2].firstname => "Zeppo"
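
The examples above are JavaScript object literals. When JSON travels between programs it is plain text, and the built-in JSON.stringify and JSON.parse convert between the two forms. The sketch below reuses the course object from the earlier example; the names text and copy and the new value "TF" are illustrative.

```javascript
const course = { name: "Topics", teacher: "Jeanine Meyer", days: "MR" };

// Object -> JSON text (what would be sent over a network or saved to a file).
const text = JSON.stringify(course);
console.log(text); // {"name":"Topics","teacher":"Jeanine Meyer","days":"MR"}

// JSON text -> a new, independent object.
const copy = JSON.parse(text);
copy.days = "TF"; // changing the copy does not change the original
console.log(course.days); // MR
```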

12 Big Data: a buzzword more than a specific product. Data that is –large in size: Volume –changes rapidly [or application requires up-to-date values]: Velocity –in different formats: Variety. PLUS not necessarily all owned by the organization attempting to use it. –in this case, can only query; no changes/updates, deletions or additions

13 Note: A company / organization can store data in its own cloud (on its own servers) or in a cloud service offered by a vendor and still have total control. –Could even be a relational database –Very large databases may be just key-value pairs

14 Cloud … can refer to one, some or all of the following: where the programs are; where the data is; where the processors (aka computers) that do the calculations are

15 REST: Representational State Transfer –a "standard" / framework / style of communicating with Web services –typically, get information in the form of XML or JSON or something else. Posting opportunity: find a specific service that provides REST connections….
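
A REST request for JSON might look like the sketch below. The service address https://api.example.com and the courses collection are hypothetical placeholders, not a real service, and the helper names resourceUrl and getResource are made up for this sketch. It assumes an environment that provides the standard fetch function (a browser, or a recent version of Node.js).

```javascript
// Build the URL that identifies one resource. All three parts are placeholders.
function resourceUrl(base, collection, id) {
  return `${base}/${collection}/${encodeURIComponent(id)}`;
}

// GET the resource and parse the JSON body.
// Not called here, because the service below is hypothetical.
async function getResource(url) {
  const response = await fetch(url, { headers: { Accept: "application/json" } });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

const url = resourceUrl("https://api.example.com", "courses", "topics 101");
console.log(url); // https://api.example.com/courses/topics%20101
```

With a real service, getResource(url) would return a promise for the parsed object, which can then be read with dot notation like any other JSON data.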

16 Parallel processing / distributed processing. Large amounts (volumes) of data. Multiple processors. How to speed up accomplishment of tasks? –"Embarrassingly parallel" refers to tasks that are easy to parallelize. Take a list of numbers (say, prices) and increase each by 10%?
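
The price example is easy to parallelize because no price depends on any other. The sketch below only simulates the idea in a single process: it splits the list into chunks, handles each chunk on its own, and joins the results. With real parallel hardware each chunk would go to a different processor. Prices are held in whole cents (an assumption made to avoid rounding surprises), and the names splitIntoChunks and raiseChunk are illustrative.

```javascript
// Split a list into at most `parts` chunks of roughly equal size.
function splitIntoChunks(list, parts) {
  const size = Math.ceil(list.length / parts);
  const chunks = [];
  for (let i = 0; i < list.length; i += size) {
    chunks.push(list.slice(i, i + size));
  }
  return chunks;
}

// The work for one processor: raise each price in its chunk by 10%.
function raiseChunk(chunk) {
  return chunk.map((cents) => Math.round(cents * 1.1));
}

const prices = [1000, 1500, 500, 200, 2500]; // in cents
const chunks = splitIntoChunks(prices, 2);   // pretend there are 2 processors
const raised = chunks.map(raiseChunk).flat(); // join results in original order

console.log(raised); // 1100, 1650, 550, 220, 2750
```

No chunk needs to see another chunk's data, so no coordination is required until the final join.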

17 What about tasks in which some parts can be done in parallel, but some cannot? How to devise ways to take advantage of multiple processors?

18 Parallel exercise. Divide into groups of 5. Each take a deck of cards. Shuffle. Devise a plan to sort into order: –suits: hearts, spades, diamonds, clubs –each suit: A, 2, …, J, Q, K
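
One plan among many is sketched below: deal the cards into one pile per suit, sort each pile by rank, then stack the piles in suit order. The piles are independent of each other, so four people could sort them at the same time. This is a single-process simulation of that plan, not a recommended answer to the exercise, and the card representation and the name sortDeck are assumptions.

```javascript
const SUITS = ["hearts", "spades", "diamonds", "clubs"];
const RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];

function sortDeck(deck) {
  // Step 1: deal the cards into four piles, one per suit.
  const piles = new Map(SUITS.map((suit) => [suit, []]));
  for (const card of deck) piles.get(card.suit).push(card);

  // Step 2: sort each pile by rank. The piles do not interact,
  // so this step could be done by four people at once.
  for (const pile of piles.values()) {
    pile.sort((a, b) => RANKS.indexOf(a.rank) - RANKS.indexOf(b.rank));
  }

  // Step 3: stack the piles in the required suit order.
  return SUITS.flatMap((suit) => piles.get(suit));
}

const hand = [
  { suit: "clubs", rank: "K" },
  { suit: "hearts", rank: "2" },
  { suit: "hearts", rank: "A" },
  { suit: "spades", rank: "10" },
];
console.log(sortDeck(hand).map((c) => `${c.rank} of ${c.suit}`));
// A of hearts, 2 of hearts, 10 of spades, K of clubs
```

Steps 1 and 3 are the parts that are harder to share out, which is where the time saved by extra people starts to level off.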

19 Hadoop: open source utilities for distributed computing. http://hadoop.apache.org/ Includes MapReduce.

20 MapReduce. A MapReduce job: map sets up tasks to be done in parallel; reduce combines the results –may be local combine step and then a reduce across all output steps. Requires a file system. Data is in key/value pairs.
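
The map and reduce steps can be illustrated with the usual word-count example. The sketch below runs in one process in plain JavaScript and does not use Hadoop or its API. It only shows the shape of the data: map turns each input line into key/value pairs, and reduce combines all pairs that share a key. The names mapLine and reduceCounts and the sample lines are illustrative.

```javascript
// map: turn one line of text into [word, 1] pairs.
// Each line is independent, so lines could be mapped on different machines.
function mapLine(line) {
  return line
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean) // drop empty strings from leading/trailing spaces
    .map((word) => [word, 1]);
}

// reduce: combine all pairs that share a key by adding their values.
function reduceCounts(pairs) {
  const totals = new Map();
  for (const [word, n] of pairs) {
    totals.set(word, (totals.get(word) || 0) + n);
  }
  return totals;
}

const lines = ["big data big", "data formats"];
const pairs = lines.flatMap(mapLine);
const counts = reduceCounts(pairs);

console.log(counts.get("big"));     // 2
console.log(counts.get("formats")); // 1
```

A local combine step would apply reduceCounts to the pairs from each line (or each machine) first, so that fewer pairs have to be sent to the final reduce.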

21 Applications. What are applications that use multiple processors for a [big] gain in speed?

22 Homework: Come up with improved parallel sorting. Postings: more on Hadoop, MapReduce, Big Data, etc.

