The Memory B. Ramamurthy C B. Ramamurthy
Topics for discussion
- On-chip memory
- On-board memory
- System memory
- Off-system/online storage/secondary memory
- File system abstraction
- Offline/tertiary memory
- RAID: Redundant Array of Inexpensive Disks
- NAS: Network Attached Storage
- SAN: Storage area networks
- DB and DBMS: Database and database management systems
- Distributed file system
- Google file system
- Hadoop file system
Data and Computation Continuum
- Compute intensive. Ex: computation of digits of pi
- Data intensive. Ex: analyzing web logs
On-chip memory
- Registers
- Cache
- Buffers (instruction pipeline)
- Characteristics: volatile
On-board memory
- Cache
- Instruction cache
- Data cache
- Translation lookaside buffers (TLB)
- Characteristics: content addressable, set-associative organization
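To make "set-associative organization" concrete, here is a minimal sketch of how a cache might split an address into a tag, a set index, and a byte offset. The geometry (32 KiB, 64-byte lines, 4 ways) and the names `CacheGeometry` and `split_address` are assumptions chosen for illustration, not parameters of any particular processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical cache parameters; every size is assumed to be a power of two."""
    size_bytes: int = 32 * 1024  # total capacity (assumed)
    line_bytes: int = 64         # bytes per cache line (assumed)
    ways: int = 4                # lines per set (assumed)

    @property
    def num_sets(self) -> int:
        return self.size_bytes // (self.line_bytes * self.ways)


def split_address(addr: int, geo: CacheGeometry) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for an address."""
    offset_bits = geo.line_bytes.bit_length() - 1  # log2(line size)
    index_bits = geo.num_sets.bit_length() - 1     # log2(number of sets)
    offset = addr & (geo.line_bytes - 1)
    set_index = (addr >> offset_bits) & (geo.num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


if __name__ == "__main__":
    print(split_address(0x12345, CacheGeometry()))
```

The set index selects one set; the tag is then compared against all the ways in that set at once, which is the content-addressable part of the lookup.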
System memory
- RAM: Random access memory; main memory; read and write possible; volatile
- ROM: Read-only memory; holds boot programs for operating systems
- Flash memory: Erasable/writable non-volatile memory
- SDRAM: Synchronous dynamic RAM
- Others: EAROM
Off-system storage (earlier lectures covered these)
- Off-system/online storage/secondary memory
- File system abstraction
- Offline/tertiary memory
- RAID: Redundant Array of Inexpensive Disks
- NAS: Network Attached Storage
- SAN: Storage area networks
Database and Database Management System
- Data source
- Transactional
- Database server
- Relational DB or similar foundation
- Tables, rows, result set, SQL
- ODBC: Open Database Connectivity
- Very successful business model: Oracle, DB2, MySQL, and others
- Persistence models: EJB, DAO, ADO (I am not going to expand the abbreviations)
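As a small illustration of tables, rows, result sets, SQL, and a transaction, here is a sketch using Python's built-in `sqlite3` module. Note the assumptions: SQLite is an embedded database reached through Python's DB-API rather than a database server reached through ODBC, and the `orders` table, its columns, and its rows are made up for this example.

```python
import sqlite3


def total_per_customer():
    """Create a tiny table, insert rows in one transaction, and query it."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises: one transaction.
    with conn:
        conn.executemany(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
        )
    # The rows returned by a SELECT form the result set.
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer, total in total_per_customer():
        print(customer, total)
```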
Distributed file system (DFS)
- A dedicated server manages the files for a compute environment.
- For example, nickelback.cse.buffalo.edu is your file server, which is why we did not want you to run your user applications on this machine.
- DFS addresses various transparencies: location transparency, sharing, performance, etc.
- Examples: NFS, NFS+, AFS (Andrew FS)… (you will study these in the Distributed Systems course)
Issues with ultra-scale data
- How to store the large amount of data? On commodity hardware or special hardware?
- Large storage implies a large number of devices to store the data.
- How to address shortening MTTF (mean time to failure)?
- How to realize “fault tolerance”? Redundancy/replication is a solution.
- How to manage the replication and the health of the large number of devices?
- More importantly, how to partition the large-scale data to store on these storage devices (nodes)?
- How to parallelize processing of the data stored at multiple “nodes”?
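Two of these questions can be made concrete with a short sketch. The first function shows why MTTF shrinks as devices are added, under the simplifying assumption of independent devices with exponentially distributed lifetimes. The second shows one naive way to partition blocks across nodes with replication: hash the block id to a starting node and take the next nodes in order. The function names, node names, and the placement policy are all hypothetical; production systems use more elaborate policies.

```python
import hashlib


def fleet_mttf_hours(device_mttf_hours: float, num_devices: int) -> float:
    """Expected time until the first failure anywhere in the fleet.

    Assumes independent devices with exponentially distributed lifetimes,
    so the failure rates simply add up.
    """
    return device_mttf_hours / num_devices


def place_replicas(block_id: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Choose `replicas` distinct nodes for a block (naive illustrative policy)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    # A stable hash makes the placement repeatable across runs.
    digest = hashlib.sha256(block_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


if __name__ == "__main__":
    # A device with a 1,000,000-hour MTTF looks reliable on its own,
    # but a fleet of 10,000 of them sees a failure about every 100 hours.
    print(fleet_mttf_hours(1_000_000, 10_000))
    nodes = [f"node{i}" for i in range(5)]  # hypothetical node names
    print(place_replicas("weblog-000", nodes))
```

With failures this frequent at scale, replication has to be routine and automatic rather than an exceptional recovery step.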
On to Google File System
- The Internet introduced a new challenge in the form of web logs and web crawler data: large scale, “peta scale”.
- But observe that this type of data has a uniquely different characteristic from your transactional or “order” data on amazon.com: it is “write once”; so is HIPAA-protected healthcare and patient information.
- Google exploited this characteristic in its Google file system: S. Ghemawat
Hadoop File System (HFS; officially the Hadoop Distributed File System, HDFS)
- The Hadoop file system is a reverse-engineered version of GFS: this is my first opinion on HFS.
- HFS is a distributed file system for large-scale data.
- Data throughput is more important than latency.
- Batch computing rather than interactive time-shared computing.
MapReduce
[Figure: MapReduce data flow. Input words (Cat, Bat, Dog, other words; size: TByte) pass through split, map, combine, and reduce stages and are written to output partitions part0, part1, part2.]
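The split, map, combine, and reduce stages can be sketched in a single process as follows. This is only an illustration of the data flow, not Hadoop's or Google's API: the function names, the number of splits and reducers, and the use of `zlib.crc32` as the partitioning hash are all assumptions made for this example.

```python
import zlib
from collections import Counter, defaultdict


def split_input(words, num_splits):
    """Divide the input into roughly equal splits, one per map task."""
    return [words[i::num_splits] for i in range(num_splits)]


def map_phase(split):
    """Emit a (word, 1) pair for every word in the split."""
    return [(word.lower(), 1) for word in split]


def combine(pairs):
    """Pre-aggregate counts locally so less data travels to the reducers."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return local


def partition(word, num_reducers):
    """Stable hash, so a given word always goes to the same reducer."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def run_wordcount(words, num_splits=2, num_reducers=3):
    """Return one dict of word counts per reducer (part0, part1, ...)."""
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for split in split_input(words, num_splits):
        for word, n in combine(map_phase(split)).items():
            shuffled[partition(word, num_reducers)][word].append(n)
    # Each reducer sums the partial counts it received for its words.
    return [{w: sum(ns) for w, ns in part.items()} for part in shuffled]


if __name__ == "__main__":
    for i, part in enumerate(run_wordcount("Cat Bat Dog cat dog cat".split())):
        print(f"part{i}", part)
```

In a real cluster each split and each reducer would run on a different node, close to where the data is stored; here they run one after another to keep the sketch short.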
Exercise: Count the number of occurrences of each word in the text
This is a cat. Cat sits on a roof. The roof is a tin roof. There is a tin can on the roof. Cat kicks the can. It rolls on the roof and falls on the next roof. The cat rolls too. It sits on the can.
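Once you have counted by hand, a short sequential program is one way to check your answer. The sketch below assumes that counting is case-insensitive ("Cat" and "cat" are the same word) and that punctuation is ignored; the exercise does not state either rule, so adjust the tokenization if you decide otherwise.

```python
import re
from collections import Counter

TEXT = (
    "This is a cat. Cat sits on a roof. The roof is a tin roof. "
    "There is a tin can on the roof. Cat kicks the can. "
    "It rolls on the roof and falls on the next roof. "
    "The cat rolls too. It sits on the can."
)


def count_words(text):
    """Count occurrences of each word, ignoring case and punctuation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


if __name__ == "__main__":
    for word, n in count_words(TEXT).most_common():
        print(word, n)
```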