File Systems for Cloud Computing
Chittaranjan Hota, PhD
Faculty Incharge, Information Processing Division
Birla Institute of Technology & Science-Pilani, Hyderabad Campus
Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India
16th March 2013
Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar
Growth of the Internet
Sources: Cisco VNI Global Forecast; Internet World Stats
Golden era in Computing
o Powerful multi-core processors
o General-purpose graphics processors
o Superior software methodologies
o Virtualization leveraging the powerful hardware
o Wider bandwidth for communication
o Proliferation of devices
o Explosion of domain applications
Source: Cloud Futures 2011, Redmond
Cloud computing: Is it hype?
Projected market growth: from $41 billion in 2011 to $241 billion in 2020
Scaling up… SETI
What is Cloud Computing?
Files
o Permanent storage
o Information sharing
o Files have data and attributes
What a Distributed File System Provides
Provides access to data stored at servers using file system interfaces
What are the file system interfaces?
o Open a file, check status on a file, close a file
o Read data from a file
o Write data to a file
o Lock a file or part of a file
o List files in a directory, delete a directory
o Delete a file, rename a file, add a symbolic link to a file, etc.
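The interface operations listed above can be tried out against a local directory. The following is a minimal sketch, assuming a POSIX system (the `fcntl` module is not available on Windows); the function name `demo_file_interface` and the file names are hypothetical, and a real distributed file system would route the same calls to a server rather than to local disk.

```python
import fcntl
import os
import tempfile


def demo_file_interface(root):
    """Exercise each listed operation on files under the directory `root`."""
    log = []
    path = os.path.join(root, "notes.txt")

    # Open a file, write data to it, and check its status.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    os.write(fd, b"hello dfs")
    log.append(("size", os.fstat(fd).st_size))

    # Lock part of the file (first 5 bytes, advisory lock), then read it back.
    fcntl.lockf(fd, fcntl.LOCK_EX, 5, 0, os.SEEK_SET)
    os.lseek(fd, 0, os.SEEK_SET)
    log.append(("read", os.read(fd, 100)))
    fcntl.lockf(fd, fcntl.LOCK_UN, 5, 0, os.SEEK_SET)

    # Close the file.
    os.close(fd)

    # Rename the file and add a symbolic link to it.
    renamed = os.path.join(root, "renamed.txt")
    os.rename(path, renamed)
    link = os.path.join(root, "link.txt")
    os.symlink(renamed, link)

    # List files in the directory.
    log.append(("list", sorted(os.listdir(root))))

    # Delete the link and the file, then delete the directory.
    os.remove(link)
    os.remove(renamed)
    os.rmdir(root)
    log.append(("exists", os.path.exists(root)))
    return log


if __name__ == "__main__":
    for entry in demo_file_interface(tempfile.mkdtemp()):
        print(entry)
```

The sketch uses the low-level `os` calls rather than Python's `open()` so that each step maps onto one interface operation from the list.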
DFS Design Issues
o Mounting
o Caching
o Hints
o Bulk data transfer
o Replica management
o Writing policies
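NFS architecture
[Figure: Client computer and server computer connected by a network. On the client, application programs issue UNIX system calls to the UNIX kernel, where the virtual file system routes operations on local files to the UNIX file system (or PC DOS) and operations on remote files to the NFS client. The NFS client uses RPC for remote operations to reach the NFS server, which sits beside the virtual file system and the UNIX file system in the server's UNIX kernel.]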
Google File System
Metadata: namespace, access control, mapping of files to chunks, and current location of chunks
HDFS Design
Files stored as blocks
o Default 64MB
Reliability through replication
o Replicated across 3+ DataNodes
Single NameNode coordinates access, metadata
o Centralized management
No data caching
o Little benefit due to large data sets, streaming reads
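The block size and replication figures above translate directly into storage arithmetic. The following is a minimal sketch, assuming the 64 MB default and a replication factor of exactly 3, and assuming that a partial last block occupies only its actual size; the function name `block_layout` is hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # default block size from the slide: 64 MB
REPLICATION = 3                # lower bound from the slide ("3+")


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, size of last block, raw bytes stored)."""
    if file_size == 0:
        return 0, 0, 0
    # Ceiling division without floating point.
    num_blocks = -(-file_size // block_size)
    last_block = file_size - (num_blocks - 1) * block_size
    # Every byte is stored once per replica.
    raw_bytes = file_size * replication
    return num_blocks, last_block, raw_bytes


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks, last, raw = block_layout(200 * MB)
    print(blocks, last // MB, raw // MB)
```

Under these assumptions a 200 MB file occupies four blocks (three full blocks plus one of 8 MB) and consumes 600 MB of raw storage across the cluster.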
Commodity Hardware
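HDFS Architecture
[Figure: An HDFS-aware application sits on top of two interfaces. The POSIX API leads to the regular VFS with local and NFS-supported files and their specific drivers. The HDFS API provides a separate HDFS view, which goes through the network stack to the HDFS NameNode and HDFS DataNode.]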
HDFS Architecture
[Figure: The Namenode holds metadata (name, replicas, …). Clients send metadata ops to the Namenode and block ops (read, write) to the Datanodes on Rack1 and Rack2. Blocks are replicated across Datanodes.]
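HDFS File Read
[Figure: On the client node, the HDFS client uses DistributedFileSystem and FSDataInputStream to reach the NameNode and three DataNodes.]
1: open (HDFS client to DistributedFileSystem)
2: get block location (DistributedFileSystem to NameNode)
3: read (HDFS client to FSDataInputStream)
4: read (FSDataInputStream to a DataNode)
5: read (FSDataInputStream to the next DataNode)
6: close (HDFS client to FSDataInputStream)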
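The read sequence above (open, get block location, read from DataNodes, close) can be mimicked with an in-memory toy model. The following is a sketch only: the classes `DataNode`, `NameNode`, `InputStream` and `Client` and the helper `store_file` are hypothetical stand-ins and do not reflect the real Hadoop API. The point it illustrates is that the NameNode serves only metadata, while the bytes come from the DataNodes.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    def __init__(self):
        self.files = {}  # path -> list of (block id, [replica DataNodes])

    def get_block_locations(self, path):
        return self.files[path]


class InputStream:
    def __init__(self, locations):
        self.locations = locations
        self.closed = False

    def read(self):
        # Steps 4 and 5: fetch each block in order from its first replica.
        return b"".join(replicas[0].read_block(block_id)
                        for block_id, replicas in self.locations)

    def close(self):
        # Step 6.
        self.closed = True


class Client:
    def __init__(self, namenode):
        self.namenode = namenode

    def open(self, path):
        # Steps 1 and 2: open the file and ask the NameNode for block locations.
        return InputStream(self.namenode.get_block_locations(path))


def store_file(namenode, datanodes, path, data, block_size=4, replication=2):
    """Test helper: split `data` into blocks and place replicas round-robin."""
    locations = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        replicas = [datanodes[(index + r) % len(datanodes)]
                    for r in range(replication)]
        for node in replicas:
            node.blocks[block_id] = data[offset:offset + block_size]
        locations.append((block_id, replicas))
    namenode.files[path] = locations


if __name__ == "__main__":
    nn = NameNode()
    dns = [DataNode(f"dn{i}") for i in range(3)]
    store_file(nn, dns, "/demo.txt", b"hello hdfs!")
    stream = Client(nn).open("/demo.txt")
    print(stream.read())  # step 3: the client reads through the stream
    stream.close()
```

The toy always reads from the first replica; choosing the closest replica would be the natural refinement.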
Hadoop Clusters
Rack Awareness
[Figure: Tree of data centers (d1, d2), racks (r1, r2) and nodes (n1, n2), annotated with network distances d=0, d=2, d=4 and d=6 measured from node n1.]
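The distances 0, 2, 4 and 6 in the figure are consistent with counting the hops from each of two nodes up to their closest common ancestor in a data center / rack / node tree. The following is a minimal sketch under that assumption; the function name `network_distance` and the `/datacenter/rack/node` location strings are hypothetical.

```python
def network_distance(a, b):
    """Hops from `a` and `b` up to their closest common ancestor, summed.

    Locations are written as '/datacenter/rack/node', e.g. '/d1/r1/n1'.
    """
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    # Count how many leading path components the two locations share.
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


if __name__ == "__main__":
    origin = "/d1/r1/n1"
    for other in ("/d1/r1/n1", "/d1/r1/n2", "/d1/r2/n3", "/d2/r3/n4"):
        print(origin, other, network_distance(origin, other))
```

With this measure, the same node is at distance 0, another node on the same rack at 2, a node on a different rack in the same data center at 4, and a node in a different data center at 6.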
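HDFS Write
[Figure: On the client node, the HDFS client uses DistributedFileSystem and FSDataOutputStream to reach the NameNode and a pipeline of three DataNodes.]
1: create (HDFS client to DistributedFileSystem)
2: create (DistributedFileSystem to NameNode)
3: write (HDFS client to FSDataOutputStream)
4: write packet (along the pipeline of DataNodes)
5: ack packet (back along the pipeline)
6: close (HDFS client to FSDataOutputStream)
7: complete (to NameNode)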
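The "write packet" and "ack packet" steps of the pipeline can be illustrated with a small in-memory model. The following is a sketch only, assuming each DataNode stores a packet, forwards it to the next node, and acknowledgements travel back in the reverse direction; the functions `pipeline_write` and `write_file` and the dictionary layout are hypothetical and not part of Hadoop.

```python
def pipeline_write(packet, pipeline):
    """Store `packet` on every node in `pipeline`; return acks in arrival order."""
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    head["store"].append(packet)                 # 4: write packet
    downstream_acks = pipeline_write(packet, rest)
    # 5: ack packet. The last node acknowledges first, then acks flow upstream.
    return downstream_acks + [head["name"]]


def write_file(data, pipeline, packet_size=4):
    """Send `data` packet by packet; return the number of fully acked packets."""
    acked = 0
    for offset in range(0, len(data), packet_size):
        acks = pipeline_write(data[offset:offset + packet_size], pipeline)
        if len(acks) == len(pipeline):
            acked += 1
    return acked


if __name__ == "__main__":
    nodes = [{"name": n, "store": []} for n in ("dn1", "dn2", "dn3")]
    print(write_file(b"hello hdfs!", nodes))
    print(b"".join(nodes[2]["store"]))
```

The model ignores failures; in a real pipeline a missing acknowledgement is what triggers recovery.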
Replica Placement
[Figure: Data center, racks and nodes.]
Computational Grids [Source: IBM TJ Watson Research Center]
Load Distribution
Map/Reduce
SLURM
Crowd Sourcing
Foxtrot: Associating audio with locations
Allen Telescope Array: Search for Extraterrestrial Intelligence
Thank You!