Cloud computing Hugh Shanahan, Department of Computer Science, Royal Holloway, University of London CCC 2011, Huazhong Agricultural University, Wuhan 20.


1 Cloud computing Hugh Shanahan, Department of Computer Science, Royal Holloway, University of London CCC 2011, Huazhong Agricultural University, Wuhan 20 Sep 2011

2 The amount of Biological data is exploding
The raw data for a human genome corresponds to hundreds of Gbytes.
The cost of sequencing a human genome has fallen from $100 M to ~$3000 (July 2011).
The main bottleneck is now reconstructing the genome from the data generated.
Much of the original Dogma is now seen as a simplification:
  RNA is now seen to play a fundamental role (miRNA).
  How DNA is stored is also crucial.

3 Greater exploration
Thousands of genomes being scanned
Different species
Cancer genomics
Methylome
RNA-seq
Have not got time to talk about metabolomics/proteomics...

4 Caveat to sequence data
There are many different companies building equipment to perform sequencing.
They all have their own biases and sources of systematic error.
The data generated is discrete in nature, which tends to make people think it is accurate.
It could be as susceptible to systematic biases as microarrays are.
Interpretation and analysis are the real bottleneck.

5 The era of Big Data
Biological data - Petabytes now, expected to reach an Exabyte (a million Terabytes) by 2020.
High Energy Physics - Large Hadron Collider producing Pbytes of data per year.
Square Kilometre Array (full operation 2024) - one Exabyte a day.
Haven’t even mentioned Google or Bing yet....

6 Problems - Solutions - Cloud Computing ?
Data sets this size cannot be moved about on the Internet.
Data must be analysed, not just retrieved.
Many people want access to this data, many of whom
  are not computational scientists
  may not have the financial resources to buy powerful computers
  may want access to the best software, best practices etc.
  may want the data to be updated in a timely fashion
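The point that data sets of this size cannot be moved about on the Internet can be checked with a rough calculation. The sketch below is only illustrative: the 1 Gbit/s sustained link speed and the function name `transfer_days` are assumptions, not figures from the talk, and real transfers would also lose time to protocol overhead and contention.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a link sustained at link_bits_per_s."""
    seconds = size_bytes * 8 / link_bits_per_s  # 8 bits per byte
    return seconds / 86_400                     # 86,400 seconds per day


PETABYTE = 10**15        # bytes (decimal definition)
GIGABIT_PER_S = 10**9    # assumed sustained link speed, bits per second

days = transfer_days(PETABYTE, GIGABIT_PER_S)
print(f"1 Pbyte over 1 Gbit/s: about {days:.0f} days")  # about 93 days
```

Even under this optimistic assumption a single Petabyte takes roughly three months to move, which is why co-locating the analysis with the data is attractive.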

7 Solutions - Cloud Computing ? Cloud computing may be the solution. Data centre for cloud co-located with data generation. Processing as well as data retrieval done at data generation centre.

8 Cloud Computing
Definition - “If it looks like a duck”
Features of cloud computing are
  Computing is mostly done at a data centre provided by a vendor
  Client-side computing is minimal
  Servers at the data centre make heavy use of virtualisation (as opposed to Grids)
  Client can select the number of VM instances and the data usage
  Client pays on a per-use basis - “Somebody’s Credit Card is being used”
The computing is treated as a utility rather than a resource.

9 Cloud providers
Amazon Web Services (AWS)
  Provide Linux or Windows VM
  You get a command line.
Microsoft - Azure
  More complicated method of submission
Open Source - Eucalyptus (stability ?)
Other providers out there...

10 Advantages
Data centre can be where the data is generated, and can be accessed from everywhere (in theory).
Data could be kept up to date.
Analysis tools could be kept up to date.
Services can be developed which go significantly beyond a simple command line interface (Azure works along these lines).
Scalability - if you want 1 or 100 VMs you can get them.

11 Disadvantages
At present vendors do not provide a tailored environment for scientific clients.
A VM (regardless of OS) is effectively a blank canvas, and hence you have to upload all the right binaries, libraries and data that you need.
Data may not be in the correct configuration - storage/compute tradeoff.
Like any utility, you have to watch use carefully !
Vendor lock-in.
Security.
Legal issues - licensing, nationality of vendor and data centre.

12 Show me the money
Commercial clouds charge on a per-use basis.
  Disk space
  CPU time
Amazon and Microsoft charge via the time a VM is deployed.
Google tries to charge per CPU cycle.
Move from once-off payment model to rolling costs.
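A per-use bill of the kind described on this slide can be sketched as deployed VM time plus disk space. This is a minimal illustration only: the `Rates` class, the `monthly_cost` function and the prices are hypothetical and do not reflect any vendor's actual tariff.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical per-use prices; not any vendor's real tariff."""
    vm_hour: float   # price per VM for each hour it is deployed
    gb_month: float  # price per Gbyte of disk space held for a month


def monthly_cost(rates: Rates, n_vms: int, hours_deployed: float,
                 storage_gb: float) -> float:
    """Bill for one month: deployed VM time plus stored data."""
    compute = n_vms * hours_deployed * rates.vm_hour
    storage = storage_gb * rates.gb_month
    return compute + storage


rates = Rates(vm_hour=0.10, gb_month=0.05)  # made-up numbers for illustration
one_vm = monthly_cost(rates, n_vms=1, hours_deployed=720, storage_gb=500)
hundred_vms = monthly_cost(rates, n_vms=100, hours_deployed=720, storage_gb=500)
print(f"1 VM: {one_vm:.2f}   100 VMs: {hundred_vms:.2f}")
```

Because the charge follows the time a VM is deployed rather than the work it does, an idle VM left running costs as much as a busy one, which is the rolling cost that has to be watched.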

13 Big Data - a new discipline ?
Big Data
Machine Learning / Pattern Recognition
Hardware
Quality Control
Finance / Accounting

14 Conclusions
Microarray data gives us a first insight into the dynamic cell.
Sequence and other omic data sets are expanding into Petabytes.
Big Data is upon us.
Cloud computing is not a panacea.
Cloud computing may democratise access to Big Data.

