GridQTL High Performance QTL analysis via the Grid/Cloud.


1 GridQTL High Performance QTL analysis via the Grid/Cloud

2 GridQTL BBSRC funded: 5 years initially, then 3 years (£1.5M, then £750K). Part of the "Integrative Biology" vision: "Allow prediction from gene sequence to consequence". Institute of Evolutionary Biology (IEB), Edinburgh University; Roslin Institute, Edinburgh; National e-Science Centre, Edinburgh; EPCC, Edinburgh; Information Services, Edinburgh University.

3 QTLs Quantitative Trait Loci: positions along a chromosome that influence a continuously varying physical trait. Traits (phenotypes): weight, height, eye colour, hypertension, cancer... Influenced by many loci and environmental factors ("multifactorial"). NOT looking for single-position effects, e.g. 70% of Cystic Fibrosis cases, Huntington's Disease.

4 Genomic Data Look at the structure of a chromosome pair. Discover positions that differ from the norm. Locate alleles: SNPs, deletions, insertions.

5 Phenotypic Data Keep record of the trait for each sample. Roslin Institute uses Pigs. Easy to create pedigrees. Similar genome to humans. Many studies in short time.

6 Statistical Process Genetic information is mixed during reproduction (meiosis). Positions close together on a chromosome tend to be inherited together. A statistical process that needs mathematical modelling.
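
The tendency of nearby positions to stay together can be made concrete with a map function. The sketch below uses Haldane's map function, which is an assumption here (the slides do not name a specific model); it converts a map distance in centimorgans into the probability of recombination between two loci, assuming no crossover interference. The function name is illustrative.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Recombination fraction between two loci a given map distance apart.

    Assumes Haldane's map function (no crossover interference);
    distance_cm is the map distance in centimorgans.
    """
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    distance_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))
```

Under this model, loci 1 cM apart recombine in roughly 1% of meioses, while very distant loci approach the 50% expected for independent segregation.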

7 QTLs - Calculation Genomic Data: known markers on chromosomes or other regions; determine alleles (variants) of these markers. Phenotypic Data: variation of trait data over pedigrees recorded. Pedigree Data: build up pedigrees to model inheritance of chosen markers and their variants. Which pedigree can best identify QTLs?
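
As a rough illustration of how genomic and phenotypic data combine, the sketch below regresses a trait value on the allele count at a single marker and reports an F statistic for the association. This is a generic single-marker regression under simplifying assumptions (unrelated individuals, genotypes coded 0/1/2), not necessarily the model GridQTL fits; pedigree structure is ignored, and all names are illustrative.

```python
def marker_regression(genotypes, phenotypes):
    """Least-squares regression of a trait on marker allele count.

    genotypes: allele counts (0, 1 or 2) per individual.
    phenotypes: trait value per individual, in the same order.
    Returns (slope, intercept, f_statistic).
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")

    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("marker is not polymorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))

    slope = sxy / sxx                 # estimated effect per allele copy
    intercept = mean_y - slope * mean_g

    ss_total = sum((y - mean_y) ** 2 for y in phenotypes)
    ss_regression = slope * sxy
    ss_residual = ss_total - ss_regression
    # F test with 1 and n - 2 degrees of freedom
    if ss_residual > 0:
        f_statistic = ss_regression / (ss_residual / (n - 2))
    else:
        f_statistic = float("inf")
    return slope, intercept, f_statistic
```

Repeating such a test at each marker and looking for peaks in the statistic is the basic idea behind a genome scan; real analyses also model the pedigree and the positions between markers.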

8 QTLExpress 2001 Web tool using Java servlets, evolved from Fortran applications. Simple statistical models employed. Data sets of size KBytes. Running time: minutes on a 2GHz Pentium.

9 GridQTL Ramp up in data size and processing time. Data sets: MBytes. Processing times: hours/days on a 2GHz Intel Pentium. More users expected. More advanced models: variance, principal and independent components analyses; Bayesian statistics; Random Walk Monte Carlo (MCMC). So more computing resources. Clusters - UK National Grid Service, ECDF. HPC - investigate parallelism and optimisation of algorithms. HECToR.
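
To give a feel for why MCMC-style models push running times up, here is a minimal random-walk Metropolis sampler. It is a generic textbook sketch, not GridQTL's implementation: the target is a standard normal density chosen purely for illustration, and every name and parameter is an assumption.

```python
import math
import random


def random_walk_metropolis(log_density, start, step_size, n_samples, seed=0):
    """Draw samples from a 1-D density using random-walk Metropolis.

    log_density: function returning the log of the (unnormalised) target.
    step_size: standard deviation of the Gaussian proposal.
    """
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Propose a move by taking a random Gaussian step from the current point.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        log_ratio = proposal_logp - current_logp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


def standard_normal_log_density(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x
```

Each sample costs one evaluation of the target density, and many thousands of correlated samples are needed for stable estimates, which is one reason such models move analyses from minutes to hours or days.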

10 Complex QTL models Need more complex models, which in turn need more data, so that: Effects of QTL interactions can be modelled (epistasis - how genes interact). Effect of a QTL on more than one trait (pleiotropy). Managing data from DNA chips (many markers and traits at once): eQTL. Fine mapping of QTLs: Linkage Disequilibrium (LD), Variance Component Analysis (VCA).

11 GridQTL Local machine - Tomcat web server. Portal technologies - GridSphere. Grid - NGS and ECDF. Grid middleware (Globus); now qsub. Digital certificates for authentication; now ssh key pair.
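
The move from Globus to plain qsub over ssh can be sketched as follows. This assumes a Grid Engine-style scheduler (the `#$ -N`, `#$ -cwd` and `#$ -l h_rt=` directives); the slides do not say which directives GridQTL uses, and the job name, command, user and host below are all hypothetical.

```python
def build_job_script(job_name, command, runtime="01:00:00"):
    """Return the text of a Grid Engine-style batch script.

    job_name, command and runtime are illustrative placeholders.
    """
    lines = [
        "#!/bin/sh",
        f"#$ -N {job_name}",        # job name shown in the queue
        "#$ -cwd",                  # run from the submission directory
        f"#$ -l h_rt={runtime}",    # requested wall-clock limit
        command,
    ]
    return "\n".join(lines) + "\n"


def build_submit_command(user, host, remote_script_path):
    """Return the argument list for submitting a script via ssh and qsub.

    Authentication is assumed to use an ssh key pair already installed
    for the given (hypothetical) user and host.
    """
    return ["ssh", f"{user}@{host}", "qsub", remote_script_path]
```

The returned list could be passed to `subprocess.run` on the portal machine; keeping it as plain data makes the submission step easy to inspect and test without a cluster.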

12 EPCC Subcontracted programming work: general system programming; queuing system for local and grid jobs; portal work; memory and parallel issues; cloud work.

13 Usage Released Autumn 2006. 50 users use the portal in a month. 40 analyses/day on the local server. 4 CPU hours/day on the local server. 50 analyses/day on the Grid. 40 CPU hours/day on the Grid. 500 users and 70 citations by summer 2012.
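
The daily figures above imply an average cost per analysis, worked out in the short sketch below. These are simple averages derived from the quoted numbers, not measured per-job times, and the function name is illustrative.

```python
def cpu_minutes_per_analysis(cpu_hours_per_day, analyses_per_day):
    """Average CPU minutes consumed per analysis."""
    if analyses_per_day <= 0:
        raise ValueError("analyses_per_day must be positive")
    return cpu_hours_per_day * 60 / analyses_per_day


local_minutes = cpu_minutes_per_analysis(4, 40)    # local server figures
grid_minutes = cpu_minutes_per_analysis(40, 50)    # Grid figures
print(local_minutes, grid_minutes)                 # 6.0 48.0
```

On these averages a Grid analysis uses about eight times the CPU time of a local one, consistent with the heavier models being sent to the Grid.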

14 Demo

20 User Count

21 Analyses & CPU count

22 User Studies User projects: Sheep – birth weight, milk & fleece quality; Cattle, Sheep, Pigs & Chickens – growth, quality; Horses – airway obstructions for racehorses; Fish – harvest traits; Crocodiles – scale quality; Eucalyptus trees – wood quality; Mouse – obesity; Foxes – domesticity.

23 CloudQTL Solution to long-term sustainability of the service. No infrastructure cost. Analyses guaranteed to complete in time. Pay-as-you-go model. Google, Microsoft and Amazon offer routes; Amazon preferred. EPCC route to ECDF.

