Andy Jenkinson, EBI An Introduction to DAS. Summary of Topics What is Data Integration? Problems in Data Integration An architectural overview of DAS.


1 Andy Jenkinson, EBI An Introduction to DAS

2 Summary of Topics
  What is Data Integration?
  Problems in Data Integration
  An architectural overview of DAS
  Brief History of DAS
  The DAS Game
  Explore some examples

3 What is Data Integration?

4 All These are Data Integration
  Reading some papers so you can write a report
  Exploring some database websites so you can learn about a topic
  Downloading some data from different databases so you can analyse it
  Downloading some data from different databases so you can combine it with your own

6 Data Integration
  “Automatic” data integration
    pulling in data from different locations
    processing it
    creating a resource derived from the data
  done via computers, not humans
  e.g. creating/updating a data warehouse
  [Diagram labels: Warehouse, PDB, Ensembl, UniProt]

7 Warehouse model


13 Distributed Annotation System
  Distributed
  Federated
  Client-Server architecture
  RESTful web services

14 Federation
  Not federation:
    Web services
      SOAP
      REST
    GFF, etc.
  Are federation:
    PSICQUIC
    BioMoby
    Semantic Web (sort of)

15 Warehouse model

16 DAS model

17 Architectural Overview

18 DAS
  Databases are all different
    DAS is a uniform facet of a database – always the same
  Databases evolve
    when the database changes, DAS stays the same
  Databases age
    DAS data comes directly from the provider so is always fresh
  Databases are big
    DAS uses real-time targeted queries

19 History
  Developed circa 1999 for sharing genome annotations
  Expanded 2004 onwards
    more data types
    better metadata
    addition of Registry
  Generally pre-computed data used for visual display

20 To Summarise…
  The Distributed Annotation System is…
    A network of biological data sources
    An example of federation
    A collection of REST web services
  The DAS Protocol is…
    An integration platform
    A client-server protocol
    An agreed standard

21 DAS Architecture
  A client asks for data from many servers
    HTTP requests
    identically structured URLs, the same parameters
  Each server behaves in the same way
    pre-defined set of behaviours
    e.g. provide a sequence, provide annotations of a sequence
  Each server provides different data in the same format
    DAS-XML
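
The point that every server accepts identically structured URLs with the same parameters can be sketched in a few lines of Python. This is only an illustration, not part of any DAS library: the helper name `build_das_url` is hypothetical, the two server prefixes and the accession P15056 are borrowed from the example commands later in this deck, and the sketch does not check whether a given source actually implements the command.

```python
from urllib.parse import urlencode


def build_das_url(site_prefix, source, command, **arguments):
    """Assemble a DAS-style request URL (hypothetical helper, for illustration).

    Layout assumed from the slides:
    <site prefix>/das/<source>/<command>?<arguments>
    """
    url = f"{site_prefix.rstrip('/')}/das/{source}/{command}"
    if arguments:
        url += "?" + urlencode(arguments)
    return url


# Two different servers, taken from the example commands in this deck.
SOURCES = [
    ("http://www.ebi.ac.uk/das-srv/uniprot", "uniprot"),
    ("http://das.cbs.dtu.dk:9000", "netphos"),
]

# The client sends the same command with the same parameter to every source;
# only the site prefix and source name differ.
urls = [
    build_das_url(prefix, source, "features", segment="P15056")
    for prefix, source in SOURCES
]

for url in urls:
    print(url)
```

Because only the prefix and source name vary, a client can add a new data provider without writing any provider-specific request code.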

22 DAS Concepts
  Reference sequence
    e.g. “chromosome X” or “NT_025741”
  Annotation (a.k.a. Feature)
    information attached to a location in a sequence
    e.g. “substitution at residue 326 of BRCA1”
  Non-positional annotation
    information attached to the reference as a whole
    e.g. “found in basolateral plasma membrane”
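
One way to picture the difference between a positional and a non-positional annotation is a small data model. The sketch below is purely illustrative: the class and field names are invented for this example and do not come from the DAS specification, and `EXAMPLE_PROTEIN` is a placeholder reference because the slide does not say which sequence its membrane example belongs to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Information attached to a reference sequence (illustrative model only)."""

    reference: str               # identifier of the reference sequence
    note: str                    # the information being attached
    start: Optional[int] = None  # first position covered, if any
    end: Optional[int] = None    # last position covered, if any

    @property
    def is_positional(self) -> bool:
        # An annotation with no coordinates describes the reference as a whole.
        return self.start is not None and self.end is not None


# Positional: tied to residue 326 of the reference.
positional = Annotation("BRCA1", "substitution", start=326, end=326)

# Non-positional: no coordinates (placeholder reference name, see lead-in).
non_positional = Annotation("EXAMPLE_PROTEIN", "found in basolateral plasma membrane")

print(positional.is_positional, non_positional.is_positional)
```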

23 DAS Concepts
  Reference source
    server that provides “core” reference data
    e.g. GRCh37 sequence data
  Annotation source
  DAS Registry
    catalogue of DAS sources and their capabilities
  Client
    Software that combines the data together

24 Architectural Overview

25 The Game!

26 And a real example http://www.ebi.ac.uk/dasty/

27 The DAS Protocol Defines 3 constraints

28 The DAS Protocol
  Defines 3 constraints
    transport layer: HTTP
  Data transport
    Standard HTTP
    Includes compression
    Some additional headers, e.g. to indicate DAS version

29 The DAS Protocol
  Defines 3 constraints
    transport layer: HTTP
    query format: constrained REST URLs
  Well-defined query URLs
    A client can issue a command

    http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/sequence?segment=P15056
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^ ^^^^^^^ ^^^^^^^^ ^^^^^^^^^^^^^^
    site prefix                          das source  command  arguments
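
The anatomy of the query URL on this slide can be checked mechanically. The following is a minimal sketch using only the Python standard library; `split_das_url` is a hypothetical helper, and it assumes a source-level request, i.e. that the last three path segments are `das`, the source name and the command.

```python
from urllib.parse import parse_qs, urlsplit


def split_das_url(url):
    """Split a source-level DAS URL into the parts named on the slide.

    Hypothetical helper: assumes the path ends in /das/<source>/<command>.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if len(segments) < 3 or segments[-3] != "das":
        raise ValueError(f"not a source-level DAS URL: {url}")
    prefix_path = "/".join(segments[:-3])
    return {
        "site_prefix": f"{parts.scheme}://{parts.netloc}{prefix_path}",
        "source": segments[-2],
        "command": segments[-1],
        "arguments": parse_qs(parts.query),
    }


example = split_das_url(
    "http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/sequence?segment=P15056"
)
for name, value in example.items():
    print(f"{name}: {value}")
```

Requests that have no source component, such as a `/das/sources` listing, are deliberately rejected by this sketch rather than guessed at.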

30 The DAS Protocol
  Defines 3 constraints
    transport layer: HTTP
    query format: constrained REST URLs
    response format: constrained XML
  XML format
    server responds with a simple XML document
    MAALSGGGGGGAEPGQALFNGDMEPEAGAGAGAAASSAADPAIPEEVWNIKQMIKLTQ...
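
Because the response is constrained XML, a client needs very little code to read it. The sketch below parses a hand-written sample with Python's standard `xml.etree.ElementTree`. The XML markup did not survive in this transcript, so the element names `DASSEQUENCE` and `SEQUENCE` and the attributes shown are assumptions based on the DAS sequence response format and should be checked against the specification. The sample holds only the first 20 residues shown on the slide, and its `stop` value simply matches that truncated sample.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element and attribute names are assumptions (see above).
SAMPLE_RESPONSE = """
<DASSEQUENCE>
  <SEQUENCE id="P15056" start="1" stop="20">MAALSGGGGGGAEPGQALFN</SEQUENCE>
</DASSEQUENCE>
"""


def parse_sequences(xml_text):
    """Return {segment id: residues} for every SEQUENCE element found."""
    root = ET.fromstring(xml_text.strip())
    sequences = {}
    for element in root.iter("SEQUENCE"):
        # Remove any whitespace or line breaks inside the sequence text.
        residues = "".join((element.text or "").split())
        sequences[element.get("id")] = residues
    return sequences


parsed = parse_sequences(SAMPLE_RESPONSE)
for segment_id, residues in parsed.items():
    print(segment_id, len(residues), residues)
```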

31 Try these
  curl 'http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/sequence?segment=P15056'
  curl 'http://www.dasregistry.org/das/sources?capability=features&authority=UniProt' > /tmp/sources
  more /tmp/sources
  curl 'http://das.cbs.dtu.dk:9000/das/netphos/features?segment=P15056'

32 Tools to help
  DAS client libraries:
    Bio::Das::Lite (Perl)
    JDAS, BioJava (Java)
    JsDAS (JavaScript)
  DAS servers:
    ProServer (Perl)
    MyDas, Dazzle (Java)
  Example clients:
    Ensembl, Dalliance, MyKaryoView (genomic)
    Dasty, Pfam, SPICE, Jalview (protein)

33 Image Credits
  Flickr/muir.ceardach
  Flickr/Horia Varlan
  Flickr/Alessandro Pinna
  Fotopedia/Jean-Marie Hullot
  listicles.com/?p=3485
  heartattackgrill.com
  Olivier H. Beauchesne

