Published by Marilyn Tucker. Modified over 9 years ago.
1
The Gedeon Project: Data, Metadata and Databases
Yves DENNEULIN, LIG laboratory, Grenoble
Laboratoire LIP6, ACI MD
2
Context and goals
● Heterogeneous metadata management on grids
  ○ Clusters of clusters
● High-level queries using metadata
● Easy and flexible deployment and configuration
● Minimal overhead
● Various interfaces
● Initial target application domains
  ○ Biocomputing (lots of metadata, little data)
  ○ Microscopic imaging (lots of data, little metadata)
3
The Gedeon middleware
● Metadata management on lightweight grids
  ○ Records of (attribute, value) pairs stored in files
● Flexible requests
  ○ Can be combined through scripting
● Various interfaces
  ○ Command line (tools)
  ○ Libraries
  ○ Virtual FS (support for legacy applications)
● Deployment "à la carte"
  ○ Composition of various data sources
● Performance
  ○ Dedicated I/O library
  ○ Semantic caching
4
Outline
1. General architecture
  a. Gedeon internal structure
  b. Composition of various data sources
2. Practical use
3. « Dual » cache
Conclusion
5
Example of a deployment
[Diagram: client, query interface (API, FS, GUI, ...), local proxies with a cache, interconnect middleware, servers « close » to the client, storage sites]
6
Gedeon components
● Gedeon kernel: fuple
  ○ I/O library
  ○ Evaluates the queries
● lowerG
  ○ Operators to compose databases
  ○ Remote access
● Interfaces
  ○ API: lowerG
  ○ Virtual FS
● Cache
[Diagram: application side with vSGF, lowerG and fuple; local proxy with network, cache, fuple and lowerG]
7
What is inside the sources?
● Records of attribute/value pairs
  Example record (taille is French for size):
  Id    classifA   classifB     taille
  457   Bacteria   Clostridia   26
[Figure labels: ref, Record]
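A record can be pictured as a mapping from attribute names to values, and records in the same source need not share the same attributes. The sketch below is only an illustration: it assumes a hypothetical text format with one record per line and `attr=value` fields separated by `;`, since the slides do not describe Gedeon's actual file format. `parse_record` and `parse_source` are hypothetical names.

```python
from typing import Dict, List

# A record: attribute name -> value. Records in one source may carry
# different attribute sets (heterogeneous metadata).
Record = Dict[str, str]


def parse_record(line: str) -> Record:
    """Parse one record from a HYPOTHETICAL 'attr=value; attr=value' line."""
    record: Record = {}
    for field in line.split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate empty fields such as ';;'
        attr, _, value = field.partition("=")
        record[attr.strip()] = value.strip()
    return record


def parse_source(text: str) -> List[Record]:
    """Parse a whole source: one record per non-empty line."""
    return [parse_record(line) for line in text.splitlines() if line.strip()]


source = parse_source("""
Id=457; classifA=Bacteria; classifB=Clostridia; taille=26
Id=459; classifB=Bacteria; taille=47
Id=460; classifA=Firmicutes
""")
print(source[0]["classifB"])  # Clostridia
```

Values are kept as strings here; interpreting them as numbers is left to the query operators, in line with the "type control with the operators" point made later in the deck.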
8
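Example of composition of sources
● Metadata can be local or replicated copies
[Diagram: a client queries sites S1, S2 and S3 through a composition of the union (+), join (J) and round-robin (RR) operators]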
9
Union operator (+)
● Unify storage space
● Parallel evaluation
[Diagram: source A (records A1, A2, A3, A4, ...) + source B (records B1, B2, B3, B4, ...) = one source holding records A1 to A4 and B1 to B4]
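The union can be sketched as running the same selection on every source concurrently and concatenating the partial results. In this Python sketch the sources are in-memory lists standing in for remote sites, and threads stand in for whatever parallelism Gedeon really uses; `select` and `union_select` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Pred = Callable[[Record], bool]


def select(source: List[Record], pred: Pred) -> List[Record]:
    """Evaluate a selection on one source (here an in-memory list)."""
    return [r for r in source if pred(r)]


def union_select(sources: List[List[Record]], pred: Pred) -> List[Record]:
    """Union (+): run the same selection on every source concurrently, then
    concatenate the partial results in source order."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        partials = list(pool.map(lambda src: select(src, pred), sources))
    return [r for part in partials for r in part]


site_a = [{"Id": "A1", "taille": "10"}, {"Id": "A2", "taille": "40"}]
site_b = [{"Id": "B1", "taille": "55"}, {"Id": "B2", "taille": "5"}]
big = union_select([site_a, site_b], lambda r: int(r.get("taille", "0")) > 20)
print([r["Id"] for r in big])  # ['A2', 'B1']
```

For remote sources the work is mostly waiting on I/O, which is why a thread pool is a reasonable stand-in here.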
10
Round robin (RR): fault tolerance
[Diagram: a client accesses Source 1 and Source 2 through the RR operator]
11
Round robin (RR): load balancing
[Diagram: two clients access Source 1 and Source 2 through the RR operator]
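The slides do not say how RR picks a source. One plausible reading, sketched below in Python under that assumption: successive requests start at successive replicas (load balancing), and a request that hits an unreachable replica moves on to the next one (fault tolerance). Replicas are modelled as plain callables that raise `ConnectionError` when down; `RoundRobin` and `make_replica` are hypothetical names.

```python
import itertools
from typing import Callable, Dict, List

Record = Dict[str, str]
Replica = Callable[[str], List[Record]]  # query string -> matching records


class RoundRobin:
    """Hypothetical RR operator over replicated sources."""

    def __init__(self, replicas: List[Replica]) -> None:
        if not replicas:
            raise ValueError("RoundRobin needs at least one replica")
        self.replicas = replicas
        self._starts = itertools.cycle(range(len(replicas)))

    def query(self, q: str) -> List[Record]:
        start = next(self._starts)  # load balancing: rotate the first replica
        failures = []
        for offset in range(len(self.replicas)):
            replica = self.replicas[(start + offset) % len(self.replicas)]
            try:
                return replica(q)
            except ConnectionError as exc:  # fault tolerance: try the next one
                failures.append(exc)
        raise ConnectionError(f"all replicas failed: {failures}")


calls: List[str] = []


def make_replica(name: str, up: bool = True) -> Replica:
    """Build a fake replica that records which site served each query."""
    def run(q: str) -> List[Record]:
        if not up:
            raise ConnectionError(f"{name} is down")
        calls.append(name)
        return [{"Id": "457", "served_by": name}]
    return run


rr = RoundRobin([make_replica("S1"), make_replica("S2")])
rr.query("$Id>0")
rr.query("$Id>0")
print(calls)  # ['S1', 'S2']

rr_ft = RoundRobin([make_replica("S1", up=False), make_replica("S2")])
print(rr_ft.query("$Id>0")[0]["served_by"])  # S2
```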
12
Join operator (J)
● Enrich a source with another
Source 1: {Id=457, A1=v1, A2=v2, A3=v3}, {Id=458, A1=v4, A2=v5, A3=v6}, ...
Source 2: {Id=457, An=vAn1}, {Id=458, An=vAn2}, ...
Source 1 J Source 2: {Id=457, A1=v1, A2=v2, A3=v3, An=vAn1}, {Id=458, A1=v4, A2=v5, A3=v6, An=vAn2}, ...
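The join example above can be sketched as a key-based merge on `Id`. This Python version is an assumption about the semantics: unmatched records of the first source are kept unchanged (a left-outer reading of "enrich"), and on an attribute-name clash the second source's value wins. `join` is a hypothetical name.

```python
from typing import Dict, List

Record = Dict[str, str]


def join(left: List[Record], right: List[Record], key: str = "Id") -> List[Record]:
    """J operator sketch: add to every record of `left` the attributes of the
    record of `right` with the same key value."""
    by_key: Dict[str, Record] = {r[key]: r for r in right if key in r}
    enriched: List[Record] = []
    for rec in left:
        merged = dict(rec)
        if key in rec and rec[key] in by_key:
            merged.update(by_key[rec[key]])  # right-hand values win on clashes
        enriched.append(merged)  # unmatched records are kept as they are
    return enriched


source1 = [{"Id": "457", "A1": "v1", "A2": "v2", "A3": "v3"},
           {"Id": "458", "A1": "v4", "A2": "v5", "A3": "v6"}]
source2 = [{"Id": "457", "An": "vAn1"}, {"Id": "458", "An": "vAn2"}]
result = join(source1, source2)
print(result[0])
# {'Id': '457', 'A1': 'v1', 'A2': 'v2', 'A3': 'v3', 'An': 'vAn1'}
```

Indexing the second source by key first keeps the join linear in the size of both sources instead of comparing every pair of records.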
14
Tools 1/2
● Libraries
● CLI
● Operations: sort, projection, select, index, ...
15
Tools 2/2
● Examples
  ○ sort
    sort(attr='taille')
    $> cat mesmeta.g | fsort 'taille' > trie_taille.g
  ○ index
    create_idx(attr='Id') → Id.idx
    search_idx('Id', 'P0123')
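To make the sort, projection and index operations concrete, here is a Python sketch over in-memory records. The function names only echo the slide's `sort`, `create_idx` and `search_idx`; they are hypothetical and are not Gedeon's library API. The sort assumes a numeric attribute, and the index is an in-memory dictionary standing in for a file such as `Id.idx`.

```python
from typing import Dict, List

Record = Dict[str, str]


def sort_records(records: List[Record], attr: str) -> List[Record]:
    """Like sort(attr='taille'): order by a numeric attribute.
    Records lacking the attribute go last (assumed behaviour)."""
    return sorted(records, key=lambda r: (attr not in r, float(r.get(attr, 0))))


def project(records: List[Record], attrs: List[str]) -> List[Record]:
    """Projection: keep only the listed attributes of each record."""
    return [{a: r[a] for a in attrs if a in r} for r in records]


def create_index(records: List[Record], attr: str) -> Dict[str, List[int]]:
    """Like create_idx(attr='Id'): map each value of `attr` to the positions
    of the records holding it."""
    index: Dict[str, List[int]] = {}
    for pos, r in enumerate(records):
        if attr in r:
            index.setdefault(r[attr], []).append(pos)
    return index


def search_index(index: Dict[str, List[int]], records: List[Record],
                 value: str) -> List[Record]:
    """Like search_idx('Id', 'P0123'): direct lookup instead of a full scan."""
    return [records[pos] for pos in index.get(value, [])]


mesmeta = [
    {"Id": "P0123", "taille": "47"},
    {"Id": "P0007", "taille": "26"},
    {"Id": "P0200"},
]
print([r["Id"] for r in sort_records(mesmeta, "taille")])  # ['P0007', 'P0123', 'P0200']
idx = create_index(mesmeta, "Id")
print(project(search_index(idx, mesmeta, "P0123"), ["taille"]))  # [{'taille': '47'}]
```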
16
Query language
● Simple ($ before attribute names, type checking through the operators)
● Regular expressions
● Second-order expressions (regular expressions on attribute names)
17
Select expression
Input:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47
  Id=460, classifA=Firmicutes
Select $Id>459
Output:
  Id=460, classifA=Firmicutes
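A comparison such as `$Id>459` can be evaluated by parsing it into an attribute, an operator and a numeric constant, then testing each record. This Python sketch assumes that a record without the attribute, or with a non-numeric value, simply does not match; it is not Gedeon's parser, and `select` is a hypothetical name.

```python
import operator
import re
from typing import Callable, Dict, List

Record = Dict[str, str]

OPS: Dict[str, Callable[[float, float], bool]] = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}
# "$attr OP number"; two-character operators are listed first in the alternation.
EXPR = re.compile(r"^\$(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)$")


def select(records: List[Record], expr: str) -> List[Record]:
    m = EXPR.match(expr.strip())
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    attr, op, const = m.group(1), OPS[m.group(2)], float(m.group(3))
    out = []
    for r in records:
        if attr not in r:
            continue  # a missing attribute never satisfies the predicate
        try:
            value = float(r[attr])
        except ValueError:
            continue  # non-numeric value: the numeric operator does not apply
        if op(value, const):
            out.append(r)
    return out


records = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Firmicutes"},
]
print([r["Id"] for r in select(records, "$Id>459")])  # ['460']
```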
18
Select using regexp
Input:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47
  Id=460, classifA=Firmicutes
Select $classifB==/.*a$/
Output:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47
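The regular-expression form `$attr==/regex/` can be sketched the same way. The Python version below assumes search semantics (the pattern may match anywhere in the value, which a later query like `$OC==/Eukaryota/` seems to rely on); record 460 is excluded because it has no `classifB` attribute. `select_regex` is a hypothetical name.

```python
import re
from typing import Dict, List

Record = Dict[str, str]

# "$attr==/regex/"; the greedy group stops before the final '/'.
REGEX_EXPR = re.compile(r"^\$(\w+)==/(.*)/$")


def select_regex(records: List[Record], expr: str) -> List[Record]:
    m = REGEX_EXPR.match(expr.strip())
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    attr, pattern = m.group(1), re.compile(m.group(2))
    # Records lacking the attribute are skipped; search semantics assumed.
    return [r for r in records if attr in r and pattern.search(r[attr])]


records = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Firmicutes"},
]
print([r["Id"] for r in select_regex(records, "$classifB==/.*a$/")])  # ['457', '459']
```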
19
Select using second-order logic
Input:
  Id=457, classifA=Bacteria, classifB=Clostridia, taille=26
  Id=459, classifB=Bacteria, taille=47
  Id=460, classifA=Firmicutes
Select $/classif[AB]/==Bacteria && $taille>=36
Output:
  Id=459, classifB=Bacteria, taille=47
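In `$/classif[AB]/==Bacteria` the regular expression ranges over attribute names rather than values. A minimal Python sketch, assuming an existential reading (at least one attribute whose name matches must hold the value) and building the query from small predicate combinators instead of parsing the string. Record 457 matches the first part but fails `$taille>=36`, so only 459 remains. `attr_named`, `num_ge` and `and_` are hypothetical names.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Pred = Callable[[Record], bool]


def attr_named(name_regex: str, value: str) -> Pred:
    """$/name_regex/==value: true if at least one attribute whose NAME matches
    name_regex holds exactly `value` (existential reading, assumed)."""
    name_re = re.compile(name_regex)
    return lambda rec: any(name_re.search(a) is not None and v == value
                           for a, v in rec.items())


def num_ge(attr: str, bound: float) -> Pred:
    """$attr>=bound; a missing or non-numeric value never matches."""
    def pred(rec: Record) -> bool:
        try:
            return attr in rec and float(rec[attr]) >= bound
        except ValueError:
            return False
    return pred


def and_(p: Pred, q: Pred) -> Pred:
    """The && operator."""
    return lambda rec: p(rec) and q(rec)


records: List[Record] = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Firmicutes"},
]
query = and_(attr_named("classif[AB]", "Bacteria"), num_ge("taille", 36))
print([rec["Id"] for rec in records if query(rec)])  # ['459']
```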
20
Virtual FS interface
● Just another, file-oriented interface
● Data and metadata can be anywhere in the grid
● Definition of logical directories
  ○ Ex: cd '$classifB==|.*a$|'
  ○ Nested directories are combined with « and »
  ○ 1 filename = the value of a metadata attribute: logical view
Example session:
  /fs_virt/$classifB==|.*a$|> ls
  457 459
  /fs_virt/$classifB==|.*a$|> cat * > /tmp/mater
  /fs_virt/$classifB==|.*a$|>
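The logical directories can be sketched as a path-to-query translation: each path component below the mount point is one query, nested components are AND-ed, and each matching record appears as a file named after one of its attribute values (`Id` is assumed here). The regular expression is presumably delimited by `|` rather than `/` because `/` separates path components; the `cd` argument on the slide is quoted because `$` and `|` are special to the shell. Only two directory forms are handled, and `dir_to_pred` and `ls` are hypothetical names, not the vSGF implementation.

```python
import operator
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Pred = Callable[[Record], bool]

MOUNT = "/fs_virt"  # mount point taken from the slide's prompt
REGEX_DIR = re.compile(r"^\$(\w+)==\|(.*)\|$")           # $attr==|regex|
NUM_DIR = re.compile(r"^\$(\w+)(>=|<=|>|<)(-?\d+(?:\.\d+)?)$")  # $attr>=number
NUM_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def dir_to_pred(component: str) -> Pred:
    """Turn one logical directory name into a record predicate."""
    m = REGEX_DIR.match(component)
    if m:
        attr, pat = m.group(1), re.compile(m.group(2))
        return lambda r: attr in r and pat.search(r[attr]) is not None
    m = NUM_DIR.match(component)
    if m:
        attr, op, bound = m.group(1), NUM_OPS[m.group(2)], float(m.group(3))
        return lambda r: attr in r and op(float(r[attr]), bound)
    raise ValueError(f"not a query directory: {component!r}")


def ls(path: str, records: List[Record], name_attr: str = "Id") -> List[str]:
    """List a logical directory: nested query directories are AND-ed and each
    matching record shows up as a file named after its `name_attr` value."""
    if path != MOUNT and not path.startswith(MOUNT + "/"):
        raise ValueError("path is outside the virtual FS")
    preds = [dir_to_pred(p) for p in path[len(MOUNT):].split("/") if p]
    return [r[name_attr] for r in records
            if name_attr in r and all(p(r) for p in preds)]


records = [
    {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26"},
    {"Id": "459", "classifB": "Bacteria", "taille": "47"},
    {"Id": "460", "classifA": "Firmicutes"},
]
print(ls("/fs_virt/$classifB==|.*a$|", records))             # ['457', '459']
print(ls("/fs_virt/$classifB==|.*a$|/$taille>=36", records))  # ['459']
```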
21
Outline 1.General architecture a.Gedeon internal structure b.Composition of various data sources 2.Practical use 3.« dual » cache Conclusion
22
Dual cache (1)
● 2 cooperative caches
  ○ Query cache (R, {id, ...}): saves computing power
  ○ Data cache (id, {attr, ...}): saves bandwidth
● Semantic cache
  ○ Can evaluate a query using the data already in the cache
  ○ Can generate a remainder query to fetch the data missing from the cache
23
Example
● Refinement of a request
  1) '$OC==/Eukaryota/' -> (R, Lid={id1,id2,...})
  2) '$OC==/Eukaryota/ && $year>=1998' -> Select(*Lid, '$year>=1998')
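The two cooperating caches and the refinement example can be sketched together: the query cache maps a query to its list of ids, the data cache maps an id to its record, and a refined query reuses the cached `Lid` and filters locally, fetching only the records missing from the data cache (the remainder). This Python sketch covers just that refinement case, not general semantic query containment; `FakeServer`, `DualCache` and the sample records are hypothetical.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Pred = Callable[[Record], bool]


class FakeServer:
    """Stand-in for the remote sources; counts query evaluations and
    transferred records so the savings of each cache are visible."""

    def __init__(self, records: List[Record]) -> None:
        self.by_id = {r["Id"]: r for r in records}
        self.evaluations = 0
        self.transferred = 0

    def evaluate(self, pred: Pred) -> List[str]:
        self.evaluations += 1
        return [i for i, r in self.by_id.items() if pred(r)]

    def fetch(self, ids: List[str]) -> List[Record]:
        self.transferred += len(ids)
        return [self.by_id[i] for i in ids]


class DualCache:
    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.query_cache: Dict[str, List[str]] = {}  # (R, {id, ...})
        self.data_cache: Dict[str, Record] = {}      # (id, {attr, ...})

    def _records(self, ids: List[str]) -> List[Record]:
        # Remainder: only the ids absent from the data cache are fetched.
        missing = [i for i in ids if i not in self.data_cache]
        for rec in self.server.fetch(missing):
            self.data_cache[rec["Id"]] = rec
        return [self.data_cache[i] for i in ids]

    def query(self, key: str, pred: Pred) -> List[Record]:
        if key not in self.query_cache:  # query cache miss: evaluate remotely
            self.query_cache[key] = self.server.evaluate(pred)
        return self._records(self.query_cache[key])

    def refine(self, base_key: str, base_pred: Pred, extra: Pred) -> List[Record]:
        """Evaluate 'base && extra' as Select(*Lid, extra), where Lid is the
        (possibly cached) answer to 'base'."""
        return [r for r in self.query(base_key, base_pred) if extra(r)]


server = FakeServer([
    {"Id": "id1", "OC": "Eukaryota; Metazoa", "year": "1995"},
    {"Id": "id2", "OC": "Eukaryota; Fungi", "year": "2001"},
    {"Id": "id3", "OC": "Bacteria; Firmicutes", "year": "1999"},
])
cache = DualCache(server)
eukaryota: Pred = lambda r: re.search("Eukaryota", r.get("OC", "")) is not None

first = cache.query("$OC==/Eukaryota/", eukaryota)
recent = cache.refine("$OC==/Eukaryota/", eukaryota, lambda r: int(r["year"]) >= 1998)
print([r["Id"] for r in first], [r["Id"] for r in recent])  # ['id1', 'id2'] ['id2']
print(server.evaluations, server.transferred)  # 1 2
```

The refined query triggers no new remote evaluation and no new transfer: the query cache saves the computation and the data cache saves the bandwidth, matching the two roles listed on the slide.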
24
Dual cache (2)
● Distributed semantic cache
  ○ Typically used inside communities: lots of common requests
  ○ No location constraints: members of a community can be geographically scattered
● Distributed data cache
  ○ Minimizes time and data transfers
  ○ Cooperation between sites that are close from a topological point of view
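Cooperation between topologically close sites can be sketched as a lookup order: the local data cache first, then peer caches from the closest to the farthest, then the storage site that holds the original. The distance table, site names and the `lookup` function below are hypothetical; the slides describe the notion of distance as work in progress.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


class Site:
    """A site with its own data cache (id -> record)."""

    def __init__(self, name: str, cache: Dict[str, Record]) -> None:
        self.name = name
        self.cache = dict(cache)


def lookup(record_id: str, local: Site, peers: List[Site],
           distance: Dict[str, int], origin: Dict[str, Record]) -> Tuple[Record, str]:
    """Return the record and where it came from: the local cache first, then
    peer caches from closest to farthest, then the origin storage site."""
    if record_id in local.cache:
        return local.cache[record_id], "local"
    for peer in sorted(peers, key=lambda p: distance[p.name]):
        if record_id in peer.cache:
            record, served_by = peer.cache[record_id], peer.name
            break
    else:
        record, served_by = origin[record_id], "origin"
    local.cache[record_id] = record  # keep a copy for later requests
    return record, served_by


origin = {"id1": {"Id": "id1"}, "id2": {"Id": "id2"}}
near = Site("near_site", {"id1": origin["id1"]})
far = Site("far_site", {"id1": origin["id1"]})
me = Site("me", {})
hops = {"near_site": 1, "far_site": 5}  # hypothetical topological distances

print(lookup("id1", me, [far, near], hops, origin)[1])  # near_site
print(lookup("id2", me, [far, near], hops, origin)[1])  # origin
print(lookup("id2", me, [far, near], hops, origin)[1])  # local
```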
25
Dual cache (3)
[Diagram: servers at Grenoble and Rennes, each with a dual cache made of a query cache and an object cache; query caches are grouped by semantic locality (communities Eukaryota and Archaea), object caches by geographic locality]
26
Dual cache (4)
● Work in progress on the notion of distance
  ○ Find geographical proximity
  ○ Find common interests between communities
● Create hybrid communities based on their requests
● Could be used to tune the cache parameters, manually and/or automatically
27
Conclusion
● A data integration middleware
  ○ Handling of metadata
● Distributed and modular
  ○ Deployment can follow architectural/organisational constraints
● Definition of a dual cache infrastructure
  ○ Reflects both organisational use (communities) and geographic locality
● Prototype in use
  ○ Packaging and documentation still needed
28
Questions?