29th November 2001. Graphs and Functions: Recurring Themes in Databases. Alex Poulovassilis.


1 Graphs and Functions: Recurring Themes in Databases. Alex Poulovassilis, 29th November 2001

2 Databases
Databases store information of relevance to a group of users, e.g. employees' personal details, for a Personnel department; employees' income details, for a Payroll department; details of molecular structure and interaction, for a drug company; details of TV broadcasts and ratings, for a TV company.

3 Data models
The information stored in a database is expressed using a data model. The binary relational data model is a very simple data model. In this model, information is represented using entities and binary relationships between them. These can be represented as the nodes and edges of a graph, e.g. here is the schema of a ViewingFigures database:

4 [Figure: schema of the ViewingFigures database]

5 Data and Schema
The schema of a database defines the type and format of the actual data; it is part of the database's metadata. The data in the database conforms to the schema. So a fragment of the ViewingFigures data might be:

6 [Figure: fragment of the ViewingFigures data]
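A minimal Python sketch of how binary relational data can be held as the edges of a graph. The sample edges are hypothetical (only P205 and Kevin Bacon appear in the talk), and the helper name `adjacency` is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: "Actor A", "Actor B" and "P310" are invented.
# A binary relationship is simply a set of (source, target) edges.
stars_edges = {
    ("P205", "Kevin Bacon"),
    ("P205", "Actor A"),
    ("P310", "Actor A"),
    ("P310", "Actor B"),
}


def adjacency(edges):
    """Index a set of edges by source node, giving a graph adjacency map."""
    index = defaultdict(set)
    for source, target in edges:
        index[source].add(target)
    return dict(index)


graph = adjacency(stars_edges)
```

Entities are the nodes (keys and members of the sets) and each relationship instance is one edge.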

7

8

9 The TriStarp Project
The TriStarp research project, led by Prof Peter King from the mid 1980s, aimed to (1) develop repository technology for binary relational information and (2) develop languages for computing with this kind of information. Mir Derakhshan worked on (1). Carol Small and I worked on (2). We were supported by CASE studentships from IBM UK Labs, with Prof Geoff Sharman and Norman Winterbottom as our industrial supervisors.

10 Computing with Binary Relational Data
There are two natural candidates for this: logic languages, explored by Carol; and functional languages, the topic of my PhD research, resulting in the FDL language (1990).

11 The Logic Approach
Find all actors who star in programme P205:
    stars(P205,x?)
[Figure: query graph of P205 and x? with edge stars; schema graph of Programme and Actor with edge stars]

12 The Logic Approach
Find all programmes in which Kevin Bacon stars:
    stars(p?,Kevin Bacon)
[Figure: query graph of p? and Kevin Bacon with edge stars; schema graph of Programme and Actor with edge stars]

13 The Logic Approach
Find all actors who have starred with Kevin Bacon:
    stars(p?,Kevin Bacon),stars(p?,x?)
[Figure: query graph of p?, Kevin Bacon and x? with two stars edges; schema graph of Programme and Actor with edge stars]
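The conjunctive query above can be read as pattern matching against stored facts, with the shared variable p? acting as the join. The following Python sketch illustrates that reading only; it is not the logic language developed in the project. The function names `is_var` and `solve` and all facts other than P205 and Kevin Bacon are assumptions.

```python
def is_var(term):
    """Variables are written with a trailing question mark, as on the slides."""
    return isinstance(term, str) and term.endswith("?")


def solve(goals, facts, binding=None):
    """Yield every variable binding that satisfies all goals of a conjunctive query."""
    binding = binding or {}
    if not goals:
        yield binding
        return
    (relation, *args), rest = goals[0], goals[1:]
    for fact_relation, *fact_args in facts:
        if fact_relation != relation or len(fact_args) != len(args):
            continue
        candidate = dict(binding)
        for arg, value in zip(args, fact_args):
            if is_var(arg):
                # Bind the variable, or reject the fact if it is already bound differently.
                if candidate.setdefault(arg, value) != value:
                    break
            elif arg != value:
                break
        else:
            yield from solve(rest, facts, candidate)


# Hypothetical facts: "Actor A", "Actor B" and "P310" are invented.
facts = [
    ("stars", "P205", "Kevin Bacon"),
    ("stars", "P205", "Actor A"),
    ("stars", "P310", "Actor A"),
    ("stars", "P310", "Actor B"),
]
query = [("stars", "p?", "Kevin Bacon"), ("stars", "p?", "x?")]
co_stars = {b["x?"] for b in solve(query, facts)}
```

As written on the slide, the query also returns Kevin Bacon himself, since nothing requires x? to differ from him.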

14 The Functional Approach
The functional approach interprets binary relationships as functions, leading to the so-called functional data model.
[Figure: schema graph of Programme and Actor with edges stars and inv_stars]

15 The Functional Approach
Find all actors who star in programme P205:
    stars P205
[Figure: schema graph of Programme and Actor with edges stars and inv_stars]

16 The Functional Approach
Find all programmes in which Kevin Bacon stars:
    inv_stars Kevin Bacon
[Figure: schema graph of Programme and Actor with edges stars and inv_stars]

17 The Functional Approach
Find all actors who have starred with Kevin Bacon:
    [x | p ← inv_stars Kevin Bacon; x ← stars p]
[Figure: schema graph of Programme and Actor with edges stars and inv_stars]
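A rough Python analogue of the functional queries above, with each binary relationship exposed as a function and its inverse. The stored edges are hypothetical apart from P205 and Kevin Bacon, and returning sorted lists is a choice made here for repeatable output.

```python
# Hypothetical edges: "Actor A", "Actor B" and "P310" are invented.
STARS = {
    ("P205", "Kevin Bacon"),
    ("P205", "Actor A"),
    ("P310", "Actor A"),
    ("P310", "Actor B"),
}


def stars(programme):
    """Actors starring in the given programme."""
    return sorted(actor for prog, actor in STARS if prog == programme)


def inv_stars(actor):
    """Programmes in which the given actor stars (the inverse function)."""
    return sorted(prog for prog, name in STARS if name == actor)


# [x | p <- inv_stars Kevin Bacon; x <- stars p]
co_stars = [x for p in inv_stars("Kevin Bacon") for x in stars(p)]
```

The nested generators in the Python comprehension play the same role as the two generators in the list comprehension on the slide.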

18 More complex queries
Find the most popular programme showing at 10pm on 1st November, 2001:
    let maxViewers = max [viewers s | s ← inv_date (1,11,2001); (start s) 2200]
    in [of s | s ← inv_viewers maxViewers]

19 Derived Functions
Find the most popular programme showing at time t on date d:
    mostPopular t d =
        let maxViewers = max [viewers s | s ← inv_date d; (start s) t]
        in [of s | s ← inv_viewers maxViewers]
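A hedged Python sketch of a derived function in the spirit of mostPopular. Two points are assumptions rather than the author's definition: the comparison between the start time and t is not legible in the transcript, so equality is used here; and the final lookup is restricted to the showings that matched, instead of going through inv_viewers. The `Showing` record and all sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Showing:
    of: str        # programme being shown
    date: tuple    # (day, month, year)
    start: int     # 24-hour clock, e.g. 2200
    viewers: int


# Hypothetical showings, invented for illustration.
SHOWINGS = [
    Showing("P205", (1, 11, 2001), 2200, 4_000_000),
    Showing("P310", (1, 11, 2001), 2200, 6_500_000),
    Showing("P411", (1, 11, 2001), 1900, 9_000_000),
    Showing("P205", (2, 11, 2001), 2200, 7_000_000),
]


def most_popular(t, d, showings=SHOWINGS):
    """Programmes with the highest viewing figure among showings on date d at time t."""
    # Assumption: "showing at time t" is modelled as start == t.
    candidates = [s for s in showings if s.date == d and s.start == t]
    if not candidates:
        return []
    max_viewers = max(s.viewers for s in candidates)
    return [s.of for s in candidates if s.viewers == max_viewers]
```

For example, `most_popular(2200, (1, 11, 2001))` selects among the two 10pm showings on that date only.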

20 Recursive functions
Find actors linked to Kevin Bacon via any number of edges labelled stars:
    linkedTo [Kevin Bacon]
where:
    linkedTo result =
        let new = [x | y ← result; p ← inv_stars y; x ← stars p]
        in if (subset new result) then result else linkedTo (new U result)
[Figure: schema graph of Programme and Actor with edges stars and linkedTo]
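The recursion above computes a fixpoint: it keeps adding co-stars until no new actor appears. A minimal Python sketch follows, using sets in place of lists; the edges are hypothetical apart from P205 and Kevin Bacon.

```python
# Hypothetical edges: actors A to D and programmes P310, P999 are invented.
STARS = {
    ("P205", "Kevin Bacon"),
    ("P205", "Actor A"),
    ("P310", "Actor A"),
    ("P310", "Actor B"),
    ("P999", "Actor C"),
    ("P999", "Actor D"),
}


def stars(programme):
    return {actor for prog, actor in STARS if prog == programme}


def inv_stars(actor):
    return {prog for prog, name in STARS if name == actor}


def linked_to(result):
    """Actors reachable from `result` through any number of shared programmes."""
    new = {x for y in result for p in inv_stars(y) for x in stars(p)}
    if new <= result:          # subset new result
        return result
    return linked_to(new | result)


linked = linked_to({"Kevin Bacon"})
```

Termination follows because the result set only grows and the set of actors is finite.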

21 Oracle of Bacon at Virginia
www.cs.virginia.edu/oracle

    Bacon Number    No of People
    0               1
    1               1479
    2               115203
    3               285896
    4               65055
    5               4535
    6               534
    7               81
    8               28
    9               1
    10              1

    Total linkable actors: 472814

22 Higher-order functions
More generally:
    linkedTo s = complete (stars,inv_stars) s
where:
    complete (f,inv_f) result =
        let new = [x | b ← result; a ← inv_f b; x ← f a]
        in if (subset new result) then result else complete (f,inv_f) (new U result)
[Figure: graph of A and B with edges f and linkedTo]
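A Python sketch of the higher-order version, where the closure computation takes the function and its inverse as parameters. It uses a loop rather than recursion to stay clear of Python's recursion limit; that is a substitution made here, not the formulation on the slide. The helper `relation_functions` and the sample edges are hypothetical.

```python
def relation_functions(edges):
    """Build f and inv_f from a set of (a, b) edges."""
    def f(a):
        return {b for x, b in edges if x == a}

    def inv_f(b):
        return {a for a, y in edges if y == b}

    return f, inv_f


def complete(f, inv_f, start):
    """Close `start` under 'shares an f-source with', for any relationship f."""
    result = set(start)
    while True:
        new = {x for b in result for a in inv_f(b) for x in f(a)}
        if new <= result:
            return result
        result |= new


# Hypothetical authorship edges, invented for illustration.
AUTHOR = {
    ("Paper 1", "Person P"),
    ("Paper 1", "Person Q"),
    ("Paper 2", "Person Q"),
    ("Paper 2", "Person R"),
    ("Paper 3", "Person S"),
}
author, inv_author = relation_functions(AUTHOR)
collaborators = complete(author, inv_author, {"Person P"})
```

The same `complete` works unchanged for stars and inv_stars, which is the point of abstracting over the pair of functions.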

23 Collaboration Networks
Find all people linked to a person P via the author relationship:
    complete (author,inv_author) [P]
[Figure: schema graph of Paper and Person with edges author and inv_author]

24 Acknowledgements…
If we ask the simpler query
    [x | p ← inv_author Alexandra Poulovassilis; x ← author p]
we obtain the people with whom I have co-authored research papers: J. Bailey, K. Benkerimi, S. Courtenage, P. Demetriades, M. Derakhshan, B. Heydecker, S. Hild, P.J.H. King, M. Levene, N. Lorentzos, P.J. McBrien, P. Newson, E. Nonas, R. Offen, S. Reddi, S. Schwarz, C. Small, E. Tuv, P.T. Wood, L. Xu.
[Figure: schema graph of Paper and Person with edges author and inv_author]

25 Drawbacks of the Binary Relational Model
Despite its elegance, the binary relational model has some drawbacks: (a) large binary relational schemas can be hard to understand; (b) it is not so natural for representing higher-dimensional relationships.

26 The Hypernode Model
Drawback (a) led to research into nested-graph data models, with Mark Levene.

27 Higher-dimensional relationships
An example of problem (b) is the 3-way relationship between Distribution companies, Programmes and TV companies, which has to be represented by an entity and 3 binary relationships:
[Figure: entity Supply linked to DistrCo, Programme and TVCo]

28 The PFL Language
This led to the development of a new functional language, PFL, with Carol Small, which directly supports higher-dimensional relationships, e.g. the supply relationship is accessed by a single selector function
    |supply : (DistrCo,Programme,TVCo) → [(DistrCo,Programme,TVCo)]
Some examples:
    |supply (Any,P205,BBC)
    |supply (Any,Any,BBC)
    |supply (Any,P205,Any)
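A small Python sketch of what a selector with wildcards does: it takes a pattern tuple and returns the matching tuples of the relation. This illustrates the idea only and is not PFL's implementation. The `ANY` sentinel, the `select` helper and the supply rows (other than P205 and BBC) are hypothetical.

```python
ANY = object()  # wildcard, playing the role of Any on the slide

# Hypothetical supply tuples: the distributor and "TVCo Z" names are invented.
SUPPLY = {
    ("DistrCo X", "P205", "BBC"),
    ("DistrCo Y", "P310", "BBC"),
    ("DistrCo X", "P205", "TVCo Z"),
}


def select(relation, pattern):
    """Return the tuples of `relation` that agree with `pattern` on every non-wildcard position."""
    return sorted(
        row
        for row in relation
        if all(p is ANY or p == v for p, v in zip(pattern, row))
    )


who_supplies_p205_to_bbc = select(SUPPLY, (ANY, "P205", "BBC"))
everything_for_bbc = select(SUPPLY, (ANY, ANY, "BBC"))
```

One selector per relationship replaces the entity plus three binary relationships needed in the binary model.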

29 Active Databases
Up to now, I have been looking at schema, data and derived database information. In the 1990s a new kind of database information was being explored, namely event-condition-action rules of the form:
    on event if condition do action
ECA rules make a database active, in that it can automatically execute actions if events occur and conditions hold.

30 Active PFL
In a project during the mid 1990s, we extended PFL with ECA rules (with Swarup Reddi and Carol Small). For example:
    on insert viewers
    if [s | (s,n) ← |viewersInc (Any,Any); n < 500000]
    do insert [s | (s,n) ← |viewersInc (Any,Any); n < 500000] lowRated
[Figure: schema graph of Showing and Number with edge viewers]
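A toy Python sketch of an event-condition-action rule like the one above. It fires rules immediately on insertion and queries the whole viewers relation rather than the inserted increment (viewersInc on the slide), so it is a simplification. The `ActiveDatabase` class and the sample rows are hypothetical.

```python
class ActiveDatabase:
    """A dictionary of relations plus a list of on-insert ECA rules."""

    def __init__(self):
        self.relations = {"viewers": set(), "lowRated": set()}
        self.rules = []  # (event relation, condition query, action)

    def on_insert(self, relation, condition, action):
        self.rules.append((relation, condition, action))

    def insert(self, relation, rows):
        self.relations[relation] |= set(rows)
        for event, condition, action in self.rules:
            if event == relation:
                matched = condition(self)      # the condition is a query
                if matched:                    # a non-empty result means it holds
                    action(self, matched)


def low_rated(db):
    """Showings whose viewing figure is below 500000."""
    return {s for (s, n) in db.relations["viewers"] if n < 500000}


db = ActiveDatabase()
db.on_insert("viewers", low_rated, lambda d, rows: d.insert("lowRated", rows))
# Hypothetical showings S1 and S2 with invented viewing figures.
db.insert("viewers", {("S1", 120000), ("S2", 3000000)})
```

After the insert, the rule has copied the low-rated showing into lowRated without any explicit call from the application.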

31 PFL's ECA rule execution semantics
We specified these in PFL itself, to experiment before implementing:
    execSched (db,s) =
        if s = [] then (db,[])
        else execSched (schedRules (exec (head s,db),s))
    schedRules (db,a:s) =
        let (db,pre,suf) = fold schedRule (db,[],[]) (triggers a)
        in (db,pre ++ s ++ suf)
    schedRule i (db,pre,suf) =
        if (eval (event-condition-query i) db) = {} then (db,pre,suf)
        else updateSched (actions i,mode i,db,pre,suf)
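A hedged Python transliteration of the three equations above, to show how the schedule evolves. Several parts are assumptions: the slide does not define updateSched, so here Immediate actions join the prefix and Deferred actions join the suffix; actions are modelled as ("insert", relation, rows) triples; and the rule table layout is invented.

```python
from functools import reduce


def exec_action(action, db):
    """Apply an ("insert", relation, rows) action, returning a new database."""
    _, relation, rows = action
    return {**db, relation: db.get(relation, frozenset()) | rows}


def make_scheduler(rules):
    """rules: dict of rule id -> dict(event, query, actions, mode), in priority order."""

    def triggers(action):
        return [i for i, rule in rules.items() if rule["event"] == action[1]]

    def update_sched(actions, mode, db, pre, suf):
        # Assumed behaviour: Immediate goes before the rest of the schedule, Deferred after.
        if mode == "Immediate":
            return db, pre + actions, suf
        return db, pre, suf + actions

    def sched_rule(state, i):
        db, pre, suf = state
        matched = rules[i]["query"](db)
        if not matched:                       # eval ... = {}
            return db, pre, suf
        return update_sched(rules[i]["actions"](matched), rules[i]["mode"], db, pre, suf)

    def sched_rules(db, schedule):
        a, s = schedule[0], schedule[1:]
        db, pre, suf = reduce(sched_rule, triggers(a), (db, [], []))
        return db, pre + s + suf

    def exec_sched(db, s):
        while s:                              # loop form of the recursive equation
            db, s = sched_rules(exec_action(s[0], db), s)
        return db, []

    return exec_sched


rules = {
    1: dict(
        event="viewers",
        query=lambda db: frozenset(s for s, n in db.get("viewers", frozenset()) if n < 500000),
        actions=lambda rows: [("insert", "lowRated", rows)],
        mode="Immediate",
    ),
}
exec_sched = make_scheduler(rules)
first = ("insert", "viewers", frozenset({("S1", 120000), ("S2", 3000000)}))
final_db, remaining = exec_sched({}, [first])
```

Each step executes the head of the schedule, then rebuilds the schedule from the rules that the executed action triggers.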

32 Analysing and Optimising ECA rules
Techniques are needed for analysing and optimising the behaviour of ECA rules. In a project that started in the late 1990s, we have been using the functional semantics of ECA rule execution as the basis for developing such techniques (with James Bailey, Simon Courtenage, Pete Newson). In particular, we have been investigating abstract interpretation and partial evaluation of the rule execution semantics for analysis and optimisation, respectively.

33 Abstract execution semantics
    execSched* (db*,s*) =
        if s* = [] then (db*,[])
        else execSched* (schedRules* (exec* (head s*,db*),s*))
    schedRules* (db*,a*:s*) =
        let (db*,pre*,suf*) = fold schedRule* (db*,[],[]) (triggers a*)
        in (db*,pre* ++ s* ++ suf*)
    schedRule* i (db*,pre*,suf*) =
        if (eval* (event-condition-query i) db*) = False then (db*,pre*,suf*)
        else updateSched (actions i,mode i,db*,pre*,suf*)

34 Correctness of the Abstract Execution
If for all queries q, abstract databases db*, and abstract actions a*:
    conc (exec* (a*,db*)) is a superset of [exec (a,db) | (a,db) ← conc (a*,db*)]
    eval* q db* = False implies that for all db in conc db*, eval q db = {}
then execSched* is a conservative test for rule termination and rule unreachability.

35 Partial Evaluation of Rule Execution
Produce a specialised equation for schedRules for each kind of rule action that may appear at the head of the schedule:
    schedRules (db,a1:s) =
        let (db,pre,suf) = fold schedRule (db,[],[]) (triggers a1)
        in (db,pre ++ s ++ suf)
    schedRules (db,a2:s) =
        let (db,pre,suf) = fold schedRule (db,[],[]) (triggers a2)
        in (db,pre ++ s ++ suf)
    ...

36 Partial Evaluation of Rule Execution
Suppose action a1 triggers rule 2 and rule 3 (in that order of priority). Then we can replace triggers a1 above by [2,3] and apply fold, obtaining:
    schedRules (db,a1:s) =
        let (db,pre,suf) = schedRule (schedRule (db,[],[]) 2) 3
        in (db,pre ++ s ++ suf)

37 Partial Evaluation of Rule Execution
Now we can apply schedRule (assuming rule 2 has Immediate scheduling mode and rule 3 Deferred scheduling mode):
    schedRules (db,a1:s) =
        let (db,pre,suf) =
            if (eval (event-condition-query 2) db) = {} then
                if (eval (event-condition-query 3) db) = {} then (db,[],[])
                else (db,[],bind (actions 3) db)
            else
                if (eval (event-condition-query 3) db) = {} then (db,bind (actions 2) db,[])
                else (db,bind (actions 2) db,bind (actions 3) db)
        in (db,pre ++ s ++ suf)
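The specialisation above can be mimicked by hand in Python: a generic scheduling step that loops over the triggered rules, next to a residual version for an action known to trigger rule 2 (Immediate) and then rule 3 (Deferred). This is a manual illustration of the idea, not the output of a partial evaluator; the rule table, the queries and the relation name "r" are hypothetical.

```python
def sched_rules_generic(db, rule_ids, rules):
    """Generic step: consult each triggered rule and place its actions by mode."""
    pre, suf = [], []
    for i in rule_ids:
        matched = rules[i]["query"](db)
        if matched:
            target = pre if rules[i]["mode"] == "Immediate" else suf
            target.extend(rules[i]["actions"](matched))
    return pre, suf


def specialise_a1(rules):
    """Residual code for an action that triggers rule 2 (Immediate) then rule 3 (Deferred)."""
    q2, act2 = rules[2]["query"], rules[2]["actions"]
    q3, act3 = rules[3]["query"], rules[3]["actions"]

    def sched_rules_a1(db):
        # The loop, the trigger lookup and the mode tests have been evaluated away.
        m2, m3 = q2(db), q3(db)
        if not m2:
            return ([], []) if not m3 else ([], act3(m3))
        return (act2(m2), []) if not m3 else (act2(m2), act3(m3))

    return sched_rules_a1


# Hypothetical rules over a relation "r" of integers.
rules = {
    2: dict(query=lambda db: frozenset(n for n in db["r"] if n < 10),
            actions=lambda m: [("insert", "low", m)], mode="Immediate"),
    3: dict(query=lambda db: frozenset(n for n in db["r"] if n > 100),
            actions=lambda m: [("insert", "high", m)], mode="Deferred"),
}
sched_rules_a1 = specialise_a1(rules)
```

The residual function has the same four-way case split as the specialised equation on the slide, and returns the same prefix and suffix as the generic step.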

38 Heterogeneous Databases
So far, I have been discussing single databases. However, larger-scale applications may need to integrate information from several databases, possibly supporting different data models. To integrate information stored in such heterogeneous databases, it is necessary to form a single, integrated schema. Conflicts may exist between the various source schemas, which must be removed by applying transformations to these schemas.

39

40 Graphs for Schema Transformation
In work with Peter McBrien, started in the late 1990s, we have developed a general framework for transforming and integrating heterogeneous database schemas. We represent schemas expressed in higher-level data models, such as relational or object-oriented, in terms of a nested-graph data model, thus allowing us to transform between different data models. In our schema transformation framework, new schema constructs are defined using queries over existing constructs. Schema transformations are reversible, thus allowing query and data translation between schemas:

41

42

43
    addClass Series [p | (p,S) ← category]
    addClass Doc [p | (p,D) ← category]
    addClass Film [p | (p,F) ← category]
    addClass Prog [p | (p,c) ← category]

44
    addSubClass Film Prog
    addSubClass Doc Prog
    addSubClass Series Prog
    addClass Series [p | (p,S) ← category]
    addClass Doc [p | (p,D) ← category]
    addClass Film [p | (p,F) ← category]
    addClass Prog [p | (p,c) ← category]

45
    addSubClass Film Prog
    addSubClass Doc Prog
    addSubClass Series Prog
    addClass Series [p | (p,S) ← category]
    addClass Doc [p | (p,D) ← category]
    addClass Film [p | (p,F) ← category]
    addClass Prog [p | (p,c) ← category]
    delRel category [(p,F) | p ← Film] U [(p,D) | p ← Doc] U [(p,S) | p ← Series]

46
    addConstraint subset Film Prog
    addConstraint subset Doc Prog
    addConstraint subset Series Prog
    addNode Series [p | (p,S) ← category]
    addNode Doc [p | (p,D) ← category]
    addNode Film [p | (p,F) ← category]
    addNode Prog [p | (p,c) ← category]
    delEdge category [(p,F) | p ← Film] U [(p,D) | p ← Doc] U [(p,S) | p ← Series]
    delNode Programme Prog
    delNode Category [F,D,S]

47
    delConstraint subset Film Prog
    delConstraint subset Doc Prog
    delConstraint subset Series Prog
    delNode Series [p | (p,S) ← category]
    delNode Doc [p | (p,D) ← category]
    delNode Film [p | (p,F) ← category]
    delNode Prog [p | (p,c) ← category]
    addEdge category [(p,F) | p ← Film] U [(p,D) | p ← Doc] U [(p,S) | p ← Series]
    addNode Programme Prog
    addNode Category [F,D,S]
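Reversibility can be sketched in Python by treating a transformation as a list of steps and deriving the reverse by swapping each add with the matching del. Applying the swapped steps in reverse order is an assumption made here (the slide lists them in the same order as the forward steps), and the pathway below is an abridged, hypothetical encoding with queries kept as plain strings.

```python
# Each primitive step is paired with the step that undoes it.
INVERSE = {
    "addNode": "delNode", "delNode": "addNode",
    "addEdge": "delEdge", "delEdge": "addEdge",
    "addConstraint": "delConstraint", "delConstraint": "addConstraint",
}


def reverse(pathway):
    """Swap add/del in every step and undo the steps last-to-first."""
    return [(INVERSE[op], construct, query) for op, construct, query in reversed(pathway)]


# Abridged forward pathway; the query text is carried along unchanged.
forward = [
    ("addNode", "Film", "[p | (p,F) <- category]"),
    ("addConstraint", "subset Film Prog", None),
    ("delEdge", "category", "[(p,F) | p <- Film] U [(p,D) | p <- Doc] U [(p,S) | p <- Series]"),
    ("delNode", "Category", "[F,D,S]"),
]
backward = reverse(forward)
```

Because every step carries the query that defines its construct, the reverse pathway has all the information needed to rebuild what the forward pathway removed.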

48 Query Translation
Given a transformation from a schema S1 to a schema S2, and a query Q on S1, we use the delete transformation steps to substitute for constructs of S1 which are not in S2, e.g. from the previous slide:
    [title p | p ← Film U Doc]
on: [Figure: schema]
translates into
    [title p | p ← [p | (p,F) ← category] U [p | (p,D) ← category]]
on: [Figure: schema]
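A short Python sketch of the substitution described above: each class that exists only in S1 is replaced by the query attached to its delete step, so the translated query runs over the category edge alone. The data, the titles and the names `DELETE_STEP_QUERY` and `titles_of` are hypothetical.

```python
# Hypothetical data for the schema that has a category edge.
CATEGORY = {("P205", "F"), ("P310", "D"), ("P411", "S")}
TITLE = {"P205": "Title of P205", "P310": "Title of P310", "P411": "Title of P411"}

# Queries carried by the delete steps, expressed over the category edge.
DELETE_STEP_QUERY = {
    "Film": lambda: {p for p, c in CATEGORY if c == "F"},
    "Doc": lambda: {p for p, c in CATEGORY if c == "D"},
    "Series": lambda: {p for p, c in CATEGORY if c == "S"},
}


def titles_of(class_names):
    """Evaluate [title p | p <- C1 U C2 ...] after substituting each class by its query."""
    programmes = set()
    for name in class_names:
        programmes |= DELETE_STEP_QUERY[name]()
    return sorted(TITLE[p] for p in programmes)


film_and_doc_titles = titles_of(["Film", "Doc"])
```

The caller still phrases the query in terms of Film and Doc, while evaluation touches only constructs that exist in the target schema.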

49 Functions for Database Integration
In the formal specification of our framework, each schema transformation is a function
    t : Database → Database
where a database consists of schema+data. We are currently implementing our framework within the Automed project. We are planning to handle query language heterogeneity in Automed by translation into/from a functional intermediate query language.

50 Future Research
Extending Automed to also handle materialised views and view updates, leading to a data warehousing approach to data integration.
Data warehousing of genomic data (in collaboration with Profs Thornton, Orengo, Barton, and Drs Keller, Martin, Shepherd).
Moving beyond database integration and database dynamics to data integration on the Web and Web dynamics: handling XML data sources within Automed; developing an ECA rule language for XML.

