1
1 CoBase: Scalable and Extensible Cooperative Information System Wesley W. Chu Computer Science Department University of California, Los Angeles http://www.cobase.cs.ucla.edu
2
2 Conventional Query Answering: need to know the detailed database schema; cannot get approximate answers; cannot answer conceptual queries. Cooperative Query Answering: derive approximate answers; answer conceptual queries.
3
3 Cooperative Queries: Find a seaport with railway facility in Los Angeles. Find a nearby friendly airport that can land F-15. Find hospitals with facility similar to St. John’s near LAX. CoBase provides: Relaxation, Approximation, Association, Explanation. [Diagram: Cooperative Queries, CoBase Servers, Domain Knowledge, Heterogeneous Information Sources]
4
4 Generalization and Specialization [Diagram: Specific Query, Conceptual Query, More Conceptual Query; Generalization moves from a specific query toward a more conceptual query, Specialization moves back toward a specific query]
5
5 Type Abstraction Hierarchy (TAH): Chemical-Suit Size TAH (a non-numerical TAH). Provides multi-level knowledge representations. [Diagram nodes: All_Sizes; Large_Size, Small_Size; Very_Small, Small_to_Medium, Large_to_Extra_Large, Very_Large; leaf sizes XL, XXL, L, M, S, XXS, XXXS]
6
6 Type Abstraction Hierarchy (TAH), Location Example [Diagram nodes: CA; N. CA, S. CA, C. CA; San Jose, Palo Alto, Sacramento, Davis, San Diego, Long Beach, LA, SF]
7
7 Relaxation Agent: Uses a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchy) to relax the following for matching: query conditions, constraints
8
8 Query Relaxation [Flowchart labels: Query, Database, Answers, Yes, Display, No, Relax Attribute, Query Modification, TAHs]
10
10 Visualization of Relaxation Process Query: Find seaports in the given region. given region relaxed region
12
12 Relaxation Control Primitives: not-relaxable runway-length; relaxation-order (runway length, location); preference-list; unacceptable-list; answer-size; relaxation-level
13
13 Relaxation Primitives: ^ (approximate), e.g., ^ 9 am; between; near-to (context-sensitive), e.g., Airport near-to LAX, Restaurant near-to UCLA; similar-to, e.g., Airport similar-to LAX based-on (traffic, runway); within
14
14 Similar-to: Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width. select aport_name, runway_length, runway_width from runways, countries where aport_name similar-to 'Bizerte' based-on ((runway_length 1.0) (runway_width 2.0)) and country_state_name = 'Tunisia' and countries.glc_cd = runways.glc_cd
15
15 Similar-to Result Similar-to module ranks the returned answers according to mean-squared error.
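The ranking step described on the similar-to slides can be sketched roughly as follows. This is a minimal illustration, not CoBase code: it assumes a weighted mean-squared error over the based-on attributes, it scales each difference by the target value so that attributes with different units are comparable (an assumption of this sketch), and the airport rows `A`, `B`, `C` and their values are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]

def weighted_mse(target: Row, candidate: Row, weights: Dict[str, float]) -> float:
    """Weighted mean-squared error over the based-on attributes.

    Each difference is divided by the target value (assumed non-zero) so
    that lengths and widths contribute on a comparable scale.
    """
    total_weight = sum(weights.values())
    error = 0.0
    for attr, weight in weights.items():
        diff = (candidate[attr] - target[attr]) / target[attr]
        error += weight * diff * diff
    return error / total_weight

def rank_similar(target: Row, candidates: Dict[str, Row],
                 weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (name, error) pairs, most similar first."""
    scored = [(name, weighted_mse(target, row, weights))
              for name, row in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Weights taken from the based-on clause; all rows below are made up.
weights = {"runway_length": 1.0, "runway_width": 2.0}
bizerte = {"runway_length": 8000.0, "runway_width": 150.0}
airports = {
    "A": {"runway_length": 8000.0, "runway_width": 100.0},  # width differs
    "B": {"runway_length": 6000.0, "runway_width": 150.0},  # length differs
    "C": {"runway_length": 8000.0, "runway_width": 150.0},  # identical
}
ranking = rank_similar(bizerte, airports, weights)
```

Because runway width carries twice the weight, the airport that differs in width is ranked below the one that differs in length, even though its relative deviation is of similar size.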
16
16 Unacceptable List Operator [Diagram: Type Abstraction Hierarchy with NE Tunisia, Central Tunisia, NW Tunisia, SW Tunisia, Bizerte, El Borma, ...; Trimmed TAH with Central Tunisia, SW Tunisia, Gafsa, El Borma; CoBase Relaxation Manager; Constraint: Avoid Northern Tunisia!; Gafsa]
17
17 TAH Generation for Numerical Attribute Values. Relaxation Error: the difference between the exact value and the returned approximate value. The expected error is weighted by the probability of occurrence of each value. DISC (Distribution Sensitive Clustering) is based on the attribute values and the frequency distribution of the data.
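One way to read the relaxation-error definition is sketched below. The formula used here, RE(C) = sum over i, j of P(x_i) P(x_j) |x_i - x_j| with P taken as the relative frequency inside the cluster, is an assumption consistent with the wording on the slide; the slide does not give the exact definition DISC uses. The binary cut search is likewise only an illustration of distribution-sensitive splitting.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

def relaxation_error(values: Sequence[float]) -> float:
    """Expected |x_i - x_j| when an exact value x_i in the cluster is
    answered with a value x_j, both weighted by frequency of occurrence."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    error = 0.0
    for xi, ci in counts.items():
        for xj, cj in counts.items():
            error += (ci / n) * (cj / n) * abs(xi - xj)
    return error

def best_binary_cut(
    values: Sequence[float],
) -> Optional[Tuple[float, List[float], List[float]]]:
    """Split the sorted values into two clusters so that the
    frequency-weighted relaxation error of the two parts is minimal.
    Returns None when all values are equal and no cut exists."""
    data = sorted(values)
    n = len(data)
    best = None
    for k in range(1, n):
        if data[k] == data[k - 1]:
            continue  # equal values always stay in the same cluster
        left, right = data[:k], data[k:]
        cost = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best

# Hypothetical attribute values.
cut = best_binary_cut([10, 1, 11, 2])
```

Applying the cut recursively to each part would yield a multi-level hierarchy of value ranges, which is the shape a numerical TAH takes on the later slides.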
18
18 TAH Generation for Non-numerical Attribute Values. Pattern Based Knowledge Induction (PBKI): a rule-based approach. Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships). Provides an attribute correlation value (a measure of how well the rules apply to the database).
19
19 Type Abstraction Hierarchy (TAH): Provides multi-level knowledge representations. Runway Length: All; Short (0...700), Medium (700...1K), Long (1K...5K). Location Name: Tunisia; NE Tunisia (Bizerte, Tunis, Djedeida), Central Tunisia, SW Tunisia (El Borma, ...)
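Relaxing a value through a TAH can be pictured as generalizing up the hierarchy and then specializing back down to leaf values. The sketch below illustrates that idea and is not the CoBase implementation. The child-to-parent assignment is assumed from the location names on the slides (Gafsa is borrowed from the unacceptable-list slide), and the `unacceptable` parameter mimics trimming subtrees from the hierarchy.

```python
from typing import Dict, FrozenSet, List

# Child -> parent links of a small location TAH (grouping is assumed).
PARENT: Dict[str, str] = {
    "NE Tunisia": "Tunisia",
    "Central Tunisia": "Tunisia",
    "SW Tunisia": "Tunisia",
    "Bizerte": "NE Tunisia",
    "Tunis": "NE Tunisia",
    "Djedeida": "NE Tunisia",
    "Gafsa": "Central Tunisia",
    "El Borma": "SW Tunisia",
}

def leaves_under(node: str,
                 unacceptable: FrozenSet[str] = frozenset()) -> List[str]:
    """Specialize: collect leaf values below node, skipping trimmed subtrees."""
    if node in unacceptable:
        return []
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return [node]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaves_under(child, unacceptable))
    return leaves

def relax(value: str, levels: int = 1,
          unacceptable: FrozenSet[str] = frozenset()) -> List[str]:
    """Generalize value up `levels` steps, then specialize to leaf values."""
    node = value
    for _ in range(levels):
        node = PARENT.get(node, node)  # stay at the root once it is reached
    return leaves_under(node, unacceptable)

one_level = relax("Bizerte")
two_levels = relax("Bizerte", levels=2)
avoid_north = relax("Gafsa", levels=2, unacceptable=frozenset({"NE Tunisia"}))
```

Each additional level widens the set of acceptable values, which is why a relaxation-level control is useful for bounding how far a query may drift from the original condition.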
20
20 Associative Query Answering: Provide relevant information not explicitly asked for by the user. User Query: List all airports with runway length between 8500 and approximately 10000 feet. [Diagram: Query Answers; Associated Attributes and Answers for User Type = Pilot and User Type = Planner]
21
21 CoBase and GLAD Integration Wesley W. Chu
22
22 CoBase Functionality. Provide approximate matching: Find HETs with capacity of approximately 5 tons. Provide conceptual query answering: Find “Earth Moving” Equipment. Provide context-sensitive spatial queries: Find storage sites near selected location (integration with MATT map server). Provide relaxation control: Relaxation order; Not-relaxable; At-least (answer set, quantity on hand)
23
23 Cooperative Operations Added to GLAD Implicit Query Relaxation Explicit Query Relaxation Approximate operator Similar-to/based-on Spatial relaxation Relaxation Control Relaxation-order Not-relaxable At-least (answer-set size, quantity on hand)
24
24 CoBase Features Added to GLAD Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.) Display the query relaxation process modified query conditions (value, spatial) type abstraction hierarchies Rank returned answers with similarity measures e.g., spatial relaxation ranks answers according to their distance from the selected location
25
25 CoBase and GLAD TIE Report Collection Report Query Constructor Filter Editor Object Cache Display Generator Query Collection GLAD CoBase Query Editor CoBase Relaxation Manager Knowledge Base Data Cache CoBase Data Source Manager Databases NSNs Spatial Area Selection
26
26 GLAD Query Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. select nsn, price, pax_capacity_qty, capacity_wt_ston from nsn_description where (upper(class) = '7' and upper(cbs_category_nomen) = 'AIRCRAFT' and price < 700000 and pax_capacity_qty > 10 and upper (combat_type) = 'I' and capacity_wt_ston <= 2)
27
27 CoGLAD Query with Relaxation Control Operators Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. Attribute passenger capacity is not relaxable. Relax price first and then capacity weight. select nsn, price, pax_capacity_qty, capacity_wt_ston from nsn_description where (upper(class) = '7' and upper(cbs_category_nomen) = 'AIRCRAFT' and price < 700000 and pax_capacity_qty > 10 and upper (combat_type) = 'I' and capacity_wt_ston <= 2) not-relaxable pax_capacity_qty relaxation-order price capacity_wt_ston
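The effect of the not-relaxable, relaxation-order and at-least operators can be sketched as a loop over an in-memory table. This is a simplified illustration only: CoBase relaxes conditions through TAHs, whereas this sketch simply widens each numeric bound by a fixed factor, and the rows, the `widen` helper and the 25% step are all hypothetical.

```python
import operator
from typing import Callable, Dict, List, Sequence, Set, Tuple

Condition = Tuple[str, float]  # (comparison operator, bound)
Row = Dict[str, object]
OPS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def run_query(rows: Sequence[Row], conditions: Dict[str, Condition]) -> List[Row]:
    """Return the rows that satisfy every condition."""
    return [row for row in rows
            if all(OPS[op](row[attr], bound)
                   for attr, (op, bound) in conditions.items())]

def widen(condition: Condition, factor: float = 0.25) -> Condition:
    """Loosen a bound: raise upper bounds, lower lower bounds (assumed step)."""
    op, bound = condition
    if op in ("<", "<="):
        return (op, bound * (1 + factor))
    return (op, bound * (1 - factor))

def relax_query(rows: Sequence[Row], conditions: Dict[str, Condition],
                relaxation_order: Sequence[str], not_relaxable: Set[str],
                at_least: int, max_rounds: int = 5):
    """Relax attributes in the given order until at_least answers are found."""
    conditions = dict(conditions)
    trace: List[Tuple[str, Condition]] = []
    answers = run_query(rows, conditions)
    for _ in range(max_rounds):
        for attr in relaxation_order:
            if len(answers) >= at_least:
                return answers, trace
            if attr in not_relaxable:
                continue  # never touch a not-relaxable attribute
            conditions[attr] = widen(conditions[attr])
            trace.append((attr, conditions[attr]))
            answers = run_query(rows, conditions)
    return answers, trace

# Hypothetical rows; attribute names follow the query on the slide.
rows = [
    {"nsn": "N1", "price": 650000, "pax_capacity_qty": 12, "capacity_wt_ston": 1.5},
    {"nsn": "N2", "price": 800000, "pax_capacity_qty": 20, "capacity_wt_ston": 2.0},
    {"nsn": "N3", "price": 600000, "pax_capacity_qty": 8, "capacity_wt_ston": 1.0},
    {"nsn": "N4", "price": 690000, "pax_capacity_qty": 15, "capacity_wt_ston": 2.4},
]
conditions = {"price": ("<", 700000), "pax_capacity_qty": (">", 10),
              "capacity_wt_ston": ("<=", 2)}
answers, trace = relax_query(rows, conditions,
                             relaxation_order=["price", "capacity_wt_ston"],
                             not_relaxable={"pax_capacity_qty"}, at_least=3)
```

The returned trace records which condition was modified at each step, which is the kind of information a display of the relaxation process would need.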
28
28 CoGLAD Query with Similar-to Operator Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity and air mileage. Passenger capacity has a weight of 8, and price and air mileage each have a weight of 1. select nsn from nsn_description where upper(nsn) similar-to '0000IB0000961' based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0)) at-least 4 * '0000IB0000961' is an answer from the previous query
29
29 CoGLAD Query with Approximate Operator Find DLA stock reports with NSN like ‘%8340%’ (FSC for tents and tarpaulins) and on-hand quantity of approximately 150. select nsn, ric from dla_stock_report where nsn like ‘%8340%’ and on_hand_quantity = ~ 150
30
30 Adding Constraints to a Query GLAD query select nsn, ric from dla_stock_report where nsn like ‘%8340%’ and nomenclature like ‘%TARP%’ Query with added constraints select nsn, ric from dla_stock_report where nsn like ‘%8340%’ and nomenclature like ‘%TARP%’ and on_hand_quantity = ~ 150 and size_in_square_feet = 350
31
31 Example of Spatial Relaxation [Flowchart: NSNs; selected an area on the map; constraint: quantity on hand; Query Processing; CoBase Relaxation Manager: satisfy constraints? Yes: return the answers; No: relax the selected area based on the context-sensitive TAHs]
32
32 Spatial Relaxation with Relaxation Control relaxation-order: size, (latitude, longitude) not-relaxable: price at-least: value: size of the tarpaulin quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
33
33 Scalable and Extensible CoBase Architecture
34
34 Mediator Inter-Communications via KQML Module Objects APIs Content Language Data Actions CoBase Ontology Mediator A Module A CoBase Ontology CoBase Content Language KQML Mediator B Module B CoBase Ontology CoBase Content Language KQML
36
36 Query Answers Without CoBase Query: find chemical suits
43
43 Electronic Warfare Identify and locate sources of radiated electromagnetic energy Determine emitter type based on the operating parameters of observed signals: Radio Frequency (RF) Pulse Repetition Frequency (PRF) Pulse Duration (PD) Scan Period (SP) other operating parameters Determine platform sites near the line of the bearing of an emitter This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ
44
44 Performance Improvement by Using CoBase in EW Conventional DB: parameter ranges from emitter specifications CoBase: DB: peak parameters (RF,PRF) and parameter ranges (PD,SP) KB: TAHs based on RF and PRF peak parameters TAHs based on PD and SP parameter ranges Case 1: emitter signals without noise Case 2: add noise - PD & SP (10%), PRF (5%), RF (2.5%) Sample Size: 1000 signals Emitter Types: 75
45
45 Current CoBase Users and Applications
46
46 Conclusions: Provide user- and context-sensitive query relaxations (structured and unstructured data). Provide additional information (associative query answering) based on past cases. CoSQL (Cooperative SQL): similar-to, near-to, approximate; relaxation control operators. GUI: map server, high-level query formation.
48
48 CoSent: An Active Database Technology. Natural language-like rules support conceptual and approximate terms. Decomposes natural language-like rules into low-level rules via a knowledge base (TAH). Mimics the human cognitive process and thus eases rule specification. Eases rule maintenance.
49
49 CoSent: An Active Database Technology. Triggers with high-level rules containing conceptual terms (e.g., bad, heavy) and approximate operators (e.g., similar-to, near-to, approximate). Allows trigger conditions to be specified with fuzzy and conceptual terms. Mimics human cognitive expression. CoSent monitors temporal composite events and executes rules with conceptual and approximate terms.
50
50 Key Features of CoSent. User-defined rules are transformed into low-level range values via a knowledge base: Type Abstraction Hierarchies (TAHs). TAHs are typically generated from data sources automatically. Leverages conventional DBMS (e.g., Oracle, Sybase, Teradata) triggering systems. Rule definitions are either specified by a domain expert or derived by data mining technologies.
51
51 Example of Rule Definitions with Data Mining Technology. Find attributes that frequently appear together for a given target attribute. If the road condition is bad and the weather is also bad, then traffic congestion results. If a person wrote many bad checks and also has a past eviction, then this person is a poor credit risk. Based on the frequency of occurrence, the derived rules can be ranked according to an information measure.
52
52 Conventional vs. Natural Language-Like Rules. Natural Language-Like Rule: If the weather turns bad, then notify all affected units in that region and all those that are near to that region. Conventional Rule: If wind_speed > MAX_WIND_SPEED and wave_height > MAX_WAVE_HEIGHT then notify affected units in regions.
53
53 Natural Language-Like Rule Specifications. Example 1: If the number of departures of large cargo carriers (e.g., C-5, C-141) becomes significantly low in the past seven days, notify the Air Mobility Command. Example 2: If the aircraft has a fuel contamination problem and the aircraft type is similar-to ‘C-5’ based on the fuel type and fueling method, then notify the authority.
54
54 Example: DoD Transportation Planning Weather Report Table. Wind Speed (meters/second): 14.9, 13.5, 12.2, 12, 11.8, 10.6, 10.5, 10, 8.3, 7.9, 8.1, 7.7, 7.1. Wave Height (meters): 3.3, 3.1, 2.6, 2.8, 2.3, 2.7, 2.5, 2.3, 2.2, 2, 1.8. Wind Speed (meters/second): 7.4, 7.7, 7, 6.5, 6.6, 6.5, 6.6, 6.4, 5.9, 5.7, 6, 4.5, 4, 3.7. Wave Height (meters): 1.9, 1.7, 1.6, 1.5, 1.6, 1.4, 1.5, 1.4, 1.6, 1.4, 1.3, 1.2. Wind Speed is the hourly average over an eight-minute period for buoys and a two-minute period for land stations. Wave height is sampled in a 20-minute period.
55
55 TAH Example Wave Height [0.6, 7.2] VERY LOW [0.6, 1.25] LOW [1.25, 1.75] HIGH [1.75, 2.45] VERY HIGH [2.45, 7.2] Wave Height
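The wave-height TAH above can be used in both directions: to turn a conceptual term into a range condition for a conventional trigger, and to label an observed value with a concept. The sketch below shows both under stated assumptions: the column name `wave_height` follows the later rule slide, the function names are hypothetical, and because adjacent ranges share their boundary value, `classify` simply returns the first matching concept.

```python
from typing import Dict, Optional, Tuple

# Leaf ranges of the wave-height TAH, in meters, as listed on the slide.
WAVE_HEIGHT_TAH: Dict[str, Tuple[float, float]] = {
    "very low": (0.6, 1.25),
    "low": (1.25, 1.75),
    "high": (1.75, 2.45),
    "very high": (2.45, 7.2),
}

def to_range_condition(column: str, concept: str,
                       tah: Dict[str, Tuple[float, float]]) -> str:
    """Translate a conceptual term into a low-level range condition."""
    low, high = tah[concept.lower()]
    return f"{column} >= {low} AND {column} <= {high}"

def classify(value: float,
             tah: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Return the first concept whose closed range contains value."""
    for concept, (low, high) in tah.items():
        if low <= value <= high:
            return concept
    return None  # value falls outside the hierarchy

condition = to_range_condition("wave_height", "Very High", WAVE_HEIGHT_TAH)
```

A rule written with the term "very high" then needs no change when the TAH is regenerated from new data; only the translated range moves.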
56
56 A Portion of Wave Height TAH
57
57 Triggering Based on Temporal Composite Events. Notify the commander if, within the past seven days, the total departure of C-5 is significantly low and the filter problem on C-5 is extremely high. C-5 Departure TAH: Low 9-134.5 (Signt. Low 9-53, Very Low 53-134.5); High 134.5-208 (Very High 134.5-162, Signt. High 162-208). C-5 Filter Problem TAH: Low 0-53 (Extra. Low 0-36, Very Low 36-53); High 53-79 (Very High 53-60, Ex High 60-79).
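A temporal composite event of this kind can be checked by aggregating events over a sliding window and comparing the totals with the TAH ranges. The sketch below is only an illustration: the event record layout `(date, kind, count)` is hypothetical, the ranges are read as seven-day totals, and treating "Signt. Low" (9-53) as "significantly low" and "Ex High" (60-79) as "extremely high" is an interpretation of the abbreviated labels.

```python
from datetime import date, timedelta
from typing import List, Tuple

Event = Tuple[date, str, int]  # (day, kind, count); hypothetical layout

# Ranges taken from the slide; the label readings are assumptions.
DEPARTURE_TAH = {"significantly low": (9, 53)}
FILTER_PROBLEM_TAH = {"extremely high": (60, 79)}

def total_in_window(events: List[Event], kind: str,
                    today: date, days: int = 7) -> int:
    """Sum the counts of one event kind over the past `days` days."""
    start = today - timedelta(days=days)
    return sum(count for day, event_kind, count in events
               if event_kind == kind and start < day <= today)

def should_notify(events: List[Event], today: date) -> bool:
    """True when both conceptual conditions hold over the window."""
    departures = total_in_window(events, "departure", today)
    problems = total_in_window(events, "filter_problem", today)
    low_d, high_d = DEPARTURE_TAH["significantly low"]
    low_f, high_f = FILTER_PROBLEM_TAH["extremely high"]
    return low_d <= departures <= high_d and low_f <= problems <= high_f

today = date(2000, 1, 10)
events = [
    (date(2000, 1, 1), "departure", 100),      # outside the window
    (date(2000, 1, 5), "departure", 30),
    (date(2000, 1, 8), "filter_problem", 65),
]
notify = should_notify(events, today)
```

In a deployed system the window totals would more likely come from the triggering facilities of the underlying DBMS than from an in-memory list.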
58
58 Natural Language-Like Rule Translations Rule Definition TAH Conventional triggering system (e.g.,Oracle, Sybase,Teradata) Low-level rules Natural Language-Like Rules Rule Parser Rule Rep Rule Decomposer Rule Translator Rule Translation/Relaxation
59
59 CoSent Architecture Trigger Action (output) Rule Parser Relaxation Engine TAHs Rule Base Rule Manager Event Manager Action Manager Natural Language-Like Rule Composite Event Specification and Notification CoSent Server (input) (input/output) Rule Translation/ Relaxation Commercial relational database systems (e.g., Oracle, Sybase, Teradata, etc.)
60
60 CoSent Demo. Natural language-like rule with conceptual terms: “very high wave height” and “very strong wind speed”. Natural language-like rule with approximate term “nearby” and conceptual term “bad weather”. Install trigger by drag-and-drop on the desired location on the map.
61
61 Natural Language-Like Rule. A natural language-like rule containing conceptual terms, such as wave_height = “very-high” and wind_speed = “very-strong”, can be translated to range values by domain knowledge, for instance a type abstraction hierarchy. Natural language-like rules reduce the number of rules, thus easing rule maintenance.
67
67 Rules With Approximate Terms. Rules can contain approximate terms, such as near-by and approximate, which eases rule specification. The trigger can be installed on the desired location on a map by the drag-and-drop method. The near-by region affected by the bad weather condition is specified by the trigger condition, shown by a red circle.
75
75 Map Server Architecture
76
76 Current Capabilities of Map Server Visualization of Query Answers Icons Paths Enter Query Constraints Graphically Visualization of Query Relaxation Process
77
77 Visualization of Relaxation Process Query: Find seaports in the given region. given region relaxed region
78
78 Explanation Agent Based on process traces and invocation rules, generate English-like explanation of: Relaxation process Quality of approximate matching Further explanation on definitions and terms in explanation
79
79 Explanation of Relaxation Process
80
80 Extend near-to Primitive Points to Regions
81
81 Query Formation From High-Level Concepts for Relational Databases Guogen Zhang Wesley Chu Frank Meng Gladys Kong
82
82 Outline: Overview; Semantic Graph Model; High-Level Query Formation for SPJ queries; Incremental Query Formation for Complex Queries; Conclusions
83
83 Overview: Query Formation. Based on semantic graph model, including user-defined relationships. User specifies requests and constraints. Formulate simple query by graph search technique. Candidates ranked by information measure. English-like query description. A complex query can be formulated by a series of simple queries.
84
84 Related Work. Query formulation as Steiner tree problem (Wald and Sorenson, 1984): limited to partial 2-tree graphs. Formulate simple Select-Project-Join (SPJ) queries via Universal Relation Model: no need to specify natural joins (Ullman, 1988; Vardi, 1988). Object-oriented query path expression completion: partial order relationship between different paths for ranking (Ioannidis and Lashkari, 1994). Query-by-Icon (QBI) (Massari and Chrysanthis, 1995). Natural language interfaces (text/voice): logical form to query.
85
85 Semantic Graph Model Weighted graph G=(V,E): Nodes: entities -- strong, weak, user-defined Links: relationships -- ISA, HAS, simple, complex, user-defined For relational databases: nodes: relations links: natural and user-defined joins Weight: information measure of a node or link
86
86 Query Feature Query expression in a semantic graph Query Topic, T: A set of Joins represented by links Query Constraints, C: Query Conditions Query Aspect, A: Attribute list
87
87 A query topic for “aircraft can land on airports at geographical locations of countries” [Diagram nodes: airfield_chars, runways, airports, geoloc, country; link labels: can land, have, is a, located]
88
88 Semi-Automatic Generation of Semantic Model Find natural joins through key and foreign key between nodes. User-defined links can be added into the graph model. Designers need to specify link types and assign names to all the elements in the graph.
89
89 Example of Semantic Model Generation AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …; key: APORT_NM. RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT, RUNWAY_WIDTH_FT, …; key: RUNWAY_NM. GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …; key: GLC_CD. COUNTRY: CY_CD, CY_NM, …; key: CY_CD. Links: AIRPORT--RUNWAY: APORT_NM; AIRPORT--GEOLOC: GLC_CD; RUNWAY--GEOLOC: GLC_CD; GEOLOC--COUNTRY: CY_CD;
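The natural-join discovery step can be sketched as a scan for key attributes that reappear in other relations. This is a minimal illustration rather than the authors' generator: it assumes that an attribute with the same name as another relation's key acts as a foreign key, it uses only the attributes listed on the slide (the elided ones are left out), and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Relations, attributes and keys as listed on the slide.
SCHEMA: Dict[str, Dict[str, object]] = {
    "AIRPORT": {"attrs": ["APORT_NM", "GEOLOC_TYPE", "GLC_CD", "ELEV_FT"],
                "key": "APORT_NM"},
    "RUNWAY": {"attrs": ["APORT_NM", "RUNWAY_NM", "GLC_CD",
                         "RUNWAY_LENGTH_FT", "RUNWAY_WIDTH_FT"],
               "key": "RUNWAY_NM"},
    "GEOLOC": {"attrs": ["GLC_CD", "GLC_NM", "CY_CD", "LATITUDE", "LONGITUDE"],
               "key": "GLC_CD"},
    "COUNTRY": {"attrs": ["CY_CD", "CY_NM"], "key": "CY_CD"},
}

def natural_join_links(schema) -> List[Tuple[str, str, str]]:
    """Return (referencing relation, key owner, join attribute) triples.

    A link is proposed whenever the key of one relation also appears as an
    attribute of another relation (name matching is an assumption).
    """
    links = []
    for owner, info in schema.items():
        key = info["key"]
        for other, other_info in schema.items():
            if other != owner and key in other_info["attrs"]:
                links.append((other, owner, key))
    return links

links = natural_join_links(SCHEMA)
```

On this schema the scan proposes the same four links the slide lists; link types and readable names would still have to be supplied by the designer.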
90
90 Information Measure. Information measure of a node or link $a$: $I(a) = -\log P(a)$, where $P(a)$ is the probability of $a$ being used in queries. Assuming nodes and links are independent, for a subgraph with a set of elements $A = \{a_i \mid i = 1, \ldots, n\}$, the information measure is additive: $I(A) = \sum_{i=1}^{n} I(a_i)$
91
91 Information Measure (cont.) Initial information measure: all the nodes = 1; different nodes have a different value. Information measure is normalized and converted into counts. Probability of a node or a link is $P(a_i) = c_i / c$. Update information measure. Ranking is based on the information measure and thus adapts to user feedback.
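The count-based information measure can be sketched as below. This is an illustrative reading, not the authors' code: the class name `UsageModel` is hypothetical, `c` is taken to be the total count over all elements, and every element starts with the same count.

```python
import math
from typing import Dict, Iterable

class UsageModel:
    """Tracks usage counts of graph elements (nodes and links)."""

    def __init__(self, elements: Iterable[str], initial_count: int = 1) -> None:
        self.counts: Dict[str, int] = {e: initial_count for e in elements}

    def probability(self, element: str) -> float:
        """P(a_i) = c_i / c, with c assumed to be the sum of all counts."""
        return self.counts[element] / sum(self.counts.values())

    def information(self, element: str) -> float:
        """I(a) = -log P(a): frequently used elements carry less information."""
        return -math.log(self.probability(element))

    def subgraph_information(self, elements: Iterable[str]) -> float:
        """Additive measure of a subgraph, assuming independent elements."""
        return sum(self.information(e) for e in elements)

    def record_use(self, elements: Iterable[str]) -> None:
        """User feedback: bump the count of every element in an accepted query."""
        for element in elements:
            self.counts[element] += 1

model = UsageModel(["airports", "runways", "have", "can land"])
before = model.information("have")
model.record_use(["airports", "have"])
after = model.information("have")
```

After an accepted query, the elements it used have a lower information measure, so candidates that reuse them rank higher the next time, which is the adaptive behavior the slide describes.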
92
92 Query Formulation To formulate (simple) queries without knowledge of query language or database schema Example: Find airports in Tunisia that can land a C-5 cargo plane User input: Query aspect: AIRPORTS.APORT_NM Constraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = ‘C-5’ COUNTRY_STATE.CY_NM = ‘Tunisia’ Links: CAN LAND
93
93 Formulated Query SELECT R3.APORT_NM FROM AIRCRAFT_AIRFIELD_CHARS R0, AIRPORTS R3, COUNTRY_STATE R11, GEOLOC R12, RUNWAYS R16 WHERE R0.AC_TYPE_NM = ‘C-5’ AND R11.CY_NM = ‘Tunisia’ AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH_FT AND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT AND R12.GLC_CD = R3.GLC_CD AND R3.APORT_NM = R16.APORT_NM AND R11.CY_CD = R12.CY_CD
94
94 Query Completion as Graph Search Problem. Given: an incomplete input query topic T_i. Find a set of links to complete the topic (to make T_i connected). Minimum Missing Information principle: the query completion candidate T_c (the missing links and nodes) for an incomplete input topic T_i contains the minimum information.
95
95 Query Formulation Algorithm Input: subgraph T of the semantic graph G Find candidates with the minimum Information measure Two methods used to limit the search scope: L-step-bound paths: paths that connect two components with at most L links, to limit search within the neighborhood of the input subgraph k-minimum completion candidates: only at most k candidates with minimum Information measure are kept (alpha-beta pruning)
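The two search limits above can be sketched on a toy graph. This is a simplified illustration under several assumptions: it connects just two components, it counts only link information (missing nodes are ignored), it keeps the k best candidates with a plain heap selection instead of the pruning mentioned on the slide, and the link endpoints and information values are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str, float]  # (node_u, node_v, link_name, information)
Step = Tuple[str, str, str]         # (from_node, link_name, to_node)

def l_step_bound_paths(links: List[Link], sources: Set[str],
                       targets: Set[str],
                       max_links: int) -> List[Tuple[float, List[Step]]]:
    """Enumerate simple paths of at most max_links links joining the
    source component to the target component."""
    adjacency: Dict[str, List[Tuple[str, str, float]]] = {}
    for u, v, name, info in links:
        adjacency.setdefault(u, []).append((v, name, info))
        adjacency.setdefault(v, []).append((u, name, info))
    results: List[Tuple[float, List[Step]]] = []

    def extend(node: str, visited: Set[str], path: List[Step], cost: float) -> None:
        if node in targets and path:
            results.append((cost, list(path)))
            return
        if len(path) == max_links:
            return  # L-step bound reached
        for nxt, name, info in adjacency.get(node, []):
            if nxt in visited or nxt in sources:
                continue  # keep paths simple and outside the source component
            path.append((node, name, nxt))
            extend(nxt, visited | {nxt}, path, cost + info)
            path.pop()

    for source in sources:
        extend(source, {source}, [], 0.0)
    return results

def k_minimum_candidates(paths, k: int):
    """Keep only the k candidates with the smallest information measure."""
    return heapq.nsmallest(k, paths, key=lambda candidate: candidate[0])

# Hypothetical links and information values for the transportation domain.
LINKS: List[Link] = [
    ("airports", "runways", "have", 1.0),
    ("airfield_chars", "runways", "can land", 2.0),
    ("airports", "geoloc", "at", 1.0),
    ("airports", "geoloc", "is a", 1.5),
    ("geoloc", "country", "located", 1.0),
]
paths = l_step_bound_paths(LINKS, {"airports"}, {"country"}, max_links=2)
best = k_minimum_candidates(paths, k=1)
```

With a bound of one link no completion exists between airports and country, while a bound of two yields both the "at" and the "is a" routes through geoloc, and the candidate with less information is kept.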
96
96 Initial Components and 2-Step-Bound Paths for the “CAN LAND” Query [Diagram: (a) Initial components; (b) 2-step-bound paths (1) to (6). Nodes: airports, aircrafts, runways, airfield_chars, geoloc, country. Link labels: repair, have, authorize, can land, at, is a, located.]
97
97 The Semantic Graph for the Transportation Domain [Diagram: relation nodes airports, runways, weather, airfield_chars, geoloc, country; link labels: can land, at, have, is a, located; numeric labels as extracted: 2 1 11 1]
98
98 Incremental Query Formulation. To help the user reach a complex query goal with a series of simple queries. The subsequent queries may depend on results of preceding queries (derived relations). Issues: incorporate derived relations into the semantic graph; suggest missing attributes to link isolated derived nodes to the graph.
99
99 Incremental Query Examples Find airports in Tunisia. Which of these airports can land a C-5? What is the weather at these airports?
100
100 Incorporating Derived Relations Source relation: contributes attributes to the derived relations Derived relation: inherits properties of attributes from their source relations Deriving link: links to the source relations through inherited keys Inherited link: inherits links from the source relations
101
101 Extended semantic graph showing derived nodes, derived links and inherited links [Diagram: relation nodes airports, runways, airfield_chars, weather, geoloc, country; derived nodes airporttunisia, airporttunisiacanland, airporttunisiacanlandweather; link labels: can land, at, have, is a, located; legend: Relation Node, Derived Node, Derived Link, Inherited Link]
102
102 Suggesting Key Attributes for a Query. Find the source relations for the isolated derived relation. Suggest the keys of the source relations as attributes to include.
103
103 Concept and Attribute Specification Interface
104
104 Query Constraint Specification
105
105 Action Specification
106
106 English-Like Query Description and the Formulated Query
107
107 Conclusions Semantic graph model provides a basis for query formulation search Ranking of query candidates by information measure in formulation provides adaptive behavior Incremental query formulation is effective for complex queries GUI and voice interface can be built for query formulation from high-level concepts