Oracle Optimizer. Combining Output From Multiple Index Scans AND-EQUAL: –select * from sailors where sname = 'Jim' and rating = 10 Suppose we have 2 indexes:


1 Oracle Optimizer

2 Combining Output From Multiple Index Scans
AND-EQUAL:
–select * from sailors where sname = 'Jim' and rating = 10
Suppose we have 2 indexes: sname, rating
TABLE ACCESS BY ROWID
  AND-EQUAL
    INDEX RANGE SCAN Sailors(sname)
    INDEX RANGE SCAN Sailors(rating)
Suppose we also have an index on (sname, rating)
–How should the query be performed?
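The AND-EQUAL step can be pictured as a merge of two sorted rowid lists, one per single-column index, keeping only the rowids that appear in both. The following is a minimal sketch of that idea, not Oracle's implementation; the sample rows, the `build_index` helper, and the use of small integers as rowids are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows keyed by rowid (not taken from the slides).
SAILORS = {
    1: {"sname": "Jim", "rating": 10},
    2: {"sname": "Jim", "rating": 7},
    3: {"sname": "Ann", "rating": 10},
    4: {"sname": "Jim", "rating": 10},
}


def build_index(table, column):
    """Map each column value to the sorted list of rowids that hold it."""
    index = defaultdict(list)
    for rowid in sorted(table):
        index[table[rowid][column]].append(rowid)
    return index


def and_equal(rowids_a, rowids_b):
    """Merge two sorted rowid lists, keeping rowids present in both."""
    i = j = 0
    out = []
    while i < len(rowids_a) and j < len(rowids_b):
        if rowids_a[i] == rowids_b[j]:
            out.append(rowids_a[i])
            i += 1
            j += 1
        elif rowids_a[i] < rowids_b[j]:
            i += 1
        else:
            j += 1
    return out


sname_index = build_index(SAILORS, "sname")
rating_index = build_index(SAILORS, "rating")

# Two index lookups, then AND-EQUAL, then table access by rowid.
matching = and_equal(sname_index["Jim"], rating_index[10])
rows = [SAILORS[rowid] for rowid in matching]
```

Only the rowids that survive the merge are fetched from the table, which is why the table access sits above AND-EQUAL in the plan.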

3 Operations that Manipulate Data Sets
Up until now, all operations returned the rows as they were found.
There are operations that must find all rows before returning a single row.
Try to avoid these operations for online users!
–SORT ORDER BY: query with order by
 select sname, age from Sailors order by age;

4 Operations that Manipulate Data Sets
–SORT UNIQUE: sorting records while eliminating duplicates
 e.g., query with distinct; query with minus, intersect or union
 select DISTINCT age from Sailors;
–SORT AGGREGATE, SORT GROUP BY: queries with aggregate or grouping functions (like MIN, MAX)

5 Is the table always accessed? What if there is no index?

6 Operations that Manipulate Data Sets Consider the query: –select sname from sailors union select bname from boats;

7 Operations that Manipulate Data Sets Consider the query: –select sname from sailors minus select bname from boats; How do you think that Oracle implements intersect? union all?

8 Operations that Manipulate Data Sets
Select age, COUNT(*) from Sailors GROUP BY age
SORT GROUP BY
  TABLE ACCESS FULL
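A sort-based GROUP BY can be sketched as: scan the whole table, sort the rows on the grouping column, then count each run of equal values. The sketch below is only an illustration of that idea under assumed sample data; `sort_group_by_count` is a hypothetical helper name, and nothing here claims to mirror Oracle's internals.

```python
from itertools import groupby


def sort_group_by_count(rows, column):
    """Return (value, count) pairs, computed by sorting and counting runs."""
    # The sort is the blocking step: no group can be emitted
    # until every input row has been read.
    ordered = sorted(rows, key=lambda row: row[column])
    return [
        (value, sum(1 for _ in run))
        for value, run in groupby(ordered, key=lambda row: row[column])
    ]


# Hypothetical rows standing in for a full scan of Sailors.
sailors = [
    {"sname": "Joe", "age": 33},
    {"sname": "Ann", "age": 25},
    {"sname": "Bob", "age": 33},
    {"sname": "Sue", "age": 41},
]
counts = sort_group_by_count(sailors, "age")
```

Because the sort must finish first, this is one of the operations that cannot return a first row quickly to an online user.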

9 Distinct What should Oracle do when processing the query (assuming that sid is the primary key): –select distinct sid from Sailors

10 Join Methods
Select * from Sailors, Reserves where Sailors.sid = Reserves.sid
Oracle can use an index on Sailors.sid or on Reserves.sid (note that both will not be used)
Join Methods: MERGE JOIN, NESTED LOOPS, HASH JOIN

11 Nested Loops Joins
Block nested loop join
NESTED LOOPS
  TABLE ACCESS FULL OF our_outer_table
  TABLE ACCESS FULL OF our_inner_table
Index nested loop join
NESTED LOOPS
  TABLE ACCESS FULL OF our_outer_table
  TABLE ACCESS BY ROWID OF our_inner_table
    INDEX RANGE SCAN OF inner_table_index
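The two plan shapes differ only in how the inner table is reached for each outer row. Here is a rough sketch under simplifying assumptions: the first function is a plain tuple-at-a-time nested loop (it ignores block buffering), and a Python dict stands in for the inner table's index. The sample rows and function names are hypothetical.

```python
def nested_loops_full(outer, inner, key):
    """For every outer row, scan the whole inner table."""
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                yield (o, i)


def build_index(table, key):
    """Map each key value to the positions of the rows holding it."""
    index = {}
    for position, row in enumerate(table):
        index.setdefault(row[key], []).append(position)
    return index


def index_nested_loops(outer, inner, inner_index, key):
    """For every outer row, probe the index, then fetch rows by position."""
    for o in outer:
        for position in inner_index.get(o[key], []):
            yield (o, inner[position])


# Hypothetical sample data.
sailors = [{"sid": 1}, {"sid": 2}, {"sid": 3}]
reserves = [{"sid": 1, "bid": 101}, {"sid": 1, "bid": 102}, {"sid": 3, "bid": 103}]

full = [(s["sid"], r["bid"]) for s, r in nested_loops_full(sailors, reserves, "sid")]
indexed = [
    (s["sid"], r["bid"])
    for s, r in index_nested_loops(sailors, reserves, build_index(reserves, "sid"), "sid")
]
```

Both functions are generators, so the first matching pair is available before the outer table has been fully read.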

12 When Are Nested Loops Joins Used? If tables are of unequal size If results should be returned online

13 Hash Join
//Partition R into k partitions
foreach tuple r in R do                  //flush when fills
    read r and add it to buffer page h(r_i)
foreach tuple s in S do                  //flush when fills
    read s and add it to buffer page h(s_j)
for l = 1..k
    //Build in-memory hash table for R_l using h2
    foreach tuple r in R_l do
        read r and insert into hash table with h2
    foreach tuple s in S_l do
        read s and probe table using h2
        output matching pairs
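The pseudocode on this slide can be sketched in Python as follows. This is an in-memory illustration only: the partitions are ordinary lists rather than buffer pages flushed to disk, `hash(v) % k` plays the role of h, and a Python dict plays the role of the second hash function h2. The function name and the sample rows are assumptions.

```python
def partitioned_hash_join(R, S, key, k=4):
    """Join R and S on `key` by partitioning both, then build and probe."""
    def h(value):
        return hash(value) % k

    # Partitioning phase: rows with equal keys land in the same partition number.
    r_parts = [[] for _ in range(k)]
    s_parts = [[] for _ in range(k)]
    for r in R:
        r_parts[h(r[key])].append(r)
    for s in S:
        s_parts[h(s[key])].append(s)

    # Build and probe phase, one partition pair at a time.
    out = []
    for l in range(k):
        table = {}
        for r in r_parts[l]:
            table.setdefault(r[key], []).append(r)  # build on R_l
        for s in s_parts[l]:
            for r in table.get(s[key], []):  # probe with S_l
                out.append((r, s))
    return out


# Hypothetical sample data.
sailors = [{"sid": 1}, {"sid": 2}, {"sid": 3}]
reserves = [{"sid": 1, "bid": 101}, {"sid": 1, "bid": 102}, {"sid": 3, "bid": 103}]
pairs = sorted((r["sid"], s["bid"]) for r, s in partitioned_hash_join(sailors, reserves, "sid"))
```

Only one partition's hash table needs to fit in memory at a time, which is the point of partitioning first.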

14 Hash Join Plan
HASH JOIN
  TABLE ACCESS FULL OF table_A
  TABLE ACCESS FULL OF table_B

15 When Are Hash Joins Used? If tables are small If results should be returned online

16 Sort-Merge Join Plan
MERGE JOIN
  SORT JOIN
    TABLE ACCESS FULL OF table_A
  SORT JOIN
    TABLE ACCESS FULL OF table_B
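The plan above sorts both inputs on the join column and then merges them. A minimal sketch of that merge, assuming both tables fit in memory and using hypothetical sample rows, might look like this:

```python
def sort_merge_join(A, B, key):
    """Sort both inputs on `key`, then merge, pairing rows with equal keys."""
    a = sorted(A, key=lambda row: row[key])  # SORT JOIN on table_A
    b = sorted(B, key=lambda row: row[key])  # SORT JOIN on table_B
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        ka, kb = a[i][key], b[j][key]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of rows in b that share this key value.
            j_end = j
            while j_end < len(b) and b[j_end][key] == ka:
                j_end += 1
            # Pair every row of a with this key against that whole run.
            while i < len(a) and a[i][key] == ka:
                for jj in range(j, j_end):
                    out.append((a[i], b[jj]))
                i += 1
            j = j_end
    return out


# Hypothetical sample data, deliberately unsorted.
sailors = [{"sid": 3}, {"sid": 1}, {"sid": 2}]
reserves = [{"sid": 3, "bid": 103}, {"sid": 1, "bid": 102}, {"sid": 1, "bid": 101}]
merged = sorted((s["sid"], r["bid"]) for s, r in sort_merge_join(sailors, reserves, "sid"))
```

Both sorts must complete before the merge can emit anything, so the cost of sorting the larger table is paid in full even when the other table is tiny.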

17 When Are Sort/Merge Joins Used? Performs badly when tables are of unequal size. Why?

18 Hints
You can give the optimizer hints about how to perform query evaluation.
Hints are written in /*+ */ right after the select.
Note: These are only hints. The Oracle optimizer can choose to ignore your hints.

19 Examples
Select /*+ FULL (sailors) */ sid From sailors Where sname='Joe';
Select /*+ INDEX (sailors) */ sid From sailors Where sname='Joe';
Select /*+ INDEX (S s_ind) */ S.sid From sailors S, reserves R Where S.sid=R.sid AND sname='Joe';
(When a table has an alias in the query, the hint must name the alias rather than the table.)

20 More Examples
Select /*+ USE_NL (S) */ S.sid From sailors S, reserves R Where S.sid=R.sid AND sname='Joe';
(The table named in USE_NL is used as the inner table.)
Select /*+ USE_MERGE (S R) */ S.sid From sailors S, reserves R Where S.sid=R.sid AND sname='Joe';
Select /*+ USE_HASH (S R) */ S.sid From sailors S, reserves R Where S.sid=R.sid AND sname='Joe';

21 Information Retrieval and DB

22 CONTAINS
Introduce text search in SQL
CONTAINS operator
select Name from article where CONTAINS(abstract, 'play') > 0;
Can combine OR, AND

23 Stemming
Given the “stem” of a word, Oracle will expand the list of words to search for to include all words having the same stem
–Stem of plays, played, playing, playful: play
–where CONTAINS(abstract, '$play') > 0;

24 Ranking
We need to rank the retrieved tuples according to their relevance
–Open challenge
–Several implementations for Oracle
The following slides are based on those of Dr. Sara Cohen

25 The Vector Space Model The Vector Space Model (VSM) is a way of representing text data through the words that they contain It is a standard technique in Information Retrieval In the following, we call this text data, document (classical IR) The VSM allows decisions to be made about which documents are similar to each other and to keyword queries

26 How Does it Work?
Each document is represented as a vector which contains a value for each word in the vocabulary
–this value is 0 if the word does not appear in the document
Similarly, a query is represented as a vector
The rank of the document with respect to the query is the distance between their vectors

27 Example: Boolean Value P 1 = “I live in a green house with a green roof” P 2 = “There is no life form on Mars” P 3 = “Men love green cars” P 4 = “I saw some little green men yesterday” 1 if the word appears, 0 otherwise

28 Example: Boolean Value P 1 = “I live in a green house with a green roof” P 2 = “There is no life form on Mars” P 3 = “Men love green cars” P 4 = “I saw some little green men yesterday” Vector for P 1

29 Example: Boolean Value Q = green OR men OR mars
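The Boolean vectors for the four example documents and the query can be built mechanically. The sketch below assumes simple lowercase whitespace tokenization and scores each document by the number of query terms it contains (the scalar product of two 0/1 vectors); the scoring shown on the original slide is not recoverable from this transcript, so that choice is an assumption.

```python
DOCS = {
    "P1": "I live in a green house with a green roof",
    "P2": "There is no life form on Mars",
    "P3": "Men love green cars",
    "P4": "I saw some little green men yesterday",
}


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


# The vocabulary fixes the order of the vector dimensions.
VOCAB = sorted({word for text in DOCS.values() for word in tokenize(text)})


def boolean_vector(text):
    """1 if the vocabulary word appears in the text, 0 otherwise."""
    words = set(tokenize(text))
    return [1 if term in words else 0 for term in VOCAB]


def scalar_product(u, v):
    return sum(a * b for a, b in zip(u, v))


doc_vectors = {name: boolean_vector(text) for name, text in DOCS.items()}
query_vector = boolean_vector("green men mars")
scores = {name: scalar_product(vec, query_vector) for name, vec in doc_vectors.items()}
```

With these assumptions P3 and P4 each match two query terms, while P1 and P2 match one.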

30 Distance Between Vectors
For two vectors d and d’, the cosine distance between d and d’ is given by: $\cos(d, d') = \frac{d \cdot d'}{|d|\,|d'|}$
d · d’ is the scalar product of d and d’, calculated by multiplying corresponding values together and summing the products
|d| is the norm of d
The “cosine measure” calculates the cosine of the angle between the vectors in a high-dimensional virtual space

31 Distance Between Documents
[Figure: document vectors d1 to d5 drawn against the term axes t1, t2, t3, with angles θ and φ marked between vectors]

32 Example
Consider the query Q = "green men" and the document P 3 = "Men love green cars"
(Only dimensions that are non-zero in one of the vectors are shown)
The cosine distance:
–scalar product: 1*0 + 1*1 + 1*0 + 1*1 = 2
–norms: $\sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2$ and $\sqrt{0^2 + 1^2 + 0^2 + 1^2} = \sqrt{2}$
–Similarity: $2/(2\sqrt{2}) = 1/\sqrt{2}$
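The arithmetic in this example can be checked with a few lines of Python. The dimension order (cars, green, love, men) is an assumption chosen to match the order of the terms in the scalar product above.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)


# Dimensions, in order: cars, green, love, men
p3 = [1, 1, 1, 1]  # "Men love green cars"
q = [0, 1, 0, 1]   # "green men"

similarity = cosine(p3, q)
```

The result agrees with the hand calculation, 1/√2, up to floating-point rounding.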

33 Defining Vector Values: TF
Instead of a boolean value, put the word frequency (called tf, for "term frequency")
What effect does this have?
Sometimes a normalized version is used:
–term frequency/number of words in the document

34 Normalized TF Always: Sum = 1

35 Another Option: Defining Vector Values as IDF
We can combine TF with IDF, inverse document frequency
–1/(number of documents containing the word)
What is the effect?

36 Normalized IDF
Sometimes a normalized version is used: $\log\left(\frac{\text{Number of documents}}{\text{Number of documents in which } w \text{ appears}}\right)$
The logarithm gives less influence to IDF when TF and IDF are combined
What is the value for a word that appears in all documents? Why?

37 Standard Measure is TF-IDF Use normalized TF times normalized IDF Note: Once the values are chosen (using any of the schemes considered), we use cosine distance to compare the document and query
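Putting the two pieces together, a TF-IDF weight for each word in each document might be computed as sketched below. The sketch assumes lowercase whitespace tokenization and the natural logarithm (the slides do not state a base), and reuses the four example documents from the Boolean example.

```python
import math
from collections import Counter

DOCS = {
    "P1": "I live in a green house with a green roof",
    "P2": "There is no life form on Mars",
    "P3": "Men love green cars",
    "P4": "I saw some little green men yesterday",
}


def tokenize(text):
    return text.lower().split()


def normalized_tf(text):
    """Term frequency divided by the number of words in the document."""
    words = tokenize(text)
    return {word: count / len(words) for word, count in Counter(words).items()}


def normalized_idf(docs):
    """log(number of documents / number of documents containing the word)."""
    doc_freq = Counter(word for text in docs.values() for word in set(tokenize(text)))
    return {word: math.log(len(docs) / df) for word, df in doc_freq.items()}


def tf_idf(docs):
    """Weight of each word in each document: normalized TF times normalized IDF."""
    idf = normalized_idf(docs)
    return {
        name: {word: tf * idf[word] for word, tf in normalized_tf(text).items()}
        for name, text in docs.items()
    }


weights = tf_idf(DOCS)
```

A word that appears in every document gets IDF log(1) = 0, so it contributes nothing to the similarity no matter how often it occurs.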

38 XML (Extensible Markup Language) and the Semi-Structured Data Model

39 Motivation We have seen that relational databases are very convenient to query. However: –There is a LOT of data not in relational databases!! Perhaps the most widely accessed database is the web, and it certainly isn’t a relational database.

40 Querying the Web The web can be queried using a search engine, however, we can’t ask questions like: –What is the lowest price for which a Jaguar is sold on the web? Problems: –There are no facilities for asking complex questions, such as aggregation of data

41 Understanding the Web In order to query the web, we must be able to understand it. 2 Computer Science Approaches: –Artificial Intelligence Approach –Database Approach

42 Database Approach “The web is unstructured and we will structure it” Sometimes problems that are very difficult can be solved easily by enforcing a standard Encourage the use of XML as a standard for data exchange on the web

43 Example XML Document
Element content: Jeff Cohen, 04-828-1345, 054-470-778, jeffc@cs.technion.ac.il; Irma Levy, 03-426-1142, irmal@yourmail.com
Labeled parts: Opening Tag, Attribute, Element, Closing Tag
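To make the labeled parts concrete, here is a small XML document built around the same contact data and read back with Python's standard library. The markup of the original slide did not survive in this transcript, so every tag name and the `friend` attribute (including its values) are hypothetical; only the text values come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are invented for illustration.
DOC = """<addresses>
  <person friend="yes">
    <name>Jeff Cohen</name>
    <tel>04-828-1345</tel>
    <tel>054-470-778</tel>
    <email>jeffc@cs.technion.ac.il</email>
  </person>
  <person friend="no">
    <name>Irma Levy</name>
    <tel>03-426-1142</tel>
    <email>irmal@yourmail.com</email>
  </person>
</addresses>"""

root = ET.fromstring(DOC)

# <person ...> is an opening tag, </person> its closing tag;
# everything between them is the content of one person element.
people = [
    {
        "name": person.findtext("name"),
        "friend": person.get("friend"),  # attribute on the opening tag
        "tels": [tel.text for tel in person.findall("tel")],
        "email": person.findtext("email"),
    }
    for person in root.findall("person")
]
```

Note that the two person elements need not have the same number of tel children, which is the kind of irregularity the semi-structured model is meant to allow.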

44 Very Unstructured XML The insured’s Corolla broke through the guard rail and plummeted into the ravine. The cause was determined to be faulty brakes. Amazingly there were no casualties.

45 XML Vs. HTML XML and HTML are brothers. They are both special cases of SGML. HTML has specific tag and attribute names. These are associated with a specific meaning XML can have any tag and attribute name. These are not associated with any meaning HTML is used to specify visual style XML is used to specify meaning

46 A Different Data Model
                 Relational        Semi-Structured
Abstract Model   Sets of tuples    Labeled Directed Graph
Concrete Model   Tables            XML Documents
Standard for     Storing Data      Data Exchange, Separating Content from Style

47 Data Exchange Problem: Many data sources, each of a different type (different vendor), with a different schema. –How can the data be combined and used together? –How can different companies collaborate on their data? –What format should be used to exchange the data?

48 Separating Content from Style Web sites develop over time Important to separate style from data in order to allow changes to the site structure and appearance Using XML, we can store data alone CSS separates style from data only in a limited way Using XSL, this data can be translated into HTML The data can be translated differently as the site develops

49 Write Once Use Everywhere
[Figure: the same XML Data is transformed by three XSL style sheets into WML (hand-held devices), HTML (web browser), and TEXT (Excel)]

50 Using XML
Querying and Searching XML: There are query languages and search engines that query XML and return XML. Examples: XPath, XQuery/SQL4X, Equix, XSEarch
Displaying XML: An XML document can have an associated style-sheet which specifies how the document should be translated to HTML. Examples: CSS, XSL

51 DTD: Document Type Descriptors Document Type Descriptors (DTDs) impose structure on an XML document There is some relationship between a DTD and a schema

