1
Getting Started Writing a Thesis/Dissertation
Dr. Karen C. Davis Electrical & Computer Engineering Dept.
2
Graduation
3
ECES 877 Advanced Data Models and Query Optimization
logical, physical
advanced data models: object-relational, data warehouse, XML
Spring 2007: coming to a classroom near you!
4
Relational Algebra Query Trees
Sujan Turlapaty’s thesis defense: Performance Analysis of Self-Maintainable Data Warehousing Algorithms, 11/99
5
Multiple View Processing Plan (MVPP)
Figure: MVPP for queries Q1, Q2, and Q3 over the base relations Customer (C), Orders (O), Lineitem (L), Nation (N), and Part (P).
v1: π name, address, phone, acctbal, nationkey, custkey, mktsegment
v2: π orderkey, orderdate, custkey, shippriority
v3: π partkey, orderkey, shipdate, extendedprice
v4: π nationkey, name
v5: π partkey, type
v6: ⋈ custkey
v7: ⋈ orderkey
v8: σ C.mktsegment = “building” and L.shipdate = “ ”
v9: π O.orderkey, O.shippriority
v10: ⋈ nationkey
v11: σ O.orderdate = “ ”
v12: π C.custkey, C.name, C.acctbal, N.name, C.address, C.phone
v13: ⋈ partkey
v14: σ L.shipdate = “ ”
v15: π P.type, L.extendedprice
view chromosome:
index chromosome:
Fitness: sum of query processing costs of individual queries using the views and indexes selected
Notes: This is a typical MVPP. We saw this figure previously to understand the huge solution space. It is built by merging the query trees of Q1, Q2, and Q3. A typical view selection solution, a view chromosome, looks like this: a 1 indicates that the view is selected and a 0 that it is not. Similarly, since indexes can be built on selection and join attributes, if the number of indexes is 7, the index chromosome looks like this. Talk about the correction to the algorithm of moving selections.
Thesis defense of Sirisha Machiraju: Space Allocation for Materialized Views and Indexes Using Genetic Algorithms, June 2002
6
BH System Architecture
Michael Brant, Binding Hash Technique for XML Query Optimization, 2006
7
My Students
Ph.D. (2)
Satish Venkatesan, Database Modeling for Electronic Design Automation Environments, 1996 (awarded ECECS Outstanding Dissertation Award, 1996).
Yunsong Zhan, XML-based Data Integration for Application Interoperability, 2002.
M.S. (24)
Lun Ye, A Compiler Cooperative Dynamic Memory Management System for C++, 1993.
Ron Meade, EasyOpt: A Design Optimization Interface Package, 1994.
Rao Seshagiri Kasinadhuni, Design and Performance Issues of Client-Server DBMS Architectures, 1994.
Samir Nigam, Transformation-based Semantic Query Optimization for Object-Oriented Databases, 1994.
Baskaran Dharmarajan, The Property Map: A Theoretical Foundation and Query Optimization Algorithms, 1997.
Mala Rajamani, Reduction and Maintenance of Self-maintainable Views for Data Warehousing, 1997.
Veena Pandiri, A Global Framework for Distributed Agent-based Systems, 1997.
Radha Ganapathy, Selection of Self-Maintainable Views to Materialize in a Data Warehouse, 1998.
Vishal Sheth, Extended Property Maps: An Efficient Access Mechanism for Retrieval from Large Data Sets, 1998.
Gayathri Krishnan, Physical Schema Design for Object Databases, 1998.
Shobha Ravishankar, Object-Oriented Index Selection and Integration, 1998.
Ji Qin, Access Plan Generation for Property Maps and Multidimensional Indexes, 1999.
Sujan Turlapaty, Performance Analysis of Self-Maintainable Data Warehousing Algorithms, 1999.
Unmi Tina Kang, Path Inherited Dictionary Index (PIDI): An Integrated Object-Oriented Database Index, 2000.
Jennifer Grommon-Litton, Heuristic Design Algorithms and Evaluation Methods for Property Maps, 2000.
Rajeswari Malladi, Applying Multiple Query Optimization in Mobile Databases, 2001.
Xiaoming Du, Dynamic Channel and Broadcast Disk Organization in Mobile Databases, 2001.
Krishnamoorthy Janakiraman, Entity Identification Using Data Mining Techniques, 2001.
Casie Phipps, Migrating an Operational Database Schema to Data Warehouse Schemas, 2002.
Ashima Gupta, Performance Comparison of Property Map Indexing and Bitmap Indexing for Data Warehousing, 2002.
Sirisha Machiraju, Space Allocation for Materialized Views and Indexes Using Genetic Algorithms, 2002.
Ravi Darira, A Design Framework for Property Maps, 2006.
Michael Brant, Binding Hash Technique for XML Query Optimization, 2006.
Janet Rajan, A Framework for Medical Acronym Disambiguation, 2007.
8
Thesis/Dissertation Organization
title page, abstract, dedication, table of contents, list of figures, list of tables
Introduction
Related Research
Foundations
Results (may be several chapters)
Conclusions and Future Work
Appendices
9
Sample Table of Contents
10
Introduction
introduce the general topic area
narrow the focus to a specific topic
motivate the research
  why is it needed?
  who will benefit from the research?
conclude with a clear statement of the problem
give a statement of the work
provide an overview of the thesis (one sentence per chapter)
11
Sample Introduction
conventional database systems are increasingly leveraged for organizational decision-making
analysis systems are different from conventional operational systems because …
… because of these differences, designing a data warehouse has challenges …
this thesis addresses specific phases of design
12
Research Objectives
general research objective: one sentence describing what you hope to accomplish (not how!)
13
Parallel Sections: Statement of the Work
specific research objectives: partition the general objective into sub-goals
research plan/methodology/tasks/approach: revisit the objectives
  your approach to solving the problem
  each objective has an associated task or approach to satisfy the objective
expected contributions: revisit the methodology
  what will you know or have when you’ve done the task?
  potential impact of your work
14
Sample Parallel Sections
specific research objectives: accomplish the general research objective
methodology: defines approach to accomplishing the objectives
expected contributions: describe what will be accomplished by executing the methodology
15
Related Research
focused around your topic; not a tutorial!
compare/contrast to your approach
tables with features/research efforts are a concise, readable way to summarize
16
Examples of Summary Tables
17
Foundations
work you build on (your own or someone else’s)
definitions, theorems, models, system
18
Research
Discuss conventions, setup, hypotheses of experiments, proofs
  why did you do it?
  what did you learn from it?
Presenting: figures, algorithms, tables, graphs
Sample! Don’t do a dump of everything … put everything in appendices and discuss representative results in the body of the thesis or dissertation
19
Example Experiment Setup
Goals
What are the comparative storage and retrieval costs of REBSI and PMaps in different scenarios?
How is individual and relative performance affected by parameters such as block size, database size, selectivity of queries, cardinality of attributes, kind of queries, and property ordering?
Can PMap design and performance be improved using this knowledge?
Under what conditions is it better to use each index?
Figure: experiment setup
Inputs: Query Set [1…6]; Word Size (ws) {16, 32}; Tuple size (t) {1,000,000, 50,000}; Blocksize (SB) {2048, 4096, 8192}; Scaling Factor (sf) {min, …, 10}
Components: PMap Creator; PMap storage and performance measurement simulator; REBSI storage and performance measurement simulator
Outputs: PMap properties (pu, pstring); PMap Performance and Storage Cost; REBSI Performance and Storage Cost
Notes: The query set fixes the dimensionality of the index, d, and the cardinality, c; there are 2 values of ws (word size). Creation: the inputs to the PMap creation algorithms determine the PMap properties and pu (pstring utilization). Query processing: pfilterl, pfilterh, Sq, and Sqt for each of the queries.
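As a rough illustration of how the parameter sets in this setup combine, the following Python sketch enumerates every combination of word size, tuple size, and block size. Whether the original experiments ran the full cross product is an assumption. The scaling factor is left out because its range is elided on the slide, and the dictionary keys simply reuse the slide's abbreviations.

```python
from itertools import product

# Parameter values as listed on the slide.
WORD_SIZES = (16, 32)                 # ws
TUPLE_SIZES = (1_000_000, 50_000)     # t
BLOCK_SIZES = (2048, 4096, 8192)      # SB


def pmap_configurations():
    """Return one dict per combination of ws, t, and SB (assumed full cross product)."""
    return [
        {"ws": ws, "t": t, "SB": sb}
        for ws, t, sb in product(WORD_SIZES, TUPLE_SIZES, BLOCK_SIZES)
    ]


configs = pmap_configurations()
print(len(configs))    # number of simulator runs per query under this assumption
for config in configs[:3]:
    print(config)
```

Listing the configurations explicitly like this makes it easy to state in the thesis how many runs stand behind each reported result.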
20
Example Presentation of Results
Observations:
REBSI performance improves as the sf becomes larger.
REBSI performance improves as cardinality becomes smaller.
PAvg performance deteriorates as cardinality becomes smaller.
PAvg is better than REBSI min_sf (4) for all queries.
PMin << pages retrieved by any REBSI.
PMap retrieves fewer pages for multi-attribute queries than single attribute queries.
Queries are ordered by the difference of REBSI with min_sf and PAvg.

number figures (e.g., Figure 3.2)
refer to the figures in text
  “In Figure 3.2, results for the HCAQS are shown.”
  “Figure 3.2 shows HCAQS results.”
explain the conventions
  “The x-axis shows individual queries and the y-axis shows index pages retrieved. The queries are ordered by decreasing cardinality.”
offer observations to help the reader see what is important or interesting
  “REBSI performance improves as the cardinality decreases.”
discuss possible reasons for the observed results
give general conclusions

Figure notes: x-axis: queries; y-axis: index pages. A low number of pages retrieved for a query indicates good performance. The minimum (PMin), maximum (PMax), and average (PAvg) numbers of index pages are shown. Queries are ordered in decreasing order of the difference between PAvg and the pages retrieved by REBSI with the min_sf.
21
Conclusions and Future Work
revisit objectives
  what was accomplished?
  what was learned?
topics for future work
  extensions
  open questions
22
Conclusions:
BH method works well for deeply nested queries with few branches (non-bushy)
BH indexing technique requires further optimization
BindingCollection is a flexible data structure
  Can be used to generate witness trees for processing embedded XPath expressions
  Can be used to process XPath expressions directly
  Can use different indexing schemes
Future Work:
Modify indexing technique to increase performance and perform inequality matching
Expand Post-order Traversal to support more TAX pattern tree features, e.g., value-based joins
Conduct a more extensive performance study
23
Citations
allow the reader to follow up on the topic
fill in background information
judge what you’ve said by reading original sources
relieve you of the burden of going over all territory on a subject
strengthen/justify your point
respect your peers by acknowledging their contributions
[vL78]
24
Citations
not a part of speech!
#11: never, ever, use a bracketed number as if it were the name of an author or a work [vL78]:
BAD: “In [23], algorithms are presented …”
GOOD: “Jones presents algorithms … [23].”
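The rule above can be checked mechanically, at least roughly. The following Python sketch is a heuristic, not part of the original talk or of [vL78]: it flags a bracketed number that opens a sentence or directly follows one of a few prepositions. The preposition list and the function name are assumptions made for illustration, and the check will miss some cases and may misfire on others.

```python
import re

# Heuristic pattern (an assumption, not an exhaustive rule):
# a bracketed number used where an author or work name would go.
_BRACKET_AS_NOUN = re.compile(
    r"(?:^|(?<=[.!?]\s))\[\d+\]"        # sentence starts with "[23]"
    r"|\b(?:in|by|from)\s+\[\d+\]",     # preposition followed by "[23]"
    re.IGNORECASE,
)


def flags_citation_as_noun(sentence: str) -> bool:
    """Return True if the sentence appears to use "[n]" as a part of speech."""
    return bool(_BRACKET_AS_NOUN.search(sentence))


for text in (
    "In [23], algorithms are presented …",
    "Jones presents algorithms … [23].",
):
    verdict = "BAD" if flags_citation_as_noun(text) else "ok"
    print(f"{verdict}: {text}")
```

A check like this is best run as a first pass before proofreading, since it only recognizes the most common patterns.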
25
Writing Style
avoid vague words (e.g., “deals with,” “handles”)
avoid contractions
be consistent in spelling, punctuation, capitalization style
use the same grammatical style for items in a list
develop flow/transitions between paragraphs, sections, chapters
avoid empty sections
merge/eliminate single-item sublists or subsections
place punctuation inside quotes
avoid second person (“you …”)
try to write in only one verb tense, preferably the present tense
use “including …” instead of “etc.”
use “such as” instead of “like”
put math in definitions, theorems, proofs; explain in English to build the reader’s intuition
use “:” instead of “-” in technical writing
space after “)” and “:” (not before!)
use “that” instead of “which” when not counting things
26
References
[s99] Strunk, The Elements of Style, New York: bartleby.com, 1999.
[vL78] van Leunen, M.-C., A Handbook for Scholars, Alfred A. Knopf, 1978.
27
Current Research Work
Sandipto Banerjee, Ph.D.
Bartley Richardson, Ph.D.
Lydia Fitzgerald, M.S.
Bill Nicholson, M.S./Ph.D.