Advance Database Systems and Applications COMP 6521

Presentation transcript:

Advance Database Systems and Applications COMP 6521
Professor: Dr. Gosta Grahne
Lab Instructor: Ashkan Azarnik
Group 15: Aditya Dewal, Mohammad Iftekharul Hoque, Saleh Ahmed

PROJECT 1
Develop a program that sorts numbers in ascending order using Two-Phase Multiway Merge Sort (2PMMS) with a limit of 5 MB of virtual memory. External sorting is required when the data being sorted do not fit into the main memory of a computing device and must instead reside in slower external memory (usually a hard drive).

Our approach to solving the problem
External sorting typically uses a sort-merge technique. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted in ascending order with the quicksort algorithm, and written out to temporary files. In the merge phase, the sorted temporary files are combined by a multiway merge into a single larger file.
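The two phases can be sketched as follows. This is a minimal illustration, not the project's code: the class and method names are hypothetical, runs are kept in memory instead of temporary files, and `Arrays.sort` stands in for the quicksort step. `chunkSize` is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the two phases. Runs stay in memory here,
// whereas the project writes each run to a temporary file.
class TwoPhaseSortSketch {

    // Phase 1: cut the input into chunks of at most chunkSize values
    // and sort each chunk on its own.
    static List<int[]> makeSortedRuns(int[] input, int chunkSize) {
        List<int[]> runs = new ArrayList<>();
        for (int start = 0; start < input.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, input.length);
            int[] run = Arrays.copyOfRange(input, start, end);
            Arrays.sort(run); // stand-in for the quicksort step
            runs.add(run);
        }
        return runs;
    }

    // Phase 2: multiway merge. The heap holds {runIndex, position} pairs,
    // ordered by the value each pair currently points at.
    static int[] mergeRuns(List<int[]> runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run.length;
        }
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Integer.compare(runs.get(a[0])[a[1]], runs.get(b[0])[b[1]]));
        for (int i = 0; i < runs.size(); i++) {
            if (runs.get(i).length > 0) {
                heap.add(new int[] {i, 0});
            }
        }
        int[] out = new int[total];
        int k = 0;
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            int[] run = runs.get(head[0]);
            out[k++] = run[head[1]];
            // Advance within the run the smallest value came from.
            if (head[1] + 1 < run.length) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }
}
```

In a file-based version, each heap entry would instead track a reader over one temporary file, so only one buffered block per run needs to be in memory at a time.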

Challenges
Which algorithm to choose? Quicksort is one of the fastest and simplest sorting algorithms because its inner loop can be implemented efficiently on most architectures. Its average case is efficient compared to other sorting algorithms: the average-case complexity of quicksort is O(n log n).
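For illustration, a compact in-place quicksort might look like the sketch below. It is one common variant (middle-element pivot, two inward scans), chosen here as an assumption and not taken from the project; the class name is hypothetical. The two inner `while` scans are the tight loops the slide refers to. Note that the worst case of quicksort is O(n^2), even though the average case is O(n log n).

```java
// Hypothetical sketch of an in-place quicksort with a middle-element pivot.
class QuickSortSketch {

    static void quickSort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    private static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one element: already sorted
        }
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++; // scan right for an element >= pivot
            while (a[j] > pivot) j--; // scan left for an element <= pivot
            if (i <= j) {
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                i++;
                j--;
            }
        }
        quickSort(a, lo, j); // left part
        quickSort(a, i, hi); // right part
    }
}
```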

List of Data Structures
Primitive Types: Boolean, Integer, Long
Abstract Types: Array, String
Arrays (Linear Data Structure): Integer Array, Boolean Array, Long Array
I/O: Scanner, PrintWriter

Buffer Size Experiments

Conclusion
After our buffer size experiments, we concluded that a buffer of 160,000 numbers, occupying 2.5 MB of memory, gives the best execution time for us.

Results from Demo
Our program took 3 minutes to run during the demo. The long running time was caused by the way our program read its input and wrote its output.

Project 2: Mining Frequent Itemsets from Secondary Memory
Build an application that computes the frequent itemsets of all sizes (pairs, triples, quadruples, etc.) from a set of transactions, based on an input support threshold percentage.

Algorithms Considered
Apriori: horizontal data layout, breadth-first traversal
Eclat: vertical data layout, depth-first traversal

ECLAT
Better execution time: execution time is better than Apriori.
Memory efficient: requires less memory than Apriori if the itemsets are small in number.
Depth-first search: explore the unexplored.

ECLAT Algorithm
For each item, store a list of the ids of the transactions (tids) that contain it. This list is the item's TID-list.
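As a rough sketch of this vertical layout, the code below turns a horizontal list of transactions into one TID-list per item. The class and method names are hypothetical, items are assumed to be strings, and the position of a transaction in the input list is used as its tid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: convert horizontal transactions into per-item TID-lists.
class TidListSketch {

    // transactions.get(t) holds the items of transaction t; t serves as the tid.
    static Map<String, List<Integer>> buildTidLists(List<List<String>> transactions) {
        Map<String, List<Integer>> tidLists = new TreeMap<>();
        for (int tid = 0; tid < transactions.size(); tid++) {
            for (String item : transactions.get(tid)) {
                List<Integer> tids = tidLists.computeIfAbsent(item, k -> new ArrayList<>());
                // Record each tid once, even if the item repeats in a transaction.
                if (tids.isEmpty() || tids.get(tids.size() - 1) != tid) {
                    tids.add(tid);
                }
            }
        }
        return tidLists;
    }
}
```

Because tids are appended in increasing order, every TID-list comes out sorted, and the length of a list is the support of that single item.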

ECLAT Algorithm
Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets.
3 traversal approaches: top-down, bottom-up, hybrid
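The intersection step can be sketched as a linear merge of two sorted tid-lists. The class and method names are hypothetical, and both inputs are assumed to be sorted in ascending order. The size of the result is the support of the combined itemset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: intersect two ascending tid-lists.
class TidIntersectSketch {

    static List<Integer> intersect(List<Integer> x, List<Integer> y) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < x.size() && j < y.size()) {
            int a = x.get(i);
            int b = y.get(j);
            if (a == b) {        // tid contains both itemsets
                out.add(a);
                i++;
                j++;
            } else if (a < b) {  // advance the list with the smaller tid
                i++;
            } else {
                j++;
            }
        }
        return out;
    }
}
```

For example, if the tid-list of {a, b} is [0, 2, 5] and that of {a, c} is [2, 3, 5], the tid-list of {a, b, c} is [2, 5], so its support is 2.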

ECLAT Algorithm

ECLAT Implementation: List of Data Structures
Primitive Types: Boolean, Integer, Double
Abstract Types: Map, Set, List, Array, String
Arrays (Linear Data Struc.): Hash Map (Hash Table), Hash Set (Hash Map), Array List (Dynamic Array), Bit Set (Bit Array), String Array
Trees: Search Tree

ECLAT Implementation
Our implementation represents the set of transactions as bit sets, one row per item, and intersects rows to determine the support of itemsets. The search follows a depth-first traversal of a prefix tree, as shown in Figure 1.
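A minimal sketch of this idea using `java.util.BitSet` is given below. It is an illustration under assumptions, not the project's code: the class and method names are hypothetical, items are numbered 0..n-1, and `minSupport` is an absolute count. Each recursive call extends the current prefix by one item, which corresponds to descending one level in the prefix tree.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Eclat over bit-set rows with a depth-first search.
class BitSetEclatSketch {

    // rows.get(i) has bit t set when transaction t contains item i.
    // Returns each frequent itemset together with its support count.
    static Map<List<Integer>, Integer> mine(List<BitSet> rows, int minSupport) {
        Map<List<Integer>, Integer> frequent = new LinkedHashMap<>();
        extend(new ArrayList<>(), null, 0, rows, minSupport, frequent);
        return frequent;
    }

    private static void extend(List<Integer> prefix, BitSet prefixTids, int nextItem,
                               List<BitSet> rows, int minSupport,
                               Map<List<Integer>, Integer> frequent) {
        for (int item = nextItem; item < rows.size(); item++) {
            // Intersect the prefix's transactions with this item's row.
            BitSet tids = (BitSet) rows.get(item).clone();
            if (prefixTids != null) {
                tids.and(prefixTids);
            }
            int support = tids.cardinality();
            if (support < minSupport) {
                continue; // prune: no superset of this itemset can be frequent
            }
            List<Integer> itemset = new ArrayList<>(prefix);
            itemset.add(item);
            frequent.put(itemset, support);
            // Depth first: extend this itemset before trying its siblings.
            extend(itemset, tids, item + 1, rows, minSupport, frequent);
        }
    }
}
```

Only the bit sets along the current path of the search are alive at any moment, which is one reason a depth-first traversal can keep memory use low.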

ECLAT Implementation
Divide and Conquer Phase: Divide the file into N partitions. If an item is frequent in one partition, we do not check it again.
Merge Phase: Suppose an item is not frequent in any partition but is frequent globally; it will surface when we merge. In the merge phase we run the algorithm again with the infrequent items.

ECLAT Implementation
File size = 10000, Threshold = 2%
An item is frequent if it occurs >= 200 times.
We get intermediate results by checking all the partitions. In the merge phase we work with the infrequent items of each partition, and then merge the results to get the final output list of frequent items.
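The arithmetic behind the 200-occurrence cutoff, and one way per-partition counts could be combined, can be sketched as follows. This is an assumption-laden illustration and not the project's merge procedure: the class and method names are hypothetical, "file size" is read as the number of transactions, and the merge simply sums each item's counts over all partitions before applying the global cutoff.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: global support cutoff and merging of partition counts.
class PartitionMergeSketch {

    // Smallest occurrence count that meets the percentage threshold.
    static int minSupportCount(int transactionCount, double thresholdPercent) {
        return (int) Math.ceil(transactionCount * thresholdPercent / 100.0);
    }

    // Sum per-partition counts, then keep only items that reach minCount overall.
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> partitionCounts,
                                            int minCount) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> counts : partitionCounts) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        totals.values().removeIf(c -> c < minCount);
        return totals;
    }
}
```

With 10000 transactions and a 2% threshold, `minSupportCount(10000, 2.0)` gives 200. An item counted 90 times in one partition and 130 times in another totals 220 and is kept after the merge, even though neither partition count reaches 200 on its own.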

Eclat Execution Time
Execution time of Eclat for Small and Medium datasets:

Eclat vs. Apriori
We compared the execution times of Apriori and Eclat for Small and Medium datasets and found the following:

Benefits of Divide and Conquer
The program can process large files.
It gives better performance.

Results from Demo
Execution time was 35 seconds.

REFERENCES
Project 1
Database Systems: The Complete Book, by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom
http://en.wikipedia.org/wiki/Quicksort
Project 2
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=846291&userType=inst
http://www.ece.northwestern.edu/~yingliu/papers/para_arm_cluster.pdf
http://ceur-ws.org/Vol-90/borgelt.pdf
http://www.isca.in/COM_IT_SCI/Archive/v1i1/2.ISCA-RJCITS-2013-001.pdf
http://www.intsci.ac.cn/shizz/fimi.pdf