Introduction to Data Structures Vamshi Ambati


Overview
- Java you need for the Project
- Search Engine and Data Structures
- THIS Code Structure
- On the Data Structure front
  - Dictionaries (Dictionary Structures)
  - Java Collections
  - Linked List
  - Queue
[c] Vamshi Ambati

Java you will need for the Project
- Core Programming + I/O and Files
- OOPS
  - Inheritance
  - Packages
  - Encapsulation
- Java API
  - Collections

What is a Search Engine?
- A sophisticated tool for finding information on the web
- An index for the World Wide Web
  - Analogous to the index of a textbook
  - Just imagine a world without search engines!

Why Index in the first place?
- Which list is easier to search?
  - sow fox pig eel yak hen ant cat dog hog
  - ant cat dog eel fox hen hog pig sow yak
- A sorted list always helps
  - It permits binary search: about log2(n) probes into the list
  - log2(1 billion) ≈ 30
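The claim about probes can be checked with a small hand-coded binary search over the sorted list from the slide. This is only a sketch: the class name BinarySearchSketch and its two methods are made up for illustration and are not part of the project framework.

```java
// Hand-coded binary search over a sorted String array; no java.util needed.
final class BinarySearchSketch {

    /** Returns the index of target in sorted, or -1 if it is absent. */
    static int indexOf(String[] sorted, String target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids int overflow on huge arrays
            int cmp = target.compareTo(sorted[mid]);
            if (cmp == 0) {
                return mid;                 // found
            } else if (cmp < 0) {
                hi = mid - 1;               // target can only be in the left half
            } else {
                lo = mid + 1;               // target can only be in the right half
            }
        }
        return -1;
    }

    /** Worst-case number of probes for a list of n items: floor(log2 n) + 1. */
    static int maxProbes(long n) {
        int probes = 0;
        while (n > 0) {
            n /= 2;      // each probe halves the remaining range
            probes++;
        }
        return probes;
    }

    public static void main(String[] args) {
        String[] sorted = {"ant", "cat", "dog", "eel", "fox", "hen", "hog", "pig", "sow", "yak"};
        System.out.println("hog is at index " + indexOf(sorted, "hog"));
        System.out.println("worst case for 10 items: " + maxProbes(10) + " probes");
        System.out.println("worst case for 1 billion items: " + maxProbes(1_000_000_000L) + " probes");
    }
}
```

The unsorted list offers no such shortcut: finding "hog" there means scanning entries one by one.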

How search engines work
- A search engine maintains data about web sites in its database.
- It uses programs (often referred to as "spiders" or "robots") to collect information.
- The information is then indexed by the search engine.
- Users can then look up words, or combinations of words, found in the index.

Inverted Files
FILE: "A file is a list of words and this file contains words at various positions. Each entry of the word is associated with a position."
INVERTED FILE (each word, followed by its positions in the FILE, counting words from 1):
  a          (1, 4, 24, …)
  entry      (17, …)
  file       (2, 10)
  contains   (11, …)
  position   (25, …)
  positions  (15, …)
  word       (20, …)
  words      (7, 12, …)
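A minimal sketch of how such an inverted file could be built, assuming that positions count words from 1 and that case and punctuation are ignored. The class InvertedFileSketch is hypothetical, and it uses java.util maps and lists purely for brevity; the project itself expects hand-coded structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InvertedFileSketch {

    /** Maps each lower-cased word to the 1-based positions at which it occurs. */
    static Map<String, List<Integer>> invert(String text) {
        Map<String, List<Integer>> inverted = new TreeMap<>();   // TreeMap keeps words sorted
        int position = 0;
        // Split on anything that is not a letter or digit, so "positions." becomes "positions".
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (token.isEmpty()) {
                continue;   // a leading separator yields one empty token
            }
            position++;
            inverted.computeIfAbsent(token.toLowerCase(), k -> new ArrayList<>()).add(position);
        }
        return inverted;
    }

    public static void main(String[] args) {
        String file = "A file is a list of words and this file contains words at various positions. "
                + "Each entry of the word is associated with a position.";
        // Print one line per word: the word, then its list of positions.
        for (Map.Entry<String, List<Integer>> entry : invert(file).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Looking up a word is then a single map access, instead of a scan over the whole file.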

Inverted Files for Multiple Documents
[Figure: LEXICON and WORD INDEX; each WORD INDEX entry has the fields DOCID, OCCUR, POS 1, POS …]
“jezebel” occurs 6 times in document 34, 3 times in document 44, and 4 times in a third document.
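One way to picture the DOCID / OCCUR / POS fields is as a per-document entry stored under each word of the lexicon. The sketch below is an assumption about that layout, not the structure used in the project; the names MultiDocIndexSketch and Posting are invented, and the document texts are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MultiDocIndexSketch {

    /** One word-index entry: a document id plus the positions of the word in it. */
    static final class Posting {
        final int docId;                                   // DOCID
        final List<Integer> positions = new ArrayList<>(); // POS 1, POS 2, ...

        Posting(int docId) {
            this.docId = docId;
        }

        int occurrences() {                                // OCCUR
            return positions.size();
        }
    }

    // Lexicon: word -> (docId -> posting for that document).
    private final Map<String, Map<Integer, Posting>> lexicon = new TreeMap<>();

    /** Records every word of text under the given document id. */
    void addDocument(int docId, String text) {
        int position = 0;
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            position++;
            lexicon.computeIfAbsent(token.toLowerCase(), w -> new TreeMap<>())
                    .computeIfAbsent(docId, id -> new Posting(id))
                    .positions.add(position);
        }
    }

    /** Number of times word occurs in the given document (0 if it never does). */
    int occurrences(String word, int docId) {
        return positions(word, docId).size();
    }

    /** 1-based positions of word in the given document (empty if it never occurs). */
    List<Integer> positions(String word, int docId) {
        Map<Integer, Posting> postings = lexicon.get(word.toLowerCase());
        if (postings == null || !postings.containsKey(docId)) {
            return Collections.emptyList();
        }
        return postings.get(docId).positions;
    }

    public static void main(String[] args) {
        MultiDocIndexSketch index = new MultiDocIndexSketch();
        index.addDocument(34, "Jezebel spoke, and Jezebel laughed.");   // made-up text
        index.addDocument(44, "Nobody answered Jezebel.");              // made-up text
        System.out.println("doc 34: " + index.occurrences("jezebel", 34)
                + " occurrences at " + index.positions("jezebel", 34));
        System.out.println("doc 44: " + index.occurrences("jezebel", 44)
                + " occurrences at " + index.positions("jezebel", 44));
    }
}
```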

A comprehensive form of Inverted Index

THIS
- Search engine for the website of the newspaper The Hindu
- Not for the entire web
- Results are confined to only one web site

Index Structure for our Project (THIS)
[Figure: each keyword in the index (India, ManMohan, Cricket, Bollywo, Sharukh, Sachin, …) points to a list of entries of the form "URL :: number", for example a page URL ending in "&date=2004/09/15/&prd=bl" with ":: 4" and one ending in "&date=2002/10/27/&prd=mag" with ":: 7".]

Search Engines

Search Engine Differences
- Coverage (what part of the web do they really cover?)
- Crawling algorithms
  - Frequency of crawl
  - Depth of visits
    - Depth-0
    - Depth-1
- Indexing policies
  - Data structures
  - Representation
- Search interfaces
- Ranking
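The depth of a crawl can be illustrated with a breadth-first walk over a link graph. The sketch assumes that Depth-0 means "visit only the starting page" and Depth-1 means "also visit the pages it links to"; the slide does not spell this out. The link graph is an in-memory map standing in for real pages, the class name is hypothetical, and java.util is used only to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class DepthLimitedCrawlSketch {

    /** Visits pages breadth-first from seed, following links at most maxDepth hops away. */
    static List<String> crawl(Map<String, List<String>> links, String seed, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();           // URLs already queued, to avoid loops
        Map<String, Integer> depthOf = new HashMap<>();
        Queue<String> urls = new ArrayDeque<>();      // first in, first out

        urls.add(seed);
        seen.add(seed);
        depthOf.put(seed, 0);

        while (!urls.isEmpty()) {
            String url = urls.remove();
            visited.add(url);
            int depth = depthOf.get(url);
            if (depth == maxDepth) {
                continue;                             // do not follow links any deeper
            }
            for (String next : links.getOrDefault(url, Collections.emptyList())) {
                if (seen.add(next)) {                 // add() is false if already seen
                    depthOf.put(next, depth + 1);
                    urls.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up site: page name -> pages it links to.
        Map<String, List<String>> links = new HashMap<>();
        links.put("home", Arrays.asList("news", "sport"));
        links.put("news", Arrays.asList("story", "home"));
        links.put("story", Arrays.asList("home"));

        System.out.println("depth 0: " + crawl(links, "home", 0));
        System.out.println("depth 1: " + crawl(links, "home", 1));
        System.out.println("depth 2: " + crawl(links, "home", 2));
    }
}
```

The queue gives the breadth-first order, and the set of seen URLs stops the crawler from revisiting a page that links back to one already queued.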

Search Engine

[Figure: search engine components: Crawl, Index, Search]

[Figure: search engine architecture. Components: Index, Query, ResultSet, FinalResult, ResultPage, TheWeb, Spider, Parser, URLList, Indexer. Labeled operations: retrieve, Sort by Rank, makePage, crawl, parse, getNextUrl, addUrls, addPage, store.]

Where do our data structures and algorithms fit in?
[Figure: the search engine architecture diagram (Index, Query, ResultSet, FinalResult, ResultPage, TheWeb, Spider, Parser, URLList, Indexer), annotated with: Queue, Priority Queue, Hashtable, BinaryTree, LinkedList, MergeSort and InsertionSort]
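As one example of where an algorithm plugs in, the "Sort by Rank" step could be served by a hand-coded insertion sort. This is a sketch only: the Result class, its fields, and the idea that a higher rank number means a better hit are all assumptions made for illustration.

```java
// Hand-coded insertion sort; deliberately avoids java.util sorting.
final class RankSortSketch {

    /** A hypothetical search hit: a URL plus the rank score it was given. */
    static final class Result {
        final String url;
        final int rank;

        Result(String url, int rank) {
            this.url = url;
            this.rank = rank;
        }
    }

    /** Sorts results in place, highest rank first, using insertion sort. */
    static void sortByRank(Result[] results) {
        for (int i = 1; i < results.length; i++) {
            Result current = results[i];
            int j = i - 1;
            // Shift lower-ranked results one slot right to make room for current.
            while (j >= 0 && results[j].rank < current.rank) {
                results[j + 1] = results[j];
                j--;
            }
            results[j + 1] = current;
        }
    }

    public static void main(String[] args) {
        Result[] resultSet = {
            new Result("page-a", 4),   // made-up URLs and ranks
            new Result("page-b", 7),
            new Result("page-c", 3),
        };
        sortByRank(resultSet);
        for (Result r : resultSet) {
            System.out.println(r.url + " :: " + r.rank);
        }
    }
}
```

Insertion sort is simple and works well on short or nearly sorted result lists; for long lists, the MergeSort named in the diagram scales better.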

Code Structure (THIS)
[Figure: class diagram of THIS. Classes: PageElement, PageImg, PageHref, PageWord, Spider, WebSpider, Queue, PageLexer, HttpTokenizer, URLTextReader, CrawlerDriver, SearchDriver, DictionaryDriver, Indexer, Index, Query, DictionaryInterface, ListDictionary, TreeDictionary, HashDictionary. Relationship types: Inheritance, Uses, Calls. Labeled operations: Crawl, Parse, addPage, Index, Save, Restore.]

Dictionary Structures (Lexicon)
- A Dictionary is an unordered container of key-element pairs
  - An Ordered Dictionary keeps its elements in sorted order
- Keys are unique, but the values can be anything

Dictionary ADT
- size(): Return the number of items in D. Output: Integer
- isEmpty(): Test whether D is empty. Output: Boolean
- elements(): Return the elements stored in D. Output: iterator of elements (objects)
- keys(): Return the keys stored in D. Output: iterator of keys (objects)
- findElement(k): If D contains an item with key == k, return the element of that item; otherwise return NO_SUCH_KEY. Output: Object
- findAllElements(k): Output: iterator of elements with key k
- insertItem(k,e): Insert an item with element e and key k into D.
- removeElement(k): Remove an item with key == k and return it. If there is no such element, return NO_SUCH_KEY. Output: Object (element)
- removeAllElements(k): Remove from D the items with key == k. Output: iterator of elements
Also see the Java Standard API for Dictionary

Dictionary ADT in THIS Project
- size(): Return the number of items in D. Output: Integer
- isEmpty(): Test whether D is empty. Output: Boolean
- getKeys(): Return all the keys of the elements stored in D. Output: String array (ideally it should be a Vector!)
- getValue(k): If D contains an item with key == k, return the element of that item; otherwise return NULL. Output: Object
- insertItem(k,e): Insert an item with element e and key k into D.
- remove(k): Remove the item with key k from D.
- We have customized the Dictionary a bit, as we will be inserting only elements of one specific type!
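The operations above could be written down as a Java interface with a linked-list implementation behind it. The names DictionaryInterface and ListDictionary appear in the code-structure diagram, but everything else here is an assumption: the method signatures, the use of String keys, and the choice to overwrite the element when an existing key is inserted again. Treat it as a sketch, not the framework's actual code.

```java
/** The operations listed on the slide; the Java signatures are assumptions. */
interface DictionaryInterface {
    int size();
    boolean isEmpty();
    String[] getKeys();
    Object getValue(String key);
    void insertItem(String key, Object element);
    void remove(String key);
}

/** Unordered dictionary backed by a hand-coded singly linked list (no java.util). */
final class ListDictionary implements DictionaryInterface {

    private static final class Node {
        final String key;
        Object element;
        Node next;   // reference to the next node, or null at the end of the list

        Node(String key, Object element, Node next) {
            this.key = key;
            this.element = element;
            this.next = next;
        }
    }

    private Node head;
    private int size;

    public int size() {
        return size;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public String[] getKeys() {
        String[] keys = new String[size];
        int i = 0;
        for (Node n = head; n != null; n = n.next) {
            keys[i++] = n.key;
        }
        return keys;
    }

    public Object getValue(String key) {
        for (Node n = head; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.element;
            }
        }
        return null;   // the slide's NULL result for a missing key
    }

    public void insertItem(String key, Object element) {
        for (Node n = head; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.element = element;   // assumption: keys are unique, so overwrite
                return;
            }
        }
        head = new Node(key, element, head);   // new keys go to the front
        size++;
    }

    public void remove(String key) {
        Node previous = null;
        for (Node n = head; n != null; previous = n, n = n.next) {
            if (n.key.equals(key)) {
                if (previous == null) {
                    head = n.next;             // removing the first node
                } else {
                    previous.next = n.next;    // unlink a middle or last node
                }
                size--;
                return;
            }
        }
    }
}
```

Every operation except size() and isEmpty() walks the list, so lookups take time proportional to the number of keys; that cost is the motivation for the tree and hash variants named in the diagram.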

Java Collections
- java.util.* (a quite helpful library)
  - Has implementations for most of the data structures
  - They make life really easy
  - You may not use the built-in data structures unless specified (e.g. Task1 Tasklet-A)
- Use them for non-data-structural purposes
  - Collections, e.g. Arrays, Vectors, Iterators, Lists, Sets, etc.
  - You will definitely be using at least “Iterator”, as you will be dealing with many objects at a time!
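For readers who have not met Iterator yet, a short sketch of walking a Vector with one follows. The class IteratorSketch and its join method are invented for this example; Vector, Iterator, hasNext() and next() are the standard java.util API.

```java
import java.util.Iterator;
import java.util.Vector;

final class IteratorSketch {

    /** Joins the strings in words with single spaces, visiting them through an Iterator. */
    static String join(Vector<String> words) {
        StringBuilder out = new StringBuilder();
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {          // is there another element?
            out.append(it.next());      // fetch it and advance
            if (it.hasNext()) {
                out.append(' ');        // separator between elements only
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Vector<String> keys = new Vector<>();
        keys.add("India");
        keys.add("Cricket");
        keys.add("Sachin");
        System.out.println(join(keys));
    }
}
```

The same hasNext()/next() loop works for any collection that hands out an Iterator, which is why it is useful when dealing with many objects at a time.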

Other Data structures
- Queue
- LinkedList
  - Beware! There are no pointers in Java
  - However, there are “references”
- Learn more about references in Java
- Do not use the java.util package for data structures or sorting algorithms! You are expected to code them
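To show how references take the place of pointers, here is a minimal first-in first-out queue built from linked nodes without java.util. The class name LinkedQueue and its method names are assumptions chosen for the example, not the project's Queue class.

```java
/** FIFO queue built from linked nodes; each node holds a reference to the next one. */
final class LinkedQueue {

    private static final class Node {
        final Object item;
        Node next;   // a reference to the following node, or null at the tail

        Node(Object item) {
            this.item = item;
        }
    }

    private Node head;   // next item to leave the queue
    private Node tail;   // most recently added item
    private int size;

    /** Adds item at the back of the queue. */
    void enqueue(Object item) {
        Node node = new Node(item);
        if (tail == null) {
            head = node;          // queue was empty
        } else {
            tail.next = node;     // link the old tail to the new node
        }
        tail = node;
        size++;
    }

    /** Removes and returns the item at the front of the queue. */
    Object dequeue() {
        if (head == null) {
            throw new IllegalStateException("queue is empty");
        }
        Object item = head.item;
        head = head.next;         // the old head becomes unreachable and is garbage collected
        if (head == null) {
            tail = null;          // queue is empty again
        }
        size--;
        return item;
    }

    boolean isEmpty() {
        return size == 0;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        LinkedQueue urls = new LinkedQueue();
        urls.enqueue("first-url");    // made-up entries
        urls.enqueue("second-url");
        System.out.println(urls.dequeue());
        System.out.println(urls.dequeue());
        System.out.println("empty: " + urls.isEmpty());
    }
}
```

Assigning `head = head.next` only copies a reference; no object is copied or freed by hand, which is the practical difference from pointer-based code in C or C++.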

Summary
- Learn data structures by implementing THIS
- Mini version of a real search engine
- Framework is provided
- More details in the next video

THANK YOU