Bridget Thomson McInnes 10 October 2003


The vec Perl Function

What is vec
A Perl function that provides compact storage of lists of unsigned integers.
The integers are packed as tightly as possible within an ordinary Perl string.

Why we are interested in vec
When using suffix arrays as a data storage mechanism, we need to store the entire corpus in an array. This will not work with a corpus of 50 million tokens due to memory constraints. The idea then is to convert the tokens to unique integers and store the integers in an array. Again, this will not work due to memory constraints, which are discussed briefly later. Therefore, we are trying to find a way to efficiently store a set of approximately 50 million integers.
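The token-to-integer idea can be sketched as follows. This is only an illustration: the toy corpus and the names %id_of, $packed and $pos are assumptions made here, not taken from the slides, and a width of 32 bits is assumed for each id.

```perl
use strict;
use warnings;

# Hypothetical toy corpus; a real corpus would be read from a file.
my @tokens = qw(the cat sat on the mat the end);

my %id_of;          # token -> unique integer id
my $next_id = 0;
my $packed  = '';   # ordinary Perl string that holds the packed ids
my $pos     = 0;    # index of the next element to write

for my $token (@tokens) {
    # Assign a new id the first time a token is seen.
    $id_of{$token} = $next_id++ unless exists $id_of{$token};
    # Store the id as one 32-bit element.
    vec($packed, $pos++, 32) = $id_of{$token};
}

printf "%d tokens, %d distinct, %d bytes packed\n",
    scalar @tokens, scalar keys %id_of, length $packed;
```

Each token costs 4 bytes in the packed string, regardless of how long the token itself is.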

The vec Function
vec EXPR, OFFSET, BITS
EXPR: the string that the integers are packed into
OFFSET: the index of the particular element that is to be stored or retrieved
BITS: how wide each element is, in bits
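A minimal sketch of the three arguments in use, showing vec both on the left of an assignment (to store) and as an ordinary expression (to read). The variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $packed = '';

# vec as an lvalue: store 8-bit elements at indices 0, 1 and 2.
vec($packed, 0, 8) = 65;
vec($packed, 1, 8) = 66;
vec($packed, 2, 8) = 67;

# vec as an rvalue: read the element at index 1 back.
my $second = vec($packed, 1, 8);

# With 8-bit elements each element is one byte, so the string is "ABC".
print "string = $packed\n";
print "second = $second\n";
```

Because the storage is an ordinary string, it can be printed, written to a file, or measured with length like any other scalar.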

BITS
Must be a power of two: 1, 2, 4, 8, 16 or 32 (or 64 where the platform supports it)
Example
When BITS = 1, there are 8 elements per byte
When BITS = 2, there are 4 elements per byte
When BITS = 4, there are 2 elements per byte
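The elements-per-byte relationship can be checked by packing the same number of elements at each width and measuring the string. This is a small sketch; the choice of sixteen elements and the hash name %bytes_for are assumptions for illustration.

```perl
use strict;
use warnings;

my %bytes_for;   # BITS -> bytes needed for sixteen elements

for my $bits (1, 2, 4, 8, 16, 32) {
    my $packed = '';
    # Store the value 1 (which fits in any width) at indices 0..15.
    vec($packed, $_, $bits) = 1 for 0 .. 15;
    $bytes_for{$bits} = length $packed;
    printf "BITS = %2d: 16 elements occupy %2d bytes\n",
        $bits, $bytes_for{$bits};
}
```

The byte count is 16 * BITS / 8 in every case, so halving BITS halves the storage.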

Quick Example Program
    $bitstring = "";
    $offset = 0;
    # 4-bit elements can hold the values 0..15
    for $i (0..15) {
        vec($bitstring, $offset++, 4) = $i;
    }

Retrieving the Data
    $bitstring = "";
    $offset = 0;
    for $i (0..15) {
        vec($bitstring, $offset++, 4) = $i;
    }
    # read back the elements at offsets 3 and 6
    $a = vec($bitstring, 3, 4);
    $b = vec($bitstring, 6, 4);
    print "a = $a\n";
    print "b = $b\n";

Output
    csirh012% perl vec.pl
    a = 3
    b = 6
    csirh013%
So each offset acts as an array index.
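To make the array analogy concrete, the sketch below fills an ordinary Perl array and a vec string with the same values and compares them index by index. The names @plain, $packed and $same are assumptions for illustration.

```perl
use strict;
use warnings;

my @plain  = (0 .. 15);   # ordinary Perl array
my $packed = '';          # the same values, packed 4 bits each

vec($packed, $_, 4) = $plain[$_] for 0 .. $#plain;

# Compare element by element: offset i in the vec plays the role of $plain[i].
my $same = 1;
for my $i (0 .. $#plain) {
    $same = 0 if vec($packed, $i, 4) != $plain[$i];
}

printf "elements match: %s, packed size: %d bytes\n",
    ($same ? 'yes' : 'no'), length $packed;
```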

Experiments
I ran three different experiments:
Loaded an array with 50 million integers
Loaded a vec with 50 million integers
Loaded a vec with 100 million integers

Experiment 1
Loaded an array with 50 million integers
Results (performed on marengo)
Ran out of memory at the 19,857,659th integer
Used approximately 700 MB of memory
Memory was determined using the Perl module Devel::Size (total_size function)

Experiment 2
Loaded a vec array using a bit parameter of 32 with 50 million integers
Results (performed on csirh0)
Memory: 192 M
Time: 2.78 s

Experiment 3
Loaded a vec array using a bit parameter of 32 with 60 million integers
Results (performed on csirh0)
Memory: 230 M
Time: 2.98 s
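As a rough cross-check of the vec memory figures, the size of the packed string itself can be estimated as count * BITS / 8 bytes. The helper packed_bytes below is a hypothetical name introduced here, and the estimate covers only the string data, not any overhead of the Perl process.

```perl
use strict;
use warnings;

# Hypothetical helper: bytes needed for $count elements of $bits bits each.
sub packed_bytes {
    my ($count, $bits) = @_;
    return $count * $bits / 8;
}

for my $count (50_000_000, 60_000_000) {
    my $bytes = packed_bytes($count, 32);
    printf "%d integers at 32 bits: %d bytes (%.1f MiB)\n",
        $count, $bytes, $bytes / 2**20;
}

# Confirm the formula against a real, much smaller vec string.
my $sample = '';
vec($sample, $_, 32) = $_ for 0 .. 999;
printf "1000 integers at 32 bits: %d bytes\n", length $sample;
```

The estimates come to about 191 MiB and 229 MiB, which is in the same range as the 192 M and 230 M reported for the two vec experiments.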

Notes on Experiments
A bit parameter of less than 32 did not return all of the integers inserted in our experiments, because an element that is BITS wide can only hold values from 0 to 2^BITS - 1.
Loading a vec array with 100 million integers runs out of memory.
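The effect of too small a bit parameter can be seen on a single value. The sketch below also includes a hypothetical helper, min_bits, that picks the smallest power-of-two width able to hold a given maximum value. It assumes the integers run up to roughly the number of tokens, which the slides do not state explicitly.

```perl
use strict;
use warnings;

# Storing 20 (binary 10100) in a 4-bit element keeps only the low 4 bits.
my $packed = '';
vec($packed, 0, 4) = 20;
my $got = vec($packed, 0, 4);
print "stored 20 in 4 bits, read back $got\n";

# Hypothetical helper: smallest allowed width that can hold $max.
sub min_bits {
    my ($max) = @_;
    for my $bits (1, 2, 4, 8, 16, 32) {
        return $bits if $max <= 2**$bits - 1;
    }
    return;   # does not fit in 32 bits
}

printf "max value %d needs BITS = %d\n", $_, min_bits($_)
    for 15, 20, 65_535, 49_999_999;
```

Under that assumption, ids up to about 50 million exceed the 16-bit maximum of 65,535, so 32 is the smallest width that preserves them.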

Modules that use vec
Tie::VecArray: An array interface to a bit vector
Class::Bits: Class wrapper around bit vectors
Bit::Vector: Efficient bit vector, set of integers and math library. Widely used.