1
The vec Perl Function
Bridget Thomson McInnes
10 October 2003
2
What is vec
A Perl function that provides compact storage of lists of unsigned integers
Integers are packed as tightly as possible within an ordinary Perl string
3
Why we are interested in vec
When using suffix arrays as a data storage mechanism, we need to store the entire corpus in an array. This will not work with a corpus of 50 million tokens due to memory constraints. The idea, then, is to convert the tokens to unique integers and store the integers in an array. Again, this will not work due to memory constraints, which are discussed briefly later. Therefore, we are trying to find a way to efficiently store a set of approximately 50 million integers.
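The token-to-integer step described above can be sketched as follows. This is a minimal illustration, not the author's code: the toy corpus and the names `%id_of`, `$next_id`, and `@ids` are assumptions introduced here.

```perl
use strict;
use warnings;

# Hypothetical toy corpus; a real corpus would be read from a file.
my @tokens = qw(the cat sat on the mat the end);

my %id_of;          # token => unique integer ID (hypothetical name)
my $next_id = 0;    # next unused ID
my @ids;            # the corpus re-expressed as integers

for my $token (@tokens) {
    # Assign a fresh ID the first time a token is seen.
    $id_of{$token} = $next_id++ unless exists $id_of{$token};
    push @ids, $id_of{$token};
}

print "@ids\n";
```

For this toy corpus the script prints `0 1 2 3 0 4 0 5`: repeated tokens share one ID, so only the integer sequence has to be stored per position.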
4
The vec Function
vec EXPR, OFFSET, BITS
EXPR
Is the string that the integers are packed into
OFFSET
Specifies the index of the particular element that is to be retrieved
BITS
Specifies how wide each element is in bits
5
BITS
Must be a power of two: 1, 2, 4, 8, 16, etc.
Example
When BITS = 1, there are 8 elements per byte
When BITS = 2, there are 4 elements per byte
When BITS = 4, there are 2 elements per byte
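The elements-per-byte relationship can be checked directly by storing the same number of elements at several widths and measuring the resulting string. This is a small sketch; the variable names (`%bytes_used`, `$str`) are assumptions introduced for illustration.

```perl
use strict;
use warnings;

# Store 8 elements at each width and record how many bytes the string needs.
my %bytes_used;
for my $bits (1, 2, 4, 8) {
    my $str = '';
    vec($str, $_, $bits) = 1 for 0 .. 7;    # write elements 0..7
    $bytes_used{$bits} = length $str;       # length counts bytes
    printf "BITS = %d: 8 elements use %d byte(s)\n", $bits, $bytes_used{$bits};
}
```

Eight elements occupy 1, 2, 4, and 8 bytes at widths of 1, 2, 4, and 8 bits, which matches 8, 4, 2, and 1 elements per byte.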
6
Quick Example Program
$bitstring = "";
$offset = 0;
for $i (0..20) {
    # 4-bit elements hold 0..15; larger values keep only their low 4 bits
    vec($bitstring, $offset++, 4) = $i;
}
7
Retrieving the Data
$bitstring = "";
$offset = 0;
for $i (0..20) {
    vec($bitstring, $offset++, 4) = $i;
}
# Read back the elements at offsets 3 and 6
$a = vec($bitstring, 3, 4);
$b = vec($bitstring, 6, 4);
print "a = $a\n";
print "b = $b\n";
8
Output
csirh012% perl vec.pl
a = 3
b = 6
csirh013%
So each offset represents an array index.
9
Experiments
I ran three different experiments:
Loaded an array with 50 million integers
Loaded a vec with 50 million integers
Loaded a vec with 100 million integers
10
Experiment 1
Loading an array with 50 million integers
Results (performed on marengo)
Ran out of memory at integer 19,857,659
Used approximately 700 MB of memory
Memory was determined by using the Perl module Devel::Size (total_size function)
11
Experiment 2
Loaded a vec array using a bit parameter of 32 with 50 million integers
Results (performed on csirh0)
Memory : 192 M
Time : s
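As a rough cross-check of the memory figure, the size of a 32-bit vec can be estimated from the bytes used per element. The sketch below measures this on a small stand-in (1,000 elements, an assumption chosen so it runs instantly) and scales it to 50 million; it does not reproduce the experiment itself.

```perl
use strict;
use warnings;

my $n   = 1_000;    # small stand-in for 50 million elements
my $vec = '';
vec($vec, $_, 32) = $_ for 0 .. $n - 1;

# Each 32-bit element occupies 4 bytes of the underlying string.
my $bytes_per_element = length($vec) / $n;

# Scale up to 50 million elements and convert bytes to MiB.
my $estimate_mib = 50_000_000 * $bytes_per_element / 2**20;
printf "%d bytes per element, about %.1f MiB for 50 million\n",
    $bytes_per_element, $estimate_mib;
```

This prints an estimate of about 190.7 MiB for the raw string data, which is in the same range as the reported 192 M (the exact unit and measurement overhead behind that figure are not stated on the slide).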
12
Experiment 3
Loaded a vec array with a bit parameter of 32 with 60 million integers
Results (performed on csirh0)
Memory : 230 M
Time : 2.98s
13
Notes on Experiments
A bit parameter less than 32 will not return all of the integers inserted into it for our experiments
Loading a vec array of 100 million integers runs out of memory
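The slides do not say why narrower widths lost values. One likely cause, stated here as an assumption, is that an integer too large for the element width keeps only its low-order bits. A minimal sketch with a hypothetical value of 70,000, which needs 17 bits:

```perl
use strict;
use warnings;

my $value = 70_000;    # hypothetical value; too large for 16 bits (max 65,535)

my ($v16, $v32) = ('', '');
vec($v16, 0, 16) = $value;    # only the low 16 bits are stored
vec($v32, 0, 32) = $value;    # fits comfortably in 32 bits

my $got16 = vec($v16, 0, 16);
my $got32 = vec($v32, 0, 32);
print "16-bit: $got16\n";
print "32-bit: $got32\n";
```

The 16-bit element comes back as 4464 (70,000 minus 65,536), while the 32-bit element returns 70000 unchanged. With tens of millions of distinct integer values, any width below 32 bits cannot represent the larger ones.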
14
Modules that use vec
Tie::VecArray
An array interface to a bit vector
Class::Bits
Class wrapper around bit vectors
Bit::Vector
Efficient bit vector, set of integers and math library
Widely used