EWAHBoolArray 0.8.0
A compressed bitmap class in C++
|
This class is a compressed bitmap. More...
#include <ewah.h>
Public Types | |
enum | { RESERVEMEMORY = true } |
enum | { wordinbits = sizeof(uword) * 8 } |
typedef EWAHBoolArraySetBitForwardIterator< uword > | const_iterator |
Public Member Functions | |
void | trim () |
Recover wasted memory usage. | |
bool | get (const size_t pos) const |
Query the value of bit i. | |
bool | empty () const |
Returns true if no bit is set. | |
bool | set (size_t i) |
Set the ith bit to true (starting at zero). | |
operator std::string () const | |
Transform into a string that presents a list of set bits. | |
void | makeSameSize (EWAHBoolArray &a) |
Make sure the two bitmaps have the same size (padding with zeroes if necessary). | |
const_iterator | begin () const |
Returns an iterator that can be used to access the position of the set bits. | |
const_iterator & | end () const |
Basically a bogus iterator that can be used together with begin() for constructions such as for(EWAHBoolArray<uword>::iterator i = b.begin(); i!=b.end(); ++i) {}. | |
std::vector< size_t > | toArray () const |
Retrieve the set bits. | |
void | logicaland (const EWAHBoolArray &a, EWAHBoolArray &container) const |
computes the logical and with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
EWAHBoolArray | logicaland (const EWAHBoolArray &a) const |
computes the logical and with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
EWAHBoolArray | operator& (const EWAHBoolArray &a) const |
calls logicaland | |
void | logicalandnot (const EWAHBoolArray &a, EWAHBoolArray &container) const |
computes the logical and with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
EWAHBoolArray | operator- (const EWAHBoolArray &a) const |
calls logicalandnot | |
EWAHBoolArray | logicalandnot (const EWAHBoolArray &a) const |
computes the logical and not with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
bool | intersects (const EWAHBoolArray &a) const |
tests whether the bitmaps "intersect" (have at least one 1-bit at the same position). | |
void | logicalor (const EWAHBoolArray &a, EWAHBoolArray &container) const |
computes the logical or with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
size_t | logicalorcount (const EWAHBoolArray &a) const |
computes the size (in number of set bits) of the logical or with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
size_t | logicalandcount (const EWAHBoolArray &a) const |
computes the size (in number of set bits) of the logical and with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
size_t | logicalandnotcount (const EWAHBoolArray &a) const |
computes the size (in number of set bits) of the logical and not with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
size_t | logicalxorcount (const EWAHBoolArray &a) const |
computes the size (in number of set bits) of the logical xor with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
EWAHBoolArray | logicalor (const EWAHBoolArray &a) const |
computes the logical or with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
EWAHBoolArray | operator| (const EWAHBoolArray &a) const |
calls logicalor | |
void | logicalxor (const EWAHBoolArray &a, EWAHBoolArray &container) const |
computes the logical xor with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
EWAHBoolArray | logicalxor (const EWAHBoolArray &a) const |
computes the logical xor with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes. | |
EWAHBoolArray | operator^ (const EWAHBoolArray &a) const |
calls logicalxor | |
void | reset () |
clear the content of the bitmap. | |
size_t | addWord (const uword newdata, const uint32_t bitsthatmatter=8 *sizeof(uword)) |
convenience method. | |
void | printout (std::ostream &o=std::cout) |
void | debugprintout () const |
Prints a verbose description of the content of the compressed bitmap. | |
size_t | sizeInBits () const |
Return the size in bits of this bitmap (this refers to the uncompressed size in bits). | |
size_t | sizeInBytes () const |
Return the size of the buffer in bytes. | |
size_t | addStreamOfEmptyWords (const bool v, size_t number) |
same as addEmptyWord, but you can do several in one shot! returns the number of words added (storage cost increase) | |
size_t | addStreamOfDirtyWords (const uword *v, const size_t number) |
add a stream of dirty words, returns the number of words added (storage cost increase) | |
size_t | addStreamOfNegatedDirtyWords (const uword *v, const size_t number) |
add a stream of dirty words, each one negated, returns the number of words added (storage cost increase) | |
size_t | padWithZeroes (const size_t totalbits) |
make sure the size of the array is totalbits bits by padding with zeroes. | |
size_t | sizeOnDisk (const bool savesizeinbits=true) const |
Compute the size on disk assuming that it was saved using the method "write". | |
size_t | write (std::ostream &out, const bool savesizeinbits=true) const |
Save this bitmap to a stream. | |
size_t | write (char *out, size_t capacity, const bool savesizeinbits=true) const |
same as write(std::ostream...), except that you provide a char pointer and a "capacity" (in bytes). | |
void | writeBuffer (std::ostream &out) const |
This only writes the content of the buffer (see write()) method. | |
size_t | bufferSize () const |
size (in words) of the underlying STL vector. | |
size_t | read (std::istream &in, const bool savesizeinbits=true) |
this is the counterpart to the write method. | |
size_t | read (const char *in, size_t capacity, const bool savesizeinbits=true) |
same as read(std::istream...), except that you provide a char pointer and a "capacity" (in bytes). | |
void | readBuffer (std::istream &in, const size_t buffersize) |
read the buffer from a stream, see method writeBuffer. | |
bool | operator== (const EWAHBoolArray &x) const |
We define two EWAHBoolArray as being equal if they have the same set bits. | |
bool | operator!= (const EWAHBoolArray &x) const |
We define two EWAHBoolArray as being different if they do not have the same set bits. | |
bool | operator== (const BoolArray< uword > &x) const |
bool | operator!= (const BoolArray< uword > &x) const |
EWAHBoolArrayIterator< uword > | uncompress () const |
Iterate over the uncompressed words. | |
EWAHBoolArrayRawIterator< uword > | raw_iterator () const |
To iterate over the compressed data. | |
void | append (const EWAHBoolArray &x) |
Appends the content of some other compressed bitmap at the end of the current bitmap. | |
BitmapStatistics | computeStatistics () const |
For research purposes. | |
BoolArray< uword > | toBoolArray () const |
For convenience, this fully uncompresses the bitmap. | |
template<class container > | |
void | appendSetBits (container &out, const size_t offset=0) const |
Convert to a list of positions of "set" bits. | |
std::vector< size_t > | toVector () const |
Returns a vector containing the position of the set bits in increasing order. | |
size_t | numberOfOnes () const |
Returns the number of bits set to the value 1. | |
void | swap (EWAHBoolArray &x) |
Swap the content of this bitmap with another bitmap. | |
const std::vector< uword > & | getBuffer () const |
EWAHBoolArray (const EWAHBoolArray &other) | |
Please don't copy your bitmaps! The running time complexity of a copy is the size of the compressed bitmap. | |
EWAHBoolArray & | operator= (const EWAHBoolArray &x) |
Copies the content of one bitmap onto another. | |
EWAHBoolArray (EWAHBoolArray &&other) | |
Move constructor. | |
EWAHBoolArray & | operator= (EWAHBoolArray &&x) |
Move assignment operator. | |
void | expensive_copy (const EWAHBoolArray &x) |
This is equivalent to the operator =. | |
void | logicalnot (EWAHBoolArray &x) const |
Write the logical not of this bitmap in the provided container. | |
EWAHBoolArray< uword > | logicalnot () const |
Write the logical not of this bitmap in the provided container. | |
void | inplace_logicalnot () |
Apply the logical not operation on this bitmap. | |
void | setSizeInBits (const size_t size) |
set size in bits. | |
void | fastaddStreamOfEmptyWords (const bool v, size_t number) |
Like addStreamOfEmptyWords but addStreamOfEmptyWords but does not return the cost increase, does not update sizeinbits. | |
void | fastaddStreamOfDirtyWords (const uword *v, const size_t number) |
LikeaddStreamOfDirtyWords but does not return the cost increse, does not update sizeinbits. | |
Static Public Member Functions | |
static EWAHBoolArray | bitmapOf (size_t n,...) |
This class is a compressed bitmap.
This is where compression happens. The underlying data structure is an STL vector.
Definition at line 197 of file runninglengthword.h.
EWAHBoolArraySetBitForwardIterator<uword> ewah::EWAHBoolArray< uword >::const_iterator |
|
inline |
|
inline |
|
inline |
size_t ewah::EWAHBoolArray< uword >::addStreamOfDirtyWords | ( | const uword * | v, |
const size_t | number ) |
add a stream of dirty words, returns the number of words added (storage cost increase)
Definition at line 1002 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::addStreamOfEmptyWords | ( | const bool | v, |
size_t | number ) |
same as addEmptyWord, but you can do several in one shot! returns the number of words added (storage cost increase)
Definition at line 898 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::addStreamOfNegatedDirtyWords | ( | const uword * | v, |
const size_t | number ) |
add a stream of dirty words, each one negated, returns the number of words added (storage cost increase)
Definition at line 1069 of file ewah-inl.h.
|
inline |
convenience method.
returns the number of words added (storage cost increase)
Definition at line 475 of file ewah-inl.h.
void ewah::EWAHBoolArray< uword >::append | ( | const EWAHBoolArray< uword > & | x | ) |
Appends the content of some other compressed bitmap at the end of the current bitmap.
Definition at line 777 of file ewah-inl.h.
void ewah::EWAHBoolArray< uword >::appendSetBits | ( | container & | out, |
const size_t | offset = 0 ) const |
Convert to a list of positions of "set" bits.
The recommended container is vector<size_t>.
See also toArray().
Definition at line 845 of file ewah-inl.h.
|
inline |
Returns an iterator that can be used to access the position of the set bits.
The running time complexity of a full scan is proportional to the number of set bits: be aware that if you have long strings of 1s, this can be very inefficient.
It can be much faster to use the toArray method if you want to retrieve the set bits.
|
inlinestatic |
|
inline |
BitmapStatistics ewah::EWAHBoolArray< uword >::computeStatistics | ( | ) | const |
For research purposes.
This computes the number of dirty words and the number of compressed words.
Definition at line 1635 of file ewah-inl.h.
void ewah::EWAHBoolArray< uword >::debugprintout | ( | ) | const |
Prints a verbose description of the content of the compressed bitmap.
Definition at line 1651 of file ewah-inl.h.
|
inline |
|
inline |
|
inline |
|
inline |
LikeaddStreamOfDirtyWords but does not return the cost increse, does not update sizeinbits.
Definition at line 1036 of file ewah-inl.h.
|
inline |
Like addStreamOfEmptyWords but addStreamOfEmptyWords but does not return the cost increase, does not update sizeinbits.
Definition at line 953 of file ewah-inl.h.
|
inline |
|
inline |
void ewah::EWAHBoolArray< uword >::inplace_logicalnot | ( | ) |
Apply the logical not operation on this bitmap.
Running time complexity is proportional to the compressed size of the bitmap. The current bitmap is not modified.
This function takes into account the sizeInBits value. You may need to call "padWithZeroes" to adjust the sizeInBits.
Definition at line 294 of file ewah-inl.h.
bool ewah::EWAHBoolArray< uword >::intersects | ( | const EWAHBoolArray< uword > & | a | ) | const |
tests whether the bitmaps "intersect" (have at least one 1-bit at the same position).
This function does not modify the existing bitmaps. It is faster than calling logicaland.
Definition at line 1594 of file ewah-inl.h.
|
inline |
computes the logical and with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes.
The sizeInBits() of the result is equal to the maximum that of the current bitmap's sizeInBits() and that of a.sizeInBits().
void ewah::EWAHBoolArray< uword >::logicaland | ( | const EWAHBoolArray< uword > & | a, |
EWAHBoolArray< uword > & | container ) const |
computes the logical and with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes.
The sizeInBits() of the result is equal to the maximum that of the current bitmap's sizeInBits() and that of a.sizeInBits().
Definition at line 1381 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::logicalandcount | ( | const EWAHBoolArray< uword > & | a | ) | const |
computes the size (in number of set bits) of the logical and with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes.
Definition at line 1555 of file ewah-inl.h.
|
inline |
computes the logical and not with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes.
The sizeInBits() of the result should be equal to that of the current bitmap irrespective of a.sizeInBits().
void ewah::EWAHBoolArray< uword >::logicalandnot | ( | const EWAHBoolArray< uword > & | a, |
EWAHBoolArray< uword > & | container ) const |
computes the logical and with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes.
The sizeInBits() of the result should be equal to that of the current bitmap irrespective of a.sizeInBits().
Definition at line 1436 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::logicalandnotcount | ( | const EWAHBoolArray< uword > & | a | ) | const |
computes the size (in number of set bits) of the logical and not with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes.
Definition at line 1505 of file ewah-inl.h.
|
inline |
void ewah::EWAHBoolArray< uword >::logicalnot | ( | EWAHBoolArray< uword > & | x | ) | const |
Write the logical not of this bitmap in the provided container.
This function takes into account the sizeInBits value. You may need to call "padWithZeroes" to adjust the sizeInBits.
Definition at line 418 of file ewah-inl.h.
|
inline |
computes the logical or with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes.
If you have many bitmaps, see fast_logicalor.
The sizeInBits() of the result is equal to the maximum that of the current bitmap's sizeInBits() and that of a.sizeInBits().
void ewah::EWAHBoolArray< uword >::logicalor | ( | const EWAHBoolArray< uword > & | a, |
EWAHBoolArray< uword > & | container ) const |
computes the logical or with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes.
If you have many bitmaps, see fast_logicalor_tocontainer.
The sizeInBits() of the result is equal to the maximum that of the current bitmap's sizeInBits() and that of a.sizeInBits().
Definition at line 1193 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::logicalorcount | ( | const EWAHBoolArray< uword > & | a | ) | const |
computes the size (in number of set bits) of the logical or with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes.
Definition at line 1243 of file ewah-inl.h.
|
inline |
computes the logical xor with another compressed bitmap Return the answer Running time complexity is proportional to the sum of the compressed bitmap sizes.
The sizeInBits() of the result is equal to the maximum that of the current bitmap's sizeInBits() and that of a.sizeInBits().
void ewah::EWAHBoolArray< uword >::logicalxor | ( | const EWAHBoolArray< uword > & | a, |
EWAHBoolArray< uword > & | container ) const |
computes the logical xor with another compressed bitmap answer goes into container Running time complexity is proportional to the sum of the compressed bitmap sizes.
The sizeInBits() of the result is equal to the maximum that of the current bitmap's sizeInBits() and that of a.sizeInBits().
Definition at line 1288 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::logicalxorcount | ( | const EWAHBoolArray< uword > & | a | ) | const |
computes the size (in number of set bits) of the logical xor with another compressed bitmap Running time complexity is proportional to the sum of the compressed bitmap sizes.
Definition at line 1332 of file ewah-inl.h.
|
inline |
size_t ewah::EWAHBoolArray< uword >::numberOfOnes | ( | ) | const |
Returns the number of bits set to the value 1.
The running time complexity is proportional to the compressed size of the bitmap.
This is sometimes called the cardinality.
Definition at line 365 of file ewah-inl.h.
|
inline |
bool ewah::EWAHBoolArray< uword >::operator!= | ( | const BoolArray< uword > & | x | ) | const |
Definition at line 892 of file ewah-inl.h.
bool ewah::EWAHBoolArray< uword >::operator!= | ( | const EWAHBoolArray< uword > & | x | ) | const |
We define two EWAHBoolArray as being different if they do not have the same set bits.
Alternatively, B1!=B2 if and only if cardinality(B1 XOR B2) >0.
Definition at line 881 of file ewah-inl.h.
|
inline |
|
inline |
|
inline |
|
inline |
bool ewah::EWAHBoolArray< uword >::operator== | ( | const BoolArray< uword > & | x | ) | const |
Definition at line 886 of file ewah-inl.h.
bool ewah::EWAHBoolArray< uword >::operator== | ( | const EWAHBoolArray< uword > & | x | ) | const |
We define two EWAHBoolArray as being equal if they have the same set bits.
Alternatively, B1==B2 if and only if cardinality(B1 XOR B2) ==0.
Definition at line 720 of file ewah-inl.h.
|
inline |
|
inline |
size_t ewah::EWAHBoolArray< uword >::padWithZeroes | ( | const size_t | totalbits | ) |
make sure the size of the array is totalbits bits by padding with zeroes.
returns the number of words added (storage cost increase).
This is useful when calling "logicalnot" functions.
This can an adverse effect of performance, especially when computing intersections.
Definition at line 635 of file ewah-inl.h.
|
inline |
EWAHBoolArrayRawIterator< uword > ewah::EWAHBoolArray< uword >::raw_iterator | ( | ) | const |
To iterate over the compressed data.
Can be faster than any other iterator. Running time complexity of a full scan is proportional to the compressed size of the bitmap.
Definition at line 715 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::read | ( | const char * | in, |
size_t | capacity, | ||
const bool | savesizeinbits = true ) |
same as read(std::istream...), except that you provide a char pointer and a "capacity" (in bytes).
The function never reads at or beyond "in+capacity". If the detected storage exceeds the given capacity, the value zero is returned: it should be considered an error. Otherwise, the number of bytes read is returned.
Definition at line 581 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::read | ( | std::istream & | in, |
const bool | savesizeinbits = true ) |
this is the counterpart to the write method.
if you set savesizeinbits=false, then you are responsible for setting the value fo the attribute sizeinbits (see method setSizeInBits).
Returns how many bytes were queried from the stream.
Definition at line 556 of file ewah-inl.h.
|
inline |
read the buffer from a stream, see method writeBuffer.
this is for advanced users.
Definition at line 495 of file ewah-inl.h.
|
inline |
bool ewah::EWAHBoolArray< uword >::set | ( | size_t | i | ) |
Set the ith bit to true (starting at zero).
Auto-expands the bitmap. It has constant running time complexity. Note that you must set the bits in increasing order: set(1), set(2) is ok; set(2), set(1) is not ok. set(100), set(100) is also not ok.
Note: by design EWAH is not an updatable data structure in the sense that once bit 1000 is set, you cannot change the value of bits 0 to 1000.
Returns true if the value of the bit was changed, and false otherwise. (In practice, if you set the bits in strictly increasing order, it should always return true.)
Definition at line 257 of file ewah-inl.h.
|
inline |
|
inline |
Return the size in bits of this bitmap (this refers to the uncompressed size in bits).
You can increase it with padWithZeroes()
|
inline |
size_t ewah::EWAHBoolArray< uword >::sizeOnDisk | ( | const bool | savesizeinbits = true | ) | const |
Compute the size on disk assuming that it was saved using the method "write".
Definition at line 1674 of file ewah-inl.h.
void ewah::EWAHBoolArray< uword >::swap | ( | EWAHBoolArray< uword > & | x | ) |
Swap the content of this bitmap with another bitmap.
No copying is done. (Running time complexity is constant.)
Definition at line 766 of file ewah-inl.h.
std::vector< size_t > ewah::EWAHBoolArray< uword >::toArray | ( | ) | const |
Retrieve the set bits.
Can be much faster than iterating through the set bits with an iterator.
Definition at line 383 of file ewah-inl.h.
BoolArray< uword > ewah::EWAHBoolArray< uword >::toBoolArray | ( | ) | const |
For convenience, this fully uncompresses the bitmap.
Not fast!
Definition at line 833 of file ewah-inl.h.
|
inline |
|
inline |
EWAHBoolArrayIterator< uword > ewah::EWAHBoolArray< uword >::uncompress | ( | ) | const |
Iterate over the uncompressed words.
Can be considerably faster than begin()/end(). Running time complexity of a full scan is proportional to the uncompressed size of the bitmap.
Definition at line 710 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::write | ( | char * | out, |
size_t | capacity, | ||
const bool | savesizeinbits = true ) const |
same as write(std::ostream...), except that you provide a char pointer and a "capacity" (in bytes).
The function never writes at or beyond "out+capacity". If the storage needed exceeds the given capacity, the value zero is returned: it should be considered an error. Otherwise, the number of bytes copied is returned.
Definition at line 525 of file ewah-inl.h.
size_t ewah::EWAHBoolArray< uword >::write | ( | std::ostream & | out, |
const bool | savesizeinbits = true ) const |
Save this bitmap to a stream.
The file format is | sizeinbits | buffer lenth | buffer content| the sizeinbits part can be omitted if "savesizeinbits=false". Both sizeinbits and buffer length are saved using the uint64_t data type. Returns how many bytes were handed out to the stream.
Definition at line 503 of file ewah-inl.h.
|
inline |
This only writes the content of the buffer (see write()) method.
It is for advanced users.
Definition at line 488 of file ewah-inl.h.