Binary Fuse Filters: Fast and Tiny Immutable Filters

Daniel Lemire
professor, Data Science Research Center
Université du Québec (TÉLUQ)
Montreal

blog: https://lemire.me
twitter: @lemire
GitHub: https://github.com/lemire/

Probabilistic filters?

  • Is $x$ in the set?
  • Maybe, or definitely not

Usage scenario?

  • We have an expensive database. Querying it costs you.
  • Most queries are for keys that are not in the data.
  • We want a small 'filter' that can prune out such queries, as sketched below.
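A minimal usage sketch in C; the names lookup, filter_contains and query_database are hypothetical stand-ins for a filter library and an expensive data store:

#include <stdint.h>

// Hypothetical names: filter_contains() and query_database() stand in for
// a probabilistic filter library and an expensive data store.
void *lookup(void *db, const void *filter, uint64_t key) {
  if (!filter_contains(filter, key)) {
    return NULL; // definitely not in the set: skip the expensive query
  }
  // maybe in the set: pay for the lookup (rare false positives are acceptable)
  return query_database(db, key);
}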

Theoretical bound

  • Given $n$ elements in the set
  • Spend $\log_2(1/\epsilon)$ bits per element
  • Get a false positive rate of $\epsilon$

Usual constraints

  • Fixed initial capacity
  • Difficult to update safely without access to the set
  • To get a 1% false-positive rate: about 6.6 bits?
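For instance, plugging a 1% false-positive rate into the bound above gives

$$\log_2(1/0.01) = \log_2 100 \approx 6.64 \text{ bits per element.}$$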

Hash function

  • From any object in the universe to a word (e.g., a 64-bit word)
  • Result looks random
// MurmurHash3 64-bit finalizer: a sequence of xor-shifts and multiplications
// that mixes the bits so that the output looks random
uint64_t murmur64(uint64_t h) {
  h ^= h >> 33;
  h *= UINT64_C(0xff51afd7ed558ccd);
  h ^= h >> 33;
  h *= UINT64_C(0xc4ceb9fe1a85ec53);
  h ^= h >> 33;
  return h;
}

Conventional Bloom filter

  • Start with a bitset $B$ of $m$ bits (initially all zeros).
  • Use $k$ hash functions $h_1, \dots, h_k$.

Adding an element

  • Given an object $x$ from the set, set the bits $B[h_1(x)], \dots, B[h_k(x)]$ to 1 (up to $k$ distinct bits); a minimal sketch follows.
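A minimal sketch, assuming the double-hashing scheme used by the query code further down; the struct layout is illustrative, and reduce() and getBit() are sketched after that query code:

#include <stdint.h>

typedef struct {
  uint64_t *data;   // bit array stored as 64-bit words
  uint32_t length;  // number of 64-bit words
  int k;            // number of hash functions
} bloom_t;

void bloom_add(bloom_t *filter, uint64_t key) {
  uint64_t hash = murmur64(key);
  uint64_t a = (hash >> 32) | (hash << 32);
  uint64_t b = hash;
  for (int i = 0; i < filter->k; i++) {
    // double hashing: the i-th probe is derived from a, a + b, a + 2b, ...
    filter->data[reduce(a, filter->length)] |= getBit(a);
    a += b;
  }
}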

Checking an element

  • Given an object $x$ from the universe, check that the bits $B[h_1(x)], \dots, B[h_k(x)]$ are all set to 1 (up to $k$ bit lookups)

Checking an element: implementation

  • Typical implementation is branchy:
  • If not $B[h_1(x)]$, return false
  • If not $B[h_2(x)]$, return false
  • ...
  • return true
  // double hashing: derive the k probe positions a, a + b, a + 2b, ... from one hash
  uint64_t hash = hasher(key);
  uint64_t a = (hash >> 32) | (hash << 32);
  uint64_t b = hash;
  for (int i = 0; i < k; i++) {
    if ((data[reduce(a, length)] & getBit(a)) == 0) {
      return NotFound; // bail out at the first unset bit (a branch per probe)
    }
    a += b;
  }
  return Found;
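The helpers reduce and getBit are not shown in the slides; a plausible sketch, assuming 64-bit words and the multiply-shift ('fastrange') reduction:

#include <stdint.h>

// Map (the low 32 bits of) a hash to [0, n) without a modulo: multiply-shift.
static inline uint32_t reduce(uint64_t hash, uint32_t n) {
  return (uint32_t)(((uint64_t)(uint32_t)hash * (uint64_t)n) >> 32);
}

// Select one bit within a 64-bit word from the low 6 bits of the hash.
static inline uint64_t getBit(uint64_t hash) {
  return UINT64_C(1) << (hash & 63);
}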

False positive rate

| bits per element | hash functions | fpp   |
|------------------|----------------|-------|
| 9                | 6              | 1.3%  |
| 10               | 7              | 0.8%  |
| 12               | 8              | 0.3%  |
| 13               | 9              | 0.2%  |
| 15               | 10             | 0.07% |
| 16               | 11             | 0.04% |
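For reference (standard textbook approximation, not from the slides): with $m/n$ bits per element and the near-optimal $k \approx (m/n)\ln 2$ hash functions, the false-positive probability is roughly

$$\epsilon \approx \left(1 - e^{-kn/m}\right)^{k} \approx 0.6185^{\,m/n},$$

so, for example, $0.6185^{12} \approx 0.3\%$, matching the table.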

Bloom filters: upsides

  • Fast construction
  • Flexible: excess capacity translates into lower false positive rate
  • Degrades smoothly to a useless but 'correct' filter

Bloom filters: downsides

  • 44% above the theoretical minimum in storage
  • Slower than alternatives (lots of memory accesses)

Memory accesses

| number of hash functions | cache misses (miss) | cache misses (hit) |
|--------------------------|---------------------|--------------------|
| 8                        | 3.5                 | 7.5                |
| 11                       | 3.8                 | 10.5               |

(Intel Ice Lake processor, out-of-cache filter)

Mispredicted branches

| number of hash functions | all out | all in |
|--------------------------|---------|--------|
| 8                        | 0.95    | 0.0    |
| 11                       | 0.95    | 0.0    |

(Intel Ice Lake processor, out-of-cache filter)

Performance

| number of hash functions | always out (cycles/entry) | always in (cycles/entry) |
|--------------------------|---------------------------|--------------------------|
| 8                        | 135                       | 170                      |
| 11                       | 140                       | 230                      |

(Intel Ice Lake processor, out-of-cache filter)

Blocked Bloom filters

  • Same as a Bloom filter, but for a given object, put all bits in one cache line
  • Optional: Use SIMD instructions to reduce instruction count

Blocked Bloom filters: pros/cons

  • Stupidly fast in both construction and queries
  • ~56% above the theoretical minimum in storage
  // one small block per key: compute the block index, build a mask with one
  // bit per 32-bit lane, and test all the bits with a single AVX2 instruction
  auto hash = hasher_(key);
  uint32_t bucket_idx = reduce(rotl64(hash, 32), bucketCount);
  __m256i mask = MakeMask(hash);
  __m256i bucket = directory[bucket_idx];
  return _mm256_testc_si256(bucket, mask); // 1 when every mask bit is set in the bucket
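MakeMask is not shown above; a sketch in the spirit of the SIMD blocked Bloom filter used in Apache Impala and fastfilter_cpp (the constants are illustrative): it sets one bit in each of the eight 32-bit lanes of the block.

#include <immintrin.h>
#include <stdint.h>

static inline __m256i MakeMask(const uint32_t hash) {
  const __m256i ones = _mm256_set1_epi32(1);
  // eight odd multipliers, one per 32-bit lane (illustrative constants)
  const __m256i rehash = _mm256_setr_epi32(
      0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
      0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U);
  __m256i h = _mm256_mullo_epi32(rehash, _mm256_set1_epi32(hash));
  h = _mm256_srli_epi32(h, 27);       // keep the top 5 bits: a value in [0, 32)
  return _mm256_sllv_epi32(ones, h);  // one set bit per 32-bit lane
}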

Binary fuse filters

  • Based on theoretical work by Dietzfelbinger and Walzer
  • Immutable data structure: build it once
  • Fill it to capacity
  • Fast construction
  • Fast and simple queries

Arity : 3-wise, 4-wise

  • The 3-wise version touches three locations per query, with ~12% space overhead (over the theoretical minimum)
  • The 4-wise version touches four locations per query, with ~8% space overhead

Queries are silly

  • Have an array of fingerprints (e.g., 8-bit words)
  • Compute 3 (or 4) hash functions: $h_0(x)$, $h_1(x)$, $h_2(x)$
  • Compute a fingerprint function $F(x)$ (an 8-bit word)
  • Compute the XOR and compare with the fingerprint: check that $F(x) = \mathrm{Fingerprints}[h_0(x)] \oplus \mathrm{Fingerprints}[h_1(x)] \oplus \mathrm{Fingerprints}[h_2(x)]$
// the key is (probably) present when F(x) equals the XOR of the three
// fingerprints, i.e., when XOR-ing all four values yields zero
bool contain(uint64_t key, const binary_fuse_t *filter) {
  uint64_t hash = mix_split(key, filter->Seed);
  uint8_t f = fingerprint(hash);
  binary_hashes_t hashes = hash_batch(hash, filter);
  f ^= filter->Fingerprints[hashes.h0] ^ filter->Fingerprints[hashes.h1] ^
       filter->Fingerprints[hashes.h2];
  return f == 0;
}

|                    | cache misses | mispredictions |
|--------------------|--------------|----------------|
| 3-wise binary fuse | 2.8          | 0.0            |
| 4-wise binary fuse | 3.7          | 0.0            |

(Intel Ice Lake processor, out-of-cache filter)

|                  | always out (cycles/entry) | always in (cycles/entry) | bits per entry |
|------------------|---------------------------|--------------------------|----------------|
| Bloom            | 135                       | 170                      | 12             |
| 3-wise bin. fuse | 85                        | 85                       | 9.0            |
| 4-wise bin. fuse | 100                       | 100                      | 8.6            |

(Intel Ice Lake processor, out-of-cache filter)

Construction 1

  • Start with an array of fingerprints containing slightly more slots than you have elements in the set
  • Divide the array into disjoint segments (e.g., 300 of them)
  • The number of fingerprints per segment is a power of two (hence 'binary')

Construction 2

  • Map each object $x$ in the set to three locations $h_0(x)$, $h_1(x)$, $h_2(x)$
  • The locations fall in three consecutive segments (so they are relatively nearby in memory); a sketch of such a mapping follows.
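A sketch of how such a mapping can be computed, loosely following the reference C implementation (xor_singleheader); the parameter names SegmentLength, SegmentLengthMask and SegmentCountLength are assumptions, and __uint128_t is a GCC/Clang extension:

#include <stdint.h>

typedef struct { uint32_t h0, h1, h2; } binary_hashes_t;

static binary_hashes_t hash_batch_sketch(uint64_t hash, uint32_t SegmentLength,
                                         uint32_t SegmentLengthMask,
                                         uint64_t SegmentCountLength) {
  binary_hashes_t ans;
  // multiply-shift: map the 64-bit hash to [0, SegmentCountLength)
  uint64_t hi = (uint64_t)(((__uint128_t)hash * SegmentCountLength) >> 64);
  ans.h0 = (uint32_t)hi;                      // slot in the starting segment
  ans.h1 = ans.h0 + SegmentLength;            // next segment
  ans.h2 = ans.h1 + SegmentLength;            // the segment after that
  ans.h1 ^= (uint32_t)(hash >> 18) & SegmentLengthMask; // offset within segment
  ans.h2 ^= (uint32_t)(hash) & SegmentLengthMask;
  return ans;
}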

Construction 3

  • At the end of this mapping, each location is associated with some number of objects from the set (possibly zero); a counting sketch follows.
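A sketch of this counting pass, reusing the (assumed) hash_batch_sketch helper from above:

#include <stddef.h>
#include <stdint.h>

// For each slot, count how many set elements map to it through any of their
// three locations; 'counts' must be zero-initialized with one entry per slot.
void count_slots(const uint64_t *keys, size_t nkeys, uint32_t *counts,
                 uint32_t SegmentLength, uint32_t SegmentLengthMask,
                 uint64_t SegmentCountLength, uint64_t seed) {
  for (size_t i = 0; i < nkeys; i++) {
    uint64_t hash = murmur64(keys[i] + seed); // any good 64-bit mixer will do
    binary_hashes_t h = hash_batch_sketch(hash, SegmentLength,
                                          SegmentLengthMask, SegmentCountLength);
    counts[h.h0]++;
    counts[h.h1]++;
    counts[h.h2]++;
  }
}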

Construction 4

  • Find a location that only a single set element $x$ maps to
  • Record this location as owned by $x$
  • Remove the mappings of $x$ to its locations $h_0(x)$, $h_1(x)$, $h_2(x)$
  • Repeat until no element remains

Construction 5

  • Almost always, the construction terminates after one trial (otherwise, pick a new seed and retry)
  • Go through the matched keys in reverse order and set the owned slot so that the three fingerprints XOR to $F(x)$, as in the sketch below
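A sketch of this back-substitution step, assuming the peeling pass recorded a stack of (hash, owned slot) pairs and that the Fingerprints array is zero-initialized:

// Process the peeled keys in reverse order of removal.
for (size_t i = stack_size; i > 0; i--) {
  uint64_t hash = stack_hash[i - 1];       // hashed key
  uint32_t owned = stack_location[i - 1];  // the slot owned by this key
  binary_hashes_t h = hash_batch_sketch(hash, SegmentLength,
                                        SegmentLengthMask, SegmentCountLength);
  // The owned slot is still zero here, so XOR-ing all three slots with F(x)
  // leaves Fingerprints[h0] ^ Fingerprints[h1] ^ Fingerprints[h2] == F(x).
  Fingerprints[owned] = fingerprint(hash) ^ Fingerprints[h.h0] ^
                        Fingerprints[h.h1] ^ Fingerprints[h.h2];
}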

Construction: Performance

  • Implemented naively: terrible performance (random accesses!)
  • Before the construction begins, sort the elements of the set according to the segment they are mapped to.
  • This greatly accelerates the construction.

How does the performance scale with size?

For warm, small filters, the number of memory accesses matters less: the cost becomes mostly computational.

For large, cold filters, memory accesses are costly.

10M entries

|                      | ns/query (all out) | ns/query (all in) | fpp   | bits per entry |
|----------------------|--------------------|-------------------|-------|----------------|
| Bloom                | 17                 | 14                | 0.32% | 12.0           |
| Blocked Bloom (NEON) | 3.8                | 3.8               | 0.6%  | 12.8           |
| 3-wise bin. fuse     | 3.5                | 3.5               | 0.39% | 9.0            |
| 4-wise bin. fuse     | 4.0                | 4.0               | 0.39% | 8.6            |

(Apple M2)

100M entries

|                      | ns/query (all out) | ns/query (all in) | fpp   | bits per entry |
|----------------------|--------------------|-------------------|-------|----------------|
| Bloom                | 38                 | 33                | 0.32% | 12.0           |
| Blocked Bloom (NEON) | 11                 | 11                | 0.6%  | 12.8           |
| 3-wise bin. fuse     | 17                 | 17                | 0.39% | 9.0            |
| 4-wise bin. fuse     | 20                 | 20                | 0.39% | 8.6            |

(Apple M2)

Compressibility (zstd)

|                  | bits per entry (raw) | bits per entry (zstd) |
|------------------|----------------------|-----------------------|
| Bloom            | 12.0                 | 12.0                  |
| 3-wise bin. fuse | 9.0                  | 8.59                  |
| 4-wise bin. fuse | 8.60                 | 8.39                  |
| theory           | 8.0                  | 8.0                   |

Sending compressed filters

Compressed (zstd) binary fuse filters can be within 5% of the theoretical minimum.
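A minimal sketch of compressing the fingerprint array with the zstd library before sending it over the network (error handling omitted):

#include <stdint.h>
#include <stdlib.h>
#include <zstd.h>

// Compress the raw fingerprint array; the receiver decompresses it and rebuilds
// the filter from these bytes plus a tiny header (seed, segment length, size).
size_t compress_fingerprints(const uint8_t *fingerprints, size_t nbytes,
                             void **out) {
  size_t bound = ZSTD_compressBound(nbytes);
  *out = malloc(bound);
  return ZSTD_compress(*out, bound, fingerprints, nbytes, 3 /* level */);
}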

Some links

Other Links

![center](simdjsonlogo.png)

![width:20%](construction_vs_fpp.png)

Huge tables?

  • The gap between binary fuse filters and blocked Bloom filters is smaller.
  • Regular Bloom filters struggle (lots of computation plus mispredicted branches).
  • Blocked Bloom filters dominate in raw query performance.