We have been working for some time with RedisBloom filters holding a large amount of data, over 250 million records.
In this context, we found an issue in the latest published version, 2.2.0. When we tried to reserve space for more than 300 million records with an error rate of 0.000001, the resulting size was unexpectedly smaller than for a reservation of 250 million. There is no error message, however; the only sign that something is wrong appears after inserting some data. From then on, everything is reported as available and the results are erratic.
To avoid this issue, we had to downgrade the service and continue working with the previous version, 2.0.3. If you could take a look at this behavior, our team would appreciate it.
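For reference, the two reservations described above can be compared with a rough estimate. This is only a sketch based on the textbook Bloom filter sizing formula (bits per entry = -ln(p) / (ln 2)^2); the function names are hypothetical and it is not taken from the RedisBloom source, which may round sizes up or add overhead.

```python
import math


def bloom_bits_per_entry(error_rate: float) -> float:
    """Textbook optimal number of bits per entry for a target error rate."""
    return -math.log(error_rate) / (math.log(2) ** 2)


def bloom_size_bytes(capacity: int, error_rate: float) -> int:
    """Estimated bit-array size in bytes (theoretical, no implementation overhead)."""
    bits = math.ceil(capacity * bloom_bits_per_entry(error_rate))
    return (bits + 7) // 8


if __name__ == "__main__":
    # The two capacities and the error rate mentioned in the report.
    for capacity in (250_000_000, 300_000_000):
        size = bloom_size_bytes(capacity, 1e-6)
        print(f"{capacity:>11,} entries -> {size / 1e9:.2f} GB")
```

Under these assumptions the estimate grows linearly with capacity (on the order of 0.9 GB for 250 million entries and about 1.1 GB for 300 million), so a reservation for 300 million that ends up smaller than one for 250 million points to a sizing problem rather than expected behavior.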
V2.2 brings several advances including a bug fix that corrects/reduces the number of hashes used (== lower memory usage) and an option to make the filter unscalable (if you know your total size) which will reduce memory footprint as well. If you are interested in discussing your use case in order to get some feedback, I will be happy to assist.
A fix was merged into the master branch of the RedisBloom repository.
The same issue existed prior to v2.2.0; you were probably not aware of it because earlier versions lacked a detailed BF.INFO command. I recommend moving to v2.2.1 (it will be tagged soon).
Sorry for the late reply. First of all, many thanks for the great application.
Regarding the Bloom filter bug, I hadn’t experienced any issue until now with the same data. In fact, the way we noticed the issue was the amount of RAM taken by the process. In version 2.0.3, the app takes approximately 1.3 GB of RAM to allocate 250,000,000 records. In the new version, however, that was not happening (it takes 100 MB at most).
Please count on me for feedback on new releases in the future.
Many thanks again, I will be waiting for the new version.
Please note that with this version you can use the NONSCALING flag with BF.RESERVE if you know the size of your filter. This will save you 1 hash function (20 vs 21) and 1.44 bits per entry, since you are using fewer hash functions.
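The "20 vs 21" and "1.44 bits" figures can be reproduced with a short calculation. The sketch below assumes the textbook formulas (bits per entry = -ln(p) / (ln 2)^2, hashes = ceil(ln 2 × bits per entry)) and, as a further assumption consistent with the numbers quoted here but not confirmed from the RedisBloom source, that a scaling filter halves the requested error rate for its first sub-filter. The function names are hypothetical.

```python
import math

LN2 = math.log(2)


def bits_per_entry(error_rate: float) -> float:
    """Textbook optimal bits per entry for a target error rate."""
    return -math.log(error_rate) / LN2**2


def num_hashes(error_rate: float) -> int:
    """Textbook optimal hash count, rounded up to a whole function."""
    return math.ceil(LN2 * bits_per_entry(error_rate))


requested = 1e-6           # error rate used in this thread
tightened = requested / 2  # ASSUMPTION: a scaling filter halves the rate

if __name__ == "__main__":
    print("non-scaling hashes:", num_hashes(requested))
    print("scaling hashes:    ", num_hashes(tightened))
    extra = bits_per_entry(tightened) - bits_per_entry(requested)
    print(f"extra bits per entry when scaling: {extra:.2f}")
```

Halving the error rate adds ln(2) / (ln 2)^2 = 1 / ln 2 ≈ 1.44 bits per entry and exactly one more hash function, which matches the saving described for NONSCALING under the assumptions above.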