I’m encountering a bug where a request I’m sending to RedisGraph non-deterministically returns different results when concurrency is turned on (i.e. THREAD_COUNT > 1). The query asks RedisGraph for a large amount of data; sometimes it returns 8035 rows and other times roughly 5000 (e.g. 5020 or 5212). The correct answer to the query is 8035 rows.
Another oddity is that this does not seem to happen when I send the requests one-by-one, in which case I always get the correct output. When I instead send two or more requests at once (on different threads), some will come back correctly while others yield an incorrect response.
With CACHE_SIZE set to 1 and THREAD_COUNT set to 1, every request returns 8035 rows. With THREAD_COUNT set to 2 and CACHE_SIZE set to 1, the same query sometimes returns ~5000 rows. Is there some other knob I can try to turn?
It seems like something in the RedisGraph code may be caching a matrix or node set but it’s getting cut off or evicted. Any help would be much appreciated!
OK, if you’re a Redis Enterprise customer, you can share it via our customer service.
Another option might be to alter or simplify the query to the point where it still reproduces the issue but is OK to share.
Apologies for the delay. We have now fully anonymized our data so we can share the query.
Script to run the query (test.py):
import redis
from redis.commands.graph import Graph
import sys
import random
import string
uid = ''.join(random.choices(string.digits, k=5))
r = redis.Redis(unix_socket_path='/test_socket.sock')
g = Graph(r, name="tg")
q = """
MATCH (r :N1 {path: $path_""" + uid + """})
-[:N1PARENT*0..99]->(fr :N1)
<-[:N1PARENT*0..]-(:N1)
<-[e :N2_TO_N1]-(fi)
<-[ :N3_TO_N2]-(:N3 {name: $name})
RETURN DISTINCT fr, fi
"""
params = {"name": "test", f"path_{uid}": "/"}
def run(name):
    res = g.query(q, params)
    print(name, len(res.result_set))
run(sys.argv[1])
To trigger the bug, we need to launch several concurrent requests. Here’s a simple driver script to do that (test.sh):
#!/bin/bash
for i in {1..50}
do
    python test.py "r$i" &
done
wait < <(jobs -p)
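For anyone who would rather see the row counts tallied than read 50 interleaved print lines, here is a minimal sketch of a threaded variant of the driver. It is not part of the original scripts: tally_row_counts, is_consistent, and the run_query callable are hypothetical names, and it uses threads in one process instead of 50 separate python processes, so it is an assumption that it triggers the bug the same way.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tally_row_counts(run_query, workers=50):
    """Call run_query() from `workers` threads and count each distinct result.

    run_query is a hypothetical zero-argument callable that returns the
    number of rows one query produced.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_query) for _ in range(workers)]
        return Counter(f.result() for f in futures)


def is_consistent(counts):
    """True when every request returned the same number of rows."""
    return len(counts) <= 1
```

With the names from test.py, run_query could be something like lambda: len(g.query(q, params).result_set), ideally generating a fresh uid per call so the query cache is still bypassed. A tally with more than one key (for example 8035 alongside 5020) would indicate the inconsistency described above.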
Graph building with dummy data:
In case it helps, here is our configuration file:
A few notes:
Sorry about the mix of raw code & pastebin links; as a new user I was only allowed to have at most 2 links in my post
The strange variable mangling with uid in test.py is only done to circumvent any caching that Redis would otherwise do for our queries
To make the bug easier to reproduce, I added THREAD_COUNT 2 to our config file, but we can trigger it with our full dataset quite easily on a higher thread count as well
With these scripts, on RedisGraph 2.10.4, we can consistently trigger the bug every single run
Please let me know if there is anything else I can provide to make debugging easier. Thank you!
Would you mind sharing the output of the MONITOR command while your test is being conducted? Simply run redis-cli MONITOR prior to the test execution.
Thank you,
Can you share additional details on how you’ve obtained RedisGraph V2.10.4?
Did you build it from source yourself? Are you using Docker to run it?
What is the underlying hardware used to run the server?
We are using a singularity container with some basic scaffolding for python development on CentOS 7 to run RedisGraph. Just to verify that singularity itself is not the source of the problem, I similarly deployed using the podman container runtime and was able to observe the same bug.
We are building Redis from source (from https://download.redis.io/redis-stable.tar.gz) and have copied the redisgraph.so from /usr/lib/redis/modules/redisgraph.so in the docker://redisfab/redisgraph:2.10.4-x64-centos7 image.
Is the docker image you’re using to test this issue publicly available?
If so, can you please provide a link to it?
You’ve mentioned that you’re building Redis from source (unlike RedisGraph, which you’re copying from redisfab/redisgraph:2.10.4). Is the compilation done on the same machine type you’re running your tests on?
I want to make sure the same OS and the same architecture are used for both building and running Redis and RedisGraph.
I’m not familiar with Singularity.
Can you please explain this command: cp /usr/lib/redis/modules/redisgraph.so /
The redisfab/redisgraph docker container already comes with a prebuilt redisgraph module. Are you copying a different RedisGraph module, built on a different system, into testcontainer?
Apologies for the confusion, this is just me copying that redisgraph.so to the root directory as the Redis config I posted earlier assumes this location:
Good news, I’m able to reproduce the issue using the redisfab/redisgraph:2.10.4-x64-centos7 container given the CACHE_SIZE 1 and THREAD_COUNT 2 configuration.
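When reproducing with a specific configuration like this, it can help to confirm what the running server actually reports. Below is a minimal sketch, assuming a redis-py style client with an execute_command method and assuming that GRAPH.CONFIG GET replies with a two-element [name, value] array; effective_settings is a hypothetical helper name, not something from the scripts in this thread.

```python
def effective_settings(client, names=("THREAD_COUNT", "CACHE_SIZE")):
    """Return {name: value} as reported by GRAPH.CONFIG GET for each name.

    `client` is any object exposing execute_command(*args), such as a
    redis.Redis instance.
    """
    settings = {}
    for name in names:
        # Assumption: the reply looks like [name, value].
        reply = client.execute_command("GRAPH.CONFIG", "GET", name)
        settings[name] = int(reply[1])
    return settings
```

Called with the client from test.py, as in effective_settings(r), this would be expected to report THREAD_COUNT as 2 and CACHE_SIZE as 1 for the setup described above, if the module was loaded with those arguments.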