Hey folks, RedisGraph seems to be a great solution for some of our runtime metadata lookup needs. The query performance for finding indirect relationships between nodes in our graphs has been great: often sub-millisecond, and usually under 2-3 milliseconds. That part is pretty amazing.
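For context, our lookups are along these lines (the Artifact label, id property, and DEPENDS_ON relationship type are made up for illustration; our real schema differs):

```
127.0.0.1:6379> GRAPH.QUERY SourceSets "MATCH (a:Artifact {id: 'abc'})-[:DEPENDS_ON*1..3]->(b:Artifact) RETURN b.id"
```

The variable-length pattern (`*1..3`) is what I mean by indirect relationships.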
The part that is less than amazing: since we added the RedisGraph module to our Redis services (one master, two replicas), it has become much, much slower to recover/restart and become accessible. We are using append-only (AOF) persistence, so if a Redis node in Kubernetes goes down, nothing should be lost. We are also using Amazon spot instances, which periodically drop, but thanks to our infrastructure, replacements are immediately spun up and attached to the prior storage.
When our Redis held ONLY key/values, it would take just 10-15 seconds on startup to traverse its entire append-only file and be ready for queries. Now, having added graphs, it takes several minutes (the graph writes we do take roughly 15 ms each to execute, as each involves several merges and a delete). We've loaded about 47,000 nodes into a proof-of-concept graph, and replaying all those graph statements from the AOF file takes many minutes on startup. That's a definite challenge, and I am wondering what we can do to speed up recovery times. If we switch to snapshots only, I suspect it will recover faster: in a couple of tests it was queryable within 45 seconds using RDB instead of AOF. We could switch to that, but we've stuck with AOF because we don't want to drop any transactions.
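One thing we're going to try, in case it helps anyone else: Redis can write AOF rewrites with an RDB preamble, which should make replaying the file on startup closer to RDB load speed while keeping AOF durability. A redis.conf sketch (this assumes Redis 4.0+; `aof-use-rdb-preamble` defaults to yes on recent versions, so it's mostly worth confirming it hasn't been disabled):

```
appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes
```

With the preamble enabled, only the commands issued since the last rewrite get replayed one by one, so triggering BGREWRITEAOF periodically should also keep the replay tail short.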
Another MADDENING issue I've seen with the redislabs/redisgraph Docker image: even running the data locally on my laptop, it will never be ready for queries and keeps showing "module is currently replicating". The one I am running on my desktop doesn't even have replicas (?!). I also launched the image in our Kubernetes cluster, and the logs say the master and replicas are in good states, but a graph query, even a read-only one, reports "module is currently replicating" hours after I spun them up.
I see one other post on this forum about the replicating issue, but that is IT; I cannot find anywhere else discussing that error, and while that poster posted a solution, it's not at all obvious what the solution is or what caused the original issue.
I'm trying really hard to make a case for RedisGraph to my customer, but stuff like this really puts a damper on my advocacy. We never had an issue with Redis replicas before adding the RedisGraph module. I need it to be stable, and I need to be able to use replicas without them getting stuck in a "replicating" state.
Any deeper insight on how to improve startup times and make replicas responsive again is deeply appreciated. Thanks! -Michael
Update:
Yeah, the replicas have been up for over 24 hours and never get into a state where they are "ready" for RedisGraph. They can serve other key/value data just fine…
$ redis-cli
127.0.0.1:6379> GRAPH.QUERY SourceSets "MATCH (n) RETURN n LIMIT 10"
(error) READONLY You can't write against a read only replica.
127.0.0.1:6379> GRAPH.RO_QUERY SourceSets "MATCH (n) RETURN n LIMIT 10"
(error) ERR RedisGraph module is currently replicating
Looking at the Redis log output, this is all I've seen since we brought the replica online:
1:S 26 Feb 2021 17:44:19.121 * Connecting to MASTER our-redis-master.svc.cluster.local:6379
1:S 26 Feb 2021 17:44:19.122 * MASTER <-> REPLICA sync started
1:S 26 Feb 2021 17:44:19.122 * Non blocking connect for SYNC fired the event.
1:S 26 Feb 2021 17:44:19.125 * Master replied to PING, replication can continue...
1:S 26 Feb 2021 17:44:19.127 * Partial resynchronization not possible (no cached master)
1:S 26 Feb 2021 17:44:19.130 * Full resync from master: e9c2d80c2ac20bb99606c56c554a2bf24958b1e8:0
1:S 26 Feb 2021 17:44:19.515 * MASTER <-> REPLICA sync: receiving 13937159 bytes from master to disk
1:S 26 Feb 2021 17:44:19.594 * MASTER <-> REPLICA sync: Flushing old data
1:S 26 Feb 2021 17:44:19.594 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 26 Feb 2021 17:44:19.625 * Loading RDB produced by version 6.0.11
1:S 26 Feb 2021 17:44:19.625 * RDB age 0 seconds
1:S 26 Feb 2021 17:44:19.625 * RDB memory usage when created 54.74 Mb
1:S 26 Feb 2021 17:44:20.225 * <graph> Done decoding graph SourceSets
1:S 26 Feb 2021 17:44:20.225 * <graph> Done decoding graph Tests
1:S 26 Feb 2021 17:44:55.836 * <graph> Done decoding graph SourceSets
1:S 26 Feb 2021 17:44:55.836 * MASTER <-> REPLICA sync: Finished with success
1:S 26 Feb 2021 17:44:55.837 * Background append only file rewriting started by pid 201
201:C 26 Feb 2021 17:44:55.838 * <graph> Created 1 virtual keys for graph SourceSets
201:C 26 Feb 2021 17:44:55.838 * <graph> Created 0 virtual keys for graph Tests
201:C 26 Feb 2021 17:44:56.096 * <graph> Done encoding graph SourceSets
201:C 26 Feb 2021 17:44:56.267 * <graph> Done encoding graph Tests
201:C 26 Feb 2021 17:44:56.351 * <graph> Done encoding graph SourceSets
1:S 26 Feb 2021 17:44:56.462 * AOF rewrite child asks to stop sending diffs.
201:C 26 Feb 2021 17:44:56.462 * Parent agreed to stop sending diffs. Finalizing AOF...
201:C 26 Feb 2021 17:44:56.462 * Concatenating 0.00 MB of AOF diff received from parent.
201:C 26 Feb 2021 17:44:56.462 * SYNC append only file rewrite performed
201:C 26 Feb 2021 17:44:56.462 * <graph> Deleted 1 virtual keys for graph SourceSets
201:C 26 Feb 2021 17:44:56.463 * <graph> Deleted 0 virtual keys for graph Tests
201:C 26 Feb 2021 17:44:56.465 * AOF rewrite: 2 MB of memory used by copy-on-write
1:S 26 Feb 2021 17:44:56.540 * Background AOF rewrite terminated with success
1:S 26 Feb 2021 17:44:56.540 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
1:S 26 Feb 2021 17:44:56.540 * Background AOF rewrite finished successfully
I’ve seen this behavior with Redis 6.0.5 and 6.0.11 now… :-/
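For anyone who wants to dig in, here is what I plan to check next on the stuck replica. As far as I know these are plain Redis commands, nothing RedisGraph-specific; the interesting fields are master_link_status and master_sync_in_progress, which should tell us whether Redis itself still thinks a sync is running:

```
127.0.0.1:6379> INFO replication
127.0.0.1:6379> INFO persistence
```

If Redis reports master_link_status:up and master_sync_in_progress:0 while the module still refuses GRAPH.RO_QUERY, that would point at state held inside the module rather than the replication link itself.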