I’ve been experiencing memory problems when running RedisGraph for long periods of time.
What tools can I use to diagnose and try to find the source of this?
The amount of data in the graph is not changing much (although I am deleting and creating a lot every few minutes). However, I see the memory used by redis-server increase steadily from 200MB to 8GB over 2 days.
I am quickly deleting graphs and then recreating them with redisgraph-bulk-loader. When we call GRAPH.DELETE, does that delete the graph entirely, or does it queue an operation to do that?
There is no RedisGraph-specific tooling for introspecting on memory usage; I personally use Valgrind.
In some areas, deleting entities from a graph bookmarks space for reuse instead of freeing it outright, so if your deletions far outpace your creations, you might see increased memory consumption! I doubt that would account for a 7GB increase, however.
By default, GRAPH.DELETE asynchronously marks a graph for deletion, so it’s possible that there are leaks in a workflow that calls GRAPH.DELETE then overwrites the graph key with the bulk loader. This behavior can be overridden by building RedisGraph with the command make clean && make memcheck. If successful, the server log should include a line like:
156483:M 01 Feb 2021 13:13:51.022 * <graph> Graph deletion will be done synchronously.
But this approach is only recommended for debugging.
I used synchronous graph deletions and then ran Valgrind, and I think there might be a leak in the GraphBLAS matrix resize? Or maybe this is how it’s supposed to run and I’m misunderstanding.
Although the blocks are still reachable, every bulk insert results in higher used memory (I saw this by running INFO in redis-cli).
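The before/after INFO comparison mentioned here can be scripted so the growth per bulk insert is recorded rather than eyeballed. The following is a minimal sketch, assuming you capture the text of INFO memory (for example from redis-cli) before and after each load; the helper names parse_info and used_memory_delta and the sample byte counts are made up for illustration, while used_memory is a standard field in the INFO reply.

```python
def parse_info(text):
    """Parse the text of a Redis INFO reply into a dict of string fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def used_memory_delta(before_text, after_text):
    """Return the change in used_memory (bytes) between two INFO snapshots."""
    before = int(parse_info(before_text)["used_memory"])
    after = int(parse_info(after_text)["used_memory"])
    return after - before


if __name__ == "__main__":
    # Hypothetical snapshots; real ones would come from `redis-cli INFO memory`.
    before = "# Memory\r\nused_memory:209715200\r\nused_memory_rss:220000000\r\n"
    after = "# Memory\r\nused_memory:214958080\r\nused_memory_rss:226000000\r\n"
    print("used_memory grew by", used_memory_delta(before, after), "bytes")
```

Logging this delta after every delete-and-reload cycle makes it easier to tell steady growth per cycle from a one-time jump.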
I ran Redis under Valgrind (with synchronous graph deletions) and ran my program. I stopped after a few minutes and looked at the log; there were a lot of entries that referred to GrB_Matrix_resize:
==1061== 1,824 bytes in 6 blocks are possibly lost in loss record 226 of 233
==1061== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1061== by 0x40149CA: allocate_dtv (dl-tls.c:286)
==1061== by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==1061== by 0x49AF322: allocate_stack (allocatestack.c:622)
==1061== by 0x49AF322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==1061== by 0x702EDDA: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==1061== by 0x70268E0: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==1061== by 0x5CF5399: GB_convert_hyper_to_sparse (GB_convert_hyper_to_sparse.c:93)
==1061== by 0x5CF2962: GB_sparse_or_bitmap (GB_conform.c:72)
==1061== by 0x5CF2962: GB_sparse_or_bitmap (GB_conform.c:53)
==1061== by 0x5CF2962: GB_conform (GB_conform.c:357)
==1061== by 0x5C4B6B6: GB_Matrix_wait (GB_Matrix_wait.c:239)
==1061== by 0x5D0E788: GB_resize (GB_resize.c:77)
==1061== by 0x5C38BC6: GrB_Matrix_resize (GrB_Matrix_resize.c:32)
==1061== by 0x5C38BC6: GrB_Matrix_resize (GrB_Matrix_resize.c:12)
==1061== by 0x5B5E4C7: _MatrixSynchronize (graph.c:252)
==1061== by 0x5B61EDC: Graph_GetLabelMatrix (graph.c:1310)
Is this a bug or is this expected?
I can give you the complete valgrind log if that would help further.
To my knowledge, there are no leaks pertaining to GrB_Matrix_resize. GrB_Matrix_resize is called by RedisGraph every time a matrix is retrieved and its dimensions don’t conform to the expectation (X and Y axes being as long as the graph’s node count). This consistency is required to perform traversals with multiplications.
This memory does not get freed until the matrices themselves are freed, which typically means when the entire graph is deleted.
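To illustrate why GrB_Matrix_resize shows up in stacks under matrix getters such as Graph_GetLabelMatrix, here is a toy model of the lazy synchronization described above. It is not RedisGraph's code: the class and method names are invented, and the matrix only tracks its dimensions. It just shows that the resize happens on retrieval, and only when the dimensions no longer match the node count.

```python
class LazyMatrix:
    """Toy stand-in for a label matrix; it only tracks its dimensions."""

    def __init__(self):
        self.nrows = 0
        self.ncols = 0
        self.resize_calls = 0

    def resize(self, n):
        # A real resize would reallocate storage; here we only count calls.
        self.nrows = self.ncols = n
        self.resize_calls += 1


class ToyGraph:
    """Hypothetical graph that resizes its matrix lazily, on retrieval."""

    def __init__(self):
        self.node_count = 0
        self.label_matrix = LazyMatrix()

    def add_nodes(self, n):
        # Creating nodes does not touch the matrix.
        self.node_count += n

    def get_label_matrix(self):
        m = self.label_matrix
        # Resize only if either axis disagrees with the node count.
        if m.nrows != self.node_count or m.ncols != self.node_count:
            m.resize(self.node_count)
        return m


if __name__ == "__main__":
    g = ToyGraph()
    g.add_nodes(3)
    g.add_nodes(2)
    m = g.get_label_matrix()
    print(m.nrows, m.ncols, m.resize_calls)
```

In this model, two batches of node creation followed by one retrieval cause a single resize, and a second retrieval with no new nodes causes none.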
When I stopped Redis and captured this log, every graph had already been deleted synchronously.
RedisGraph still held references (possibly lost, according to Valgrind) to memory allocated under GrB_Matrix_resize even after all the graphs were deleted.
From this summary, many of the possibly lost bytes are allocated by GrB_Matrix_resize.
To be precise, 7904 bytes on Thread 1, which I assume is the Redis main thread where redisgraph-bulk-loader runs.
==1061== LEAK SUMMARY:
==1061== definitely lost: 0 bytes in 0 blocks
==1061== indirectly lost: 0 bytes in 0 blocks
==1061== possibly lost: 20,976 bytes in 69 blocks
==1061== still reachable: 57,610 bytes in 515 blocks
==1061== suppressed: 0 bytes in 0 blocks
==1061== Reachable blocks (those to which a pointer was found) are not shown.
==1061== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1061==
==1061== For lists of detected and suppressed errors, rerun with: -s
==1061== ERROR SUMMARY: 58 errors from 18 contexts (suppressed: 0 from 0)
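Totals like the per-function byte count quoted above can be tallied from the log with a short script rather than by hand. This is a rough sketch, assuming the log uses the simple loss-record header format shown earlier in this thread; headers of the form "N (X direct, Y indirect) bytes" are not handled, and the function names loss_records and bytes_through are hypothetical.

```python
import re
from collections import defaultdict

# Header of a loss record, e.g. "1,824 bytes in 6 blocks are possibly lost in loss record ..."
HEADER_RE = re.compile(
    r"^==\d+==\s+([\d,]+) bytes in [\d,]+ blocks are "
    r"(definitely lost|indirectly lost|possibly lost|still reachable) in loss record"
)
# Stack frame, e.g. "by 0x5C38BC6: GrB_Matrix_resize (GrB_Matrix_resize.c:32)"
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+)")
# A bare "==pid==" line separates records.
BLANK_RE = re.compile(r"^==\d+==\s*$")


def loss_records(log_text):
    """Split a Valgrind log into loss records with their stack frame names."""
    records = []
    current = None
    for line in log_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {
                "bytes": int(header.group(1).replace(",", "")),
                "kind": header.group(2),
                "frames": [],
            }
            records.append(current)
        elif BLANK_RE.match(line):
            current = None
        elif current is not None:
            frame = FRAME_RE.match(line)
            if frame:
                current["frames"].append(frame.group(1))
    return records


def bytes_through(records, function):
    """Sum bytes per leak kind for records whose stack includes `function`."""
    totals = defaultdict(int)
    for record in records:
        if function in record["frames"]:
            totals[record["kind"]] += record["bytes"]
    return dict(totals)


if __name__ == "__main__":
    sample = "\n".join([
        "==1061== 1,824 bytes in 6 blocks are possibly lost in loss record 226 of 233",
        "==1061== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)",
        "==1061== by 0x5D0E788: GB_resize (GB_resize.c:77)",
        "==1061== by 0x5C38BC6: GrB_Matrix_resize (GrB_Matrix_resize.c:32)",
        "==1061==",
    ])
    print(bytes_through(loss_records(sample), "GrB_Matrix_resize"))
```

Running it over the complete log, once per function of interest, shows how much of each leak kind passes through that function.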