RedisGraph 72k nodes, 400k relationships - slow query performance help

abe · April 9, 2021, 8:52pm

Hello,

I’ve got what I think is a fairly simple graph of Person nodes tagged with an id attribute and some nodes optionally tagged with a special attribute. There is a :contributesTo self-relation between Person nodes.

Here are two Cypher files for loading the nodes and relations:

I’m trying to find all the Person nodes who ultimately have an ancestral :contributesTo relationship to another Person with the special attribute set, decided recursively (so if a person contributes to a non-special person, but that person then contributes to a “special” person, that initial person would still show up in the result).

So I wrote the query like this:

MATCH (d)-[:contributesTo*1..]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

But that just seems to hang indefinitely.

So then I tested specific cutoff depths.

This took 2039.9ms according to SLOWLOG:

MATCH (d)-[:contributesTo*0..1]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

This took 6712ms:

MATCH (d)-[:contributesTo*0..2]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

And this took 6.4243e+05ms:

MATCH (d)-[:contributesTo*0..3]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

So it’s obviously getting slower with increasing depth.

I do have indexes on the id and special node attributes:

CREATE INDEX ON :Person(id)
CREATE INDEX ON :Person(special)

But I’m pretty new to graph databases so not sure what to do next to make this faster. Any help would be greatly appreciated!

Edit: I just realized my queries weren’t using labels so I tried that -

MATCH (d:Person)-[:contributesTo*1..]->(r:Person) WHERE r.special = 'T' RETURN DISTINCT d.id

but it doesn’t seem to have made a perceptible difference.

Edit2: Forgot to mention this is on Redis 6.2.1, RedisGraph compiled from git SHA 92046602a775dcba2396d493597b60043f0e39bb, on an Ubuntu 20.04 machine with 32GB RAM and an Intel i7 CPU. Also, the queries are still slow without using DISTINCT.

jeffreylovitz · April 12, 2021, 1:43pm

Hi @abe,

The bulk of your query’s time will be spent in expanding paths in the variable-length traversal. Given your sample graph, this is a rather large job! You have 13,731 source nodes to consider. When traversing 1 edge this expands to 152,291 destinations, and when traversing up to 2 edges this expands to 9,264,429 destinations (and so forth):

GRAPH.QUERY Donations "MATCH (d:Person)-[:contributesTo*..1]->(r:Person) WHERE r.special = 'T' RETURN COUNT(d)"
1) 1) "COUNT(d)"
2) 1) 1) (integer) 152291
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 117.677467 milliseconds"

GRAPH.QUERY Donations "MATCH (d:Person)-[:contributesTo*..2]->(r:Person) WHERE r.special = 'T' RETURN COUNT(d)"
1) 1) "COUNT(d)"
2) 1) 1) (integer) 9264429
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 59246.093526 milliseconds"

To improve performance, it would be best to introduce additional constraints to reduce the search space of your queries.

Topic		Replies	Views
RedisGraph 200+M nodes, 1B+ relationships, slow subgraph query performance RedisGraph redisgraph	1	921	October 20, 2021
How Jredisgraph is slower RedisGraph	5	1108	June 15, 2020
Recursive query on Graph RedisGraph	4	1178	December 24, 2020
RedisGraph "Recommended Products" query has a huge execution time RedisGraph	5	774	June 23, 2022
Query variable length relationships with filter RedisGraph	5	1151	February 15, 2021

RedisGraph 72k nodes, 400k relationships - slow query performance help

Related topics