Hello,
I’ve got what I think is a fairly simple graph of Person
nodes tagged with an id
attribute and some nodes optionally tagged with a special
attribute. There is a :contributesTo
self-relation between Person
nodes.
Here are two Cypher files for loading the nodes and relations:
- https://gist.github.com/abevoelker/6aff5b3bec49968c82e877b3943f75b3/raw/a4a4db68b9533f295e988dec8fc5e07fcf23cccd/nodes.cypher
- https://gist.github.com/abevoelker/6aff5b3bec49968c82e877b3943f75b3/raw/a4a4db68b9533f295e988dec8fc5e07fcf23cccd/relations.cypher
I’m trying to find all the Person nodes who ultimately have an ancestral :contributesTo
relationship to another Person
with the special
attribute set, decided recursively (so if a person contributes to a non-special person, but that person then contributes to a “special” person, that initial person would still show up in the result).
So I wrote the query like this:
MATCH (d)-[:contributesTo*1..]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id
But that just seems to hang indefinitely.
So then I tested specific cutoff depths.
This took 2039.9ms according to SLOWLOG:
MATCH (d)-[:contributesTo*0..1]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id
This took 6712ms:
MATCH (d)-[:contributesTo*0..2]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id
And this took 6.4243e+05ms:
MATCH (d)-[:contributesTo*0..3]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id
So it’s obviously getting slower with increasing depth.
I do have indexes on the id and special node attributes:
CREATE INDEX ON :Person(id)
CREATE INDEX ON :Person(special)
But I’m pretty new to graph databases so not sure what to do next to make this faster. Any help would be greatly appreciated!
Edit: I just realized my queries weren’t using labels so I tried that -
MATCH (d:Person)-[:contributesTo*1..]->(r:Person) WHERE r.special = 'T' RETURN DISTINCT d.id
but it doesn’t seem to have made a perceptible difference.
Edit2: Forgot to mention this is on Redis 6.2.1, RedisGraph compiled from git SHA 92046602a775dcba2396d493597b60043f0e39bb, on an Ubuntu 20.04 machine with 32GB RAM and an Intel i7 CPU. Also, the queries are still slow without using DISTINCT
.