Unexpected count() behaviour in "case when" clause

Hi,

I’m getting unexpected results when using count() in case when clause in conjunction with exists().
Example:
Create 1 Author and 4 Book nodes.
Create 4 WROTE relations with year property. One of the relations does not have year.

graph.query test "CREATE
  (kavish:Author {name: 'Kavish'}),
  (book1:Book {name: 'book1'}),
  (book2:Book {name: 'book2'}),
  (book3:Book {name: 'book3'}),
  (book4:Book {name: 'book4'}),
  (kavish)-[:WROTE{year:2001}]->(book1),
  (kavish)-[:WROTE{year:2002}]->(book2),
  (kavish)-[:WROTE{year:2003}]->(book3),
  (kavish)-[:WROTE]->(book4)"

Count of all relations

graph.query test "match (a:Author{name:'Kavish' })-[r:WROTE]->(n) return count(r) as cnt"

this returns 4 as expected.

However when I try to count the relations which have year property set with query:

graph.query test "match (a:Author{name:'Kavish' })-[r:WROTE]->(n) return case when exists(r.year) then count(r) else 0 end as cnt"

this is returning 0 as opposed to 3.

1) 1) "cnt"
2) 1) 1) "0"

The following query returns 3 true and 1 false values as expected

graph.query test "match (a:Author{name:'Kavish'})-[r:WROTE]->(n) return exists(r.year)"
1) 1) "exists(r.year)"
2) 1) 1) "true"
   2) 1) "true"
   3) 1) "true"
   4) 1) "false"

but somehow the count() is not returning correct value.

I got it working using sum() instead of count() as:

graph.query test "match (a:Author{name:'Kavish' })-[r:WROTE]->(n) return sum(case when exists(r.year) then 1 else 0 end) as cnt"
1) 1) "cnt"
2) 1) 1) "3"

however this won’t work when I want to get distinct counts.
Could someone please help me on this? Or point out if I’m missing anything.

Thanks in advance

I’ll suggest refraining from mixing none aggregated functions with aggregation functions

Please try the following:

GRAPH.QUERY test "MATCH (a:Author{name:'Kavish' })-[r:WROTE]->(n)
WHERE exists(r.year)
RETURN count(distinct r)"

Thanks. Yeah applying where works for the above simple query. But if I have a long list of columns in the RETURN statement with some being selected based on some conditions (hence case when), then I guess we can’t use where in the outer query.

Let’s say for above example there can be multiple WROTE relations between same Author and Book node to imply author worked on the book across years.

graph.query test "CREATE
  (kavish:Author {name: 'Kavish'}),
  (book1:Book {name: 'book1'}),
  (book2:Book {name: 'book2'}),
  (book3:Book {name: 'book3'}),
  (book4:Book {name: 'book4'}),
  (kavish)-[:WROTE{year:2001}]->(book1),
  (kavish)-[:WROTE{year:2002}]->(book1),
  (kavish)-[:WROTE{year:2003}]->(book1),
  (kavish)-[:WROTE{year:2002}]->(book2),
  (kavish)-[:WROTE{year:2003}]->(book3),
  (kavish)-[:WROTE{year:2004}]->(book3),
  (kavish)-[:WROTE]->(book4)"

Now if I want to get list of all book name for the author and number of years he worked on the particular book and if information is not available on the year property then return say -1. Then I’ll need some query of the following form I guess, where I’ll have to combine exists() and count()

graph.query test "match (a:Author{name:'Kavish' })-[r:WROTE]->(b) 
    return b.name,
    case when exists(r.year) then count(r) else -1 end as num_years"

with expected output as

1) 1) "b.name"
   2) "num_years"
2) 1) 1) "book4"
      2) "-1"
   2) 1) "book2"
      2) "1"
   3) 1) "book1"
      2) "3"
   4) 1) "book3"
      2) "2"

however it currently returns -1 for all books

1) 1) "b.name"
   2) "num_years"
2) 1) 1) "book4"
      2) "-1"
   2) 1) "book2"
      2) "-1"
   3) 1) "book1"
      2) "-1"
   4) 1) "book3"
      2) "-1"

If I apply where before then book4 isn’t returned at all since it is already filtered

graph.query test "match (a:Author{name:'Kavish' })-[r:WROTE]->(b) 
    WHERE exists(r.year)
    return b.name, count(r)"
1) 1) "b.name"
   2) "count(r)"
2) 1) 1) "book2"
      2) "1"
   2) 1) "book1"
      2) "3"
   3) 1) "book3"
      2) "2"

Actually we have a requirement of creating large graph and certain query patterns which will be executed on that. We got the graph created and are exploring running the queries. Many of our queries are using case when, distinct and aggregations in conjunction which seems to be returning irregular results (correct result with sum() but incorrect with count()) so was wondering if it’s a known issue.

Is that the query you’re after:

127.0.0.1:6379> GRAPH.QUERY test "match (a:Author{name:'Kavish' })-[r:WROTE]->(b) RETURN b.name, collect(case when exists(r.year) then r.year else -1 end)"
1) 1) "b.name"
   2) "collect(case when exists(r.year) then r.year else -1 end)"
2) 1) 1) "book2"
      2) "[2002]"
   2) 1) "book4"
      2) "[-1]"
   3) 1) "book3"
      2) "[2004, 2003]"
   4) 1) "book1"
      2) "[2003, 2002, 2001]"
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 1.830709 milliseconds"

Generally speaking I’ll suggest using the WITH clause as an intermediate step collecting data leading into aggregations.

1 Like

Ok thanks, will explore on re-structuring our queries using WITH. Had one more quick question which is regarding full text indexes. The default scoring function in RediSearch in TF-IDF. It supports overriding the default using the SCORER attribute (Scoring | Redis). But it is added outside of the query string:

FT.SEARCH myIndex "foo" SCORER BM25

So I assume it is not possible to override it when making queryNodes call through redisgraph… and trying to change the default in RediSearch should be the only way? Kindly confirm on this.

Looking at the docs it doesn’t seems like RedisGraph full-text index can change the underline index (RediSearch) scoring function. sorry.

1 Like