Inconsistent Results with similar results and similar queries (Exact, prefix and Fuzzy search over multiple terms)

jacksoncalvert · August 17, 2023, 7:24am

Hey Community,

I am using RediSearch in a system that has about 70 TEXT fields of varying weights. I'm trying to create a query that combines exact, prefix and fuzzy matching, each with a different weight, so that results are prioritised by the way they matched.

I'm seeing that, when all my documents are loaded into Redis, queries with slightly different search terms produce different scores when they should produce the same score.

An example is as follows.
When creating the schema I have this (there is a lot more, but I am keeping this simple):

> ft.CREATE myIndex on hash prefix 1 "idx:" SCHEMA suburb TEXT WEIGHT 10

Inserting data is as follows

> hset idx:segment:bay suburb "Bayswater"
> hset idx:segment:bay2 suburb "Bayswater North"
> hset idx:suburb:bay1 suburb bays
> hset idx:suburb:bay3 suburb bass

And then querying. Here is the explaincli first

> ft.explaincli myIndex "((((bays => { $weight: 71; })|(bays* => { $weight: 31; }))|%bays%))" SCORER DISMAX VERBATIM  withscores
1) UNION {
2)   bays => {$weight: 71;}
3)   PREFIX{bays*} => { $weight: 31; }
4)   FUZZY{bays}
5) }
6)

> ft.search myIndex "((((bays => { $weight: 71; })|(bays* => { $weight: 31; }))|%bays%))" SCORER DISMAX VERBATIM  withscores
 1) (integer) 4
 2) "idx:suburb:bay1"
 3) "710"
 4) 1) "suburb"
    2) "bays"
 5) "idx:segment:bay2"
 6) "310"
 7) 1) "suburb"
    2) "Bayswater North"
 8) "idx:segment:bay"
 9) "310"
10) 1) "suburb"
    2) "Bayswater"
11) "idx:suburb:bay3"
12) "10"
13) 1) "suburb"
    2) "bass"
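
To make the expected numbers explicit, here is a rough Python sketch of the scoring I have in mind. It is not how RediSearch is implemented: it assumes the score of a document is simply the largest matching multiplier times the field weight (which is what the output above looks like), that fuzzy matching with %term% means a Levenshtein distance of at most 1, and that tokens are split on whitespace. The function names are made up for illustration.

```python
FIELD_WEIGHT = 10  # WEIGHT of the suburb field in the schema above
MULTIPLIERS = {"exact": 71, "prefix": 31, "fuzzy": 1}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_kinds(term, text):
    """Return the set of ways `term` matches any token of `text`."""
    kinds = set()
    for token in text.lower().split():
        if token == term:
            kinds.add("exact")
        if token.startswith(term):
            kinds.add("prefix")
        if levenshtein(token, term) <= 1:  # assumption: %term% allows distance 1
            kinds.add("fuzzy")
    return kinds


def expected_score(term, text, field_weight=FIELD_WEIGHT):
    """Assumed model: best matching multiplier times the field weight."""
    kinds = match_kinds(term, text)
    if not kinds:
        return 0
    return max(MULTIPLIERS[k] for k in kinds) * field_weight


if __name__ == "__main__":
    for suburb in ["bays", "Bayswater North", "Bayswater", "bass"]:
        print(suburb, expected_score("bays", suburb))
```

Under this model a prefix match should score 310 no matter how many characters of the term have been typed, which is what I expect from the real index as well.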

What I am trying to do with this is separate the results based on how they matched. If a term matches exactly, I apply a 71x multiplier to it; a partial (prefix) match gets 31x and a fuzzy match gets 1x. This is so you can add additional search terms and rank results based on how each one matched.
For example, two exact matches are better than an exact and a partial.
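
For reference, this is roughly how the query string gets assembled per search term, as a minimal Python sketch. The helper names are hypothetical, joining several terms with a space (so every term has to match) is an assumption for illustration, and escaping of special characters in the terms is not handled.

```python
def build_term_clause(term, exact_weight=71, prefix_weight=31):
    """Union of exact, prefix and fuzzy clauses for a single term."""
    exact = f"({term} => {{ $weight: {exact_weight}; }})"
    prefix = f"({term}* => {{ $weight: {prefix_weight}; }})"
    fuzzy = f"%{term}%"  # fuzzy clause keeps the default weight of 1
    return f"(({exact}|{prefix})|{fuzzy})"


def build_query(terms):
    """Join the per-term clauses; assumes a space between clauses is wanted."""
    return " ".join(build_term_clause(t) for t in terms)


if __name__ == "__main__":
    print(build_query(["bays"]))
    print(build_query(["bays", "north"]))
```

The single-term output is the same expression as in the ft.search call above, minus the redundant outer pair of parentheses.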

The issue is that, when I do this at a large scale, additional results come back without the score modification we see here; they come back at just 1x the field weight, which pushes a lot of results to the bottom.

Here is an example in a development environment with all data loaded into RediSearch.
I ran two searches. The first, for ‘bays’, produces this result:

"data": {
    "getSearchEngineResults": {
      "results": [
        {
          "objectName": "Bayswater",
          "objectDescription": "LOCALITY",
          "objectType": "segment",
          "score": 496
        },
        {
          "objectName": "Bayswater North",
          "objectDescription": "LOCALITY",
          "objectType": "segment",
          "score": 310
        },
        {
          "objectName": "Bass",
          "objectDescription": "LOCALITY",
          "objectType": "segment",
          "score": 16
        },
        {
          "objectName": "Pioneer Bay",
          "objectDescription": "LOCALITY",
          "objectType": "segment",
          "score": 10
        }
      ]
    }
}

Which is as expected.
Next, searching for “baysw”, which only adds one character:

"data": {
    "getSearchEngineResults": {
      "results": [
        {
          "objectName": "Bayswater",
          "objectDescription": "LOCALITY",
          "objectType": "segment",
          "score": 496
        },
        {
          "objectName": "Bayswater North",
          "objectDescription": "LOCALITY",
          "objectType": "segment",
          "score": 10
        }
      ]
    }
  }

While we are seeing the correct documents being returned, the score for ‘Bayswater North’ is way off (10 instead of 310), and the multiplier that was present in the previous search is not being applied.

I previously thought it was due to term expansion during exact matching, so I added VERBATIM. However, this has not helped the scoring.

This seems to happen consistently once the search term is more than half the length of the matched term.
For ‘Bayswater’ (9 characters):
‘Bays’ is 4 characters, less than half of the matched term, and works.
‘Baysw’ is 5 characters, more than half, and doesn't work.

I can replicate this with ‘Fit’ and ‘Fitz’ when looking for a suburb called ‘Fitzroy’. It is the same case as above, just replace ‘bayswater’ with ‘fitzroy’.
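
To spell out the arithmetic behind that observation, a small Python sketch (the function name is just for illustration) computes how much of the matched term each search term covers:

```python
def prefix_fraction(query, term):
    """Fraction of the matched term covered by the search term."""
    return len(query) / len(term)


PAIRS = [
    ("bays", "bayswater"),
    ("baysw", "bayswater"),
    ("fit", "fitzroy"),
    ("fitz", "fitzroy"),
]

if __name__ == "__main__":
    for query, term in PAIRS:
        frac = prefix_fraction(query, term)
        side = "over" if frac > 0.5 else "under"
        print(f"{query!r} vs {term!r}: {len(query)}/{len(term)} = {frac:.2f} ({side} half)")
```

In both cases the search term that still scores correctly is under half the length of the matched term, and the one that loses the multiplier is over half. This is only a pattern I have noticed in these two examples, not a confirmed rule.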

Any advice on this would be appreciated.

About the system
We have 3.5 million documents, each using a different set of TEXT fields.
We are on the old RediSearch 2.0.3 image.

The only changes to the base config are:

MAXPREFIXEXPANSIONS: 1000
MAXDOCTABLESIZE: 4000000
