Hey Community,
I am using RediSearch in a system that has about 70 TEXT fields of varying weights. I'm trying to create a query that combines exact, prefix and fuzzy matching, each with a different weight, so that results are prioritised by the way they matched.
What I'm seeing is that, when all my documents are loaded into Redis, queries with slightly different search terms produce different scores when they should produce the same score.
An example is as follows
To create the schema I have this (there is a lot more, but I am keeping it simple):
> ft.CREATE myIndex on hash prefix 1 "idx:" SCHEMA suburb TEXT WEIGHT 10
Inserting data is as follows
> hset idx:segment:bay suburb "Bayswater"
> hset idx:segment:bay2 suburb "Bayswater North"
> hset idx:suburb:bay1 suburb bays
> hset idx:suburb:bay3 suburb bass
And then querying. Here is the explaincli output first:
> ft.explaincli myIndex "((((bays => { $weight: 71; })|(bays* => { $weight: 31; }))|%bays%))" SCORER DISMAX VERBATIM withscores
1) UNION {
2) bays => {$weight: 71;}
3) PREFIX{bays*} => { $weight: 31; }
4) FUZZY{bays}
5) }
6)
> ft.search myIndex "((((bays => { $weight: 71; })|(bays* => { $weight: 31; }))|%bays%))" SCORER DISMAX VERBATIM withscores
1) (integer) 4
2) "idx:suburb:bay1"
3) "710"
4) 1) "suburb"
2) "bays"
5) "idx:segment:bay2"
6) "310"
7) 1) "suburb"
2) "Bayswater North"
8) "idx:segment:bay"
9) "310"
10) 1) "suburb"
2) "Bayswater"
11) "idx:suburb:bay3"
12) "10"
13) 1) "suburb"
2) "bass"
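For anyone who wants to reproduce this with other terms, here is a minimal Python sketch that builds the same query string for a single term. The function name `build_query` and its parameters are my own for illustration, and it assumes the term contains no characters that need escaping in the query syntax.

```python
def build_query(term: str, exact_weight: int = 71, prefix_weight: int = 31) -> str:
    """Build the exact | prefix | fuzzy union used in the post for one term.

    Hypothetical helper: it does no escaping, so it only suits plain
    alphanumeric terms such as "bays".
    """
    exact = f"({term} => {{ $weight: {exact_weight}; }})"
    prefix = f"({term}* => {{ $weight: {prefix_weight}; }})"
    fuzzy = f"%{term}%"  # fuzzy branch keeps the default weight of 1
    return f"((({exact}|{prefix})|{fuzzy}))"


if __name__ == "__main__":
    # Pass the result as the query argument of ft.search / ft.explaincli.
    print(build_query("bays"))
    print(build_query("baysw"))
```

The output for "bays" is the same string as the query used in the commands above, so the only thing changing between the two searches is the term itself.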
What I am trying to do here is separate the results based on how they matched. If a term matches exactly, I apply a 71x multiplier to it; a partial (prefix) match gets 31x and a fuzzy match gets 1x. This lets you add additional search terms and rank results based on how each one matched.
For example, two exact matches should rank above one exact match plus one partial match.
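To spell out the arithmetic I expect, here is a small Python sketch. It assumes the score for a single term is the best matching branch's multiplier times the field weight, which is consistent with the 710 / 310 / 10 scores in the small example above; it is not taken from the RediSearch source. The names are made up for illustration.

```python
FIELD_WEIGHT = 10  # from "suburb TEXT WEIGHT 10" in the schema
MULTIPLIERS = {"exact": 71, "prefix": 31, "fuzzy": 1}  # query-side weights


def expected_score(match_kinds):
    """Expected score for one term, given which branches a document matched.

    Assumption: the union keeps only the highest-weighted matching branch.
    """
    best = max(MULTIPLIERS[kind] for kind in match_kinds)
    return best * FIELD_WEIGHT


if __name__ == "__main__":
    print(expected_score(["exact", "prefix", "fuzzy"]))  # "bays"
    print(expected_score(["prefix"]))                    # "Bayswater"
    print(expected_score(["fuzzy"]))                     # "bass"
```

Under that assumption, a document that only matches the prefix branch should never score below 310 on this field, which is why the 10 in the larger data set looks wrong to me.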
The issue is that, when I do this at a large scale, additional data comes back without the score modification we see here and is scored at just 1x the field weight, pushing a lot of results to the bottom.
Here is an example from a development environment with all data loaded into RediSearch.
Doing two searches, the first, for ‘bays’, produces this result:
"data": {
"getSearchEngineResults": {
"results": [
{
"objectName": "Bayswater",
"objectDescription": "LOCALITY",
"objectType": "segment",
"score": 496
},
{
"objectName": "Bayswater North",
"objectDescription": "LOCALITY",
"objectType": "segment",
"score": 310
},
{
"objectName": "Bass",
"objectDescription": "LOCALITY",
"objectType": "segment",
"score": 16
},
{
"objectName": "Pioneer Bay",
"objectDescription": "LOCALITY",
"objectType": "segment",
"score": 10
},
}
}
This is as expected.
Next, searching for “baysw”, which only adds one character:
"data": {
"getSearchEngineResults": {
"results": [
{
"objectName": "Bayswater",
"objectDescription": "LOCALITY",
"objectType": "segment",
"score": 496
},
{
"objectName": "Bayswater North",
"objectDescription": "LOCALITY",
"objectType": "segment",
"score": 10
}
]
}
}
While we are seeing the correct documents returned, the score is way off: the multiplier that was present in the previous search is not being applied.
I previously thought it was due to a term being expanded during exact matching, so I added VERBATIM. However, this has not helped the scoring.
This seems to happen consistently once the search term is more than half the length of the matched term.
For ‘Bayswater’:
‘Bays’ is 4 characters, less than half of the matched term, and works.
‘Baysw’ is 5 characters, over half, and doesn't work.
I can replicate this with ‘Fit’ and ‘Fitz’ when looking for a suburb called ‘Fitzroy’. It is the same case as above, just replace ‘bayswater’ with ‘fitzroy’.
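The pattern I am describing can be written down as a quick Python check. This only restates my observation from the four cases above; I do not know whether RediSearch actually applies any length-based rule like this, and `over_half` is just a name I picked.

```python
def over_half(query: str, term: str) -> bool:
    """True when the query is longer than half of the matched term."""
    return len(query) * 2 > len(term)


if __name__ == "__main__":
    cases = [
        ("bays", "bayswater"),   # multiplier applied
        ("baysw", "bayswater"),  # multiplier missing
        ("fit", "fitzroy"),      # multiplier applied
        ("fitz", "fitzroy"),     # multiplier missing
    ]
    for query, term in cases:
        print(query, term, over_half(query, term))
```

In all four cases the multiplier goes missing exactly when this check is true.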
Any advice on this would be appreciated.
About the system
We have 3.5 million documents, each using a different set of TEXT fields.
We are on the old RediSearch 2.0.3 image.
The only changes to the base config are:
MAXPREFIXEXPANSIONS: 1000
MAXDOCTABLESIZE: 4000000