I wonder if the query parser could be improved for querying TAG fields.
For storing the data (FT.ADD) I don’t have to escape anything in the tags except for the delimiter I supply myself.
Searching the tagged data, however, currently requires escaping, which feels awkward.
From Redis-Lua, I had to write a function to do the special escaping, which is cumbersome and likely to break if custom tokenization or other features are added to RediSearch.
Here’s the function:
local function EscapeFtPunctuation(cRet)
  -- Escape RediSearch FT (full text) search punctuation characters, like '-' (becomes '\-')
  -- Also escape spaces, see: http://redisearch.io/Tags/
  -- For punctuation characters, see: https://github.com/RedisLabsModules/RediSearch/blob/master/src/toksep.h
  -- From the C code:
  --[[ [' '] = 1, ['\t'] = 1, [','] = 1, ['.'] = 1, ['/'] = 1, ['('] = 1, [')'] = 1,
       ['{'] = 1, ['}'] = 1, ['['] = 1, [']'] = 1, [':'] = 1, [';'] = 1, ['\\'] = 1,
       ['~'] = 1, ['!'] = 1, ['@'] = 1, ['#'] = 1, ['$'] = 1, ['%'] = 1, ['^'] = 1,
       ['&'] = 1, ['*'] = 1, ['-'] = 1, ['='] = 1, ['+'] = 1, ['|'] = 1, ['\''] = 1,
       ['`'] = 1, ['"'] = 1, ['<'] = 1, ['>'] = 1, ['?'] = 1,
  ]]
  -- Lua gsub: The Lua magic characters are ( ) . % + - * ? [ ^ $
  -- So: prepend with '%' in the gsub pattern string (first parameter)
  return (cRet:gsub('[ \t,%./%(%){}%[%]:;\\~!@#%$%%%^&%*%-=%+|\'`"<>%?_]', {
    [' ' ]='\\ ' ,
    ['\t']='\\\t' ,
    [',' ]='\\,' ,
    ['.' ]='\\.' ,
    ['/' ]='\\/' ,
    ['(' ]='\\(' ,
    [')' ]='\\)' ,
    ['{' ]='\\{' ,
    ['}' ]='\\}' ,
    ['[' ]='\\[' ,
    [']' ]='\\]' ,
    [':' ]='\\:' ,
    [';' ]='\\;' ,
    ['\\']='\\\\' ,
    ['~' ]='\\~' ,
    ['!' ]='\\!' ,
    ['@' ]='\\@' ,
    ['#' ]='\\#' ,
    ['$' ]='\\$' ,
    ['%' ]='\\%' ,
    ['^' ]='\\^' ,
    ['&' ]='\\&' ,
    ['*' ]='\\*' ,
    ['-' ]='\\-' ,
    ['=' ]='\\=' ,
    ['+' ]='\\+' ,
    ['|' ]='\\|' ,
    ['\'']='\\\'' ,
    ['`' ]='\\`' ,
    ['"' ]='\\"' ,
    ['<' ]='\\<' ,
    ['>' ]='\\>' ,
    ['?' ]='\\?' ,
    -- Add underscore as well, seems needed
    ['_' ]='\\_' ,
  }))
end
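
For comparison, here is a shorter sketch of the same idea, together with how the escaped value might be placed in a tag query. It is an illustration under assumptions, not a drop-in replacement: `escape_tag` and `tag_query` are hypothetical helper names, the field name `tags` is made up, and it relies on Lua's `%p` class (in the default C locale) matching the same punctuation set as the table above, underscore included. The `@field:{...}` form is the RediSearch tag query syntax.

```lua
-- Hypothetical helper: prefix every punctuation character, space and tab
-- with a backslash. '%0' in the replacement stands for the matched character.
-- Assumes the C locale, where %p matches ASCII punctuation (including '_').
local function escape_tag(value)
  -- Parentheses drop gsub's second return value (the match count).
  return (value:gsub('[%p \t]', '\\%0'))
end

-- Hypothetical helper: build a tag filter such as @tags:{hello\-world}
local function tag_query(field, value)
  return '@' .. field .. ':{' .. escape_tag(value) .. '}'
end

print(tag_query('tags', 'hello-world foo'))  -- prints @tags:{hello\-world\ foo}
```

Inside a Redis script the resulting string would then be passed as the query argument, for example `redis.call('FT.SEARCH', index_name, tag_query('tags', value))`, where `index_name` is whatever index you created. The explicit table in my function above is more verbose, but it makes it obvious which characters are handled if the separator list in toksep.h ever changes.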