PyTorch on Gears: one more way to fry a RedisGears cluster

I now have three steps in the pipeline working, and I want to add a fourth: tokenisation using a BERT model.
Unfortunately, the tokeniser depends on PyTorch, which is a ~800 MB download.
It seems that after PyTorch is installed the cluster becomes unstable:

161808:M 22 May 2020 15:01:13.516 * <module> GEARS: Successfully spellchecked sentence sentences:bafab6b3dd88dcdefe111698d02f81998c9accdb:236:{1x3}
161783:S 22 May 2020 15:03:42.420 * <module> Processing ./torch-1.4.0-cp37-cp37m-manylinux1_x86_64.whl

161783:S 22 May 2020 15:03:51.325 * <module> Installing collected packages: torch

161783:S 22 May 2020 15:04:02.674 * <module> Successfully installed torch-1.4.0

161783:S 22 May 2020 15:04:09.381 # <module> disconnected :, status : -1, will try to reconnect.

161783:S 22 May 2020 15:04:09.402 # <module> disconnected :, status : -1, will try to reconnect.

161783:S 22 May 2020 15:04:09.422 # <module> disconnected :, status : -1, will try to reconnect.

161783:S 22 May 2020 15:04:09.443 # <module> disconnected :, status : -1, will try to reconnect.

161783:S 22 May 2020 15:04:09.464 # <module> disconnected :, status : -1, will try to reconnect.

The command I am trying to run is: gears-cli --host --port 30001 --requirements requirements_tokenizer.txt

where requirements:


and the code

tokenizer = None

def loadTokeniser():
    # Import lazily so the heavy transformers/torch import happens only on first use.
    global tokenizer
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    return tokenizer

def tokenise_sentence(record):
    global tokenizer
    if not tokenizer:
        tokenizer = loadTokeniser()
    sentence_key = record['value']['sentence_key']
    # Assumption: shard_id was not defined in the posted snippet; hashtag() is the
    # RedisGears built-in that returns a hash tag mapping to the current shard.
    shard_id = hashtag()
    log(f"Tokeniser received {sentence_key} and my {shard_id}")
    tokens = tokenizer.tokenize(record['value']['content'])
    key = "tokenized:bert:%s:{%s}" % (sentence_key, shard_id)
    for token in tokens:
        execute('lpush', key, token)
    # Mark the sentence as processed once, not once per token.
    execute('SADD', 'processed_docs_stage3_tokenized', sentence_key)

bg = GearsBuilder()
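
One way to rule out the function body itself, before blaming the cluster, is a dry run outside RedisGears. The sketch below is only an illustration: execute, log and hashtag are replaced with local stand-ins, FakeTokenizer is a hypothetical whitespace splitter instead of the real BERT tokenizer, and the sample record and the "1x3" hash tag are made up (the tag is copied from the log excerpt above).

```python
# Offline dry run of the tokenising step, with stand-ins for the RedisGears built-ins.
# Everything named here (FakeTokenizer, the sample record, the "1x3" tag) is an
# assumption for illustration, not part of the real pipeline.

calls = []  # commands that would have been sent to Redis


def execute(*args):
    # Stand-in for the RedisGears execute(): record the command instead of running it.
    calls.append(args)


def log(message):
    # Stand-in for the RedisGears log(): stay silent in the dry run.
    pass


def hashtag():
    # Stand-in for the RedisGears hashtag(): a fixed, hypothetical shard tag.
    return "1x3"


class FakeTokenizer:
    # Hypothetical replacement for the BERT tokenizer: lowercase and split on whitespace.
    def tokenize(self, text):
        return text.lower().split()


tokenizer = FakeTokenizer()


def tokenise_sentence(record):
    sentence_key = record['value']['sentence_key']
    shard_id = hashtag()
    log(f"Tokeniser received {sentence_key} and my {shard_id}")
    tokens = tokenizer.tokenize(record['value']['content'])
    key = "tokenized:bert:%s:{%s}" % (sentence_key, shard_id)
    for token in tokens:
        execute('lpush', key, token)
    execute('SADD', 'processed_docs_stage3_tokenized', sentence_key)
    return key


sample = {'value': {'sentence_key': 'sentences:abc:1', 'content': 'Heart rate normal'}}
key = tokenise_sentence(sample)
```

If this runs cleanly, the key format and the command sequence are fine, and the problem is more likely in the requirements installation than in the function.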

I don’t think it even reaches the point where it runs the code.
gears-cli times out with:


%d)     %s (1, 'Execution max idle reached')
@AlexMikhalev can you share the full logs of all the shards? I guess it just takes too long to install this requirement and we are hitting the execution max idle timeout. (By the way, I already have a PR that sets the requirement installation idle timeout to a longer value by default, because it makes sense that it might take a while.) Notice that you can increase this timeout.

@meirsh is there any way to get a debug log out of the shards?
I am running ./create-cluster tailall and the excerpt above is all I get: nothing else, no failures.

I increased the timeout with execute --addr --master-only RG.CONFIGSET ExecutionMaxIdleTime 30 and tried to resubmit the same script as above. It resulted in a segfault (see gist).

OK @AlexMikhalev the issue is also this:
Downloading torch-1.4.0-cp37-cp37m-manylinux1_x86_64.whl (753.4 MB)

The size, 753.4 MB, is more than the default Redis bulk size (proto-max-bulk-len defaults to 512 MB). Try increasing it with:
CONFIG SET proto-max-bulk-len 2048mb
Make sure to do it on all the shards, and do not forget to increase ExecutionMaxIdleTime as well.

I just tried it and it worked for me.
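
For anyone who wants to script this, here is a minimal sketch of applying both settings to every shard. It is not an official tool: the helper names (fix_commands, apply_to_shards, send_with_redis_py) are made up for illustration, the shard list (127.0.0.1, ports 30001 to 30006) assumes a default create-cluster setup, and send_with_redis_py assumes the redis-py package is installed.

```python
# Sketch: send the two config commands from this thread to every shard.
# Helper names and the shard list are assumptions, not part of RedisGears.

def fix_commands(max_bulk="2048mb", idle_ms=300000):
    # The two settings discussed above, as argument tuples.
    return [
        ("CONFIG", "SET", "proto-max-bulk-len", max_bulk),
        ("RG.CONFIGSET", "ExecutionMaxIdleTime", str(idle_ms)),
    ]


def apply_to_shards(shards, send):
    # send(host, port, *command) is supplied by the caller, so the loop can be
    # exercised without a live cluster.
    results = []
    for host, port in shards:
        for command in fix_commands():
            results.append((host, port, send(host, port, *command)))
    return results


def send_with_redis_py(host, port, *command):
    # Assumes redis-py is installed; opens a plain connection to one shard.
    import redis
    return redis.Redis(host=host, port=port).execute_command(*command)


# Assumed layout: six local shards on the create-cluster default ports.
SHARDS = [("127.0.0.1", port) for port in range(30001, 30007)]

# Against a live cluster:
# apply_to_shards(SHARDS, send_with_redis_py)
```

Passing the sender in as a parameter keeps the loop testable with a fake sender, and makes it easy to swap in whatever client you already use.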

Regarding the crash, do you mind opening an issue on GitHub?

I tried increasing the bulk length (after a create-cluster clean and restart, then a refresh). It still failed with the idle timeout, even on an empty cluster.
I will try to replicate the crash and file a bug report on github/redisgears.

@AlexMikhalev as I said, this CONFIG SET alone is not enough; you also need to increase ExecutionMaxIdleTime. You can pass it as a parameter when you load the module. Set it to something like 5 minutes (300000 ms) to be on the safe side.

@meirsh is ExecutionMaxIdleTime in seconds or in ms?

ms
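
Since the unit is milliseconds, the value 30 used in the earlier attempt meant 30 ms, not 30 seconds. A small sketch of the conversion, with minutes_to_ms as a made-up helper name:

```python
# ExecutionMaxIdleTime is given in milliseconds (per the answer above).
# minutes_to_ms is a hypothetical helper, used here only for the arithmetic.

def minutes_to_ms(minutes):
    return int(minutes * 60 * 1000)


five_minutes_ms = minutes_to_ms(5)     # 300000, the value suggested above
earlier_attempt_seconds = 30 / 1000    # the earlier setting of 30 is 0.03 seconds
```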

For others, the fix is to run:
execute --addr ip:30001 RG.REFRESHCLUSTER
execute --addr ip:30001 CONFIG SET proto-max-bulk-len 2048mb
execute --addr ip:30001 RG.CONFIGSET ExecutionMaxIdleTime 300000
