Replication from Master to Replica fails: redis container crashes with a sudden SIGTERM

Hi,

Redis Version: 6.2.6

We are facing an issue with replication between Master and Replica DBs.

The issue seems to be that just as replication is about to finish, at the 'Flushing old data' phase, the container receives a SIGTERM and therefore restarts.

1:S 28 Sep 2023 11:15:37.936 * MASTER <-> REPLICA sync started
1:S 28 Sep 2023 11:15:37.937 * Non blocking connect for SYNC fired the event.
1:S 28 Sep 2023 11:15:37.938 * Master replied to PING, replication can continue...
1:S 28 Sep 2023 11:15:37.938 * Partial resynchronization not possible (no cached master)
1:S 28 Sep 2023 11:22:18.453 * Full resync from master: e6e32c3e347703188de021cf19d91a06a1bdc7ae:315366382
1:S 28 Sep 2023 11:22:20.131 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
1:S 28 Sep 2023 11:30:49.402 * MASTER <-> REPLICA sync: Flushing old data
1:signal-handler (1695900708) Received SIGTERM scheduling shutdown...
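For what it's worth, while the sync is running we can watch the replica's state from outside; a minimal sketch, assuming kubectl access to the hs namespace (from the Pod events below) and with $REDIS_PASSWORD as a placeholder for our auth setup:

# Poll the replica's replication state every 5s during the full sync
while true; do
  kubectl exec -n hs redis-node-1 -c redis -- \
    redis-cli -a "$REDIS_PASSWORD" --no-auth-warning INFO replication | \
    grep -E 'master_link_status|master_sync_in_progress'
  sleep 5
done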

# k get pods | grep redis
redis-node-0 3/3 Running 2 (165m ago) 2d
redis-node-1 2/3 Running 10 (19s ago) 47h

Could this be Kubernetes sending the SIGTERM to the redis container, for example due to memory overuse?

We see the container exiting with code 137, i.e. 128 + 9 (SIGKILL). This is commonly associated with an OOM kill, but it is also what gets reported when the kubelet force-kills a container that has not stopped within the termination grace period.
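To tell those cases apart, the last termination state of the container can be inspected; a minimal check, using the pod, container, and namespace names from the events below:

# Shows the reason (e.g. "OOMKilled" vs "Error") and exit code of the last termination
kubectl get pod -n hs redis-node-1 \
  -o jsonpath='{.status.containerStatuses[?(@.name=="redis")].lastState.terminated}'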

In the Pod events we see the following:

Events:
  Type     Reason             Age                  From     Message
  ----     ------             ----                 ----     -------
  Normal   Killing            28m (x10 over 26h)   kubelet  Container redis failed liveness probe, will be restarted
  Warning  FailedPreStopHook  28m (x5 over 26h)    kubelet  Exec lifecycle hook ([/bin/bash -c /opt/bitnami/scripts/start-scripts/prestop-redis.sh]) for Container "redis" in Pod "redis-node-1_hs(a2a1a500-2264-42a9-8651-267445271f41)" failed - error: command '/bin/bash -c /opt/bitnami/scripts/start-scripts/prestop-redis.sh' exited with 137: , message: ""
  Normal   Created            28m (x11 over 47h)   kubelet  Created container redis
  Normal   Started            28m (x11 over 47h)   kubelet  Started container redis
  Normal   Pulled             28m (x11 over 47h)   kubelet  Container image "##redis:6.2.6-debian-10-r178" already present on machine
  Warning  Unhealthy          28m (x8 over 26h)    kubelet  Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: container is in CONTAINER_EXITED state
  Warning  Unhealthy          12m (x51 over 26h)   kubelet  Liveness probe failed: Timed out
  Warning  Unhealthy          12m (x142 over 26h)  kubelet  Readiness probe failed: Timed out

We checked the definitions at the StatefulSet and Pod level; no resource limits are defined. Without limits the container cannot be killed for exceeding its own memory limit, although the node's OOM killer could still step in under node-level memory pressure.
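For reference, this is how we inspected the spec (assuming the StatefulSet is named redis-node, matching the pod names above):

# Dump the resources and liveness probe of the redis container from the StatefulSet spec
kubectl get statefulset -n hs redis-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="redis")].resources}'
kubectl get statefulset -n hs redis-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="redis")].livenessProbe}'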

Additional logs, in case they help:

1:S 28 Sep 2023 11:48:58.072 * Loading RDB produced by version 6.2.6
1:S 28 Sep 2023 11:48:58.072 * RDB age 97602 seconds
1:S 28 Sep 2023 11:48:58.072 * RDB memory usage when created 39478.59 Mb
1:S 28 Sep 2023 11:53:18.147 # Done loading RDB, keys loaded: 165616753, keys expired: 0.
1:S 28 Sep 2023 11:53:18.147 * DB loaded from disk: 260.074 seconds
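
Since the events above show "Liveness probe failed: Timed out" and the RDB load alone takes ~260 seconds, we also want to rule out the probe timing out while the replica is busy flushing/loading. A probe-style check with an explicit timeout, as a sketch of how we assume the probe behaves (pod, container, and namespace names from above; $REDIS_PASSWORD is a placeholder):

# Mimic a liveness-style PING with a 5s timeout; if this times out during
# "Flushing old data", the kubelet's probe would presumably time out as well
timeout 5 kubectl exec -n hs redis-node-1 -c redis -- \
  redis-cli -a "$REDIS_PASSWORD" --no-auth-warning PING \
  || echo "PING failed or timed out"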