Overview
Redis Cluster provides automatic sharding across multiple Redis nodes, enabling horizontal scaling and high availability without external coordination services.
┌─────────────────────────────────────────────────────┐
│ Redis Cluster (16384 slots) │
│ │
│ Node 1 (Master) Node 2 (Master) Node 3 (M) │
│ Slots: 0-5460 Slots: 5461-10922 10923-16383 │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Node 4 (Replica) Node 5 (Replica) Node 6 (R) │
│ (backup for N1) (backup for N2) (backup N3) │
└─────────────────────────────────────────────────────┘
Hash Slots
Slot Distribution
From cluster.c:29-61, Redis Cluster divides the key space into 16,384 slots:
#define CLUSTER_SLOTS 16384

int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. */
    if (s == keylen) return crc16(key,keylen) & 0x3FFF;

    /* '{' found. Look for '}' after it. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* Empty '{}' ? Hash the whole key. */
    if (e == keylen || e == s+1) return crc16(key,keylen) & 0x3FFF;

    /* Hash the part between { } */
    return crc16(key+s+1,e-s-1) & 0x3FFF;
}
Hash Slot Calculation:
slot = CRC16(key) mod 16384
16,384 slots were chosen as a balance between granularity and cluster metadata size: each heartbeat packet carries slot ownership as a bitmap, and 16,384 bits is only 2 KB, while still allowing fine-grained distribution for clusters of up to roughly 1,000 nodes.
From cluster.c:36-60, hash tags ensure related keys map to the same slot:
Examples:
{user:123}:profile → hash "user:123"
{user:123}:orders → hash "user:123"
user:456:profile → hash "user:456:profile"
{user}:789:profile → hash "user"
{}user:999:profile → hash entire key (empty tag)
Use hash tags for multi-key operations like MGET, MSET, or transactions that need atomic guarantees across multiple keys.
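The slot calculation and tag extraction above can be sketched in Python. This assumes the CRC-16/XMODEM variant (polynomial 0x1021, zero initial value) that Redis Cluster uses, and mirrors keyHashSlot()'s handling of hash tags:

```python
def crc16(data: bytes) -> int:
    # CRC-16/XMODEM: MSB-first, polynomial 0x1021, init 0, no final XOR
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: bytes) -> int:
    # Mirror keyHashSlot(): hash only the text between the first '{'
    # and the next '}', unless the tag is missing or empty.
    s = key.find(b"{")
    if s != -1:
        e = key.find(b"}", s + 1)
        if e != -1 and e != s + 1:  # non-empty tag only
            key = key[s + 1 : e]
    return crc16(key) & 0x3FFF      # equivalent to mod 16384
```

You can check the result against a live node with CLUSTER KEYSLOT <key>.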
Cluster Architecture
Node Types
Master Nodes:
- Handle read and write operations
- Own a subset of hash slots (0-16383)
- Replicate data to replica nodes
Replica Nodes:
- Maintain copies of master’s data
- Serve read queries (if enabled)
- Promote to master on failure
Cluster Bus
Nodes communicate via a binary protocol on the cluster bus:
- Port: client_port + 10000 (e.g., 6379 → 16379)
- Protocol: Binary gossip protocol
- Purpose: Node discovery, failure detection, configuration propagation
From redis.conf:276-277:
# Enable TLS on cluster bus
tls-cluster yes
Slot Assignment
Initial Setup
# Create cluster with 3 masters and 3 replicas
redis-cli --cluster create \
127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
--cluster-replicas 1
Manual Assignment
# Assign slots to a node
CLUSTER ADDSLOTS 0 1 2 3 4 5 ...
# Assign a range (one call per slot is slow)
for slot in {0..5460}; do
  redis-cli -p 7000 CLUSTER ADDSLOTS $slot
done
# Faster: pass the whole range in one call
redis-cli -p 7000 CLUSTER ADDSLOTS $(seq 0 5460)
# Or, on Redis 7.0+:
redis-cli -p 7000 CLUSTER ADDSLOTSRANGE 0 5460
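For planning an initial layout, the near-even split pictured in the overview diagram can be computed. This is an illustrative sketch only; redis-cli --cluster create may place the remainder slots on different nodes:

```python
CLUSTER_SLOTS = 16384

def slot_ranges(num_masters: int):
    # Contiguous, near-even ranges covering all 16,384 slots;
    # remainder slots go to the last nodes in this sketch.
    base, rem = divmod(CLUSTER_SLOTS, num_masters)
    ranges, start = [], 0
    for i in range(num_masters):
        size = base + (1 if i >= num_masters - rem else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

For 3 masters the first range is 0-5460, matching the diagram; the exact placement of the one leftover slot may differ from redis-cli's.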
Viewing Assignments
From cluster.c:963-972:
# View cluster topology
CLUSTER SHARDS
# View nodes and slots
CLUSTER NODES
# View slots for specific range
CLUSTER SLOTS
Sharding and Resharding
How Resharding Works
- Mark slot as migrating on source node
- Mark slot as importing on target node
- Move keys one by one from source to target
- Update slot assignment in cluster configuration
- Propagate new configuration to all nodes
Resharding Commands
# Prepare source node (moving FROM)
CLUSTER SETSLOT <slot> MIGRATING <target-node-id>
# Prepare target node (moving TO)
CLUSTER SETSLOT <slot> IMPORTING <source-node-id>
# Get keys in slot
CLUSTER GETKEYSINSLOT <slot> <count>
# Move key
MIGRATE <host> <port> <key> <dest-db> <timeout> REPLACE
# Complete migration
CLUSTER SETSLOT <slot> NODE <target-node-id>
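The manual sequence above can be generated programmatically. This hypothetical helper only builds the command strings and tags which node should receive each one (node IDs, host, and the key list are placeholders); a real client would fetch keys in batches with CLUSTER GETKEYSINSLOT until the slot is empty:

```python
def migrate_slot_commands(slot, src_id, dst_id, dst_host, dst_port, keys,
                          timeout_ms=5000):
    """Command sequence for migrating one slot, per the manual flow above."""
    cmds = [
        ("target", f"CLUSTER SETSLOT {slot} IMPORTING {src_id}"),
        ("source", f"CLUSTER SETSLOT {slot} MIGRATING {dst_id}"),
    ]
    for key in keys:
        # Destination DB is always 0 in cluster mode
        cmds.append(("source",
                     f"MIGRATE {dst_host} {dst_port} {key} 0 {timeout_ms} REPLACE"))
    # Finalize: best sent to every master so the new owner propagates quickly
    cmds.append(("all", f"CLUSTER SETSLOT {slot} NODE {dst_id}"))
    return cmds
```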
Automated Resharding
# Automatic rebalancing
redis-cli --cluster rebalance 127.0.0.1:7000 \
--cluster-threshold 2 \
--cluster-use-empty-masters
# Reshard specific slots
redis-cli --cluster reshard 127.0.0.1:7000 \
--cluster-from <source-node-id> \
--cluster-to <target-node-id> \
--cluster-slots <count>
Resharding is online but can impact performance. While a slot is migrating, clients accessing its keys may incur an extra round trip (an ASK redirection) as the cluster determines each key's current location.
Client Redirection
Redirection Types
From cluster.c:1161-1184:
#define CLUSTER_REDIR_NONE 0
#define CLUSTER_REDIR_CROSS_SLOT 1 // Keys in different slots
#define CLUSTER_REDIR_UNSTABLE 2 // Slot is migrating
#define CLUSTER_REDIR_DOWN_UNBOUND 3 // Slot not assigned
#define CLUSTER_REDIR_ASK 4 // Check target during migration
#define CLUSTER_REDIR_MOVED 5 // Slot permanently moved
MOVED Redirection
When a slot is reassigned:
Client: GET mykey
Node 1: -MOVED 3999 127.0.0.1:7001
Client: [connects to 127.0.0.1:7001]
Client: GET mykey
Node 2: "value"
ASK Redirection
During slot migration:
Client: GET mykey
Node 1: -ASK 3999 127.0.0.1:7001
Client: [connects to 127.0.0.1:7001]
Client: ASKING
Client: GET mykey
Node 2: "value"
ASK is temporary during migration. MOVED is permanent. Smart clients cache MOVED responses to avoid future redirections.
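That caching behavior can be sketched as follows; handle_redirect is a hypothetical smart-client helper that parses the error string and updates a local slot map only for MOVED:

```python
def handle_redirect(err: str, slot_map: dict):
    """Parse a MOVED/ASK error and return ((host, port), needs_asking)."""
    kind, slot, addr = err.split()
    host, port = addr.rsplit(":", 1)
    node = (host, int(port))
    if kind.lstrip("-") == "MOVED":
        slot_map[int(slot)] = node  # permanent: cache the new slot owner
        return node, False
    return node, True               # ASK: one-shot; send ASKING, don't cache
```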
Multi-Key Operations
Same-Slot Requirement
From cluster.c:1133-1149:
int extractSlotFromKeysResult(robj **argv, getKeysResult *keys_result) {
    if (keys_result->numkeys == 0 || !server.cluster_enabled)
        return INVALID_CLUSTER_SLOT;

    int first_slot = INVALID_CLUSTER_SLOT;
    for (int j = 0; j < keys_result->numkeys; j++) {
        int this_slot = keyHashSlot(argv[keys_result->keys[j].pos]->ptr,
                                    sdslen(argv[keys_result->keys[j].pos]->ptr));
        if (first_slot == INVALID_CLUSTER_SLOT)
            first_slot = this_slot;
        else if (first_slot != this_slot)
            return CLUSTER_CROSSSLOT; /* Error! */
    }
    return first_slot;
}
Commands requiring same slot:
- MGET, MSET, DEL (with multiple keys)
- All commands in MULTI/EXEC transaction
- SUNION, SINTER, SDIFF
- Lua scripts accessing multiple keys
Solution: Use hash tags
# These keys are guaranteed same slot
MSET {user:123}:name "Alice" {user:123}:age 30 {user:123}:city "NYC"
# Transaction on same-slot keys
MULTI
INCR {counter}:page:views
INCR {counter}:page:unique
EXEC
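When key names can't all share one tag, a common client-side approach is to group keys by the substring keyHashSlot() would hash and issue one MGET per group. This is a sketch; keys with different tags may still happen to share a slot, so the grouping is conservative rather than exact:

```python
def routing_tag(key: str) -> str:
    # The substring keyHashSlot() would hash: text inside the first
    # non-empty {...}, otherwise the whole key.
    s = key.find("{")
    if s != -1:
        e = key.find("}", s + 1)
        if e != -1 and e != s + 1:
            return key[s + 1 : e]
    return key

def group_for_mget(keys):
    # Keys with equal tags always map to the same slot, so each group
    # below can safely be sent as a single MGET.
    groups = {}
    for k in keys:
        groups.setdefault(routing_tag(k), []).append(k)
    return groups
```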
Failure Detection and Failover
Failure Detection
Nodes detect failures through the gossip protocol:
- Heartbeat: Nodes ping each other regularly
- PFAIL: A node marks an unresponsive peer as “possibly failing”
- FAIL: When a majority of masters report a node as PFAIL, it is promoted to FAIL
- Propagation: The FAIL state is propagated via gossip
Automatic Failover
When master fails:
- Replica election: Replicas of failed master start election
- Vote collection: Other masters vote for best replica
- Promotion: Winning replica promotes itself to master
- Takeover: New master claims failed master’s slots
- Announcement: New configuration propagated
Selection criteria:
- Most recent replication offset (least data loss)
- Lower replica ID (tie-breaker)
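Per the Redis Cluster specification, each replica also delays its election attempt by a rank-based amount, which is why the best-positioned replica usually starts (and wins) first. A sketch of that delay:

```python
import random

def election_delay_ms(rank: int) -> int:
    # From the cluster spec: 500 ms fixed delay + 0-500 ms random jitter
    # + 1000 ms per rank position. Rank 0 is the replica with the most
    # up-to-date replication offset.
    return 500 + random.randrange(500) + rank * 1000
```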
Manual Failover
# Graceful failover (no data loss)
CLUSTER FAILOVER
# Force failover (may lose data)
CLUSTER FAILOVER FORCE
# Takeover (no voting, for automation)
CLUSTER FAILOVER TAKEOVER
Use CLUSTER FAILOVER for planned maintenance: it waits for the replica to fully sync with its master before promoting it, ensuring zero data loss.
Cluster Configuration
cluster-enabled
# Enable cluster mode
cluster-enabled yes
# Cluster configuration file (auto-generated)
cluster-config-file nodes-6379.conf
# Node timeout (milliseconds)
cluster-node-timeout 15000
cluster-require-full-coverage
# Reject queries if any slots uncovered
cluster-require-full-coverage yes
From cluster.c:1296-1299:
When yes: Cluster goes to FAIL state if any slot is unassigned
When no: Cluster serves requests for covered slots even if some are down
Setting cluster-require-full-coverage no allows partial availability, but during a network partition the minority side may keep serving stale data.
cluster-replica-validity-factor
# Replica won't failover if too far behind
cluster-replica-validity-factor 10
Replica won’t participate in election if:
replication_lag > (node_timeout * replica_validity_factor) + repl_ping_time
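The rule above can be sketched as a predicate. The defaults below are illustrative: they mirror the node timeout and validity factor shown earlier plus the 10-second repl-ping-replica-period default, and a factor of 0 disables the check entirely:

```python
def replica_is_valid(data_age_ms: int,
                     node_timeout_ms: int = 15000,
                     validity_factor: int = 10,
                     repl_ping_period_ms: int = 10000) -> bool:
    # A validity factor of 0 means replicas are always eligible to fail over.
    if validity_factor == 0:
        return True
    return data_age_ms <= node_timeout_ms * validity_factor + repl_ping_period_ms
```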
Monitoring and Administration
Cluster Info
# Check overall cluster health
CLUSTER INFO
Output:
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_sent:12345
cluster_stats_messages_received:12345
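Monitoring scripts often consume this output programmatically; a minimal parser sketch for the key:value format:

```python
def parse_cluster_info(raw: str) -> dict:
    # CLUSTER INFO returns one "key:value" per line; int-convert numerics.
    info = {}
    for line in raw.strip().splitlines():
        key, _, value = line.strip().partition(":")
        if key:
            info[key] = int(value) if value.isdigit() else value
    return info
```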
Cluster Nodes
From cluster.c:1014-1019:
Output format:
<id> <ip:port@bus-port> <flags> <master> <ping> <pong> <epoch> <link> <slot>...
Example:
07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460
67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:7001@17001 master - 0 1426238316232 2 connected 5461-10922
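A sketch of parsing one such line into its fields; bracketed in-flight migration markers (e.g. [slot->-<id>]) are skipped rather than parsed here:

```python
def parse_cluster_nodes_line(line: str) -> dict:
    # Fields per the format above; everything after the link-state field
    # is slot ranges ("0-5460") or single slots.
    (node_id, addr, flags, master_id,
     _ping, _pong, epoch, link_state, *slots) = line.split()
    host_port, _, _bus_port = addr.partition("@")
    ranges = []
    for s in slots:
        if s.startswith("["):
            continue  # migrating/importing marker, not an owned slot
        lo, _, hi = s.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return {"id": node_id, "addr": host_port, "flags": flags.split(","),
            "master": master_id, "epoch": int(epoch),
            "link": link_state, "slots": ranges}
```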
Slot Statistics
From cluster.c:997:
CLUSTER SLOT-STATS
Returns per-slot statistics:
- CPU usage per slot
- Network bytes per slot
- Memory usage per slot
Best Practices
Minimum Node Count
Recommended minimum: 6 nodes (3 masters + 3 replicas) for production. This ensures:
- High availability (the failure of any single master is covered by its replica)
- Proper quorum for split-brain protection
- Reasonable slot distribution
Key Design
Do:
- Use hash tags for related keys: {user:123}:*
- Keep key names reasonably short
- Plan for even slot distribution
Don’t:
- Use random prefixes that prevent hash tag benefits
- Create hotspot keys (overwhelm single node)
- Use very long key names (increase memory usage)
Scaling Strategy
Scale out (add nodes):
# Add node to cluster
redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000
# Rebalance slots
redis-cli --cluster rebalance 127.0.0.1:7000
Scale in (remove nodes):
# Reshard away from node
redis-cli --cluster reshard 127.0.0.1:7000 \
--cluster-from <node-id> \
--cluster-to <other-node-id> \
--cluster-slots <all-slots>
# Remove node
redis-cli --cluster del-node 127.0.0.1:7000 <node-id>
Backup Strategy
Cluster-aware backups:
# Backup each master
for node in master1 master2 master3; do
  prev=$(redis-cli -h $node LASTSAVE)
  redis-cli -h $node BGSAVE
  # Wait for completion: LASTSAVE advances when the background save finishes
  while [ "$(redis-cli -h $node LASTSAVE)" = "$prev" ]; do sleep 1; done
  # Copy RDB file
done
Backup replicas instead of masters to avoid impacting production traffic; replica data matches the master up to a small replication lag.
Troubleshooting
Check:
- Cluster bus port accessible (client port + 10000)
- All nodes have cluster-enabled yes
- No conflicting cluster-config-file (each node needs its own file)
- Nodes can resolve each other’s IPs
CLUSTERDOWN Error
Causes:
- Some slots unassigned
- cluster-require-full-coverage yes and a node is down
- Split-brain (network partition)
Fix:
# Check slot coverage
redis-cli --cluster check 127.0.0.1:7000
# Fix slot coverage
redis-cli --cluster fix 127.0.0.1:7000
Hot spots:
- Monitor CLUSTER SLOT-STATS for uneven load
- Redesign key schema to distribute better
- Consider splitting hot keys across slots
Network latency:
- Reduce cluster-node-timeout carefully (too low a value causes spurious failovers)
- Ensure cluster bus has adequate bandwidth
- Monitor gossip message rate