Debugging multipath problems

When your Kubernetes cluster hangs, you are having a bad day. One symptom you might notice is that you can no longer create or delete PVCs or Pods.

Check whether the command sudo multipath -ll hangs. If it does, you are having a problem with your SAN.
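To avoid losing a shell to the check itself, the command can be wrapped in a time limit. The following is a minimal sketch, assuming GNU coreutils timeout is installed. The function name check_hang and the 10 second limit in the usage line are illustrative choices, not part of multipath.

```shell
# check_hang: run a command with a time limit and report whether it hung.
# Usage: check_hang SECONDS COMMAND [ARGS...]
# Assumption: GNU coreutils `timeout`, which exits with status 124
# when the time limit is reached.
check_hang() {
    limit="$1"
    shift
    # Output is discarded; only the exit status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    if [ "$?" -eq 124 ]; then
        echo "hung"
    else
        echo "returned"
    fi
}
```

It could then be used as check_hang 10 sudo multipath -ll. One caveat: if the process is blocked in uninterruptible I/O, the signal sent by timeout may not take effect, so the wrapper itself may not return promptly.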

Workaround

Open a second shell and run this:

sudo multipath -ll -v4

The command will hang on a line like the ones below, which show the device name right after the timestamp:

...
Jan 08 HH:MM:SS | sdaw: get_state
Jan 08 HH:MM:SS | sdaw: detect_checker = yes (setting: multipath internal)
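If the verbose output is long, the device name can be picked out of a saved or pasted copy of it. This is a rough sketch that assumes the log lines have the "timestamp | device: message" shape shown above. The function name last_dev is made up for this example.

```shell
# last_dev: read multipath -v4 style log text on stdin and print the
# device name from the last line of the form "<timestamp> | <dev>: <msg>".
# Assumption: the line format matches the sample output shown above.
last_dev() {
    # Print the device field of every matching line, then keep the last one.
    sed -n 's/^.*| \([[:alnum:]]*\): .*$/\1/p' | tail -n 1
}
```

For example, pasting the two sample lines into last_dev prints sdaw.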

Now in the first shell become root with sudo -i and prepare this little helper function:

del_dev() { echo 1 > "/sys/block/${1}/device/delete"; }

Now you can delete the hanging device like this:

del_dev sdaw

Then multipath will continue.

Finally, repeat the process for every hanging LUN until multipath returns.
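Since a typo in the device name is easy to make when repeating this several times, a slightly more defensive variant of del_dev may be worth having. This is only a sketch: the name safe_del_dev and the SYSFS_ROOT variable are hypothetical additions for this example (SYSFS_ROOT exists only so the function can be tried against a scratch directory instead of the real /sys).

```shell
# safe_del_dev: like del_dev, but checks the name and the sysfs node first.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
safe_del_dev() {
    dev="$1"
    # Accept only plain names such as "sdaw"; reject empty names and paths.
    case "$dev" in
        ''|*[!a-z0-9]*)
            echo "refusing device name: '$dev'" >&2
            return 2
            ;;
    esac
    node="${SYSFS_ROOT:-/sys}/block/${dev}/device/delete"
    if [ ! -e "$node" ]; then
        echo "no such sysfs node: $node" >&2
        return 1
    fi
    # Same write as del_dev: ask the kernel to remove the device.
    echo 1 > "$node"
}
```

As with del_dev, this has to run as root, for example safe_del_dev sdaw.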
