Restarting Storage Spaces Direct Cluster Nodes Managed by SCVMM 2016

Most of my recent experience with Storage Spaces Direct (S2D) has been deploying standalone clusters that are not managed by System Center Virtual Machine Manager (SCVMM) 2016.  I have another blog in the works that documents how I built out a 4-node S2D cluster using VMM and PowerShell.  It piggybacks on my last blog on how to Deploy A Storage Spaces Direct 4 Node Cluster.

This blog specifically deals with restarting a node for patching or any other task that requires a restart.  Normally within VMM you would just right-click a node in the cluster, select Start Maintenance Mode, then wait for the VMs on that node to migrate off to the other nodes in the cluster.  You would restart the node, wait for it to come back up, right-click it and select Stop Maintenance Mode, and bingo, you are ready to put the next node into maintenance mode and restart that one.  With S2D clusters, you can’t do that and shouldn’t do that!  The storage has to finish re-syncing before the next node goes offline.

The Microsoft Docs page Taking a Storage Spaces Direct server offline for maintenance explains how to properly restart a node in a standalone S2D cluster.  On the Microsoft Docs pages for VMM, the only thing I could find was how to put a node into maintenance mode.

Step 1:  Check the health of your virtual disks before putting your node into maintenance mode.

  1. Open up PowerShell from within one of your nodes or do a remote connection using Enter-PSSession.
  2. Run the Get-VirtualDisk cmdlet.
  3. Verify that all your volumes are in the Healthy status.
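
If you would rather script this check than eyeball the output, here is a minimal sketch.  The function name Get-UnhealthyVirtualDisk is made up for illustration; it only looks at the HealthStatus property that Get-VirtualDisk reports and assumes a healthy disk shows the value Healthy.

```powershell
# Sketch only: returns every virtual disk that is NOT reporting Healthy.
# Get-UnhealthyVirtualDisk is a hypothetical helper, not a built-in cmdlet.
function Get-UnhealthyVirtualDisk {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [object]$VirtualDisk
    )
    process {
        # Compare as a string so real disk objects and plain test objects behave the same.
        if ($null -ne $VirtualDisk -and "$($VirtualDisk.HealthStatus)" -ne 'Healthy') {
            $VirtualDisk
        }
    }
}

# Usage on a cluster node (left commented so the block only defines the function):
# $unhealthy = @(Get-VirtualDisk | Get-UnhealthyVirtualDisk)
# if ($unhealthy.Count -gt 0) {
#     $unhealthy | Format-Table FriendlyName, HealthStatus, OperationalStatus
# }
```

If the usage lines return nothing, all of your volumes are Healthy and you can move on to the next step.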



Step 2:  Put the node into maintenance mode and restart.

  1. Now, from within the VMM Console, right-click the node and select Start Maintenance Mode.  Select the option to have all the VMs migrated off that node.
  2. At this point your node should be in maintenance mode, which in turn has paused the node in the cluster and suspended disk I/O to the disks on that node.
  3. You can now patch your node if you haven’t already, and then restart the system.


Step 3:  Monitor Storage Jobs and Virtual Disk Health

  1. Once the system has come back online, you can take it out of maintenance mode from within the VMM console.
  2. Open up PowerShell from within a node in your cluster or connect to a remote PowerShell session using Enter-PSSession.
  3. Monitor the health of the virtual disks by using the following PowerShell cmdlet:  Get-VirtualDisk

    Do not worry about the HealthStatus showing Warning during this time.  This is normal.  Your data is still accessible.


  4. Monitor the status of the re-sync (repair) jobs by using the following PowerShell cmdlet:  Get-StorageJob

    The re-sync of the storage may take some time depending on how much new data was written and how long the server was paused for.


    NOTE:  You must wait for the re-syncing to complete before taking another server in the cluster offline.

  5. Once the storage jobs are complete, verify that your volumes show Healthy again with the Get-VirtualDisk cmdlet.  At this point you can put another node in your cluster into maintenance mode to patch and restart it.
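
The monitoring in this step can be sketched as a simple polling loop.  Wait-StorageResync and its parameters are names I made up for illustration; the loop assumes that finished jobs report a JobState of Completed and healthy disks a HealthStatus of Healthy, so check that against what Get-StorageJob and Get-VirtualDisk actually show on your cluster.

```powershell
# Sketch only: poll until no storage jobs are pending and every virtual disk is Healthy.
# Returns $true when the cluster looks re-synced, $false if the timeout is reached first.
function Wait-StorageResync {
    param(
        [int]$PollSeconds = 30,
        [int]$TimeoutMinutes = 240,
        # The two script blocks exist so the loop can be tried without a cluster;
        # on a real node the defaults are what you want.
        [scriptblock]$GetJobs  = { Get-StorageJob },
        [scriptblock]$GetDisks = { Get-VirtualDisk }
    )
    $deadline = (Get-Date).AddMinutes($TimeoutMinutes)
    while ((Get-Date) -lt $deadline) {
        # Assumption: any job that is not Completed still counts as pending.
        $pending   = @(& $GetJobs  | Where-Object { "$($_.JobState)" -ne 'Completed' })
        $unhealthy = @(& $GetDisks | Where-Object { "$($_.HealthStatus)" -ne 'Healthy' })
        if ($pending.Count -eq 0 -and $unhealthy.Count -eq 0) {
            return $true
        }
        Write-Host "Waiting: $($pending.Count) storage job(s) pending, $($unhealthy.Count) virtual disk(s) not Healthy."
        Start-Sleep -Seconds $PollSeconds
    }
    return $false
}
```

On a node you would just run Wait-StorageResync and only move on to the next server when it returns True.  A job that never completes (for example, one that failed) keeps the loop waiting until the timeout, which is why the timeout is there.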


Final Thoughts:

I have not tried using the Cluster-Aware Updating feature on an S2D cluster.  I might try that next and see if the feature is intelligent enough to wait for the re-sync process to complete before starting another node.

So to summarize:

  1. Check Volume Health using Get-VirtualDisk
  2. Put node into Maintenance Mode in VMM Console.
  3. Restart Node
  4. Take Node out of Maintenance mode
  5. Monitor Volume Health and Storage jobs using Get-VirtualDisk and Get-StorageJob PowerShell cmdlets.
  6. Once Volumes are all Healthy you may place another node into Maintenance Mode.

In theory you could just follow Microsoft’s Taking a Storage Spaces Direct server offline for maintenance documentation and leave VMM out of it.  The only issue is that the VMM console will not show the node in maintenance mode; it will show the node in a problem state instead.  This may or may not trigger unwanted alerts from your monitoring software, such as OpsMgr.
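
For reference, the non-VMM route boils down to pausing and draining the node, restarting it, and resuming it with the failover clustering cmdlets.  The sketch below is my own wrapper, not something from the Microsoft page: the function name Restart-S2DNodeWithoutVmm is hypothetical, and it assumes you run it from another node in the same cluster, since Restart-Computer cannot wait on the local machine.

```powershell
# Sketch only: restart one S2D node using the cluster cmdlets instead of VMM.
# Assumption: this runs on a DIFFERENT node in the same cluster than $NodeName.
function Restart-S2DNodeWithoutVmm {
    param(
        [Parameter(Mandatory = $true)]
        [string]$NodeName
    )
    # Pause the node and drain its roles (VMs) to the other nodes.
    Suspend-ClusterNode -Name $NodeName -Drain -Wait

    # Restart the node and wait until PowerShell remoting answers again.
    Restart-Computer -ComputerName $NodeName -Force -Wait -For PowerShell

    # Resume the node and move its roles back right away.
    Resume-ClusterNode -Name $NodeName -Failback Immediate
}

# Usage (commented out so the block only defines the function):
# Restart-S2DNodeWithoutVmm -NodeName 'S2D-Node1'   # 'S2D-Node1' is a placeholder name
```

You still need to check Get-VirtualDisk before you start and wait for Get-StorageJob to finish afterwards, exactly as in the steps above.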




