Today at 10:01 AM CEST the iwStack orchestrator received a disconnection event from several hosts, which triggered a massive High Availability recovery procedure for more than 600 instances.
At 10:50, while most instances were back up and running, a couple of hundred were stuck in the starting state, waiting for the network setup to complete.
At 11:10, in an attempt to speed up the process, we forced a network restart (including VR rebuild), but this turned out to be the wrong solution and caused further delay.
Finally, at 13:00, all the queued instances were started.
If your instances are still in the stopped state, just start them. Please open a ticket if any instance doesn't start.
At present we have disabled the HA flag for all instances while we investigate the incident.
We are sorry for any inconvenience this issue may have caused.
Prometeus support team