As the IT world recovers from the massive outage triggered by CrowdStrike’s Falcon update, CISOs and CIOs would be wise to keep a running ledger of lessons learned. Here are some initial considerations.

Whether you’ve survived the CrowdStrike incident or didn’t use CrowdStrike and are merely seeing the impact on others, taking time to learn lessons from this event is vital. After all, if you couldn’t recover easily from this, you may be lost trying to recover from a ransomware attack. At issue are potential shifts you might want to make to your staffing strategies, technical processes, and communication channels and culture, as well as your approach to hardening assets overall. The list of lessons learned from CrowdStrike will likely grow as more information comes to light about the outage’s impact on organizations around the globe. For now, the following look at the recovery process offers insights into how you might reconsider or reinforce your strategy around key processes and resources to ensure a more robust response going forward.

Staffing rethink

Recovering from CrowdStrike has been an all-hands-on-deck event. In some instances, companies have needed people to physically touch and reboot impacted machines in order to recover. That is an arduous process, especially at scale. If you have outsourced IT operations to managed service providers (MSPs), consider that those MSPs may not have enough staff on hand to mitigate your issues along with those of their other clients, especially when a single event has widespread fallout. You may have only your existing staff to call on to remedy the situation, and you may need to train people unused to technical tasks to perform key steps so your network comes back online as soon as possible. Alternatively, you may need to ship replacement equipment or find other ways to reinstall or refresh operating systems, as some organizations had to do after the CrowdStrike outage. All of this requires personnel.

Thin staffs that are over-reliant on service providers risk a poor recovery from incidents, no matter the source.

Tighten up your technical resources

As Microsoft points out in its response to the CrowdStrike outage, beyond getting into safe mode and being able to enter commands, your next hurdle may be something intended to protect your device: BitLocker. If BitLocker is enabled, you will be asked to enter a recovery key when the computer reboots into safe mode. In my experience, retrieving BitLocker recovery keys more often than not takes time. They may be backed up in your on-premises Active Directory. They may be printed out and stored somewhere that, in the first moments of an incident, no one can remember.
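Knowing where the keys live is half the battle; the other half is copying a 48-digit key from a printout or console without error. As a minimal sketch, assuming your escrowed keys can be exported into a simple mapping from key protector ID to recovery password (the inventory format and both function names are hypothetical, not a Microsoft API), a short Python script can find the key matching the 8-character recovery key ID shown on the BitLocker recovery screen and sanity-check it before anyone types it in. The format check relies on the known structure of BitLocker recovery passwords: eight groups of six digits, each group a multiple of 11 and below 65536 * 11.

```python
from __future__ import annotations

import re

# Each of the 8 groups encodes a 16-bit value multiplied by 11.
_GROUP_LIMIT = 65536 * 11
_GROUP_RE = re.compile(r"[0-9]{6}")


def is_valid_recovery_password(password: str) -> bool:
    """Check that a string has the shape of a BitLocker recovery password.

    Expects 8 hyphen-separated groups of 6 digits; each group must be a
    multiple of 11 and below 65536 * 11. This catches most typos made while
    copying a key by hand, but does not prove the key fits a given drive.
    """
    groups = password.strip().split("-")
    if len(groups) != 8:
        return False
    for group in groups:
        if not _GROUP_RE.fullmatch(group):
            return False
        value = int(group)
        if value % 11 != 0 or value >= _GROUP_LIMIT:
            return False
    return True


def find_recovery_password(inventory: dict[str, str], key_id: str) -> str | None:
    """Return the stored password whose key ID starts with `key_id`, if any.

    `inventory` is a hypothetical export mapping full key protector IDs
    (GUID strings, braces optional) to recovery passwords. The recovery
    screen shows only the first 8 characters of the ID, so match on prefix.
    """
    wanted = key_id.strip().strip("{}").upper()
    if not wanted:
        return None  # an empty ID would otherwise match every entry
    for stored_id, password in inventory.items():
        if stored_id.strip("{}").upper().startswith(wanted):
            return password
    return None


if __name__ == "__main__":
    # Hypothetical inventory entry, for illustration only.
    inventory = {
        "{1A2B3C4D-0000-0000-0000-000000000000}":
            "000000-111111-222222-333333-444444-555555-666666-000011",
    }
    key = find_recovery_password(inventory, "1a2b3c4d")
    print(key, is_valid_recovery_password(key or ""))
```

The format check only catches transcription errors; it cannot confirm a key belongs to a particular drive. And the lookup helps only if someone has rehearsed producing the inventory before an outage, not during one.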

Review recovery steps and processes regularly so that your team knows exactly where those recovery keys are and what it takes to obtain them. While BitLocker is often mandated for compliance reasons, it also adds a layer of complication you may not be prepared for. During this event, we’ve seen interesting workarounds for getting systems operational. On social media, people such as LetheForgot shared the following: “We went into advanced restart options to launch the command prompt, skip the bitlocker key ask which then brought us to drive X and ran ‘bcdedit /set {default} safeboot minimal’ which let us boot into safemode and delete the sys file causing the bsod.”

Another poster recommended: “Even in safe mode, crowdstrike folder access was denied. Used cacls to give more rights to user (bypassing admin) and deleted file.” If you are wondering why this works without a BitLocker recovery key, the bcdedit command only changes the boot settings, and booting into safe mode is not something BitLocker blocks by default. You still need valid user credentials to log in and reach the C: drive, which brings up the next roadblock. Do you have access to the domain controller, or will you need a local account to get to the C: drive and delete the file that is crashing the machine? If you use LAPS (Local Administrator Password Solution) or other software that randomizes the local administrator password, you will need access to that resource as well. Once you have access to the machine, you can delete the faulty file with the following command:

del C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys

If you forced safe mode with the bcdedit workaround described earlier, run bcdedit /deletevalue {default} safeboot before restarting; otherwise the machine will keep booting into safe mode.

The lesson here is not only to review recovery steps often but also to follow community discussions closely for creative technical solutions when a collective IT disaster unfolds.

Build a culture of communication

That brings up another key resource needed during any incident: clear information about what is happening and what to do. Late on the evening of Thursday, July 18, it was clear from comments on social media that something was happening. The underlying culprit, a faulty CrowdStrike update, was also identified quickly. In other incidents, you may not be informed so quickly. It may not be clear what has happened and which assets have been impacted. Often you’ll need to reach out to staff who work closely with the impacted assets to determine what is going on and what actions to take. What you first think the issue is, and the actions you first plan, may not turn out to be what you ultimately need to do. Or you may find easier steps to take.

You may also need to decide whether a Plan B would serve you better. In this incident, I’ve seen companies that already had a hardware refresh planned for the coming weeks simply move it up, deploying new systems to replace impacted machines rather than attempting to fix them. All of that requires clear communication among everyone involved: a culture you need to build, in addition to having incident communication strategies and processes in place.

Reassess strategies in wake of lessons learned

Just as with any incident, cleanup and follow-up are essential.

If your machines are back up and running after the CrowdStrike outage, there are several items you should review. First, consider reissuing BitLocker recovery keys: if you handed out recovery keys manually during recovery, rotate them. If you are considering changes to your infrastructure, rather than ripping out your technology and replacing it with a different operating system, consider changing how you deploy software and restricting what software is allowed to run on special-purpose machines. We rely on antivirus because we don’t limit what we allow to run on our systems. If we spent the time and resources to limit what is allowed to run, our machines would be more secure. That said, you should also reconsider which operating system you use for which purpose. We’ve seen too many social media posts of blue screens on devices that are merely overgrown notification displays. Do you truly need a full operating system just to display information? Or are there other ways to provide that same information?
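Restricting what runs is usually done with an application allowlisting tool; on Windows, AppLocker and Windows Defender Application Control both support file hash rules. As a minimal sketch of the core idea only (the function names are hypothetical and this is not how either tool is configured), hash-based allowlisting lets a file run only if its SHA-256 digest appears on an approved list, so any modified or unknown binary is refused.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_allowed(path: Path, allowlist: set[str]) -> bool:
    """Permit a file only if its exact digest is on the approved list."""
    return sha256_of(path) in allowlist
```

Because any change to a file changes its digest, a pure hash rule blocks tampered binaries and legitimate updates alike until the list is refreshed. That is why real allowlisting tools also offer publisher (code-signing) and path rules, which are easier to maintain on a fleet of special-purpose machines.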