Community IT Innovators Nonprofit Technology Topics

Community IT offers free webinars monthly to promote learning within our nonprofit technology community. Our podcast is appropriate for a varied level of technology expertise. Community IT is vendor-agnostic and our webinars cover a range of topics and discussions. Something on your mind you don’t see covered here? Contact us to suggest a topic! http://www.communityit.com

Cybersecurity Incident Recovery Plans with Matt Eshleman

August 02, 2024 • Community IT Innovators • Season 5 • Episode 30

Whether or not your nonprofit faced CrowdStrike impacts, the aftermath of a national or worldwide incident is a great time to gather your cybersecurity team and run the exercise: what will your organization do during the next outage or cyber attack?

Nonprofit Cybersecurity expert Matt Eshleman shares his thoughts in this podcast on the importance of

reviewing your incident response and business continuity plans regularly,
practicing your response regularly – what will you do if a critical person is absent?
involving stakeholders outside the domain of the IT team to weigh the recovery options and order of importance. Payroll first? Email? Securing a backup of your data?

As the Chief Technology Officer at Community IT, Matthew Eshleman leads the team responsible for strategic planning, research, and implementation of the technology platforms used by nonprofit organization clients to be secure and productive. With a deep background in network infrastructure, he fundamentally understands how nonprofit tech works and interoperates both in the office and in the cloud. With extensive experience serving nonprofits, Matt also understands nonprofit culture and constraints, and has a history of implementing cost-effective and secure solutions at the enterprise level.

Matt has over 23 years of expertise in cybersecurity, IT support, team leadership, software selection and research, and client support. Matt is a frequent speaker on cybersecurity topics for nonprofits and has presented at NTEN events, the Inside NGO conference, Nonprofit Risk Management Summit and Credit Builders Alliance Symposium, LGBT MAP Finance Conference, and Tech Forward Conference. He is also the session designer and trainer for TechSoup’s Digital Security course, and our resident Cybersecurity expert.

Learn how to recover better from a cybersecurity incident with some key takeaways from this podcast.

_______________________________
Start a conversation :)

Register to attend a webinar in real time, and find all past transcripts at https://communityit.com/webinars/
email Carolyn at cwoodard@communityit.com
on LinkedIn

Thanks for listening.

Carolyn Woodard: Welcome, everyone, to the Community IT Podcast. My name is Carolyn Woodard, and I’m the Outreach Director for Community IT. And today, I’m really excited to be talking with our cybersecurity expert, Matt, who is going to tell us more about the recent outages and how you can be more prepared. Matt, do you want to introduce yourself?

Matt Eshleman: Sure. Thanks, Carolyn. My name is Matthew Eshleman, and I’m the Chief Technology Officer at Community IT. And in my role here, I have two primary responsibilities.

One is to manage all of our back-end systems and cybersecurity platforms that we use to keep our clients secure.

I also work on a client-by-client basis, working on technology and cybersecurity strategy. Really looking forward to the conversation today.

Carolyn Woodard: I heard that CrowdStrike didn’t really impact a lot of our clients.

Matt Eshleman: Yes, that’s correct.

Carolyn Woodard: Yay!

Disaster Recovery Planning

I thought maybe you could walk us through what to do: how to have your plan in place, and then what’s typical. What should you do first? How do you respond when you find out that there’s some kind of issue going on and it does impact you?

Matt Eshleman: I do think disaster recovery and disaster response has really evolved quite a lot from whenever we were originally developing these plans, back in the day when there was a server in the closet down the hall. The disaster response was really focused on, “how do we recover this server?”

Carolyn Woodard: How do you get back into your building?

Matt Eshleman: Back into our building, or how we’re going to restore things that we’re in charge of, or we have the ability to respond to.

I think now a lot of the disaster recovery or exposure to risk and downtime has really been outsourced as part of the migration to the cloud. Now a lot of the incidents you need to respond to are a data breach at a vendor that you may use or in the case of CrowdStrike, a bad update gets pushed from a product that you rely on to provide security or remote support or whatever it happens to be.

And so, with the transition to the cloud, we’ve gotten a lot of benefits, but I think we’ve also outsourced a lot of the risk and the associated response time, because if CrowdStrike hadn’t been so quick to identify and fix that issue, it could have been a lot worse than it turned out to be.

CrowdStrike specifically, that’s not the endpoint security tool that we use as our main endpoint platform. We have a number of clients who get it, who pay for it, or who receive it as a donation. I think CrowdStrike is very focused on providing security to NGOs and policy groups specifically. Maybe 10% of the endpoints were impacted at clients who had it. So frustrating, to be sure, but not completely debilitating.

But for those organizations that were impacted, it really was a chance, unfortunately, to test out their disaster recovery plan in terms of how they’re responding and fixing endpoints. I think for a lot of organizations, particularly small to mid-sized that we’re working with, they’re using the cloud a lot. I think the big headache was really on BitLocker and decrypting the hard drive so that you could get in and make that fix and delete the offending file.

So, for organizations that are cloud-centric, and those keys are stored in the cloud, that was relatively easy to get to. I know for larger enterprises that maybe had the keys stored on servers that were also impacted, that was a real challenge, right? That the recovery environment was also impacted or the ability to recover endpoints was also impacted.

I think whenever there’s any of these big high-profile outages or issues, it is a good opportunity for organizations to use that, even if you weren’t impacted, just to review your own disaster recovery plan and say, “if we were impacted or if this software tool is impacted, how would we have responded?” And I think it can be a good exercise to identify gaps in that coverage.

Reviewing Your Disaster Recovery Plan

Carolyn Woodard: How often do you recommend that an organization should review their disaster recovery plan or their response plan? Is that something you do once a year?

Matt Eshleman: Yeah, I think once a year is probably sufficient. I think probably more helpful is making sure that you can carve out time to have that debrief and lessons learned after something happened. So again, when it’s really top of mind, for example right now, it’s a good time to just talk through it because I think the pain is fresh or at least watching other people go through it is helpful.

That’s something at Community IT that we did, not related to CrowdStrike, but previously whenever a managed services provider tool was impacted. That caused us to make some changes in terms of how we design our system to build in some more resiliency so that we don’t have all of our eggs in one basket, so to speak. So, I think having that time to talk about adjusting and reviewing an organization’s business continuity or disaster recovery plan is a good thing to do on the heels of an incident because there’s lots of focus and attention on it.

And I think it can be a time where you can actually get that attention internally to have those conversations.

How to Put the Disaster Recovery Planning Committee Together

Carolyn Woodard: And who should be on that team, ideally? Is that something that the IT department or whoever is in charge of IT will take to the executive team? Is it something where maybe you have a committee where there’s someone from the executive team and someone from IT, and maybe some other stakeholders on a committee?

What’s a good best practice for putting that disaster recovery plan together and reviewing it?

Matt Eshleman: I think it’s a good question. Every organization is going to be a little bit different.

I think for big scenarios like this, this goes maybe beyond what we would typically call backup and disaster recovery, which I think of as mostly getting data back after a single system failure, into business continuity, which is how we continue to operate in the face of some bigger event: a pandemic, a major platform outage, that kind of thing.

That differentiation aside, I think IT owns the operational aspects of it. I think for many organizations, the chief financial officer is the person who owns business continuity for the organization.

And so that’s where I would say most organizations would be. I think it is important to have potentially some leadership level insight. And this kind of intersects a little bit with organizations that are able to do some tabletop exercises to test out their business continuity plan in the face of a cyber-attack or some other thing.

Those conversations really can help illuminate gaps that may exist in the plan, because you have a really broad and executive perspective on the operations of the organization. And so, things that IT may not think about get surfaced in those conversations with more senior executives in the organization. I think certainly IT owns the operational aspect, but then should be getting input from typically the chief financial or operational executives in the company, and maybe extending up into the board.

The board is really a good place to initiate those conversations and to make sure that they’re happening. And then a lot of the work ends up getting carried out by the finance operations executives in the organizations that we’re working with, which is going to be small to mid-sized nonprofit organizations from 30 to 500 staff. That’s probably a reasonable size to have those parameters.

Carolyn Woodard: I love your suggestion to game out different scenarios. It reminded me that yesterday I was reading a thread on care for employees who have had a death in the family, and being able to give an employee like a month off to be able to handle everything that you have to do when that sort of thing happens.

And what if that were your IT director, who’s not there when something like this happens? How would you cover for that? How do you have redundancy? Who do you call next? Do you have a phone tree? Is everyone scrambling for what the next step is? Do you even have people’s phone numbers somewhere that you can call them if they were on your laptop and you can’t get into your laptop, for example?

Matt Eshleman: Yeah, I think those tabletop exercises are good because they do illuminate some gaps in the process. We went through a tabletop exercise with one of our clients, and it was interesting in that going through that process, the answer to many questions was, “oh, this person does it, this person does it, this person does it.” And it’s always the same person who’s really bearing a lot of the responsibility for their incident response plan. And so, then the question became, “okay, what happens if this person is out sick, unavailable, on vacation, unreachable?”

If you’ve got your whole business continuity plan dependent on one key person in the organization and they’re not available for some reason, then that can be a real challenge. Now maybe it’s not practical to build in super redundancies for all these areas, but I think it’s also a good idea to have another backup layer in place and understand it’s not maybe going to be ideal.

But yes, if the primary person who’s responsible for running all of this is unavailable, here’s where we go next and make that person aware or make that company aware that they’re going to be responsible for those activities.
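Editor's illustration (not part of the conversation): the single-person dependency Matt describes can be checked mechanically once a response plan is written down as a list of tasks with owners. The sketch below is a minimal example under stated assumptions. The task names, the people, the `response_plan` structure, and the threshold of two tasks are all hypothetical and would need to be replaced with an organization's own plan.

```python
from collections import Counter

# Hypothetical response plan: every task, name, and backup here is invented
# for illustration and does not come from the podcast.
response_plan = [
    {"task": "Contact the security vendor", "primary": "Alex", "backup": None},
    {"task": "Retrieve device recovery keys", "primary": "Alex", "backup": "Sam"},
    {"task": "Notify staff of the outage", "primary": "Alex", "backup": None},
    {"task": "Run payroll manually", "primary": "Jordan", "backup": "Alex"},
]


def tasks_without_backup(plan):
    """Return the tasks that have no named backup person."""
    return [step["task"] for step in plan if not step["backup"]]


def overloaded_people(plan, threshold=2):
    """Return people who are the primary owner of more than `threshold` tasks."""
    counts = Counter(step["primary"] for step in plan)
    return {person: n for person, n in counts.items() if n > threshold}


if __name__ == "__main__":
    print("No backup assigned:", tasks_without_backup(response_plan))
    print("Carrying too many tasks:", overloaded_people(response_plan))
```

A spreadsheet with the same three columns (task, primary, backup) would serve the same purpose; what matters is that the gaps become visible before the key person is unavailable.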

Carolyn Woodard: Yeah, I love that.

Creating a Cybersecurity Recovery Plan

Can you maybe walk us through some of the steps and in what order you would go through to make your recovery plan? We’ve talked a little bit about the first thing is you should have a plan, and the people that should be involved in gaming out what that plan is going to look like.

Then what are the things that you do? What should be in the plan? What do you do first? What do you do next? What do you do after that?

Matt Eshleman: I think it really starts with having a good understanding of the systems that you have and the priority and importance of those systems. And make sure that that’s something that is agreed upon broadly.

IT may have one perspective on what system is the most important. Executive leadership may have a different perspective on the most important system and the priority. I think having a great understanding of the systems that you need to have access to, to do your work and how you’re going to go about responding to them is important. Because you are going to need to make choices in terms of where you’re focusing resources.

You can’t fix everything all at once.

Have a pretty good sense of “system A is our most critical because that’s how we pay our bills or receive funding,” for example. And so even though that system A only impacts maybe five people or 10 people, that is the most business critical application, and we need to focus on getting that up first.

It’s important to have that playbook and the list of systems that you are going to respond to and the order in which you’re going to do it.

Once you agree on that and have a business continuity plan, make sure that the priority order is clear and that you have a good understanding of what it means and which systems are connected, so that you can restore those first. That is a good process to go through.

IT can’t just decide that on their own. I think that’s really the benefit of having those conversations at the higher organizational level, because the priorities of IT might be different than the finance department or the program staff. Having a place where those conversations can be had, you can build a consensus on what you’re going to focus on first, that would be the next place to start, or the next place to go.
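Editor's illustration (not part of the conversation): an agreed priority list combined with knowledge of which systems are connected can be turned into a concrete restore order. The sketch below is one possible approach, not a method described in the podcast. It walks the systems in business-priority order and restores each system's prerequisites before the system itself. The system names, priority numbers, and dependencies are hypothetical.

```python
# Hypothetical inventory: names, priorities (1 = most critical), and
# dependencies are invented for illustration only.
systems = {
    "payroll": {"priority": 1, "depends_on": ["identity"]},
    "email": {"priority": 2, "depends_on": ["identity"]},
    "file_sharing": {"priority": 3, "depends_on": ["identity"]},
    "identity": {"priority": 4, "depends_on": []},
}


def restore_order(systems):
    """Order systems by business priority, restoring prerequisites first."""
    order = []
    in_progress = set()

    def restore(name):
        if name in order:
            return  # already scheduled
        if name in in_progress:
            raise ValueError(f"circular dependency involving {name}")
        in_progress.add(name)
        for dependency in systems[name]["depends_on"]:
            restore(dependency)
        in_progress.discard(name)
        order.append(name)

    # Visit the most business-critical systems first.
    for name in sorted(systems, key=lambda n: systems[n]["priority"]):
        restore(name)
    return order


if __name__ == "__main__":
    print(restore_order(systems))
```

In this made-up inventory the sign-in system ranks last by business priority but is scheduled first, because payroll cannot come back without it. Surfacing that kind of hidden prerequisite is the reason to record connections alongside priorities.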

Carolyn Woodard: Great. So, making those decisions ahead of time, for example “we’re going to get payroll up first, and then everybody can have their email abilities back” or something like that. Just making hard choices.

Matt Eshleman: Yeah.

What Do You Do After Cybersecurity Recovery?

Carolyn Woodard: And then, do you have any advice on, so assuming everything goes well, you follow your plan, there’s some little glitches, because you didn’t anticipate something or the other, but then you’re back. What do you do?

How do you know that everything is back the way it’s supposed to be, or better, hopefully? And what do you do after you’ve recovered? Are there some best practices for that?

Matt Eshleman: Yeah, I think that’s a really good question.

How do you know you’re back? Confirming that the systems are up, that you are able to confirm that the configurations are the same as what you had before.

Good documentation is important here, as is going back over any shortcuts that were taken to get systems back up; maybe you had to give more permissions to a certain user or group. After the dust has settled and you’ve been operating for a little bit, it is important to go back through all the steps that were taken to restore a system to functional status and undo any shortcuts or non-best practices that were required.
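Editor's illustration (not part of the conversation): one way to find the shortcuts Matt mentions is to compare a documented snapshot of permissions from before the incident with the state after recovery. The sketch below assumes both snapshots are available as simple mappings from account name to a list of roles. The account names and role names are hypothetical, and exporting these snapshots from a real directory or cloud platform is outside the scope of the example.

```python
def permission_drift(before, after):
    """Return roles added or removed per account between two snapshots."""
    drift = {}
    for account in sorted(set(before) | set(after)):
        old_roles = set(before.get(account, ()))
        new_roles = set(after.get(account, ()))
        added = new_roles - old_roles
        removed = old_roles - new_roles
        if added or removed:
            drift[account] = {"added": sorted(added), "removed": sorted(removed)}
    return drift


# Hypothetical snapshots: accounts and roles are invented for illustration.
before_incident = {
    "alex": ["helpdesk"],
    "jordan": ["finance"],
}
after_recovery = {
    "alex": ["helpdesk", "global_admin"],  # elevated during recovery
    "jordan": ["finance"],
    "temp_recovery": ["local_admin"],      # account created during recovery
}

if __name__ == "__main__":
    for account, change in permission_drift(before_incident, after_recovery).items():
        print(account, change)
```

Anything listed under "added" is a candidate for review and rollback once systems are stable. The comparison only works if the pre-incident snapshot exists, which is the case for good documentation.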

And then I think, as we’ve been discussing, just having that formal process to go and have an incident report and review what went well, what needs to be changed, and what gaps existed, so that you are able to take those lessons learned and improve the plan.

Those business continuity plans should not be static documents that just sit on a shelf. They need to be updated and revised pretty regularly. This is a good time to go back and make those edits so that you’re better prepared for when the inevitable next different outage is going to impact the organization.

Carolyn Woodard: Yeah, we’re all sometimes vulnerable to just random things. This company CrowdStrike had the bad update, and then it just cascaded through all these different industries. Well, that all sounds great.

I wanted to say, we just did a webinar on being a learning organization. And one of the suggestions in that webinar was to be the kind of organization where you have a postmortem, talk about what happened, learn the lessons. For example, to realize “I needed to have this person’s phone number so that I could call them quickly when something happened.” Those sorts of lessons.

I think it’s good to be an organization that makes time for that and sits down and talks about it.

Matt Eshleman: Yes, yes, 100% agree.

Carolyn Woodard: Well, thank you so much for your time today, Matt. I really appreciate it. You always have so much wisdom and information about cybersecurity systems.

Thank you for helping us understand more.

Matt Eshleman: Thanks, Carolyn. I appreciate being able to get to talk about this.