How CISA dealt with infrastructure failure


When the AT&T national network crashed a couple of weeks back, it showed just how dependent the nation has become on wireless networks and the wired internet. The Cybersecurity and Infrastructure Security Agency (CISA) was on top of the situation and is still assessing it. For an update, the Federal Drive with Tom Temin spoke with CISA’s executive assistant director for emergency communications, Billy Bob Brown, Jr.

Tom Temin And what are you doing now? I mean, this is still something studied pretty carefully. AT&T originally said, well, I’ll paraphrase, we screwed up in doing a software update. They put it in fancy words, but is that basically what we can accept that happened at this point?

Billy Bob Brown Jr. At the Cybersecurity and Infrastructure Security Agency, we primarily build partnerships across the nation, across every level of government, with our critical infrastructure partners, across industry, non-governmental organizations and academia, to ensure that we develop a robust team that is working together collaboratively to improve the security and the resilience of our nation’s cyber and critical infrastructure. That demands teamwork, that demands that we’re all working together. And certainly, as we think about the United States of America, we recognize that it takes all of us working together to ensure that we are providing for the common defense, promoting the ability of our nation to succeed and our posterity to ensure these blessings that we have here continue. So it really is about working together and not pointing fingers necessarily. So as we go back and we look at the outage, it reminds me of the challenges that occur in emergency communications every single day. There’s disruptive and destructive weather that challenges the ability of connections to stay operable. And those are things that communicators plan around all the time. So this outage was described in the press by AT&T as a software process that they were working through. They were following procedures, and they were doing those late at night in order to affect the fewest possible customers as they were making that update. And as their press report indicated, some procedural issues developed, causing them to have to back those things out of the system and get customers back up online. So, in effect, given the way that communications are challenged and upgraded over time as we continue to innovate, it did work the way that we have certainly learned over the last 200 years, with the transitional technologies that we expect as communicators.

Tom Temin Right. But carriers update their networks, add segments to their networks, add towers, and so forth all the time. It’s not supposed to be an interruption when they do that. In this case, something got out of order or whatever. But my question then is, are you still working with them in a cooperative spirit, just to find out precisely what happened so they can learn something and CISA can learn something?

Billy Bob Brown Jr. Sure. Yeah, that’s a great observation. But just to make sure that we’re clear, every single network across the globe experiences challenges. And most often we don’t hear anything about those, because those challenges are being discovered in a testing environment not connected to the live production network. But sometimes the complexities of upgrades are missed in the testing environment, and they’re discovered in the production environment, which is the challenge that was discovered here. That is why normally all these updates are done at nighttime to cause the least disruption to customers, so that if something that wasn’t discovered in the testing environment turns out to have an impact in the production environment, they’re able to back that out in order to restore the network to its previous setting, so that the fewest customer interruptions occur. So that’s exactly what we saw here. Yes, there are lessons learned, certainly about the procedures that were being used by AT&T. Certainly there’s a lesson to be learned, I think, by every carrier across the globe about doing detailed analysis of what your production environment looks like, so that it is fully replicated in the testing environment, so that all of those kinds of challenges are discovered.

Tom Temin And of course, AT&T is a company that, just like all the carriers, has intellectual property that belongs to it for how it operates and upgrades its network. Is there anything from this that can be put into the open source so that it doesn’t happen to T-Mobile or Verizon or somebody like that?

Billy Bob Brown Jr. Yeah. So what we’ve seen, and again, here at CISA we’re really focusing on collaboration. How are we working together? How are we learning lessons together when we see challenges? And we’ve seen all of our carrier partners, certainly at meetings that we sponsor here, both the National Security Telecommunications Advisory Committee and certainly in our communications sector, as part of the sector risk management agency process. We see carriers sharing information and insights all the time. We certainly see that as well. I was just in Santa Fe meeting with all of the manufacturers for radio systems. They were sharing ideas, they were sharing discoveries that they’re finding in their engineering environments that are general in nature. There’s secret sauce that every single carrier has. So some things will not apply to everyone because there are some differences in the way that they’re actually designed and implemented. But there are some generalities that can be shared, and they are being shared collaboratively with their neighbors. Because at the end of the day, what we really all want to do is to ensure that information moves seamlessly to support what I care most about, which is the safety of the citizen.

Tom Temin We are speaking with Billy Bob Brown Jr., executive assistant director for emergency communications at the Cybersecurity and Infrastructure Security Agency. And AT&T, of course, is the prime contractor for FirstNet. How did that situation play out?

Billy Bob Brown Jr. Well, that situation, with the FirstNet Authority’s oversight of their critical partner in this public-private partnership, worked beautifully. It worked exactly the way that the law that was created in 2012 envisioned: the idea that government and private industry could work together to seamlessly create the nationwide public safety broadband network, giving first responders access to broadband technologies, but giving the commercial enterprise a greater level of insight into what the actual government needs are. That is the real distinguisher here in this venture: that AT&T, in their close partnership with the FirstNet Authority, has the ability to more deeply understand what it is that first responders need. And they’re able to go out there and deliver that. And then AT&T has just done a tremendous job in sharing insights that they’ve learned with carriers and governments around the world on how public safety requirements are different from what we see in the general civilian user. So the partnership between the authority and AT&T has worked super well. There’s a report that the authority just put up that talked about some contractual requirements that AT&T has to provide more insight into the lessons learned from this actual event. So from our perspective, certainly from my perspective, in a proprietary corporate environment, there is no other opportunity to gain that. So the FirstNet Authority has this unique relationship and this public-private partnership that allows it to gain additional insights and then in turn, potentially provide new instructions or new insights to their partner on how the government’s requirements can be met, certainly as we go forward into new technologies, because that’s the thing that we’re thinking about every single day.

Tom Temin But during that outage, emergency responders did have ways around if AT&T was down in their area.

Billy Bob Brown Jr. Yeah, absolutely. Now, interestingly, and I believe AT&T highlighted that, I know the authority highlighted that. As AT&T discovered the challenge and they started working on the restoration, they prioritized restoration first to the FirstNet network. So they restored the connections, the 6 million connections that the authority has, first. And then they worked to improve and restore connections for the other 200 million across the nation. So that is another example of the beauty of this public-private partnership delivering capability to the first responder community, working just the way that we had hoped that it would. But yes, you’re right. We talk about, in communications planning, the idea of having a PACE plan, moving from primary access to alternate access to contingency access, and then to emergency access. Certainly, we have seen some of our interagency partners this week discuss the requirement and the criticality of having multiple means of moving information. Whether that’s using land mobile radio systems, whether it’s using point-to-point connections, wired connections as opposed to wireless, all of those play into the idea of having redundancy to ensure resiliency for emergency communications.

Tom Temin Yeah. Don’t send those old radios to the crusher just quite yet. You might need to flip them on occasionally. And, I mean, emergency communications is largely wireless. But you just mentioned that sometimes you might need a point-to-point over cable or fiber or copper, that type of thing. Maybe talk about how you view the infrastructure writ large, because we still have the POTS system. Not that many people use it relative to years ago. Can’t even find a phonebook anymore. But it still exists. And then there is voice over IP, which gets from the POTS over to the internet, and then you’ve got cable. There’s a lot to it. And the wireless is connected to the ground-based systems and vice versa. It’s really one infrastructure with many, many components of different ages. How does CISA consider that when you’re looking at the totality of security of it?

Billy Bob Brown Jr. One of the things that we think about in terms of resiliency is multiple modes of moving information. As you mentioned, certainly in a broadband environment, we have user devices that use the wireless access network to connect, using radio frequency, to cell towers, getting the information into the cell tower. And then from there it becomes wired. So it is wired into the core network, wired across the entire exchange network, and then, depending on who they’re reaching, there is a wireless connection on the other end, whether that’s a WiFi wireless connection or a cellular wireless connection. All those wireless options include satellite, certainly, as we’ve thought about and seen the advent of low-Earth orbit access. It has tremendously opened up opportunities to move signals, even in some of those areas that have no connections because of the economies of scale that all the commercial carriers or others think about in building out to the last mile in the farthest rural reaches. Low-Earth orbit presents an opportunity to actually reach every square inch of the globe to deliver signals, to ensure that emergency information can be submitted in the form of requests for assistance or coordinated in the form of emergency operations.

Tom Temin So in some sense, then, if you look at it from a risk management standpoint, a wireless link, you don’t want it to go down, you don’t want it to be hacked. But it’s really the trunk system, the core system, that is more like the family jewels that have to be protected at all costs.

Billy Bob Brown Jr. That’s exactly right. At the end of the day, for me, as we think about how you build a robust plan that ensures that information can dynamically get from point A to point B, or from operator one to operator two, to save citizens and protect property, there have got to be multiple means of doing that. Even if it goes to, and I hate to say it this way, sneakers. We jokingly say sneakernet, where somebody’s carrying a handwritten signal to someone else to convey information. So that’s the beauty of the resilience across the nation. And at some level, I wonder if we have forgotten how resilient we really are. We’ve experienced, certainly in the last ten years, challenging power outages across wide areas. But we have survived. We’ve survived 24-hour power loss, 12-hour power loss, 36-hour power loss. And we’ve been able to survive and thrive, so we have more resilience as a nation, we have more resilience as a people, than sometimes I think we give ourselves credit for. So when challenges or disruptions happen, there’s an inadvertent or a heightened sense of concern that is a little bit askew. And so at some level, I am always pushing the idea of improving our understanding of the importance of resilience, thinking about resilience in the day-to-day formation of the good planning process. Then when we think about potentially catastrophic events, certainly the New Madrid earthquake is one that we’ve thought about, that hundred-year event, which will be challenging, but we can survive that challenge as a nation. But we just have to think from the perspective of teamwork, wanting to pull together to save lives and property, wanting to ensure the safety of each and every citizen. That’s where it really fundamentally starts: thinking as a team. Not pointing fingers. Not pushing others down, but working together, pulling each other up. Certainly our nation, and I would argue every good government on the globe, is thinking the same way. How do we keep citizens safe? It fundamentally starts with thinking about how emergency information can move seamlessly to help those operations be conducted expeditiously. That’s resilient, interoperable, with priority. That’s how we ensure that there’s good security for all of our emergency information systems. That’s how we ensure the safety of the system.

© 2024 Federal News Network. All rights reserved.