The Hidden Risk in Your Stack: Open Source Supply Chain Exposure

Transcript

>> Hayden Smith: We need to start thinking about -- especially with the, like, onset of AI, now it's really easy to go and scale and create 30 fake accounts at once. You need to start thinking about, who's this user account, right? How long have they been around? What's their history with open source? Are they a legitimate contributor, or are they just here to, you know, cause a ruckus? And you need to start inspecting those starting today.

>> Caleb Tolin: Hello, and welcome to another episode of Data Security Decoded. I'm your host, Caleb Tolin. And if this is your first time joining us, welcome to the show. Make sure you hit that Subscribe button so you're notified when new episodes go live. And if you're already a subscriber, thanks for coming back. Give us a rating. Drop a comment below. This really helps us reach listeners like you who are eager to learn more about reducing risk across their business. Now, a couple of years ago, I read a book by Randi Zuckerberg. Yes, Mark Zuckerberg's sister. It's called "Pick Three," and it's all about being well-lopsided in your life. Without giving away too much of the book, there are five core areas that she identifies. And typically, most people highly index on one of those five areas. And that is where your passionista lives. I, for example, am a sleep passionista. Now, I don't know if our guests would self-describe themselves as a passionista in any way, but I'm going to bestow this title upon them. And so, today, I sat down with our third-party risk passionista, Hayden Smith. Hayden is the CEO of Hunted Labs, and we did a deep dive into supply chain attacks. I know we've covered this topic at a high level before, but we really wanted to deep dive into how they operate, where they come from, and what organizations can do to get a grip on this issue. Hayden has a lot of deep expertise in this space, and we had a fascinating conversation about it. Let's dive into it. Thank you, Hayden, for joining us. I'm really excited for you to join us on the Data Security Decoded podcast. Before we dive into the meat of the conversation, what is something not related to cyber that you are completely obsessed with lately? For me, I'm going to go with crystals and rare minerals. You can see my amethyst here in the background. I have a little fluorite here that I like to keep on my desk, too. I'm a crystal fanatic to some extent. Maybe not a fanatic.
I've seen some people who are much bigger fans than myself, or have better collections. But what are you obsessed with that's not related to cyber lately?

>> Hayden Smith: Yeah. So I think what I'm obsessed with right now -- I mean, I'm really obsessed with college football. So right now, my USC Trojans are on the up and up. So I follow that pretty closely. So, definitely consider myself a fanatic. And then fueling that fire, I just got done coaching my son's flag football team for the first time, which I got really into and was a whole lot of fun. When I took the volunteer position, I didn't understand how into it I would get. And it was very, very exciting and very fun to do. So those are definitely the two things, I think, that consume me outside of my day-to-day job.

>> Caleb Tolin: Awesome. Well, let's dive into the meat of the conversation here. Like I said, we're going to talk all about supply chain attacks. And this is something we've definitely touched on at a high level in other conversations, but we're really going to drill into it this time. I know this is an issue that is most prevalent in government and critical infrastructure, but it's certainly -- you know, commercial businesses are not exempt from that. So diving a little bit deeper into this, I know this is what you specialize in, and this is what your business does. One of the biggest risk factors for third-party applications is the reliance on open-source software. Let's start there. Can you break down why that's such a critical problem?

>> Hayden Smith: Yeah, absolutely. So open-source software really powers everything we know and love today, right? Even this recording that we're doing now. Like, everything about this is powered by open-source software. It powers every application. You know, it underpins a lot of AI infrastructure, which has been such a reliance on us in the past few years. And really, when you scope it down to, you know, what are, like, some data points around this? So, like, most enterprise applications, about 70, 80%, sometimes higher, is all composed of open-source software. And that's awesome because it allows us to move really fast. So we can take components that have been built by wonderful, talented developers and reuse those in our applications very quickly and start to build and iterate on software very fast. One of the downsides, though, is that every single developer that maintains each single library or component from an open-source perspective that your enterprise relies on, that's maintained to a different standard. So their standard on security and on compliance may not be the same as your organization. So when you're looking at it from an enterprise perspective and saying, "You know, we consume these open-source bits," it's really important to understand which ones are your critical kind of open-source pieces that you need to focus on in order for your application to be reliable and resilient when you ship that, you know, software to your end user, right? And it's also kind of like that's to the right of the problem. To the left of the problem, you need to be looking at, you know, what kind of vulnerabilities or what kind of weaknesses are in this code before I decide to consume it responsibly, right? So talking about responsible consumption of open-source, you know, there's no argument that we can do away with open-source. That's just not realistic, right? 
But really, it's about managing the risk around that as you choose to adopt it and as you choose to ship it, you know, embedded in your enterprise applications that you're producing to your customers. And so it's in -- you know, to kind of answer your question, though, it's in everything, right? And so we really have to be thinking about what are some next-generation kind of like techniques that we could use to counter a lot of these attacks that we're seeing lately. Some of them are just getting pretty out of hand. If I were, like, a CISO, I would think of them as almost intimidating because the, you know, evolution of their tactics, techniques, procedures, and executing these attacks has increased exponentially. So it's an awesome space to be in. The reliance on open-source software and how it gets tied up into software supply chain attacks is super fascinating. And, you know, I could talk about that for hours.

>> Caleb Tolin: Awesome. Well, something I do want to kind of drill into that you just talked about there was the idea of these TTPs. So, can you break down from the entry point of really where these supply chain attacks typically happen?

>> Hayden Smith: Yeah.

>> Caleb Tolin: What does that entry point typically look like? And then what happens after that? What are the attackers doing once they're in the environment?

>> Hayden Smith: Yeah. So I can provide kind of like a real-world example, a recent one that was disclosed by a good friend of mine, Paul McCarty. He has a great kind of project out there called Open Source Malware, and definitely encourage everyone to check it out because it's an awesome resource kind of tracking open-source threats. So like a typical entry point for, you know, a software supply chain attack that's targeting a piece of open-source software. Ironically, the best way to attack open-source is to contribute, right? So what we're seeing is people starting to create real accounts. And whether it's on GitHub or whether it's like with npm and they're creating fake accounts on npm and then publishing packages that people think are legitimate, right? The attacker on the offensive side is trying to think, "How can I get ownership over this code?" In the case of XZ, which was kind of an attack two or three years ago, that was really building rapport with the open-source maintainer to leverage that and really exploiting trust, which is the kind of critical linchpin of the open-source software community, and really taking that and exploiting that to get access over it. And once they have that, you know, there's no like, you know, doing multi-factor authentication and bypassing that and things you would think of typically in a legacy kind of cybersecurity attack. It's really about just contributing and creating fake stuff so you could create fake packages that, you know, are full of malware, which was the case with the disclosure that Paul McCarty made with the IndonesianFoods campaign, which created thousands upon thousands -- I think there was a count of around 86,000 basically fake npm packages that were published just to pollute the marketplace, right? So there wasn't a ton of, like, super malicious activity going on, but some of those tactics and techniques scaling beyond initial access are really frightening. 
So you're looking at, you know, some of the core tenets of that, where, like, a new fake package full of malware being published every seven seconds. And then within that package, what gets really frightening is it sources from the package.json. It will actually source more fake packages, right? So then it's extremely concerning because you're sourcing the software supply chain attack itself, where they get established that initial access is actually bringing in even more bad packages, right? So if you pulled down one package, you're actually pulling down, you know, maybe in this case, eight to 10 new bad packages, right? And so all that started, though -- if we were to, like, peel back the onion, how did this really begin? It started with a guy creating a bunch of fake accounts and then pretending to be, you know, open source contributor that, you know, anyone of us would just think is just another guy making software. And so that's why you really have to, like, keep your guard up and really start using tools to start inspecting, like, the code itself. In addition to looking at who is this person behind the code, you know. Is this a ghost account that was just created two days ago to get access and to start contributing? And that's kind of two core tenets I would really enforce when you're trying to think about how do I do the right thing today? Start looking at, "Is the code good?" which shouldn't be new for anyone, right? That's typical, like, Software Security 101. But we need to start thinking about -- especially with the, like, onset of AI, now it's really easy to go and scale and create 30 fake accounts at once. You need to start thinking about who's this user account, right? How long have they been around? What's their history with open source? Are they a legitimate contributor, or are they just here to, you know, cause a ruckus? And you need to start inspecting those starting today.
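
Editor's sketch: The two questions raised at the end of this answer, how long an account has been around and what history it has with open source, can be expressed as a simple check. The Python below is a minimal illustration under stated assumptions: the `Contributor` record, the `contributor_flags` function, and the 90-day and five-contribution thresholds are all hypothetical, and the account data would need to come from a source you trust.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Contributor:
    """Hypothetical record of what we know about an upstream account."""
    login: str
    created_at: datetime        # when the account was registered
    prior_contributions: int    # contributions seen before this change


def contributor_flags(c, now, min_age=timedelta(days=90), min_history=5):
    """Return reasons to look more closely at a contributor.

    The thresholds are illustrative assumptions, not recommended values.
    """
    flags = []
    if now - c.created_at < min_age:
        flags.append("new-account")
    if c.prior_contributions < min_history:
        flags.append("little-history")
    return flags


# Made-up accounts: one created two days ago, one with a long track record.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
ghost = Contributor("fresh-user", datetime(2025, 1, 8, tzinfo=timezone.utc), 0)
veteran = Contributor("long-timer", datetime(2015, 3, 1, tzinfo=timezone.utc), 240)

print(contributor_flags(ghost, now))    # ['new-account', 'little-history']
print(contributor_flags(veteran, now))  # []
```

A flag here is a prompt for human review, not a verdict. Plenty of legitimate contributors have new accounts, so a check like this would sit alongside inspection of the code itself.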

>> Caleb Tolin: You know what it reminds me of is something -- somewhat like almost like a Reddit community. Like you have these moderators who are going in and kind of vetting to make sure that everyone's following the rules. And it's like that's kind of a similar system that we probably need to start looking at, you know, as kind of a North Star, if you will.

>> Hayden Smith: And like, so the open source community has, like, you know, certain projects that are very well maintained, right? So these are like Cloud Native Computing Foundation projects, OpenSSF Foundation projects, Linux Foundation. All of them do great work where they have maintainers that are basically like the stewards of the code, right? So they're reviewing pull requests. They're keeping an eye out for security issues, bugs, feature improvements, right? All of that's super useful. But it's really hard to scale that across the entirety of the open source ecosystem, where you're dealing with millions of packages, looking at, you know, from various language ecosystems like npm. You're looking at Go. You're looking at Python. And to really scale that for every single person is hard, right? So that leaves kind of the burden with enterprises to, you know, take it upon themselves to do their due diligence on open source before they, you know, decide to pull in and include that piece of open source in their application.

>> Caleb Tolin: I want to go back to what you were talking about with the idea of threat actors and bad actors contributing packages into these open source communities. So when you were saying that, the first thing that came to mind is threat hunting, threat detection, what roles they can kind of play in helping identify some of these malicious packages and the malware that's baked into them. So, can you kind of dive into that and talk about what role threat hunting and detection plays in identifying these vulnerabilities for supply chain attacks? And maybe even touch on the idea of this AI-powered threat hunting, which we're starting to see more and more and more of with the boom of AI.

>> Hayden Smith: Yeah. So a lot to unpack there. Threat hunting, using threat intelligence to basically dive in and inspect software before you use it, right? I'm trying to, like -- I think when we think about threat hunting, I like to think of it as a really proactive security measure. So we're trying to go out there, almost like conduct reconnaissance in advance of using software, right? So a lot of the kind of attacks that you mentioned when you're saying, you know, like, people are coming in and inserting malware into the, like, open-source community, the place that they're doing that at is really these upstream package managers. So, like in that IndonesianFoods campaign, that was really in the npm registry. And you can monitor that registry for packages, for accounts, for how old accounts are that, you know, are contributing the software. So you could do inspection on that. So when we talk about, you know, using threat hunting to go out and search for this stuff, you really need to drive that with really good threat intelligence. And for enterprises, what I try and focus on is we're not trying to boil the entire ocean here. We're trying to just look at your software, the bits of open-source software that you use, and then you need to go out and threat hunt against that target set, right? So you need to say okay, we rely on these 10 open-source packages. We're going to go out there and hunt against every single one of them and start looking at, is there malware against these? Are there unknown zero days against this? And that's where you get into, like, some places we're exploring now at Hunted, where we're trying to use models to go out there and interrogate code to find unknown vulnerabilities that maybe someone isn't disclosing, right? And I made kind of a big fuss about this with a super widely used component called runc lately. And they had three vulnerabilities disclosed one week by one engineer that worked at Huawei. 
But then, when you go into the kind of actor's profile, you'll see that he intentionally does not disclose vulnerabilities that he has found against runc because his boss does not want him to, right? So that may -- you know, if you're relying on that component, that makes you kind of think, "Okay, so what does he have in his back pocket? Or what does any bad actor have in his back pocket that I'm not seeing?" Right? And that's where, like, you could use large language models to come in and churn over the code and say, you know, "What am I missing?" And look at, okay, is there anything hidden in here that I've missed during, like, a manual check or a static check that could have gone unseen, right? And so we're starting to see some really interesting results there. And I think a lot of different security vendors are kind of playing in that space because they know it is being hit really hard right now by various kind of threat actors.
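
Editor's sketch: Hunting against a target set rather than the whole ecosystem, as described above, amounts to matching the packages you actually ship against threat intelligence. The Python below is a minimal illustration, not Hunted Labs' method. The `hunt` function, the package names, and the shape of the intel feed are hypothetical assumptions.

```python
def hunt(dependencies, intel_feed):
    """Match the packages we actually use against threat-intel records.

    dependencies: dict of package name -> version we ship
    intel_feed:   iterable of dicts with 'name', 'versions', and 'reason'
                  (a hypothetical feed shape, not a real vendor format)
    """
    hits = []
    for record in intel_feed:
        version = dependencies.get(record["name"])
        # Only report records that touch a package and version we rely on.
        if version is not None and version in record["versions"]:
            hits.append((record["name"], version, record["reason"]))
    return sorted(hits)


# Made-up target set and feed entries, for illustration only.
our_dependencies = {"left-widget": "1.5.0", "fast-parse": "2.1.3", "tiny-log": "0.9.1"}
feed = [
    {"name": "left-widget", "versions": {"1.5.0"}, "reason": "tampered release"},
    {"name": "fast-parse", "versions": {"3.0.0"}, "reason": "malicious install script"},
    {"name": "unrelated-pkg", "versions": {"1.0.0"}, "reason": "spam package"},
]

for name, version, reason in hunt(our_dependencies, feed):
    print(f"{name}@{version}: {reason}")
```

Scoping the hunt to the dependency list keeps the work proportional to what you ship. The feed entry for a package you do not use, and the entry for a version you are not on, are both ignored.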

>> Caleb Tolin: What does the recovery strategy look like for these attacks, where we have some malicious actor embedding malware into some code, an organization applies that into their environment, and then it's exploited over time? What role does recovery play there?

>> Hayden Smith: Yeah. So when you look at it from, like, a software supply chain perspective, you typically, if you pulled down malware, you have to neutralize that, right, first. And then when you're looking at, "Okay. How do I prevent my developers from continuously pulling it?" That's the hard part, right? So you have to get that reversion of that software you use. So pulling it back and looking at okay, I was on version, let's say, 1.5. That was the tampered version, right? Now, I have to pin, right? This comes into a really good proactive kind of software supply chain measure, which is pinning your dependencies, right? So that way you're not automatically just boom, ingesting stuff, pulling that down. And so, really, it often involves, like, a version rollback. So that's really, like, when you're starting -- just restore or recover at that point. And so that's easy, I will say, if you're dealing with an organization that has really good software supply chain measures, right? So you kind of have a constant inventory of what's flowing in and out from a software perspective. So you know what kind of open source you're using because you're doing things like SBOM generation and SBOM management, and vulnerability scanning. And you're already out there conducting threat hunts. That gets really hard if, one, you're not doing those things, and two, if you start looking at the types of attacks that we're dealing with today, right? So again, that IndonesianFoods campaign, that was like thousands of different packages. And if you're at a large enterprise and you have to monitor for every single one of those identifiers, recovery gets extremely complicated very quickly. So now it's not hunting for one kind of, like, zero-day event, and it's hunting for 20, 30, 40, 100 all at once that hit you over the head. So having good contingency planning in place is super, super important for software supply chain incident response type events.
And I like to kind of take this moment to say, like, a lot of the things you could do to put yourself on, like, a really good, you know, security posture for a software supply chain don't involve doing anything, I would say, that is outside the norm for information security best practices, right? These are very standard cybersecurity things. So doing things like contingency planning, having awareness if you rely on 10 critical things in order for your product to work, you better know what they are, right? You don't want to find out when the attack is already unfolding. You're already on your back foot at that point. And the way you do that, right, is really through continuous monitoring. So having continuous monitoring in place so you don't get to the point of having to deal with recovery, right? So recovery is like really part of that contingency plan that we need to have available at any given moment in time. Because in worst case, you could get to a point of like, they stole all my source code, right, that was previously private. And that happened to organizations like F5, like Red Hat. And, you know, that's a whole different scenario where you're like, that -- you can't just back up a software version and get your data back, right? So you have to be, you know, keeping backups of that data, keeping backups of everything you have and get, and being able to restore that in that event, right? So the software supply chain attacks can really be something as little as one package, or they could come in and steal all your source code. So it could get pretty gnarly very fast.
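
Editor's sketch: The pinning advice in this answer can be checked mechanically. The Python below is a minimal illustration that reads a package.json-style manifest and reports any dependency whose version spec is not an exact `x.y.z` version. The `unpinned` function, the `EXACT` pattern, and the example manifest are hypothetical, and the pattern is deliberately strict: anything that is not a plain exact version is treated as unpinned.

```python
import json
import re

# An exact version such as 1.5.0, optionally with pre-release/build suffixes.
EXACT = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def unpinned(package_json_text):
    """Return {name: spec} for dependencies not pinned to an exact version."""
    manifest = json.loads(package_json_text)
    loose = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT.match(spec):
                loose[name] = spec
    return loose


# Made-up manifest: one range, one exact pin, one floating tag.
example = """
{
  "name": "demo-app",
  "dependencies": {
    "left-widget": "^1.4.0",
    "fast-parse": "2.1.3",
    "tiny-log": "latest"
  }
}
"""

print(unpinned(example))  # {'left-widget': '^1.4.0', 'tiny-log': 'latest'}
```

A manifest check like this only covers direct dependencies. In the npm ecosystem, a committed lockfile records exact resolved versions for the whole dependency tree, so the two are usually used together.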

>> Caleb Tolin: Right. Right. Now, I'm going to put on some rose colored glasses for a second. Let's say there are some organizations out there that are just not exposed to this threat. I think that's, you know, quite optimistic. It sounds like this is much more of a pervasive issue than most people tend to understand. But let's say people are out there and they're just not exposed at all, or, you know, more likely they are at least unaware of their exposure. In addition to investing in modern threat hunting, cyber recovery plans, the things that we've talked about already here, what can organizations do to prepare for these types of attacks outside of those InfoSec best practices that you were just referring to?

>> Hayden Smith: Yeah. So I think there's a bunch of stuff. I don't have, like, all the time to get into it, but stuff like keeping -- you know, I'll just go down into the continuous monitoring bucket because I think that's so important. And I think that's really the starting point because a lot of best practices I prescribe are falling under that kind of bucket or section of cyber, right? So doing things like knowing your dependencies. Do you have a complete inventory of all the software that you use, whether proprietary or open source, right? You need to have insight into that. That's really step one. You need to have situational awareness of what you're using. You also need to know what are the most popular critical pieces of open source in that. You need to make sure of those versions. You're doing things like pinning those dependencies. So you need to make sure you're protecting your organization just on the way you do software development. You need to look at -- you know, from the continuous monitoring lens, you need to go upstream and see how are these people maintaining the code that you rely on? So, have they not made an update in six years, and there's three active vulnerabilities against it? Or do they update it every single week and cut a new release every single week as well, right? Those are two different stories in terms of open source risk. So having that level of insight. And then again, kind of the final two things there, as I'm like, okay, you need to start getting that inventory. So, scanning your code, generating SBOM, storing that, iterating over what are the most critical open source dependencies. And also having that continuous model baked into your software development processes for your application. So you need to constantly be interrogating the code to see if it's good or not. So every new release that gets pushed for a dependency you rely on, you need to be interrogating that code. 
You need to be looking for changes that could be suspicious, anomalous, coming -- and this is the last part -- coming from people that you may not know. So these are people that are previously unknown to the project or unknown to the community. Maybe their GitHub account is a day old, and it's just some AI-generated account generating AI-generated code. And that could introduce a ton of, you know, problems for your organization if you're just going to -- you know, if that's kind of the state of the software that you're choosing to consume. So everything I do or prescribe is really falling into that open -- the kind of continuous monitoring of the open source that you're consuming. And really driving, you know, for us, we really prescribe, like, using threat intelligence to have that insight available continuously. So if there is something going on with a, you know, critical dependency in the open source ecosystem that you rely on, that you just have insight and situational awareness of that risk.
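
Editor's sketch: The contrast drawn above, between a dependency with no update in six years and open vulnerabilities and one that cuts a release every week, can be reduced to a small triage rule. The Python below is a minimal illustration under stated assumptions: the `maintenance_risk` function, its labels, and the one-year staleness threshold are hypothetical, and the release dates and vulnerability counts would come from your own inventory and scanning.

```python
from datetime import date


def maintenance_risk(last_release, open_vulns, today, stale_after_days=365):
    """Classify a dependency by upstream activity and known open issues.

    The threshold and labels are illustrative assumptions.
    """
    stale = (today - last_release).days > stale_after_days
    if stale and open_vulns > 0:
        return "high"      # unmaintained and known to be vulnerable
    if stale or open_vulns > 0:
        return "review"    # one warning sign, worth a closer look
    return "ok"


today = date(2025, 6, 1)
print(maintenance_risk(date(2019, 5, 1), 3, today))   # high
print(maintenance_risk(date(2025, 5, 27), 0, today))  # ok
```

Run over every entry in an inventory on a schedule, a rule like this gives a rough ordering of which dependencies to look at first. It does not replace inspecting the code of each new release.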

>> Caleb Tolin: Absolutely. Hayden, where can folks learn more about the incredible work that you're doing?

>> Hayden Smith: Yeah. So you could go to huntedlabs.com, learn about all of our awesome stuff that we have coming. And we have intercept there. You can learn about our product. We also have the Hunting Ground, where we cover security research or really interesting things that we find unfolding in the software supply chain kind of security space. So those two places are where I'd point you to have a good time.

>> Caleb Tolin: Thank you again for joining. This is a really insightful conversation. I really appreciate your time and the conversation. We'll link to all those resources that you mentioned in the show notes so that they're easy for everybody to access. So until next time.

>> Hayden Smith: Yeah, thank you.

[ Music ]

>> Caleb Tolin: And that's a wrap on today's episode of Data Security Decoded. If you like what you heard today, please subscribe wherever you listen and leave us a review on Apple Podcasts or Spotify. Your feedback helps us understand what you want to hear more about. And if you want to reach out to us about the show, email me directly at data-security-decoded@n2k.com. That's the letter n, number 2, letter k.com. Thank you to Rubrik for sponsoring this podcast. The team at N2K includes senior producer Alice Carruth and executive producer Jennifer Eiben, content strategy by Ma'ayan Plaut, sound design by Elliott Peltzman, audio mixing by Elliott Peltzman and Tré Hester, video production support by Brigitte Criqui Wild and Sarelle Joppy. Thank you for listening. See you next time.

[ Music ]

HOST(S):

As the host of Data Security Decoded, Caleb Tolin dives deep with cyber experts to deliver actionable, vendor-agnostic insights to reduce data security risks and improve cyber resilience outcomes. Caleb asks the incisive questions that you need answered, extracting actionable guidance for defenders. Come be obsessed with improving your organization's cyber resilience.

Schedule: Two times per month. Every other Tuesday.

Credits: Data Security Decoded is a podcast by Rubrik.

Creator: Rubrik