Risks, emerging when developing or using open-source software

It used to be that only specialized software houses and tech giants had to lose sleep over open-source vulnerabilities and supply chain attacks. But times have changed. Today, even small businesses are running their own development shops, making the problem relevant for everyone. Every second company’s internal IT teams are busy writing code, configuring integrations, and automating workflows — even if its core business has absolutely nothing to do with software. It’s what modern business efficiency demands. However, the byproduct of that is a new breed of software vulnerabilities — the kind that are far more complicated to fix than just installing the latest Windows update.

Modern software development is inseparable from open-source components. However, the associated risks have proliferated in recent years, increasing in both variety and sophistication. We’re seeing malicious code injected into popular repositories, fragmented and flawed vulnerability data, systematic use of outdated, vulnerable components, and increasingly complex dependency chains.

The open-source vulnerability data shortage

Even if your organization has a rock-solid vulnerability management process for third-party commercial software, you’ll find that open-source code requires a complete overhaul of that process. The most widely used public databases are often incomplete, inaccurate, or just plain slow to get updates when it comes to open source. This turns vulnerability prioritization into a guessing game. No amount of automation can help you if your baseline data is full of holes.

According to data from Sonatype, about 65% of open-source vulnerabilities assigned a CVE ID lack a severity score (CVSS) in the NVD — the most widely used vulnerability knowledge base. Of those unscored vulnerabilities, nearly 46% would actually be classified as High if properly analyzed.

Even when a CVSS score is available, different sources only agree on the severity about 55% of the time. One database might flag a vulnerability as Critical, while another assigns a Medium score to it. More detailed metadata like affected package versions is often riddled with errors and inconsistencies too. Your vulnerability scanners that compare software versions end up crying wolf with false positives, or falsely giving you a clean bill of health.

The deficit in vulnerability data is growing, and the reporting process is slowing down. Over the past five years, the total number of CVEs has doubled, but the number of CVEs lacking a severity score has exploded by a factor of 37. According to Tenable, by 2025, public proof-of-concept (PoC) exploit code was typically available within a week of a vulnerability’s discovery, but getting that same vulnerability listed in the NVD took an average of 15 days. Enrichment processes, such as assigning a CVSS score, are even slower — Sonatype in the same study estimates that the median time to assign a CVSS score is 41 days, with some defects remaining unrated for up to a year.

The legacy open-source code problem

Libraries, applications, and services that are no longer maintained — either being abandoned or having long reached their official end of life (EOL) — can be found in 5 to 15% of corporate projects, according to HeroDevs. Across five popular open-source code registries, there are at least 81 000 packages that contain known vulnerabilities but belong to outdated, unsupported versions. These packages will never see official patches. This “legacy baggage” accounts for about 10% of packages in Maven Central and PyPI, and a staggering 25% in npm.

Using this kind of open-source code breaks the standard patch management lifecycle: you can’t update, automatically or manually, a dependency that is no longer supported. Furthermore, when EOL versions are omitted from official vulnerability bulletins, security scanners may categorize them as “not affected” by a defect and ignore them.

A prime example of this is Log4Shell, the critical (CVSS 10) vulnerability in the popular Log4j library discovered back in 2021. The vulnerable version accounted for 40 million out of 300 million Log4j downloads in 2025. Keep in mind that we’re talking about one of the most infamous and widely reported vulnerabilities in history — one that was actively exploited, patched by the developer, and addressed in every major downstream product. The situation for less publicized defects is significantly worse.

Compounding this issue is the visibility gap. Many organizations lack the tools necessary to map out a complete dependency tree or gain full visibility into the specific packages and versions embedded within their software stack. As a result, these outdated components often remain invisible, never even making it into the remediation queue.

Malware in open-source registries

Attacks involving infected or inherently malicious open-source packages have become one of the fastest-growing threats to the software supply chain. According to Kaspersky researchers, approximately 14 000 malicious packages were discovered in popular registries by the end of 2024, a 48% year-over-year increase. Sonatype reported an even more explosive surge throughout 2025 — detecting over 450 000 malicious packages.

The motivation behind these attacks varies widely: cryptocurrency theft, harvesting developer credentials, industrial espionage, gaining infrastructure access via CI/CD pipelines, or compromising public servers to host spam and phishing campaigns. These tactics are employed by both spy APT groups and financially motivated cybercriminals. Increasingly, compromising an open-source package is just the first step in a multi-stage corporate breach.

Common attack scenarios include compromising the credentials of a legitimate open-source package maintainer, publishing a “useful” library with embedded malicious code, or publishing a malicious library with a name nearly identical to a popular one. A particularly alarming trend in 2025 has been the rise of automated, worm-like attacks. The most notorious example is the Shai-Hulud campaign. In this case, malicious code stole GitHub and npm tokens and kept infecting new packages, eventually spreading to over 700 npm packages and tens of thousands of repositories. It leaked CI/CD secrets and cloud access keys into the public domain in the process.

While this scenario technically isn’t related to vulnerabilities, the security tools and policies required to manage it are the same ones used for vulnerability management.

How AI agents increase the risks of open-source code usage

The rushed, ubiquitous integration of AI agents into software development significantly boosts developer velocity — but it also amplifies any error. Without rigorous oversight and clearly defined guardrails, AI-generated code is exceptionally vulnerable. Research shows that 45% of AI-generated code contains flaws from the OWASP Top 10, while 20% of deployed AI-driven applications harbor dangerous configuration errors. This happens because AI models are trained on massive datasets that include large volumes of outdated, demonstrational, or purely educational code. These systemic issues resurface when an AI model decides which open-source components to include in a project. The model is often unaware of which package versions currently exist, or which have been flagged as vulnerable. Instead, it suggests a dependency version pulled from its training data — which is almost certainly obsolete. In some cases, models attempt to call non-existent versions or entirely hallucinated libraries. This opens the door to dependency confusion attacks.

In 2025, even leading LLMs recommended incorrect dependency versions — simply making up an answer — in 27% of cases.

Can AI just fix everything?

It’s a simple, tempting idea: just point an AI agent at your codebase and let it hunt down and patch every vulnerability. Unfortunately, AI can’t fully solve this problem. The fundamental hurdles we’ve discussed handicap AI agents just as much as human developers. If vulnerability data is missing or unreliable, then instead of finding known vulnerabilities, you’re forced to rediscover them from scratch. That’s an incredibly resource-intensive process requiring niche expertise that remains out of reach for most businesses.

Furthermore, if a vulnerability is discovered in an obsolete or unsupported component, an AI agent cannot “auto-fix” it. You’re still faced with a need to develop custom patches or execute a complex migration. If a flaw is buried deep within a chain of dependencies, AI is likely to overlook it entirely.

What to do?

To minimize the risks described above, it will be necessary to expand the vulnerability management process to include open-source package download policies, AI assistant operating rules, and the software build process. This includes:

Using a comprehensive cloud workload security solution;
Checking open-source packages that are used in your software development process with threat intelligence feeds for open source components ;
Considering security measures to protect AI code and AI agents;
Systematically removing obsolete open-source components.

Kaspersky official blog – Read More