Cybercriminal abuse of large language models
- Cybercriminals are continuing to explore artificial intelligence (AI) technologies such as large language models (LLMs) to aid in their criminal hacking activities.
- Some cybercriminals have resorted to using uncensored LLMs or even custom-built criminal LLMs for illicit purposes.
- Advertised features of malicious LLMs indicate that cybercriminals are connecting these systems to various external tools for sending outbound email, scanning sites for vulnerabilities, verifying stolen credit card numbers and more.
- Cybercriminals also abuse legitimate AI technology, such as jailbreaking legitimate LLMs, to aid in their operations.
Generative AI and LLMs have taken the world by storm. With the ability to generate convincing text, solve problems, write computer code and more, LLMs are being integrated into almost every facet of society. According to Hugging Face (a platform that hosts models), there are currently over 1.8 million different models to choose from.
LLMs are usually built with key safety features, including alignment and guardrails. Alignment is a training process that LLMs undergo to minimize bias and ensure that the LLM generates outputs that are consistent with human values and ethics. Guardrails are additional real-time safety mechanisms that try to restrain the LLM from engaging in harmful or undesirable actions in response to user input. Many of the most advanced (or “frontier”) LLMs are protected in this manner. For example, asking ChatGPT to produce a phishing email will result in a denial, such as, “Sorry, I can’t assist with that.”
For cybercriminals who wish to use LLMs to conduct or improve their attacks, these safety mechanisms can present a significant obstacle. To achieve their goals, cybercriminals are increasingly gravitating toward uncensored LLMs, cybercriminal-designed LLMs and jailbreaks of legitimate LLMs.
Uncensored LLMs
Uncensored LLMs are unaligned models that operate without the constraints of guardrails. These systems happily generate sensitive, controversial, or potentially harmful output in response to user prompts. As a result, uncensored LLMs are perfectly suited for cybercriminal usage.

Uncensored LLMs are quite easy to find. For example, using the cross-platform Omni-Layer Learning Language Acquisition (Ollama) framework, a user can download and run an uncensored LLM on their local machine. Ollama comes with several uncensored models, such as Llama 2 Uncensored, which is based on Meta’s Llama 2 model. Once the model is running, users can submit prompts that would otherwise be rejected by more safety-conscious LLM implementations. The downside is that these models run on the user’s local machine, and larger models, which generally produce better results, require more system resources.
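To illustrate how low the barrier to entry is, below is a minimal sketch of querying a locally hosted model through Ollama’s Python client. It assumes the `ollama` package is installed and that a model such as Llama 2 Uncensored has already been pulled (for example, with `ollama pull llama2-uncensored`); the prompt here is deliberately benign.

```python
# Minimal sketch: querying a locally hosted model via the Ollama Python client.
# Assumes `pip install ollama` and that the model was pulled beforehand,
# e.g. with `ollama pull llama2-uncensored`.
import ollama

response = ollama.chat(
    model="llama2-uncensored",
    messages=[{"role": "user", "content": "In one paragraph, what are LLM guardrails?"}],
)

# The model's reply is returned in the message body of the response.
print(response["message"]["content"])
```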

Another uncensored LLM popular among cybercriminals is a tool called WhiteRabbitNeo. WhiteRabbitNeo bills itself as an "Uncensored AI model for (Dev) SecOps teams" that can support "use cases for offensive and defensive cybersecurity". This LLM will happily write offensive security tools, phishing emails and more.

Researchers have also published methods demonstrating how to strip the alignment that is embedded in the training data of existing open-source models. Once that data is removed, a user can produce an uncensored LLM by fine-tuning a base model on the modified training set.
Cybercriminal-designed LLMs
Since most popular LLMs come with significant guardrails, some enterprising cybercriminals have developed their own restriction-free LLMs, which they market to other cybercriminals. These include apps like GhostGPT, WormGPT, DarkGPT, DarkestGPT and FraudGPT.

For example, the developer behind FraudGPT, CanadianKingpin12, advertises FraudGPT on the dark web, and also has an account on Telegram. The dark web site for FraudGPT advertises some interesting features:
- Write malicious code
- Create undetectable malware
- Find non-VBV bins
- Create phishing pages
- Create hacking tools
- Find groups, sites, markets
- Write scam pages/letters
- Find leaks and vulnerabilities
- Learn to code/hack
- Find cardable sites
- Millions of samples of phishing emails
- 6220+ source code references for malware
- Automatic scripts for replicating logs/cookies
- In-panel Page hosting included (10 pages/month) with Google Chrome anti-red page
- Code obfuscation
- Custom data set (upload your sample page in .html)
- Bot creation of virtual machines and accounts (1 virtual machine per month on license)
- Utilizing GoldCheck CVV checker
- OTP Bot with spoofing (*additional package)
- Check CVVs with GoldCheck API
- Create username:password website configs
- Remote OpenBullet configs
- Scan websites for vulnerabilities across a massive CVE database (*PRO only)
- Generate realistic phishing panels, pages, SMS and e-mails
- Send mail from webshells
Talos attempted to obtain access to FraudGPT by reaching out to CanadianKingpin12 on Telegram. After considerable negotiation, we were finally offered a username and password for the FraudGPT dark web site. However, the credentials provided by CanadianKingpin12 did not work. CanadianKingpin12 then asked us to send them cryptocurrency to purchase a software “crack” for the FraudGPT login page. At this point it was clear that CanadianKingpin12 had no working product and was simply scamming potential FraudGPT customers out of their cryptocurrency. This was confirmed by several other victims who had also been scammed by CanadianKingpin12 when they attempted to purchase access to the FraudGPT LLM. Scams such as these are an ever-present risk when dealing with unscrupulous actors, and they continue a long tradition of fraud within the cybercrime space.
Similar cybercriminal-designed LLM projects can be found elsewhere on the dark web. For example, a cybercriminal LLM called DarkestGPT, which starts at 0.0015 BTC for a one-month subscription, advertises a comparable set of features.

LLM jailbreaks
Given the limited viability of uncensored LLMs due to resource constraints, and the high level of fraud and scams present among cybercriminal LLM purveyors, many cybercriminals have elected to abuse legitimate LLMs instead. The main hurdles cybercriminals need to overcome are the alignment training and guardrails that prevent the LLM from responding to prompts with unethical, illegal or harmful content. A form of prompt injection, jailbreak attacks aim to put the LLM into a state where it ignores its alignment training and guardrail protections.
There are many ways to trick an LLM into providing dangerous responses. New jailbreaking methods are constantly being researched and discovered, while LLM developers respond by enhancing the guardrails in a sort of jailbreak arms race. Below are just a few of the available jailbreaking techniques.
Obfuscation/encoding-based jailbreaks
By obfuscating certain words or phrases, these text-based jailbreak attacks seek to bypass any hardcoded restrictions on specific words/topics, or to cause the execution to follow a nonstandard path that might bypass protections put in place by the LLM developers. These obfuscation techniques may include:
- Base64/Rot-13 encoding
- Different languages
- L33t sp34k
- Morse code
- Emojis
- Adding spaces or UTF-8 characters into words/text, among others
Adversarial suffix jailbreaks
These attacks are somewhat like obfuscation and encoding tricks. Instead of modifying the meaningful tokens in the prompt itself, adversarial suffix jailbreaks append a string of seemingly random, adversarially chosen characters to the end of a malicious prompt to elicit a harmful response.
Role-playing jailbreaks
This type of attack involves prompting the LLM to adopt the persona of a fictional universe or character that ignores the ethical rules set by the model’s creators and is willing to fulfill any command. This includes jailbreak techniques such as DAN (Do Anything Now) and the Grandma jailbreak, which involves asking the chatbot to assume the role of the user’s grandmother.
Meta prompting
Meta prompting involves exploiting the model’s awareness of its own limitations to devise successful workarounds, effectively enlisting the model in the effort to bypass its own safeguards.
Context manipulation jailbreaks
This covers several different jailbreak techniques including:
- Crescendo, a technique that progressively increases the harmfulness of prompts until a rejection is received, in order to probe where and how the LLM’s guardrails are implemented.
- Context Compliance Attacks, which exploit the fact that many LLMs do not maintain conversation state. Attackers inject fake prior LLM responses into their prompts, such as a brief statement discussing the sensitive topic, or a statement expressing readiness to supply further details as per the user’s preferences.
Math prompt jailbreaks
The math prompt method evaluates how well an AI system can handle malicious inputs when they are disguised using mathematical frameworks such as set theory, group theory and abstract algebra. Rephrasing harmful requests as math problems can allow attackers to evade the safety features of even advanced LLMs.
Payload splitting
In this scenario, the attacker guides the LLM to merge several prompts in a way that produces harmful output. While texts A and B may seem benign when considered separately, their combination (A+B) can result in malicious content.
Academic framing
This method makes harmful content appear acceptable by framing it as part of a research or educational discussion. It takes advantage of the model’s interpretation of academic intent and freedom, often using scholarly language and formatting to bypass safeguards.
System override
This strategy tries to trick the model into believing it is functioning in a unique mode where usual limitations are lifted. It leverages the model’s perception of system-level functions or maintenance states to circumvent safety mechanisms.
How cybercriminals use LLMs
In December 2024, Anthropic, the developer of the Claude LLM, published a report detailing how its users were utilizing Claude. Using a system named Clio, Anthropic summarized and categorized users’ conversations with its AI model. According to the report, the top three uses for Claude were programming, content creation and research.

Analyzing the feature sets advertised by the criminal-designed LLMs, we can see that cybercriminals are using LLMs for largely the same tasks as legitimate LLM users. On the programming side, many criminal LLMs offer to assist cybercriminals in writing ransomware, remote access trojans and wipers, as well as with code obfuscation, shellcode generation and script/tool creation. To facilitate content creation, criminal LLMs will assist in writing phishing emails, landing pages and configuration files. Criminal LLMs also support research activities like verifying stolen credit cards, scanning sites and code for vulnerabilities, and even helping cybercriminals come up with “lucrative” criminal ideas for their next big score.
Various hacking forums also shed additional light on criminal uses of LLMs. For example, on the popular hacking forum Dread, users were discussing connecting LLMs to external tools like Nmap, and using the LLM to summarize the Nmap output.
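As a rough sketch of what that kind of tool chaining looks like, the example below runs an Nmap scan and asks a locally hosted model to summarize the output. It assumes Nmap, the `ollama` Python client and a locally pulled model are available (the model name is illustrative), and it targets Nmap’s officially sanctioned test host.

```python
# Illustrative sketch: chaining an external tool (Nmap) with a local LLM.
# Assumes nmap is installed, plus the `ollama` Python client and a locally pulled model.
import subprocess

import ollama

# Run a service/version scan against Nmap's officially sanctioned test host.
scan_output = subprocess.run(
    ["nmap", "-sV", "scanme.nmap.org"],
    capture_output=True, text=True, check=True,
).stdout

# Hand the raw scanner output to the model and ask for a plain-language summary.
response = ollama.chat(
    model="llama3.1",  # illustrative model name; any locally available model works
    messages=[{
        "role": "user",
        "content": "Summarize the open ports and service versions in this Nmap output:\n"
        + scan_output,
    }],
)
print(response["message"]["content"])
```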

LLMs are also targets for cyber attackers
Any new technology typically brings along with it changes to the attack surface, and LLMs are no exception. In addition to using LLMs for their own nefarious ends, attackers are also attempting to compromise LLMs and their users.
Backdoored LLMs
The vast majority of the models available on Hugging Face use Python’s pickle module to serialize the model into a file that users can download. Clever attackers can embed Python code in the pickle file, which runs as part of the deserialization process. Thus, when a user downloads an AI model and runs it, they may also be running code planted in the model by an attacker. Hugging Face uses Picklescan, among other tools, to scan user-uploaded models in an effort to identify ones that misbehave. However, several vulnerabilities have recently been found in Picklescan, and researchers have already identified Hugging Face models containing malware. As always, make sure any file you plan to download and run comes from a trusted source, and consider running the file in a sandbox to mitigate the risk of infection.
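The underlying risk is easy to demonstrate with nothing but the Python standard library. The benign sketch below (the class name and payload are purely illustrative, not taken from any real malicious model) shows that simply unpickling a file is enough to execute whatever callable the file specifies, which is why formats like safetensors, or loading PyTorch checkpoints with `torch.load(..., weights_only=True)`, are generally preferred for untrusted models.

```python
import pickle

# Benign demonstration of why unpickling untrusted files is dangerous:
# pickle lets an object specify, via __reduce__, a callable that runs at load time.
class NotReallyAModel:
    def __reduce__(self):
        # A malicious model file would return something harmful here (e.g., a call
        # that downloads and runs a payload); this sketch just prints a message.
        return (print, ("arbitrary code ran during deserialization",))

blob = pickle.dumps(NotReallyAModel())
pickle.loads(blob)  # the message prints here -- loading the file alone triggered execution
```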
Retrieval Augmented Generation (RAG)
LLMs that utilize Retrieval Augmented Generation (RAG) make calls to external data sources to augment their training data with up-to-date information. For example, if you ask an LLM what the weather is like on a particular day, the LLM will need to reach out to an external data source, such as a website, to retrieve the correct forecast. If an attacker can submit or manipulate content in the RAG data store, they may poison the lookup results, for example by adding instructions that cause the LLM to alter its response to the user’s prompt, or even by targeting specific users.
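A minimal sketch of that flow is shown below, with a hypothetical retrieve() helper standing in for the external data source and the `ollama` Python client standing in for the model. The key point is that retrieved text is pasted directly into the prompt, so anything an attacker manages to plant in the data source is presented to the model as trusted context.

```python
import ollama  # illustrative: any chat-style LLM API would behave the same way

def retrieve(query: str) -> list[str]:
    # Hypothetical retrieval step: a real system would query a vector store,
    # search index or website. If an attacker can insert or modify documents
    # here, that content flows straight into the model's context below.
    return ["Forecast for Austin today: sunny with a high of 75F."]

query = "What is the weather in Austin today?"
context = "\n".join(retrieve(query))

response = ollama.chat(
    model="llama3.1",  # illustrative model name
    messages=[{
        "role": "user",
        "content": f"Answer using only the context below.\n\n"
                   f"Context:\n{context}\n\nQuestion: {query}",
    }],
)
print(response["message"]["content"])
```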
Conclusion
As AI technology continues to develop, Cisco Talos expects cybercriminals to continue adopting LLMs to help streamline their processes, write tools/scripts that can be used to compromise users and generate content that can more easily bypass defenses. This new technology doesn’t necessarily arm cybercriminals with completely novel cyber weapons, but it does act as a force multiplier, enhancing and improving familiar attacks.