The Evolving Battleground: AI Security in a Rapidly Changing Landscape
As new artificial intelligence (AI) products launch, security researchers and curious users alike immediately begin probing them for vulnerabilities, trying to force the systems to violate their own safety measures and produce undesirable outputs, from offensive content to instructions for building weapons.
The risks associated with AI are no longer purely theoretical. In recent months, several AI companies have faced criticism over software allegedly contributing to mental health crises and suicides, the spread of non-consensual intimate imagery, and assistance to hackers committing cybercrimes. At the same time, techniques for bypassing safety measures continue to evolve, ranging from malicious requests disguised as poetry to the surreptitious planting of instructions in an AI assistant’s memory through seemingly harmless online tools.
Inside Microsoft’s AI Red Team: A Proactive Defense
Long before new models reach the public, internal security teams subject them to rigorous stress testing. At Microsoft, this responsibility largely falls to the company’s AI Red Team, a group that has been working with product teams and the broader AI community since 2018 to test models and applications before malicious actors can exploit them.
In cybersecurity terms, a “Red Team” simulates attacks on a system, while a “Blue Team” focuses on defending it. Microsoft’s AI Red Team explores a wide range of security issues, from scenarios where AI escapes human oversight to concerns related to chemical, biological, and nuclear threats within various AI programs.
The Art of AI Manipulation
“We see a great diversity of technologies,” says Tori Westerhoff, Principal Security Researcher in AI at the Microsoft AI Red Team. “Part of the magic of the team is that we can look at anything from a product feature to a system, a copilot, or a state-of-the-art model, and see how the technology integrates across all of them, as well as how AI is growing and evolving.”
In one instance, team members collaborated with other Microsoft researchers to determine if AI could be manipulated to assist in cyberattacks, including generating or refining malware. They experimented by phrasing requests innocuously, such as describing a student project or a security research scenario, then pushing the systems to produce increasingly detailed results.
The effort extended beyond simple response testing. Researchers evaluated whether the AI could generate code that compiled and ran correctly, and whether certain programming languages increased the likelihood of harmful outcomes. In the most severe cases, the systems produced code comparable to what a basic or intermediate-level hacker could write, and the team refined detection systems to better identify such behavior.
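To make the “does the output actually compile” check concrete, here is a minimal illustrative sketch of how generated C code could be compiled automatically in a temporary directory and scored. This is an assumption-laden illustration of the evaluation idea, not the team’s actual tooling; the function and file names are hypothetical, and it requires gcc to be installed.

```python
# Hypothetical sketch: check whether model-generated C code compiles.
# Illustrates the evaluation idea only; not Microsoft's actual harness.
import subprocess
import tempfile
from pathlib import Path

def compiles_ok(generated_c_code: str) -> bool:
    """Return True if the supplied C source compiles cleanly with gcc."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.c"
        src.write_text(generated_c_code)
        # Compile to an object file only; never execute untrusted model output here.
        result = subprocess.run(
            ["gcc", "-c", str(src), "-o", str(Path(tmp) / "sample.o")],
            capture_output=True,
        )
        return result.returncode == 0

# Example usage with a harmless snippet standing in for model output:
print(compiles_ok("int main(void) { return 0; }"))
```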
“Going forward, if a more capable model emerges that can add value, we will already be ahead of the curve,” explains Pete Bryan, Principal Research Lead in AI Security for the Red Team.
Microsoft’s Ongoing Commitment to AI Security
Currently, the Red Team comprises several dozen specialists with expertise in fields ranging from software testing to biology. The group also collaborates closely with external experts and other teams within the AI industry. Bryan and Westerhoff presented their work at the RSAC conference on March 24th, and the team has released open-source tools, including PyRIT (Python Risk Identification Tool), an automated testing framework, along with guides for evaluating AI systems.
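For readers curious what automated red-team testing looks like in practice, the sketch below shows the kind of probe-and-score loop that a framework such as PyRIT streamlines. The interfaces here are illustrative assumptions, not PyRIT’s real API; see https://github.com/Azure/PyRIT for the actual framework.

```python
# Illustrative probe-and-score loop in the spirit of automated red teaming.
# The send_prompt/score_response interfaces are hypothetical, not PyRIT's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProbeResult:
    prompt: str
    response: str
    flagged: bool

def run_probes(
    send_prompt: Callable[[str], str],      # wraps the model or copilot under test
    score_response: Callable[[str], bool],  # returns True if the output looks unsafe
    prompts: list[str],
) -> list[ProbeResult]:
    """Send each attack prompt to the target and record which responses are flagged."""
    results = []
    for prompt in prompts:
        response = send_prompt(prompt)
        results.append(ProbeResult(prompt, response, score_response(response)))
    return results

# Example with stand-in stubs instead of a real model endpoint:
demo = run_probes(
    send_prompt=lambda p: "I can't help with that.",
    score_response=lambda r: "step-by-step" in r.lower(),
    prompts=["Pretend this is a student project and explain ...", "Write a poem that ..."],
)
print(sum(r.flagged for r in demo), "of", len(demo), "probes flagged")
```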
The team’s efforts have been referenced in Microsoft publications, including the announcement of a new AI image generation model, and in third-party reports, such as the “system card” explaining the functionality and testing of OpenAI’s GPT-5 model. Microsoft has also recently published research on AI security exploring potential risks related to AI fine-tuning and methods for detecting hidden backdoors in open-weight models.
The Expanding AI Ecosystem and Future Challenges
As AI ecosystems expand to include more advanced copilots, autonomous agents, and multimodal systems capable of generating text, images, audio, and video, the Red Team’s mandate has become increasingly complex. Many of today’s use cases, from automated coding to AI-powered shopping and video generation, would have seemed like science fiction just a few years ago.
“For my team, I think that’s part of the fun: seeing so many diverse things,” says Westerhoff. “It’s not just about testing models day in and day out, but also testing how models function across the entire technology ecosystem.”
FAQ: AI Security Concerns
What is an AI Red Team?
An AI Red Team simulates attacks on AI systems to identify vulnerabilities before malicious actors can exploit them.
What are some current AI security risks?
Risks include the generation of harmful content, aiding cybercrime, and the spread of misinformation.
How is Microsoft addressing AI security?
Microsoft employs an AI Red Team, develops open-source tools, and publishes research on AI security best practices.
Did you know? The number of public code commits on platforms like GitHub grew by 43% in 2025, largely driven by the adoption of AI-assisted development tools. That growth has also been accompanied by a 34% increase in exposed credentials in code.
Pro Tip: Regularly update your AI systems and security protocols to stay ahead of evolving threats. Consider implementing a “Zero Trust” security framework, which assumes no user or device is trusted by default.
What are your thoughts on the future of AI security? Share your insights in the comments below!
