
New Open Source Tool from Angular Scores Vibe Code Quality

The Angular team has open sourced its Web Codegen Scorer to provide quantifiable metrics for AI-generated frontend code and frameworks.
Oct 29th, 2025 9:00am
Screenshot of Simona Cotin, a Google engineering manager who works on the Angular team, demoing the Web Codegen Scorer for audiences at the Angular + AI Developer event.

The Angular team at Google had a debate brewing internally: They couldn’t agree which large language model was best at implementing the framework.

“Across our teams, we had a different experience of using LLMs [large language models] to generate code, and we had a little bit different opinions about what is the level of code generation quality for Angular,” Simona Cotin, a Google engineering manager who works on the Angular team, told The New Stack this week.

One of the Angular developers took up the challenge and vibe-coded a prototype tool that could test how well vibe-coded output works with Angular. That early experiment led to the creation of an open source tool that tests LLM-generated code against frontend development concerns, such as adherence to a framework's best practices, accessibility and security problems.

Called Web Codegen Scorer, the tool is designed to test all of these in vibe-coded applications.

“The speed of AI is very tempting, but the code it produces sometimes isn’t code you can actually trust. It’s not always production-ready, and this is the central challenge we face as developers today,” Cotin told audiences at September’s Angular + AI Developer event. “This tool is like a fitness test for LLM-generated code.”

Cotin explained how Web Codegen Scorer works and what it offers developers.

The AI Road So Far

AI-generated code can be problematic when it comes to the frontend, with its many languages, frameworks and micro-frameworks.

While an LLM might be able to create framework-specific code, LLMs in general aren’t necessarily trained on the best code or on best practices. This leads to problems in the code, although the problems vary by LLM and framework.

For instance, Claude likes to use refs in React to track state, which is not a good pattern, React Foundation Executive Director Seth Webster told The New Stack.
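The anti-pattern Webster describes looks roughly like this; a minimal sketch with hypothetical component names, assuming a typical React function component. Refs don't trigger re-renders, so UI state belongs in `useState`:

```tsx
import { useRef, useState } from "react";

// Anti-pattern some LLMs emit: tracking renderable state in a ref.
// Mutating countRef.current never triggers a re-render, so the
// number shown on the button silently goes stale.
function BadCounter() {
  const countRef = useRef(0);
  return (
    <button onClick={() => countRef.current++}>{countRef.current}</button>
  );
}

// Idiomatic version: useState, so React re-renders on each update.
function GoodCounter() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(count + 1)}>{count}</button>;
}
```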

Angular has seen similar issues with GenAI-generated code, but with Angular, LLMs tend to use older APIs, Cotin said.
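For example (a hedged illustration, not taken from the Scorer's own test cases), a model trained mostly on pre-v17 Angular code may fall back to the `*ngIf` structural directive instead of the built-in `@if` control-flow syntax that Angular 17 introduced:

```html
<!-- Older API an LLM often reaches for: the *ngIf structural directive -->
<div *ngIf="user">Hello, {{ user.name }}</div>

<!-- Modern equivalent: built-in control flow, available since Angular 17 -->
@if (user) {
  <div>Hello, {{ user.name }}</div>
}
```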

What the Scorer Did for Angular

The tool was also born out of a need to test Angular’s MCP server.

“Working on this tool was incredibly helpful for us to be able to assess code generation quality, and, more importantly, also be able to iterate on APIs, on some of the AI feature work,” Cotin told TNS. “As we did more and more work on the MCP, one of the recurrent questions was how do we test this? This tool enabled us to test the changes that we were making and really make sure that LLMs are producing the results that we are expecting.”

In the first few weeks of using the tool, the Angular team was very excited to be able to read through a list of common failure modes and errors.

“We’ve done a little bit of LLM-hallucination-driven development,” Cotin said. “We saw some common failure modes, and we were able to actually fix those and make changes to the framework itself so that we would no longer see those errors.”

Sometimes, though, it was easier to simply change the framework to accommodate the LLM better, she added. For instance, one change they made was adding better support for class names used by Tailwind and more ergonomic syntax for ARIA attributes.

“As the industry and these changes are stabilizing, and we have more and more of these types of tools that tell us a more complete story, then we’re going to be able to have a better support story for everyone,” she said. “That’s part of the reason why we’re open sourcing this tool, because we also want to make sure that there’s clarity across our industry, that we can measure these things and we can assign numbers to some of these hypotheses.”

How Web Codegen Scorer Works

Web Codegen Scorer can actually test whether an Angular application includes these older API patterns, Cotin said. But that’s just one of its functions.

One way to think about Web Codegen Scorer is as having two “parts.” First, it comes with “environments” created by and for frameworks, although so far, only Angular and Solid.js are supported. Essentially, this works by providing a prompt to your LLM that outlines what the GenAI should do.

Here’s the prompt included with the Angular environment: “You are an expert in TypeScript, Angular and scalable web application development. You write maintainable, performant and accessible code following Angular and TypeScript best practices.”

System prompt for Angular and TypeScript best practices, via GitHub.

The prompt incorporates a list of best practices for TypeScript and Angular that the Google team has found LLMs tend not to follow, such as using signals for state management or always choosing standalone components over NgModules. It includes separate guidance for components, state management, templates and services.
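Put together, those two guidelines produce components like the following; a minimal sketch using Angular's documented `signal` and `computed` APIs (the counter component itself is hypothetical):

```typescript
import { Component, computed, signal } from "@angular/core";

@Component({
  selector: "app-counter",
  standalone: true, // standalone component; no NgModule required
  template: `<button (click)="increment()">{{ count() }} doubled is {{ doubled() }}</button>`,
})
export class CounterComponent {
  // Signal-based state, per the prompt's state-management guidance
  count = signal(0);
  doubled = computed(() => this.count() * 2);

  increment(): void {
    this.count.update((n) => n + 1);
  }
}
```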

The second part of the Web Codegen Scorer is a series of raters and AI evaluations that assess the code and provide developers with a scorecard that rates the application based on a number of factors including accessibility and security.

“Once the application has been created, we run those raters and evaluations that look at build time, errors at runtime. They look at, effectively, does this code include best practices?” Cotin said.

For the accessibility assessment, Web Codegen Scorer relies on the open source accessibility testing engine Axe, which checks applications against Web Content Accessibility Guidelines.

It can also look for security issues in the code, she added.

“Something interesting that we also did was collaborate with one of the Google security teams, and we’ve added a set of security checks so now we’re also validating that the code being generated includes best practices related to security, and these best practices are basically ones that have been tried and tested and battle tested by all of Google,” she said.

There’s a scoring mechanism within the tool that provides developers with a score from zero to 100, based on the different types of errors. For example, a security vulnerability in the generated code subtracts 50 points from the score because it’s considered a critical issue. Less-critical problems, such as code that isn’t a best practice but also isn’t technically wrong, subtract fewer points.
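That deduction scheme can be sketched as a small function. Only the 50-point critical-security deduction comes from the article; the severity names, weights and the `scoreRun` function are illustrative assumptions:

```typescript
// Minimal sketch of a 0-100 scorecard in the style described above.
// Only the 50-point critical deduction comes from the article; the
// other weights are assumed for illustration.

type Severity = "critical" | "major" | "minor";

interface Issue {
  category: string; // e.g. "security", "accessibility", "best-practice"
  severity: Severity;
}

const DEDUCTIONS: Record<Severity, number> = {
  critical: 50, // e.g. a security vulnerability in the generated code
  major: 15,    // assumed weight: build or runtime errors
  minor: 5,     // assumed weight: non-idiomatic but technically valid code
};

function scoreRun(issues: Issue[]): number {
  const total = issues.reduce((sum, i) => sum + DEDUCTIONS[i.severity], 0);
  return Math.max(0, 100 - total); // clamp so the score stays in [0, 100]
}
```

Under this sketch, a run with a single critical security finding scores 50, and enough findings clamp the score at zero.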


“It’s been proven to be very useful in basically putting a number [to] how good are LLMs at generating Angular code,” Cotin said. “What we found out is that, for example, Gemini is really good at generating Angular code, which was great, and now we are able to put a number on that.”

From there, Angular was able to improve the code generation even further by creating the set of best practices, she added.

“We started with an initial set of best practices as a system prompt; it’s quite short,” she said. “We iterated on the form of this specific prompt by basically running it against the evals tool, and we perfected it and improved it until code generation got into the 97 to 100 score range with these instructions for LLMs.”

In addition to an overall score, the tool offers an AI-generated overview of the types of errors seen in tests Angular has run. The bottom half of the dashboard provides more details about whether the build was successful, if it had errors and what kind of errors.

“You can inspect individual applications for the errors. You can see screenshots. You can see where it failed and why,” Cotin said.

There are also other prompts that help developers add features to their application or site. Examples include a credit card form, a CSS gradient generator and a password strength generator.

Angular also plans to add an assessment for Core Web Vitals, Cotin said.

The Web Codegen Scorer can be used in your AI-enabled IDE or in an agent such as Gemini CLI or Claude Code to improve code generation by giving LLMs instructions for producing good code, Cotin said.

Support for Other Frameworks

One of the reasons Angular open sourced the tool is so other frameworks could use it to create their own prompts for best practices, Cotin said.

“We have shipped it with a pre-configured environment for Angular, and you can create new environments that configure different frameworks, like Solid.js or Vue,” she said.

The Angular team provided Solid.js team member Dev Agrawal, a Solid Start contributor and full-stack developer, with early access to the tool so he could create an environment that supports the framework.

“We’re very grateful to the Angular team for reaching out to us and providing us early access to the Web Codegen Scorer tool,” Agrawal told TNS by email. “We have used it to test the performance of GPT 5, Claude Sonnet, as well as Gemini models on building a variety of apps using Solid, and so far these models have done a great job with just a little extra guidance.”

The Solid.js team is actively investigating how to best provide AI coding agents with the context they need to build performant and accessible Solid apps, and to work seamlessly with the new APIs in Solid 2.0 once it gets released, he added.

“The Web Codegen Scorer is going to be an essential tool to help us verify our improvements,” Agrawal said.
