p016 - Planifest Frontend Stack Evaluation


Purpose

This document evaluates frontend frameworks and meta-frameworks for use in Planifest's agentic CI/CD pipeline, where code is generated by AI agents (via LLM API), not written by humans. Traditional developer-experience priorities (learning curve, community vibe, personal preference) are irrelevant. The sole question is: what gets an LLM to write correct, visually acceptable, production-ready frontend code with minimal iteration?

Frontend presents a unique challenge compared to backend: correctness has two dimensions - functional (does it work?) and visual (does it look right?). Addy Osmani's React Summit research identifies a "complexity cliff" where models achieve ~70-80% success on isolated component tasks but collapse to ~25% on multi-step integrations involving state management, routing, and design coherence. The evaluation criteria below are calibrated to this reality.


Evaluation Criteria

The backend evaluation (p013) used 15 criteria. Frontend shares some of these but introduces visual and interaction-specific concerns. The criteria used here are:

  1. LLM Training Corpus Coverage - how much idiomatic code exists in the training data
  2. Compile-Time Error Detection - how many bugs are caught before runtime
  3. Error Feedback Clarity - how actionable are build/runtime errors for agent self-correction
  4. Type System - strength, soundness, and boundary enforcement
  5. Component Model Clarity - how predictable is the component lifecycle/rendering model
  6. State Management - how likely is the agent to produce correct state logic
  7. Styling & Design System Integration - how naturally does the framework pair with utility CSS and component libraries
  8. Testing Framework - how well do LLMs generate tests and how verifiable are the results
  9. Build Tooling & Bundle Characteristics - build speed, bundle size, HMR, container footprint
  10. Routing & Data Fetching - complexity of SSR/SSG/CSR routing and data loading patterns
  11. Accessibility by Default - how much a11y does the framework enforce or encourage
  12. Third-Party Integration Coverage - availability of UI libraries, SDKs, and community packages
  13. Ecosystem Maturity & Stability - production track record, breaking change frequency, governance
  14. Agent Skill & Context Engineering Support - availability of published agent skills, AGENTS.md files, best-practice documents optimised for LLM consumption
  15. Overall Agent-Suitability - estimated first-pass success rates and typical iteration counts

Scoring Key

Stars  Meaning
★★★★★  Best in class - near-zero agent iteration needed
★★★★   Strong - occasional iteration, mostly correct first time
★★★    Adequate - regular iteration needed but manageable
★★     Weak - frequent iteration, many classes of bugs slip through
★      Poor - unsuitable for agent-generated code

1. React 19 + Vite + TypeScript (SPA / Client-Side)

Note: Evaluated as a Vite-based SPA without a meta-framework. This is the "plain React" option.

LLM Training Corpus Coverage

Score: ★★★★★

Compile-Time Error Detection

Score: ★★★

Error Feedback Clarity

Score: ★★★★

Type System

Score: ★★★

Component Model Clarity

Score: ★★★★

State Management

Score: ★★★

Styling & Design System Integration

Score: ★★★★★

Testing Framework

Score: ★★★★

Build Tooling & Bundle Characteristics

Score: ★★★★★

Routing & Data Fetching

Score: ★★★

Accessibility by Default

Score: ★★★

Third-Party Integration Coverage

Score: ★★★★★

Ecosystem Maturity & Stability

Score: ★★★★

Agent Skill & Context Engineering Support

Score: ★★★★★

Overall Agent-Suitability

Score: ★★★★★

Best Use Cases

Avoid If

Key Risks

  1. useEffect complexity: Dependency arrays, cleanup functions, and async effects are the primary source of agent-generated bugs
  2. State management sprawl: Without explicit architectural guidance in the specification, agents produce inconsistent state patterns
  3. Generic design: Agent-generated UI converges on a "Tailwind default" aesthetic without strong design constraints in the spec
  4. No SSR: Pure SPAs have inherent SEO and initial load limitations
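The first risk is mechanical enough to pin down in code. Below is a framework-free sketch of the cleanup discipline a correct async effect needs; the helper and callback names are hypothetical, for illustration only:

```typescript
// Sketch of what a well-formed async useEffect body must do: start the work,
// and return a cleanup that makes late results harmless. The bug agents most
// often generate is the version without the abort guard, which applies a
// stale user's data after the dependency has already changed.
type Cleanup = () => void;

function startUserFetch(
  userId: string,
  fetchUser: (id: string, signal: AbortSignal) => Promise<string>,
  onResult: (name: string) => void,
): Cleanup {
  const controller = new AbortController();
  fetchUser(userId, controller.signal)
    .then((name) => {
      // Guard: ignore results that arrive after cleanup has run.
      if (!controller.signal.aborted) onResult(name);
    })
    .catch(() => {
      // Aborted or failed: nothing to apply.
    });
  // In React, this return value is exactly the effect's cleanup function.
  return () => controller.abort();
}
```

In a real component this body would sit inside useEffect, with `userId` in the dependency array and the returned function as the effect's cleanup.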

2. Next.js 15+ (React Meta-Framework)

LLM Training Corpus Coverage

Score: ★★★★★

Compile-Time Error Detection

Score: ★★★

Error Feedback Clarity

Score: ★★★★

Type System

Score: ★★★

Component Model Clarity

Score: ★★★

State Management

Score: ★★★

Styling & Design System Integration

Score: ★★★★★

Testing Framework

Score: ★★★

Build Tooling & Bundle Characteristics

Score: ★★★★

Routing & Data Fetching

Score: ★★★★

Accessibility by Default

Score: ★★★

Third-Party Integration Coverage

Score: ★★★★★

Ecosystem Maturity & Stability

Score: ★★★★

Agent Skill & Context Engineering Support

Score: ★★★★★

Overall Agent-Suitability

Score: ★★★★

Best Use Cases

Avoid If

Key Risks

  1. Server/Client Component confusion: The #1 source of agent-generated Next.js bugs
  2. Hydration mismatches: Server-rendered HTML diverging from client hydration
  3. Caching complexity: fetch caching, revalidate, and dynamic options are frequently misconfigured
  4. Rapid evolution: App Router patterns change between minor versions, and LLM training data lags
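Risks 1-3 all stem from the same boundary, which is worth sketching. The two hypothetical files below (shown in one block, not runnable outside a Next.js app; names and URL are illustrative) mark the pattern an agent must get right:

```tsx
// --- app/dashboard/stats-panel.tsx ---
// Anything with hooks or event handlers needs an explicit client boundary.
// Omitting "use client" here - or importing client-only hooks into a Server
// Component - is the classic agent-generated Next.js failure.
"use client";
import { useState } from "react";

export function StatsPanel({ initial }: { initial: number }) {
  const [visits, setVisits] = useState(initial);
  return <button onClick={() => setVisits(visits + 1)}>{visits} visits</button>;
}

// --- app/dashboard/page.tsx ---
// Server Component by default: may be async and fetch directly, but must not
// use state, effects, or event handlers. The fetch options control caching,
// a frequent misconfiguration; `next: { revalidate: 60 }` opts into
// time-based revalidation instead of the framework default.
import { StatsPanel } from "./stats-panel";

export default async function DashboardPage() {
  const res = await fetch("https://api.example.com/stats", {
    next: { revalidate: 60 },
  });
  const { visits } = (await res.json()) as { visits: number };
  return <StatsPanel initial={visits} />;
}
```

Server HTML must match the client's first render, so any value that differs between the two environments (dates, randomness, locale) belongs behind the client boundary or in an effect.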

3. SvelteKit (Svelte 5)

LLM Training Corpus Coverage

Score: ★★★

Compile-Time Error Detection

Score: ★★★★

Error Feedback Clarity

Score: ★★★★

Type System

Score: ★★★

Component Model Clarity

Score: ★★★★

State Management

Score: ★★★★

Styling & Design System Integration

Score: ★★★

Testing Framework

Score: ★★★

Build Tooling & Bundle Characteristics

Score: ★★★★★

Routing & Data Fetching

Score: ★★★★

Accessibility by Default

Score: ★★★★

Third-Party Integration Coverage

Score: ★★

Ecosystem Maturity & Stability

Score: ★★★

Agent Skill & Context Engineering Support

Score: ★★

Overall Agent-Suitability

Score: ★★★

Best Use Cases

Avoid If

Key Risks

  1. Svelte 4 vs. 5 confusion: LLMs generate outdated reactive syntax for Svelte 5 targets
  2. Ecosystem gaps: Missing libraries force custom agent-generated code with higher error rates
  3. Smaller training corpus: Fewer Stack Overflow answers, fewer GitHub examples, less blog coverage
  4. Component library deficit: No shadcn/ui-quality component library with deep LLM training data

4. Vue 3 + Nuxt 3

LLM Training Corpus Coverage

Score: ★★★★

Compile-Time Error Detection

Score: ★★★

Error Feedback Clarity

Score: ★★★★

Component Model Clarity

Score: ★★★★

State Management

Score: ★★★★

Styling & Design System Integration

Score: ★★★

Testing Framework

Score: ★★★★

Build Tooling & Bundle Characteristics

Score: ★★★★

Routing & Data Fetching

Score: ★★★★

Accessibility by Default

Score: ★★★

Third-Party Integration Coverage

Score: ★★★

Ecosystem Maturity & Stability

Score: ★★★★

Agent Skill & Context Engineering Support

Score: ★★

Overall Agent-Suitability

Score: ★★★★

Best Use Cases

Avoid If

Key Risks

  1. Vue 2 vs. 3 confusion: LLMs occasionally generate Options API when Composition API is required
  2. Component library depth: Fewer deeply-trained component libraries than React
  3. No agent skills ecosystem: Agents must rely on general documentation

5. Angular 18+ (with Signals)

LLM Training Corpus Coverage

Score: ★★★★

Compile-Time Error Detection

Score: ★★★★

Error Feedback Clarity

Score: ★★★

Component Model Clarity

Score: ★★★

State Management

Score: ★★★

Styling & Design System Integration

Score: ★★★

Testing Framework

Score: ★★★

Build Tooling & Bundle Characteristics

Score: ★★★

Routing & Data Fetching

Score: ★★★★

Overall Agent-Suitability

Score: ★★★

Best Use Cases

Avoid If

Key Risks

  1. Version confusion: LLMs generate module-based patterns for standalone component targets
  2. RxJS complexity: Observable chains are a major source of agent-generated bugs
  3. Boilerplate: Angular's verbosity increases the surface area for agent errors
  4. Testing verbosity: TestBed configuration errors are common

6. Solid.js + SolidStart

LLM Training Corpus Coverage

Score: ★★

Compile-Time Error Detection

Score: ★★★

Component Model Clarity

Score: ★★★★

Overall Agent-Suitability

Score: ★★

Key Risks

  1. React pattern contamination: Agents generate React hooks syntax in Solid contexts
  2. Tiny ecosystem: Very few third-party libraries
  3. Minimal training data: Most LLMs cannot generate idiomatic Solid code reliably

7. Qwik + QwikCity

LLM Training Corpus Coverage

Score: ★

Component Model Clarity

Score: ★★

Overall Agent-Suitability

Score: ★


8. Astro

LLM Training Corpus Coverage

Score: ★★★

Component Model Clarity

Score: ★★★★

Overall Agent-Suitability

Score: ★★★

Best Use Cases

Avoid If


9. HTMX + Server-Rendered HTML

LLM Training Corpus Coverage

Score: ★★★

Component Model Clarity

Score: ★★★★★

Overall Agent-Suitability

Score: ★★★

Best Use Cases

Avoid If


10. Remix (React Meta-Framework)

LLM Training Corpus Coverage

Score: ★★★

Component Model Clarity

Score: ★★★★

Overall Agent-Suitability

Score: ★★★


Comparative Analysis

Tier Rankings by Agent Success Rate

Rank  Framework             First-Pass Functional  First-Pass Visual  Typical Iterations
1     React 19 + Vite + TS  70-80%                 55-65%             2-4
2     Next.js 15+           60-70%                 55-65%             3-5
3     Vue 3 + Nuxt 3        60-70%                 50-60%             3-5
4     HTMX                  65-75%                 N/A (server)       2-3
5     Angular 18+           55-65%                 45-55%             4-6
6     Astro                 55-65%                 50-60%             3-5
7     SvelteKit (Svelte 5)  50-60%                 45-55%             4-6
8     Remix                 55-65%                 50-60%             3-5
9     Solid.js              35-45%                 35-45%             5-8
10    Qwik                  20-30%                 20-30%             8-12

Tier Rankings by Ecosystem Breadth

Rank  Framework     Third-Party Libraries  UI Component Depth  Agent Skills
1     React + Vite  ★★★★★                  ★★★★★               ★★★★★
2     Next.js       ★★★★★                  ★★★★★               ★★★★★
3     Angular       ★★★★                   ★★★★                ★★
4     Vue + Nuxt    ★★★★                   ★★★                 ★★
5     Remix         ★★★★                   ★★★★                ★★
6     SvelteKit     ★★★                    ★★★                 ★★
7     Astro         ★★★                    ★★★                 ★★
8     HTMX          ★★
9     Solid.js      ★★                     ★★
10    Qwik

Tier Rankings by Bundle / Container Efficiency

Rank  Framework            Bundle (gzipped)  Container Image  Startup
1     HTMX                 <10 KB (JS)       N/A (backend)    N/A
2     SvelteKit (static)   30-80 KB          10-25 MB         Instant
3     React + Vite (SPA)   80-200 KB         10-30 MB         Instant
4     Astro                50-100 KB         10-25 MB         Instant
5     Vue + Nuxt (static)  60-150 KB         10-25 MB         Instant
6     Qwik                 30-60 KB          80-150 MB        100-500 ms
7     Solid.js             40-80 KB          10-25 MB         Instant
8     Next.js              80-250 KB         100-300 MB       500 ms-2 s
9     Angular              100-300 KB        150-350 MB       500 ms-2 s
10    Remix                80-200 KB         100-250 MB       500 ms-2 s

Trade-Off Matrix

                    Agent Success ↔ Bundle Efficiency
                    ┌─────────────────────────────────┐
          High      │  React+Vite                     │
      Agent         │    Next.js   Vue+Nuxt           │
      Success       │       Angular                   │
                    │    Remix  Astro                 │
                    │       SvelteKit                 │
                    │                                 │
          Low       │  Solid.js                       │
      Agent         │     Qwik                        │
      Success       │                                 │
                    └─────────────────────────────────┘
                    Large                        Small
                    Bundles                    Bundles

                    Agent Success ↔ Framework Complexity
                    ┌─────────────────────────────────┐
          High      │  React+Vite  HTMX               │
      Agent         │     Vue+Nuxt                    │
      Success       │  Next.js                        │
                    │     Remix  Astro                │
                    │  Angular    SvelteKit           │
                    │                                 │
          Low       │     Solid.js                    │
      Agent         │        Qwik                     │
      Success       │                                 │
                    └─────────────────────────────────┘
                    Simple                      Complex
                    Framework                 Framework

The Complexity Cliff - Frontend-Specific

Addy Osmani's research at React Summit quantified the "complexity cliff" for frontend AI code generation:

Task Complexity                Success Rate (Best Models)  Notes
Isolated component generation  ~70-80%                     Single component, clear props, no routing
Page-level composition         ~40-50%                     Multiple components, state management, layout
Multi-step full-stack tasks    ~25%                        Routing, data fetching, state, error handling
Framework-specific eval tasks  ~42%                        Next.js-specific patterns (SSR, caching, routing)

The cliff is steepest where:


Red Flag Summary

Framework TS Types Compile-Time UI Checks Silent Failures A11y Enforcement Testing Maturity SDK > 30% Clear Errors Stable API Mature (5yr+) Agent Skills Bundle < 300 KB
React + Vite ⚠️ ⚠️ ⚠️
Next.js ⚠️ ⚠️ ⚠️ ⚠️
Vue + Nuxt ⚠️ ⚠️
SvelteKit ⚠️
Angular ⚠️ ⚠️ ⚠️ ⚠️ ⚠️
Solid.js ⚠️ ⚠️ ⚠️ ⚠️
Qwik ⚠️ ⚠️
Astro ⚠️
HTMX ⚠️ ⚠️
Remix ⚠️ ⚠️ ⚠️ ⚠️

Legend: ✅ = passes, ⚠️ = conditional/partial, ❌ = fails

Frameworks with red flags:


Final Recommendations

1. Single Best Framework for Agent-Generated Frontend Applications

React 19 + Vite + TypeScript + Tailwind CSS + shadcn/ui

React wins on the combination that matters most for agent-generated frontend code: the largest LLM training corpus, the broadest component library ecosystem, the deepest agent skill support (Vercel's react-best-practices, Playwright Test Agents, Anthropic's frontend-design skill), and the highest first-pass success rate.

The useEffect footgun and state management complexity are real risks, but they are well-understood and mitigatable through:

The combination of Tailwind CSS + shadcn/ui is equally critical - it constrains the agent's design vocabulary, producing consistent visual output. Without a specified component library, agents produce inconsistent, "AI slop" UI.
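The "constrained design vocabulary" point can be made concrete. Below is a hand-rolled, dependency-free sketch of the variant-map pattern that shadcn/ui popularises (via class-variance-authority); the function and class names are illustrative, not from any Planifest codebase. The point for agent-generated code is that the spec enumerates a closed set of variants, so the agent picks from a menu instead of inventing ad-hoc Tailwind class strings per component:

```typescript
// A closed vocabulary of visual variants: the agent can only combine these,
// which is what keeps agent-generated UI visually consistent.
type ButtonVariant = "default" | "destructive" | "outline";
type ButtonSize = "sm" | "md" | "lg";

const base = "inline-flex items-center justify-center rounded-md font-medium";

const variantClasses: Record<ButtonVariant, string> = {
  default: "bg-primary text-primary-foreground hover:bg-primary/90",
  destructive: "bg-destructive text-destructive-foreground",
  outline: "border border-input bg-background hover:bg-accent",
};

const sizeClasses: Record<ButtonSize, string> = {
  sm: "h-8 px-3 text-sm",
  md: "h-9 px-4",
  lg: "h-10 px-6",
};

// Resolve a (variant, size) pair to a deterministic Tailwind class string.
function buttonClasses(
  variant: ButtonVariant = "default",
  size: ButtonSize = "md",
): string {
  return [base, variantClasses[variant], sizeClasses[size]].join(" ");
}
```

Because the type system rejects any variant outside the enumeration, a wrong design choice becomes a compile error rather than a visual regression.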

2. Best Framework by Use Case

Use Case                         Recommendation              Runner-Up
SPA / Dashboard / Internal Tool  React + Vite + TS           Vue 3 + Vite
SEO-Critical Application         Next.js 15                  Nuxt 3
Content-Heavy Site (docs, blog)  Astro                       Next.js (static)
Performance-Critical Widget      SvelteKit                   React + Vite
Server-Rendered + Minimal JS     HTMX + Go/Python backend    Astro
Existing Angular Codebase        Angular 18+ (with Signals)  -
Full-Stack React Application     Next.js 15                  Remix

3. Recommended Planifest Frontend Template Stack

┌──────────────────────────────────────────────────────────┐
│                   Frontend Architecture                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌─────────────────────────────────────────────┐        │
│  │  React 19 + TypeScript (strict mode)        │        │
│  │  Build: Vite 6+                             │        │
│  │  Styling: Tailwind CSS v4 + shadcn/ui       │        │
│  │  State: Zustand (client) + TanStack Query   │        │
│  │         (server state)                      │        │
│  │  Routing: React Router v7 (SPA) or          │        │
│  │           Next.js App Router (SSR)          │        │
│  │  Forms: React Hook Form + Zod validation    │        │
│  │  Animation: Framer Motion                   │        │
│  └─────────────────────────────────────────────┘        │
│                          │                               │
│                          ▼                               │
│  ┌─────────────────────────────────────────────┐        │
│  │  Testing                                    │        │
│  │  Unit/Component: Vitest + Testing Library   │        │
│  │  E2E: Playwright (with Test Agents)         │        │
│  │  Visual Regression: Playwright screenshots  │        │
│  │  a11y: eslint-plugin-jsx-a11y + axe-core    │        │
│  └─────────────────────────────────────────────┘        │
│                          │                               │
│                          ▼                               │
│  ┌─────────────────────────────────────────────┐        │
│  │  Quality Gates                              │        │
│  │  Lint: ESLint (strict React + a11y rules)   │        │
│  │  Format: Prettier                           │        │
│  │  Types: tsc --noEmit (strict mode)          │        │
│  │  Bundle: bundlesize / size-limit            │        │
│  │  Agent Skill: vercel react-best-practices   │        │
│  └─────────────────────────────────────────────┘        │
│                                                          │
│  Shared contracts: OpenAPI spec (Zod derived from it)    │
│  Container: Multi-stage Dockerfile -> Nginx/Caddy         │
│  Image size target: < 30 MB                              │
│  Bundle size target: < 200 KB (gzipped, first load)      │
└──────────────────────────────────────────────────────────┘

4. Rationale - Why These Choices

React + Vite as default frontend:

Tailwind CSS + shadcn/ui for styling:

TanStack Query for server state, Zustand for client state:

Vitest + Playwright for testing:

Next.js for SSR use cases:

5. Trade-Offs

React + Vite as default
  Gain: Best agent success rate, deepest ecosystem, most agent skills
  Lose: No SSR (use Next.js when needed), larger runtime than Svelte
Tailwind + shadcn/ui
  Gain: Consistent visual output, accessible primitives
  Lose: Design differentiation requires explicit spec constraints
Zustand + TanStack Query
  Gain: Clear state management boundaries, reduced boilerplate
  Lose: Two libraries to learn; less flexibility than raw Context
Vitest + Playwright
  Gain: Fast feedback loops, real-browser testing, LLM-native test agents
  Lose: More tooling to configure than a single Jest setup
Next.js for SSR
  Gain: SEO, streaming, ISR, server components
  Lose: Larger container, more complex mental model, more agent iterations
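The "clear state management boundaries" claim for Zustand + TanStack Query can be sketched. The store below is a hand-rolled, dependency-free stand-in for Zustand's `create` (so the example runs anywhere); all names are hypothetical. The shape is the point: client-only UI state lives in the store, while anything fetched from the API belongs in TanStack Query's cache, never here:

```typescript
// Minimal observable store in the Zustand style: state + subscribe + set.
type Listener = () => void;

function createStore<T>(init: (set: (partial: Partial<T>) => void) => T) {
  let state: T;
  const listeners = new Set<Listener>();
  const set = (partial: Partial<T>) => {
    state = { ...state, ...partial };
    listeners.forEach((l) => l());
  };
  state = init(set);
  return {
    getState: () => state,
    subscribe: (l: Listener) => {
      listeners.add(l);
      return () => listeners.delete(l);
    },
  };
}

// Client state only: layout, theme, selection - no fetched data.
interface UiState {
  sidebarOpen: boolean;
  toggleSidebar: () => void;
}

const useUiStore = createStore<UiState>((set) => ({
  sidebarOpen: false,
  toggleSidebar: () =>
    set({ sidebarOpen: !useUiStore.getState().sidebarOpen }),
}));
```

Giving the codegen-agent this split up front ("UI state in the store, server data in the query cache") removes the decision it most often gets wrong: caching fetched data in component or global state by hand.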

6. Strategies for Minimising Agent Error Rate

Based on the research, the following strategies have the largest impact on reducing first-pass failure rates for agent-generated frontend code:

Specification-Level:

Agent-Level:

Template-Level:

Validation-Level:


Answers to Success Criteria

Which framework produces the lowest agent error rate for frontend code? React 19 + Vite + TypeScript. The combination of the deepest training corpus, broadest ecosystem, and most mature agent skills produces the highest first-pass success rate (70-80% functional, 55-65% visual).

Which framework has the best error messages for LLM iteration? SvelteKit. Svelte's compiler errors are the clearest of any frontend framework - one error at a time, actionable, tied to source. However, React + Vite is close behind and the larger training corpus means fewer errors in the first place.

Which framework has the best component library ecosystem? React. No other framework has a comparable depth of UI component libraries, charting libraries, form libraries, and design system implementations.

Which framework produces the best visual output from agents? React + Tailwind CSS + shadcn/ui. The constrained design vocabulary and over-representation in training data produce the most visually acceptable agent-generated UI. Gemini 3 Pro specifically leads Web Dev Arena scores for frontend aesthetics.

Which framework has the best agent skill support? React. Vercel's react-best-practices (58+ rules), web-design-guidelines (100+ rules), Playwright Test Agents, and Anthropic's frontend-design skill create an unmatched context engineering ecosystem.

Which would you choose for a data-dense dashboard? React + Vite + TanStack Query + Recharts + shadcn/ui. Maximum component library coverage for tables, charts, forms, and data visualisation.

Which would you choose for an SEO-critical marketing site? Next.js 15 for SSR/SSG with static export where possible. Astro if JavaScript interactivity is minimal.

Which would you choose for an embedded widget with strict bundle budgets? SvelteKit. Smallest bundle size of any major framework (30-80 KB gzipped).

For a Planifest-managed application built entirely from agent-generated code, which would you choose? React 19 + Vite + TypeScript for the frontend, with Tailwind CSS + shadcn/ui as the design system, TanStack Query + Zustand for state management, and Vitest + Playwright for testing. This stack optimises for the metric that matters most in Planifest's context: correct code on the first pass, with the fewest agent iterations.


Implications for Planifest

The current Planifest architecture specifies React 18+ with TypeScript, Vite, TailwindCSS for the frontend. This evaluation confirms this is the optimal choice for agent-generated code, with the following refinements:

  1. Upgrade target to React 19 - the React Compiler eliminates manual memoisation errors, removing a common agent footgun
  2. Specify shadcn/ui as the component library - constrains design vocabulary and provides accessible primitives; this is not currently specified and its absence increases visual inconsistency in agent output
  3. Specify TanStack Query for server state and Zustand for client state - eliminates state management decision fatigue for the codegen-agent
  4. Specify React Hook Form + Zod for form handling - Zod schemas on the frontend are derived from the OpenAPI spec, so form validation contracts hold regardless of backend language; the backend validates against the same OpenAPI spec using its own language-native library (e.g. Pydantic for Python, Go struct validation, Rust serde)
  5. Load Vercel's react-best-practices skill into the codegen-agent - 58+ rules optimised for LLM consumption, covering the most impactful performance patterns
  6. Use Playwright Test Agents for E2E test generation - purpose-built for LLM-driven test creation with plan/generate/heal workflow
  7. Enforce strict TypeScript (strict: true, noUncheckedIndexedAccess, noAny via ESLint) - mitigates the type system's weaknesses
  8. Add bundle size budgets via size-limit - prevents agent-generated code from silently bloating the bundle
  9. For SSR use cases only: adopt Next.js 15 as the meta-framework, accepting the 10-15% increase in iteration cost for the Server/Client Component boundary

The OpenAPI spec remains the language-agnostic contract between frontend and backend (as established in p013). On the frontend, Zod schemas are derived from the OpenAPI definition, giving the codegen-agent type-safe validation without assuming the backend is also TypeScript. When the backend is TypeScript, Zod schemas can be shared directly - but this is an optimisation, not a requirement. The frontend stack requires no polyglot consideration - React + TypeScript is the clear winner on every agent-suitability metric regardless of backend language choice.
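The direction of derivation matters, so here is a dependency-free sketch of it. In practice a generator (e.g. an openapi-to-zod tool) would emit real Zod schemas; the hand-rolled checker below only illustrates that the frontend validator is derived from the shared OpenAPI fragment rather than written independently. The schema name and fields are hypothetical:

```typescript
// Fragment of the shared OpenAPI spec (components.schemas.CreateTask).
// This object is the single source of truth for both frontend and backend.
const createTaskSchema = {
  type: "object",
  required: ["title"],
  properties: {
    title: { type: "string", minLength: 1 },
    priority: { type: "integer", minimum: 1, maximum: 5 },
  },
} as const;

// Tiny validator derived from that fragment - the role Zod plays in the
// stack. Returns a list of human-readable errors (empty means valid).
function validateCreateTask(input: unknown): string[] {
  const errors: string[] = [];
  if (typeof input !== "object" || input === null) return ["body must be an object"];
  const body = input as Record<string, unknown>;
  for (const field of createTaskSchema.required) {
    if (!(field in body)) errors.push(`${field} is required`);
  }
  const title = body["title"];
  if (title !== undefined && (typeof title !== "string" || title.length < 1)) {
    errors.push("title must be a non-empty string");
  }
  const priority = body["priority"];
  if (
    priority !== undefined &&
    (!Number.isInteger(priority) || (priority as number) < 1 || (priority as number) > 5)
  ) {
    errors.push("priority must be an integer between 1 and 5");
  }
  return errors;
}
```

A Python backend would enforce the same fragment with Pydantic, a Go backend with struct validation - the contract holds because both sides derive from the spec, not from each other.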