AI-Native Development Criteria
Purpose
A structured diagnostic to score your development environment's AI-Native readiness on a 100-point scale across four perspectives.
How to Use
- Read each perspective and its sub-items
- Score each item based on the criteria provided
- Sum scores to determine your level
Score Summary
| Level | Score | State |
|---|---|---|
| Excellent | 90–100 | Fully structured AI-Native workflow with high-quality context |
| Good | 70–89 | Workflow established with conscious quality practices |
| Fair | 50–69 | Core workflows in place, but operations and autonomy are immature |
| Poor | 30–49 | Initial efforts underway, but workflows are fragmented and unintegrated |
| Critical | 0–29 | Not started or early stage |
Evaluation Scope and Premises
Scope: Evaluate against the product's primary working repository (or single monorepo).
Context fragmentation: If core context is distributed across multiple repositories, and cross-repository search, reference, and update do not complete within a single workflow, context closure scores low.
External sources of truth: If the SSoT (Single Source of Truth) for PRDs, design documents, or tasks resides in an external tool, that item scores 0.
Why "Updatable" Is Non-Negotiable
Read-only documents are not true context. Context is something you grow—not something you write once and reference forever.
As LLMs execute development work, they need to:
- Record design decisions discovered during implementation
- Add constraints that emerge from technical investigation
- Update specifications that prove incomplete or incorrect
Context that cannot be updated drifts from reality. "Readable but not updatable" context becomes stale and misleading as the codebase evolves—and misleading context is worse than no context, because the LLM trusts it.
Allowing external SSoTs would mean tolerating this drift. These criteria do not. Search, reference, and update must all complete within the LLM's workflow. Without this, LLMs cannot operate autonomously.
Point Allocation Rationale
| Perspective | Points | Rationale |
|---|---|---|
| Context Closure | 15 | Necessary foundation, but a prerequisite—not a differentiator |
| Codified Principles | 30 | Highest leverage: determines every LLM decision's quality |
| Workflow and Guardrails | 25 | Operationalizes principles into enforced practice |
| Context Quality | 30 | Existence of principles is meaningless if poorly expressed |
Principles and Quality together account for 60 out of 100 points. This reflects a core belief: an LLM with well-written, project-specific principles in a messy repository will outperform an LLM in a perfectly organized repository with no principles.
Scoring Rules
- Perspectives 1–3 are evaluated cumulatively. If foundational items are not met, dependent items score 0 (no renormalization).
- Perspective 4 (Context Quality) is evaluated across all artifacts, independent of Perspectives 1–3.
- Qualitative evaluation requires actual file content review.
- Evaluate the essence of "providing appropriate context to LLMs"—avoid tool-specific assessments.
- Evidence-based scoring: Score based on what you can verify in the repository, not what you believe to be true.
- "We have documentation" is not evidence—open the files and evaluate their content
- "We do reviews" is not evidence—check whether review integration is defined in the workflow
Perspective 1: Context Closure (15 points)
Whether all product context is in a state where search, reference, and update complete within the LLM's workflow.
All three capabilities must complete within the workflow:
| Capability | Meaning | Met | Not Met |
|---|---|---|---|
| Searchable | Discoverable without knowing exact paths | Files in repo (grep/glob), semantically searchable docs | Single file reference in external tool |
| Readable | Loadable into LLM context | File read, API retrieval | Documents behind authentication walls |
| Updatable | LLM or workflow can maintain content | Markdown in repo (edit/write) | Read-only external tool integration |
A. Product-Level Closure (9 points)
| Item | Points | Criteria |
|---|---|---|
| Design documents | 3 | PRDs, Design Docs, ADRs, and similar artifacts are searchable, readable, and updatable within the workflow |
| Task management | 3 | Tasks (issues/tickets) are searchable, readable, and updatable within the workflow |
| Infrastructure/platform definitions | 3 | Terraform, k8s manifests, CI definitions, and similar configurations are searchable, readable, and updatable within the workflow |
Scoring per item:
| Condition | Points |
|---|---|
| Search, reference, and update complete within primary repository | 3 |
| Separate repository, but cross-repository workflow completes all three | 2 |
| Search and reference possible, but update not possible | 1 |
| Reference only, or external SSoT | 0 |
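As a concrete reference for the 3-point condition, the layout below shows design documents, tasks, and infrastructure definitions all living in the primary repository. Directory names are illustrative, not prescribed.

```text
repo/
├── docs/
│   ├── prd/                  # PRDs, discoverable via grep/glob
│   ├── design/               # Design Docs
│   └── adr/                  # ADRs, updated as decisions change
├── tasks/                    # Issues/tickets as markdown the LLM can edit
├── infra/
│   ├── terraform/            # Infrastructure definitions
│   └── k8s/                  # Manifests
└── .github/workflows/        # CI definitions
```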
B. Application-Level Closure (6 points)
| Item | Points | Criteria |
|---|---|---|
| Feature-based directory structure | 4 | Core services, modules, contract definitions (API, SDK, Proto), and infrastructure definitions are organized by feature/domain within the same repository |
| Colocation | 2 | Tests, type definitions, and styles colocated within corresponding features (holds for most core features) |
Scoring:
- Feature-based: Same repo + all core domains = 4, partial = 2, layer-based = 0
- Colocation: Most core features = 2, some = 1, separated = 0
When scoring: A feature-based structure with bloated files undermines closure effectiveness. Complexity concentrated in specific directories indicates insufficient separation of concerns.
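A sketch of a feature-based structure with colocation, using hypothetical feature names:

```text
src/features/
├── billing/
│   ├── api/                  # Contract definitions (API/Proto) for this feature
│   ├── service.ts            # Core logic
│   ├── service.test.ts       # Tests colocated with the code they cover
│   ├── types.ts              # Type definitions
│   └── styles.css            # Styles
└── accounts/
    └── ...
```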
Perspective 2: Codified Principles (30 points)
Whether project-specific principles are documented in a form LLMs can reference. Not generic rules—judgment criteria specific to this project.
Foundation (5 points)
| Item | Points | Criteria |
|---|---|---|
| Principle document exists | 5 | AGENTS.md (or equivalent) functions as the entry point, stating universal principles concisely and routing to skills/references for details |
Scoring:
- 5: Entry point contains minimal universal principles + routing only (no task procedures)
- 3: Entry point mixes task procedures or multi-domain details
- 0: No principle document, or external links only
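A minimal sketch of the 5-point pattern: the entry point states universal principles concisely and routes everything else. File paths are assumptions, not required names.

```markdown
# AGENTS.md

## Principles
- Keep each change scoped to one feature and one task.
- Treat docs/design/ as the source of truth; update it when implementation diverges.

## Routing
- Coding standards: .agents/references/coding-standards.md
- Domain glossary: .agents/references/domain.md
- Task execution: .agents/skills/implement-task.md
- Review perspectives: .agents/skills/review.md
```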
Coverage (25 points)
| Domain | Points | Criteria |
|---|---|---|
| Coding standards & design principles | 5 | Project-specific coding rules, design patterns, naming conventions documented |
| Domain-specific knowledge | 5 | Business logic, domain terminology, business workflows documented |
| Architecture decision criteria | 5 | ADRs exist with current decision criteria summarized separately |
| Done definitions | 5 | Mechanical completion criteria (tests, verification, acceptance) documented so LLM can judge autonomously |
| Structured review perspectives | 5 | Project-specific review criteria structured, not reliant on generic templates alone |
Scoring notes:
- Done definitions: PR template only = 1, per-task mechanical criteria = 5
- Review perspectives: Generic-dominant = 1, project-specific-dominant = 5
When scoring: Abundant technical debt markers (TODO/FIXME/HACK) without codified principles are evidence of accumulated subjective frustration, not healthy debt management. Debt markers grounded in principles are evidence of conscious debt tracking.
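For the Done definitions item, the gap between 1 and 5 points is mechanical verifiability: criteria the LLM can check without judgment calls. A hypothetical per-task example (task ID, commands, and paths are invented):

```markdown
## Done: task-142 (rate limiting)
- [ ] Test suite passes with no skipped tests
- [ ] The new behavior is covered by at least one integration test
- [ ] docs/design/rate-limiting.md records the chosen algorithm
- [ ] No new TODO/FIXME markers without a linked task
```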
Perspective 3: Workflow and Guardrails (25 points)
Whether operational mechanisms exist to translate principles into execution.
Foundation (2 points)
| Item | Points | Criteria |
|---|---|---|
| Commands/agents/skills exist | 2 | Some form of task execution definition exists |
Task Execution Environment (6 points)
| Item | Points | Criteria |
|---|---|---|
| Navigation design (entry point → skills → references) | 3 | Progressive context loading from entry point—minimal context acquired at each stage |
| Context reference | 2 | Skills/commands specify reference targets and purposes, loading only when needed |
| Specialized agent definitions | 1 | Single-responsibility agents defined for context control and bias elimination |
Scoring:
- Navigation: Entry point is minimal principles + routing = 3, entry point is overloaded / commands carry excessive context = 1, no navigation = 0
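A sketch of a skill that names its reference targets and their purposes so context is loaded only when a step requires it. Names and paths are illustrative.

```markdown
# Skill: implement-task

## Purpose
Implement a single task from tasks/ against its design doc.

## References (load only when the step needs them)
- .agents/references/coding-standards.md: naming and module boundaries (before writing code)
- docs/adr/: decision criteria (only if the task touches architecture)

## Output
A branch containing the implementation, colocated tests, and an updated design doc.
```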
Review Workflow (7 points)
| Item | Points | Criteria |
|---|---|---|
| Review integration point | 3 | Review occurs during implementation rather than being deferred to the PR/CI stage |
| Design/test priority | 2 | Design doc review and test review are built in as highest priority |
| Review perspective reference | 2 | Structured review perspectives are referenced |
Scoring:
- Integration point: During implementation = 3, PR/CI only = 1, none = 0
- Design/test priority: Documented + prioritized in workflow = 2, documented only = 1, none = 0
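One way the 3-point condition can look in practice is a task workflow that embeds review before the PR stage. Step and skill names below are a sketch, not a required format.

```markdown
# Workflow: implement-task

1. Draft or update the design doc, then run the design-review skill against it.
2. Write tests first and review them against the testing guidelines.
3. Implement, then self-review using the structured review perspectives.
4. Open the PR only after steps 1–3 pass.
```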
Test Verification & Quality Assurance (6 points)
| Item | Points | Criteria |
|---|---|---|
| Test generation/verification commands | 2 | Commands or skills for testing are defined |
| Test guidelines reference | 2 | Testing guidelines are referenced |
| Quality assurance guardrails | 2 | Pre-commit, local verification, or AI auto-checks run during implementation |
Scoring:
- Guardrails: CI only = 1, automated verification during implementation = 2
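A guardrail that earns 2 points runs verification during implementation instead of waiting for CI. A minimal sketch of such a command definition, assuming hypothetical npm scripts wired into a pre-commit hook:

```markdown
# Command: verify

Run before every commit (also wired into pre-commit):
1. `npm run lint` and `npm run typecheck` must exit 0.
2. `npm test` must pass for the affected feature's test files.
3. If any step fails, fix it before committing; do not defer the failure to CI.
```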
Value Stream Integration (4 points)
| Item | Points | Criteria |
|---|---|---|
| Integrated review mechanism | 2 | A mechanism to synthesize multiple review results is defined |
| Conflict resolution rules | 2 | Priority and adjudication rules for conflicting perspectives are documented |
Scoring:
- Integrated review: Explicit integration output format = 2, partial = 1, none = 0
- Conflict resolution: Documented priority/adjudication rules = 2, implicit/ad-hoc = 0
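An explicit integration output format (the 2-point condition) can be as simple as a synthesized summary with adjudication. The template below is hypothetical.

```markdown
# Integrated review: task-142

| Perspective | Verdict | Blocking findings |
|---|---|---|
| Design | Pass | None |
| Tests | Fail | Retry path has no test |
| Security | Pass | None |

Adjudication: test findings block merge (priority: security > correctness > style).
Resolution: add the retry-path test, then re-run the integrated review.
```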
When scoring: A test verification workflow that still produces low-quality tests (low behavioral coverage, high mock density) is ineffective, regardless of whether it exists on paper. High test quality maintained through the workflow is evidence that guardrails are functioning.
Perspective 4: Context Quality (30 points)
Quality criteria for all context artifacts: AGENTS.md, commands, agents, skills, rules, and design templates.
4-1. Single Responsibility (5 points)
Each context artifact serves one purpose/domain.
| Points | Criteria |
|---|---|
| 0 | Multiple unrelated responsibilities mixed in a single file |
| 3 | Generally separated, but some responsibility mixing remains |
| 5 | Each artifact clearly serves a single purpose |
Detection indicators:
- Single command/skill handles multiple unrelated tasks
- AGENTS.md contains detailed procedures beyond entry-point role
- Agents split by review perspective, not by context-control purpose
4-2. Consistency (5 points)
No contradictions between artifacts; all references match the current codebase.
| Points | Criteria |
|---|---|
| 0 | Contradictory instructions between files, or stale descriptions remain |
| 3 | Main artifacts are consistent, but some inconsistencies exist |
| 5 | All artifacts are coherent and match the current codebase state |
Detection indicators:
- AGENTS.md instructions contradict command procedures
- References to deleted files or APIs remain
- Conflicting rules between rule files
4-3. Explicitness (5 points)
No ambiguity; LLM can judge and execute without hesitation. Stated in positive form.
| Points | Criteria |
|---|---|
| 0 | Many ambiguous instructions; relies on negative-form instructions |
| 3 | Main instructions are clear, but some ambiguity or negative forms remain |
| 5 | All instructions are specific, positive-form, with no room for LLM hesitation |
Detection indicators:
- Vague instructions like "write good code"
- Negative-form instructions like "don't do X"
- Success criteria or output formats undefined
- Skills/commands with ambiguous purpose, usage conditions, or I/O
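As an illustration of the positive-form requirement, compare an ambiguous negative instruction with an explicit rewrite. Both lines are invented examples.

```markdown
# Before (ambiguous, negative form)
Don't write slow queries.

# After (explicit, positive form)
Fetch data through the repository layer; when a query filters on a non-indexed
column, add the index in the same migration and note it in the design doc.
```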
4-4. Target Clarity (5 points)
Written for LLMs. No human-oriented preambles, decoration, or explanatory prose.
| Points | Criteria |
|---|---|
| 0 | Human-oriented documentation repurposed as LLM context |
| 3 | Generally LLM-targeted, but human-oriented decorative text remains |
| 5 | All artifacts designed with LLMs as the target audience |
Detection indicators:
- Preambles like "This document aims to..."
- Decorative headings or emoji
- Verbose explanations assuming human readers
- README content copied verbatim
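A before/after sketch of the same rule written for human readers and then for an LLM; the content is invented for illustration.

```markdown
# Before (human-oriented)
📘 Introduction: This document aims to help new team members understand the
error-handling philosophy we have refined over the years...

# After (LLM-targeted)
Wrap external calls in the Result type from src/lib/result.ts and return errors
as values across module boundaries.
```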
4-5. Project Specificity (5 points)
Focused on project-specific judgment, not generic advice. Not verbose.
| Points | Criteria |
|---|---|
| 0 | Mostly transcribed generic best practices or copied from other projects |
| 3 | Contains project-specific content, but generic advice is mixed in |
| 5 | All statements are project-specific judgments; nothing that should be handled by tooling |
Detection indicators:
- Content like "use camelCase for variables" that linters/formatters should handle
- Transcribed generic coding principles
- Settings copied from another project without understanding
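A contrast between generic advice that tooling should enforce and a project-specific judgment criterion; both examples are invented.

```markdown
# Generic (scores low; a linter or formatter should handle this)
Use camelCase for variables and keep functions short.

# Project-specific (scores high)
Monetary amounts are integer cents carried in the Money type; conversion to
floating point happens only in the display layer (src/features/billing/format.ts).
```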
4-6. Quality Assurance Mechanisms (5 points)
Each task/principle has clear purpose and criteria, with verification mechanisms.
| Points | Criteria |
|---|---|
| 0 | Procedures only; no purpose, criteria, or verification steps |
| 3 | Purpose is clear, but verification mechanisms (checklists and similar) are only partial |
| 5 | Each task/principle has purpose, criteria, and quality checklists enabling artifact verification |
Detection indicators:
- Lists of procedures without stated purpose
- Execution instructions without verification steps
- Undefined behavior for uncertain situations
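A sketch of the 5-point shape: a task definition that carries purpose, criteria, and a verification checklist, including defined behavior for uncertainty. Structure and names are illustrative.

```markdown
# Skill: add-migration

## Purpose
Change the database schema without breaking running deployments.

## Criteria
Migrations stay backward compatible for one release; destructive changes are
split across two releases.

## Verification checklist
- [ ] Migration applies cleanly to a copy of the current schema
- [ ] Rollback step is documented
- [ ] If it is unclear whether a change is destructive, stop and ask instead of guessing
```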