Architecture Decisions in the Age of AI
From Intent to Constraints
In the previous article, I argued that documentation matters more in AI-assisted development because it preserves intent across repeated machine-mediated revisions. This article picks up where that argument leaves off:
if documentation matters, which kind matters most? How do we turn recorded intent into usable constraints?
That claim runs against instinct. When a system can read code, explain it, refactor it, and even rewrite it, it is tempting to assume that documentation is becoming less relevant. The opposite is happening. As more of the development process is delegated to systems that infer meaning rather than understand it, intent becomes easier to lose.
That loss rarely happens all at once. It happens gradually.
A function is rewritten for clarity. A module is reorganized for consistency. A piece of logic is simplified. Each change appears reasonable. Each change may pass review. Over time, however, the system drifts away from the decisions that shaped it. The result is not obviously broken. It is something more subtle: a system that behaves differently than intended, for reasons that are difficult to trace.
Documentation preserves intent.
The next question is more practical:
What form of documentation is most effective when working with AI?
Not All Documentation Is Equal
Most documentation explains what a system does. Some documentation explains how it does it. A smaller portion explains why it was designed that way.
That final category carries the most weight.
A system's behavior is shaped by decisions. Those decisions involve trade-offs:
- performance vs. consistency
- simplicity vs. flexibility
- strict validation vs. permissive handling
- immediate correctness vs. eventual consistency
Those trade-offs are rarely visible in code alone. They are not consistently captured in comments. They live in conversations, design reviews, and the experience of the people who built the system.
When those decisions are not recorded, they do not disappear. They become implicit. Implicit decisions are easily reinterpreted.
That reinterpretation is where drift begins.
Architecture Decision Records (ADRs) provide a way to make those decisions explicit.
Here is a sample ADR from my public template repository: ADR-000: Capability Scope, Epistemic Honesty, and Separation of Concerns
A well-written ADR does more than record what was chosen. It explains:
- the context in which the decision was made
- the alternatives that were considered
- the trade-offs that were accepted
- the consequences that must be preserved
This is valuable for humans.
It is equally valuable for AI.
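To make the shape of such a record concrete, here is a minimal sketch of an ADR as a structured object. The field names and example values are illustrative, not a standard ADR schema:

```python
from dataclasses import dataclass

@dataclass
class ADR:
    """Minimal structured form of an Architecture Decision Record."""
    title: str
    context: str              # the situation that forced a decision
    decision: str             # what was chosen
    alternatives: list[str]   # options considered and rejected
    trade_offs: list[str]     # costs accepted alongside the choice
    consequences: list[str]   # properties that must remain true

adr = ADR(
    title="Asynchronous internal messaging",
    context="Tight coupling between services caused cascading failures.",
    decision="Internal services communicate via asynchronous messaging.",
    alternatives=["Direct synchronous REST calls", "Shared database"],
    trade_offs=["Eventual consistency", "Harder local debugging"],
    consequences=["No new synchronous internal dependencies"],
)
```

The structure mirrors the list above: context, alternatives, trade-offs, and consequences all travel with the decision itself.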
Every Unanswered Question Becomes a Guess
When an LLM is asked to generate or modify code, it operates from patterns. Given a prompt, it predicts what a reasonable continuation might look like. It does not operate from first principles. If the prompt is underspecified, the model fills in the gaps.
Those gaps are not filled randomly. They are filled with statistically likely defaults:
- synchronous calls instead of asynchronous messaging
- simplified validation instead of defensive checks
- conventional patterns instead of domain-specific constraints
Those defaults are often reasonable. They are also often wrong.
The model cannot distinguish between:
- an intentional constraint
- and an accidental omission
Both appear negotiable.
This leads to a simple but important principle:
Every unanswered question becomes a decision the model will make on your behalf.
ADRs reduce the number of unanswered questions. They move decisions out of the model and into the design process.
That is meaningful progress. It is not sufficient on its own.
From Recorded Decisions to Operational Constraints
In practice, ADRs are long, detailed, and narrative. They are written for humans. They preserve context, reasoning, and trade-offs.
That is exactly what they should do.
However, a practical limitation appears when working with AI systems.
Even when a model can accept large amounts of text, it does not treat all of it as equally important. As context grows, the model must decide what to prioritize. Some details are emphasized. Others are effectively ignored.
In my own experiments with a corpus of several dozen ADRs, providing all of them at once produced inconsistent results. Earlier models recommended including only a handful at a time. Newer models expand the window, but not the underlying behavior.
The issue remains:
- Comprehensive documentation does not automatically become actionable context.
When too many decisions compete for attention, signal is diluted. Important constraints are buried within explanation. The model complies unevenly.
The result is inconsistency.
This leads to a necessary refinement.
ADRs are excellent at preserving intent. They are not always optimal for enforcing it.
To make decisions operational, they must be expressed in a form the model can reliably apply.
Writing Code vs Writing Prose
I experimented with ADR corpora for both software projects and prose-generation projects. I was curious whether the domain and the type of artifact generated were key factors in the experiments.
In some ways, it was. Codifying a loop has only so many possible completions whereas drafting a narrative has nearly limitless completions.
In other ways, it wasn't. Attempting to constrain the AI with a large body of rules and including the reasoning behind those rules so that the AI could apply the logic was very similar in both domains.
Constraints, Not Suggestions
A suggestion influences behavior. A constraint restricts it.
Consider the difference:
- "Prefer asynchronous communication between services."
- "Services must communicate asynchronously. Direct synchronous calls are not allowed except for authentication."
Both express the same idea. Only one is enforceable.
When given suggestions, a model treats them as preferences. When given constraints, the model is more likely to preserve them across transformations.
More consistently, but not perfectly.
This distinction matters because of how models handle uncertainty.
When uncertain, the model does not pause. It generalizes. It leans toward common patterns. In doing so, it preserves what is easy to infer and loses what is important to maintain.
Explicit constraints protect the latter.
A Hybrid Approach
The solution is not to replace ADRs. It is to extend them.
An ADR should continue to serve its original purpose: preserving the human reasoning behind a decision. It should explain context, alternatives, trade-offs, and consequences. That narrative record matters because it helps people understand why a design exists in its current form.
For AI-assisted workflows, however, that is only half the job.
Each ADR should also produce a smaller derived artifact: a constraint set written in terms the model can reliably apply. The ADR explains the reasoning. The constraint set defines what must remain true.
For example, an ADR might say that internal services communicate asynchronously to preserve resilience and avoid tight coupling, with a narrow exception for authentication. A derived constraint set based on that ADR could say:
- Internal service-to-service communication must use asynchronous messaging.
- Direct synchronous calls are prohibited outside authentication flows.
- Generated code must not introduce new synchronous internal dependencies unless explicitly approved.
That second form is not better than the ADR. It is better suited to a different task. The ADR preserves intent for humans. The constraint set operationalizes that intent for iterative AI-assisted generation. As a result, ambiguity is reduced without sacrificing completeness.
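A constraint set in that second form can also be checked mechanically. The sketch below assumes a Python codebase where synchronous calls appear as `requests.*` invocations and authentication code lives under an `auth` path; both are illustrative assumptions, not a general-purpose linter:

```python
import re

# Pattern for direct synchronous HTTP calls via the `requests` library.
SYNC_CALL = re.compile(r"\brequests\.(get|post|put|delete)\(")

def check_constraints(path: str, source: str) -> list[str]:
    """Flag direct synchronous internal calls outside authentication flows."""
    if "auth" in path:  # the narrow exception carved out by the ADR
        return []
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SYNC_CALL.search(line):
            violations.append(f"{path}:{lineno}: synchronous internal call")
    return violations
```

Running `check_constraints("services/billing.py", "resp = requests.get(url)")` reports a violation, while the same code under an `auth/` path passes. The point is not the regex; it is that a constraint written as a prohibition can be tested, while a narrative preference cannot.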
Iteration, Convergence, and the Reality of Applying Constraints
Applying ADRs is iterative, not a single-step operation.
With a large ADR corpus, constraints must be applied in subsets. Each subset refines the artifact. The result is passed forward, revised again, and evaluated repeatedly.
This introduces friction.
Each iteration carries slightly different context. Earlier decisions weaken. Later decisions introduce tension. Reapplication becomes necessary.
The process becomes cyclical. Seven to ten rounds of interaction are not unusual. Each round introduces opportunities for drift.
Sequential constraint application produces local correctness: each subset holds at the moment it is applied, but nothing guarantees the full set holds at once.
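The cycle can be sketched as a loop. Here `apply_constraints` stands in for one model call and is deliberately simplified to a set union so the sketch runs; the structure of the loop, not the stub, is the point:

```python
def apply_constraints(artifact: set, subset: set) -> set:
    """Stand-in for one model call: here, simply fold the subset in."""
    return artifact | subset

def score(artifact: set, all_sets: list[set]) -> float:
    """Fraction of all constraints currently satisfied."""
    required = set().union(*all_sets)
    return len(artifact & required) / len(required)

def converge(artifact: set, constraint_sets: list[set],
             max_rounds: int = 10, threshold: float = 1.0) -> set:
    """Reapply each subset until every constraint holds simultaneously."""
    for _ in range(max_rounds):
        for subset in constraint_sets:
            artifact = apply_constraints(artifact, subset)
        if score(artifact, constraint_sets) >= threshold:
            break  # global, not just local, correctness reached
    return artifact
```

In the real process, each pass through `apply_constraints` can erode earlier constraints, which is why the outer loop and the global `score` check are necessary, and why seven to ten rounds are not unusual.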
V-Shaped Design
For the best results, I found a V-shaped process was very helpful. At the start, I would generate a description of the project at a high level.
Then, I would codify a series of tests to verify that the generated artifact functions correctly or, in the case of prose, complies with the intended writing rules.
Next, I would engage the primary generation loop. This is where the iterations live: the phases, the passes, and the collections of ADRs are all processed here.
Then, I would run the tests. For code, this is unit tests. For writing, this is the writing guidance compliance verification.
Finally, I would ask the AI to describe the artifact it generated, then compare that description with the original description generated at the start. If the two were similar, great. If they diverged widely, something had clearly gone wrong.
I found it helpful to use a different LLM provider, or at least a different model, for this step so that the biases and assumptions inherent to one model weren't amplified when the second description was created.
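The final comparison step can be roughed out mechanically. Token overlap is a crude proxy for the semantic comparison described above (a real check would use embeddings or a judge model), but it illustrates the shape of the test:

```python
def description_similarity(original: str, reconstructed: str) -> float:
    """Jaccard overlap between two descriptions of the same artifact.
    A crude proxy: high overlap suggests intent survived; low overlap
    means the artifact drifted from the original description."""
    a = set(original.lower().split())
    b = set(reconstructed.lower().split())
    return len(a & b) / len(a | b)

sim = description_similarity(
    "event driven billing service with async messaging",
    "billing service using asynchronous event driven messaging",
)
```

A threshold on this score (or its embedding-based equivalent) turns "are the two descriptions similar?" into something a pipeline can gate on.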
Validation as a Multi-Layered Process
I found that validation must operate across multiple dimensions:
- Constraint validation ensures required properties and prohibitions are respected.
- ADR alignment checks consistency with the full decision set.
- Intent reconstruction compares the output to the original objective, sometimes using multiple models to surface discrepancies.
These layers provide signal but they do not provide certainty.
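The three layers compose naturally as a pipeline. The validator bodies below are placeholders standing in for real checks; the important property is that each layer returns findings, and an empty list means absence of signal, not proof of correctness:

```python
from typing import Callable

Validator = Callable[[str], list[str]]

def constraint_validation(artifact: str) -> list[str]:
    # Placeholder: required properties and prohibitions from the constraint set.
    return [] if "async" in artifact else ["missing required property: async"]

def adr_alignment(artifact: str) -> list[str]:
    return []  # placeholder: check consistency with the full decision set

def intent_reconstruction(artifact: str) -> list[str]:
    return []  # placeholder: describe-and-compare, ideally via a second model

LAYERS: list[Validator] = [
    constraint_validation, adr_alignment, intent_reconstruction,
]

def validate(artifact: str) -> list[str]:
    """Run every layer and collect findings; empty output is not certainty."""
    findings: list[str] = []
    for layer in LAYERS:
        findings.extend(layer(artifact))
    return findings
```

An artifact that clears all three layers has survived every check we know how to write. It has not thereby been proven correct, which is exactly the boundary the next sections address.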
A Brief Experiment in Fidelity
To better understand how well intent survives transformation, consider a simple experiment.
Take a photograph. Ask an AI system to describe it in detail. Use that description to generate a new image. Repeat this process independently several times.
Here is my source image:

This is the very best outcome when ChatGPT was used to describe and then to generate the image:

Feeding a ChatGPT-generated description to another image generator often produced surprisingly poor results.
Some results were worse.

Some results were comically worse.

Some results were just plain cursed.

Some results had no apparent connection to the original source image.

Even with detailed descriptions and careful prompting, the outputs varied widely. Some retain recognizable elements while others introduce distortion, omission, or substitution. Some diverge dramatically.
These are not accumulated errors. Each is an independent attempt to reconstruct the same intent.
This matters here because derived constraints reduce ambiguity, but they do not eliminate the fundamental indeterminacy of reconstruction.
Knowledge, Discernment, and Action
At this point, the boundary becomes clear.
The process improves structure. It does not determine correctness.
This is where the Knowledge, Discernment, and Action model applies:
- Knowledge includes captured decisions, constraints, and outputs. It can be expanded and organized.
- Discernment interprets that knowledge. It identifies subtle misalignment. It recognizes when something is plausible but wrong.
- Action determines what happens next: accept, revise, reject, or stop.
These cannot be automated away.
A system can produce something that is consistent, compliant, and convincing - and still require a human to determine whether it is correct.
Conclusion: Control, Not Certainty
This process is not definitive, and it can certainly be improved. Newer models and better tooling will reduce the probability of error, but they will not eliminate it.
It is easy to accept the first convincing result.
That, however, is not a viable approach.
AI can improve efficiency. It does not replace knowledge, discernment, or responsibility.
Shifting Responsibilities
The role of people does not disappear. It shifts.
Less effort is spent on execution while more effort is spent on defining intent, evaluating outcomes, and ensuring alignment with the desired result.
Skills like optimizing loops and memorizing function parameters become less central while understanding systems, interconnections, and the domain knowledge needed to describe a problem move to the forefront.
Just as craftspeople transitioned from one skillset to another as manufacturing processes evolved, so, too, must the skillsets of knowledge-based professionals.
Knowledge, Discernment, and Action remain.