Documentation Is Not Dead in the Age of AI. It Matters More Than Ever.
LLMs (Large Language Models) and AI are very good at producing code quickly. Sometimes, AI can generate in minutes what would take an experienced developer days to write. It's similarly very common to use AI to "update this function to..." or "modify this data structure to..." or "optimize this code to..." and let the machine do the heavy lifting.
It's tempting. It's really tempting. Not only is it faster and easier to let the computer worry about edge cases or obscure syntax, there's often pressure to get more done, faster. Some employers mandate the use of AI. There's a clear incentive to "do more with less." Would you rather pay a human for hours of effort to write a function that may not address edge cases and may not be optimally efficient, or have an LLM spend twenty seconds putting something together? After all, LLMs are trained on samples of code that were already written, already checked for edge cases, already optimized, and already made available to the public; why not take advantage of those existing efforts and investments?
So, if an AI will be drafting code, who cares about documentation? Why would someone take even more time to explain their code, document their choices, work through the details and consequences of a block of code? If the incentive is to get more code written faster, if the consumer of those comments will most likely be an AI that can figure out how things already work on its own, why in the world would someone look for new and creative ways to make that process take even longer?
That line of thinking is understandable. It is also wrong.
Documentation matters more now, not less. In an environment where code is often drafted, revised, summarized, and refactored by AI, intent becomes easier to lose with each successive pass. The risk is not only that the model produces incorrect code. In fact, life would be much easier if the biggest risks of generated code were either caught by a linter or obviously wrong the moment the code was reviewed.
No, the real risk is that it produces plausible code -- code that may be locally correct -- while gradually drifting away from the original purpose.
That distinction matters.
A bad answer can be rejected quickly. A plausible answer that subtly warps the original intent can survive review, pass a unit test, and quietly reshape a system over time.
That is one of the great dangers of AI-assisted development. The failure mode is not always hallucination. Sometimes it is gradual reinterpretation.
The Telephone Game Problem in Source Code
Most of us know the old telephone game. One person says something to the next. That person repeats it to the next. After enough rounds, the message may still sound like something reasonable, but it no longer says what was originally meant.
AI-assisted coding can work the same way.
The first draft may be decent. A later prompt asks for cleanup. Another asks for modernization. A later review asks for more consistency. Someone else asks the model to "simplify" the logic. Another person asks it to "make this easier to maintain." Each request sounds reasonable. Each revision may even be locally defensible.
That does not mean the result remains faithful to the original intent.
In fact, each pass creates an opportunity for the model to infer meaning from code patterns rather than from the author's actual design. Once that happens repeatedly, the code can drift in the same way a repeated retelling drifts. Each version resembles the one before it. The final result may no longer resemble the original reasoning at all.
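As a concrete sketch of that drift (the function names and exit-code contract here are invented for illustration), compare a routine before and after one plausible "cleanup" pass:

```shell
# Original: the author distinguishes "missing" from "malformed" input on
# purpose, signaling each with a different exit code for the caller.
parse_port_original() {
  local value="$1"
  if [[ -z "${value}" ]]; then
    echo "error: port not supplied" >&2
    return 2
  fi
  if [[ ! "${value}" =~ ^[0-9]+$ ]]; then
    echo "error: port must be numeric" >&2
    return 1
  fi
  echo "${value}"
}

# A plausible "simplified" revision: one check, one message. It still works
# for the happy path, but the distinct exit codes are quietly gone.
parse_port_simplified() {
  local value="$1"
  [[ "${value}" =~ ^[0-9]+$ ]] || { echo "error: invalid port" >&2; return 1; }
  echo "${value}"
}
```

Both versions accept a valid port and reject garbage, so the revision looks safe when judged against the version before it. Only the distinction a caller may have depended on has disappeared.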
This is one of the reasons I have become increasingly convinced that documentation is not a decorative artifact. It is part of the control surface by which we preserve intent.
Replicate This Image 100 Times
A funny example of this can be demonstrated by repeatedly asking AI to replicate an image you provide, each time feeding the result of the previous iteration back into the model. Some folks repeated this process dozens of times and often wound up with a result that was radically, comically different from the original source image. You can experiment and see for yourself using a meme-worthy prompt:
Replicate this image 100 times, don't change a thing.
To save you some time, you can also search and see the results from others who tried the same experiment.
For copyright and usage reasons, I am not reproducing those animations here. The prompt itself tells the story well enough, and readers can easily locate examples of the experiment by searching for that exact phrase.
That restraint is not only the safer choice. It also keeps the emphasis where it belongs: on the phenomenon rather than on a specific media asset.
That said, I tried this experiment myself. The output of one generation was fed into the next, again and again. At first, the changes were subtle. A detail moved slightly. Something in the background became more prominent. Facial features softened or shifted. Little distortions accumulated. By the end of the process, the final image no longer meaningfully represented the original.
Each iteration was, in one sense, trying to be faithful. The cumulative effect was drift. Local plausibility did not preserve global fidelity.
Software behaves similarly under repeated AI reinterpretation. Each revision may appear reasonable when judged only against the immediately prior version. That is not the same thing as preserving the original intent, constraints, safeguards, and trade-offs that gave rise to the code in the first place.
Code Does Not Explain Itself as Well as We Pretend
Developers sometimes say that good code is self-documenting.
That is true only in a very narrow sense.
Good code may reveal structure. It may communicate naming, boundaries, and flow. It may express something about what the system does. It rarely explains, by itself, why a particular decision was made, which constraints shaped the implementation, what edge case forced an unusual guardrail, or what trade-off a maintainer must not casually undo.
That "why" matters enormously.
It matters to a tired human reviewer during an incident. It matters to the developer who returns to a file nine months later. It matters to the teammate who inherits a subsystem they did not design. It now matters to the model that is being asked to interpret, refactor, and extend that code.
Without that context, the AI does what it always does: it predicts the most plausible continuation from the signals available to it. It's turbo-charged autocomplete.
- Sometimes that works beautifully.
- Sometimes it decides that a weird-looking conditional is unnecessary.
- Sometimes it decides that a duplicated validation step is redundant.
- Sometimes it decides that a conservative retry loop is overly cautious.
- Sometimes it decides that a backward-compatibility quirk is poor style and ought to be cleaned up.
In each case, the model may be doing exactly what it was asked to do. The problem is that it is optimizing for surface coherence without access to the deeper rationale unless that rationale has been preserved somewhere visible.
That is where documentation becomes essential.
Documentation as Intent Preservation
When documentation is done well, it serves at least three purposes.
First, and most importantly, it helps a human reviewer validate whether the AI understood the assignment. A developer can read the comments, docstrings, and narrative explanation and compare them against the actual requirement. It is often easier to spot misunderstanding in plain language than in implementation details alone.
Second, it helps future revisions preserve design intent. When code is revisited later, good documentation reminds the reader that certain behavior is deliberate. It tells them which odd-looking parts are there for a reason. It records the assumptions that must continue to hold.
Third, it reduces interpretive drift across repeated machine involvement. If the code tells the model what the system does, and tests tell the model what must continue to work, documentation tells the model why the code exists in this form.
That third point is easy to underestimate.
Documentation is not only a gift to future human maintainers. It is also a guardrail for future machine-assisted edits.
A Few Concrete Examples
Consider a security check that appears redundant:
```shell
if [[ -z "${TARGET_DIR:-}" ]]; then
  printf '%s\n' "TARGET_DIR must be set"
  exit 1
fi
if [[ ! -d "${TARGET_DIR}" ]]; then
  printf '%s\n' "TARGET_DIR must reference an existing directory"
  exit 1
fi
```

A later model might collapse or reorder those checks in the name of brevity. That may not sound dangerous until you realize the original author had a reason to distinguish "unset" from "set but invalid." The first case might indicate configuration failure. The second might indicate environmental drift or operator error. Those are not the same operational problem, and the logs may need to preserve that distinction.
Now imagine the same code with intent documented:
```shell
# We check for "unset" and "invalid path" separately on purpose.
# These are distinct operational failures with different remediation paths.
# During incident review, collapsing them into a single message makes it
# harder to determine whether the failure was caused by configuration
# omission or by an unexpected filesystem state.
if [[ -z "${TARGET_DIR:-}" ]]; then
  printf '%s\n' "TARGET_DIR must be set"
  exit 1
fi
if [[ ! -d "${TARGET_DIR}" ]]; then
  printf '%s\n' "TARGET_DIR must reference an existing directory"
  exit 1
fi
```

That comment changes the situation entirely. A future human or AI can still propose a revision, but now they must contend with intent rather than syntax alone.
Here is another example. Suppose a developer writes a retry loop that looks conservative to the point of awkwardness:
```shell
attempt=1
while (( attempt <= 3 )); do
  if perform_remote_update; then
    break
  fi
  sleep 10
  (( attempt += 1 ))
done
```

A later assistant may "improve" this by making retries more aggressive or by reducing delay to speed up execution. That change may look elegant. It may also be a serious mistake if the remote endpoint rate-limits callers or if the delay is deliberately chosen to let eventual consistency settle.
A small narrative note can prevent a bad refactor:
```shell
# The 10-second pause is intentional. The remote service can briefly report
# inconsistent state after an accepted update. Retrying too quickly increases
# the likelihood of duplicate work and misleading failure signals.
# This loop is conservative by design.
```

That is the sort of detail code does not reveal by itself.
Reviewable Comments Create a Second Lens
One of the underrated benefits of documentation in AI-assisted workflows is that it gives the reviewer a second lens through which to judge the result.
When an AI produces code and commentary together, a human can evaluate both the implementation and the explanation. If the code appears correct but the explanation is off, that is a warning sign. If the explanation is strong but the code fails to reflect it, that is also useful information. Either way, the comments become part of the review surface.
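Here is a minimal, invented illustration of that mismatch signal. The comment promises a fallback; the code exits instead. Neither artifact alone raises an alarm, but together they do:

```shell
# Claimed behavior: "Falls back to built-in defaults when the user's
# config file is missing." (The function and file names are hypothetical.)
load_config() {
  local user_file="$1"
  if [[ ! -f "${user_file}" ]]; then
    echo "config not found: ${user_file}" >&2
    return 1   # no fallback actually happens, despite what the comment says
  fi
  cat "${user_file}"
}
```

A reviewer who reads only the code sees a reasonable guard. A reviewer who reads only the comment sees a reasonable promise. Reading both exposes the gap, and that gap is exactly the kind of misunderstanding worth catching before it ships.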
This is one reason I favor a documentation-first standard for AI-assisted work. The goal is not to bury the reader in prose. The goal is to make intent visible enough that misunderstandings can be detected early.
That includes purpose, scope, assumptions, invariants, edge cases, failure modes, security considerations, and non-goals.
Those are not luxuries. They are stabilizers.
There Is a Catch
The only thing worse than no documentation is wrong documentation.
Documentation helps only when it is truthful.
That sounds obvious, yet it is worth saying plainly. Bad comments are worse than missing comments because they lend false confidence. In an AI-assisted workflow, that danger increases because the model can generate commentary that sounds polished, complete, and authoritative while being subtly or fundamentally wrong.
So the lesson is not, "Let the AI generate lots of comments."
The lesson is, "Let the AI draft reviewable documentation, then make a human responsible for verifying that the explanation matches reality."
That is a very different posture.
The goal is preservation of intent, not accumulation of words.
Commentary that merely narrates syntax is not enough. Commentary that invents rationale after the fact is dangerous. Commentary that explains purpose, constraints, trade-offs, and consequences can be invaluable.
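The difference is easy to show in miniature (the "export job" below is an invented example):

```shell
line_number=0

# A comment that narrates syntax adds nothing the code does not already say:
# increment line_number by one.
(( line_number += 1 ))

# A comment that explains intent records something the code cannot:
# skip exactly one line, because the upstream export job always emits
# a single header row before the data begins.
(( line_number += 1 ))
```

The two statements are identical. Only the second comment would stop a future editor, human or machine, from "simplifying" the skip away.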
If a future human or model cannot tell why the code exists in its present form, the system is more fragile than it appears.
What This Looks Like
The Doxygen project has been around since 1997. Like many other documentation generators, Doxygen scans through source code files for comments annotated with symbols, macros, and other markup. Here's a brief example from a shell script:
```shell
## @fn _log()
## @brief Format a message and emit it through logger or standard error.
## @details
## This helper accepts printf-style arguments, formats them into a single
## message, and then attempts to write that message through logger with
## standard-error mirroring enabled. When logger is unavailable, the function
## falls back to writing the formatted message directly to STDERR with printf.
## The syslog facility and level may be supplied through the facility and level
## environment variables and default to user and info.
## @param format the printf-style format string followed by replacement values
## @retval 0 logging completed
## @par Examples
## @code
## level='info' _log 'Loaded configuration from %s' '/etc/example.conf'
## facility='local0' level='err' _log 'Fatal error: %s' 'missing input'
## @endcode
_log() {
  local _message
  _message="$(printf "$@")"
  if command -v logger >/dev/null 2>&1 ; then
    logger -s -p "${facility:-user}.${level:-info}" -- "${_message}"
  else
    printf '%s\n' "${_message}" >&2
  fi
}
```

This documents a function, _log(), that sends debugging and diagnostic information back to the user.
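As an aside, Doxygen does not understand shell syntax out of the box. One common approach, sketched here only in outline and dependent on your exact comment conventions, is to map .sh files to a C-like parser and rewrite the ## markers with an input filter in the Doxyfile:

```
# Doxyfile fragment (a sketch; the sed filter must match your comment style).
FILE_PATTERNS     = *.sh
EXTENSION_MAPPING = sh=C
FILTER_PATTERNS   = *.sh="sed -e 's|^##|//!|'"
```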
This Is About More Than Style
Some people will hear a call for extensive documentation and assume this is a stylistic preference.
It is not.
This is an architectural concern, a maintenance concern, a reliability concern, a security concern.
When the goal is fewer incidents, clear and accurate documentation is one of the best tools available.
The Real Question
The question is not whether AI can read code. It can.
The real question is whether AI can be trusted to preserve intent across repeated revisions when that intent has not been made explicit.
In many cases, the answer is no. Not because the model is malicious or uniquely defective, but because inference is not the same thing as understanding. Prediction is not the same thing as memory. Surface coherence is not the same thing as fidelity to purpose.
That is why documentation still matters.
Actually, that is why it matters more than ever.
The alternative is a kind of technical telephone game in which every iteration sounds plausible, every revision seems reasonable, and the final result quietly forgets what mattered most.