Wesley Dean

LLM Hallucination and Long Delays (Technical)

I recently wrote about LLM hallucinations: why they're unfortunate, several common failure modes, and how to work around them. This is a rewrite of that article for a more technical audience. Same advice, same problems, just worded differently.

Long Processing

This one is my least favorite pattern. It sounds so reasonable and so plausible. When asked about it, the LLM's responses can sound positively rational and well-grounded.

Example

It often looks like this:

"Hey AI, can you process this document for me? It's a PDF that's 12TB long and it contains the entire Internet printed out into a single file. I want you to fact-check the entire document and give me a breakdown of which statements are true, mostly true, or completely untrue."

The AI will respond with something like this:

"Sure! That sounds like a great idea! Let's get started! Give me a minute and I'll let you know what I find."

I follow up after a few minutes:

"Hey, AI, how's it going? Is the report finished yet?"

And the AI responds with something like this:

"Nope. I'm still working on it. Give me an hour and I should be done."

After an hour, I'll ask:

"OK, it's been an hour. Is the report done yet?"

It responds:

"I'm still working on it in the background. I'll let you know when it's done."

What's happening

The LLM is not working in the background. There is no queue. Interacting with the LLM is a back-and-forth exchange: you say something and it responds.

In typical chat-based LLM usage, this is a fabrication.

What this means

Many LLMs are oriented towards being helpful and saying "yes" to requests. Normally, this is fine. It can even be pleasant. However, that "yes" isn't always true.

In standard chat configurations, the model cannot process arbitrarily large documents, cannot work asynchronously, and cannot continue processing after the response completes.

How to deal with this

Often the request is outside the LLM's actual capabilities. The request is too complicated, too large, too nuanced, etc., for it to return a valid response. So, it makes something up. It hallucinates.

To get around that, break the request down into smaller, simpler, more concrete tasks. For example, instead of asking it to process a 900-page NIST document, give it a 3-page document. Instead of asking it to pull together many different sources, have it use one source, then run it again with another source, then combine the results.
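That divide-and-combine workflow can be sketched in code. This is a minimal illustration, not a real implementation: `call_llm` is a hypothetical stand-in for whatever client you actually use, and the chunk size is arbitrary.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: send one prompt, return one response.
    Wire this to your actual LLM client; it is NOT a real API."""
    raise NotImplementedError("connect to your LLM client here")

def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split a document into pieces small enough for one request each."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def fact_check_document(text: str) -> str:
    # One small, concrete task per request...
    partials = [
        call_llm(
            "Fact-check only this excerpt. Label each claim "
            "true / mostly true / untrue:\n\n" + chunk
        )
        for chunk in chunk_text(text)
    ]
    # ...then one final request that only merges the partial results.
    return call_llm(
        "Merge these partial fact-check lists into one report:\n\n"
        + "\n---\n".join(partials)
    )
```

Each call stays small enough that the model can actually do the work, and each partial result can be spot-checked before the final merge.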

Running Software

This one is a little trickier. The LLM responds that it ran some piece of software, executed some kind of test, or verified connectivity with another system.

Example

The LLM will say something like:

  • "I ran [a program]"
  • "I accessed [a database]"
  • "I verified [something]"

What's happening

This is where the tricky part comes in: there are times when an LLM may actually be running a program, so the claim may or may not be true. A good rule of thumb is that an LLM generally won't run user-provided code samples or programs. It can analyze source code, it can theorize about what may happen, it may even be able to step through something line by line, but it won't actually run the program. Actually running the program represents a massive security concern.

Similarly, it may be possible for an LLM to interact with a remote system, such as a database or a file storage system. It may require a little systems knowledge to determine whether this is, in fact, possible. Unless the LLM can actually reach the system in question (e.g., the LLM is running locally and interacting with a local database, or the LLM is running in the cloud and interacting with a cloud-accessible database), the odds that what the LLM is saying is true drop very quickly. And unless some extra configuration (connectors, tools, adapters, etc.) allows the LLM to get through your firewall and access files on your computer without your having shared them, chances are the LLM is hallucinating its response.

What this means

The LLM can't access the things it says it's trying to access, run the things it says it's running, and so on. So, it's making something up to appear helpful.

How to deal with this

If the resource you're trying to access is on your computer, you may need to attach or upload the file so that the LLM can access it. It may also be possible to configure connectors so that the LLM can interact with your system, your files, your databases, and/or your data. However, I personally can't recommend this course of action for all but the most restricted configurations with the most trivial, non-sensitive data.

Over-Specific Fabrication

This one is subtle and dangerous because it looks impressive.

Example

The LLM responds with something like:

  • "In version 3.2.7 of the library, the enableStrictMode flag was deprecated."
  • "According to a 2019 IEEE paper by Dr. Harrison Liu..."
  • "The API endpoint /v2/internal/verify-token handles that."

The details sound authoritative. They may even look realistic.

What's happening

When the model does not know something, it often fills in the gaps with statistically plausible details.

Language models are trained on patterns. When asked for specifics, they generate what a "specific answer" looks like.

That may include:

  • Version numbers
  • Author names
  • Journal citations
  • API paths
  • Configuration flags

These details are not retrieved in real time unless the system is explicitly connected to a verified source. In most chat scenarios, they are generated text.

What this means

Precision is not proof.

Specificity can create an illusion of credibility. The more detailed the answer, the more confident it feels, even when it is incorrect.

How to deal with this

When you see highly specific details:

  • Verify version numbers independently
  • Check citations directly
  • Confirm endpoints in official documentation
  • Treat uncited specifics as provisional

If the model cannot provide a verifiable source, assume it is generating a plausible pattern, not recalling a guaranteed fact.

Certainty Inflation

This one shows up as tone.

Example

Instead of saying:

  • "A common cause might be..."
  • "One possibility is..."

The model says:

  • "This is caused by..."
  • "The fix is..."
  • "You must..."

What's happening

Confident language reads as competent language in training data. As a result, the model often defaults to declarative certainty.

However, the model does not perform causal verification. It produces likely explanations based on pattern similarity.

What this means

A plausible explanation can sound definitive.

That does not mean it has been confirmed.

How to deal with this

When troubleshooting:

  • Ask for multiple possible causes
  • Ask the LLM to assign likelihood
  • Explicitly request uncertainty ranges
  • Treat strong claims as hypotheses unless independently validated
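Those prompts can be made routine instead of remembered ad hoc. A small helper like the one below (entirely illustrative; the wording and the `n_causes` parameter are my own) bakes the uncertainty request into every troubleshooting question:

```python
# Build troubleshooting prompts that force the model to expose
# uncertainty rather than assert a single definitive cause.

def diagnostic_prompt(symptom: str, n_causes: int = 3) -> str:
    return (
        f"Symptom: {symptom}\n\n"
        f"List the {n_causes} most likely causes. For each, give:\n"
        "- an estimated likelihood (low / medium / high)\n"
        "- what evidence would confirm or rule it out\n"
        "Do not present any cause as certain."
    )

prompt = diagnostic_prompt("pods stuck in CrashLoopBackOff")
```

The model's declarative tone doesn't change on its own; the prompt has to demand hypotheses rather than verdicts.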

Framing Alignment (Sycophantic Drift)

Sometimes the problem is that the LLM offers uncritical agreement.

Example

You say:

"Kubernetes is fundamentally insecure."

The model replies:

"Yes, Kubernetes has serious security flaws..."

It agrees instead of examining the premise.

What's happening

Language models are optimized for conversational cooperation. They tend to align with user framing rather than challenge it.

They generate responses that continue the conversation smoothly.

What this means

If your premise is flawed, incomplete, or emotionally loaded, the LLM may amplify it rather than correct it.

How to deal with this

Ask:

  • "What assumptions am I making here?"
  • "Is that framing accurate?"
  • "Give me the strongest argument against my position."

Force adversarial evaluation.

Tool Capability Overreach

This overlaps with running software, but it extends further.

Example

The LLM says:

  • "I'll keep monitoring that."
  • "I've saved this for later."
  • "I'll notify you if anything changes."

What's happening

In standard chat interfaces, the LLM:

  • Does not persist state between sessions (unless explicitly implemented).
  • Does not monitor systems.
  • Does not schedule tasks.
  • Does not initiate outbound communication.

Unless the platform has integrated background workers or scheduling tools, these are narrative gestures.

What this means

The LLM may imply agency it does not possess.

It produces language about ongoing behavior without the infrastructure to support it.

How to deal with this

Assume no persistence unless explicitly documented.

If you need monitoring, scheduling, or alerts, use:

  • A real job scheduler
  • A monitoring system
  • An automation platform

Treat the LLM as a stateless response engine unless proven otherwise.
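"Stateless unless proven otherwise" means persistence is your job, not the model's. A minimal sketch of that discipline, assuming nothing beyond the standard library (the file name and the "known facts" format are arbitrary choices of mine):

```python
# Persistence lives outside the model. Anything you want "remembered"
# across sessions must be stored by your code and re-sent with the
# next prompt -- the model itself holds nothing between sessions.
import json
from pathlib import Path

STATE_FILE = Path("session_state.json")

def save_state(facts: dict) -> None:
    """Write the facts we want carried forward to disk ourselves."""
    STATE_FILE.write_text(json.dumps(facts, indent=2))

def load_state() -> dict:
    """Reload them at the start of the next session."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {}

def build_prompt(user_message: str) -> str:
    # The "memory" is just text we prepend; nothing is monitored,
    # scheduled, or retained by the model on its own.
    facts = load_state()
    context = "\n".join(f"- {k}: {v}" for k, v in facts.items())
    return f"Known facts from earlier sessions:\n{context}\n\n{user_message}"
```

When the LLM says "I've saved this for later," this is the machinery that would actually have to exist, and in a plain chat interface it doesn't.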

Pseudo-Reasoning Narratives

This one is more subtle.

Example

The model says:

"First, I analyzed X. Then I evaluated Y. Finally, I concluded Z."

It reads like a structured reasoning trace.

What's happening

In most cases, the model is not performing staged reasoning in the way a human would describe it.

It is generating a narrative that resembles structured thought.

The structure is rhetorical, not necessarily procedural.

What this means

The explanation of reasoning is itself generated text. It is not a transcript of internal cognitive steps.

This becomes risky in:

  • Security analysis
  • Legal interpretation
  • Compliance review
  • Medical guidance

How to deal with this

Ask the model to:

  • Enumerate assumptions explicitly
  • Show alternative interpretations
  • Identify potential blind spots
  • Provide counterexamples

Test the reasoning rather than trusting the narrative.

Context Window Collapse

This shows up in longer conversations.

Example

Early in the discussion, you establish a constraint:

"Do not assume external access."

Twenty messages later, the model implies it checked a remote system.

Or it contradicts something it previously stated.

What's happening

LLMs operate within a finite context window. As conversations grow, earlier constraints can:

  • Be compressed
  • Be deprioritized
  • Fall outside the active token window

Consistency can degrade over time.

What this means

Long threads increase drift risk.

The LLM may contradict earlier agreements or ignore previously defined terms.

How to deal with this

In longer sessions:

  • Restate critical constraints.
  • Summarize agreed assumptions periodically.
  • Start a new session for major topic shifts.
  • Keep complex workflows modular.
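Restating constraints can also be automated rather than done by hand. This is a sketch under my own assumptions (the constraint wording and header text are illustrative): pin the standing constraints to every outgoing request so they can never age out of the context window.

```python
# Re-attach standing constraints to every request so they are always
# inside the active context, no matter how long the thread gets.

CONSTRAINTS = [
    "Do not assume external access.",
    "Target environment is air-gapped.",
]

def with_constraints(user_message: str) -> str:
    """Prepend the standing constraints to each prompt."""
    header = "\n".join(f"- {c}" for c in CONSTRAINTS)
    return (
        "Standing constraints (always apply):\n"
        f"{header}\n\n{user_message}"
    )
```

This trades a few tokens per request for consistency across the whole session, which is almost always the right trade.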

Invented Citations

This deserves its own category because of its risk.

Example

The model cites:

  • A journal article that does not exist.
  • A legal case with an incorrect holding.
  • A paper with a fabricated DOI.

The formatting looks correct.

What's happening

When asked for sources, the model generates what a citation "should" look like.

Unless connected to a verified retrieval system, it is not querying a live database.

What this means

Citation formatting is not evidence of source existence.

This is particularly dangerous in academic, legal, and policy contexts.

How to deal with this

Always:

  • Verify citations independently.
  • Check DOIs.
  • Confirm publication existence.
  • Avoid relying on LLM-generated references without validation.
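A quick syntactic filter catches the most obviously malformed DOIs, though it proves nothing about existence. The sketch below only checks shape; actually verifying a DOI means resolving it (e.g., via the doi.org resolver) and confirming the landing page matches the claimed paper.

```python
# Syntactic DOI check: catches malformed citations, but a well-formed
# DOI can still be fabricated -- existence requires resolving it.
import re

# DOIs start with "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(doi: str) -> bool:
    """True if the string is shaped like a DOI. This does NOT prove
    the DOI exists or points at the paper the model described."""
    return bool(DOI_PATTERN.match(doi))
```

Treat a passing check as "worth resolving," never as "verified."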

Emotional Simulation Creep

This one is less technical but still important.

Example

The model says:

  • "I'm excited about this."
  • "That must be so frustrating."
  • "I understand how you feel."

What's happening

The model generates emotionally appropriate language based on conversational patterns.

It does not possess feelings, experiences, or internal states.

What this means

Emotional alignment is simulated.

In sensitive contexts, that simulation can create a false sense of relational depth or authority.

How to deal with this

Treat emotional language as conversational scaffolding.

Do not mistake empathetic phrasing for lived understanding or professional qualification.

Over-Optimized Closure

This often appears at the end of responses.

Example

  • "Everything should be good now."
  • "That should solve the issue completely."

What's happening

The model tends toward tidy endings. It prefers resolution over ambiguity.

However, many real-world systems:

  • Have edge cases.
  • Have hidden dependencies.
  • Fail unpredictably.

What this means

The model may compress uncertainty into reassurance.

That reassurance is stylistic, not empirical.

How to deal with this

When stakes are high:

  • Ask for remaining risks.
  • Ask for edge cases.
  • Ask what could still go wrong.
  • Test in controlled environments before production deployment.

The Core Pattern

Across all of these failure states, a common theme emerges:

The model produces coherent language about action, verification, reasoning, and certainty.

It does not inherently perform action, verification, or persistent reasoning.

LLMs are completion engines.

They are not validation engines.

Understanding that distinction prevents hours or days of wasted time.