Next-Token Prediction, Not Database Lookup
ChatGPT generates every response, citations included, one token at a time: at each step it picks a likely next token given the prompt and everything generated so far, based on patterns learned from its training data. When asked to cite a source, it does not search a live database of academic papers; by default, it has no connection to one. Instead, it produces a sequence of tokens shaped like a citation: an author-name pattern, a journal-name pattern, a year, and often a DOI-formatted string. All of those patterns appeared often enough in training data for the model to reproduce their shape convincingly.
This is the central technical fact behind the whole phenomenon: the model isn't retrieving a citation and getting it wrong. It is generating citation-shaped output from a process that has no concept of 'real' versus 'fabricated,' because a real citation and an invented one are the same kind of token sequence as far as the model is concerned.
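The mechanism can be illustrated with a deliberately tiny sketch. It is not how ChatGPT is implemented; it assumes a toy bigram model (each token chosen only from what followed the previous token in training) and made-up placeholder "citations" such as Author_A and Journal_X. The names train_bigram and generate are hypothetical. The point is only that sampling token by token can yield well-formed combinations that never appeared in the training data, with no lookup step anywhere.

```python
import random
from collections import defaultdict

# Made-up placeholder "citations" standing in for training data (not real sources).
TRAINING = [
    ("Author_A", "2019", "Journal_X"),
    ("Author_A", "2021", "Journal_Y"),
    ("Author_B", "2019", "Journal_Y"),
]

START, END = "<s>", "</s>"


def train_bigram(sequences):
    """Count which token followed which; this is all the toy model knows."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng):
    """Sample one token at a time, weighted by how often it followed the previous token."""
    token, out = START, []
    while True:
        candidates = counts[token]
        token = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if token == END:
            return tuple(out)
        out.append(token)


model = train_bigram(TRAINING)
rng = random.Random(0)  # fixed seed so repeated runs behave the same way
samples = {generate(model, rng) for _ in range(200)}

# Citation-shaped outputs that were never in the training data.
fabricated = samples - set(TRAINING)
for citation in sorted(fabricated):
    print("never seen in training:", " ".join(citation))
```

Every sampled sequence has the same author, year, journal shape, whether or not it matches a training example. Nothing in generate distinguishes the two cases, which is the situation described above at a much larger scale.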
Why This Persists Across Model Versions
Each new version of ChatGPT improves general capability, reasoning, and the breadth of patterns it can reproduce. Counterintuitively, that can make fabricated citations more convincing rather than less, because a more capable model produces more plausible-sounding output. The underlying mechanism, generation without retrieval by default, doesn't change with model improvements unless retrieval is specifically built in and active for that conversation.
Versions of ChatGPT with browsing or retrieval tools enabled can reduce hallucination in some contexts by actually looking sources up. This depends on the tool being active and used for the specific request; it isn't guaranteed behavior. Citation tasks performed without an active retrieval tool remain subject to the same generation-without-verification pattern.
Why Proofreading Doesn't Catch It
A fabricated ChatGPT citation is generated with the same fluency, formatting confidence, and internal consistency as a real one, because both are produced by the identical generation process. There is no stylistic tell, no hedging language, and no drop in apparent confidence that distinguishes a hallucinated citation from a real one in the text itself. The only reliable way to distinguish them is to check whether the citation resolves to a real document, which requires either a structural completeness check (does it have a verifiable identifier at all?) or a direct database lookup (does this specific source exist?).
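As a rough sketch of the structural completeness check, the snippet below looks for a DOI-formatted identifier in a citation string and builds the URL a lookup would then request. The function names extract_doi and lookup_url, the regular expression, and both example citations are assumptions made for illustration; the citations are placeholders, not real sources. Note the limit of this check: finding a DOI-shaped string only shows that there is something to look up, since a fabricated citation can carry a well-formed DOI that resolves to nothing or to an unrelated document.

```python
import re
from typing import Optional

# Assumed pattern for DOI-shaped strings: "10.", a 4-9 digit prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_doi(citation: str) -> Optional[str]:
    """Return the first DOI-shaped string in the citation, or None if there is none."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Drop sentence punctuation that the pattern may have swallowed at the end.
    return match.group(0).rstrip(".,;")


def lookup_url(doi: str) -> str:
    """Build the doi.org resolver URL; actually requesting it is the database-lookup step."""
    return f"https://doi.org/{doi}"


# Placeholder citations for illustration only (not real sources).
with_doi = "Author_A (2019). Placeholder title. Journal_X, 1(1), 1-10. doi:10.5555/placeholder.001."
without_doi = "Author_B (2021). Another placeholder title. Journal_Y."

for citation in (with_doi, without_doi):
    doi = extract_doi(citation)
    if doi is None:
        print("no verifiable identifier:", citation)
    else:
        print("identifier found, still needs lookup:", lookup_url(doi))
```

A citation that fails this check cannot be verified by identifier at all. One that passes still needs the lookup, followed by a comparison of the resolved record's title and authors against what the citation claims.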