How ChatGPT Generates References Without a Database
ChatGPT is not connected to CrossRef, PubMed, or any live bibliographic index when it generates a reference list. Instead, it produces reference strings that conform to the statistical patterns of real references in its training data. The output looks like a bibliographic entry because the model was trained on millions of them. The author name follows the correct format for the requested style. The journal title is typically real (though the journal may not publish on the topic you requested). The year falls within a plausible range. The DOI prefix 10.XXXX/ is present, but the suffix was generated, not retrieved.
This is why the fabrication does not appear in proofreading. Every visible field looks correct. The only test that reliably catches the fabrication is checking whether the identifier resolves to a real document. The structural checker covers the first step of that test: it validates the identifier's format and flags any entry where the identifier is absent or structurally invalid.
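Checking whether an identifier resolves is a separate step from checking its format. The following is a minimal sketch of a resolution check, not the checker's own implementation. It assumes the public doi.org handle lookup endpoint (https://doi.org/api/handles/ followed by the DOI) and assumes that a responseCode of 1 in the JSON body means the DOI is registered. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the doi.org proxy's handle lookup API.
HANDLE_API = "https://doi.org/api/handles/"


def handle_url(doi: str) -> str:
    """Build the lookup URL, percent-encoding characters unsafe in a path."""
    return HANDLE_API + urllib.parse.quote(doi.strip(), safe="/")


def is_registered(payload: str) -> bool:
    """Interpret a lookup response body; responseCode 1 is assumed to mean 'found'."""
    try:
        return json.loads(payload).get("responseCode") == 1
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as not registered.
        return False


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Network call: True only if doi.org reports the DOI as registered."""
    try:
        with urllib.request.urlopen(handle_url(doi), timeout=timeout) as resp:
            return is_registered(resp.read().decode("utf-8"))
    except OSError:
        # HTTP errors (such as 404 for an unknown DOI) and network failures
        # both end up here, so False means "not confirmed", not "fabricated".
        return False
```

Because a network failure also returns False, a negative result from doi_resolves is best read as "unverified" and followed up by hand.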
The Specific Patterns ChatGPT References Follow When Fabricated
Structurally fabricated ChatGPT references cluster into three patterns. The first is a complete entry with a plausible DOI that does not resolve — the DOI prefix is valid but the suffix is invented. The second is a complete entry with no identifier at all — the author, year, title, and journal are all present, but the DOI and URL fields are simply absent. The third is a complete entry with a journal name that is real but a volume, issue, or page range that does not correspond to any actual article in that journal.
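The structural checker catches the second pattern directly, since a missing identifier is always flagged. It catches the first pattern only when the invented DOI is also malformed; a well-formed DOI that does not resolve passes the format test and has to be caught by resolving it. The third pattern (real journal, invented article) requires a manual database lookup to detect, since the structural fields all pass the completeness test.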
What the Structural Check Actually Tests
The check runs five tests per reference entry. It confirms that an author field is present in the expected position for the citation style. It confirms that a four-digit year appears and falls in a plausible range. It confirms that a title fragment is present and long enough to be a real title. It confirms that a publisher or journal name is present. Finally, it tests the identifier field, checking that a DOI matches the 10.XXXX/ prefix format or that a URL begins with a valid scheme.
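The five tests can be sketched as follows. This is an illustrative sketch under stated assumptions, not the checker's actual code: it assumes the entry has already been parsed into fields (so the positional check on the author is out of scope), and the Reference class, the DOI and URL patterns, the minimum title length, and the year bounds are all hypothetical choices.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed patterns: "10.", a 4-9 digit registrant code, "/", a non-empty suffix;
# and an http or https scheme for URLs.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)


@dataclass
class Reference:
    """Hypothetical parsed entry; parsing the citation string is not shown."""
    author: str = ""
    year: Optional[int] = None
    title: str = ""
    container: str = ""  # journal or publisher name
    doi: str = ""
    url: str = ""


def structural_check(ref: Reference, min_title_chars: int = 10,
                     earliest_year: int = 1800) -> dict:
    """Return one pass/fail flag per field test (True means the test passed)."""
    latest_year = date.today().year + 1  # assumed allowance for in-press items
    doi, url = ref.doi.strip(), ref.url.strip()
    doi_ok = not doi or bool(DOI_RE.match(doi))  # a present DOI must be well formed
    url_ok = not url or bool(URL_RE.match(url))  # a present URL must have a valid scheme
    return {
        "author": bool(ref.author.strip()),
        "year": ref.year is not None and earliest_year <= ref.year <= latest_year,
        "title": len(ref.title.strip()) >= min_title_chars,
        "container": bool(ref.container.strip()),
        "identifier": bool(doi or url) and doi_ok and url_ok,
    }
```

An entry with every field filled in and a well-formed DOI returns True for all five keys, while an entry with neither a DOI nor a URL fails only the identifier test. Passing this check says nothing about whether the DOI resolves.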
The identifier test is where most ChatGPT-generated entries are caught. A DOI-formatted string is flagged if the prefix does not match the standard format. An entry with no identifier at all receives the maximum risk weight for that field. Entries that pass all five tests are returned as low-risk, meaning they are structurally complete and warrant a manual spot-check rather than immediate concern.
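One way to turn the field results into a risk score is sketched below. The numeric weights are hypothetical: the text states only that a missing identifier receives the maximum weight for the identifier field and that a clean pass is low-risk, so the relative sizes chosen here are assumptions for illustration.

```python
# Hypothetical weights. The only constraints taken from the text are that a
# missing identifier carries the maximum weight for the identifier field and
# that an entry passing every test is low-risk.
IDENTIFIER_WEIGHT = {"ok": 0.0, "malformed": 2.0, "missing": 3.0}
OTHER_FIELD_WEIGHT = 1.0
OTHER_FIELDS = ("author", "year", "title", "container")


def risk_score(failed_fields: set, identifier_status: str) -> float:
    """Combine the field results into a score between 0.0 and 1.0."""
    score = IDENTIFIER_WEIGHT[identifier_status]
    score += OTHER_FIELD_WEIGHT * sum(1 for f in OTHER_FIELDS if f in failed_fields)
    max_score = IDENTIFIER_WEIGHT["missing"] + OTHER_FIELD_WEIGHT * len(OTHER_FIELDS)
    return score / max_score


def risk_label(score: float) -> str:
    """Only a clean pass counts as low-risk; anything else is flagged."""
    return "low-risk" if score == 0.0 else "flagged"
```

With these assumed weights, a missing identifier alone outweighs any two other failed fields combined, which mirrors the emphasis the check places on the identifier.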
What to Do After Running the Check
For every flagged reference, the correct response is a manual lookup: take the author and title from the flagged entry and search CrossRef (search.crossref.org) or Google Scholar. If an exact match appears, replace the original entry with the verified version, including its real DOI. If no match appears, do not assume straight away that the source does not exist, since some legitimate sources are indexed only in specialized databases. But if the source cannot be found in CrossRef, Google Scholar, and the relevant field-specific database, remove it from your bibliography and replace it with a source you can independently locate.
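When there are many flagged entries, the CrossRef part of the lookup can be partly scripted. The sketch below assumes the public CrossRef REST API at api.crossref.org/works with its query.bibliographic, query.author, and rows parameters, and assumes the response carries results under message.items with DOI and title fields. The function names are hypothetical, and the exact-match rule (case-insensitive, whitespace-normalised title equality) is a deliberately strict assumption.

```python
import json
import urllib.parse

CROSSREF_WORKS = "https://api.crossref.org/works"  # public CrossRef REST API


def crossref_query_url(author: str, title: str, rows: int = 5) -> str:
    """Build a search URL from a flagged entry's author and title."""
    params = {"query.bibliographic": title, "query.author": author, "rows": rows}
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def exact_title_matches(payload: str, title: str) -> list:
    """Return the DOIs of returned works whose title equals the flagged title."""
    items = json.loads(payload).get("message", {}).get("items", [])
    wanted = _normalise(title)
    return [
        item["DOI"]
        for item in items
        if "DOI" in item
        and any(_normalise(t) == wanted for t in item.get("title", []))
    ]
```

Fetch the URL with any HTTP client and pass the response body to exact_title_matches. An empty list means only that CrossRef returned no exact title match, so the Google Scholar and field-specific searches still apply before an entry is removed.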