What Is an AI Detector?
How AI detection tools decide whether a piece of text was written by a human or generated by a language model, and where their limits lie.
If you have sent an essay through Turnitin recently, you have already met an AI detector. Behind the "AI Generated" label sits a piece of software that tries to guess whether each sentence was typed by a human or produced by a model. Understanding how the guessing works is the first step toward knowing how much weight to give it.
What detectors actually measure
Most commercial detectors rely on two core signals, perplexity and burstiness, and many now add a third: token-level probability.
Perplexity
Perplexity captures how predictable each word is, to a language model, given the words that came before. A sentence like "The quarterly revenue increased by 12 percent" is highly predictable, so it scores low on surprise. Human writers more often reach for unusual phrasing, odd metaphors, or sentence fragments that a language model would not predict, so their text tends to score higher.
Think of it this way: if you can guess the next word in a sentence before you finish reading it, the perplexity is low. If the writer surprises you, the perplexity is high.
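To make the idea concrete, here is a minimal sketch of the standard perplexity formula: the exponential of the average negative log-probability per token. The probability lists are made-up illustrative values, not output from any real model or detector, and the function name is an assumption of this example.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities a scoring model might assign to each word.
predictable_sentence = [0.9, 0.8, 0.85, 0.9]   # model guessed most words easily
surprising_sentence = [0.2, 0.05, 0.3, 0.1]    # model was often caught off guard

print(perplexity(predictable_sentence))
print(perplexity(surprising_sentence))
```

The predictable list yields the lower value. A real detector would obtain these probabilities from a language model rather than from hand-typed numbers.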
Burstiness
Burstiness tracks the rhythm of a paragraph. People tend to write in bursts: a long clause, then a short one, then a run-on sentence that trails off. Language models tend to produce a more uniform cadence. Detectors look for that flattening.
Key insight: Human writing tends to be rhythmically uneven. AI writing tends to be rhythmically flat. That difference is one of the strongest signals detectors use.
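One simple way to put a number on rhythm is the variation in sentence length. The sketch below uses the coefficient of variation (standard deviation divided by mean) of words per sentence. This is an illustrative proxy chosen for this example, not the formula any particular detector is known to use, and the sentence splitter is deliberately naive.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., !, ? and count the words in each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence lengths; 0.0 means perfectly even."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

flat = "One two three. Four five six. Seven eight nine."
uneven = "No. This sentence is quite a bit longer than the first one. Short again."

print(burstiness(flat))
print(burstiness(uneven))
```

The evenly paced sample scores zero, and the uneven one scores higher. Abbreviations, quotations, and decimals would confuse this splitter, so treat it as a teaching aid only.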
Token probability
A third signal that many systems now incorporate is the probability distribution a scoring model assigns to each next token. Two related measurements come from it: the log probability of the token that actually appears, and token-level entropy, which describes how widely the probability is spread across candidate tokens. If the model is extremely confident about every token, the text is likely something the model could have produced itself. If the probability is spread across many possible next words, the text is more likely to be human.
A 2024 study from researchers at Stanford found that token-level entropy is one of the strongest single features for classification, outperforming perplexity alone on several benchmarks.
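Token-level entropy can be sketched with the standard Shannon entropy formula applied to one next-token distribution at a time. The distributions below are hypothetical, and averaging per-token entropy over a passage is one plausible way to summarize it, not a documented detector feature.

```python
import math

def token_entropy(probs):
    """Shannon entropy in bits of a single next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_entropy(distributions):
    """Average entropy across every token position in a passage."""
    return sum(token_entropy(d) for d in distributions) / len(distributions)

# Hypothetical distributions over four candidate next tokens.
confident_model = [[0.97, 0.01, 0.01, 0.01], [0.94, 0.02, 0.02, 0.02]]
hesitant_model = [[0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1]]

print(mean_entropy(confident_model))
print(mean_entropy(hesitant_model))
```

Low average entropy corresponds to the "extremely confident" case described above, and high average entropy to the case where the model hesitates.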
The machine-learning layer
These signals feed into a classifier, usually a logistic regression model or a small neural network. That classifier is trained on labeled examples of human-written and AI-generated text.
- Turnitin has said its training corpus includes tens of thousands of essays across multiple disciplines
- GPTZero uses a mix of academic papers, news articles, and its own synthetic data
- Originality.ai trains on content from the open web
The training data shapes what the detector considers "human" and "AI." If a detector was trained mostly on academic prose, it may flag blog posts or conversational copy simply because the writing style falls outside its training distribution. This is the root cause of many false positives.
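The classifier layer can be sketched as a tiny logistic regression trained by gradient descent on two features. Everything here is an assumption made for illustration: the six training rows, the scaling of perplexity to a 0-1 range, the learning rate, and the function names. Real detectors train on far larger corpora, and their features and architectures are not public.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=5000):
    """Batch gradient descent on logistic loss; returns (weights, bias)."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def ai_probability(features, weights, bias):
    """Estimated probability that a text with these features is AI-generated."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)

# Hypothetical rows: (perplexity scaled to 0-1, burstiness). Label 1 = AI, 0 = human.
samples = [(0.40, 0.80), (0.55, 0.70), (0.35, 0.60),
           (0.12, 0.20), (0.15, 0.30), (0.10, 0.25)]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(samples, labels)
print(ai_probability((0.135, 0.25), weights, bias))   # low perplexity, flat rhythm
print(ai_probability((0.475, 0.75), weights, bias))   # high perplexity, uneven rhythm
```

Because the model only knows the six rows it was shown, a human text that happens to have low perplexity and flat rhythm lands on the "AI" side of the boundary, which is the training-distribution problem described above in miniature.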
Where false positives come from
A 2025 Stanford study analyzed over 1,200 essays across six universities and found that detectors labeled roughly 6.1 percent of human-written work as AI-generated. The rate is not uniform across writers:
| Group | False-positive rate |
|---|---|
| ESL writers | ~12% |
| Native English speakers | ~3% |
The gap exists because prose by ESL (English as a second language) writers often uses more standardized grammar, shorter sentences, and a narrower vocabulary, which are the exact patterns detectors associate with AI.
- Turnitin has published its own internal false-positive rate of 3.8 percent on English-language essays
- ZeroGPT's rate in independent tests has exceeded 14 percent
These numbers matter because a single false positive can trigger an academic integrity investigation.
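A short calculation shows why even a small rate adds up. The sketch below assumes each human-written essay is flagged independently with the same probability, which is a simplification, and the cohort size of 100 is a hypothetical figure chosen for illustration.

```python
def expected_false_flags(n_human_essays, false_positive_rate):
    """Average number of human-written essays wrongly flagged."""
    return n_human_essays * false_positive_rate

def prob_at_least_one_false_flag(n_human_essays, false_positive_rate):
    """Chance that at least one human-written essay is wrongly flagged."""
    return 1.0 - (1.0 - false_positive_rate) ** n_human_essays

cohort = 100   # hypothetical number of honest submissions
rate = 0.038   # the 3.8 percent figure quoted above

print(expected_false_flags(cohort, rate))
print(prob_at_least_one_false_flag(cohort, rate))
```

Under these assumptions, a cohort of 100 honest students would see nearly four wrongful flags on average, and the chance of at least one is about 98 percent.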
Genres that cause problems
Detectors also struggle with certain genres:
- Legal contracts follow rigid templates
- Medical abstracts use standardized structures
- Press releases follow the inverted-pyramid format
- Literature reviews repeat the same paragraph structure
When a student writes a literature review that repeats the same structure in every paragraph, the detector sees low perplexity and flags it. The detector is not wrong about predictability; it is wrong about the cause.
The missing context problem
AI detectors give you a score, not an explanation. They do not:
- Tell you which sentences triggered the flag
- Account for Grammarly or AI autocomplete usage
- Distinguish between copying an entire ChatGPT output and using AI to brainstorm then rewriting by hand
Both scenarios can produce text that looks statistically similar to AI output. In the second, that happens because the student may have internalized the model's phrasing during the drafting process.
Why the tools still matter
Despite their flaws, detectors serve a real purpose. At scale, they give educators a triage tool.
A professor with 200 essays cannot read every one with the same level of scrutiny. A detector that flags 20 of those essays lets the professor focus attention where misconduct is more likely.
Universities that treat detector output as one signal among many (alongside writing samples, drafts, and oral examinations) get reasonable results. Problems arise when institutions treat the score as a verdict rather than a hint.
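Bayes' rule gives a rough sense of why a flag is a hint rather than a verdict. The sketch below computes the share of flagged essays that are actually AI-generated. The base rate and detection rate are hypothetical inputs invented for this example; only the 6.1 percent false-positive rate comes from the figures cited earlier.

```python
def flag_precision(base_rate, true_positive_rate, false_positive_rate):
    """Fraction of flagged essays that really are AI-generated."""
    true_flags = base_rate * true_positive_rate
    false_flags = (1.0 - base_rate) * false_positive_rate
    if true_flags + false_flags == 0:
        raise ValueError("no essays would be flagged with these rates")
    return true_flags / (true_flags + false_flags)

precision = flag_precision(
    base_rate=0.10,             # hypothetical: 10% of essays are AI-generated
    true_positive_rate=0.90,    # hypothetical: detector catches 90% of them
    false_positive_rate=0.061,  # the 6.1 percent figure cited earlier
)
print(precision)
```

With these inputs, about 62 percent of flagged essays are actually AI-generated, so well over a third of the flags point at honest work. A lower base rate makes that share worse, which is why corroborating evidence matters.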
The takeaway
An AI detector is a probabilistic classifier trained on a snapshot of writing styles. It works reasonably well on the data it was trained on, and it fails in predictable ways on text that falls outside that distribution.
Knowing those limits puts you in a better position to interpret its output and to make informed decisions about when and how to use it.