AI Detectors Explained

The release of ChatGPT in 2022 set off multiple arms races: between countries to develop the best regulatory landscape for AI development and enable fast innovation; between microchip manufacturers to produce more powerful GPUs to power AI inference; between cloud providers to dominate compute provision to scale AI inference; and now between output generation and AI output detection.

As developers build models capable of generating accurate outputs for users, tools to detect AI-generated outputs are being developed just as quickly. An AI detector is a tool that analyzes a writing sample to estimate how much of its text was generated by a model rather than written by a human.

AI’s understanding of human language and ability to reconstruct it has changed writing forever. Writers can leverage AI to produce works much faster, often using models to augment the writing process for them, where the final result is an article written in part by a human and in part by a model that generated a portion of the article’s text.

This has raised massive questions around copyright infringement and what actually counts as plagiarism. While AI-generated outputs do not directly lift quotes or text from external sources, their structure is based on data from those sources, and the model does not default to referencing said sources when responding to users.

LLMs and Detectors Rely on Correlation

AI models communicate with humans in ways that foster greater reliance on model outputs, as continuous or prolonged user interaction enables them to keep learning. So by design, models are incentivized to keep engaging with users by generating outputs that they will increasingly depend on for productivity.

LLMs are probabilistic, designed to identify similarities between input data and training data, generating outputs that demonstrate relevance rather than truth when predicting the next piece of text in a conversation sequence. LLMs keep users engaged by telling them what they want to hear, not necessarily what’s true.

So, as long as an LLM identifies a correlation between a user input and a dataset it has already been trained on, that correlation will be used to inform output generation, even if the identified correlation doesn’t determine causation and is entirely coincidental rather than truthful.

AI detectors keep users engaged by convincing them of their accuracy when detecting AI-generated content, outputting analysis results that demonstrate confidence over correctness. AI detectors analyze linguistic patterns in text to identify similarities between a writing sample and a known AI-generated output.

When a similarity is identified, the detector confidently outputs detection results, convincing users of its effectiveness at recognizing AI-generated text even when results are inconsistent. Results can be inconsistent because detectors rely on probabilistic statistical comparisons, not definitive proof, to determine whether a text is AI-generated.

Detection results are primarily informed by the correlation the detector identified between the writing sample it is analyzing and a known AI-generated output. So, just as an LLM’s confirmation bias reduces the truthfulness of its outputs to keep users engaged, an AI detector’s confident presentation masks the inconsistency of its outputs to keep users engaged, even when detection accuracy varies significantly.

How AI Detectors Work

AI detectors identify AI-generated outputs in text by analyzing two properties of a writing sample: perplexity (how predictable its word choices are) and burstiness (how much its sentence structure varies).

Perplexity

Language perplexity is a measure of a writing sample’s predictability: the harder it is to guess each next word from the words before it, the higher the perplexity. Predictable text often reads as plain and easy to follow, but perplexity is a statistical measure of word choice, not of legibility.

Writing samples that use advanced, metaphorical language with unusual, hard-to-understand words have high perplexity, because the author’s language patterns are difficult to predict.

Human writing styles naturally involve high language perplexity due to creative choices, subjectivity in experiences or perspectives that shape how we write, and typos. LLMs tend to produce low-perplexity text because they favor the most likely next words, responding with predictable language and word choice that the average user can easily understand.
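
To make the idea of predictability concrete, here is a minimal Python sketch that scores a sample against a tiny reference text. It assumes a unigram word model with add-one smoothing, which is far simpler than the language models real detectors rely on; the function names and the sample sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def unigram_perplexity(sample, reference):
    """Perplexity of `sample` under an add-one-smoothed unigram model of `reference`.

    This is an illustrative stand-in, not how any particular detector works.
    """
    ref_tokens = tokenize(reference)
    counts = Counter(ref_tokens)
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    total = len(ref_tokens)

    tokens = tokenize(sample)
    if not tokens:
        raise ValueError("sample must contain at least one token")

    # Sum the log-probability of every token, then exponentiate the
    # average negative log-probability to get perplexity.
    log_prob_sum = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob_sum += math.log(p)
    return math.exp(-log_prob_sum / len(tokens))


# Hypothetical toy data: the reference plays the role of "what the model has seen".
reference = "the cat sat on the mat and the dog sat on the rug"
predictable = "the cat sat on the rug"
surprising = "quantum marmalade evaporates beneath violet algorithms"

print("predictable:", unigram_perplexity(predictable, reference))
print("surprising: ", unigram_perplexity(surprising, reference))
```

The sample built from familiar words scores lower than the sample built entirely from unseen words. A real detector swaps the toy reference for a large language model, but the principle is the same: text the model finds easy to predict gets a low score.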

Burstiness

Burstiness refers to how much a text’s sentence structures vary. A “bursty” text mixes long and short sentences, shifts between contexts, and varies its word lengths. Human writing styles also possess inherent burstiness, as we write with emotion, which can drastically increase sentence structure variation.

LLMs tend toward low burstiness in their outputs because they generate text one token at a time, predicting the next piece of text (a word, part of a word, or a punctuation mark) in a conversational sequence. Inputs with high burstiness can require a larger token allocation to process during inference. Breaking down bursty inputs with lots of variation is already tricky enough for models, so they are unlikely to generate bursty outputs on top of that.
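
Burstiness has no single agreed formula. As a rough sketch, the Python below assumes one simple proxy: the coefficient of variation of sentence length (standard deviation divided by mean). The function names, the naive sentence splitter, and the sample texts are all hypothetical.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each non-empty sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence length: std dev / mean.

    An assumed proxy for illustration; 0.0 means every sentence has the same length.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    return statistics.pstdev(lengths) / statistics.mean(lengths)


# Hypothetical samples: one with uniform sentences, one with wide swings in length.
uniform = (
    "The model writes a sentence. "
    "The model writes another one. "
    "The model writes one more."
)
varied = (
    "Stop. I never expected the results to swing that wildly "
    "between two runs of the same tool. Why?"
)

print("uniform:", burstiness(uniform))
print("varied: ", burstiness(varied))
```

The uniform sample scores zero because all three sentences have five words, while the varied sample scores well above it. This proxy only looks at sentence length, so it ignores the shifts in context and word length that also contribute to burstiness.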

AI Detectors and False Positives

When determining whether text was written by a human or generated by a model, an AI detector looks for low perplexity in language patterns and low burstiness in sentence structure.

If a detector identifies a portion of text in a writing sample that exhibits low perplexity and low burstiness, it will confidently output results indicating that the portion was AI-generated, whether or not a model actually wrote it. This is how false positives arise.

When an AI detector outputs a false positive, it means that it mistakenly identifies a portion of human-written text as AI-generated. This happens en masse and leads to criticism of AI detectors as not being that accurate, which, let’s be honest, they aren’t: even the best detectors still cannot identify AI-generated content beyond a 90% success rate.

That is impressive, surely, but even the most advanced detectors become useless when model outputs are simply paraphrased before being used in a text, or when a model is prompted to output text with high perplexity and burstiness.

AI detectors analyze patterns in language, not authorship. So, they output probability scores indicating the likelihood that a writing sample is AI-generated, not a definitive verdict. Because detectors can’t output a direct yes or no answer to whether something was AI-generated or not, there will always exist a potential for detection results to surface a false positive.
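
The gap between a probability score and a verdict can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the logistic formula, the midpoints, the weights, the 0.5 threshold, and the input values are invented and not taken from any real detector.

```python
import math


def ai_likelihood(perplexity, burstiness, ppl_midpoint=40.0, burst_midpoint=0.5):
    """Map the two signals to a score in (0, 1); lower signals give a higher score.

    The midpoints and the weight of 2.0 are hypothetical, not tuned on any data.
    """
    z = 2.0 * (1 - perplexity / ppl_midpoint) + 2.0 * (1 - burstiness / burst_midpoint)
    return 1 / (1 + math.exp(-z))  # logistic squashing into a probability-like score


def flag(score, threshold=0.5):
    """Turn a score into a label. The threshold is a policy choice, not a fact."""
    return score >= threshold


# Hypothetical measurements for three writing samples.
polished_human = ai_likelihood(perplexity=25.0, burstiness=0.30)
raw_model = ai_likelihood(perplexity=20.0, burstiness=0.25)
messy_human = ai_likelihood(perplexity=90.0, burstiness=1.10)

for name, score in [
    ("polished human", polished_human),
    ("raw model", raw_model),
    ("messy human", messy_human),
]:
    print(f"{name}: score={score:.2f} flagged={flag(score)}")
```

In this toy setup the polished human sample lands close to the raw model output and is flagged, which is a false positive. The score only describes how the text looks; the label depends on where someone chose to put the threshold.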

Conclusion

AI detectors in 2026 leave a lot to be desired, and with current levels of AI adoption, this is problematic, as there will be more AI-generated content than ever before. AI detection and the ability to determine whether something is AI-generated or human-made will become a critical topic of discussion for most users.

Nonetheless, even if AI detection results become more accurate and the risk of false positives is significantly reduced, language remains fluid and constantly evolving.

Polished human writing can look like AI-generated content, and unedited AI outputs can resemble messy human writing, so no detector will ever achieve 100% accuracy in its detection results.