What is Perplexity?
Perplexity measures how well a language model predicts each next token in a sequence. It quantifies the model's "surprise" when it encounters new text: the lower the surprise, the better the model predicted that text.
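Mathematical Definition:
$$\mathrm{PPL}(X) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(x_i \mid x_{<i})\right)$$
Here X = (x_1, ..., x_N) is the token sequence, N is its length, and P(x_i | x_{<i}) is the probability the model assigns to token x_i given the tokens before it. Lower values indicate better prediction.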
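The definition above can be sketched in a few lines of Python. This is a minimal illustration that assumes the per-token probabilities P(x_i | x_<i) have already been obtained from some model; the function name `perplexity` and the example probability lists are made up for the example.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given P(x_i | x_<i) for each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Exponentiating turns the average back into a per-token "branching factor".
    return math.exp(avg_nll)

# Hypothetical probabilities: a model that is confident vs. one that is surprised.
confident = perplexity([0.9, 0.8, 0.95, 0.85])
surprised = perplexity([0.1, 0.05, 0.2, 0.02])
print(f"confident: {confident:.2f}, surprised: {surprised:.2f}")
```

A model that assigns probability 0.25 to every token gets a perplexity of 4, which matches the intuition that it is as uncertain as a uniform choice among four options.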
What is Burstiness?
Burstiness measures how much writing patterns, and in particular per-sentence perplexity, vary across a document. Human writers tend to vary their sentences, mixing short and long, plain and surprising, whereas language models tend to produce text whose perplexity stays at a consistent level throughout.
Key Characteristics:
- High burstiness: Variable sentence lengths and structures
- Low burstiness: Consistent, uniform patterns
- Captures intermittent spikes and lulls in sentence length or perplexity across the text
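There is no single agreed formula for burstiness. One simple way to make the characteristics above concrete is the coefficient of variation (standard deviation divided by mean) of sentence lengths. The sketch below takes that approach; the choice of statistic, the naive sentence splitter, and the names `split_sentences`, `burstiness`, and `sentence_length_burstiness` are all assumptions for illustration.

```python
import re
import statistics

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def burstiness(values):
    """Coefficient of variation: population std dev divided by the mean."""
    if len(values) < 2:
        return 0.0  # a single value cannot vary
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean

def sentence_length_burstiness(text):
    # Sentence length is measured in whitespace-separated words.
    lengths = [len(s.split()) for s in split_sentences(text)]
    return burstiness(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Stop. The storm that had been building over the hills "
          "all afternoon finally broke. We ran.")

print(sentence_length_burstiness(uniform))  # identical lengths, no variation
print(sentence_length_burstiness(varied))   # mixed lengths, higher score
```

The same `burstiness` function can be applied to a list of per-sentence perplexities instead of lengths, which is closer to how the metric is described here.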
Burstiness Comparison (chart of burstiness scores): higher values indicate more variation in writing patterns.
Sentence-by-Sentence Analysis
| Sentence | Length | Perplexity | Complexity |
|---|---|---|---|
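A per-sentence breakdown like the table above could be produced along the following lines. This is only a sketch: it substitutes a toy add-one-smoothed unigram model for a real language model (so it ignores context entirely), the helper names and sample texts are hypothetical, and the Complexity column is left out because the page does not define it.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer; an assumption made for this example."""
    return re.findall(r"[a-z']+", text.lower())

def fit_unigram(reference_text):
    """Return a function giving add-one-smoothed unigram probabilities."""
    counts = Counter(tokenize(reference_text))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words
    return lambda word: (counts[word] + 1) / (total + vocab)

def sentence_rows(text, prob):
    """Build (sentence, length in words, perplexity) rows."""
    rows = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = tokenize(sentence)
        if not words:
            continue
        avg_nll = -sum(math.log(prob(w)) for w in words) / len(words)
        rows.append((sentence, len(words), math.exp(avg_nll)))
    return rows

reference = "the cat sat on the mat. the dog sat on the rug."
prob = fit_unigram(reference)
rows = sentence_rows(
    "The cat sat on the mat. Quantum ferrets juggle marmalade.", prob
)
for sentence, length, ppl in rows:
    print(f"{sentence!r:45} {length:>3} {ppl:8.2f}")
```

The first sentence reuses words the toy model has seen and scores lower; the second consists entirely of unseen words and scores higher. Replacing `prob` with probabilities from a real language model would leave the rest of the structure unchanged.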
Perplexity vs Burstiness
These two metrics are used together to help distinguish human-written from AI-generated text.
Human vs AI Characteristics
Human Writing
- Higher perplexity (more surprising word choices)
- Higher burstiness (varied sentence structures)
- Natural inconsistencies and creativity
- Emotional and contextual variations
AI Writing
- Lower perplexity (predictable patterns)
- Lower burstiness (consistent structure)
- Formulaic word selection
- Uniform sentence construction
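The contrast above suggests a simple decision rule: treat text as likely machine-generated only when both signals are low. The sketch below is a toy illustration of that idea, not how GPTZero or any other detector actually works; the function name `classify` and both threshold values are hypothetical and would need calibration against real data.

```python
def classify(perplexity, burstiness, ppl_threshold=30.0, burst_threshold=0.4):
    """Toy rule combining both signals; the thresholds are placeholders."""
    low_ppl = perplexity < ppl_threshold
    low_burst = burstiness < burst_threshold
    if low_ppl and low_burst:
        return "likely AI-generated"
    if not low_ppl and not low_burst:
        return "likely human-written"
    # The two signals disagree, so the rule declines to decide.
    return "inconclusive"

print(classify(perplexity=12.0, burstiness=0.1))
print(classify(perplexity=85.0, burstiness=0.9))
print(classify(perplexity=12.0, burstiness=0.9))
```

Returning "inconclusive" when the signals disagree is a deliberate choice: a hard yes/no on mixed evidence is exactly where false positives tend to arise.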
Real-World Applications
AI Detection
GPTZero and other tools use these metrics to identify AI-generated content
Language Model Evaluation
Perplexity is a key metric for evaluating language model performance
Content Quality Assessment
Writers use these concepts to improve engagement and naturalness
Limitations and Considerations
Perplexity Limitations
- May not capture broad contextual understanding
- Challenges in capturing ambiguity and creativity
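- Vocabulary size and tokenization affect the score, so values from different models are not directly comparable
- When used for detection, can flag predictable human-written text as AI-generated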
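The vocabulary-size point can be seen with a short worked case. Assume a model that spreads probability uniformly over a vocabulary of V tokens (V is a symbol introduced here for illustration). Substituting into the perplexity definition gives:

```latex
P(x_i \mid x_{<i}) = \frac{1}{V}
\quad\Longrightarrow\quad
\mathrm{PPL}(X) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log\frac{1}{V}\right)
= \exp(\log V) = V
```

So a model with no knowledge at all scores a perplexity equal to its vocabulary size, and that baseline moves whenever the vocabulary or tokenizer changes. This is one reason raw perplexity values from models with different vocabularies should be compared with caution.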
Burstiness Limitations
- Genre-dependent patterns can skew results
- Cultural and linguistic variations
- False positives with non-native speakers
- Context-dependent meaning