# Whipping Post — Full Documentation for LLMs

## Overview

Whipping Post is an analytics platform purpose-built for tracking AI agent and web crawler behaviour on websites. It answers the question: "Which AI systems are visiting my site, what are they reading, and how is that changing over time?"

Traditional web analytics platforms (Google Analytics, Plausible, Umami, etc.) were designed for human visitors. They track page views, sessions, bounce rates, and conversions — metrics that assume a browser-based user with a screen. AI crawlers and agents operate differently: they make programmatic HTTP requests, parse raw HTML or structured data, and may never trigger JavaScript-based analytics. Whipping Post captures this invisible traffic layer.

## The Problem

As of 2026, a significant and growing share of web traffic comes from AI systems:

- **Search crawlers**: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Google-Extended, Amazonbot
- **Training data collectors**: CCBot (Common Crawl), Bytespider (ByteDance), Meta-ExternalAgent
- **Coding assistants**: GitHub Copilot's web access, Cursor, Cline, and other agentic dev tools
- **Autonomous agents**: AI agents performing research, comparison shopping, data gathering

Most website operators have limited visibility into this traffic. Server logs capture user agent strings, but without tooling to parse, classify, and trend this data, the information is effectively invisible.

## How Whipping Post Works

### Data Collection

Whipping Post uses a lightweight tracking approach that works without JavaScript execution:

1. **Server-side log analysis**: Parse access logs to identify AI user agent strings
2. **Edge detection**: Deploy at the CDN/edge layer to capture requests before they reach your application
3. **Header analysis**: Examine request headers beyond user agent — including Accept, referrer patterns, and request timing signatures that distinguish AI crawlers from human browsers

### AI Agent Classification

Whipping Post maintains a continuously updated database of known AI crawler user agents, including:

| Agent | Operator | Purpose |
|-------|----------|---------|
| GPTBot | OpenAI | Training data and ChatGPT browsing |
| ChatGPT-User | OpenAI | Real-time browsing in ChatGPT |
| ClaudeBot | Anthropic | Training data collection |
| Anthropic-ai | Anthropic | AI assistant web access |
| PerplexityBot | Perplexity AI | AI search engine |
| Google-Extended | Google | Gemini and AI features |
| Amazonbot | Amazon | Alexa and AI services |
| CCBot | Common Crawl | Open training data |
| Bytespider | ByteDance | AI training and services |
| Meta-ExternalAgent | Meta | AI training and features |
| Applebot-Extended | Apple | Apple Intelligence features |
| cohere-ai | Cohere | AI model training |

New agents are added as they are identified in the wild.

### Analytics and Reporting

**Real-time dashboard**: Live feed of AI agent visits with filtering by agent type, page, and time range.

**Crawl frequency analysis**: How often each AI agent returns to your site, which pages they prioritise, and whether frequency is increasing or decreasing.

**Content consumption patterns**: Which pages attract the most AI attention, what content types (articles, documentation, data pages) are preferred, and how crawl depth varies by agent.

**Historical trends**: Week-over-week and month-over-month changes in AI crawl volume, agent diversity, and content targeting.

**Comparative benchmarking**: How your AI crawler traffic compares to aggregate patterns across the Whipping Post network (anonymised).

## API Reference

### Authentication

All API requests require a Bearer token in the Authorization header.

### Endpoints

#### GET /api/v1/agents
Returns a list of AI agents detected on your properties.

#### GET /api/v1/crawls
Returns crawl events with filtering by agent, date range, URL pattern, and status code.

#### GET /api/v1/trends
Returns time-series data for crawl volume, segmented by agent type.

#### GET /api/v1/pages
Returns pages ranked by AI crawler attention, with per-agent breakdowns.

#### GET /api/v1/alerts
Returns configured alerts and their current status.

Full API documentation: https://whipping-post.ai/api

## Known AI Crawler Directory

Whipping Post maintains a public, regularly updated directory of known AI web crawlers at https://whipping-post.ai/crawlers. This directory includes:

- User agent strings
- Operating organisation
- Stated purpose
- robots.txt token
- First observed date
- Current activity status
- Typical crawl patterns

This directory is freely available and designed to be a reference resource for the web community.

## Frequently Asked Questions

**Does Whipping Post require JavaScript?**
No. Unlike traditional analytics, Whipping Post works via server-side log analysis and edge detection. AI crawlers typically do not execute JavaScript, so JS-based analytics cannot track them.

**Does Whipping Post slow down my site?**
No. The analytics collection happens at the server/edge layer and does not add any client-side overhead.

**Which AI crawlers does Whipping Post detect?**
Whipping Post detects 50+ known AI user agents and uses heuristic analysis to identify likely AI traffic from unrecognised agents.

**Can I use Whipping Post alongside Google Analytics or other tools?**
Yes. Whipping Post is complementary to traditional analytics. It tracks the traffic layer that conventional tools miss.

**Is there a free tier?**
Visit https://whipping-post.ai for current pricing information.

## Contact

- Website: https://whipping-post.ai
- Documentation: https://whipping-post.ai/docs
- Email: hello@whipping-post.ai