Architecture & Methods
This page covers the research foundation behind GEO Optimizer: the nine optimization methods from the Princeton KDD 2024 study, the scoring algorithm that powers geo_audit.py, and the AI bot ecosystem your website needs to support.
The 9 Princeton GEO Methods
The research paper "GEO: Generative Engine Optimization" (Princeton, KDD 2024) tested nine content optimization strategies across 10,000 real queries on Perplexity.ai. The methods are grouped below into priority tiers based on the measured visibility improvement; within a tier they are not strictly sorted by impact.
Priority 1: High-Impact Methods
1. Cite Sources (+30 to +115% visibility)
The single most effective technique. Linking to authoritative external sources inline dramatically increases the probability that AI systems will cite your content.
AI search engines treat source citations as credibility signals. When your content references verifiable external sources, the AI can cross-validate claims and is more likely to surface your page as a trusted source.
Implementation:
- Add inline links to authoritative sources (research papers, government data, industry reports)
- Cite primary sources rather than secondary aggregators
- Use descriptive anchor text that indicates what the source contains
<!-- Before -->
<p>Remote work increases productivity.</p>
<!-- After (GEO-optimized) -->
<p>Remote work increases productivity by 13%, according to a
<a href="https://stanford.edu/...">Stanford study published in the
Quarterly Journal of Economics</a>.</p>
2. Statistics (+40% average visibility)
Replace vague claims with specific numerical data. AI systems prefer content that includes verifiable metrics because they can extract and present precise answers.
Implementation:
- Replace qualitative statements with quantitative data
- Always include the source and recency of statistics (within 3 years preferred)
- Use specific numbers, not rounded approximations
Before: "Most companies use cloud computing."
After: "94% of enterprises use cloud services as of 2024 (Flexera State of the Cloud Report)."
3. Quotation Addition (+30 to +40% visibility)
Direct quotes from recognized experts, researchers, or official bodies signal verifiability and authority. This is especially effective for YMYL (Your Money, Your Life) topics such as health, finance, and legal content.
Implementation:
- Quote named experts with their credentials
- Use blockquote formatting for visual distinction
- Attribute the quote with name, title, and organization
Priority 2: Moderate-Impact Methods
4. Authoritative Tone (+6 to +12% visibility)
Write with expert-level confidence using structured exposition: definition, mechanism, then practical application. Remove hedging language ("might," "possibly," "often") and replace it with precise scope statements.
Before: "This might help improve your rankings."
After: "This technique improves AI citation rates by 30-40% for content that includes verifiable sources."
5. Fluency Optimization (+15 to +30% visibility)
Grammatically correct, well-structured prose improves extraction reliability. AI systems parse and extract content more accurately from clean, logical text.
Guidelines:
- Target 15--25 words per sentence
- Use logical connectives ("therefore," "as a result," "specifically")
- One idea per paragraph
- Lead with the conclusion, then support it
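The sentence-length guideline above can be checked mechanically. The following is a minimal sketch, not part of geo_audit.py: the function names are hypothetical, and the sentence splitter is a naive regex that will miscount abbreviations such as "e.g." or decimal-heavy text.

```python
import re


def sentence_word_counts(text: str) -> list[int]:
    """Split on ., !, or ? followed by whitespace, then count words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def out_of_range(text: str, low: int = 15, high: int = 25) -> list[int]:
    """Return the indexes of sentences whose word count falls outside low..high."""
    counts = sentence_word_counts(text)
    return [i for i, n in enumerate(counts) if not low <= n <= high]
```

Calling `out_of_range(draft)` on a paragraph returns the positions of sentences worth shortening or expanding; the 15 and 25 defaults mirror the target stated in the guidelines.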
6. Easy-to-Understand (+8 to +15% visibility)
Simplify technical concepts through in-context definitions without sacrificing precision. Use a two-level approach: plain language explanation first, technical details second.
Before: "Implement HSTS with includeSubDomains and a max-age of 31536000."
After: "Enable HSTS to force encrypted connections across your entire site.
Technically, set the Strict-Transport-Security header with
includeSubDomains and max-age=31536000 (one year)."
7. Technical Terms (+5 to +10% for specialized queries)
Use industry-standard terminology with proper definitions. Spell out acronyms on first use. This helps AI systems match your content to specialized queries.
8. Unique Words (+5 to +8% visibility)
Use contextually appropriate synonyms instead of repeating identical terms. This increases semantic coverage and helps your content match a wider range of query phrasings.
Methods to Avoid
9. Keyword Stuffing (~0% -- Neutral or Negative)
The Princeton study explicitly tested keyword density manipulation. The result: no significant improvement, and in some cases a net negative effect on AI visibility. Keyword stuffing degrades readability, which harms both fluency and extraction quality.
Do not use keyword stuffing. It was a marginal SEO tactic for traditional search, and it is counterproductive for AI search engines. Focus your effort on the high-impact methods above.
Methods Summary Table
| # | Method | Impact | Priority |
|---|---|---|---|
| 1 | Cite Sources | +30 to +115% | High |
| 2 | Statistics | +40% avg | High |
| 3 | Quotation Addition | +30 to +40% | High |
| 4 | Authoritative Tone | +6 to +12% | Moderate |
| 5 | Fluency Optimization | +15 to +30% | Moderate |
| 6 | Easy-to-Understand | +8 to +15% | Moderate |
| 7 | Technical Terms | +5 to +10% | Moderate |
| 8 | Unique Words | +5 to +8% | Moderate |
| 9 | Keyword Stuffing | ~0% | Avoid |
Scoring Algorithm
The geo_audit.py script evaluates websites across five weighted sections totaling 100 points. The algorithm checks infrastructure (can AI bots access your site?) and content quality (will AI systems find your content citation-worthy?).
Point Distribution
| Section | Points | Weight | Purpose |
|---|---|---|---|
| robots.txt | 20 | 20% | Can AI bots crawl your site? |
| llms.txt | 20 | 20% | Is your site structure machine-readable? |
| JSON-LD Schema | 25 | 25% | Does your site provide structured data? |
| Meta Tags | 20 | 20% | Are standard meta tags properly configured? |
| Content Quality | 15 | 15% | Does your content follow GEO best practices? |
Detailed Breakdown
robots.txt (20 points)
| Check | Points | Criteria |
|---|---|---|
| Citation bots allowed | 15 | OAI-SearchBot, ClaudeBot, PerplexityBot must be accessible |
| General bots config | 5 | Other AI and search bots properly configured |
The scoring prioritizes citation bots (OAI-SearchBot, ClaudeBot, PerplexityBot) over training bots (GPTBot, anthropic-ai, Google-Extended). Blocking training bots is a valid privacy choice; blocking citation bots makes you invisible to AI search.
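As an illustration of the citation-bot check, the sketch below uses Python's standard `urllib.robotparser` to test whether each citation bot may fetch a URL. It is an assumption about how such a check could be written, not the code inside geo_audit.py; the function name and the default URL are hypothetical, and how the 15 points are split across the three bots is not specified here.

```python
from urllib.robotparser import RobotFileParser

# Citation bots named in the scoring table.
CITATION_BOTS = ("OAI-SearchBot", "ClaudeBot", "PerplexityBot")


def citation_bot_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Parse robots.txt text and report whether each citation bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in CITATION_BOTS}
```

If every value in the returned dictionary is `True`, the site passes the "citation bots allowed" criterion as described in the table.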
llms.txt (20 points)
| Check | Points | Criteria |
|---|---|---|
| File presence | 10 | /llms.txt exists and returns 200 |
| H1 heading | 3 | File contains a top-level heading |
| Sections | 4 | Content organized into sections |
| Links | 3 | Contains links to site pages |
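The llms.txt checks can be approximated with a few regular expressions over the file body. This is a sketch under stated assumptions: `score_llms_txt` is a hypothetical name, "sections" is taken to mean at least one `##` heading, and "links" is taken to mean at least one markdown-style link. The actual rules in geo_audit.py may differ.

```python
import re
from typing import Optional


def score_llms_txt(text: Optional[str]) -> int:
    """Score llms.txt content out of 20; None means the file was missing or not a 200."""
    if text is None:
        return 0
    score = 10  # file presence
    if re.search(r"^# \S", text, flags=re.MULTILINE):
        score += 3  # top-level heading
    if re.search(r"^## \S", text, flags=re.MULTILINE):
        score += 4  # at least one section heading (assumed to be H2)
    if re.search(r"\[[^\]]+\]\([^)\s]+\)", text):
        score += 3  # at least one markdown link
    return score
```

Fetching the file and checking the HTTP status is left to the caller, so the function can be exercised on plain strings.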
JSON-LD Schema (25 points)
| Check | Points | Criteria |
|---|---|---|
| WebSite schema | 10 | Valid WebSite JSON-LD on homepage |
| FAQPage schema | 10 | FAQPage schema detected (any page) |
| WebApplication schema | 5 | WebApplication schema for tools/utilities |
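One way to detect these schema types is to collect every `application/ld+json` script block and read its `@type`. The sketch below uses only the standard library; the class and function names are hypothetical, it assumes each point value is awarded once per type, and it does not descend into `@graph` containers.

```python
import json
from html.parser import HTMLParser

# Point values copied from the table above.
SCHEMA_POINTS = {"WebSite": 10, "FAQPage": 10, "WebApplication": 5}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def schema_types(html: str) -> set[str]:
    """Return the set of top-level @type values found in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    types: set[str] = set()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the audit
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type")
            if isinstance(declared, str):
                types.add(declared)
            elif isinstance(declared, list):
                types.update(t for t in declared if isinstance(t, str))
    return types


def score_schema(html: str) -> int:
    """Sum the points for each scored schema type present on the page."""
    found = schema_types(html)
    return sum(points for name, points in SCHEMA_POINTS.items() if name in found)
```

Malformed JSON-LD is ignored here rather than penalized, which is a design choice for the sketch and not a documented behavior.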
Meta Tags (20 points)
| Check | Points | Criteria |
|---|---|---|
| Title tag | 5 | Present and non-empty |
| Meta description | 8 | Present, 50--160 characters recommended |
| Canonical URL | 3 | <link rel="canonical"> set |
| Open Graph tags | 4 | og:title, og:description, og:image, og:url |
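The meta tag checks map naturally onto a small HTML parser. The following sketch is illustrative only: `MetaCollector` and `score_meta_tags` are hypothetical names, the Open Graph points are assumed to be all-or-nothing across the four listed properties, and the 50 to 160 character description length is treated as a recommendation and not enforced.

```python
from html.parser import HTMLParser

OG_REQUIRED = {"og:title", "og:description", "og:image", "og:url"}


class MetaCollector(HTMLParser):
    """Record the title, meta description, canonical link, and Open Graph tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.description = None
        self.canonical = None
        self.og = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if (a.get("name") or "").lower() == "description":
                self.description = a.get("content") or ""
            elif a.get("property") in OG_REQUIRED and a.get("content"):
                self.og.add(a["property"])
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def score_meta_tags(html: str) -> int:
    """Score the page's meta tags out of 20 using the point values in the table."""
    c = MetaCollector()
    c.feed(html)
    score = 0
    if c.title.strip():
        score += 5
    if c.description and c.description.strip():
        score += 8
    if c.canonical:
        score += 3
    if OG_REQUIRED <= c.og:  # all four properties present with content
        score += 4
    return score
```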
Content Quality (15 points)
| Check | Points | Criteria |
|---|---|---|
| H1 heading | 4 | Single, descriptive H1 on the page |
| Statistics/data | 6 | Numerical data, percentages, metrics present |
| External citation links | 5 | Links to authoritative external sources |
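The content checks are heuristics, and a rough version can be sketched as follows. Everything here is an assumption for illustration: the names are hypothetical, "statistics" is narrowed to percentage figures in visible text, and an "external" link is any absolute link whose host differs from `site_host`. Judging whether a source is authoritative is outside what this sketch attempts.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Narrow proxy for "statistics": a number followed by a percent sign.
STAT_PATTERN = re.compile(r"\d[\d.,]*\s?%")


class ContentCollector(HTMLParser):
    """Count H1 tags and external links, and gather visible text."""

    def __init__(self, site_host: str):
        super().__init__()
        self.site_host = site_host
        self.h1_count = 0
        self.external_links = 0
        self.text_parts = []
        self._skip = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "a":
            host = urlparse(dict(attrs).get("href") or "").netloc
            if host and host != self.site_host:
                self.external_links += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def score_content(html: str, site_host: str) -> int:
    """Score content quality out of 15 using the point values in the table."""
    c = ContentCollector(site_host)
    c.feed(html)
    score = 0
    if c.h1_count == 1:
        score += 4
    if STAT_PATTERN.search(" ".join(c.text_parts)):
        score += 6
    if c.external_links > 0:
        score += 5
    return score
```

Text inside `<script>` and `<style>` is skipped so that a CSS value such as `width: 100%` does not count as a statistic.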
Score Bands
| Score | Band | Interpretation |
|---|---|---|
| 91--100 | Excellent | Fully optimized; actively citation-ready |
| 71--90 | Good | Strong foundation with minor gaps to close |
| 41--70 | Foundation | Key elements present but significant work required |
| 0--40 | Critical | Major infrastructure missing; AI bots likely cannot access or parse your site |
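To make the arithmetic concrete, the sketch below sums per-section scores and maps the total onto the bands in the table. The section keys and function names are hypothetical; only the point caps and band thresholds come from the tables above.

```python
# Maximum points per section, copied from the Point Distribution table.
SECTION_MAX = {"robots_txt": 20, "llms_txt": 20, "schema": 25, "meta": 20, "content": 15}


def total_score(section_scores: dict[str, int]) -> int:
    """Sum section scores, clamping each to the range 0..its maximum."""
    return sum(
        min(max(section_scores.get(name, 0), 0), cap)
        for name, cap in SECTION_MAX.items()
    )


def score_band(score: int) -> str:
    """Map a 0-100 score to its band name."""
    if score >= 91:
        return "Excellent"
    if score >= 71:
        return "Good"
    if score >= 41:
        return "Foundation"
    return "Critical"
```

For example, a site with full marks everywhere except a bare llms.txt (10 of 20) and only a WebSite schema (10 of 25) totals 20 + 10 + 10 + 20 + 15 = 75, which falls in the Good band.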
AI Bot Ecosystem
AI search engines use dedicated bots to crawl, index, and potentially cite web content. Understanding which bot does what is critical for proper robots.txt configuration.
Citation Bots (Must Allow)
These bots determine whether your content appears in AI-generated search answers. Blocking these makes your site invisible to AI search.
| Bot | Vendor | Purpose |
|---|---|---|
| OAI-SearchBot | OpenAI | ChatGPT Search index -- determines citation eligibility |
| ClaudeBot | Anthropic | Claude.ai real-time web citations |
| PerplexityBot | Perplexity | Perplexity AI citation index |
OAI-SearchBot is the bot that determines if ChatGPT cites you. GPTBot is the training data crawler. Blocking GPTBot does NOT prevent ChatGPT from citing you -- only blocking OAI-SearchBot does that.
Training Bots (Optional to Block)
These bots collect data for model training. Blocking them is a valid choice that does not affect your citation visibility.
| Bot | Vendor | Purpose |
|---|---|---|
| GPTBot | OpenAI | ChatGPT model training data |
| anthropic-ai | Anthropic | Claude model training |
| Google-Extended | Google | Gemini training and AI Overviews |
| Applebot-Extended | Apple | Apple Intelligence training data |
| CCBot | Common Crawl | Open dataset used by many AI models |
Additional AI and Search Bots
| Bot | Vendor | Purpose |
|---|---|---|
| ChatGPT-User | OpenAI | On-demand URL fetching by ChatGPT |
| claude-web | Anthropic | General web crawling |
| Googlebot | Google | Traditional Google Search + AI-assisted results |
| Bingbot | Microsoft | Bing Search and Microsoft Copilot index |
| Applebot | Apple | Siri and Spotlight Search |
| Bytespider | ByteDance | TikTok AI and recommendations |
| DuckAssistBot | DuckDuckGo | AI-powered answers |
| cohere-ai | Cohere | Language model training |
| AI2Bot | Allen Institute | Academic research indexing |
| FacebookBot | Meta | Link preview generation |
Recommended robots.txt Configuration
User-agent: *
Allow: /
# Citation bots - MUST allow for AI search visibility
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
# Training bots - block if desired (does not affect citation)
User-agent: GPTBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
Sitemap: https://example.com/sitemap.xml
Next: AI Context Setup -- detailed platform-by-platform setup guide with usage examples.