
Architecture & Methods

This page covers the research foundation behind GEO Optimizer: the nine optimization methods from the Princeton KDD 2024 study, the scoring algorithm that powers geo_audit.py, and the AI bot ecosystem your website needs to support.

The 9 Princeton GEO Methods

The research paper "GEO: Generative Engine Optimization" (Princeton, KDD 2024) tested nine content optimization strategies across 10,000 real queries on Perplexity.ai. The methods are listed below in priority order, ranked by measured visibility improvement.

Priority 1: High-Impact Methods


1. Cite Sources (+30 to +115% visibility)

The single most effective technique. Linking to authoritative external sources inline dramatically increases the probability that AI systems will cite your content.

AI search engines treat source citations as credibility signals. When your content references verifiable external sources, the AI can cross-validate claims and is more likely to surface your page as a trusted source.

Implementation:

  • Add inline links to authoritative sources (research papers, government data, industry reports)
  • Cite primary sources rather than secondary aggregators
  • Use descriptive anchor text that indicates what the source contains

```html
<!-- Before -->
<p>Remote work increases productivity.</p>

<!-- After (GEO-optimized) -->
<p>Remote work increases productivity by 13%, according to a
<a href="https://stanford.edu/...">Stanford study published in the
Quarterly Journal of Economics</a>.</p>
```

2. Statistics (+40% average visibility)

Replace vague claims with specific numerical data. AI systems prefer content that includes verifiable metrics because they can extract and present precise answers.

Implementation:

  • Replace qualitative statements with quantitative data
  • Always include the source and recency of statistics (within 3 years preferred)
  • Use specific numbers, not rounded approximations
Before: "Most companies use cloud computing."
After: "94% of enterprises use cloud services as of 2024 (Flexera State of the Cloud Report)."

3. Quotation Addition (+30 to +40% visibility)

Direct quotes from recognized experts, researchers, or official bodies signal verifiability and authority. This is especially effective for YMYL ("Your Money or Your Life") topics such as health, finance, and legal content.

Implementation:

  • Quote named experts with their credentials
  • Use blockquote formatting for visual distinction
  • Attribute the quote with name, title, and organization

Priority 2: Moderate-Impact Methods


4. Authoritative Tone (+6 to +12% visibility)

Write with expert-level confidence using structured exposition: definition, mechanism, then practical application. Remove hedging language ("might," "possibly," "often") and replace it with precise scope statements.

Before: "This might help improve your rankings."
After: "This technique improves AI citation rates by 30-40% for content that includes verifiable sources."

5. Fluency Optimization (+15 to +30% visibility)

Grammatically correct, well-structured prose improves extraction reliability. AI systems parse and extract content more accurately from clean, logical text.

Guidelines:

  • Target 15--25 words per sentence
  • Use logical connectives ("therefore," "as a result," "specifically")
  • One idea per paragraph
  • Lead with the conclusion, then support it

6. Easy-to-Understand (+8 to +15% visibility)

Simplify technical concepts through in-context definitions without sacrificing precision. Use a two-level approach: plain language explanation first, technical details second.

Before: "Implement HSTS with includeSubDomains and a max-age of 31536000."
After: "Enable HSTS to force encrypted connections across your entire site.
Technically, set the Strict-Transport-Security header with
includeSubDomains and max-age=31536000 (one year)."

7. Technical Terms (+5 to +10% for specialized queries)

Use industry-standard terminology with proper definitions. Spell out acronyms on first use. This helps AI systems match your content to specialized queries.


8. Unique Words (+5 to +8% visibility)

Use contextually appropriate synonyms instead of repeating identical terms. This increases semantic coverage and helps your content match a wider range of query phrasings.


Methods to Avoid


9. Keyword Stuffing (~0% -- Neutral or Negative)

The Princeton study explicitly tested keyword density manipulation. The result: no significant improvement, and in some cases a net negative effect on AI visibility. Keyword stuffing degrades readability, which harms both fluency and extraction quality.

Warning

Do not use keyword stuffing. It was a marginal SEO tactic for traditional search and it is counterproductive for AI search engines. Focus your effort on the high-impact methods above.

Methods Summary Table

| # | Method | Impact | Priority |
| --- | --- | --- | --- |
| 1 | Cite Sources | +30 to +115% | High |
| 2 | Statistics | +40% avg | High |
| 3 | Quotation Addition | +30 to +40% | High |
| 4 | Authoritative Tone | +6 to +12% | Moderate |
| 5 | Fluency Optimization | +15 to +30% | Moderate |
| 6 | Easy-to-Understand | +8 to +15% | Moderate |
| 7 | Technical Terms | +5 to +10% | Moderate |
| 8 | Unique Words | +5 to +8% | Moderate |
| 9 | Keyword Stuffing | ~0% | Avoid |

Scoring Algorithm

The geo_audit.py script evaluates websites across five weighted sections totaling 100 points. The algorithm checks infrastructure (can AI bots access your site?) and content quality (will AI systems find your content citation-worthy?).

Point Distribution

| Section | Points | Weight | Purpose |
| --- | --- | --- | --- |
| robots.txt | 20 | 20% | Can AI bots crawl your site? |
| llms.txt | 20 | 20% | Is your site structure machine-readable? |
| JSON-LD Schema | 25 | 25% | Does your site provide structured data? |
| Meta Tags | 20 | 20% | Are standard meta tags properly configured? |
| Content Quality | 15 | 15% | Does your content follow GEO best practices? |
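As a rough sketch of how such a weighted total can be computed (illustrative only; the actual internals of geo_audit.py may differ), each section score is clamped to its maximum and summed:

```python
# Illustrative aggregation of the five audit sections into a 0-100 total.
# The section maxima mirror the point distribution above; function and
# variable names here are hypothetical, not taken from geo_audit.py.

SECTION_MAX = {
    "robots_txt": 20,
    "llms_txt": 20,
    "json_ld": 25,
    "meta_tags": 20,
    "content_quality": 15,
}

def total_score(section_scores: dict) -> int:
    """Clamp each section score to its maximum and sum to a 0-100 total."""
    total = 0
    for section, maximum in SECTION_MAX.items():
        earned = section_scores.get(section, 0)
        total += max(0, min(earned, maximum))
    return total
```

Missing sections contribute zero, so an unreachable /llms.txt simply forfeits its 20 points rather than failing the audit.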

Detailed Breakdown

robots.txt (20 points)

| Check | Points | Criteria |
| --- | --- | --- |
| Citation bots allowed | 15 | OAI-SearchBot, ClaudeBot, PerplexityBot must be accessible |
| General bots config | 5 | Other AI and search bots properly configured |

Citation vs. Training Bots

The scoring prioritizes citation bots (OAI-SearchBot, ClaudeBot, PerplexityBot) over training bots (GPTBot, anthropic-ai, Google-Extended). Blocking training bots is a valid privacy choice; blocking citation bots makes you invisible to AI search.
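This distinction is straightforward to verify programmatically. The sketch below (an illustration, not part of geo_audit.py) uses Python's standard urllib.robotparser to confirm that blocking GPTBot leaves all three citation bots accessible:

```python
from urllib.robotparser import RobotFileParser

CITATION_BOTS = ["OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

def citation_bots_allowed(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {bot_name: allowed} for each citation bot under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in CITATION_BOTS}

rules = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""
# GPTBot is blocked for training, yet every citation bot still matches the
# wildcard group and remains allowed.
```

Because none of the citation bots has its own user-agent group in these rules, they all fall under `*`, and the GPTBot block has no effect on them.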

llms.txt (20 points)

| Check | Points | Criteria |
| --- | --- | --- |
| File presence | 10 | /llms.txt exists and returns 200 |
| H1 heading | 3 | File contains a top-level heading |
| Sections | 4 | Content organized into sections |
| Links | 3 | Contains links to site pages |
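These four checks lend themselves to a compact scorer. The following sketch applies the point values from the table to a fetched file body; it is illustrative and not the actual geo_audit.py code:

```python
import re
from typing import Optional

def score_llms_txt(text: Optional[str]) -> int:
    """Score an llms.txt body out of 20 (None means missing or non-200)."""
    if text is None:
        return 0
    score = 10                                     # file presence
    if re.search(r"^# .+", text, re.MULTILINE):
        score += 3                                 # top-level H1 heading
    if re.search(r"^## .+", text, re.MULTILINE):
        score += 4                                 # section headings
    if re.search(r"\[.+?\]\(.+?\)", text):
        score += 3                                 # markdown links to pages
    return score
```

A minimal but complete llms.txt (one H1, at least one section, at least one link) earns the full 20 points.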

JSON-LD Schema (25 points)

| Check | Points | Criteria |
| --- | --- | --- |
| WebSite schema | 10 | Valid WebSite JSON-LD on homepage |
| FAQPage schema | 10 | FAQPage schema detected (any page) |
| WebApplication schema | 5 | WebApplication schema for tools/utilities |
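Detecting these schema types can be sketched with the standard library alone. Regex extraction of script blocks is a simplification for illustration; a production audit would use a proper HTML parser:

```python
import json
import re

JSONLD_PATTERN = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def jsonld_types(html: str) -> set:
    """Collect @type values from all JSON-LD <script> blocks in a page."""
    types = set()
    for block in JSONLD_PATTERN.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks rather than failing
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and isinstance(item.get("@type"), str):
                types.add(item["@type"])
    return types
```

A page would then earn the WebSite points when "WebSite" appears in the returned set, and likewise for FAQPage and WebApplication.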

Meta Tags (20 points)

| Check | Points | Criteria |
| --- | --- | --- |
| Title tag | 5 | Present and non-empty |
| Meta description | 8 | Present; 50--160 characters recommended |
| Canonical URL | 3 | <link rel="canonical"> set |
| Open Graph tags | 4 | og:title, og:description, og:image, og:url |
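The first two checks can be sketched as follows. This is regex-based and illustrative; attribute order and quoting vary in real HTML, so a production audit would use an HTML parser:

```python
import re

def meta_tag_points(html: str) -> int:
    """Partial meta-tag score: title tag (5 pts) plus meta description (8 pts)."""
    points = 0
    # Title tag: present and non-empty
    if re.search(r"<title>\s*\S[^<]*</title>", html, re.IGNORECASE):
        points += 5
    # Meta description: present (50-160 characters is the recommended range)
    match = re.search(
        r'<meta\s[^>]*name=["\']description["\'][^>]*content=["\']([^"\']+)["\']',
        html,
        re.IGNORECASE,
    )
    if match:
        points += 8
    return points
```

Note that a whitespace-only title earns nothing: the check requires visible text, not just the tag.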

Content Quality (15 points)

| Check | Points | Criteria |
| --- | --- | --- |
| H1 heading | 4 | Single, descriptive H1 on the page |
| Statistics/data | 6 | Numerical data, percentages, metrics present |
| External citation links | 5 | Links to authoritative external sources |

Score Bands

| Score | Band | Interpretation |
| --- | --- | --- |
| 91--100 | Excellent | Fully optimized; actively citation-ready |
| 71--90 | Good | Strong foundation with minor gaps to close |
| 41--70 | Foundation | Key elements present but significant work required |
| 0--40 | Critical | Major infrastructure missing; AI bots likely cannot access or parse your site |
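In code, the band lookup is a simple threshold cascade (illustrative sketch, not the geo_audit.py source):

```python
def score_band(score: int) -> str:
    """Map a 0-100 audit score to its interpretation band."""
    if score >= 91:
        return "Excellent"
    if score >= 71:
        return "Good"
    if score >= 41:
        return "Foundation"
    return "Critical"
```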

AI Bot Ecosystem

AI search engines use dedicated bots to crawl, index, and potentially cite web content. Understanding which bot does what is critical for proper robots.txt configuration.

Citation Bots (Must Allow)

These bots determine whether your content appears in AI-generated search answers. Blocking these makes your site invisible to AI search.

| Bot | Vendor | Purpose |
| --- | --- | --- |
| OAI-SearchBot | OpenAI | ChatGPT Search index; determines citation eligibility |
| ClaudeBot | Anthropic | Claude.ai real-time web citations |
| PerplexityBot | Perplexity | Perplexity AI citation index |

Critical Distinction

OAI-SearchBot is the bot that determines if ChatGPT cites you. GPTBot is the training data crawler. Blocking GPTBot does NOT prevent ChatGPT from citing you -- only blocking OAI-SearchBot does that.

Training Bots (Optional to Block)

These bots collect data for model training. Blocking them is a valid choice that does not affect your citation visibility.

| Bot | Vendor | Purpose |
| --- | --- | --- |
| GPTBot | OpenAI | ChatGPT model training data |
| anthropic-ai | Anthropic | Claude model training |
| Google-Extended | Google | Gemini training and AI Overviews |
| Applebot-Extended | Apple | Apple Intelligence training data |
| CCBot | Common Crawl | Open dataset used by many AI models |

Additional AI and Search Bots

| Bot | Vendor | Type |
| --- | --- | --- |
| ChatGPT-User | OpenAI | On-demand URL fetching by ChatGPT |
| claude-web | Anthropic | General web crawling |
| Googlebot | Google | Traditional Google Search + AI-assisted results |
| Bingbot | Microsoft | Bing Search and Microsoft Copilot index |
| Applebot | Apple | Siri and Spotlight Search |
| Bytespider | ByteDance | TikTok AI and recommendations |
| DuckAssistBot | DuckDuckGo | AI-powered answers |
| cohere-ai | Cohere | Language model training |
| AI2Bot | Allen Institute | Academic research indexing |
| FacebookBot | Meta | Link preview generation |
A reference robots.txt that allows citation bots while blocking training bots:

```text
User-agent: *
Allow: /

# Citation bots - MUST allow for AI search visibility
User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Training bots - block if desired (does not affect citation)
User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Next: AI Context Setup -- detailed platform-by-platform setup guide with usage examples.