Architecture & Methods

This page covers the research foundation behind GEO Optimizer: the nine optimization methods from the Princeton KDD 2024 study, the scoring algorithm that powers geo_audit.py, and the AI bot ecosystem your website needs to support.

The 9 Princeton GEO Methods

The research paper "GEO: Generative Engine Optimization" (Princeton, KDD 2024) tested nine content optimization strategies across 10,000 real queries on Perplexity.ai. The methods are grouped below into priority tiers according to measured visibility improvement; within a tier, the order is not strictly by impact.

Priority 1: High-Impact Methods

1. Cite Sources (+30 to +115% visibility)

Citing sources is the single most effective technique. Linking inline to authoritative external sources substantially increases the probability that AI systems will cite your content.

AI search engines treat source citations as credibility signals. When your content references verifiable external sources, the AI can cross-validate claims and is more likely to surface your page as a trusted source.

Implementation:

Add inline links to authoritative sources (research papers, government data, industry reports)
Cite primary sources rather than secondary aggregators
Use descriptive anchor text that indicates what the source contains

<!-- Before -->
<p>Remote work increases productivity.</p>

<!-- After (GEO-optimized) -->
<p>Remote work increases productivity by 13%, according to a
<a href="https://stanford.edu/...">Stanford study published in the
Quarterly Journal of Economics</a>.</p>

2. Statistics (+40% average visibility)

Replace vague claims with specific numerical data. AI systems prefer content that includes verifiable metrics because they can extract and present precise answers.

Implementation:

Replace qualitative statements with quantitative data
Always include the source and recency of statistics (within 3 years preferred)
Use specific numbers, not rounded approximations

Before: "Most companies use cloud computing."
After:  "94% of enterprises use cloud services as of 2024 (Flexera State of the Cloud Report)."

3. Quotation Addition (+30 to +40% visibility)

Direct quotes from recognized experts, researchers, or official bodies signal verifiability and authority. This is especially effective for YMYL (Your Money, Your Life) topics such as health, finance, and legal content.

Implementation:

Quote named experts with their credentials
Use blockquote formatting for visual distinction
Attribute the quote with name, title, and organization

Priority 2: Moderate-Impact Methods

4. Authoritative Tone (+6 to +12% visibility)

Write with expert-level confidence using structured exposition: definition, mechanism, then practical application. Remove hedging language ("might," "possibly," "often") and replace it with precise scope statements.

Before: "This might help improve your rankings."
After:  "This technique improves AI citation rates by 30-40% for content that includes verifiable sources."

5. Fluency Optimization (+15 to +30% visibility)

Grammatically correct, well-structured prose improves extraction reliability. AI systems parse and extract content more accurately from clean, logical text.

Guidelines:

Target 15--25 words per sentence
Use logical connectives ("therefore," "as a result," "specifically")
One idea per paragraph
Lead with the conclusion, then support it
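
The sentence-length guideline above can be checked mechanically. The following is a minimal sketch and not part of geo_audit.py; the function names and the naive punctuation-based sentence splitter are assumptions made for illustration.

```python
import re


def sentence_lengths(text: str) -> list[int]:
    """Return the word count of each sentence in text."""
    # Naive split (assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace. Abbreviations will be split incorrectly.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def outside_target(text: str, low: int = 15, high: int = 25) -> list[int]:
    """Return the word counts that fall outside the low..high target range."""
    return [n for n in sentence_lengths(text) if not low <= n <= high]
```

Calling outside_target on a draft paragraph returns the lengths of the sentences worth revisiting; an empty list means every sentence is within the 15 to 25 word target.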

6. Easy-to-Understand (+8 to +15% visibility)

Simplify technical concepts through in-context definitions without sacrificing precision. Use a two-level approach: plain language explanation first, technical details second.

Before: "Implement HSTS with includeSubDomains and a max-age of 31536000."
After:  "Enable HSTS to force encrypted connections across your entire site.
         Technically, set the Strict-Transport-Security header with
         includeSubDomains and max-age=31536000 (one year)."

7. Technical Terms (+5 to +10% for specialized queries)

Use industry-standard terminology with proper definitions. Spell out acronyms on first use. This helps AI systems match your content to specialized queries.

8. Unique Words (+5 to +8% visibility)

Use contextually appropriate synonyms instead of repeating identical terms. This increases semantic coverage and helps your content match a wider range of query phrasings.

Methods to Avoid

9. Keyword Stuffing (~0% -- Neutral or Negative)

The Princeton study explicitly tested keyword density manipulation. The result: no significant improvement, and in some cases a net negative effect on AI visibility. Keyword stuffing degrades readability, which harms both fluency and extraction quality.

Warning

Do not use keyword stuffing. It was a marginal SEO tactic for traditional search and it is counterproductive for AI search engines. Focus your effort on the high-impact methods above.

Methods Summary Table

#	Method	Impact	Priority
1	Cite Sources	+30 to +115%	High
2	Statistics	+40% avg	High
3	Quotation Addition	+30 to +40%	High
4	Authoritative Tone	+6 to +12%	Moderate
5	Fluency Optimization	+15 to +30%	Moderate
6	Easy-to-Understand	+8 to +15%	Moderate
7	Technical Terms	+5 to +10%	Moderate
8	Unique Words	+5 to +8%	Moderate
9	Keyword Stuffing	~0%	Avoid

Scoring Algorithm

The geo_audit.py script evaluates websites across five weighted sections totaling 100 points. The algorithm checks infrastructure (can AI bots access your site?) and content quality (will AI systems find your content citation-worthy?).

Point Distribution

Section	Points	Weight	Purpose
robots.txt	20	20%	Can AI bots crawl your site?
llms.txt	20	20%	Is your site structure machine-readable?
JSON-LD Schema	25	25%	Does your site provide structured data?
Meta Tags	20	20%	Are standard meta tags properly configured?
Content Quality	15	15%	Does your content follow GEO best practices?
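
As a rough illustration of how the five sections combine, the sketch below sums per-section scores and caps each at its maximum from the table above. The dictionary keys and the function name are hypothetical; the actual internals of geo_audit.py may differ.

```python
# Section maximums taken from the point distribution table.
# The key names are illustrative, not geo_audit.py identifiers.
SECTION_MAX = {
    "robots_txt": 20,
    "llms_txt": 20,
    "json_ld": 25,
    "meta_tags": 20,
    "content_quality": 15,
}


def total_score(section_scores: dict[str, int]) -> int:
    """Sum section scores, clamping each to the range 0..maximum."""
    total = 0
    for name, maximum in SECTION_MAX.items():
        earned = section_scores.get(name, 0)  # a missing section scores 0
        total += max(0, min(earned, maximum))
    return total
```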

Detailed Breakdown

robots.txt (20 points)

Check	Points	Criteria
Citation bots allowed	15	OAI-SearchBot, ClaudeBot, PerplexityBot must be accessible
General bots config	5	Other AI and search bots properly configured

Citation vs. Training Bots

The scoring prioritizes citation bots (OAI-SearchBot, ClaudeBot, PerplexityBot) over training bots (GPTBot, anthropic-ai, Google-Extended). Blocking training bots is a valid privacy choice; blocking citation bots makes you invisible to AI search.
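
The citation-bot check can be approximated with Python's standard urllib.robotparser module. This is a sketch under stated assumptions: the function names are hypothetical, only the site root is tested, and the 15 points are awarded all-or-nothing, which the table above does not specify.

```python
from urllib.robotparser import RobotFileParser

CITATION_BOTS = ("OAI-SearchBot", "ClaudeBot", "PerplexityBot")


def citation_bot_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each citation bot to whether robots.txt lets it fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in CITATION_BOTS}


def citation_points(robots_txt: str) -> int:
    """Award 15 points only if every citation bot is allowed.

    Assumption: all-or-nothing scoring; the real audit may award
    partial credit per bot.
    """
    return 15 if all(citation_bot_access(robots_txt).values()) else 0
```

Passing the text of a robots.txt file to citation_bot_access shows at a glance which of the three citation bots is blocked, which is usually the first thing to fix.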

llms.txt (20 points)

Check	Points	Criteria
File presence	10	`/llms.txt` exists and returns 200
H1 heading	3	File contains a top-level heading
Sections	4	Content organized into sections
Links	3	Contains links to site pages
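
The four llms.txt checks map naturally onto a small scoring function. The sketch below assumes the file content has already been fetched (None meaning the request did not return 200), that sections are Markdown "## " headings, and that links use Markdown link syntax; these detection rules and the function name are assumptions, not confirmed geo_audit.py behavior.

```python
import re
from typing import Optional


def score_llms_txt(content: Optional[str]) -> int:
    """Score llms.txt content out of 20; None means the file is missing."""
    if content is None:
        return 0
    points = 10  # file presence
    lines = content.splitlines()
    if any(line.startswith("# ") for line in lines):
        points += 3  # top-level heading
    if any(line.startswith("## ") for line in lines):
        points += 4  # at least one section heading (assumed rule)
    if re.search(r"\[[^\]]+\]\([^)\s]+\)", content):
        points += 3  # at least one Markdown link (assumed rule)
    return points
```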

JSON-LD Schema (25 points)

Check	Points	Criteria
WebSite schema	10	Valid WebSite JSON-LD on homepage
FAQPage schema	10	FAQPage schema detected (any page)
WebApplication schema	5	WebApplication schema for tools/utilities
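
One way to detect these schema types is to collect every @type value from the page's JSON-LD script blocks. The sketch below uses only the standard library and scores a single HTML document; the class and function names are hypothetical, and it does not distinguish the homepage from other pages as the criteria above do.

```python
import json
from html.parser import HTMLParser

SCHEMA_POINTS = {"WebSite": 10, "FAQPage": 10, "WebApplication": 5}


class JsonLdCollector(HTMLParser):
    """Collect @type values from application/ld+json script blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.types = set()

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self._collect(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # invalid JSON-LD earns no points

    def _collect(self, node):
        if isinstance(node, list):
            for item in node:
                self._collect(item)
        elif isinstance(node, dict):
            node_type = node.get("@type")
            if isinstance(node_type, str):
                self.types.add(node_type)
            elif isinstance(node_type, list):
                self.types.update(t for t in node_type if isinstance(t, str))
            self._collect(node.get("@graph"))  # descend into @graph if present


def score_json_ld(html: str) -> int:
    """Score one HTML document out of 25 based on detected schema types."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return sum(p for t, p in SCHEMA_POINTS.items() if t in collector.types)
```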

Meta Tags (20 points)

Check	Points	Criteria
Title tag	5	Present and non-empty
Meta description	8	Present, 50--160 characters recommended
Canonical URL	3	`<link rel="canonical">` set
Open Graph tags	4	og:title, og:description, og:image, og:url
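
The meta tag checks can be sketched with the standard html.parser module. The names below are hypothetical, and two scoring details are assumptions: each of the four Open Graph tags is worth one point, and the meta description earns its points for presence alone, with the 50 to 160 character range treated as a recommendation.

```python
from html.parser import HTMLParser

OG_TAGS = ("og:title", "og:description", "og:image", "og:url")


class MetaCollector(HTMLParser):
    """Collect the title, meta description, canonical link, and OG tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.description = ""
        self.canonical = ""
        self.og_found = set()

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}  # valueless attributes become ""
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attr.get("content", "").strip()
            if attr.get("name", "").lower() == "description":
                self.description = content
            elif attr.get("property", "") in OG_TAGS and content:
                self.og_found.add(attr["property"])
        elif tag == "link" and attr.get("rel", "").lower() == "canonical":
            self.canonical = attr.get("href", "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def score_meta_tags(html: str) -> int:
    """Score one HTML document out of 20 for its meta tags."""
    c = MetaCollector()
    c.feed(html)
    c.close()
    points = 0
    if c.title.strip():
        points += 5
    if c.description:
        points += 8
    if c.canonical:
        points += 3
    points += len(c.og_found)  # assumption: 1 point per Open Graph tag
    return points
```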

Content Quality (15 points)

Check	Points	Criteria
H1 heading	4	Single, descriptive H1 on the page
Statistics/data	6	Numerical data, percentages, metrics present
External citation links	5	Links to authoritative external sources
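
The content checks could be approximated as follows. This is a sketch only: the names are hypothetical, a percentage is used as the proxy for "numerical data", and any absolute http(s) link to a different host counts as an external citation, without judging how authoritative the target is.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class ContentCollector(HTMLParser):
    """Count H1 tags and collect link targets and visible text."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        # Note: this also captures script and style text.
        self.text_parts.append(data)


def score_content(html: str, site_host: str) -> int:
    """Score one HTML document out of 15 for content quality."""
    c = ContentCollector()
    c.feed(html)
    c.close()
    points = 0
    if c.h1_count == 1:
        points += 4  # exactly one H1
    # Assumption: a percentage is treated as evidence of numerical data.
    if re.search(r"\d+(?:\.\d+)?\s?%", " ".join(c.text_parts)):
        points += 6
    external = [
        h for h in c.hrefs
        if urlparse(h).scheme in ("http", "https")
        and urlparse(h).hostname != site_host
    ]
    if external:
        points += 5
    return points
```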

Score Bands

Score	Band	Interpretation
91--100	Excellent	Fully optimized; actively citation-ready
71--90	Good	Strong foundation with minor gaps to close
41--70	Foundation	Key elements present but significant work required
0--40	Critical	Major infrastructure missing; AI bots likely cannot access or parse your site
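
The band boundaries in the table translate directly into a lookup. The function name below is hypothetical; the thresholds and labels are taken from the table above.

```python
def score_band(score: int) -> str:
    """Return the band label for a total score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 91:
        return "Excellent"
    if score >= 71:
        return "Good"
    if score >= 41:
        return "Foundation"
    return "Critical"
```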

AI Bot Ecosystem

AI search engines use dedicated bots to crawl, index, and potentially cite web content. Understanding which bot does what is critical for proper robots.txt configuration.

Citation Bots (Must Allow)

These bots determine whether your content appears in AI-generated search answers. Blocking these makes your site invisible to AI search.

Bot	Vendor	Purpose
OAI-SearchBot	OpenAI	ChatGPT Search index -- determines citation eligibility
ClaudeBot	Anthropic	Claude.ai real-time web citations
PerplexityBot	Perplexity	Perplexity AI citation index

Critical Distinction

OAI-SearchBot is the bot that determines if ChatGPT cites you. GPTBot is the training data crawler. Blocking GPTBot does NOT prevent ChatGPT from citing you -- only blocking OAI-SearchBot does that.

Training Bots (Optional to Block)

These bots collect data for model training. Blocking them is a valid choice that does not affect your citation visibility.

Bot	Vendor	Purpose
GPTBot	OpenAI	ChatGPT model training data
anthropic-ai	Anthropic	Claude model training
Google-Extended	Google	Gemini and Vertex AI model training (does not affect Google Search inclusion)
Applebot-Extended	Apple	Apple Intelligence training data
CCBot	Common Crawl	Open dataset used by many AI models

Additional AI and Search Bots

Bot	Vendor	Type
ChatGPT-User	OpenAI	On-demand URL fetching by ChatGPT
claude-web	Anthropic	General web crawling
Googlebot	Google	Traditional Google Search + AI-assisted results
Bingbot	Microsoft	Bing Search and Microsoft Copilot index
Applebot	Apple	Siri and Spotlight Search
Bytespider	ByteDance	TikTok AI and recommendations
DuckAssistBot	DuckDuckGo	AI-powered answers
cohere-ai	Cohere	Language model training
AI2Bot	Allen Institute	Academic research indexing
FacebookBot	Meta	Link preview generation

Recommended robots.txt Configuration

User-agent: *
Allow: /

# Citation bots - MUST allow for AI search visibility
User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Training bots - block if desired (does not affect citation)
User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml

Next: AI Context Setup -- detailed platform-by-platform setup guide with usage examples.
