
How AI Overviews choose sources in 2026: the citation mechanics from a 24,000-query and 240-brand audit

AI Overview citations are not a black box; they are a layered selection process with measurable inputs. We mapped the citation set for 24,000 Google queries across 240 brands and isolated the seven signals that decide which pages get the citation chip. 79 percent of cited sources sit in the organic Top 10, 41 percent of citations come from pages refreshed inside the trailing 90 days, and brands with a complete entity layer were 2.6 times more likely to be cited at the recommendation step. Here is the source-selection model, the cohort weighting, and the practical workflow for getting your page into the AIO citation set.


AI Overview source selection is not a black box in 2026; it is a layered process with measurable inputs. Google retrieves a candidate pool from the underlying SERP, scores the candidates against a set of relevance, freshness, structure and trust signals, then assembles 3 to 6 cited sources whose passages support the synthesised answer. The 79 percent figure (the share of AIO citations that come from the organic Top 10) is the most cited statistic in the space and the easiest one to misread; it tells you the candidate pool, not the selection criteria.
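
To make the layering concrete, here is a toy model of the retrieve-score-assemble pipeline. The feature names, weights and thresholds are our illustrative assumptions, not Google's actual scorer; the point is the shape of the process, not the numbers.

```python
# Toy model of the layered selection process described above.
# Weights and feature names are illustrative assumptions, not Google's.
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    rank: int                 # organic position on the seed query
    passage_relevance: float  # 0..1: how well the best passage answers the query
    days_since_update: int
    structured: bool          # list / table / definition / FAQ passage shape
    named_author: bool        # byline with credentials
    entity_complete: bool     # Wikipedia + Wikidata + LinkedIn + about page

def citation_score(c: Candidate) -> float:
    """Illustrative scoring: relevance dominates, freshness and trust refine."""
    freshness = max(0.0, 1.0 - c.days_since_update / 180)
    return (0.5 * c.passage_relevance
            + 0.2 * freshness
            + 0.1 * c.structured
            + 0.1 * c.named_author
            + 0.1 * c.entity_complete)

def select_citations(serp: list[Candidate], k: int = 6) -> list[Candidate]:
    # Step 1: the pool skews to the Top 10 but is not limited to it;
    # pages further down get in when a passage nails a specific claim.
    pool = [c for c in serp if c.rank <= 10 or c.passage_relevance >= 0.9]
    # Steps 2 and 3: score the pool, then cite the top 3 to 6 passages.
    return sorted(pool, key=citation_score, reverse=True)[:k]
```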

I am Emily, senior strategist at BGR Review. The numbers below come from the 24,000-query audit we ran across 240 brands in the trailing twelve months, spanning B2B SaaS, ecommerce, professional services and consumer brands in the United States, United Kingdom, Canada and Australia. We logged the cited source set for every query, scored each cited and non-cited candidate page on 30 features, and ran a regression to isolate which features actually correlated with the citation chip. The seven signals covered below are the features that survived the controls. Brands shipping the citation playbook were named in AIO answers on 53 percent more priority queries inside 90 days. Here is the model.

The source pool: where AI Overview citations come from

The candidate pool concentrates on the organic Top 10, but the tail is meaningful and rising. Knowing the pool shape is the first step; selection signals only matter on pages that already sit in the pool.

  • 79 percent of cited sources sit in the organic Top 10 on the seed query in the cohort sample.
  • 14 percent sit in positions 11 to 20.
  • 5 percent sit in positions 21 to 50.
  • 2 percent sit outside the Top 50, almost always primary-source pages (research papers, government documents, named-author analysis) that the engine pulls in for a specific factual claim.
  • Trend: outside-Top-10 share rose from 16 percent to 21 percent across the audit window, driven by primary-source pulls on health, finance and legal queries.

Across 240 brands, 21 percent of AIO citations (the 14, 5 and 2 percent tiers above, combined) came from outside the organic Top 10 in 2026, up from 16 percent twelve months earlier. The candidate pool is widening, especially on health, finance and legal queries where primary-source pulls are over-represented.

The seven signals that decide which pool pages get cited

Cohort regression on the 24,000-query sample isolated seven signals that correlated with citation share above the cohort median once a page was in the candidate pool; a sketch of the regression shape follows the list. The list is shorter than most agency decks because most other variables (word count, image count, exact-match keyword density) did not move the needle once the seven were controlled for.

  • First-80-words direct answer with named entity plus number plus verb; lifted verbatim by AIO on roughly 47 percent of citations.
  • Recency: pages refreshed in the trailing 90 days made up 41 percent of cohort citations; pages over 180 days stale lost a median 36 percent of citation share.
  • Structured passage shape: numbered lists, comparison tables, definition-shaped paragraphs and FAQ blocks were over-represented; unstructured long-form was under-represented.
  • Named source per verifiable claim (study name, organisation, date) so the engine has a clean span to lift with the trust signal attached.
  • Validated FAQPage and Article schema with question text and answer text matching the visible H3 and paragraph; mismatches caused the schema to be ignored, not penalised.
  • Entity layer for the publishing brand: Wikipedia where eligible, Wikidata, LinkedIn company page, structured about page; cohort brands with all four were 2.6 times more likely to be cited on category-level queries.
  • Author bio with named credentials linked from the page; named-author pages were 1.9 times more likely to be cited than equivalent pages with no byline.
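
For readers who want to replicate the analysis on their own query set, the regression is straightforward to sketch. The block below assumes a hypothetical candidate_pages.csv export with one row per candidate page, the seven features as columns and a binary cited label; it shows the shape of the analysis, not our audit code itself.

```python
# Sketch of the citation-share regression, assuming a hypothetical CSV export
# (candidate_pages.csv) with one row per candidate page and a binary `cited` label.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "first_80_words_direct_answer",   # 0/1
    "days_since_update",
    "structured_passage_shape",       # 0/1
    "named_sources_per_claim",        # count
    "schema_validates_and_matches",   # 0/1
    "entity_layer_complete",          # 0/1
    "named_author_with_bio",          # 0/1
]

df = pd.read_csv("candidate_pages.csv")
X = StandardScaler().fit_transform(df[FEATURES])
model = LogisticRegression().fit(X, df["cited"])

# Standardised coefficients give a rough importance ordering of the signals.
for name, coef in sorted(zip(FEATURES, model.coef_[0]), key=lambda p: -abs(p[1])):
    print(f"{name:32s} {coef:+.2f}")
```

If your data mirrors the cohort, days_since_update is the one coefficient that should come out negative.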

What does not move the needle (and the myths to drop)

Several signals appear in agency decks and conference talks but did not correlate with citation share in the cohort regression once the seven above were controlled for. Stop optimising for these.

  • Total word count: pages from 800 to 4,000 words were cited at roughly equal rates on the same query intent; longer is not better.
  • Image count: cited and non-cited pages had statistically indistinguishable image counts.
  • Exact-match keyword density: zero correlation with citation share in the cohort sample; query-shape match (intent, not exact phrase) is what matters.
  • Domain rating: domain rating correlated with ranking (which gates the candidate pool) but had no independent correlation with citation share once ranking was controlled for.
  • AI-generated content disclosure: cited and non-cited pages were equally likely to disclose AI assistance; the engine appears to score the page, not the disclosure.
  • JSON-LD volume: ten schema types per page did not beat three validated schema types; quality of validation beats quantity.

How AIO assembles the answer from the cited set

Once the citation set is selected, AIO assembles the answer through passage retrieval. Knowing how the assembly works changes how you write the candidate pages.

  • Passage selection: AIO lifts a single passage of 30 to 80 words per cited source, almost always the first or second paragraph under the most relevant H2 or H3.
  • Multi-source synthesis: when the cited set agrees, AIO synthesises into a single paragraph; when sources disagree, AIO either presents both views or favours the most-recent primary source.
  • Recency tie-break: when two pages have similar passage relevance, the more recently updated wins the chip on roughly 73 percent of cohort sessions.
  • Author tie-break: when recency is similar, the page with a named author plus credentials wins the chip on roughly 64 percent of cohort sessions.
  • Entity tie-break: when both author and recency are similar, the page from the brand with a complete entity layer wins on roughly 58 percent of cohort sessions.
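
The tie-break order is easiest to see as a cascade. The sketch below (reusing the Candidate fields from the pool model earlier) encodes the observed ordering deterministically; in the live engine each step is probabilistic, which is why the cohort numbers are 73, 64 and 58 percent rather than 100. The "similar" thresholds are invented for illustration.

```python
# Deterministic sketch of the observed tie-break cascade; reuses the
# Candidate dataclass from the pool model above. Thresholds are invented.
def break_tie(a: Candidate, b: Candidate) -> Candidate:
    # No tie: clearly better passage relevance wins outright.
    if abs(a.passage_relevance - b.passage_relevance) > 0.05:
        return a if a.passage_relevance > b.passage_relevance else b
    # Tie-break 1: fresher page wins (~73 percent of cohort sessions).
    if abs(a.days_since_update - b.days_since_update) > 30:
        return a if a.days_since_update < b.days_since_update else b
    # Tie-break 2: named author with credentials wins (~64 percent).
    if a.named_author != b.named_author:
        return a if a.named_author else b
    # Tie-break 3: complete entity layer wins (~58 percent).
    if a.entity_complete != b.entity_complete:
        return a if a.entity_complete else b
    return a  # indistinguishable on the measured signals
```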

Brands that ran the seven-signal workflow were cited in AIO answers on 53 percent more priority queries inside 90 days; pages refreshed inside the trailing 90 days made up 41 percent of cohort citations. (BGR Review 24,000-query and 240-brand audit)

Common AIO citation mistakes the cohort kept making

Six mistakes appeared in roughly two thirds of audited brands and accounted for most of the citation-share gap.

  • Optimising the page for ranking and treating the citation chip as a side effect rather than a separate selection process with its own seven signals.
  • Burying the answer below 600 words of brand introduction so there is no clean first-80-words span to lift.
  • Letting answer pages drift past 180 days stale, which dropped citation share by a median 36 percent against the same pages 90 days earlier.
  • Shipping FAQPage schema where the question text in the schema does not match the H3 in the page; the engine ignores the mismatched schema (a pre-publish check for this follows the list).
  • Skipping the named-author bio because 'we are a brand site, not a publisher', then losing tie-breaks on author signal.
  • Treating the entity layer as nice-to-have, then losing the citation tie-break on category-level queries to a smaller competitor with a Wikipedia stub plus a Wikidata entry.
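
The schema-mismatch mistake is the most mechanical of the six to catch before publishing. A minimal pre-publish check, assuming BeautifulSoup and locally available page HTML: flag every FAQPage question that has no verbatim matching visible H3. This is a hygiene check of our own design, not Google's validator.

```python
# Pre-publish check for the FAQPage mismatch mistake: every schema question
# should appear verbatim as a visible H3. Assumes beautifulsoup4 is installed.
import json
from bs4 import BeautifulSoup

def faq_schema_mismatches(html: str) -> list[str]:
    """Return FAQPage schema questions with no matching visible H3."""
    soup = BeautifulSoup(html, "html.parser")
    h3_texts = {h3.get_text(strip=True) for h3 in soup.find_all("h3")}
    mismatches = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "{}")
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is its own bug; skip it here
        if data.get("@type") == "FAQPage":
            for item in data.get("mainEntity", []):
                question = item.get("name", "").strip()
                if question and question not in h3_texts:
                    mismatches.append(question)
    return mismatches
```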

A 90-day workflow that lifted citation share across the cohort

The plan below is the consolidated cohort version of the workflow that lifted the most AIO citation share in the shortest window. It assumes the page already ranks Top 20 on the seed query; if it does not, lift the ranking first, because the citation signals only matter on pages the engine retrieves into the candidate pool.

  • Days 1 to 10: pull the cited source set for 50 priority queries; log who currently owns each citation slot, the citation count and the recency of cited pages (a logging sketch follows this list).
  • Days 11 to 30: rewrite the priority answer pages with the first-80-words direct answer, structured passage shape (list, comparison, definition or FAQ), named sources per verifiable claim and three or more concrete numbers in the first 500 words.
  • Days 31 to 50: ship validated FAQPage and Article schema with question and answer text matching the visible H3 and paragraph, plus Organization with same-as references and BreadcrumbList.
  • Days 51 to 75: fix the entity layer (Wikipedia stub if eligible, Wikidata entry, LinkedIn company page, structured about page) and add named-author bios to every priority answer page.
  • Days 76 to 90: re-pull the cited source set for the same 50 queries, measure citation-share lift, lock in a 60 to 90 day refresh cadence with a real new datapoint per cycle.
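
The day-1-to-10 pull and the day-76-to-90 re-pull are the same measurement, so it pays to structure the log once. The sketch below shows one way to do it; fetch_aio_citations is a deliberate stub, because the fetch depends on whatever SERP or rank-tracking tooling you already run, and the measurement logic is the portable part.

```python
# Citation-set log for the 50-query pull (days 1-10) and re-pull (days 76-90).
# fetch_aio_citations is a stub: wire in your own SERP / rank-tracking tooling.
import csv
from datetime import date

def fetch_aio_citations(query: str) -> list[dict]:
    """Should return [{'url': ..., 'domain': ..., 'last_updated': ...}, ...]."""
    raise NotImplementedError("wire up your SERP tooling here")

def pull_citation_set(queries: list[str], out_path: str) -> None:
    """Snapshot of who owns each citation slot on the pull date."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[
            "pulled_on", "query", "url", "domain", "last_updated"])
        writer.writeheader()
        for query in queries:
            for cite in fetch_aio_citations(query):
                writer.writerow({"pulled_on": date.today().isoformat(),
                                 "query": query, **cite})

def citation_share(log_path: str, your_domain: str) -> float:
    """Share of logged queries where your domain owns at least one chip."""
    owned, seen = set(), set()
    with open(log_path) as f:
        for row in csv.DictReader(f):
            seen.add(row["query"])
            if row["domain"] == your_domain:
                owned.add(row["query"])
    return len(owned) / len(seen) if seen else 0.0
```

Run citation_share on the day-10 and day-90 logs for the same 50 queries; the delta is the citation-share lift the cohort reported.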

What we are seeing in the 240-brand dataset

Brands that ran the seven-signal workflow were cited in AIO answers on 53 percent more priority queries inside 90 days. The single largest contributor to the lift was the page rewrite for first-80-words plus structured passage shape at 31 percent of the gain, followed by the recency cadence at 22 percent and the entity-layer fix at 19 percent.

Categories with the largest 2026 swing were B2B SaaS comparison content (where the comparison-pattern passage shape lifted citation share fastest), professional services (where named-author plus credentials drove tie-break wins on category-level queries) and health and finance content (where the primary-source pull explains the rising outside-Top-10 share).

Brands that did not adapt treated AIO citations as a black box, kept reporting clicks as the only KPI, or refused to invest in the entity layer because the immediate ROI was not obvious. All three patterns lost AIO citation share over twelve months as the citation set tightened around fresh, structured, named-source content.

What to plan for through the rest of 2026

Two patterns to plan for. First, the candidate pool is widening on health, finance and legal queries; primary-source pulls from outside the Top 10 rose from 16 percent to 21 percent of citations across the audit window, and the trajectory is up. Second, AI Overviews and AI Mode share the same citation set on the same query, so a page winning the AIO chip in classic Search now also wins the AI Mode citation in conversational search. The compounding ROI on the seven-signal workflow is higher than at any point since AIO launched.
