Analysis — How to Turn Test Data Into Actionable Findings for an AI Visibility Audit

Key Takeaways

  • A single AI visibility score can hide the real problem; separate buckets reveal where visibility is actually breaking down.
  • Brand-led visibility is not the same as discovery; the harder question is whether AI mentions you when buyers do not name you.
  • Showing up is only half the story; the way AI frames your brand can still limit how buyers perceive you.
  • Brand absence matters most when you know who appears instead, which personas they win, and in which question contexts.
  • The end goal of analysis is not a prettier report — it is a prioritized build list your team can act on.

Introduction

By the time the test step finishes, you're holding something that looks like an answer: a stack of structured conversation records, a brand that appeared in a clear majority of them, and a handful of competitor names that kept showing up when it didn't. The instinct is to call that the finding. Pull the overall appearance rate, round it to a clean number, put it in a slide, move on.

That instinct produces a statistic, not an analysis. A single blended appearance rate collapses dozens of genuinely different situations into one number. A brand that's invisible to buyers who haven't reached for a category name yet, but recognized instantly the moment a competitor enters the frame, can land on exactly the same percentage as a brand with the opposite problem. The number is real. It just doesn't tell you what to do next, which is the only reason this step exists.

This post covers how to take the structured data from the test step and turn it into something a team can actually act on: a small set of analysis buckets, each answering a different question about your visibility, and a build list that comes out the other side. It continues the previous posts on brand discovery, persona design, intent mapping, question development, and testing.


One Number Hides More Than It Reveals

The raw output of the test step is a pile of conversations, each tagged with a persona, an intent, a question context, and a provider. Read top to bottom, that pile tells you almost nothing — there's too much of it, and no two conversations sit at quite the same point in a buyer's journey. Averaged into a single appearance rate, it tells you something, but the wrong thing: it answers "how often did the brand show up," when the questions that actually drive a decision are narrower and more specific than that.

The fix isn't a bigger number. It's a small number of separate buckets, each built to answer one specific question, computed the same way every time so they're comparable across personas, providers, and future runs. A bucket isn't a metric — it's a lens. Visibility asks whether you showed up. Framing asks what got said about you once you did. Displacement asks who showed up instead, when you didn't. Citation asks whose content is actually backing the answer. None of these collapse into each other, and none of them is optional if the goal is a finding you can act on rather than a number you can report.

Why it matters: A team that skips straight to "we appeared in 60% of conversations" has a statistic. A team that can say "we appear consistently once a competitor is already in the conversation, but rarely before that, and when we do appear we're framed as the cheaper alternative rather than the stronger choice" has a diagnosis. Only one of those produces a next action.
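
To make the camouflage concrete, here is a minimal Python sketch of two brands with opposite problems landing on the same blended number. Everything in it is assumed for illustration: the counts are invented, the context labels are shorthand for the four question contexts covered in the next section, and `make_records`, `blended_rate`, and `rate_by_context` are hypothetical helper names, not part of any audit tooling.

```python
from collections import defaultdict

def make_records(appearances_by_context, per_context=10):
    """Build (context, brand_appeared) records from invented hit counts."""
    records = []
    for context, hits in appearances_by_context.items():
        records += [(context, True)] * hits
        records += [(context, False)] * (per_context - hits)
    return records

def blended_rate(records):
    """Single appearance rate across every record, ignoring context."""
    return sum(appeared for _, appeared in records) / len(records)

def rate_by_context(records):
    """Appearance rate computed separately for each question context."""
    totals, hits = defaultdict(int), defaultdict(int)
    for context, appeared in records:
        totals[context] += 1
        hits[context] += appeared
    return {context: hits[context] / totals[context] for context in totals}

# Brand A is weak before a vendor is named; Brand B is weak against competitors.
brand_a = make_records(
    {"unbranded": 2, "category_led": 5, "competitor_led": 9, "brand_led": 10}
)
brand_b = make_records(
    {"unbranded": 9, "category_led": 5, "competitor_led": 2, "brand_led": 10}
)

print(blended_rate(brand_a), blended_rate(brand_b))  # identical blended rates
print(rate_by_context(brand_a))
print(rate_by_context(brand_b))
```

Both brands appear in 26 of 40 conversations, so the headline number matches, while the per-context breakdown points to entirely different next actions.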


The Visibility Bucket: Did You Show Up, and Where

This is the foundational bucket, but it only earns that role if it's broken apart correctly. Don't compute one appearance rate — compute it separately for each of the question contexts your test set covered: the unbranded conversations where no category or vendor language was present, the category-led conversations where the buyer named the space but not a vendor, the competitor-led conversations where a rival was already in frame, and the brand-led conversations where your own name was already part of the question.

These aren't variations on the same number. They're answering different questions. Brand-led appearance tells you almost nothing about discoverability — of course the brand shows up when it's named directly; that bucket is closer to a validation check than a visibility finding. Unbranded appearance is the hardest and most important number in the set, because it's the only one measuring whether content authority alone, with no vendor frame to lean on, is enough to get the brand into the conversation.

Within each context, slice again by persona and by provider before you trust the number. A persona-blended or provider-blended rate inside any one context bucket can still hide a real gap — one persona carrying the average for the rest, or one provider doing all the discovering while the others stay quiet.

Why it matters: A brand that appears in 60% of all tested conversations but in only a third of unbranded ones doesn't have a 40% visibility gap. It has a gap of roughly two-thirds in unbranded conversations, wearing a healthier-looking blended number as camouflage. Separating discovery from validation is the difference between knowing you earned a seat in a conversation and knowing you only show up once someone else has already named you.
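
The slicing described above can be sketched in a few lines of Python. This is one possible shape, not the audit's actual computation: the `Conversation` fields, the context labels, and the `appearance_rates` helper are all assumed names, and the six sample records are invented.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Conversation:
    persona: str
    provider: str
    context: str          # "unbranded", "category_led", "competitor_led", "brand_led"
    brand_appeared: bool

def appearance_rates(conversations, *keys):
    """Appearance rate for each combination of the named attributes."""
    totals, hits = defaultdict(int), defaultdict(int)
    for conv in conversations:
        group = tuple(getattr(conv, key) for key in keys)
        totals[group] += 1
        hits[group] += conv.brand_appeared
    return {group: hits[group] / totals[group] for group in totals}

# Invented sample: one persona carries the whole unbranded average.
sample = [
    Conversation("agent", "provider_a", "unbranded", True),
    Conversation("agent", "provider_b", "unbranded", True),
    Conversation("leader", "provider_a", "unbranded", False),
    Conversation("leader", "provider_b", "unbranded", False),
    Conversation("agent", "provider_a", "brand_led", True),
    Conversation("leader", "provider_a", "brand_led", True),
]

by_context = appearance_rates(sample, "context")
by_context_persona = appearance_rates(sample, "context", "persona")
by_context_provider = appearance_rates(sample, "context", "provider")

print(by_context)
print(by_context_persona)
```

In this sample the unbranded rate looks like a middling 50%, but the persona slice shows it is really 100% for one persona and 0% for the other, which is the kind of gap a blended context rate can still hide.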


The Framing Bucket: What Gets Said When You Do Show Up

Appearing isn't the same as appearing well. For every conversation where the brand showed up, this bucket asks what the AI actually said about it. Capture the dominant tone across those appearances — favorable, mixed, or unfavorable — and don't let a high appearance rate stand in for that judgment; it's a separate question with a separate answer.

Capture the positioning language that recurs across multiple conversations and multiple providers, since language that shows up once is a phrasing quirk and language that shows up repeatedly is a frame the AI has actually settled on. Capture the caveats and limitations that get attached to the brand on repeat — the soft qualifiers that follow an otherwise positive mention. And capture whether the brand is genuinely the subject of the answer, or just one name in a list with nothing distinguishing it from the others next to it.

Why it matters: A brand can appear in most of its tested conversations and still be quietly capped by a frame that more appearances alone won't fix. "Often appears, consistently framed as the cheaper or simpler alternative to the real choice" is a finding the appearance rate will never surface on its own — and it's the kind of finding that tells you the next move is a positioning fix, not a content-volume fix.
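
One way to separate a phrasing quirk from a settled frame is to require that a phrase recur across several conversations and more than one provider. The sketch below assumes the positioning phrases have already been tagged per conversation (by a reviewer or an upstream step that is out of scope here); `recurring_frames`, its thresholds, and the sample data are hypothetical.

```python
from collections import defaultdict

def recurring_frames(mentions, min_conversations=3, min_providers=2):
    """Return phrases seen in enough distinct conversations and providers.

    mentions: iterable of (conversation_id, provider, phrase) tuples.
    The thresholds are assumptions to tune, not fixed rules.
    """
    conversations, providers = defaultdict(set), defaultdict(set)
    for conversation_id, provider, phrase in mentions:
        key = phrase.strip().lower()   # normalise case and stray whitespace
        conversations[key].add(conversation_id)
        providers[key].add(provider)
    return sorted(
        phrase
        for phrase in conversations
        if len(conversations[phrase]) >= min_conversations
        and len(providers[phrase]) >= min_providers
    )

# Invented sample: one phrase recurs across providers, the other does not.
mentions = [
    ("c1", "provider_a", "Simpler alternative"),
    ("c2", "provider_b", "simpler alternative"),
    ("c3", "provider_a", "simpler alternative "),
    ("c4", "provider_a", "budget pick"),
    ("c5", "provider_a", "budget pick"),
]

print(recurring_frames(mentions))
```

Here "budget pick" shows up twice but only on one provider, so it stays below the bar, while "simpler alternative" clears both thresholds and counts as a frame.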


The Displacement Bucket: Who Shows Up When You Don't

Every conversation where the brand was absent is also a conversation where the AI answered with something. This bucket asks what that something was. For each brand-absent conversation, capture what appeared in the brand's place: a competitor or, just as often, a non-competitor option such as a generic suggestion or a different category of tool entirely. Then capture how frequently each one appears, and which personas and question types that displacement concentrates in.

Look for whether one entity dominates the displacement pattern everywhere, or whether different competitors are winning different rooms — a leadership-facing persona losing to one name and an operations-facing persona losing to a different one is a meaningfully different problem than a single competitor sweeping every persona.

Why it matters: This is the bucket that tells you exactly who's winning the conversations you're losing, not just that you're losing them. A displacement pattern concentrated in one competitor across every persona points toward a single comparison fix; a pattern that splits by persona points toward several narrower ones. Without this bucket, "we didn't appear" is the end of the finding instead of the beginning of one.
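
The "one competitor sweeping every persona" versus "different competitors winning different rooms" distinction can be checked mechanically. A minimal sketch, assuming each brand-absent conversation has already been reduced to a persona and the list of entities that appeared instead; the function names and sample entities are invented for illustration.

```python
from collections import Counter, defaultdict

def displacement_by_persona(absent_conversations):
    """Count, per persona, how many brand-absent conversations each entity appeared in.

    absent_conversations: iterable of (persona, [entities that appeared instead]).
    """
    counts = defaultdict(Counter)
    for persona, entities in absent_conversations:
        counts[persona].update(set(entities))  # count each entity once per conversation
    return counts

def top_displacer_per_persona(counts):
    """Most frequent displacing entity for each persona."""
    return {
        persona: counter.most_common(1)[0][0]
        for persona, counter in counts.items()
        if counter
    }

def single_dominant_displacer(counts):
    """Return the entity that tops every persona, or None if the rooms split."""
    tops = set(top_displacer_per_persona(counts).values())
    return tops.pop() if len(tops) == 1 else None

# Invented sample: two personas losing to two different names.
absent = [
    ("leader", ["rival_x"]),
    ("leader", ["rival_x", "spreadsheet"]),
    ("ops", ["rival_y"]),
    ("ops", ["rival_y", "rival_x"]),
]

counts = displacement_by_persona(absent)
print(top_displacer_per_persona(counts))
print(single_dominant_displacer(counts))
```

A `None` result from `single_dominant_displacer` is the signal that the fix is several narrower comparisons rather than one.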


The Citation Bucket: Whose Content Is Doing the Work

Citations are a different question from appearances, and worth a bucket of their own. Across every conversation, capture which domains the AI is actually drawing on — split into the brand's own properties, competitor-owned properties, and third-party sources such as review sites, community forums, documentation hubs, and comparison aggregators. Then compute what share of all citations the brand's own domains hold, broken out by context: unbranded, competitor-led, and brand-led conversations will rarely look alike here.

Why it matters: This bucket tells you whether an appearance, when it happens, rests on the brand's own authority or someone else's. A brand with a strong appearance rate but a citation mix dominated by competitor-owned and third-party domains has a more fragile position than the appearance number suggests — it's showing up in the conversation, but the AI isn't treating the brand's own content as the reason why. That's a different problem from not appearing at all, and it points toward a different fix: not more content, but content positioned to actually get cited.
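
As a rough illustration of the citation split, the sketch below classifies each cited URL as brand-owned, competitor-owned, or third-party, then computes the brand's share per context. The domain sets, the `.example` URLs, and the helper names are all placeholders; a real run would need a maintained domain list and probably subdomain handling, which this sketch skips.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

BRAND_DOMAINS = {"brand.example"}        # hypothetical brand-owned properties
COMPETITOR_DOMAINS = {"rival.example"}   # hypothetical competitor-owned properties

def classify(url):
    """Label a cited URL by who owns the domain (exact host match, 'www.' ignored)."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if host in BRAND_DOMAINS:
        return "brand"
    if host in COMPETITOR_DOMAINS:
        return "competitor"
    return "third_party"

def citation_mix(citations):
    """citations: iterable of (context, url). Returns {context: Counter of owner labels}."""
    mix = defaultdict(Counter)
    for context, url in citations:
        mix[context][classify(url)] += 1
    return mix

def brand_share(mix):
    """Share of citations in each context that point at brand-owned domains."""
    return {context: c["brand"] / sum(c.values()) for context, c in mix.items()}

# Invented sample citations.
citations = [
    ("brand_led", "https://www.brand.example/pricing"),
    ("brand_led", "https://reviews.example/brand"),
    ("unbranded", "https://rival.example/guide"),
    ("unbranded", "https://forum.example/thread/1"),
    ("unbranded", "https://reviews.example/top-10"),
    ("unbranded", "https://brand.example/blog/post"),
]

mix = citation_mix(citations)
print(brand_share(mix))
```

Keeping the full mix per context, not just the brand share, makes it easy to see which non-brand domains are carrying the answer.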


The Provider Bucket: Does the Pattern Hold Everywhere

Every bucket above needs to be run again, sliced by provider, before any of it gets treated as a brand-wide finding. Visibility, framing, displacement, and citation behavior can all diverge meaningfully between providers — a brand can be the default answer on one and an afterthought on another, for the same question, asked the same way.

Why it matters: A finding that holds on only one provider is a provider-specific finding, not a brand-wide one, and treating it as the latter leads to fixes aimed at the wrong target. A content gap that's actually a Perplexity-specific citation pattern won't be solved by general-purpose content work, and a team that doesn't separate this out will ship a fix and then be confused when the next run shows only partial movement.
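
A small sketch of the provider check, under stated assumptions: `provider_spread` reports the gap between the lowest and highest provider for any rate, and `finding_scope` labels a finding by how many providers it holds on. The 75% cutoff for calling something brand-wide is an arbitrary illustration, and the provider names and rates are invented.

```python
def provider_spread(rates):
    """rates: {provider: rate}. Returns (lowest provider, highest provider, ratio)."""
    low = min(rates, key=rates.get)
    high = max(rates, key=rates.get)
    ratio = rates[high] / rates[low] if rates[low] > 0 else float("inf")
    return low, high, ratio

def finding_scope(holds_on, min_share=0.75):
    """holds_on: {provider: bool}. Label a finding by how widely it holds.

    min_share is an assumed threshold, not a standard.
    """
    share = sum(holds_on.values()) / len(holds_on)
    if share >= min_share:
        return "brand-wide"
    return "provider-specific" if share > 0 else "not observed"

# Invented appearance rates for four unnamed providers.
rates = {"provider_a": 0.30, "provider_b": 0.45, "provider_c": 0.62, "provider_d": 0.40}
low, high, ratio = provider_spread(rates)
print(low, high, round(ratio, 2))

# A citation pattern seen on only one of the four providers.
scope = finding_scope(
    {"provider_a": True, "provider_b": False, "provider_c": False, "provider_d": False}
)
print(scope)
```

Labeling scope before a finding reaches the build list keeps a single-provider pattern from being handed to the team as a general content problem.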


The Trend Bucket: What Changed Since Last Time

This bucket doesn't exist on a first run — it only activates once you've run the test more than once, on the same scenarios, at the cadence the previous post in this series laid out. From the second cycle on, capture every bucket above not just as a current snapshot but as a change from the prior cycle: visibility up or down, framing language shifting or holding steady, displacement easing or worsening, citation share gaining or losing ground.

Be honest about what counts as movement. AI responses vary enough from run to run that small shifts in any one number can be noise rather than signal. A change worth acting on is one that shows up consistently across the persona, provider, and context slices it should affect — and ideally one that lines up with something you actually shipped between cycles, not a shift with no corresponding action behind it.

Why it matters: A snapshot tells you where you stand. A trend tells you whether what you built actually worked. Without this bucket, every re-run is just another sample of one, dressed up as a tracking system — which is precisely the problem this whole methodology exists to solve.
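
The "be honest about what counts as movement" rule can be made explicit. The sketch below compares the same slices across two cycles and only reports a direction when most slices move the same way by more than a noise floor. The 5-point noise floor and 75% agreement threshold are assumptions to calibrate against your own run-to-run variance, and the slice rates are invented.

```python
def slice_deltas(previous, current):
    """Change per slice between two cycles; only slices present in both are compared."""
    return {key: current[key] - previous[key] for key in previous.keys() & current.keys()}

def consistent_movement(deltas, noise_floor=0.05, min_agreement=0.75):
    """Return 'up', 'down', or 'noise' for a set of per-slice changes."""
    if not deltas:
        return "noise"
    total = len(deltas)
    moved_up = sum(delta > noise_floor for delta in deltas.values())
    moved_down = sum(delta < -noise_floor for delta in deltas.values())
    if moved_up / total >= min_agreement:
        return "up"
    if moved_down / total >= min_agreement:
        return "down"
    return "noise"

# Invented unbranded appearance rates per persona for two cycles.
previous = {"agent": 0.40, "leader": 0.20, "ops": 0.30, "analyst": 0.25}
current = {"agent": 0.52, "leader": 0.31, "ops": 0.41, "analyst": 0.27}

deltas = slice_deltas(previous, current)
print(consistent_movement(deltas))
```

Three of the four slices move up by more than the noise floor here, so the change is reported as real movement; a single slice jumping while the rest sit still would come back as noise.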


From Buckets to a Build List

Each bucket above produces findings. None of those findings are useful until they're converted into actions, and the conversion has to be explicit — not "we should fix our Zendesk comparison page" scrawled in a meeting, but a written record of the problem, why it matters, what's currently missing, the specific steps to take, and what success would look like if the action lands.

Write that record for every finding worth acting on, then prioritize across the list rather than working it in the order the buckets happened to produce it. Weight a pattern that repeats across most personas and most providers above a pattern confined to one narrow slice — the former is a structural gap, the latter might just be a quirk of one audience or one model. Weight a finding that maps cleanly onto something buildable — a missing content page, an outdated comparison, an unclaimed citation opportunity — above a finding that's technically true but not actionable, like a caveat traceable to something outside your control.

Why it matters: Analysis that doesn't end in a list of things to build is a report, not an audit. The five steps before this one exist so that this step can produce something with deadlines attached to it — not a more sophisticated description of a problem the team already suspected it had.
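
The written record and the two weighting rules above could be captured along these lines. This is a sketch, not a prescribed scoring model: the `BuildItem` fields mirror the record described in this section, while the `priority` formula, the 0.25 penalty for findings that are not buildable, and the three sample items are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class BuildItem:
    problem: str
    why_it_matters: str
    currently_missing: str
    steps: list
    success_looks_like: str
    personas_affected: int
    providers_affected: int
    buildable: bool

def priority(item, total_personas, total_providers):
    """Breadth of the pattern, discounted when nothing concrete can be built."""
    breadth = (item.personas_affected / total_personas) * (
        item.providers_affected / total_providers
    )
    return breadth * (1.0 if item.buildable else 0.25)

# Invented findings; the text fields are abbreviated for the example.
items = [
    BuildItem("One provider cites an outdated comparison", "...", "...",
              ["refresh the page"], "...", 1, 1, True),
    BuildItem("Caveat tied to a third-party review", "...", "...",
              [], "...", 4, 4, False),
    BuildItem("Absent from unbranded onboarding questions", "...", "...",
              ["build an explainer hub"], "...", 4, 3, True),
]

ranked = sorted(items, key=lambda item: priority(item, 4, 4), reverse=True)
for item in ranked:
    print(round(priority(item, 4, 4), 4), item.problem)
```

The structural gap that repeats across every persona ranks first, the widespread but unactionable caveat second, and the narrow single-provider quirk last, which matches the weighting described above.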


What This Looks Like in Practice

Below is a condensed analysis summary from a real run on Freshdesk, organized by the buckets above rather than as a raw export. The underlying conversation data and computation are internal to how the audit is run; what's shown here is the structure a team would actually work from.

FRESHDESK — ANALYSIS SUMMARY, ONE TEST CYCLE (152 conversations, 4 providers)

VISIBILITY BUCKET
Overall appearance: 60%
— Unbranded (no category, no vendor language): 34%
— Category-led: 53%
— Competitor-led: 75%
— Brand-led (validation, not discovery): 100%, cited in 95% of those
Weakest persona: Support Leader / Analyst — 30% organic appearance
Strongest persona: Support Agent — 60% organic appearance
Provider range on organic appearance: roughly 2x between the low and high end

FRAMING BUCKET
Dominant narrative: an easy-to-adopt, value-oriented omnichannel platform
with practical AI, in a lower-complexity package
Recurring caveat, attached to most appearances: positioned as the
*simpler alternative* to a heavier incumbent rather than the stronger
choice on its own terms
Weakest framing: leadership- and analytics-facing conversations, where the
brand reads as "good enough" rather than category-leading

DISPLACEMENT BUCKET
Top displacer: one legacy competitor, present in roughly half of all
brand-absent conversations, across every persona and most question types
Two secondary displacers, each concentrated in a specific persona —
one in operations-focused conversations, one in analytics-focused ones

CITATION BUCKET
Brand-owned citation share: 9% of all citations
— 55% of citations in brand-led conversations
— 3% of citations in both unbranded and competitor-led conversations
Most-cited non-brand domains: the top displacing competitor's own site,
plus a recurring set of community and comparison-aggregator domains that
show up across nearly every persona

PROVIDER BUCKET
Organic appearance varies by more than 2x across the providers tested
One provider over-indexes specifically on competitor-led conversations,
surfacing the brand there well above the rate the others do

TREND BUCKET (vs. previous cycle)
Overall appearance: essentially flat, slightly down
Organic appearance: up slightly
Displacement rate: down — fewer brand-absent conversations show a
competitor stepping directly into the gap
Brand-owned citation share: up meaningfully — the clearest improvement
in this cycle

→ TOP 3 BUILD LIST ITEMS
1. Content gap — build an explainer hub for the intent-and-persona
   combination with the weakest organic appearance, where competitors
   are currently the ones teaching the category.
2. Competitive framing — rebuild comparison pages against the top two
   displacing competitors around decision criteria the brand can
   actually win, instead of resting on "simpler."
3. Persona narrative — build persona-specific proof blocks so the
   strongest persona's narrative reinforces the weakest one instead of
   leaving it isolated.

The pattern across buckets tells a coherent story here, which is the point: visibility is uneven by context and persona, the framing has a ceiling even where appearance is strong, one competitor is doing most of the displacing, and citations show the brand borrowing more authority than it owns when discovery is organic. None of that comes through in a single appearance-rate headline.


What You Have at the End of This Step

A completed analysis step gives you two things that make the rest of the audit worth having run.

A bucketed set of findings: visibility, framing, displacement, citation, and provider-level results, plus trend from the second cycle on, each computed the same way every cycle, so a blended number stops standing in for the real picture and the actual gaps become visible on their own terms.

A prioritized build list — every finding worth acting on, mapped to a specific action with the reasoning behind it and what to expect if it lands, ranked by how widely the pattern repeats and how directly it maps to something the team can actually build.

This also answers the cadence question from the testing step. The right time to run the test again isn't a fixed date on a calendar — it's once enough of this build list has shipped that there's something real to measure. Re-running before that just re-tests work that hasn't had a chance to change anything yet. The build list this step produces is what should set the date for the next one, not the other way around.

If you'd rather see what your brand's analysis and build list look like before doing this yourself, request a diagnostic run.

By Gaurav