
How to Benchmark (and Boost) Your Brand’s Visibility in AI Recommendations

Welcome folks! 👋

This edition of The Product-Led Geek is a practical guide for product and developer tool companies to win the new LLM discovery game and be on the right side of the next big distribution shift. It will take 8 minutes to read and you’ll learn:

  • How to systematically benchmark your brand's visibility in AI recommendations across ChatGPT, Gemini, and other LLMs.

  • The five-step process for effective benchmarking, starting with creating authentic prompt suites that mirror real user behaviour and tracking your performance against competitors.

  • The timeframe-specific strategies to influence your LLM discoverability from quick 30-day wins to long-term ecosystem dominance.

Let’s go!

GEEK OUT

How to Benchmark (and Boost) Your Brand’s Visibility in AI Recommendations

I speak with a lot of founders in my work, and something I’m now hearing with increasing regularity is concern about how they (and their competitors) show up in AI search queries.

In the first half of this year I’ve had at least ten such conversations.

It’s a humbling and eye-opening experience for a founder to ask ChatGPT for the best SDK for their use case - only to see one of their competitors recommended, while their own product is nowhere to be found.

You can have a better product with more attractive pricing, but that can all be made irrelevant when your competitor has the LLM distribution advantage.

If you’re reading this, you’re probably aware of the core problem: the way people discover products, especially developer tools and SaaS, is changing fast.

It’s a fundamental distribution shift.

Five years ago, you obsessed over your Google ranking.

Today, your prospects are asking ChatGPT, Gemini, or Perplexity things like “What’s the best CRM for startups?” or “Recommend a robust authentication library for my Next.js app”.

If your brand isn’t mentioned, you’re invisible.

Yet the vast majority of product and growth teams (and for developer tools, devrel teams, docs teams and technical GTM teams) are flying blind.

They’re hoping their existing SEO and content strategies will somehow translate to LLM recommendations.

Spoiler: they won’t.

This guide is your practical, research-backed playbook for:

  • Systematically benchmarking your brand’s AI visibility (across ChatGPT, Gemini, Perplexity, Claude, and more)

  • Crafting prompts that reflect real user behaviour

  • Analysing results to find actionable gaps

  • Special tactics for developer tools and SDKs

  • Iterating and improving your “AI share of voice” over time

My approach draws on my experience working with clients over the last 12 months on this problem, and now building DevTune - a product designed to solve this critical GTM challenge.

While I’ve been working with devtool companies in this area, most of what I’ve included in this post is relevant to other verticals too.

Here’s what we’ll cover:

  1. Why AI recommendations are now as critical as SEO

  2. How to build a prompt suite that mirrors real user intent

  3. Running your benchmark: platforms, process, and tracking

  4. Special focus: developer tools & SDKs (where the rules are different)

  5. Analysing results and closing the gap

  6. The iterative improvement loop

  7. Next steps and resources

1. Why AI Recommendations Are The New SEO (And Why You’re Probably Invisible)

Let’s set the stakes.

When someone asks ChatGPT “What’s the best social media scheduler?” or “Which JavaScript library should I use for billing and entitlements?”, the response is curated, authoritative-sounding recommendations.

Unlike traditional search, where users see a buffet of links and make their own choices, AI assistants present a filtered answer.

This creates a winner-takes-most dynamic.

If your brand appears, you capture mindshare before prospects even know to look for you.

If not, you’re out of the running.

What I’ve learnt:

  • SEO ≠ LLM visibility: High Google rankings correlate with LLM mentions, but not perfectly. Some brands dominate search but are invisible in AI answers, while others punch above their weight in LLMs.

  • Third-party content matters: LLMs draw from “best of” lists, reviews, Stack Overflow, Reddit, GitHub ‘awesome’ repos, and community forums - not just your website.

  • For developer tools: LLMs lean heavily on technical documentation, example code, Q&A, and open-source presence.

If you’re not actively benchmarking and optimising for LLM recommendations, you’re leaving a massive new discovery channel to chance.

2. Build a Prompt Suite That Mirrors Real User Intent

Why Your Prompts Matter

Not all prompts surface brands.

To benchmark your visibility, you need to mimic real user behaviour - not a marketing wishlist.

How to Find Authentic Prompts

  • SEO keyword research: Convert long-tail, question-style keywords into natural, conversational queries (e.g. “best email marketing tool for solopreneurs” → “What’s the best email tool for solopreneurs?”).

  • Google Search Console: Filter for queries starting with who/what/why/how/best/top (see the filtering sketch after this list).

  • Customer support & sales logs: Mine transcripts for the exact language prospects use.

  • Community forums (Reddit, Stack Overflow, Discord): Look for how users actually ask for recommendations.
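To turn that research into a prompt suite quickly, here’s a minimal sketch that filters a Search Console export down to question-style queries and applies a simple conversational rewrite. The file name and the `query` column are assumptions about your export format.

```python
import csv
import re

# Question-style queries tend to translate well into LLM prompts.
QUESTION_PATTERN = re.compile(r"^(who|what|why|how|which|best|top)\b", re.IGNORECASE)

def load_candidate_queries(path: str) -> list[str]:
    """Read a Search Console export and keep only question-style queries."""
    with open(path, newline="") as f:
        return [row["query"] for row in csv.DictReader(f) if QUESTION_PATTERN.match(row["query"])]

def to_conversational(query: str) -> str:
    """Rewrite a long-tail keyword as a natural question a real user might ask."""
    q = query.strip().rstrip("?")
    if q.lower().startswith(("best ", "top ")):
        return f"What's the {q}?"  # "best email marketing tool for solopreneurs" -> "What's the best ..."
    return q[0].upper() + q[1:] + "?"

if __name__ == "__main__":
    for keyword in load_candidate_queries("gsc_queries.csv"):  # hypothetical export file
        print(to_conversational(keyword))
```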

Example Prompt Types

| Prompt Type | General SaaS Example | Developer Tool Example |
|---|---|---|
| Best/top X | What are the best project management tools? | What’s the best auth SDK for React? |
| Best X for Y | Best CRM for solopreneurs? | Best logging library for Node.js? |
| Alternatives to Z | Alternatives to Salesforce? | Alternatives to Firebase for mobile apps? |
| Problem-solution | How do I automate invoice reminders? | How do I add 2FA to my Python app? |
| Comparison | Monday vs Asana - which is better for SMBs? | Sentry vs Rollbar for error tracking? |

It’s critical that your prompts are based on real research and observed user behaviour, not hypotheticals.
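To make the suite testable and repeatable, it helps to capture each prompt with its type and intended audience in a small data structure. Here’s a minimal sketch using examples from the table above; the field names (including `persona`) are just one way to organise it.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkPrompt:
    prompt_id: str    # stable ID so results can be compared run-to-run
    prompt_type: str  # "best/top X", "alternatives to Z", "comparison", ...
    text: str         # the exact wording a real user would type
    persona: str      # who is asking, e.g. "solopreneur", "Node.js developer"

PROMPT_SUITE = [
    BenchmarkPrompt("pm-001", "best/top X", "What are the best project management tools?", "SMB operator"),
    BenchmarkPrompt("crm-001", "best X for Y", "Best CRM for solopreneurs?", "solopreneur"),
    BenchmarkPrompt("alt-001", "alternatives to Z", "Alternatives to Firebase for mobile apps?", "mobile developer"),
    BenchmarkPrompt("ps-001", "problem-solution", "How do I add 2FA to my Python app?", "Python developer"),
    BenchmarkPrompt("cmp-001", "comparison", "Sentry vs Rollbar for error tracking?", "Node.js developer"),
]
```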

How to Structure Prompts for Unbiased Benchmarking

  • Be specific and contextual: “Best analytics platform for small e-commerce websites” will surface niche players better than generic prompts.

  • Ask for a list: “Give me five options for…” increases the chance of seeing multiple brands.

  • Include use-case, audience, or requirements: E.g., “for indie developers”, “for HIPAA compliance”.

  • Vary terminology: For devtools, test “SDK”, “library”, “package”, “API”, etc.

  • Avoid brand-leading prompts: Don’t ask “Why is [YourBrand] the best X?” unless testing for brand knowledge.*

Note: *For dev tool SDKs, there’s an important category of LLM benchmarking beyond Discovery where brand-leading prompts are essential: testing the implementation accuracy of AI responses. Tracking both Discovery and Implementation scores against competitors is recommended.

3. Running Your Benchmark: Platforms, Process, and Tracking

Where to Test

Different LLMs have different knowledge and behaviours. For a complete picture, test across multiple platforms:

  • OpenAI’s ChatGPT: Use the latest model (e.g. GPT-5), both with and without web browsing.

  • Perplexity & Bing Chat: Search-augmented, often with citations - shows what content influences answers.

  • Google Gemini/SGE: Summarises from Google-indexed sites - good for seeing SEO-to-AI translation.

  • Claude etc.: Adds breadth, especially for international audiences.

Pro tip: Run prompts in a fresh session/incognito to avoid context bias.

You can use a multi-LLM chat tool (I like Chorus) to ask the same question of multiple models at once.
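If you’d rather script this than use a chat UI, the official Python SDKs make it straightforward to fan the same prompt out to several providers. A minimal sketch, assuming OPENAI_API_KEY and ANTHROPIC_API_KEY are set in your environment; the model names are assumptions, so check what’s currently available.

```python
from openai import OpenAI  # pip install openai
import anthropic           # pip install anthropic

def ask_openai(prompt: str, model: str = "gpt-4o") -> str:
    # A fresh client with no prior messages approximates the "fresh session" advice above.
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_claude(prompt: str, model: str = "claude-3-5-sonnet-latest") -> str:
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

if __name__ == "__main__":
    prompt = "What are the best project management tools?"
    for name, ask in [("openai", ask_openai), ("anthropic", ask_claude)]:
        print(f"--- {name} ---")
        print(ask(prompt))
```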

How to Track Results

The most basic approach is to set up a simple spreadsheet or tracker in Notion or Airtable with these columns:

  • Prompt and ID

  • Provider/model

  • Date/Time

  • Was your brand mentioned? (Yes/No)

  • Position in list (if ranked)

  • Was your own site cited?

  • LLM Response

  • Which competitors were included?

  • Notes on answer context

You can download this sheet here to get started.
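If you’d rather keep the tracker in plain files than in Notion or Airtable, the same columns map directly onto a CSV. A minimal sketch; the file name and the example values are placeholders.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

TRACKER_COLUMNS = [
    "prompt_id", "prompt", "provider_model", "timestamp",
    "brand_mentioned", "position_in_list", "own_site_cited",
    "llm_response", "competitors", "notes",
]

def append_result(path: str, row: dict) -> None:
    """Append one benchmark observation, writing the header on first use."""
    file = Path(path)
    write_header = not file.exists() or file.stat().st_size == 0
    with file.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=TRACKER_COLUMNS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)

append_result("llm_visibility_log.csv", {
    "prompt_id": "crm-001",
    "prompt": "Best CRM for solopreneurs?",
    "provider_model": "openai/gpt-4o",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "brand_mentioned": "No",
    "position_in_list": "",
    "own_site_cited": "No",
    "llm_response": "…full answer text…",
    "competitors": "Competitor A; Competitor B",
    "notes": "Only enterprise tools recommended",
})
```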

This is a good start, but you’ll quickly hit scalability issues where this approach becomes clumsy and unwieldy.

As a minimum you’ll soon want:

  • Many test suites, with multiple prompts per suite to cover different use cases

  • A way to schedule the test suites to run automatically against defined sets of providers/models

  • A way to record results

  • Scoring of results

  • Historical analysis to understand if your efforts to improve LLM discoverability are having an impact

And that’s when you should start to look at a dedicated tool to manage the process.
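Before you commit to a tool, it’s worth seeing how far a small script gets you. Here’s a rough sketch of an automated runner built on the OpenAI Python SDK; the model, brand, and competitor names are placeholders, and scheduling it with cron or CI is what gives you the historical view.

```python
import csv
import re
from datetime import datetime, timezone
from pathlib import Path

from openai import OpenAI  # pip install openai

BRAND = "DevTune"                              # your brand
COMPETITORS = ["Competitor A", "Competitor B"] # placeholders - replace with real names
PROMPTS = {
    "crm-001": "Best CRM for solopreneurs?",
    "auth-001": "How do I add 2FA to my Python app?",
}
MODELS = ["gpt-4o"]  # assumption - use whichever models you actually test

def ask(model: str, prompt: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # mirror the deterministic settings devtools use (see the Temperature note in Section 4)
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def mentioned(name: str, text: str) -> bool:
    # Whole-word, case-insensitive match; crude, but a reasonable first pass.
    return re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE) is not None

def run_suite(log_path: str = "llm_visibility_log.csv") -> None:
    rows = []
    for model in MODELS:
        for prompt_id, prompt in PROMPTS.items():
            answer = ask(model, prompt)
            rows.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "provider_model": model,
                "prompt_id": prompt_id,
                "brand_mentioned": mentioned(BRAND, answer),
                "competitors_mentioned": "; ".join(c for c in COMPETITORS if mentioned(c, answer)),
                "llm_response": answer,
            })
    file = Path(log_path)
    write_header = not file.exists()
    with file.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    run_suite()  # schedule with cron or CI to build a history over time
```

Over time, that log becomes the raw data for the gap analysis in Section 5.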

4. Special Focus: Developer Tools & SDKs – The Rules Are Different

If you’re building SDKs/APIs, the game changes significantly.

Developers ask different questions, use different language, and value different signals.

Problem-First Thinking

Developers rarely start with something as generic as “What’s the best authentication SDK?” Instead, they ask:

  • “How can I add user authentication to my React app?”

  • "Recommend a well maintained library for user analytics in React Native."

  • “What’s the easiest way to handle payments in an Android mobile app?”

  • “How do I implement real-time features in my Next.js app without building my own infrastructure?”

Your prompt suites should reflect this problem-first mentality.

Note: Experienced devs will often phrase their questions quite differently to less experienced devs in vibe-coding environments. I recommend catering for these differences with specific prompt suites.

Technical Context Is Key

Developer prompts often include:

  • Programming languages (“Python OCR library” vs “JavaScript image processing”)

  • Frameworks (“React state management” vs “Vue.js state management”)

  • Platform constraints (“iOS push notifications” vs “cross-platform notifications”)

  • Scale (“library for small projects” vs “enterprise-grade solution”)

Test prompts that mirror these contexts.
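One way to cover these combinations without hand-writing every variant is to generate prompts from a template. A minimal sketch; the dimensions and wording are illustrative, so swap in the contexts that match your tool.

```python
from itertools import product

# Context dimensions pulled from how developers actually phrase questions (values are illustrative).
FRAMEWORKS = ["React", "Vue.js", "Next.js"]
SCALES = ["a small side project", "an enterprise-grade app"]

TEMPLATE = "Recommend an authentication library for a {framework} app, suitable for {scale}."

prompt_variants = [
    TEMPLATE.format(framework=fw, scale=scale)
    for fw, scale in product(FRAMEWORKS, SCALES)
]

for prompt in prompt_variants:
    print(prompt)
```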

Community and Documentation Signals

LLMs trained on public data heavily weight:

  • Community discussions (Stack Overflow, Reddit, GitHub Issues)

  • Documentation quality and clarity

  • Blog posts and comparison articles

  • GitHub presence (README quality, stars, community activity)

If your SDK documentation is excellent but your community presence is minimal, you might appear in some contexts but not others.

Example Developer-Focused Prompts

| Intent | Example Prompt |
|---|---|
| Problem-solution | How do I implement SSO in a Next.js app? |
| Direct recommendation | What’s the best image processing library for Python? |
| Alternatives | Alternatives to Stripe for payment APIs in Europe? |
| Feature comparison | Which Node.js auth SDK has the best TypeScript support? |
| Community wisdom | What’s the most popular logging library for Go? |

Temperature

For devtools, most prompts are executed by developers within their day-to-day workflows, typically inside their IDE or in builders like Lovable, v0, or Bolt.

To improve determinism in these environments, those tools typically call LLM APIs with a temperature of 0, so for devtool discovery benchmarking it’s critical to replicate this setting in your tests to build an accurate picture.

Note: Some platforms (I’m building DevTune to do this) will ensure zero-temperature defaults for testing, but also go further, for example by simulating the context that IDEs like Cursor and platforms like Lovable include in their prompts.
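To sanity-check how much answers move under these settings, you can repeat the same prompt a few times at temperature 0 and compare. A rough sketch with the OpenAI Python SDK; the model name is an assumption, and temperature 0 reduces variance rather than guaranteeing identical output.

```python
from collections import Counter
from openai import OpenAI  # pip install openai

def sample(prompt: str, runs: int = 5, model: str = "gpt-4o") -> Counter:
    client = OpenAI()
    answers = Counter()
    for _ in range(runs):
        resp = client.chat.completions.create(
            model=model,
            temperature=0,  # mirror how IDEs and app builders typically call the API
            messages=[{"role": "user", "content": prompt}],
        )
        answers[resp.choices[0].message.content] += 1
    return answers

results = sample("Recommend a well-maintained library for user analytics in React Native.")
print(f"{len(results)} distinct answers across {sum(results.values())} runs")
```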

5. Analysing Results and Closing the Gap

Once you’ve run your prompt suite, review your findings (a small scoring sketch follows this list):

  • Which prompts trigger your brand?

  • Where are you invisible?

  • Which competitors appear consistently?

  • How do you rank versus your competitors?

  • Are specific sources cited (blogs, docs, forums)?
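To answer the first few questions above with numbers rather than impressions, compute each brand’s mention rate from your results log. A minimal sketch, assuming the CSV format sketched in Section 3; the brand names are placeholders, and grouping the same calculation by month gives you the trend line for the improvement loop in Section 6.

```python
import csv
from collections import Counter

def share_of_voice(log_path: str, brands: list[str]) -> dict[str, float]:
    """Mention rate per brand across all logged prompt runs (a simple first metric)."""
    mentions = Counter()
    total = 0
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            text = row["llm_response"].lower()
            for brand in brands:
                if brand.lower() in text:
                    mentions[brand] += 1
    return {brand: mentions[brand] / total for brand in brands} if total else {}

print(share_of_voice("llm_visibility_log.csv", ["DevTune", "Competitor A", "Competitor B"]))
```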

Content Gap Analysis

When competitors appear but you don’t, reverse-engineer their success:

  • What sources are LLMs citing when recommending them?

  • What features or benefits do LLM responses highlight?

  • What language patterns are used in their descriptions?

This is all about understanding which content signals drive AI recommendations in your category.

The Feedback Loop

AI systems aren’t neutral.

They reflect the biases, gaps, and patterns in their training data.

Brands with strong content footprints and community presence get recommended more, which drives more discussion and content creation, further improving their AI visibility.

6. The Iterative Improvement Loop: From Insights to Action

AI visibility isn’t a one-time project.

As you improve your content and community presence, re-run your benchmark prompts regularly.

Remember there are multiple cycles in which you can influence your LLM discoverability.

  • 0-30 days: Docs/search hacks, metadata tweaks, connectors, Q&A planting.

  • 1-2 months: Ecosystem traction, wrappers, early dev buzz.

  • 3-6 months: Search dominance, curated lists, ecosystem integrations.

  • 6m+: Pre-training ingestion, industry adoption, canonical references.

The trick is to layer them:

  • Use short-term levers (docs, metadata, connectors) to get quick discoverability.

  • Push medium-term levers (tutorials, wrappers, ecosystem traction) to stabilise visibility.

  • Build long-term inevitability (integration, adoption, retrain presence) so you’re embedded in the next GPT/Claude/Gemini/Grok/… checkpoint.

| Timeframe | What You Can Influence | Why It Shows Up in LLMs |
|---|---|---|
| 0–30 days | Update docs, metadata, integrations, and seed Q&A mentions | Retrieval-augmented LLMs and search connectors re-index quickly (daily/weekly); no retraining needed. |
| 1–2 months | Publish blogs, tutorials, and generate early traction signals | Needs search engine cycles + user adoption; LLMs ingest once usage and backlinks stabilise. |
| 3–6 months | Gain inclusion in curated lists and stable search rankings | Multiple crawl cycles and sustained presence are needed before results bias consistently. |
| 6–12 months+ | Secure ecosystem integrations, case studies, and educational use | Larger adoption signals feed into next model fine-tuning or pre-training windows (6–12m cadence). |
| 12m+ | Become the “default” or industry-standard reference | Full integration into LLM pre-training → responses surface even without retrieval. |

7. The LLM Visibility Benchmarking Process

Let’s distil this process into a repeatable, actionable framework you can use immediately:

The 5-Step LLM Visibility Benchmarking Process

  1. Prompt Suite Design: Build sets of 5-10 prompts that reflect real user language for your category and use cases (see Section 2).

  2. Multi-Platform Testing: Run each prompt across at least 3-4 major LLMs (ChatGPT, Gemini, Grok, Claude, etc.), both with and without web access if possible.

  3. Systematic Tracking: Log every result: brand mention, position, context, citations, and competitors.

  4. Gap Analysis: Identify where you’re missing, which competitors dominate, and what content or signals are cited.

  5. Iterative Optimisation: Update your content, documentation, and community presence based on findings. Re-test regularly to track improvements.

Conclusion: Be On The Right Side Of The Next Big Distribution Shift

Your AI recommendation share of voice is now as important as your Google ranking.

By testing, tracking, and iterating around LLM visibility, you ensure your brand or SDK surfaces when it matters most - at the moment of user discovery.

Your next steps:

  1. Build your prompt suite (start with 5-10 real-world queries)

  2. Run your benchmark across 3-4 LLMs

  3. Document and analyse the results

  4. Pick one or two high-impact gaps to address first (e.g., missing from “best X” prompts, invisible in developer forums)

  5. Re-test and track progress monthly

The companies that master AI-driven discovery early will have a significant advantage as this channel grows and the distribution shift sets in.

Those that ignore it will find themselves increasingly invisible to prospects who’ve already moved beyond traditional search.

The question isn’t whether AI will transform how customers discover products - it’s whether you’ll be visible when they do.

If you’re a devtool company and want help in this area, get in touch, or check out DevTune for yourselves - the waitlist is now open!

Enjoying this content? Subscribe to get every post direct to your inbox!

THAT’S A WRAP

Before you go, here are 3 ways I can help:

Take the FREE Learning Velocity Index assessment - Discover how your team's ability to learn and leverage learnings stacks up in the product-led world. Takes 2 minutes and you get free advice.

Book a free 1:1 consultation call with me - I keep a handful of slots open each week for founders and product growth leaders to explore working together and get some free advice along the way. Book a call.

Sponsor this newsletter - Reach over 7,600 founders, leaders and operators working in product and growth at some of the world’s best tech companies including PayPal, Adobe, Canva, Miro, Amplitude, Google, Meta, Tailscale, Twilio and Salesforce.

That’s all for today,

If there are any product, growth or leadership topics that you’d like me to write about, just hit reply to this email or leave a comment and let me know!

And if you enjoyed this post, consider upgrading to a VIG Membership to get the full Product-Led Geek experience and access to every post in the archive including all guides.

Until next time!

— Ben
