How to Benchmark (and Boost) Your Brand’s Visibility in AI Recommendations
Welcome folks! 👋
This edition of The Product-Led Geek is a practical guide for product and developer tool companies to win the new LLM discovery game and be on the right side of the next big distribution shift. It will take 8 minutes to read and you’ll learn:
How to systematically benchmark your brand's visibility in AI recommendations across ChatGPT, Gemini, and other LLMs.
The five-step benchmarking process, starting with creating authentic prompt suites that mirror real user behaviour and tracking your performance against competitors.
The timeframe-specific strategies to influence your LLM discoverability from quick 30-day wins to long-term ecosystem dominance.
Let’s go!

GEEK LINKS
3 of the best growth reads from this week
1. Don’t Let Culture Drift as You Scale
2. How AI Agents Generate $1.5M+ Pipeline Every Month
3. How Linktree is Using AI to Accelerate and Do More With Less

GEEK OUT
How to Benchmark (and Boost) Your Brand’s Visibility in AI Recommendations
I speak with a lot of founders in my work, and something I’m now hearing with increasing regularity is concern about how they (and their competitors) show up in AI search queries.
In the first half of this year I’ve had at least ten such conversations.
It’s a humbling and eye-opening experience for a founder to ask ChatGPT for the best SDK for their use case - only to see one of their competitors recommended, while their own product is nowhere to be found.
You can have a better product with more attractive pricing, but that can all be made irrelevant when your competitor has the LLM distribution advantage.
If you’re reading this, you’re probably aware of the core problem: the way people discover products, especially developer tools and SaaS, is changing fast.
It’s a fundamental distribution shift.
Five years ago, you obsessed over your Google ranking.
Today, your prospects are asking ChatGPT, Gemini, or Perplexity things like “What’s the best CRM for startups?” or “Recommend a robust authentication library for my Next.js app”.
If your brand isn’t mentioned, you’re invisible.
Yet the vast majority of product and growth teams (and for developer tools, devrel teams, docs teams and technical GTM teams) are flying blind.
They’re hoping their existing SEO and content strategies will somehow translate to LLM recommendations.
Spoiler: they won’t.
This guide is your practical, research-backed playbook for:
Systematically benchmarking your brand’s AI visibility (across ChatGPT, Gemini, Perplexity, Claude, and more)
Crafting prompts that reflect real user behaviour
Analysing results to find actionable gaps
Special tactics for developer tools and SDKs
Iterating and improving your “AI share of voice” over time
My approach draws on my experience working with clients over the last 12 months on this problem, and now building DevTune - a product designed to solve this critical GTM challenge.
While I’ve been working with devtool companies in this area, most of what I’ve included in this post is relevant to other verticals too.
Here’s what we’ll cover:
Why AI recommendations are now as critical as SEO
How to build a prompt suite that mirrors real user intent
Running your benchmark: platforms, process, and tracking
Special focus: developer tools & SDKs (where the rules are different)
Analysing results and closing the gap
The iterative improvement loop
Next steps and resources
1. Why AI Recommendations Are The New SEO (And Why You’re Probably Invisible)
Let’s set the stakes.
When someone asks ChatGPT “What’s the best social media scheduler?” or “Which JavaScript library should I use for billing and entitlements?”, the response is curated, authoritative-sounding recommendations.
Unlike traditional search, where users see a buffet of links and make their own choices, AI assistants present a filtered answer.
This creates a winner-takes-most dynamic.
If your brand appears, you capture mindshare before prospects even know to look for you.
If not, you’re out of the running.

What I’ve learnt:
SEO ≠ LLM visibility: High Google rankings correlate with LLM mentions, but not perfectly. Some brands dominate search but are invisible in AI answers, while others punch above their weight in LLMs.
Third-party content matters: LLMs draw from “best of” lists, reviews, Stack Overflow, Reddit, GitHub ‘awesome’ repos, and community forums - not just your website.
For developer tools: LLMs lean heavily on technical documentation, example code, Q&A, and open-source presence.
If you’re not actively benchmarking and optimising for LLM recommendations, you’re leaving a massive new discovery channel to chance.
2. Build a Prompt Suite That Mirrors Real User Intent
Why Your Prompts Matter
Not all prompts surface brands.
To benchmark your visibility, you need to mimic real user behaviour - not a marketing wishlist.
How to Find Authentic Prompts
SEO keyword research: Convert long-tail, question-style keywords into natural, conversational queries (e.g. “best email marketing tool for solopreneurs” → “What’s the best email tool for solopreneurs?”).
Google Search Console: Filter for queries starting with who/what/why/how/best/top.
Customer support & sales logs: Mine transcripts for the exact language prospects use.
Community forums (Reddit, Stack Overflow, Discord): Look for how users actually ask for recommendations.
Example Prompt Types
Prompt Type | General SaaS Example | Developer Tool Example |
---|---|---|
Best/top X | What are the best project management tools? | What’s the best auth SDK for React? |
Best X for Y | Best CRM for solopreneurs? | Best logging library for Node.js? |
Alternatives to Z | Alternatives to Salesforce? | Alternatives to Firebase for mobile apps? |
Problem-solution | How do I automate invoice reminders? | How do I add 2FA to my Python app? |
Comparison | Monday vs Asana - which is better for SMBs? | Sentry vs Rollbar for error tracking? |
It’s critical that your prompts are based on real research and observed user behaviour, not hypotheticals.
How to Structure Prompts for Unbiased Benchmarking
Be specific and contextual: “Best analytics platform for small e-commerce websites” will surface niche players better than generic prompts.
Ask for a list: “Give me five options for…” increases the chance of seeing multiple brands.
Include use-case, audience, or requirements: E.g., “for indie developers”, “for HIPAA compliance”.
Vary terminology: For devtools, test “SDK”, “library”, “package”, “API”, etc.
Avoid brand-leading prompts: Don’t ask “Why is [YourBrand] the best X?” unless testing for brand knowledge.*
*Caveat: for dev tool SDKs, there’s an important category of LLM benchmarking beyond Discovery where brand-leading prompts are essential: testing the implementation accuracy of AI responses. Tracking both Discovery and Implementation scores against competitors is recommended.
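To keep a suite organised and re-runnable rather than scattered across ad-hoc chat sessions, it helps to capture it as structured data. Here’s a minimal sketch in Python; the suite, brand, and competitor names are placeholders, not recommendations:

```python
from dataclasses import dataclass, field

@dataclass
class PromptSuite:
    """A named set of prompts to benchmark, plus the brands to look for."""
    name: str
    target_brand: str
    competitors: list[str]
    prompts: list[str] = field(default_factory=list)

# Hypothetical suite for an authentication SDK vendor
auth_suite = PromptSuite(
    name="auth-sdk-discovery",
    target_brand="ExampleAuth",  # placeholder brand
    competitors=["Auth0", "Clerk", "Firebase Auth"],
    prompts=[
        "What’s the best authentication SDK for a Next.js app?",
        "How do I add user authentication to my React app?",
        "Alternatives to Auth0 for a small SaaS startup?",
        "Give me five options for adding SSO to a Node.js backend.",
    ],
)
```

Keeping suites as data like this also makes it trivial to maintain separate suites per use case or audience later on.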

3. Running Your Benchmark: Platforms, Process, and Tracking
Where to Test
Different LLMs have different knowledge and behaviours. For a complete picture, test across multiple:
OpenAI’s ChatGPT: Use the latest model (e.g. GPT-5), both with and without web browsing.
Perplexity & Bing Chat: Search-augmented, often with citations - shows what content influences answers.
Google Gemini/SGE: Summarises from Google-indexed sites - good for seeing SEO-to-AI translation.
Claude etc.: Adds breadth, especially for international audiences.
Pro tip: Run prompts in a fresh session/incognito to avoid context bias.
You can use a multi-LLM chat tool (I like Chorus) to ask the same question of multiple models at once.
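If you prefer to script your runs instead, a thin wrapper around each provider’s API gives you repeatable, stateless calls (every request is a fresh session, so there’s no context bias). Here’s a minimal sketch using the official OpenAI and Anthropic Python SDKs; the model names are illustrative and will need updating to whatever is current:

```python
# pip install openai anthropic
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_openai(prompt: str, model: str = "gpt-4o") -> str:
    resp = openai_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_anthropic(prompt: str, model: str = "claude-3-5-sonnet-latest") -> str:
    resp = anthropic_client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

prompt = "What are the best project management tools for startups?"
for provider, ask in [("openai", ask_openai), ("anthropic", ask_anthropic)]:
    print(provider, "→", ask(prompt)[:200])
```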
How to Track Results
The most basic approach is to set up a simple spreadsheet or tracker in Notion or Airtable with these columns:
Prompt and ID
Provider/model
Date/Time
Was your brand mentioned? (Yes/No)
Position in list (if ranked)
Was your own site cited?
LLM Response
Which competitors were included?
Notes on answer context
You can download this sheet here to get started.
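As a rough sketch of what each row in that tracker captures, here’s how you might detect a brand mention, its position among the brands that appear, and which competitors show up, then append the result to a CSV. The column layout mirrors the list above; the matching is deliberately naive substring matching and the brand names are placeholders:

```python
import csv
from datetime import datetime, timezone

def analyse_response(response: str, brand: str, competitors: list[str]) -> dict:
    """Check which brands appear in an LLM response and in what order."""
    text = response.lower()
    mentioned = [b for b in [brand] + competitors if b.lower() in text]
    # Order brands by where they first appear in the answer
    mentioned.sort(key=lambda b: text.index(b.lower()))
    return {
        "brand_mentioned": brand in mentioned,
        "position": mentioned.index(brand) + 1 if brand in mentioned else None,
        "competitors_mentioned": [b for b in mentioned if b != brand],
    }

def log_result(path: str, prompt_id: str, provider: str, prompt: str,
               response: str, brand: str, competitors: list[str]) -> None:
    row = analyse_response(response, brand, competitors)
    # Assumes the file was created once with a matching header row
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([
            prompt_id, provider, datetime.now(timezone.utc).isoformat(),
            prompt, row["brand_mentioned"], row["position"],
            ";".join(row["competitors_mentioned"]), response,
        ])
```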
This is a good start, but you’ll quickly hit scalability issues where this approach becomes clumsy and unwieldy.
As a minimum you’ll soon want:
Many test suites, with multiple prompts per suite to cover different use cases
A way to schedule the test suites to run automatically against defined sets of providers/models
A way to record results
Scoring of results
Historical analysis to understand if your efforts to improve LLM discoverability are having an impact
And that’s when you should start to look at a dedicated tool to manage the process.

4. Special Focus: Developer Tools & SDKs – The Rules Are Different
If you’re building SDKs/APIs, the game changes significantly.
Developers ask different questions, use different language, and value different signals.
Problem-First Thinking
Developers rarely start with something as generic as “What’s the best authentication SDK?” Instead, they ask:
“How can I add user authentication to my React app?”
"Recommend a well maintained library for user analytics in React Native."
“What’s the easiest way to handle payments in an Android mobile app?”
“How do I implement real-time features in my Next.js app without building my own infrastructure?”
Your prompt suites should reflect this problem-first mentality.
Note: Experienced devs will often phrase their questions quite differently to less experienced devs in vibe-coding environments. I recommend catering for these differences with specific prompt suites.
Technical Context Is Key
Developer prompts often include:
Programming languages (“Python OCR library” vs “JavaScript image processing”)
Frameworks (“React state management” vs “Vue.js state management”)
Platform constraints (“iOS push notifications” vs “cross-platform notifications”)
Scale (“library for small projects” vs “enterprise-grade solution”)
Test prompts that mirror these contexts.
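One lightweight way to cover these combinations is to generate prompt variants from a small context matrix rather than writing each one by hand. A quick sketch; the problems, stacks, and qualifiers are placeholder examples:

```python
from itertools import product

problems = [
    "add user authentication to",
    "handle payments in",
]
stacks = ["a React app", "a Next.js app", "an Android app"]
qualifiers = ["", " for a small side project", " at enterprise scale"]

# Cross product of problem × stack × qualifier gives broad context coverage
prompts = [
    f"How do I {problem} {stack}{qualifier}?"
    for problem, stack, qualifier in product(problems, stacks, qualifiers)
]
print(len(prompts), "prompt variants generated")
```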
Community and Documentation Signals
LLMs trained on public data heavily weight:
Community discussions (Stack Overflow, Reddit, GitHub Issues)
Documentation quality and clarity
Blog posts and comparison articles
GitHub presence (README quality, stars, community activity)
If your SDK documentation is excellent but your community presence is minimal, you might appear in some contexts but not others.
Example Developer-Focused Prompts
Intent | Example Prompt |
---|---|
Problem-solution | How do I implement SSO in a Next.js app? |
Direct recommendation | What’s the best image processing library for Python? |
Alternatives | Alternatives to Stripe for payment APIs in Europe? |
Feature comparison | Which Node.js auth SDK has the best TypeScript support? |
Community wisdom | What’s the most popular logging library for Go? |
Temperature
For devtools, most prompts are executed by developers within their day-to-day workflows, typically within their IDE or builders like Lovable, v0, or Bolt.
To improve determinism in these environments, these tools typically call LLM APIs with a temperature of 0, so replicating this in your devtool discovery benchmarks is critical to building an accurate picture.
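To mirror that behaviour in your own tests, pass temperature=0 explicitly when you call the API rather than relying on chat UI defaults. A minimal sketch with the OpenAI SDK (the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI()

def ask_deterministic(prompt: str, model: str = "gpt-4o") -> str:
    """Query with temperature=0 to approximate how IDEs and app builders call the API."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(ask_deterministic("Recommend a well maintained logging library for Node.js."))
```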
Note: Some platforms (I built DevTune to do this) ensure zero-temperature defaults for testing, and go further by simulating the context that IDEs like Cursor and platforms like Lovable include in their prompts.
5. Analysing Results and Closing the Gap
Once you’ve run your prompt suite, review your findings:
Which prompts trigger your brand?
Where are you invisible?
Which competitors appear consistently?
How do you rank versus your competitors?
Are specific sources cited (blogs, docs, forums)?
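A few lines of analysis over the tracker you built earlier answer most of these questions: mention rate per brand, your average position when you do appear, and which competitors dominate. A rough sketch, assuming the CSV has a header row with the column names used below and that your brand label matches the placeholder:

```python
import csv
from collections import Counter, defaultdict

BRAND = "ExampleAuth"  # placeholder: your brand as logged in the tracker

mentions = Counter()           # how often each brand appears across responses
positions = defaultdict(list)  # list positions for your brand when it appears
total_rows = 0

with open("llm_benchmark_results.csv") as f:
    for row in csv.DictReader(f):
        total_rows += 1
        if row["brand_mentioned"] == "True":
            mentions[BRAND] += 1
            if row["position"]:
                positions[BRAND].append(int(row["position"]))
        for competitor in filter(None, row["competitors_mentioned"].split(";")):
            mentions[competitor] += 1

for brand, count in mentions.most_common():
    print(f"{brand}: mentioned in {count}/{total_rows} responses "
          f"({count / total_rows:.0%} of prompts)")
if positions[BRAND]:
    avg = sum(positions[BRAND]) / len(positions[BRAND])
    print(f"Average position when mentioned: {avg:.1f}")
```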
Content Gap Analysis
When competitors appear but you don’t, reverse-engineer their success:
What sources are LLMs citing when recommending them?
What features or benefits do LLM responses highlight?
What language patterns are used in their descriptions?
This is all about understanding which content signals drive AI recommendations in your category.
The Feedback Loop
AI systems aren’t neutral.
They reflect the biases, gaps, and patterns in their training data.
Brands with strong content footprints and community presence get recommended more, which drives more discussion and content creation, further improving their AI visibility.
6. The Iterative Improvement Loop: From Insights to Action
AI visibility isn’t a one-time project.
As you improve your content and community presence, re-run your benchmark prompts regularly.
Remember there are multiple cycles in which you can influence your LLM discoverability.
0-30 days: Docs/search hacks, metadata tweaks, connectors, Q&A planting.
1-2 months: Ecosystem traction, wrappers, early dev buzz.
3-6 months: Search dominance, curated lists, ecosystem integrations.
6m+: Pre-training ingestion, industry adoption, canonical references.
The trick is to layer them:
Use short-term levers (docs, metadata, connectors) to get quick discoverability.
Push medium-term levers (tutorials, wrappers, ecosystem traction) to stabilise visibility.
Build long-term inevitability (integration, adoption, retrain presence) so you’re embedded in the next GPT/Claude/Gemini/Grok/… checkpoint.
Timeframe | What You Can Influence | Why It Shows Up in LLMs |
---|---|---|
0–30 days | Update docs, metadata, integrations, and seed Q&A mentions | Retrieval-augmented LLMs and search connectors re-index quickly (daily/weekly); no retraining needed. |
1–2 months | Publish blogs, tutorials, and generate early traction signals | Needs search engine cycles + user adoption; LLMs ingest once usage and backlinks stabilise. |
3–6 months | Gain inclusion in curated lists and stable search rankings | Multiple crawl cycles and sustained presence are needed before results bias consistently. |
6–12 months+ | Secure ecosystem integrations, case studies, and educational use | Larger adoption signals feed into next model fine-tuning or pre-training windows (6–12m cadence). |
12m+ | Become the “default” or industry-standard reference | Full integration into LLM pre-training → responses surface even without retrieval. |
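Tracking this over time is what turns the benchmark into a feedback loop. A rough sketch comparing mention rates between two benchmark runs; the file names are hypothetical and the columns match the earlier tracker sketch:

```python
import csv

def mention_rate(path: str) -> float:
    """Share of logged responses in which your brand was mentioned."""
    with open(path) as f:
        rows = list(csv.DictReader(f))
    mentioned = sum(1 for r in rows if r["brand_mentioned"] == "True")
    return mentioned / len(rows) if rows else 0.0

previous = mention_rate("results_previous_month.csv")
current = mention_rate("results_current_month.csv")
print(f"Mention rate: {previous:.0%} → {current:.0%} "
      f"({current - previous:+.0%} change month over month)")
```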
7. The LLM Visibility Benchmarking Process
Let’s distil this process into a repeatable, actionable framework you can use immediately:
The 5-Step LLM Visibility Benchmarking Process
Prompt Suite Design: Build sets of 5-10 prompts that reflect real user language for your category and use cases (see Section 2).
Multi-Platform Testing: Run each prompt across at least 3-4 major LLMs (ChatGPT, Gemini, Grok, Claude, etc.), both with and without web access if possible.
Systematic Tracking: Log every result: brand mention, position, context, citations, and competitors.
Gap Analysis: Identify where you’re missing, which competitors dominate, and what content or signals are cited.
Iterative Optimisation: Update your content, documentation, and community presence based on findings. Re-test regularly to track improvements.

Conclusion: Be On The Right Side Of The Next Big Distribution Shift
Your AI recommendation share of voice is now as important as your Google ranking.
By testing, tracking, and iterating around LLM visibility, you ensure your brand or SDK surfaces when it matters most - at the moment of user discovery.
Your next steps:
Build your prompt suite (start with 5-10 real-world queries)
Run your benchmark across 3-4 LLMs
Document and analyse the results
Pick one or two high-impact gaps to address first (e.g., missing from “best X” prompts, invisible in developer forums)
Re-test and track progress monthly
The companies that master AI-driven discovery early will have a significant advantage as this channel grows and the distribution shift sets in.
Those that ignore it will find themselves increasingly invisible to prospects who’ve already moved beyond traditional search.
The question isn’t whether AI will transform how customers discover products - it’s whether you’ll be visible when they do.
If you’re a devtool company and want help in this area, get in touch, or check out DevTune for yourselves - the waitlist is now open!
Enjoying this content? Subscribe to get every post direct to your inbox!

THAT’S A WRAP
Before you go, here are 3 ways I can help:
Take the FREE Learning Velocity Index assessment - Discover how your team's ability to learn and leverage learnings stacks up in the product-led world. Takes 2 minutes and you get free advice.
Book a free 1:1 consultation call with me - I keep a handful of slots open each week for founders and product growth leaders to explore working together and get some free advice along the way. Book a call.
Sponsor this newsletter - Reach over 7600 founders, leaders and operators working in product and growth at some of the world’s best tech companies including PayPal, Adobe, Canva, Miro, Amplitude, Google, Meta, Tailscale, Twilio and Salesforce.
That’s all for today,
If there are any product, growth or leadership topics that you’d like me to write about, just hit reply to this email or leave a comment and let me know!
And if you enjoyed this post, consider upgrading to a VIG Membership to get the full Product-Led Geek experience and access to every post in the archive including all guides.
Until next time!

— Ben
RATE THIS POST (1 CLICK - DON'T BE SHY!) Your feedback helps me improve my content.