Build Your Own AI Sales Agent: HubSpot Lead Capture with n8n, OpenAI, and Pinecone (No-Code/Low-Code)

Written by Tomislaw Dalic | Jan 7, 2026 9:27:30 AM

Most website chatbots do two things exceptionally well:

Annoy people
Lose leads

They answer like a broken FAQ widget, hallucinate when someone asks a real question, and when a prospect is actually ready to talk, they still fail to capture the lead properly.

So we built something better.

A real AI Sales Agent that:

answers complex questions about our services using our website content,
qualifies leads like a human sales rep,
captures contact details cleanly,
and automatically creates a contact in HubSpot in real time.

We built it using:

n8n (workflow orchestration)
OpenAI (LLM + embeddings)
Pinecone (vector search for our website knowledge base)
HubSpot (CRM lead capture)

And yes, it’s live and working on growthhub.io.

This guide is an exhaustive walkthrough of how we built it from scratch, including the architecture, prompts, workflows, pitfalls, cost breakdown, and how you can deploy the same system.

The Vision: Meet “Charlotte,” our AI Sales Assistant

Imagine this:

A prospect lands on your site and asks:

“Do you do paid campaigns, and what’s your SEO methodology?”

Charlotte instantly answers with specific details from your website, not generic AI filler.

Then the prospect says:

“Cool. I’m interested. My name is Alex Smith, company TechCorp, email alex@techcorp.com, phone +1-555-123-4567.”

Charlotte confirms and instantly saves the lead into HubSpot. No human involved. No copy-paste. No lead lost.

This is not a chatbot.

This is a sales agent with knowledge, process, and actions.

Why RAG is the reason this works (and why most AI agents fail without it)

A plain LLM chatbot is trained on general internet data. It does not automatically know your website content, pricing rules, processes, or case studies.

So when a visitor asks a question that’s not in its memory, it does what LLMs do best:

It guesses.

That’s how you get confident nonsense like:

invented pricing
fake deliverables
wrong methodology
fictional case studies (yes, really)

RAG (Retrieval-Augmented Generation) fixes that.

RAG is the shift from AI being a “guessing engine” to a “knowledge-driven assistant.” It retrieves verified content from your knowledge base at the moment the question is asked, then generates an answer using that content.

What RAG actually is (simple version)

RAG combines:

retrieval (finding relevant info from your site/docs)
generation (forming a good answer)
grounding (reducing hallucinations by forcing factual context)

That’s why our agent can answer:

“What’s your SEO methodology?”
“Do you build on HubSpot CMS?”
“What does your paid acquisition process look like?”

… without making anything up.

The Stack (and why we chose it)

n8n is the workflow engine. It connects chat input, AI, retrieval, and HubSpot actions.
OpenAI powers the conversation and produces embeddings for semantic search.
Pinecone is our vector database (fast retrieval of relevant website chunks).
HubSpot is the CRM where leads land automatically.

Cost & ROI: Why Self-Hosted n8n + OpenAI is Cheaper Than HubSpot’s AI Credits Model

Let’s talk economics, because this is where most “AI agent” solutions quietly become expensive.

If you self-host n8n, your automation engine is basically yours forever. No per-seat platform tax, no feature gating, and no “pay more to unlock basic automation” nonsense.

Your ongoing costs become mainly:

OpenAI usage costs (per chat, usually cents)
Pinecone (often free/low-cost on Starter for small sites)
Optional: a small server (if you don’t already have one)

In other words: once it’s set up, your agent can run 24/7 for the cost of a coffee or two per month, depending on traffic.

HubSpot AI Sales Agent credits: powerful, but usage-based

HubSpot’s AI features use credits, a flexible currency consumed by AI actions such as:

AI agent interactions (e.g., Breeze Customer Agent)
outreach workflows
contact research and monitoring
data enrichment actions

Most plans include monthly credits, but if you exceed them, you add more capacity using credit packs (commonly sold in 1,000-credit packs for around $10/month) or pay overages. Admins can set caps to control spend.

This model is fine, but it means your true cost is not “the subscription.”

It’s subscription + usage.

Why the cost per lead can creep up fast

A single conversion (meaning: a real conversation + qualification + lead capture) is rarely one action. It may involve multiple AI responses, follow-up questions, enrichment, and routing logic.

So while HubSpot is a great platform, total cost per captured lead can quickly approach a few dollars per lead, depending on usage and the number of AI actions triggered.

Why our setup is different

Our agent is built to be lean:

it retrieves answers from your website via RAG (Pinecone)
it qualifies leads (budget, timeline, intent)
and when it has the required details, it writes directly into HubSpot

So instead of paying per seat and per AI action, you mostly pay for:

one LLM call
one retrieval query
one HubSpot API write

That’s the cost model you want if you care about ROI.

Limited-Time Offer: We’ll Build and Deploy This for You for €999

€999 One-Time Setup Fee (Limited Time)

If you want this live on your site without spending your weekend wrestling with workflows, chunking issues, and HubSpot property mapping, we’ve got you.

Included:

AI Sales Agent setup in n8n (self-hosted or connected to your environment)
RAG knowledge base indexed from your public website (sitemap-based)
Pinecone vector store integration for semantic search
HubSpot lead capture automation (create/update contact)
Lead qualification flow (budget + timeline + intent)
Testing, QA, and handoff

Result: A 24/7 sales agent that answers intelligently, qualifies leads, and books calls.

Want it? Book a call and we’ll show you a demo on your own site.

Book a Call

Who Is This For? (And Why RAG Is the Secret Weapon)

et’s be blunt: not everyone needs an AI agent.

But if your business lives or dies by speed, accuracy, and lead follow-up, then a RAG-powered agent is not a “nice-to-have.” It’s a competitive advantage.

In Mastering RAG for AI Agents, Jason Brener frames RAG as the shift from an AI that guesses to an AI that becomes a knowledge-driven assistant, because it retrieves relevant information from external sources at the moment a question is asked. In plain English: it stops making things up and starts behaving like an expert with access to your actual documentation.

That’s why RAG is useful far beyond chatbots. It’s the foundation of AI agents that can operate safely in real-world business environments: customer support, finance, legal, healthcare, operations, and of course, sales.

The simple rule: When is RAG worth it?

RAG is worth the effort when any of these are true:

Your knowledge changes often (pricing, features, services, policies, internal processes).
Accuracy matters (wrong answers cost deals, time, or legal/compliance risk).
You have a lot of content (site pages, docs, PDFs, knowledge base, product specs).
Users ask unpredictable questions (they never speak in your website’s exact wording).
You need trust (answers must be grounded and ideally cite sources).
Hallucinations are expensive (support tickets, lost sales, reputational risk).

If being wrong costs you money, trust, or legal problems, use RAG.

Use Case Matrix: Where RAG + AI Agents Pay Off (Big Time)

Here’s the exhaustive matrix of where RAG-powered agents shine, why they matter, and what they replace.

Use Case	What the Agent Does	Why RAG Matters	Typical ROI
Sales Agent (B2B services, agencies)	Answers complex service questions, qualifies leads, captures details, books calls.	Without RAG it guesses. With RAG it retrieves your site content, making answers accurate and trustworthy.	More qualified calls, fewer lost leads, faster response time (24/7).
Customer Support Agent	Resolves tickets, answers FAQs, guides troubleshooting, escalates when needed.	Support knowledge changes constantly. RAG retrieves the right help article and prevents hallucinations.	Ticket deflection, lower support load, faster resolution times.
Compliance / Legal / Insurance Agent	Retrieves policies, clauses, regulations, explains them, generates grounded responses.	These domains require traceable sources. RAG enables citations and safe reasoning.	Reduced risk, faster internal answers, fewer escalations to legal teams.
Healthcare Research Agent	Retrieves studies/guidelines, summarizes evidence, supports research workflows.	Recency and accuracy are critical. RAG pulls current sources instead of relying on model memory.	Faster research, better evidence summaries, improved consistency.
Finance / Market Intelligence Agent	Retrieves reports, filings, market updates, answers research questions with citations.	Finance data becomes outdated quickly. RAG fixes recency and improves credibility.	Analyst speed, better insights, less manual research work.
Internal Company Knowledge Agent	Answers questions from SOPs, onboarding docs, wikis, runbooks, policies.	Internal docs drift constantly. RAG lets you update knowledge without retraining a model.	Less time wasted searching docs, faster onboarding, fewer internal interruptions.
Engineering / Manufacturing Agent	Retrieves specs, manuals, incident reports, supports technicians & engineers.	Procedural accuracy matters. RAG grounds answers in approved documentation.	Reduced errors, faster troubleshooting, lower downtime.
Ecommerce Assistant (Product Recommendations)	Recommends products, compares items, answers shipping/spec questions, suggests bundles and upsells.	Without RAG it hallucinates features. With RAG it retrieves specs, FAQs, policies, and content from your catalog and HubDB.	Higher conversion rate, higher AOV, fewer returns, fewer support tickets.

Detailed Use Cases

If you’re wondering whether this applies to your business, here’s a deeper breakdown of who benefits most and why.

1) Service Businesses & Agencies: Turn Your Website Into a Sales Rep

If you sell services, your website visitors always ask the same things:

“Do you do this?”
“How does your process work?”
“How long does it take?”
“What does it cost?”
“Can you give examples?”

A static site can’t respond. A typical chatbot guesses. And a human sales team replies too late.

A RAG sales agent fixes that by:

retrieving the right service page/case study content,
answering confidently with real details,
qualifying intent, budget, and timeline,
capturing lead details,
and pushing the lead into HubSpot immediately.

Bottom line: you stop losing leads because your website becomes interactive and sales-ready 24/7.

2) Customer Support: Deflect Tickets Without Lying

Support is where “hallucinations” become expensive.

RAG-powered support agents excel because they can retrieve exact troubleshooting steps from your documentation and respond with accuracy. They can also safely say “I can’t find that” instead of improvising.

ROI: fewer tickets, faster resolutions, happier customers, and less burnout for the team.

3) Compliance, Legal, Insurance: Answers Must Be Traceable

These domains are high-stakes: wrong answers can trigger regulatory issues or legal exposure.

RAG matters here because it can ground answers in retrieved policies, contract clauses, and regulations. This is where citations and transparency become critical.

ROI: faster internal answers, reduced compliance risk, fewer escalations.

4) Healthcare & Clinical Research: Evidence, Not Opinions

Healthcare and clinical research require up-to-date evidence, not “whatever the model remembers.” RAG enables retrieval of relevant papers and guidelines before summarizing.

ROI: faster research workflows and more consistent summaries.

5) Finance & Market Research: Recency Wins

Finance is merciless about outdated data. Generic LLM answers can be wrong just because the world changed last week.

RAG agents retrieve recent filings, reports, and summaries before responding.

ROI: faster analysis, more grounded insight, less manual research.

6) Internal Company AI: The Most Underrated High-ROI Use Case

If your company has:

onboarding docs,
SOPs,
internal wikis,
runbooks,
product specs,
process documents,

… then you already have the knowledge base. Employees just can’t find it when they need it.

A RAG agent becomes an internal “expert helpdesk” that retrieves exactly the right section instantly.

ROI: faster onboarding, fewer interruptions, reduced “tribal knowledge” dependency.

7) Ecommerce: Product Recommendations, Upsells, and “AI Shopping Assistants”

Ecommerce is one of the most profitable RAG use cases because shoppers don’t ask polite, structured questions. They ask messy, human ones:

“I need a gift for my boyfriend, he likes hiking, budget is €80.”
“What’s the difference between these two models?”
“Which one is best for sensitive skin?”
“I bought X, what should I get next?”

Traditional site search and filter menus are great… if your visitor already knows what they want. Most don’t.

A RAG-powered ecommerce agent can:

recommend products based on intent, preferences, budget, and usage context
compare products using real product specs and descriptions
answer detailed questions (shipping, warranties, materials, ingredients)
reduce returns by matching the right product to the right customer
increase AOV via smart upsells and bundles

Why RAG matters for ecommerce

If your assistant uses only a generic LLM, it will hallucinate product features and make recommendations that sound good but aren’t true. That’s a refund magnet.

With RAG, the assistant retrieves product data from your catalog, your FAQs, and your policies, then generates recommendations grounded in real information.

HubSpot + HubDB: a powerful combo for ecommerce and product catalogs

If your store content is built on HubSpot CMS, you can store structured product data in HubDB (prices, SKUs, specs, categories, inventory status, links).

That enables a “best of both worlds” approach:

HubDB provides structured truth (price, category, tags, specs)
RAG provides natural-language understanding and semantic matching (use cases, benefits, comparisons)

In practice, the agent can retrieve from both:

HubDB for factual fields (price, variants, availability)
Pinecone for rich content (descriptions, reviews, guides, blog posts, FAQs)

Result: a shopping assistant that feels like a real salesperson, but never lies about your products.

ROI: higher conversion rate, higher AOV, fewer support tickets, fewer returns.

Why This Matters (and why we’re offering this now)

If you’re reading this and thinking “ok, cool… but is this really worth doing?” here’s the honest answer:

If you sell anything complex, and you lose leads because people don’t get answers fast enough, yes.

Because the winning advantage isn’t “AI.” It’s:

speed (instant answers)
accuracy (grounded in your content)
conversion (qualification + lead capture)

And if you self-host n8n, your ongoing costs stay low because you mostly pay for usage (OpenAI calls + vector retrieval), not a platform subscription and credit overages.

Limited Time Offer: We Build This for You for €999

One-time setup fee. Limited availability.

We’ll implement your AI sales agent end-to-end:

Self-hosted or connected n8n workflow setup
RAG knowledge base indexed from your website (sitemap-based)
Pinecone retrieval layer for accurate answers
HubSpot lead capture automation (create/update contacts)
Lead qualification flow: budget + timeline + intent
Testing, QA, and handoff

Result: A 24/7 sales assistant that answers like a pro, qualifies leads, and pushes them into HubSpot automatically.

If you want this live on your site, book a call now. We’ll show you a demo using your own content and tell you exactly what it would take to deploy.

Book a Call

Note: The €999 setup offer is limited-time and may be withdrawn once we hit capacity.

Alright, enough theory. If you want to build this yourself, here’s the exact blueprint we used, from a blank n8n canvas to a production-ready AI sales agent that retrieves answers from your website and saves leads into HubSpot.

You don’t need a full dev team, but you do need to follow the steps properly, because small mistakes (like indexing menus) turn “AI sales agent” into “expensive hallucination machine.” Let’s build it.

Part 1: The Core Brain, AI Agent Setup in n8n

We start with a blank n8n canvas. This is where your AI agent lives.

Step 1: Add the Chatbot trigger

Add the Chatbot Trigger node. This becomes the entry point for your website chat widget.

Operation: Respond to Webhook
Name the workflow: Main Chatbot Workflow

Step 2: Add the AI Agent node

Add an AI Agent node to your canvas and connect the Chatbot trigger to it.

Step 3: System message (persona + rules)

This is where most chatbots die. A system prompt is not copy, it’s policy. Here’s the version we used:

You are NAME OF YOUR AGENT, the COMPANY Sales Assistant.
Human, sharp, slightly witty, never cringe. Like a great agency rep.

### CORE KNOWLEDGE (ALWAYS TRUE - DO NOT SEARCH FOR THIS):
- Services: Paid Campaigns (Google Ads & Meta Ads), SEO, Web Design.
- Founders: NAME FOUNDERS (15+ years experience, Strategy & Growth).
- Manifesto: The AI-First Manifesto.
- Pricing: No fixed packages; custom scope, usually starts at €4k/month.

### INSTRUCTIONS:
- If the user asks about the above topics, answer directly from this Core Knowledge.
- When a user expresses interest in a project or booking a call, you MUST politely ask for their First Name, Email, Company Name, and Phone Number.
- Do NOT use the 'save_lead_to_hubspot' tool until you have collected ALL four pieces of information.
- Once you have ALL required lead details, use the 'save_lead_to_hubspot' tool to save their information.
- ONLY use the 'search_website' tool if the user asks for specific details NOT listed above.

Step 4: Connect the AI model

Add an OpenAI Chat Model node and connect it to the agent. Choose gpt-4o for quality or a lighter model for cost savings.

Step 5: Add memory

Add a Simple Memory node so the agent can retain context across turns.

Step 6: Respond to the chat widget

Add a Respond to Webhook node that outputs the agent response in JSON back to the widget.

Part 2: The Knowledge Base, Website Search with Pinecone (RAG)

This is where your agent stops guessing and starts quoting your website.

Step 1: Add embeddings

Add an OpenAI Embeddings node using text-embedding-3-small.

Step 2: Add Pinecone Vector Store as a tool

Add a Pinecone Vector Store node set to Retrieve Documents (As Tool for AI Agent). Set:

Limit: 15
Include Metadata: ON
Description: tell the agent when to use the tool and that it is the source of truth

Part 3: Index Your Website Into Pinecone (Scraper Workflow)

Create a separate workflow named Scraper - Website Index.

Workflow outline:

Fetch sitemap URLs
Fetch HTML per URL
Extract only main content (no nav/footer)
Chunk the content (e.g., 1000 chars, 100 overlap)
Create embeddings (text-embedding-3-small)
Insert into Pinecone

Important: clean HTML properly, otherwise you’ll embed menus and your agent will answer with “Home | About | Contact”.

Why We Used a Website Scraper (and What You Can Use Instead)

We’ll be honest: we didn’t start with a perfect, neatly organized knowledge base.

Like most businesses, our “knowledge” lived across a website, a few scattered documents, and whatever was in someone’s head. So we did the practical thing: we used our public website as the source of truth and built a scraper workflow to turn it into a searchable knowledge base.

That’s the beauty of RAG. You don’t need a formal knowledge base to start. You just need one reliable source of truth, and a way to extract, chunk, embed, and index it.

Website scraping is the fastest path to a working RAG pipeline

For most service businesses, the website already contains:

service descriptions
positioning and messaging
process and methodology
FAQs
case studies
pricing logic (even if it’s “custom”)

So scraping is the quickest way to ship a RAG-powered sales agent that answers accurately and stays up to date as your site evolves.

But you’re not limited to your website

Once the pipeline is working, you can ingest content from almost anywhere. The pattern stays the same:

Extract content
Clean it (remove noise and duplication)
Chunk it
Embed it
Index it into Pinecone

Here are common sources we build RAG pipelines from:

Google Drive (Docs, Sheets)
PDFs (decks, case studies, proposals, internal guides)
Airtable (structured FAQs, product catalog, internal knowledge)
Notion (wikis, SOPs, documentation)
Helpdesk systems (Zendesk, Intercom, HelpScout)
FAQ pages or internal Q&A documents

Why multi-source RAG is powerful

The website is great for sales. But the moment you add internal documents, you unlock much stronger use cases:

Better technical answers (from internal SOPs and detailed docs)
Stronger objection handling (from sales call notes and past proposals)
Real case study depth (from internal presentations and results)
Internal enablement (onboarding, training, process enforcement)

And because the knowledge lives outside the model, you can update it at any time without retraining anything. You simply re-run the ingestion workflow and your agent immediately has the latest knowledge.

Pro tip: Use separate namespaces for different data types

If you want your agent to behave well, don’t mix everything into one messy index.

We recommend using Pinecone namespaces such as:

website (public marketing content)
internal (SOPs, playbooks, processes)
sales (past proposals, scripts, objection handling)
case_studies (slides, results, proof)

This lets the agent retrieve from the right source depending on the user intent, and prevents “internal” answers from leaking into public conversations.

Bottom line: scraping your website is the fastest way to start. But the real power comes when you plug in the rest of your business knowledge and let the agent answer like it’s been working with you for years.

Part 4: Lead Capture, HubSpot Integration (The Hands)

This is where the agent becomes a system, not a toy.

Step 1: Create the HubSpot tool workflow

Create a workflow named Tool - Add HubSpot Lead with an Execute Workflow Trigger.

Inputs:

email (String)
firstname (String)
company (String)
phone (String)

Step 2: Create or update a contact

Add a HubSpot node to create/update a contact using a HubSpot Private App token with the correct scopes.

Step 3: Add the tool into the main agent workflow

Back in the main chatbot workflow, add a Call n8n Workflow Tool node and name it:

save_lead_to_hubspot

Description should enforce:

Only call when all required fields are collected
Send email, firstname, company, phone

Part 5: Testing Your AI Sales Agent

You can try our sales agent right here by clicking on the chat bottom right corner.

Test with a realistic flow:

User asks about services → agent answers
User shows intent → agent asks for contact details
User provides details → agent saves in HubSpot
Verify contact appears instantly

Common Mistakes (and how we avoided them)

Menu pollution: embedding nav/footer content ruins retrieval. Always extract main content.
Chunks too large: retrieval becomes vague. Use small chunks with overlap.
No lead validation: force the bot to collect required fields before saving.
No similarity threshold: low-quality retrieval leads to hallucination. Add a confidence cutoff.

Conclusion: This turns your website into a 24/7 sales engine

A standard chatbot is a guessing machine.

This is different. With RAG, your AI agent retrieves verified content from your website and responds with grounded answers. It qualifies leads, captures details, and routes them straight into HubSpot, automatically.

And if you self-host n8n, your ongoing costs are mostly usage-based: OpenAI calls, Pinecone retrieval, and a HubSpot API write. That’s why this setup stays cheap, even as it scales.

If you want this running on your site without wasting weeks on trial and error, we’ll build it for you.

Limited Time: €999 Setup

We’ll implement the full system, connect it to HubSpot, index your website, and deploy a production-ready AI sales agent on your site.

Self-hosted n8n setup
Pinecone RAG knowledge base
OpenAI agent + qualification flow
HubSpot lead capture automation

Want it? Book a call. We’ll show you a demo using your own website content.

Book a Call

View full post