Most AI chatbots are glorified search bars wrapped in a chat bubble. They hallucinate. They dodge questions. They frustrate the exact people they're supposed to help.
We built something different. Max is a free AI-powered HubSpot support assistant grounded in 1,600+ knowledge base articles: essentially our entire website sitemap plus all our internal knowledge.
Ask it about workflows, lifecycle stages, custom reports, HubDB, lead scoring, integrations — it gives you a real answer grounded in actual documentation. No login. No email gate. No paywall.
This is the full technical breakdown of how we built it, what decisions we made, and why. If you're considering building a support chatbot for your own product or service, this is your blueprint.
Running an agency where your customers use a complex product like HubSpot means fielding the same questions over and over. "How do I set up a workflow trigger?" "What's the difference between lifecycle stage and lead status?" "Can I create a custom report that does X?"
The answers exist. They're scattered across the Internet in knowledge bases, Reddit threads, forums, ebooks, and websites: thousands of articles, constantly updated, organized in ways that make them hard to find. We needed an assistant that could surface the right answer instantly, based on the actual documentation, not on whatever an LLM hallucinated from its training data.
The requirements were simple: answers grounded in the actual documentation, responses fast enough that nobody waits, and no login, email gate, or paywall in the way.
The first step is getting your documentation into the pipeline. We built an automated n8n workflow that ingests content from multiple source formats — web pages, PDFs, EPUBs, internal docs, help articles, FAQs, SOPs — whatever we had and whatever is available on the Internet.
The ingestion workflow handles fetching each source, extracting clean text, and normalizing it before chunking.
For Max, we ingested 1,600+ resources covering every product, feature, and workflow. The full ingestion takes roughly 4 hours for a knowledge base of this size.
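In production this runs as an n8n workflow, but the core fetch-and-extract step for the web-page path is simple enough to sketch in a few lines of Python. This is illustrative, not a prescription (PDFs and EPUBs need their own extractors):

```python
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    """Fetch one knowledge base page and reduce it to clean plain text."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Strip scripts, navigation, and footers so they don't pollute the chunks.
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)
```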
You can't embed an entire article as a single vector and expect precise answers. Articles range from 500 to 5,000+ words; retrieval needs small, focused chunks of text to surface the most relevant information.
We use a Recursive Character Text Splitter that breaks each article into chunks of approximately 2,000 characters with 200 characters of overlap. The overlap ensures that context isn't lost at chunk boundaries — if a sentence spans two chunks, both chunks contain the full sentence.
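In n8n this is a single node, but the same splitter exists as a library. A minimal sketch using LangChain's implementation, reusing the fetch helper above (the URL is a placeholder):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,    # ~2,000 characters per chunk
    chunk_overlap=200,  # 200-character overlap preserves context at boundaries
)

article_text = fetch_article_text("https://knowledge.hubspot.com/workflows/...")
chunks = splitter.split_text(article_text)
```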
Each chunk gets transformed into a vector embedding using OpenAI's text-embedding-3-small model. This converts the text into a 1,536-dimensional numerical representation that captures its semantic meaning. Two chunks about "setting up HubSpot workflows" will have similar vectors, even if they use different words.
Why text-embedding-3-small instead of the larger model? Cost and speed. For a knowledge base of this size, the smaller model delivers excellent retrieval accuracy at a fraction of the cost. The quality difference is negligible for support documentation.
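The embedding call itself is a one-liner against the OpenAI API; batching all of an article's chunks into one request keeps it fast. Continuing from the chunks above:

```python
from openai import OpenAI

openai_client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=chunks,  # all chunks of one article in a single batched request
)
vectors = [item.embedding for item in response.data]
assert len(vectors[0]) == 1536  # the model's output dimensionality
```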
The embedded chunks need to live somewhere they can be searched fast. We chose Qdrant — an open-source vector database that we self-host on our own infrastructure.
Why Qdrant over Pinecone, Weaviate, or ChromaDB? Control and cost. Qdrant is open source and runs self-hosted on our own infrastructure, which means no SaaS subscription and no vendor lock-in.
The Qdrant collection is configured with cosine distance, the similarity metric that works best for normalized text embeddings.
Each stored point contains the text chunk, the embedding vector, and metadata including the source URL, article title, and category.
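Wiring that up with the official qdrant-client looks roughly like this. The collection name and payload values are illustrative; everything else mirrors the configuration described above:

```python
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

qdrant = QdrantClient(url="http://localhost:6333")  # self-hosted instance

# One-time setup: 1,536-dimensional vectors compared by cosine similarity.
qdrant.create_collection(
    collection_name="hubspot_kb",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Store each chunk with its vector and the metadata used for citations.
qdrant.upsert(
    collection_name="hubspot_kb",
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=vec,
            payload={
                "text": chunk,
                "source_url": "https://knowledge.hubspot.com/workflows/...",
                "title": "Create workflows",
                "category": "Workflows",
            },
        )
        for chunk, vec in zip(chunks, vectors)
    ],
)
```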
RAG stands for Retrieval-Augmented Generation. It's the architecture that makes the chatbot accurate instead of hallucinatory.
Here's exactly what happens when a user asks a question:
1. The question is embedded with the same text-embedding-3-small model used during ingestion.
2. Qdrant runs a cosine-similarity search and returns the most relevant chunks.
3. Those chunks, along with their source metadata, are injected into the prompt as context.
4. The LLM generates an answer grounded in the retrieved documentation rather than in its training data.
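In code, retrieval is just an embedding call followed by a vector search. A minimal sketch reusing the clients from the sections above:

```python
def retrieve(question: str, k: int = 5) -> list[dict]:
    """Embed the question and return the payloads of the k closest chunks."""
    q_vec = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding
    hits = qdrant.search(
        collection_name="hubspot_kb",
        query_vector=q_vec,
        limit=k,
    )
    return [hit.payload for hit in hits]

context = "\n\n".join(
    p["text"] for p in retrieve("How do I set up a workflow trigger?")
)
```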
This is fundamentally different from a standard chatbot. A standard ChatGPT wrapper generates answers from whatever it learned during training — which might be outdated, incomplete, or simply wrong. A RAG-powered chatbot generates answers from your actual, current documentation. If HubSpot updates their UI tomorrow, we re-ingest, re-embed, and the chatbot knows about the change.
We're model-agnostic. The RAG architecture works with any LLM, and we evaluated several options for Max before settling on Grok. The reasoning: support questions need fast answers. Users won't wait 10 seconds. Grok delivers sub-3-second responses with quality that's more than sufficient for documentation-based Q&A.
The model doesn't need to be creative or philosophical — it needs to accurately relay what's in the retrieved documentation.
For clients who need absolute data privacy, we can deploy open-source models on dedicated infrastructure. The LLM is a swappable component. Change the model, keep everything else.
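Because Grok, like most hosted models, speaks an OpenAI-compatible API, swapping the model really is a configuration change. A sketch (the base URL is xAI's OpenAI-compatible endpoint; the model id is a placeholder you'd replace with a current one):

```python
import os

from openai import OpenAI

llm = OpenAI(
    base_url="https://api.x.ai/v1",   # xAI's OpenAI-compatible endpoint
    api_key=os.environ["XAI_API_KEY"],
)

completion = llm.chat.completions.create(
    model="grok-beta",  # placeholder model id: swap it, keep everything else
    messages=[
        {"role": "system", "content": "Answer only from the provided documentation."},
        {"role": "user", "content": f"Documentation:\n{context}\n\n"
                                    "Question: How do I set up a workflow trigger?"},
    ],
)
print(completion.choices[0].message.content)
```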
Max runs on a custom chat interface embedded directly on our HubSpot website. Not an off-the-shelf widget. Not a third-party iframe. A purpose-built frontend that matches our brand and gives us full control over the user experience.
The interface handles the entire conversation experience, and because we built it ourselves, every detail of it stays under our control.
The entire pipeline is orchestrated by n8n, an open-source workflow automation platform we self-host. n8n connects every component: ingestion, chunking, embedding, Qdrant retrieval, the LLM, and the chat interface.
Why n8n over building custom code? Speed of iteration. We can modify the system prompt, swap the LLM, adjust retrieval parameters, or add new tools without touching code. The visual workflow makes it easy to debug — click on any node, see exactly what went in and what came out.
A chatbot trained on stale documentation is worse than no chatbot at all. We built an automated ingestion pipeline that keeps the knowledge base current.
The update workflow monitors source content and processes changes automatically: changed pages are re-scraped, re-chunked, and re-embedded, and their stale vectors are replaced in Qdrant. The sketch below shows the core logic.
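Conceptually, the change detection boils down to hashing. Here's a sketch; the crawl and lookup helpers are hypothetical stand-ins for the actual n8n nodes:

```python
import hashlib

def content_hash(text: str) -> str:
    """Fingerprint a page so unchanged content is never re-embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

for url in list_sitemap_urls():                  # hypothetical: enumerate sources
    text = fetch_article_text(url)
    if content_hash(text) == stored_hash(url):   # hypothetical: last known hash
        continue                                 # unchanged, skip it
    delete_points_for(url)                       # hypothetical: drop stale vectors
    ingest(url, text)                            # re-chunk, re-embed, upsert fresh
    save_hash(url, content_hash(text))           # hypothetical: persist new hash
```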
This same architecture powers our ebook knowledge base: a separate Qdrant collection (1,690+ vector points) built from 10 of our own HubSpot ebooks that we send to customers during onboarding.
Here's everything that powers Max, laid out plainly:
- Orchestration: n8n, self-hosted
- Vector database: Qdrant, self-hosted
- Embeddings: OpenAI text-embedding-3-small
- LLM: Grok
- Frontend: custom chat interface embedded on our HubSpot website
- Hosting: a single VPS
Total monthly infrastructure cost: under €30. That covers the VPS, the API calls for embeddings and LLM inference, and nothing else.
No SaaS subscriptions. No per-seat licensing. No vendor lock-in.
A few things became clear during the build that weren't obvious at the start. The biggest: the architecture matters far more than which model sits behind it.
Max is a showcase. It demonstrates what a production-grade AI support chatbot looks like when it's built on real architecture instead of duct-taped onto a ChatGPT wrapper.
The same pipeline works for any business with a knowledge base. SaaS products, professional services, e-commerce, healthcare, fintech — if you have documentation, FAQs, or support articles, you can have an AI assistant that actually knows your product.
What changes per client is the data source, the LLM selection, the branding, and the compliance requirements.
The architecture stays the same. The pipeline stays the same. The result stays the same: instant, accurate answers grounded in your actual content.
We typically deploy a production chatbot in 1-2 days depending on the client's tech stack and content volume. That includes scraping and ingesting the knowledge base, configuring the RAG pipeline, building the chat interface, testing retrieval quality, and tuning the system prompt.
If your support team is drowning in repetitive tickets, or your customers are waiting hours for answers that already exist in your documentation, this is the fix. Not a chatbot that guesses. A chatbot that knows.
Right now Max handles each conversation independently — it doesn't remember you from last time.
That's fine for one-off support questions, but we want to go further.
We're experimenting with persistent memory layers like Supermemory that would let Max remember your HubSpot setup, your subscription tier, the issues you've dealt with before, and the solutions that worked.
Imagine a support assistant that already knows your tech stack when you come back with a new question.
That's where we're headed.
Try Max for free and see for yourself. Then book a strategy call if you want one for your business.