Two-Step RAG System for Knowledge Base Retrieval
This project is a production-grade Retrieval-Augmented Generation (RAG) system with a two-step retrieval architecture, designed to deliver highly accurate answers from a company's support documentation.

Overview
The system automatically ingests articles from Zendesk Help Center, processes the content through an AI embedding pipeline, and stores structured data inside Supabase PostgreSQL with pgvector vector search.
Instead of performing a simple vector search on document chunks, the system implements a two-phase retrieval strategy:
First retrieving the most relevant documents using summary embeddings
Then searching within the chunks of those documents to extract the most relevant context for the AI model
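The two-phase strategy can be sketched with a toy in-memory corpus. The hand-made 2-D vectors below stand in for real OpenAI embeddings, and the document names are invented for illustration; the production system runs the same logic inside pgvector.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy corpus: summary embedding per document, plus embedded chunks.
documents = {
    "doc-billing": {
        "summary_embedding": [1.0, 0.0],
        "chunks": [
            ("How to update a credit card", [0.9, 0.1]),
            ("Refund policy details", [0.8, 0.2]),
        ],
    },
    "doc-login": {
        "summary_embedding": [0.0, 1.0],
        "chunks": [
            ("Resetting your password", [0.1, 0.9]),
            ("Two-factor authentication setup", [0.2, 0.8]),
        ],
    },
}

def two_step_retrieve(query_embedding, top_docs=1, top_chunks=2):
    # Step 1: rank whole documents by summary-embedding similarity.
    ranked_docs = sorted(
        documents.items(),
        key=lambda kv: cosine(query_embedding, kv[1]["summary_embedding"]),
        reverse=True,
    )[:top_docs]
    # Step 2: rank only the chunks belonging to the selected documents.
    candidates = [
        (doc_id, text, cosine(query_embedding, emb))
        for doc_id, doc in ranked_docs
        for text, emb in doc["chunks"]
    ]
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:top_chunks]

results = two_step_retrieve([0.15, 0.85])  # a "login-like" query
```

Because step 2 never scores chunks from documents eliminated in step 1, unrelated passages cannot leak into the context window even if they happen to be superficially similar to the query.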
The entire architecture is orchestrated through n8n workflows, enabling automated ingestion, incremental updates, and scalable AI retrieval.
The goal of this project was to build a scalable AI-powered knowledge retrieval system capable of answering user questions using the company's official Zendesk documentation.
The system continuously ingests knowledge articles using the Zendesk Help Center API. Instead of reprocessing the entire knowledge base, the pipeline uses a cursor-based timestamp mechanism to detect only newly created or updated articles.
This incremental ingestion strategy significantly reduces processing time and API usage.
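The cursor mechanism reduces to a simple filter-and-advance step, sketched below with stdlib Python. The article dicts mirror the `updated_at` field returned by the Zendesk Help Center articles endpoint; the cursor value itself would be persisted between n8n runs.

```python
from datetime import datetime, timezone

def detect_changes(articles, cursor):
    """Return (articles newer than cursor, advanced cursor).

    `articles` is a list of dicts with an ISO-8601 `updated_at` field;
    `cursor` is the timestamp stored after the previous run.
    """
    changed = [
        a for a in articles
        if datetime.fromisoformat(a["updated_at"]) > cursor
    ]
    new_cursor = max(
        (datetime.fromisoformat(a["updated_at"]) for a in changed),
        default=cursor,  # nothing changed: keep the old cursor
    )
    return changed, new_cursor

cursor = datetime(2024, 5, 1, tzinfo=timezone.utc)
articles = [
    {"id": 1, "updated_at": "2024-04-20T10:00:00+00:00"},  # untouched since last run
    {"id": 2, "updated_at": "2024-05-03T08:30:00+00:00"},  # updated -> reprocess
]
changed, cursor = detect_changes(articles, cursor)
```

Only article 2 is re-embedded, and the cursor advances to its `updated_at`, so the next run starts from there.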
During ingestion, the system also applies several validation rules:
Draft articles are automatically excluded
Content hashing prevents duplicate insertions
Chunk-level deduplication ensures storage efficiency
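These rules can be sketched as follows. The `draft` flag is part of the Zendesk article object; the whitespace normalisation inside the hash is an assumed detail, and in the real pipeline the hash lives in a unique Postgres column rather than an in-memory set.

```python
import hashlib

def should_ingest(article):
    # Zendesk article objects carry a `draft` flag; drafts are skipped.
    return not article.get("draft", False)

def content_hash(text):
    # Normalise whitespace so cosmetic edits do not defeat deduplication.
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()

seen = set()  # stand-in for a UNIQUE hash column in Postgres

def insert_chunk(chunk_text):
    """Store a chunk unless an identical one already exists; True if stored."""
    h = content_hash(chunk_text)
    if h in seen:
        return False
    seen.add(h)
    return True
```

Hashing before embedding means a duplicate chunk is rejected without ever spending an OpenAI API call on it.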
Each document is processed through an AI pipeline that generates both document summaries and semantic embeddings using the OpenAI API.
The architecture separates embeddings into two levels:
Document-level embeddings
Used to identify the most relevant articles.
Chunk-level embeddings
Used to extract the exact passages needed to answer the user query.
All embeddings and metadata are stored in Supabase PostgreSQL using pgvector, enabling fast semantic search directly inside the database.
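The two-level data model can be sketched as the rows handed to Supabase. The `embed` stub below is a deterministic offline stand-in for the OpenAI embeddings call, and the column names are illustrative assumptions, not the project's actual schema.

```python
import hashlib

def embed(text):
    # Stand-in for the OpenAI embeddings API; a deterministic toy vector
    # derived from a hash so this sketch runs offline.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]

def build_rows(article_id, summary, chunks):
    """Produce one document row and N chunk rows ready for pgvector storage."""
    doc_row = {
        "article_id": article_id,
        "summary": summary,
        "summary_embedding": embed(summary),  # document-level vector
    }
    chunk_rows = [
        {"article_id": article_id, "chunk_index": i,
         "content": c, "embedding": embed(c)}  # chunk-level vectors
        for i, c in enumerate(chunks)
    ]
    return doc_row, chunk_rows
```

Keeping the summary vector in its own row lets step 1 of the retrieval scan one compact vector per article instead of every chunk.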
Custom PostgreSQL RPC functions perform vector similarity searches and are triggered through n8n workflows, which orchestrate the retrieval process and pass the relevant context to the AI model.
Tools Used / Stack
Automation & Orchestration
n8n
Knowledge Source
Zendesk Help Center API
AI & Embeddings
OpenAI API
OpenAI Embedding Models
Database & Vector Search
Supabase
PostgreSQL
pgvector
Backend Logic
PostgreSQL RPC Functions
REST API integrations
Key Features
Automated Zendesk Knowledge Ingestion
Articles are automatically extracted from Zendesk Help Center and converted into AI-readable knowledge through a fully automated pipeline.
Incremental Knowledge Sync
The system uses a cursor-based timestamp to detect newly created or updated articles, preventing unnecessary reprocessing of the entire knowledge base.
Draft Content Filtering
Articles marked as draft in Zendesk are automatically excluded, ensuring that only verified documentation is available to the AI system.
Hash-Based Deduplication
To maintain data integrity and avoid redundant storage, the pipeline implements content hashing mechanisms.
Two levels of deduplication are used:
Document-level hash
Prevents duplicate article insertions.
Chunk-level hash
Ensures that identical text chunks are not stored multiple times.
This significantly improves database efficiency and prevents embedding duplication.
AI Embedding Pipeline
Each document is processed through the OpenAI API to generate semantic vector embeddings used for similarity search.
The pipeline generates two types of embeddings:
Document summary embeddings
Each article is summarized and embedded to represent the overall meaning of the document.
Chunk embeddings
Articles are split into smaller semantic chunks which are embedded individually.
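A minimal chunker is sketched below. The greedy word window with overlap is an assumed strategy standing in for the pipeline's actual semantic chunking; the window sizes are illustrative.

```python
def split_into_chunks(text, max_words=120, overlap=20):
    """Greedy word-window chunker: fixed-size windows that overlap so a
    sentence straddling a boundary still appears whole in one chunk."""
    words = text.split()
    if not words:
        return []
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # step back by `overlap` words
    return chunks

chunks = split_into_chunks("word " * 300, max_words=120, overlap=20)
```

A 300-word article yields three overlapping chunks here; each is embedded and stored individually alongside its parent article's ID.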
Two-Step Retrieval Architecture
Instead of performing a direct chunk search across the entire database, the system uses a two-stage retrieval strategy that improves relevance and performance.
Step 1 — Document Retrieval
The system first searches using summary embeddings, identifying the most relevant documents related to the user's query.
Step 2 — Chunk Retrieval
Once the relevant documents are identified, the system searches only within the chunks belonging to those documents, retrieving the most relevant passages to construct the final AI response.
This architecture dramatically improves retrieval accuracy and reduces noise from unrelated documents.
PostgreSQL RPC Retrieval Functions
All vector searches are executed through custom PostgreSQL RPC functions, enabling efficient similarity search directly inside the database.
These functions are triggered via n8n workflows, which manage the entire interaction between the database and the AI model.
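Supabase exposes Postgres functions over PostgREST as `POST /rest/v1/rpc/<function>`, which is what an n8n HTTP Request node calls. The sketch below only builds that request; the project URL, key, and the `match_chunks` function name are hypothetical placeholders, not the project's real identifiers.

```python
import json

def build_rpc_request(base_url, anon_key, fn_name, args):
    """Assemble the HTTP request PostgREST expects for a Postgres RPC:
    POST /rest/v1/rpc/<fn> with a JSON body of named arguments."""
    return {
        "method": "POST",
        "url": f"{base_url}/rest/v1/rpc/{fn_name}",
        "headers": {
            "apikey": anon_key,
            "Authorization": f"Bearer {anon_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(args),
    }

req = build_rpc_request(
    "https://example-project.supabase.co",  # hypothetical project URL
    "anon-key",                             # placeholder credential
    "match_chunks",                         # hypothetical RPC name
    {"query_embedding": [0.1, 0.2], "doc_ids": [1, 2], "match_count": 5},
)
```

Passing the step-1 document IDs (`doc_ids` here) into the chunk-search function is what confines step 2 to the already-selected documents.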
Fully Automated Workflow Orchestration
The entire pipeline — ingestion, embeddings generation, deduplication, and retrieval — is orchestrated using n8n, creating a maintainable and scalable automation system.
Outcome
The result is a production-ready AI knowledge retrieval system capable of delivering highly relevant answers based on a company’s official documentation.
By combining:
Zendesk knowledge ingestion
OpenAI embeddings
Supabase vector search
PostgreSQL RPC functions
n8n workflow orchestration
the system maintains an always-updated AI knowledge base with minimal operational cost.
The two-step retrieval architecture significantly improves answer relevance compared to traditional RAG systems, making the solution suitable for:
AI customer support assistants
Internal knowledge copilots
Automated help center chatbots
AI-powered documentation search systems




