Two-Step RAG System for Knowledge Base Retrieval
This project is a production-grade Retrieval-Augmented Generation (RAG) system with a two-step retrieval architecture, designed to deliver highly accurate answers from a company's support documentation.

Overview
The system automatically ingests articles from Zendesk Help Center, processes the content through an AI embedding pipeline, and stores structured data inside Supabase PostgreSQL with pgvector vector search.
Instead of performing a simple vector search on document chunks, the system implements a two-phase retrieval strategy:
First retrieving the most relevant documents using summary embeddings
Then searching within the chunks of those documents to extract the most relevant context for the AI model
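The two-phase strategy can be sketched with a toy in-memory corpus. The hand-made 2-D vectors below stand in for real OpenAI embeddings, and the document names are invented for illustration; the production system runs the same logic inside pgvector.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy corpus: summary embedding per document, plus embedded chunks.
documents = {
    "doc-billing": {
        "summary_embedding": [1.0, 0.0],
        "chunks": [
            ("How to update a credit card", [0.9, 0.1]),
            ("Refund policy details", [0.8, 0.2]),
        ],
    },
    "doc-login": {
        "summary_embedding": [0.0, 1.0],
        "chunks": [
            ("Resetting your password", [0.1, 0.9]),
            ("Two-factor authentication setup", [0.2, 0.8]),
        ],
    },
}

def two_step_retrieve(query_embedding, top_docs=1, top_chunks=2):
    # Step 1: rank whole documents by summary-embedding similarity.
    ranked_docs = sorted(
        documents.items(),
        key=lambda kv: cosine(query_embedding, kv[1]["summary_embedding"]),
        reverse=True,
    )[:top_docs]
    # Step 2: rank only the chunks belonging to the selected documents.
    candidates = [
        (doc_id, text, cosine(query_embedding, emb))
        for doc_id, doc in ranked_docs
        for text, emb in doc["chunks"]
    ]
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:top_chunks]

results = two_step_retrieve([0.15, 0.85])  # a "login-like" query
```

Because step 2 never scores chunks from documents eliminated in step 1, unrelated passages cannot leak into the context window even if they happen to be superficially similar to the query.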
The entire architecture is orchestrated through n8n workflows, enabling automated ingestion, incremental updates, and scalable AI retrieval.
The goal of this project was to build a scalable AI-powered knowledge retrieval system capable of answering user questions using the company's official Zendesk documentation.
The system continuously ingests knowledge articles using the Zendesk Help Center API. Instead of reprocessing the entire knowledge base, the pipeline uses a cursor-based timestamp mechanism to detect only newly created or updated articles.
This incremental ingestion strategy significantly reduces processing time and API usage.
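The cursor mechanism reduces to a simple filter-and-advance step, sketched below with stdlib Python. The article dicts mirror the `updated_at` field returned by the Zendesk Help Center articles endpoint; the cursor value itself would be persisted between n8n runs.

```python
from datetime import datetime, timezone

def detect_changes(articles, cursor):
    """Return (articles newer than cursor, advanced cursor).

    `articles` is a list of dicts with an ISO-8601 `updated_at` field;
    `cursor` is the timestamp stored after the previous run.
    """
    changed = [
        a for a in articles
        if datetime.fromisoformat(a["updated_at"]) > cursor
    ]
    new_cursor = max(
        (datetime.fromisoformat(a["updated_at"]) for a in changed),
        default=cursor,  # nothing changed: keep the old cursor
    )
    return changed, new_cursor

cursor = datetime(2024, 5, 1, tzinfo=timezone.utc)
articles = [
    {"id": 1, "updated_at": "2024-04-20T10:00:00+00:00"},  # untouched since last run
    {"id": 2, "updated_at": "2024-05-03T08:30:00+00:00"},  # updated -> reprocess
]
changed, cursor = detect_changes(articles, cursor)
```

Only article 2 is re-embedded, and the cursor advances to its `updated_at`, so the next run starts from there.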
During ingestion, the system also applies several validation rules:
Draft articles are automatically excluded
Content hashing prevents duplicate insertions
Chunk-level deduplication ensures storage efficiency
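These rules can be sketched as follows. The `draft` flag is part of the Zendesk article object; the whitespace normalisation inside the hash is an assumed detail, and in the real pipeline the hash lives in a unique Postgres column rather than an in-memory set.

```python
import hashlib

def should_ingest(article):
    # Zendesk article objects carry a `draft` flag; drafts are skipped.
    return not article.get("draft", False)

def content_hash(text):
    # Normalise whitespace so cosmetic edits do not defeat deduplication.
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()

seen = set()  # stand-in for a UNIQUE hash column in Postgres

def insert_chunk(chunk_text):
    """Store a chunk unless an identical one already exists; True if stored."""
    h = content_hash(chunk_text)
    if h in seen:
        return False
    seen.add(h)
    return True
```

Hashing before embedding means a duplicate chunk is rejected without ever spending an OpenAI API call on it.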
Each document is processed through an AI pipeline that generates both document summaries and semantic embeddings using the OpenAI API.
The architecture separates embeddings into two levels:
Document-level embeddings
Used to identify the most relevant articles.
Chunk-level embeddings
Used to extract the exact passages needed to answer the user query.
All embeddings and metadata are stored in Supabase PostgreSQL using pgvector, enabling fast semantic search directly inside the database.
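The two-level data model can be sketched as the rows handed to Supabase. The `embed` stub below is a deterministic offline stand-in for the OpenAI embeddings call, and the column names are illustrative assumptions, not the project's actual schema.

```python
import hashlib

def embed(text):
    # Stand-in for the OpenAI embeddings API; a deterministic toy vector
    # derived from a hash so this sketch runs offline.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]

def build_rows(article_id, summary, chunks):
    """Produce one document row and N chunk rows ready for pgvector storage."""
    doc_row = {
        "article_id": article_id,
        "summary": summary,
        "summary_embedding": embed(summary),  # document-level vector
    }
    chunk_rows = [
        {"article_id": article_id, "chunk_index": i,
         "content": c, "embedding": embed(c)}  # chunk-level vectors
        for i, c in enumerate(chunks)
    ]
    return doc_row, chunk_rows
```

Keeping the summary vector in its own row lets step 1 of the retrieval scan one compact vector per article instead of every chunk.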
Custom PostgreSQL RPC functions perform vector similarity searches and are triggered through n8n workflows, which orchestrate the retrieval process and pass the relevant context to the AI model.
Tools Used / Stack
Automation & Orchestration
n8n
Knowledge Source
Zendesk Help Center API
AI & Embeddings
OpenAI API
OpenAI Embedding Models
Database & Vector Search
Supabase
PostgreSQL
pgvector
Backend Logic
PostgreSQL RPC Functions
REST API integrations
Key Features
Automated Zendesk Knowledge Ingestion
Articles are automatically extracted from Zendesk Help Center and converted into AI-readable knowledge through a fully automated pipeline.
Incremental Knowledge Sync
The system uses a cursor-based timestamp to detect newly created or updated articles, preventing unnecessary reprocessing of the entire knowledge base.
Draft Content Filtering
Articles marked as draft in Zendesk are automatically excluded, ensuring that only verified documentation is available to the AI system.
Hash-Based Deduplication
To maintain data integrity and avoid redundant storage, the pipeline implements content hashing mechanisms.
Two levels of deduplication are used:
Document-level hash
Prevents duplicate article insertions.
Chunk-level hash
Ensures that identical text chunks are not stored multiple times.
This significantly improves database efficiency and prevents embedding duplication.
AI Embedding Pipeline
Each document is processed through the OpenAI API to generate semantic vector embeddings used for similarity search.
The pipeline generates two types of embeddings:
Document summary embeddings
Each article is summarized and embedded to represent the overall meaning of the document.
Chunk embeddings
Articles are split into smaller semantic chunks which are embedded individually.
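A minimal chunker is sketched below. The greedy word window with overlap is an assumed strategy standing in for the pipeline's actual semantic chunking; the window sizes are illustrative.

```python
def split_into_chunks(text, max_words=120, overlap=20):
    """Greedy word-window chunker: fixed-size windows that overlap so a
    sentence straddling a boundary still appears whole in one chunk."""
    words = text.split()
    if not words:
        return []
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # step back by `overlap` words
    return chunks

chunks = split_into_chunks("word " * 300, max_words=120, overlap=20)
```

A 300-word article yields three overlapping chunks here; each is embedded and stored individually alongside its parent article's ID.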
Two-Step Retrieval Architecture
Instead of performing a direct chunk search across the entire database, the system uses a two-stage retrieval strategy that improves relevance and performance.
Step 1 — Document Retrieval
The system first searches using summary embeddings, identifying the most relevant documents related to the user's query.
Step 2 — Chunk Retrieval
Once the relevant documents are identified, the system searches only within the chunks belonging to those documents, retrieving the most relevant passages to construct the final AI response.
This architecture dramatically improves retrieval accuracy and reduces noise from unrelated documents.
PostgreSQL RPC Retrieval Functions
All vector searches are executed through custom PostgreSQL RPC functions, enabling efficient similarity search directly inside the database.
These functions are triggered via n8n workflows, which manage the entire interaction between the database and the AI model.
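Supabase exposes Postgres functions over PostgREST as `POST /rest/v1/rpc/<function>`, which is what an n8n HTTP Request node calls. The sketch below only builds that request; the project URL, key, and the `match_chunks` function name are hypothetical placeholders, not the project's real identifiers.

```python
import json

def build_rpc_request(base_url, anon_key, fn_name, args):
    """Assemble the HTTP request PostgREST expects for a Postgres RPC:
    POST /rest/v1/rpc/<fn> with a JSON body of named arguments."""
    return {
        "method": "POST",
        "url": f"{base_url}/rest/v1/rpc/{fn_name}",
        "headers": {
            "apikey": anon_key,
            "Authorization": f"Bearer {anon_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(args),
    }

req = build_rpc_request(
    "https://example-project.supabase.co",  # hypothetical project URL
    "anon-key",                             # placeholder credential
    "match_chunks",                         # hypothetical RPC name
    {"query_embedding": [0.1, 0.2], "doc_ids": [1, 2], "match_count": 5},
)
```

Passing the step-1 document IDs (`doc_ids` here) into the chunk-search function is what confines step 2 to the already-selected documents.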
Fully Automated Workflow Orchestration
The entire pipeline — ingestion, embeddings generation, deduplication, and retrieval — is orchestrated using n8n, creating a maintainable and scalable automation system.
Outcome
The result is a production-ready AI knowledge retrieval system capable of delivering highly relevant answers based on a company’s official documentation.
By combining:
Zendesk knowledge ingestion
OpenAI embeddings
Supabase vector search
PostgreSQL RPC functions
n8n workflow orchestration
the system maintains an always-updated AI knowledge base with minimal operational cost.
The two-step retrieval architecture significantly improves answer relevance compared to traditional RAG systems, making the solution suitable for:
AI customer support assistants
Internal knowledge copilots
Automated help center chatbots
AI-powered documentation search systems




