Chatbot Integration

Introduction

This brief writeup documents some interesting development issues in a very real-world kind of application. It also serves a second purpose: these writeups (and other documents) comprise the retrieval-augmented generation (RAG) corpus used to ground the chatbot's responses, so writing this one lets the chatbot "know" about itself. I've named the chatbot "Jay" and given it a Steller's Jay theme, in honor of the West Coast.

The "Jay" chatbot is an off-the-shelf integration with Google Gemini via the Google Cloud Platform Vertex AI service. Interestingly, even such integrations can display interesting issues that provide an opportunity for research and problem-solving.

First, some background. Vertex AI is Google Cloud Platform's current AI brand, with a variety of products and services under its umbrella. These include API access to models like Gemini 3.0 Flash (which powers the chatbot), with baked-in integration of GCP's basic RAG implementation, RAG Engine. It is an easy-to-use system: it ingests documents from a GCP storage bucket and offers basic options for chunk size and overlap, along with a choice of chunk-boundary detection methods that includes an LLM parser option.
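The chunk size and overlap knobs are easiest to picture with a toy example. The sketch below is purely illustrative and is not RAG Engine's actual implementation: a simple sliding-window chunker where each chunk shares a fixed number of characters with its predecessor, which is the basic idea behind the two settings the service exposes.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk (illustrative only)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1000-character document with chunk_size=400 and overlap=100
# yields windows starting at positions 0, 300, 600, and 900.
chunks = chunk_text("x" * 1000, chunk_size=400, overlap=100)
```

The overlap exists so that a sentence falling on a chunk boundary still appears whole in at least one chunk; the LLM-parser option RAG Engine offers is a smarter way of picking those boundaries than a fixed window.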

The remaining choices are an embedding model (English-oriented or multilingual) and a vector store. Here the default RAG store, the enjoyably named "RagManaged Cloud Spanner", was chosen for its simplicity and perfectly adequate functionality for the small corpus backing the site chatbot. And then it just works. Beautiful.

API Design

The Problem