Site Design and RAG Chatbot Integration
Spoiler alert: these pages are used as the RAG corpus for the chatbot, and this page in particular was created to provide self-referential grounding.
Introduction
Software engineering tasks that used to take weeks can now take minutes. As the value of raw coding decreases, other skills become more important. Coming from a background in requirements and integration, and having dealt with the demands of full-stack development for a customer-facing web portal, I've found the shift toward the "AI-Native Fullstack Developer" role fascinating.
This is a space to discuss my observations on these new tools, and explain my perspective on their use and potential pitfalls, in addition to documenting the site itself. I'll start with a meme that found me on the internet:
It's not that bad, assuming one gives the system clear requirements and limited scope. And maybe we didn't think of adding fins to the car, but once they're there we realize they work. Given the broad training of coding assistants and the recent strong increase in their quality, allowing our suddenly-trusted advisors some leeway to follow their bliss is, if nothing else, a valid form of research into architectural variations we may not have considered.
AI Native Software Development for this Site (and in general)
The title "AI-Native Software Developer" has been appearing in job listings recently. We'll take it to mean using AI assistants in a ground-up fashion. That's how this site was created. While letting agents run with a vague idea can be interesting, this site was built with a more standard research-requirements-implementation-testing cycle.
Of course there are nuances. Agents may be spec'd and spun off for particular tasks like test and documentation writing, and a few turns of "loading" the context with framing questions before setting the assistant loose helps maintain quality. But this is the general iterative workflow I use.
One practice I've found helpful in generating Python, and one I'd like to share, relates to the comic above. First passes tend to overbuild the code, creating a profusion of fragmentary one-line functions. Remediation requires reviewing what was done, trimming and reframing requirements to make the system more efficient, and providing guidance such as "use a functional, minimalist style," which seems to be something of a magic incantation for improving the readability of LLM-generated code.
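As a hypothetical illustration (all names here are invented, not from a real session), this is the kind of fragmented first pass an assistant might produce, followed by the equivalent a "functional, minimalist style" follow-up prompt tends to yield:

```python
# Hypothetical "overbuilt" first pass: a profusion of one-line helpers.
def _strip(line):
    return line.strip()

def _nonempty(line):
    return bool(line)

def _lower(line):
    return line.lower()

def clean_lines_v1(lines):
    return [_lower(s) for s in (_strip(l) for l in lines) if _nonempty(s)]

# After asking for "a functional, minimalist style": same behavior, one function.
def clean_lines(lines):
    """Strip whitespace, drop empty lines, and lowercase the rest."""
    return [s.lower() for s in (l.strip() for l in lines) if s]

print(clean_lines(["  Hello ", "", "World"]))  # → ['hello', 'world']
```

The behavior is identical; the difference is that a reader can now see the whole transformation in one place.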
Despite this remediation cost, the improvement in code robustness is welcome, and the increase in power will likely cut down on the release of half-baked MVPs that never get remediated. The problem is no longer adding features but controlling complexity.
Anyway, the main complexity of the site is that the project writeups are created in a separate project and exported into the site. This segregates messy changes and large artifacts from the main site repo.
RAG Chatbot Integration
The markdown files comprising the writeup sources (and some other documents) are also the core of the retrieval-augmented generation (RAG) corpus used to ground the chatbot's responses. Thus creating the writeup enables the chatbot to "know" about itself. I've named it "Jay" and given it a Steller's Jay theme, in honor of the West Coast.
The "Jay" chatbot is an off-the-shelf integration with Google Gemini via the Google Cloud Platform Vertex AI service. Even such integrations can surface interesting issues that provide an opportunity for research and problem-solving.
First, some background. Vertex AI is Google Cloud Platform's current AI brand, with a variety of products and services under its umbrella. This includes API access to models like Gemini 3.0 Flash (which powers the chatbot) with baked-in integration of GCP's basic RAG implementation, RAG Engine. It's an easy-to-use system: it ingests documents from a GCP bucket and offers basic options for chunk size and overlap, plus a choice of chunk-boundary detection methods that includes an LLM parser option.
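To make the chunk size and overlap parameters concrete, here's a toy character-based chunker. This is only an illustration of the semantics, not RAG Engine's actual implementation (which works on tokens and can respect document boundaries):

```python
def chunk(text, size, overlap):
    """Split text into windows of `size` characters, each window
    starting `size - overlap` characters after the previous one."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "abcdefghij"
print(chunk(doc, size=4, overlap=2))  # → ['abcd', 'cdef', 'efgh', 'ghij']
```

Larger overlap means a retrieval hit near a chunk boundary is less likely to lose context, at the cost of storing redundant text.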
After choosing between an English-oriented and a multilingual embedding model, you select the vector store. Here that's the default RAG store, the enjoyably named "RagManaged Cloud Spanner," chosen for its simplicity and perfectly adequate functionality for the site's small corpus. And then it just works. Beautiful.
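Under the hood, retrieval reduces to nearest-neighbor search over the chunk embeddings. A toy sketch with hand-made three-dimensional vectors (the real system embeds chunks with the chosen model and stores much higher-dimensional vectors in Cloud Spanner):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Toy "embeddings" for three corpus chunks (invented for illustration).
store = {
    "site-design": [0.9, 0.1, 0.0],
    "chatbot":     [0.1, 0.9, 0.2],
    "deployment":  [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # pretend embedding of the user's question
best = max(store, key=lambda k: cosine(query, store[k]))
print(best)  # → 'chatbot'
```

The retrieved chunk text is then stuffed into the model's context to ground the response.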
Almost. The initial implementation had one gap: "thinking" must be enabled and the "thought signature" must be threaded through the conversation history to get coherent responses and proper RAG tool use.
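The threading requirement can be sketched in plain Python. The dict shapes below only mirror the idea, they are not the exact Gemini API schema (in practice the SDK's content types carry the opaque signature for you):

```python
# Sketch: keep the opaque thought signature with each model turn and
# replay it verbatim on every later request, so the model can reconnect
# its earlier reasoning and RAG tool calls. Field names are assumptions.

history = []

def record_model_turn(text, thought_signature):
    history.append({"role": "model", "text": text,
                    "thought_signature": thought_signature})

def build_request(user_text):
    # Resend the full history, signatures included, plus the new turn.
    return {"contents": history + [{"role": "user", "text": user_text}]}

record_model_turn("The site uses Vertex AI RAG Engine.", "opaque-sig-123")
req = build_request("Which vector store does it use?")
print(req["contents"][0]["thought_signature"])  # → 'opaque-sig-123'
```

Dropping the signature when rebuilding the history is what produced the incoherent responses described above.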
A great example of the power of AI-enabled development was the ease with which the Langfuse instrumentation could be enhanced with the RAG query generated by Gemini, which isn't normally part of the metadata. That visibility did lead down a dead end of trying to fix the issues through prompt design (facilitated by Langfuse prompt versioning), but another round of research resolved the matter relatively quickly by revealing the crucial thinking-signature functionality of the Gemini API.
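The instrumentation idea can be sketched as follows. The response shape here is illustrative rather than the exact Vertex AI schema, and the span metadata dict stands in for whatever the Langfuse SDK accepts:

```python
# Hedged sketch: pull the model-generated retrieval query out of a
# response's grounding metadata and attach it to a trace span's metadata.
# The field names below are assumptions for illustration.

def extract_rag_query(response):
    meta = response.get("grounding_metadata", {})
    queries = meta.get("retrieval_queries", [])
    return queries[0] if queries else None

response = {"text": "...", "grounding_metadata":
            {"retrieval_queries": ["site chatbot architecture"]}}
span_metadata = {"rag_query": extract_rag_query(response)}
print(span_metadata)  # → {'rag_query': 'site chatbot architecture'}
```

Seeing the actual query the model issued, turn by turn, is what made the retrieval failures diagnosable in the first place.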