A reverse-vibe-coding workflow for refactoring

(Image by Gemini AI)

In my previous blog post I introduced the Reverse Vibe-Coding proto-manifesto. That post was a relatively high-level overview of a proposed workflow that comes close to reversing the usual human/AI work split, aiming at a solid workflow for production-grade development across the full product lifecycle, geared to product-lifetime productivity rather than quick prototyping.

While there actually is some vibe-coding in Reverse Vibe-Coding (mainly for experiments and learning), and while core business logic is hand-coded with a strong focus on maintainability and the DRY principle, refactoring makes up a huge part of total development over a product's lifetime, and it is the place where the proposed RVC workflow has its actual AI backbone.

This backbone revives a merge-based git workflow: rather than the ambiguous rebased blob commits now common in many AI-assisted and vibe-coding workflows, we get a git history with complete provenance again, differentiating sharply between human contributions and contributions by AI.

In this post I want to dive into this backbone, still at a conceptual level, but a deeper and more technical one than what I outlined in the proto-manifesto.

Note that while this post outlines the refactor workflow, the workflows for scaffolding and boilerplate are very similar in most aspects; to keep maximum clarity in this blog post, we only look at the refactoring workflow.

Prompt -> prompt-template -> parallel templates -> DSL

Prompting an LLM or SLM in English is great for tasks that explore novel approaches, but as you use a particular model for a while, you find wordings that work significantly better than others. After a few reproducibility incidents where the proper wording was forgotten, many developers start collecting those really good prompts. When doing similar tasks after looking up old prompts in the collection, a lightbulb moment often occurs: hey, what if I turned this prompt into a template? So now, instead of English, we are prompting with what is basically a bit of code: a template invocation that creates the prompt.
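
To make this concrete, here is a minimal sketch of that template step, assuming Jinja is the templating layer; the template name, variables, and prompt wording are all invented for illustration:

```python
from jinja2 import Template

# A hypothetical refactor prompt that proved to work well, turned into a
# reusable template. Name, variables and wording are illustrative only.
EXTRACT_HELPER = Template(
    "Refactor the function {{ function }} in {{ path }} by extracting the "
    "{{ concern }} logic into a separate helper function. Preserve behaviour "
    "and do not change any public signatures."
)

# Invoking the template produces the actual prompt text.
prompt = EXTRACT_HELPER.render(
    function="parse_config",
    path="src/config.py",
    concern="validation",
)
```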

Once we have a collection of templates, we sometimes find that a template doesn't quite give the best results, so we make a slightly tuned version of the original prompt into a second template, then a third. And instead of giving the LLM/SLM one prompt, we give it all the variants and then, as humans, end up choosing the best of the results.

Then, the more we use template-invocation code to prompt, the more we notice how inconvenient it is to write a small bit of Python/Jinja code each time, and a prompting DSL starts emerging, in small steps, alongside the templates. In Reverse Vibe-Coding, we embrace this process. There are no English prompts for our target project: English prompts are put into templates in a separate central DSL repo, and once they are defined, we extend the DSL to make invocation convenient.
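
A first DSL layer on top of such templates might look roughly like the sketch below; the function name, the variants and their wording are assumptions, since RVC does not yet prescribe a concrete DSL:

```python
from jinja2 import Template

# Two tuned variants of the same logical prompt. In RVC these would live in
# the separate central DSL repo, not in the target project.
EXTRACT_VARIANTS = [
    Template("Refactor {{ function }} in {{ path }}: extract the {{ concern }} "
             "logic into a helper function. Do not change public signatures."),
    Template("In {{ path }}, split {{ function }} so that its {{ concern }} "
             "handling lives in a function of its own. Behaviour must not change."),
]

def extract(function: str, path: str, concern: str) -> list[str]:
    """One DSL-level invocation fans out over all prompt variants."""
    return [t.render(function=function, path=path, concern=concern)
            for t in EXTRACT_VARIANTS]
```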

As noted, RVC is still a moving target and so is the DSL sub-workflow, but by embracing this sub-workflow, users can start moving away from the imprecise artistic way of prompting to a more deterministic flow.

The bottom line:

"English is not the right language for non-creative AI code assistance tasks, we need a Domain Specific Language"

At first every organization, team or lone developer will have their own DSL, but likely a DSL will eventually emerge that is generic enough to be widely used, and the templating layer may become obsolete because the LLMs/SLMs could be trained to natively use that DSL.

A validation baseline

Every programming language is different in available tooling, so let us not get hung up on the exact set of validation baseline tools and provisioning. Some languages will need more, others will need less, but at the base, we need to define a validation baseline that helps us figure out one thing:

  • Did the AI break things?

Let's look at one language, Python, as an example. Which tools do we need to run to see if the AI broke something? The list below is just an illustration of the kind of tools you may want in your validation baseline; a minimal runner sketch follows the list.

  • Linting, code complexity and coding conventions: pylint
  • Idiomatic coding style: pycodestyle
  • Checks for dead code: vulture
  • Basic code security checks: bandit
  • Property-based testing: hypothesis
  • Basic unit tests: unittest
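
As a minimal sketch of what such a baseline runner could look like (the commands, targets and options here are assumptions; a real project would configure each tool):

```python
import subprocess
import sys

# Hypothetical baseline runner for the Python toolset above. The hypothesis
# property tests are assumed to run as part of the unittest suite.
BASELINE = [
    ["pylint", "src"],
    ["pycodestyle", "src"],
    ["vulture", "src"],
    ["bandit", "-r", "src"],
    ["python", "-m", "unittest", "discover", "-s", "tests"],
]

def run_baseline() -> bool:
    """Return True only if every baseline tool passes."""
    return all(subprocess.run(cmd).returncode == 0 for cmd in BASELINE)

if __name__ == "__main__":
    sys.exit(0 if run_baseline() else 1)
```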

Before code is given to any AI (LLM or SLM), it should pass all these tools as a validation baseline. If the result of any AI action is to be presented to the user, it should first pass the validation baseline too. Consider the validation baseline the handover contract between the human and the AI.

Passing the baseline does not mean “correct”, only “good enough to hand back to a human”.

Note also what is missing: no integration tests. We treat the AI like we treat human developers. This is not CICD yet; this is all pre-CICD human/AI handover. The validation baseline is also part of a multi-try setup for the AI: we will give the AI multiple chances to pass the validation baseline before giving up on a branch, and adding long-running processes to such a pipeline would result in undesirable latency in the human/AI interaction.

Protocol-in-a-file

In the refactor workflow of Reverse Vibe-Coding, there are no IDE hooks or integrations for AI. We want full provenance, and we want to do away with history-deleting, rebase-heavy git workflows; in RVC, rebase is considered workflow smell. The only hook that the refactor workflow has is the git push the user does.

Because a git push in itself carries very little information, we need to run a protocol on top of it. For RVC, we call this protocol RVPP (Reverse Vibe-coding Prompt Protocol), and it is implemented in RVP (Reverse Vibe-Coding Prompt) files: one refactor task, one RVP file. We number the files sequentially, starting with R1.rvp, where the 'R' stands for Refactor.

We define RVP files as append-only files. The file is divided up into assignments, and all assignments from inception until merge are collectively referred to as the task.

An assignment always starts with a chunk of DSL code created by the user. If the assignment gets completed, a report gets appended to the assignment, and two newlines create an empty line after which the user can add the DSL code for a new assignment. In the next section we look at how an assignment is processed, broken into sub-assignments, and how topic branches mix into the protocol.
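
Before that, here is a sketch of what an RVP file's contents might look like after one completed assignment; the DSL syntax and report format are both invented for illustration, as RVPP does not pin down either:

```
# R1.rvp (hypothetical contents)
extract:                      # assignment 1: DSL code written by the user
  function: parse_config
  path: src/config.py
  concern: validation

[report]                      # appended by the assignment processor
branch: R1-variant-2
tries: 2
baseline: pass

rename:                      # assignment 2, appended by the user after the report
  ...
```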

On push to trunk after RVP creation

So what should happen on a git push? The new commits are inspected, and if there is a commit to trunk that creates a new RVP file, the following process starts:

  1. Baseline validation
  2. Assignment to assignment prompt-set conversion
  3. Topic-branch creations
  4. Per-topic-branch assignment processor is started

While the implicit RVP workflow contract states that the repo should already meet the baseline, we start off by verifying that it does. If the baseline isn't met, hook processing is simply abandoned. The next step is conversion of the chunk of assignment DSL into all the variants of the task-startup assignment prompt. Every variant gets its own, likely ephemeral, topic branch, and for each variant topic-branch/prompt pair a stateful assignment processor is started. I'm getting a bit ahead of myself, but for clarity's sake: all but the chosen branch are ephemeral by design.
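
A sketch of this hook follows; every helper name here is an assumption, as RVC does not prescribe an implementation:

```python
# Hypothetical on-push hook for trunk; run_baseline() is the baseline runner
# sketched earlier, the other helpers are assumed names.
def on_push_to_trunk(repo, rvp_path):
    if not run_baseline():                    # 1. verify the baseline, abandon on failure
        return
    assignment = read_last_assignment(rvp_path)
    prompts = dsl_to_prompts(assignment)      # 2. assignment -> prompt-set conversion
    for i, prompt in enumerate(prompts, 1):
        branch = repo.create_branch(f"{rvp_path.stem}-variant-{i}")  # 3. ephemeral topic branch
        start_assignment_processor(branch, prompt)                   # 4. stateful processor
```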

What happens next happens in parallel for each of the variants/topic-branches, and for each variant it happens 1 up to N times, where 8 is suggested as the default for N. A minimal sketch of this loop follows the list.

  1. The prompt is given to the SLM (or LLM).
  2. The result is reintegrated into the topic branch code.
  3. Baseline validation is run; if it fails and the try count is less than N, the validation errors are fed back to the SLM and we continue once more at step 1.
  4. On the Nth failure, the topic branch is deleted.
  5. On the first baseline validation success, a report section is added to the RVP file.
  6. The changes are committed to the topic branch.
  7. If trunk has seen commits since the start of the push trigger, trunk is merged into the topic branch and the validation baseline is validated once more.
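
In code, the loop might look roughly like this; all helper names are assumptions, and N defaults to the suggested 8:

```python
# Hypothetical per-variant assignment processor implementing steps 1-7.
def process_variant(branch, prompt, n_tries=8):
    for attempt in range(1, n_tries + 1):
        result = ask_model(prompt)                   # 1. prompt the SLM/LLM
        apply_result(branch, result)                 # 2. reintegrate into the branch
        ok, errors = run_baseline_on(branch)         # 3. validate
        if not ok:
            prompt = with_feedback(prompt, errors)   # feed errors back, retry at step 1
            continue
        append_report(branch, attempt)               # 5. report section in the RVP file
        branch.commit("RVC assignment result")       # 6. commit to the topic branch
        if trunk_moved_since_trigger(branch):
            branch.merge_from_trunk()                # 7. merge trunk, validate again
            run_baseline_on(branch)
        return True
    branch.delete()                                  # 4. Nth failure: delete the branch
    return False
```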

Once every variant topic branch has either been deleted or updated with the SLM output, the user is expected either to merge one of the surviving topic branches, or to add a new assignment to the RVP file in order to continue from the result of one specific topic branch.

On push to topic branch after RVP update

The hooks for user pushes to a topic branch are slightly different but still mostly similar to those for trunk. The differences are step 2 in the list below and step 8 in the per-variant processor.

  1. Baseline validation
  2. Deletion of non-chosen topic branches
  3. Assignment to assignment prompt-set conversion
  4. Topic-branch creations
  5. Per-topic-branch assignment processor is started

And for the per-variant processor:

  1. The prompt is given to the SLM (or LLM).
  2. The result is reintegrated into the topic branch code.
  3. Baseline validation is run; if it fails and the try count is less than N, the validation errors are fed back to the SLM and we continue once more at step 1.
  4. On the Nth failure, the topic branch is deleted.
  5. On the first baseline validation success, a report section is added to the RVP file.
  6. The changes are committed to the topic branch.
  7. If trunk has seen commits since the start of the push trigger, trunk is merged into the topic branch and the validation baseline is validated once more.
  8. The parent topic branch is deleted.

As we can see, it's basically all the same, except for the topic branch deletions.

On merge to trunk

When the final chosen topic branch concludes the refactoring task, not much is needed anymore. The remaining topic branches get deleted, and the normal CICD processing of the merge commences. That part falls outside the RVC workflow, so we leave it unspecified in this blog post.
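
A minimal cleanup sketch for this hook, assuming the variant branch naming used in the earlier sketches:

```python
import subprocess

# Hypothetical merge-to-trunk cleanup: delete the task's remaining variant
# branches; normal CICD takes over from here.
def on_merge_to_trunk(task_id: str) -> None:
    listing = subprocess.run(
        ["git", "branch", "--list", f"{task_id}-variant-*"],
        capture_output=True, text=True,
    ).stdout
    for line in listing.splitlines():
        subprocess.run(["git", "branch", "-D", line.strip().lstrip("* ")])
```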

Workflow latency streamlining

We don't want the user who is refactoring to sit waiting, but neither do we want to invite too many merge conflicts. To help with that, the user is expected to start new tasks (RVP files) that touch different parts of the code while waiting on an assignment within an active task. As a rule of thumb, two to four parallel tasks are suggested for an optimal workflow.

Summarizing

In this post we looked at the Reverse Vibe-Coding git workflow for refactoring. The workflows for scaffolding and boilerplate are slightly more involved because of baseline-validation bootstrapping, but are otherwise quite similar. I hope this outline shows how this part of the RVC workflow is a robust and highly productive alternative to IDE-integrated AI, and how it allows us to return to a provenance-preserving, merge-based workflow at the same time. I hope this post demonstrates that RVC is the right choice for AI-enhanced productivity from a full product-lifecycle perspective, and that moving away from English as a prompting language, and from the IDE as the integration point for AI, are good choices that help move AI-assisted coding for production deployment forward.