Reviewing my use of AI

Disclosure

Since this post is about the use of AI, it is appropriate to disclose that all the prompts are my own, and that only my locally hosted brain-model has been prompted to generate this text.

Early in 2024 I started testing out AI assistance when coding at work, at first only with local models. Almost a year later my employer required us to use one specific AI service and no local models. Through this service I have been able to experiment with many more models than the local ones I had been running since I first started.

First impression

Initially I was really impressed, especially by the speed at which seemingly well-written text could be generated, and by using the assistant to generate documentation blocks for existing code. It seemed almost magical to get coherent documentation across our under-documented code base, both inline (usually following established standards) and inside our docs/ directory. Or to have it attempt to restructure or simplify entangled code, though only in smaller and usually independent files. Gradually I began to increase the number of files to be edited simultaneously, especially when the editor nudged me to add files into the AI client (open/active files were shown greyed-out inside the chat interface, one click on + to add). For our under-documented project this worked really well, for a while.

Agent Mode

When “agent mode” was introduced, the nudging toward writing to larger sets of files really increased, especially since the client automatically added other seemingly related files from the project when iterating over a given task. This seemed very risky, but the risk was somewhat countered by the fact that the agent could now run simple commands and read the output to actually validate the code it introduced. It then became very tempting to just let the agent brute-force its way through the issues until the code worked.

One consequence of this approach was that when the agent started working, the chat very quickly filled up with walls of text describing what it was doing and how it was responding to its own responses, in iterative loops. At the same time, multiple files were edited multiple times, while the agent tried and retried some code until it seemed to work. Reviewing all of this became a real chore: reading multiple pages in the minimal chat interface about what the agent was changing, and reviewing the files that were edited. But since the code seemed to work, was this a problem? Why not ignore the feeling of needing control and “vibe code” the problems out of existence? We do have tests that will catch any mistakes, right?

Planning Assistant

Later, I started working on a migration project. This seemed like something that would be really straightforward for an AI assistant or agent to handle, right? Translating hundreds of files from one language dialect and framework into another while adhering to our current coding standard.

I started with the planning phase, trying to analyze the current state and plan the migration effort, using the agent as a planning assistant. The assistant generated an analysis of our code base, a suggested end-state for the migration, and a multi-step implementation plan for how to get there. The text it generated was impressive: long, detailed, and very persuasive. It even suggested creating scripts and documents to track the migration effort. It seemed too good to reject.

Was this a good idea? The legacy code was somewhat unfamiliar to me, as it was written long before I joined the company. It was therefore difficult to evaluate the issues the assistant raised about the legacy code. My initial expectation was that the migration work would be fairly straightforward, since the goal was only to translate from one framework to another, not to fix every issue with the existing structure or logic. Still, the assistant’s suggested plan included fixing outdated structures, untangling complicated logic, rearranging files, establishing “best practices”, documenting inline and in dedicated documentation files, and so on. Every single suggestion seemed reasonable at face value, but I completely overlooked how the scope kept increasing.

When discussing the plans, I tried to play the devil’s advocate and pushed back on several suggestions, in an iterative feedback loop to gradually improve the initial plan. The assistant provided well-structured and persuasive responses that seemed to take into account every scenario I threw at it. I got lists of pros and cons, and concluding suggestions of hybrid approaches that kept all future options open. But is a hybrid approach always the best approach? And importantly, how much time did I actually spend on this back-and-forth, compared to just planning with my own head directly?

Progress Tracking

The assistant was used to create tracking scripts, and it even created a web interface with dashboards to track the migration effort. It suggested ingesting existing files to calculate the dependencies between them, and tracking which areas were migrated. It seemed brilliant, flashy, organized. The vibe coding sessions were enjoyable, but time flew by while running multiple rounds with the agent to fix the errors in the code behind the tracking solution. Was the tracking solution perfect yet? No, the assistant kept suggesting improvements. Why not generate scripts that create mermaid diagrams from the dependencies? Why not improve the web interface with traffic lights to visualize the current state? Why not update the classification in the migration-tracking script to improve readability for other stakeholders reading through the enormous block of generated documentation? They do read the work-in-progress documents, right?

Captured by the vibes, I forgot to ask myself if any of this was actually needed. Was the migration project so large and complex that we needed high-level dashboards and automated tracking scripts to track the provenance of each individual file in the repository? What downstream consequences would such a strict approach have on our migration effort and the scope of the project? I, at least, was mesmerized by the vibe and fell helplessly down the rabbit hole, realizing too late that more time was spent fixing the reporting dashboard than actually doing the work. Is this simply classic and willful procrastination?

Git Commit

Then the actual work started. I took the time to write detailed instruction files for how to migrate each type of file, and strict rules for how to handle references in the new framework. Then I instructed the assistant to follow the migration plan one domain at a time, including rearranging the file and directory structure, documentation, and the whole shebang. In theory this would save a lot of time and manual effort, especially if just vibe coding it.

During the migration, I frequently ran build commands to test the output, and it frequently failed, requiring me to either ask the assistant to give it another try or fix the files manually. Very often I needed to fix the files manually, which over time made it very difficult to distinguish my edits from the assistant’s. Reverting just the edits of the assistant using plain git? Good luck with that!

Because I had let the scope increase beyond a minimal migration, the migrated output consisted of multiple types of changes. Since the framework itself changed, most files had changes on almost every line. Using git diff then became almost useless, as the before/after diff was so different. Committing this to our git log often became an all-or-nothing exercise.

When the migration effort had progressed for a while and I pushed the code to our testing server, I realized that there were unexpected discrepancies between the legacy and the migrated output (unexpected database changes). Now I had to go back and debug to identify which changes had actually altered logic and not only structure. Since I had by this point limited my own understanding of the data flow by vibe coding the migration, this was extremely time consuming, and likely took more time on its own than I had saved during all the vibe sessions :-/
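Had the migration landed as a series of small, behaviour-preserving commits, git bisect could have automated much of that hunt for the logic change. A toy sketch, where the repo, the run.sh “build”, and the step commits are all hypothetical stand-ins; the broken step is found automatically:

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name dev

# Simulate small migration commits; step 4 silently changes behaviour.
for step in 1 2 3 4 5; do
  if [ "$step" -ge 4 ]; then
    printf 'echo wrong # step %s\n' "$step" > run.sh
  else
    printf 'echo ok # step %s\n' "$step" > run.sh
  fi
  git add run.sh && git commit -qm "migrate step $step"
done

# Bisect between the known-good root commit and the broken tip.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
# Exit code 0 marks a commit good, non-zero marks it bad.
git bisect run sh -c '[ "$(sh run.sh)" = ok ]' 2>/dev/null |
  grep 'is the first bad commit'
```

This only works when each commit builds and each commit changes one thing, which is exactly what the all-or-nothing vibe-coded commits made impossible.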

Risk Aversion

I must admit that I probably should have aborted the vibe coding approach earlier, but I was on a mission to test out the potential benefits of vibe coding and did not care (enough yet). For each domain, I ran through this whole process at least twice, and at least one of those attempts crash-landed because the end result did not give the same output as the legacy code, and the git log was impossible to deconstruct in a way that would let me debug the origin of the discrepancy without rewriting the migration manually. It might also be a symptom of overly complex code to begin with, but that was precisely why I needed the migration to be just that: only migration and nothing else.

To this day, when I try to discuss everything from architectural changes to minuscule single-file migrations, the AI assistant consistently increases the scope of the assignment. Doing a simple migration in agent mode also typically surfaces some other issue with the original code, which the agent automatically attempts to fix, resulting in additional code I never asked for. Discussing larger architectural or strategic changes consistently results in well-structured pros and cons with a summary recommendation of a “hybrid approach” that keeps all options open. What is this behavior, really?

I think my primary takeaways from this small adventure are to be critical of suggestions that increase scope, to still hand-code, to work incrementally, to capture small and distinct changes in git commits, and to write good commit messages.