AI Coding Tools, One Year Later: What’s Really Changed?
As an engineering leader, I am always looking for ways to help my engineering team not only achieve the organization’s goals but also genuinely enjoy what they are doing. They love to code, and I want to help them do that. Part of that means taking the time to find the tools and approaches that lead to the right outcomes. So how can AI help with that?
A year or so ago, I set out to see how good AI tools really were at replacing software engineering effort. I learned a lot, and they absolutely made me a better engineer, but it still felt like they had a long way to go. Recently, I’ve been hearing from some big hitters in the industry that AI coders are now much closer to replacing engineers. Some claims even suggest that premium services offer AI coding bots that could replace a large share of engineering work. Even though I didn’t have access to those premium services, I decided it was time to jump back in and retest the waters with the newly improved tools. How much better could they really be?
My Project Setup
This time, I wanted to expand my use of AI tools, so I decided to use ChatGPT (GPT-4o), the free version of Copilot embedded in VS Code, and a new tool called Cursor, which is a paid service. Last year, I had used only Copilot and ChatGPT (GPT-4).
For the project, I chose an old codebase I had built years ago: a full-stack app with React, Express/Node.js, and MySQL. My goals were simple:
- Upgrade the React app to the latest version and patterns (ES6)
- Build a more scalable, modern service architecture, including separate UI and service microservices
- Migrate from MySQL to PostgreSQL
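To give a concrete sense of what that last goal means in practice, here is a minimal sketch of the service-side change, assuming the common Node drivers (mysql2 for MySQL, pg for PostgreSQL); the table and query are made up for illustration, not taken from my app. The interesting part is small but repetitive: the driver swap and the switch in placeholder syntax from ? to $1 touch every query.

```typescript
// Side-by-side sketch of the same query before (mysql2) and after (pg) the migration.
import mysql from "mysql2/promise";
import { Pool } from "pg";

// Before: MySQL, with "?" placeholders.
const mysqlPool = mysql.createPool({ host: "localhost", user: "app", database: "appdb" });

export async function getUserMySql(id: number) {
  const [rows] = await mysqlPool.query("SELECT id, email FROM users WHERE id = ?", [id]);
  return rows;
}

// After: PostgreSQL, with numbered "$1" placeholders and its own pool.
const pgPool = new Pool({ host: "localhost", user: "app", database: "appdb" });

export async function getUserPostgres(id: number) {
  const { rows } = await pgPool.query("SELECT id, email FROM users WHERE id = $1", [id]);
  return rows;
}
```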
I started by using the free Copilot inside VS Code. It was honestly a great place to get my feet wet. My first objective was to rework the structure of the application: cleaner folders, better organization, and better overall separation of concerns. I set up an initial project that mirrored the original, just to get it working first, and planned to upgrade it from there.
At this stage, I leaned heavily on Copilot, prompting it for help and sometimes copying code into ChatGPT directly when I needed a deeper second opinion. Most of this work was really focused on solving single problems within single areas of code.
More AI Horsepower
After some slower-than-hoped progress with the free tools, I moved over to Cursor, the paid tool. This was a noticeable upgrade. Cursor is built to look across the entire codebase, not just the file you’re working on, and it made a real difference.
The first thing I asked it to do was update the React Router setup and centralize the authentication configuration across several files. Cursor nailed it right away. It found the correct solution, made the changes across multiple files, and barely needed me to step in.
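For context, the shape of the change I asked for looks roughly like the sketch below. It is simplified, and the names (AuthProvider, RequireAuth, the /login route) are illustrative rather than my app’s actual code; the idea is that the auth state lives in one module and a single route guard replaces scattered per-page checks.

```tsx
// auth.tsx: one module owns the auth state and the route guard (names are illustrative).
import { createContext, useContext, ReactNode } from "react";
import { Navigate, useLocation } from "react-router-dom";

type Auth = { isAuthenticated: boolean };
const AuthContext = createContext<Auth>({ isAuthenticated: false });

export function AuthProvider({ value, children }: { value: Auth; children: ReactNode }) {
  return <AuthContext.Provider value={value}>{children}</AuthContext.Provider>;
}

// Wrap protected routes with this instead of repeating auth checks in every page.
export function RequireAuth({ children }: { children: ReactNode }) {
  const { isAuthenticated } = useContext(AuthContext);
  const location = useLocation();
  if (!isAuthenticated) {
    // Redirect to sign-in, remembering where the user was headed.
    return <Navigate to="/login" state={{ from: location }} replace />;
  }
  return <>{children}</>;
}
```

Each protected route is then wrapped once in the router, something like `<Route path="/reports" element={<RequireAuth><Reports /></RequireAuth>} />`, which is exactly the kind of repetitive, multi-file edit Cursor handled well.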
From there, I kept giving Cursor more and more complex tasks. I started to notice a pattern: it would handle about 80% of the work correctly. I still needed to supervise and clean up the final 20%, but honestly, it saved a huge amount of time getting the heavy lifting done upfront. Initially, I got into a groove and had high hopes.
The First Major Roadblock
Eventually, I ran into a much bigger issue: authentication callback problems.
In my app, authentication is handled through an external provider using OpenID Connect. When the user signs in, the third-party authentication provider initiates a callback that is handled by my React app. The problem was that during the callback, the token just wasn’t being returned and set properly.
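For context, the callback side of that flow looks roughly like the sketch below. The endpoint name (/api/auth/token) and the cookie name are assumptions made for illustration, not my provider’s actual API: the provider redirects back with a one-time authorization code, the app exchanges it for a token, and the token gets stored (here as a cookie) before the user is sent into the app.

```tsx
// AuthCallback.tsx: sketch of an OIDC redirect handler (endpoint and cookie names are made up).
import { useEffect } from "react";
import { useNavigate, useSearchParams } from "react-router-dom";

export function AuthCallback() {
  const [params] = useSearchParams();
  const navigate = useNavigate();

  useEffect(() => {
    const code = params.get("code");
    if (!code) return;

    // Exchange the one-time authorization code for a token via the backend,
    // which in turn calls the provider's token endpoint.
    fetch(`/api/auth/token?code=${encodeURIComponent(code)}`)
      .then((res) => res.json())
      .then(({ token }) => {
        document.cookie = `auth_token=${token}; path=/; SameSite=Lax`;
        navigate("/", { replace: true });
      })
      .catch(() => navigate("/login", { replace: true }));
  }, [params, navigate]);

  return <p>Signing you in...</p>;
}
```

Everything hinges on that last step, where the token comes back and gets written; that is the step that kept failing for me.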
No matter what tool I used, whether Cursor, ChatGPT, Copilot, or even Gemini and Claude, none of them could solve it. What started as one extended session trying to figure it out turned into hours, and then a few days, of trying different prompts and approaches. Each AI tool would take me down a different path, but nothing truly worked. I spent significant time writing elaborate prompts with very detailed information, hoping that if I was clear enough, the AI tools could find the pattern causing the problem and resolve it. I really hoped I could "prompt" my way out of it.
Ultimately, I had to fix it myself. After a lot of frustrating back and forth with the AI tools, I fell back on a basic troubleshooting approach: simplify the code and break the complexity down into single areas of focus. That is when I found the real issue: a set of duplicate calls that reset the token cookie immediately after it was set. None of the AI tools caught this. It was pure manual debugging on my part.
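To illustrate the general shape of that bug (simplified, and not a line-for-line reproduction of my code), here is how a duplicated call can undo its own work. In this sketch I assume the second call fails because the one-time authorization code has already been consumed, and its error handling clears the cookie the first call just wrote.

```typescript
// Sketch of the failure mode: the handler fires twice, and the second run's
// error path clears the cookie that the first run just set.
async function handleCallback(code: string) {
  try {
    const res = await fetch(`/api/auth/token?code=${encodeURIComponent(code)}`);
    if (!res.ok) throw new Error("token exchange failed");
    const { token } = await res.json();
    // First call: the exchange succeeds and the cookie is written.
    document.cookie = `auth_token=${token}; path=/; SameSite=Lax`;
  } catch {
    // Second call: the one-time code was already used, the exchange fails,
    // and this "cleanup" wipes the cookie written a moment earlier.
    document.cookie = "auth_token=; Max-Age=0; path=/";
  }
}

// If the surrounding effect or route renders twice, both calls fire back to back,
// and the user ends up signed out again immediately:
// handleCallback(code); handleCallback(code);
```

A fix along these lines is to make sure the exchange runs exactly once per redirect, for example by guarding the effect with a ref, and to keep the error path from clearing state it did not create.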
A Pattern Emerges
As I worked through the project, I noticed a strange pattern. When my prompts were extremely specific and detailed, the tools would mostly complete the tasks. But as I asked for follow-up changes, they would often fix one thing and accidentally break something else they had previously done correctly. It usually took about three to eight iterations of back and forth to get any non-trivial change fully completed and bug-free. It became almost predictable: ask, fix, re-fix, adjust, double-check.
So What Did I Learn? Any Findings?
Copilot (Free Version). The free version of Copilot provided very limited support. Since it couldn’t view the entire codebase, it often lost context even within the same chat session. I found myself re-prompting it frequently with information it had forgotten. It was great for quick snippets and small tasks, but larger architectural changes were a real struggle.
Overall Observations Across All Tools. Every tool I used needed a high level of supervision. Part of that might have been due to imperfect prompting on my part, but even with well-structured, detailed requests, the tools often made errors or missed key steps. It was amazing, and a little frustrating, to see the same pattern: they would fix something, then break it again on a subsequent request.
Prompting Still Matters. Just like last year, prompting is absolutely critical. Success with these tools requires being specific, providing lots of context, and being extremely clear about the expected outcomes. Vague or casual prompts usually led to worse results.
Comparing to Last Year. Overall, the tools are definitely better. I would estimate about a 15–20% improvement in their ability to produce accurate and useful code. This time around, the AI tools were able to help me complete a higher percentage of tasks without major corrections. However, for the truly complicated parts of the app, they still couldn't fully solve them without my intervention.
Final Thoughts
I’m still amazed at how much I was able to complete using these AI tools. There’s absolutely no way I could have moved this fast without them. That said, my own experience as an engineer was critical. Knowing what I wanted, reviewing their output carefully, and guiding the process is what made the project a success. These tools didn’t replace me; they amplified me. I honestly felt like a 3–5x engineer because of them.
I definitely saw about a 15–20% productivity gain compared to a year ago. AI coding tools are getting better and better, but strong engineering fundamentals and experience are still absolutely essential for guiding them to success.
