Exploring AI Coding Tools: Can They Really Build Software for You?
Introduction: My Objective
About Me
The Project: Building an LLM Aggregator
For this experiment, I chose a project that would involve a bit of everything: frontend development, backend services, and database integration. I set out to build an LLM aggregator—a platform designed not only to provide users with a single chat interface for interacting with multiple Large Language Models (LLMs) but also to serve as a unified API. The goal was to simplify the development process for other teams by allowing them to code once and access various LLMs through this aggregator, instead of integrating separately with each LLM's interface. I also wanted the application to offer a seamless user experience, making it easy for developers and users alike to interact with the platform. The application would have:
- Frontend: Built using ReactJS to create a user-friendly chat interface.
- Backend: Implemented with Node.js and Express for RESTful services.
- Database: Integrated with PostgreSQL for storing user and configuration data.
- Automated Testing: Ensuring quality through automated tests.
The aggregator would allow users to configure both external LLMs (like Gemini, Groq, or ChatGPT) and internally hosted LLMs (such as Llama). It was a fun and challenging project, and it gave me a comprehensive way to test how AI tools could help across various layers of the stack.
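To make the "code once, call any LLM" idea concrete, here is a minimal sketch of the kind of dispatch layer I was aiming for. To be clear, this is an illustration rather than the project's actual code: the route shape, the provider registry, and the placeholder adapter bodies are all hypothetical.

```javascript
const express = require('express');

const app = express();
app.use(express.json());

// Hypothetical provider registry: each adapter would wrap one vendor's
// real SDK or HTTP API behind a common (prompt) -> text contract. The
// bodies here are placeholders so the sketch runs on its own.
const providers = {
  gemini: async (prompt) => `gemini says: ${prompt}`,
  groq: async (prompt) => `groq says: ${prompt}`,
  openai: async (prompt) => `openai says: ${prompt}`,
  llama: async (prompt) => `llama says: ${prompt}`, // internally hosted
};

// The single endpoint other teams integrate against, whatever the model.
app.post('/api/chat', async (req, res) => {
  const { model, prompt } = req.body;
  const provider = providers[model];
  if (!provider) {
    return res.status(400).json({ error: `Unknown model: ${model}` });
  }
  try {
    res.json({ model, text: await provider(prompt) });
  } catch (err) {
    res.status(502).json({ error: 'Upstream LLM call failed' });
  }
});

app.listen(3000);
```

The appeal of this shape is that adding a new LLM means writing one adapter, while every consuming team keeps calling the same endpoint.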
The Tech Stack
I decided to use Node.js for the backend services, utilizing Express for setting up APIs. ReactJS was chosen for the frontend to create the chat interface, providing a seamless and interactive user experience. For the database, I integrated PostgreSQL to store configuration and user data efficiently. Finally, I used Jest for automated testing, ensuring that the application met quality standards throughout its development.
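To give a flavor of the automated testing, here is a small Jest sketch in the style I leaned on, written against the hypothetical /api/chat endpoint from the earlier sketch. It assumes the Express app is exported from a module without calling listen, and uses the supertest package:

```javascript
const request = require('supertest');
const app = require('../src/app'); // hypothetical module exporting the Express app

describe('POST /api/chat', () => {
  test('rejects an unconfigured model with a 400', async () => {
    const res = await request(app)
      .post('/api/chat')
      .send({ model: 'no-such-model', prompt: 'hello' });
    expect(res.status).toBe(400);
  });

  test('returns a completion for a configured model', async () => {
    const res = await request(app)
      .post('/api/chat')
      .send({ model: 'llama', prompt: 'hello' });
    expect(res.status).toBe(200);
    expect(res.body).toHaveProperty('text');
  });
});
```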
So How Did It Go?
Getting a project up and running takes a significant amount of effort, particularly when it comes to setting up the initial structure and organization. I decided to leverage AI right from the start, using ChatGPT-4 to help create the project structure, set up folders, and establish the basic organization of my coding environment. My hope was that AI could jumpstart the process, saving me time and giving me a solid foundation.
However, I quickly realized that it wasn't as simple as “ask for code,” “copy code,” and “paste code” into the project. The output I received was often incomplete or didn't fully align with my requirements. To make the AI-generated code functional, I needed to review, tweak, and troubleshoot regularly. This process involved a lot of trial and error, but after a few iterations, I found a rhythm: I began by clearly defining the feature I needed, then asked ChatGPT to generate the code. From there, I would troubleshoot any issues, sometimes with additional help from ChatGPT, and refine the output until it fit seamlessly into the project.
As I progressed, I discovered that “prompt specifics” were crucial. Initially, I was vague in my requests, expecting the AI to fill in the gaps. But the more precise and detailed my prompts became, the less time I spent adjusting the code myself. It became evident that effective “prompt engineering” was key to making the process efficient: I had to specify not only the functionality I wanted but also coding conventions, frameworks, and dependencies. With practice, I got much better at crafting these prompts, and the results improved significantly. In retrospect, it seems obvious, but at the time I had expected the system to deliver more without that level of guidance.
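To illustrate the difference, here is the kind of shift I mean (paraphrased examples, not my exact prompts):

```text
Vague:    "Write an Express route to save a chat message."

Specific: "Write an Express POST route at /api/messages using async/await.
           Validate that userId and content are present, insert the row into
           PostgreSQL through the pg Pool exported from db.js, and respond
           with the new row as JSON and a 201 status. Use our existing
           error-handling middleware instead of inline try/catch."
```

The second version pins down the framework, the conventions, and the dependencies, which is exactly the information the model otherwise guesses at.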
I also used GitHub Copilot to provide real-time coding suggestions as I worked. While it was useful for smaller pieces of code, such as functions or single lines, integrating it with the broader structure often required further refinement. It was clear that while these AI tools could accelerate development, they weren’t yet at a stage where you could fully automate the initial setup without careful human oversight and direction.
Where Things Were Difficult
The development process revealed several areas where AI tools struggled, each presenting its own set of challenges.
- Simple Cut and Paste. First, I observed that about 95% of the time, I couldn't simply take code from the AI and paste it directly into my project. While AI tools like ChatGPT and Copilot provided useful code snippets, they often lacked the specificity or context needed to fit seamlessly. For instance, when building out backend routes or database interactions, the AI sometimes omitted essential configuration details or used outdated methods. This meant I had to spend additional time debugging and adjusting the code manually to ensure it was up to date and consistent with the rest of my codebase.
- Coding Styles & Prompting. Another challenge was the inconsistency in coding styles. AI often produced different coding approaches for similar tasks, which made the overall codebase feel fragmented. To address this, I started to be more deliberate in my prompts, specifying coding conventions, file structures, and preferred frameworks. This helped align the generated code with the standards I wanted, but it also highlighted that without precise guidance, AI can drift into varying styles. This lack of consistency required me to invest extra time in aligning and refactoring code segments.
- Code Flow Through Multiple Files. Additionally, when I requested code that spanned multiple files or involved multiple components working together, the AI struggled. For instance, if I needed a feature that involved both frontend and backend code, or integration with a database, the AI often missed key pieces of the implementation. It might provide the backend API correctly but fail to generate the corresponding frontend hooks, or vice versa (see the sketch after this list). This meant I had to manually fill in the missing elements and debug the integration points, which was time-consuming and sometimes complex. The AI tools are clearly more proficient with isolated pieces of code than with interconnected components.
- LLM Strengths. I also discovered differences in tool capabilities. ChatGPT performed significantly better when generating full class files or handling more complex logic, such as React component structures or backend services with multiple endpoints. It could provide a broader view and more comprehensive solutions. On the other hand, Copilot excelled at writing concise, single-function code or making quick improvements to existing code snippets. It was highly effective when I needed small enhancements or modifications, but it struggled to provide the holistic solutions required for larger modules or files.
- User Experience Development. Finally, working with UI components in React was another area where I noticed a stark difference. ChatGPT outperformed Copilot in generating complete, functional components. It provided coherent and often more accurate solutions that integrated well into my React project. Copilot, while helpful for individual methods or style adjustments, needed more context when handling the broader structure of React components, leading to partial solutions that I had to patch together.
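As an example of the multi-file gap described above, here is a reconstruction of the pattern (illustrative, not code from the project): the AI would hand me something like the backend route in isolation, and the React hook that actually consumes it was left for me to write.

```javascript
// server/routes.js (excerpt) -- what the AI typically delivered.
// `app` is the Express app and `pool` a pg Pool, both created elsewhere;
// the names and the llm_configs table are hypothetical.
app.get('/api/models', async (req, res) => {
  const { rows } = await pool.query('SELECT id, name FROM llm_configs');
  res.json(rows);
});

// client/src/hooks/useModels.js -- the half I usually had to add myself,
// wired to the real route, response shape, and error handling.
import { useEffect, useState } from 'react';

export function useModels() {
  const [models, setModels] = useState([]);
  useEffect(() => {
    fetch('/api/models')
      .then((res) => (res.ok ? res.json() : []))
      .then(setModels)
      .catch(() => setModels([]));
  }, []);
  return models;
}
```

Neither half is hard on its own; the time went into discovering that the second half was missing and making the two agree on routes and response shapes.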
These challenges emphasized that while AI tools are powerful, they are not yet ready to fully replace the human touch, especially when working with interconnected features or building out larger components. Engineers still need to guide the process, ensuring consistency and filling in the gaps where AI falls short.
Conclusion: Are AI Tools Ready for Engineers?
After working through the project, I was pleasantly surprised by how much faster I could develop features using AI coding companions. In past projects, building similar functionality would have taken me considerably longer. With AI, I estimate my development speed improved by 75%-80% compared to working without these tools.
So, should engineers use these tools now? Absolutely. They act as powerful companions, significantly accelerating development—especially when building small functions or UI components. However, engineers are still essential for shepherding projects from start to finish, ensuring consistency, and filling in gaps where AI currently falls short.
Another thing that struck me during this project was how different engineers might experience the benefits of AI coding assistance. For a 10x engineer—someone with deep experience and instinct for the right solutions—these tools may not provide as much of an advantage, since they can rely on their past knowledge and intuition. However, for entry- and mid-level engineers, AI tools can elevate their capabilities and help them perform at a much higher level than their current skillset might suggest. This makes a strong case for implementing processes to train younger engineers on how to effectively use these tools. But it's important to ensure they understand the code being generated so that they learn from the process and gain the experience necessary to progress in their careers.
A word of caution: developers must be mindful of the code they share with these third-party AI tools. Be smart and ensure you adhere to any policies your company may have regarding sharing code with external platforms. A good approach is to send small, isolated portions of code rather than entire files to protect your intellectual property while still leveraging the benefits of AI assistance.
While the promise of AI in software development is exciting, we’re not quite at the point of building full applications without human oversight. But the future looks promising, and for engineers who know how to code, AI tools are becoming invaluable partners in the development process.
I hope you enjoyed this journey into AI-assisted coding. Have you tried building with AI tools? I’d love to hear your experiences!
