Even though AI is now a daily tool for me, I’m still trying to understand its true role, particularly when it comes to software development and creating products that can be shipped to real users.
The eagerness of the AI to spit out hundreds of lines of code and files just at the prompt of a… prompt is amazing, but by now we understand that the results are often more impressive than useful (see the recent DORA reports).
Enter PRDs and Spec-Driven Development, where we try to set the stage for the LLM to do its thing while following our conventions, rules and guidelines, so that the generated code follows our intent.
And we go on discovery journeys with our LLM of choice, sometimes feeding info from one LLM into another, or creating specialized subagents that think only about design, or dev, or business, or infrastructure, and then consolidating everything into the document that will make sure we get what we want, i.e. stuff that works.
We can quickly create loads of documentation stating what we want to build, how to build it, and which rules to follow, all to try to break the probabilistic nature of the LLM and turn it into something closer to a deterministic machine.
In a way it feels like using the !important rule in CSS to quickly override some previously existing rule, both in its approach (don’t waste time looking for the cause of why this style is like this and not like that, just “!important” it) and in its final result (sometimes it works, sometimes it doesn’t).
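For anyone who hasn’t fought this battle, a minimal illustration of the mechanism (the selectors here are made up):

```css
/* Somewhere deep in the stylesheet, a specific rule wins on specificity. */
.sidebar .card h2 { color: navy; }

/* Rather than tracing why, we slap on !important and move on. */
h2 { color: green !important; } /* beats the more specific rule above */
```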
## The BDUF Paradox
But most of all, this feels like a Big Design Upfront (BDUF) strategy. Do we really know enough at the start of the project to give the LLM what it needs to create a viable PRD? Or does this mean we have to revisit the document periodically as we learn new things? How big should this PRD or spec be? And if it doesn’t always produce the output we expected, why use it at all? Is this waterfall again?
I’m not saying it’s not worth using LLMs, but it feels like they have to be used differently in different settings (which makes sense), and that the gains are different in each of them:
– using it in a new project as a scaffolding tool?
– using it in an existing code base to add new stuff (a new module that can work independently of what already exists)?
– using it in an existing code base to change an existing behaviour?
– using it to learn and understand existing code?
I’m still learning, but for now I’m getting the most value from the last option, learning and understanding existing code, though that’s probably because I’m not programming every day.
## Testing Reality, Not Tokens
One of the things I still want to try is adding some extra tools that can serve as guardrails, and I don’t mean text as guardrails but actual tooling.
One would be Mutation Testing: the LLM will of course generate unit tests, because we’re telling it to, but are they effective? Mutation testing makes small changes (mutants) to the code and checks whether the test suite catches them, so surviving mutants expose tests that assert too little. I suppose running something like Stryker (http://stryker-mutator.io/) after the tests may help in validating whether we’re really testing what needs to be tested.
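As a sketch of what that could look like in a JavaScript project (assuming Jest as the test runner; the file globs and thresholds are placeholders for whatever the project actually uses), a minimal stryker.config.json:

```json
{
  "mutate": ["src/**/*.js"],
  "testRunner": "jest",
  "reporters": ["clear-text", "progress"],
  "coverageAnalysis": "perTest",
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}
```

Running `npx stryker run` then reports a mutation score, and any surviving mutants point at tests that pass without actually asserting much; the `break` threshold makes the run fail outright below that score.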
The other is to add something like ArchUnit (https://www.archunit.org/) to test whether the application architecture still holds after each coding session.
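A sketch of that idea, assuming a Java codebase with JUnit 5 and a hypothetical com.myapp package split into domain and infrastructure layers:

```java
import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;
import org.junit.jupiter.api.Test;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

class ArchitectureTest {

    // Import the compiled classes of the (hypothetical) application package.
    private final JavaClasses classes =
            new ClassFileImporter().importPackages("com.myapp");

    @Test
    void domainDoesNotDependOnInfrastructure() {
        // Fails the build if a coding session sneaks an infrastructure
        // dependency into the domain layer.
        noClasses().that().resideInAPackage("..domain..")
                .should().dependOnClassesThat().resideInAPackage("..infrastructure..")
                .check(classes);
    }
}
```

Because the rule runs as a plain test, it fails the build the moment generated code crosses a layer boundary, no prompt engineering required.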
Maybe adding this layer will force the LLM to align itself with reality and not with what would be the next best token…
