AWS AI Agent for Software Development Takes on More Complex Tasks
Amazon Web Services (AWS) has released an update to its Amazon Q Developer agent for software development that benchmark tests show can resolve 51% more tasks.
On SWE-bench, a benchmark created by researchers at Princeton University that evaluates how well an artificial intelligence (AI) platform resolves the kinds of software issues a Python developer might encounter, the Amazon Q Developer agent's score has risen since it was first made available from 25.6% of tasks resolved to 38.8% on the verified dataset, a human-validated subset published by OpenAI, and from 13.82% to 19.75% on the full SWE-bench dataset.
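The headline figure can be checked against these scores with a few lines of arithmetic. The sketch below assumes the 51% refers to the relative gain on the verified dataset, which the article does not state explicitly; the function name is illustrative only.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Scores reported in the article (percent of tasks resolved).
verified_gain = relative_gain(25.6, 38.8)   # verified dataset
full_gain = relative_gain(13.82, 19.75)     # full SWE-bench dataset

print(f"verified dataset: {verified_gain:.1f}% more tasks resolved")
print(f"full dataset:     {full_gain:.1f}% more tasks resolved")
```

Under that assumption, the verified dataset yields a gain of a little over 51%, while the gain on the full dataset is closer to 43%.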
Neha Goswami, director of engineering for Amazon Q Developer, said those results show that AI agents such as Amazon Q Developer continue to evolve over time in ways that will, for example, take advantage of advances in the reasoning capabilities of large language models (LLMs) to resolve increasingly complex tasks.
In the meantime, many developers are already using the natural language interface that Amazon Q Developer exposes to analyze existing codebases and execute code changes in minutes, noted Goswami.
That capability, in turn, is making it easier for organizations to stay current with the latest versions of programming languages by, for example, reducing the toil required to upgrade to the latest version of Java, she added.
Longer term, generative AI tools such as Amazon Q Developer will make it easier to convert code written in one programming language to another, said Goswami.
In general, AI agents are trained to perform a much wider range of complex tasks. The Amazon Q Developer agent can open, create, and close files, select and deselect code chunks, find and replace code, and reverse changes if needed.
The output of each tool the agent invokes is incorporated into an updated prompt that is passed back to the LLM, which then decides on its next action. Once the agent autonomously determines that it has generated the changes needed to fulfill a request, those changes are shared with a developer for review. The Q Developer agent also includes logic to prevent it from getting stuck in unproductive paths.
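The loop described above can be illustrated with a minimal sketch. Nothing here reflects AWS's actual implementation: the `Workspace` class, the two tools, the scripted policy standing in for the LLM, and the step and repeat limits used to avoid unproductive paths are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, Tuple[str, ...]]  # (tool name, arguments); hypothetical shape


@dataclass
class Workspace:
    """Hypothetical in-memory stand-in for a developer's files."""
    files: Dict[str, str] = field(default_factory=dict)


def open_file(ws: Workspace, path: str) -> str:
    return ws.files.get(path, f"error: {path} not found")


def find_replace(ws: Workspace, path: str, old: str, new: str) -> str:
    if path not in ws.files or old not in ws.files[path]:
        return "error: nothing replaced"
    ws.files[path] = ws.files[path].replace(old, new)
    return "ok"


TOOLS: Dict[str, Callable[..., str]] = {
    "open_file": open_file,
    "find_replace": find_replace,
}


def run_agent(policy: Callable[[List[str]], Action], ws: Workspace,
              request: str, max_steps: int = 10, max_repeats: int = 2) -> str:
    """Ask the policy for an action, run the tool, feed the result back."""
    prompt = [f"request: {request}"]
    seen: Dict[Action, int] = {}
    for _ in range(max_steps):
        action = policy(prompt)  # in a real agent, this would be an LLM call
        name, args = action
        if name == "done":
            return "ready_for_review"  # changes go to a developer for review
        # Assumed guard: stop if the same action keeps being requested.
        seen[action] = seen.get(action, 0) + 1
        if seen[action] > max_repeats:
            return "stopped: repeated action"
        if name in TOOLS:
            result = TOOLS[name](ws, *args)
        else:
            result = f"error: unknown tool {name}"
        # The tool response becomes part of the updated prompt.
        prompt.append(f"{name}{args} -> {result}")
    return "stopped: step budget exhausted"


def scripted_policy(prompt: List[str]) -> Action:
    """Stand-in for an LLM: picks the next step from a fixed script."""
    script: List[Action] = [
        ("open_file", ("app.py",)),
        ("find_replace", ("app.py", "pritn", "print")),
        ("done", ()),
    ]
    steps_taken = len(prompt) - 1
    return script[min(steps_taken, len(script) - 1)]


ws = Workspace({"app.py": "pritn('hello')\n"})
status = run_agent(scripted_policy, ws, "fix the typo in app.py")
print(status, repr(ws.files["app.py"]))
```

A policy that requests the same action over and over is cut off by the repeat guard rather than consuming the whole step budget, which is one simple way such a safeguard might work.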
AWS has also developed a textcode framework for the Amazon Q Developer agent that makes use of tokens to create representations of code, files and workspaces, which makes it easier for an LLM to discover the elements of a software development environment.
It's not clear just how many developers have embraced generative AI to not only write code but also manage tasks, but as LLMs continue to evolve, the pace at which applications are built and deployed should accelerate considerably. Today, most of the benefits derived from generative AI come from more code being written faster, but as the reasoning capabilities of LLMs improve, the DevOps workflows used to deploy applications should also become more automated. The amount of software deployed in the next few years could far exceed what has been deployed in the past decade.
The challenge is understanding what tasks generative AI agents are capable of performing well today versus planning for a forthcoming wave of more advanced agents that will drive the next era of DevOps.