Recently, Max Woolf, a senior data scientist at BuzzFeed, ran an engaging experiment exploring what happens when you repeatedly ask an AI to improve its own code. Using the Claude 3.5 Sonnet language model, he posed a classic programming challenge: write Python code that, given a list of one million random integers, finds the difference between the smallest and largest numbers whose digits sum to 30.
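To make the task concrete, the following is a minimal baseline of the kind a model might produce on a first attempt. It is an illustrative sketch reconstructed from the problem statement above, not Claude's actual output, and it assumes the integers fall in the range 1 to 100,000:

```python
import random

def digit_sum(n: int) -> int:
    # Sum the decimal digits of n.
    total = 0
    while n:
        total += n % 10
        n //= 10
    return total

def min_max_diff(numbers: list[int], target: int = 30) -> int:
    # Difference between the largest and smallest numbers
    # whose digits sum to `target`.
    qualifying = [n for n in numbers if digit_sum(n) == target]
    return max(qualifying) - min(qualifying)

if __name__ == "__main__":
    data = [random.randint(1, 100_000) for _ in range(1_000_000)]
    print(min_max_diff(data))
```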

Claude's initial version ran in 657 milliseconds. But as Woolf kept replying with the simple instruction "write better code", the runtime eventually dropped to just 6 milliseconds, roughly a 100-fold speedup. The result is not only eye-catching; it also reveals something unexpected about how the model interprets "better code".
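How is a 100-fold speedup possible? The gains came from replacing the naive per-number loop with much faster techniques. The sketch below shows one plausible approach, a vectorized numpy version with a precomputed digit-sum lookup table; it illustrates the scale of optimization involved rather than reproducing Claude's exact code:

```python
import numpy as np

def build_digit_sum_table(max_value: int) -> np.ndarray:
    # Digit sums for every integer 0..max_value, computed with
    # array operations instead of a per-number Python loop.
    values = np.arange(max_value + 1)
    sums = np.zeros_like(values)
    while values.any():
        sums += values % 10
        values //= 10
    return sums

def min_max_diff_fast(numbers: np.ndarray, target: int = 30) -> int:
    # Look up each number's digit sum in one vectorized pass,
    # then filter and reduce without a Python-level loop.
    table = build_digit_sum_table(int(numbers.max()))
    qualifying = numbers[table[numbers] == target]
    return int(qualifying.max() - qualifying.min())

rng = np.random.default_rng(42)
data = rng.integers(1, 100_001, size=1_000_000)
print(min_max_diff_fast(data))
```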
On the fourth "write better code" request, Claude unexpectedly restructured the code like an enterprise application, adding typical enterprise features that Woolf never asked for. This suggests the model may associate "better code" with "enterprise software", reflecting patterns absorbed during its training.
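For readers unfamiliar with the term, the hypothetical sketch below shows what such unrequested "enterprise features" typically look like: class wrappers, result dataclasses, logging, and input validation. All names here are invented for illustration and are not taken from Claude's output:

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass
class AnalysisResult:
    # Hypothetical result container of the kind an "enterprise" rewrite adds.
    min_value: int
    max_value: int
    difference: int

class NumberAnalyzer:
    # Hypothetical class wrapper: nobody asked for this structure.
    def __init__(self, target_digit_sum: int = 30):
        self.target = target_digit_sum

    def analyze(self, numbers: list[int]) -> AnalysisResult:
        qualifying = [n for n in numbers if self._digit_sum(n) == self.target]
        if not qualifying:
            raise ValueError("no numbers matched the target digit sum")
        logger.info("found %d qualifying numbers", len(qualifying))
        lo, hi = min(qualifying), max(qualifying)
        return AnalysisResult(lo, hi, hi - lo)

    @staticmethod
    def _digit_sum(n: int) -> int:
        return sum(int(d) for d in str(n))
```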
Developer Simon Willison analyzed this iterative-improvement phenomenon in depth, arguing that the language model examines the code with fresh eyes on every request. Although each request carries the context of the previous conversation, Claude analyzes the code as if seeing it for the first time, which is what allows it to keep improving.
However, when Woolf tried more specific requests, he found that while they produced better results faster, the code still contained subtle bugs that needed human fixes. He therefore stressed that careful prompt engineering remains crucial: simple follow-up prompts can improve code quality at first, but targeted prompts deliver much larger performance gains, with a correspondingly higher risk of errors.
It is worth noting that in this experiment, Claude skipped optimizations that human developers take for granted, such as deduplicating the numbers or sorting them first. Subtle changes in how a question is phrased can also significantly change Claude's output.
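The deduplication idea is straightforward: one million draws from a range of roughly 100,000 possible values must contain many repeats, so checking digit sums only on the unique values cuts the work by about 90%. A minimal sketch of this skipped optimization, not from the original post:

```python
def min_max_diff_dedup(numbers: list[int], target: int = 30) -> int:
    # One million draws from 1..100,000 leave at most 100,000 distinct
    # values, so filtering on the unique set does ~10x less digit-sum work.
    unique = set(numbers)
    qualifying = [n for n in unique if sum(int(d) for d in str(n)) == target]
    return max(qualifying) - min(qualifying)
```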
Despite these impressive performance gains, Woolf reminds us that human developers remain indispensable for validating solutions and troubleshooting. He noted that while AI-generated code cannot be used as-is, its creative ideas and tooling suggestions are well worth attention.
Key points:
Repeated "write better code" instructions improved performance dramatically, cutting the runtime from 657 milliseconds to 6 milliseconds.
The AI added unrequested enterprise features to the code, revealing how it interprets "better code".
Prompt engineering still matters: precise requests produce good results faster, but the output still requires human verification and fixes.