Did the makers of Devin AI lie about their capabilities?
Scott Wu, the CEO of Cognition Labs, championed Devin as an AI that could rival human software engineers. That claim positioned Devin not merely as a technological advance but as a potential substitute for human professionals in a highly specialized field. The premise seemed too good to be true, and it proved to be just that. A detailed video analysis released by ‘Internet of Bugs’ shows Devin struggling with a basic task taken from Upwork, the freelancing platform used for its real-world demonstration. The performance contradicted the advertised capabilities: Devin failed at work that a proficient human software engineer would handle easily.
- Devin is marketed as able to take on a variety of tasks on Upwork. In the video demonstration, however, what it produced did not match the client’s stated requirements: the client had asked for setup instructions, not code.
- Devin is shown fixing errors in the source code of a GitHub repository. Yet the files it purportedly edited do not exist in that repository, and some of the errors it corrected are nonsensical, unlike the mistakes humans typically make. This suggests that Devin may have been fixing bugs in files it generated itself, although the video never says so explicitly.
- The task required no coding at all. The repository’s README already contained the instructions needed to complete it, and although the repository was outdated, those instructions still worked after one minor adjustment. Devin apparently never read the README, or did not grasp that it only needed to run a few existing Python scripts. The video makes the work look complex and sophisticated, with a lengthy plan and many ticked checkboxes, but in reality the effort was unnecessary.
- Devin’s code changes are of poor quality. For example, it writes its own low-level file-read loop instead of using the standard library correctly.
- The video implies that Devin completed the task quickly, and the author of the analysis video completed it in about 30 minutes. The chat timestamps, however, show that Devin’s session lasted several hours and ran into the next day.
- Devin executes nonsensical shell commands such as head -n 5 foo | tail -n 5, where the tail stage does nothing because head has already cut the output down to five lines.
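To make the last two code-quality points concrete, here is a minimal Python sketch. It is not Devin's code, which the analysis does not reproduce in full. The function names (read_lines_manual, read_lines_stdlib, head, tail) are hypothetical and exist only for illustration, and the temporary file named foo stands in for whatever file Devin was reading. The first pair of functions contrasts a hand-rolled read loop with the standard-library idiom. The second pair models head and tail on a list of lines to show why piping one into the other with the same count is redundant.

```python
import os
import tempfile


def read_lines_manual(path):
    """Hand-rolled, low-level read loop (illustrative only, not Devin's code)."""
    fd = os.open(path, os.O_RDONLY)
    chunks = []
    try:
        while True:
            chunk = os.read(fd, 4096)  # read up to 4096 bytes per call
            if not chunk:              # empty bytes means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks).decode("utf-8").splitlines()


def read_lines_stdlib(path):
    """The same result using the standard library's file object."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def head(lines, n):
    """Model of `head -n N`: keep the first n lines."""
    return lines[:n]


def tail(lines, n):
    """Model of `tail -n N`: keep the last n lines."""
    return lines[-n:] if n > 0 else []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo")
        with open(path, "w", encoding="utf-8") as f:
            f.write("".join(f"line {i}\n" for i in range(1, 11)))

        lines = read_lines_stdlib(path)
        # Both readers return the same lines; the manual loop adds nothing.
        print(read_lines_manual(path) == lines)
        # head | tail with the same count is the same as head alone.
        print(tail(head(lines, 5), 5) == head(lines, 5))
```

Both readers return identical results, so the manual loop only adds code that can go wrong. Likewise, once head has kept five lines, asking tail for the last five of those returns the same five.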
These peculiar errors raise questions about which underlying model Devin uses. It seems unlikely that GPT-4 would make such mistakes.
The creator of Internet of Bugs is an AI enthusiast and uses AI coding tools themselves. However, they point out that the company behind Devin claims you can “watch Devin get paid for doing work,” a claim that their careful analysis of the video evidence does not support.