Devin & the future of AI agents
Devin was the first good example of an AI agent that can reliably plan and complete a complex task. Projecting this out, AI agents will take over many workflows that today run on if-then rules with humans in the loop. RPA will be reinvented with AI and pushed downstream from the F500 to every company.
- AI-first, human in the loop. Co-pilot to pilot. GitHub Copilot helps engineers code. With Devin, engineers help Devin as it codes. More job roles will move from doers to managers, from creators to auditors. I agree this is just a starting point (Devin solves about 13% of issues), and most agentic workflows get things wrong 9 times out of 10, but if we project this out we will get to a future where agents get things right most of the time.
- Incumbents or upstarts - a longstanding debate. Upstarts that innovate fast, unconstrained by prior baggage, can beat incumbents that have distribution but are slow to innovate. I’d imagine that GitHub was thinking copilots because its starting point is “we have the developers, let’s amplify how they work”. Devin is changing the jobs to be done, changing the workflow, and changing the UX. We will see more of this AI-first reimagination across business functions in the next few years. And this will come from the upstarts, not from the incumbents.
- Come for the model, stay for the workflow. Devin reportedly uses GPT-4 under the hood but has done enough work on top to make it usable in a specific scenario.
- The economics don't work. For now. Running an agent on top of GPT-4 costs too much. Applications still need LLM costs to come down by a factor of 10 to be economically viable in most cases. But we’ll get there.
- For AI to work, APIs are the key. What happens to all the APIs? AI workflows will only be successful if software remains open and accessible. But incumbents might build walled gardens to keep new companies out. We’ve already seen Twitter, Reddit, etc. restrict access to prevent the use of their data for training models. Will companies go this far to protect their business?
Other thoughts
- How will Devin work through more complex, planning-heavy tasks?
- Will GPT-5 train GPT-6?
- Old-world PR is broken. B2B startups, esp. developer tools, need a Twitter + influencer launch strategy.
PS: In all the noise, many people missed the OpenAI+Figure.ai launch - I’m a lot more excited about this - Gen AI + physical world applications is a massive, massive opportunity. Pay more attention to this one.