Self-Improving Software

By Félix Malfait

When you learn something hard for the first time—driving a car, speaking a language, writing a function—you rely on conscious effort. You analyze, you hesitate, you make decisions. But over time, your brain does something clever: it compiles those decisions into reflexes. You stop thinking and just act.

That transition from reasoning to instinct is what lets humans operate with speed and confidence.

  1. When facing a new task, the brain uses the prefrontal cortex, which is slow, energy-intensive, and uncertain
  2. If the same choice or action consistently works, dopamine reinforces the neural pathway
  3. Eventually, control moves to basal ganglia / cerebellum, which are fast, automatic, and unconscious

When designing workflows for Twenty, we’ve seen a similar pattern emerge. If a user wants to set up a new automation, the quickest path is now to ask an LLM to perform the task (the prefrontal cortex). But as the customer starts depending on it, cracks appear: latency and costs can be millions of times higher than in a traditional workflow.

Fig. 1 – Stochastic workflow in Twenty
Fig. 2 – Deterministic workflow in Twenty

We discovered that the future of Twenty’s workflow feature lies in a hybrid approach, combining both AI and fixed code. It mirrors the neural pattern above: analyze what the stochastic process keeps doing, then harden those recurring decisions into established rules.
For example, a lead assignment system starts off being generated by an LLM. But once the behavior is stable—e.g. always assigning based on region and sales rep seniority—we remove the model entirely. What’s left is a clean function that’s easy to reason about. It behaves like a system written by a careful engineer. But it started with a vibe.
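As a rough illustration of what that “compiled” result can look like, here is a minimal sketch of a deterministic lead-assignment function. The region/seniority rule and the data shapes are hypothetical, not Twenty’s actual workflow output.

```typescript
// Hypothetical shapes; Twenty's real workflow objects differ.
type Lead = { id: string; region: "EMEA" | "AMER" | "APAC" };
type SalesRep = { id: string; region: Lead["region"]; seniority: number };

// The rule the LLM kept converging on, now frozen into plain code:
// pick the most senior rep in the lead's region.
function assignLead(lead: Lead, reps: SalesRep[]): SalesRep | undefined {
  return reps
    .filter((rep) => rep.region === lead.region)
    .sort((a, b) => b.seniority - a.seniority)[0];
}

// No model call, no prompt, no latency spike: just a function
// that is cheap to run and easy to review.
```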

We believe the future lies in evolving software that integrates feedback loops and can self-improve, becoming more refined and personalized with each use while maintaining a specialized core. AI serves as an engine for continuous refinement, enabling the software to learn and adapt with every interaction. Over time, the system solidifies effective patterns and automates successful processes, balancing AI’s flexibility and creativity with the reliability of fixed code. This approach leads to software that is not AI-native, but AI-refined.


So what will new software architectures look like, and which concepts should we lean on to get there? Here are three themes I suspect will matter in the coming year to move our industry forward.

Multi-tenancy with Micro-VMs, Micro-Frontends and Shared Databases

One lesson from past software platforms is to avoid reinventing the wheel for each user or task. A well-designed multi-tenant system allows many use cases to share the same infrastructure safely and efficiently. For AI-powered applications, this means we can’t have isolated, monolithic AI agents each doing nearly the same work in parallel silos—we need shared resources and environments.

However, multi-tenancy comes with trade-offs: shared infrastructure limits how much each user can customize, in order to maintain stability, scalability, and ease of upgrades. For instance, Odoo (a popular open-source ERP) offers Odoo.sh, a PaaS where each company gets its own server and can run custom code, while Odoo.com (the SaaS version) restricts customization but ensures seamless upgrades for all users. It is difficult to get the best of both worlds: more flexibility can mean a fragmented, buggy ecosystem, while stricter controls provide reliability at the expense of individual customization.

In today’s SaaS world, very few platforms manage to offer real extensibility within a multi-tenant model. Salesforce, ServiceNow, and Shopify do, but such platforms are rare and expensive to build because they require a carefully designed abstraction layer that exposes power without compromising the core.

To support this, there are areas of research we’ve been working on to build the future of Twenty:

  • Micro-VMs for backend-code isolation: A micro-VM is a lightweight virtual machine that combines the strong isolation of traditional VMs with the speed and low overhead of containers. Technologies like Firecracker (the micro-VM engine behind AWS Lambda) allow us to run many secure, sandboxed workloads on one host with minimal startup time, giving each piece of custom code strong isolation without the cost of a full VM per tenant.

  • Micro-Frontends for modular UI: The goal here is to take a tree of DOM elements generated by untrusted code in a sandboxed JavaScript environment and render it into the main page’s DOM from a different JavaScript environment. In practical terms, an AI-driven product can allow different components to be developed and deployed in isolation, then stitched together in the user’s browser. Each team or feature can iterate without affecting the others, yet the user sees a cohesive application. This decoupling lets teams rapidly deploy a new AI-driven UI panel without risking the stability of the entire app (see the sketch after this list).

  • Shared database systems: A platform like Salesforce largely owes its success to letting every tenant customize their experience via an app-defined metadata layer. The genius of Salesforce’s ecosystem lies in combining standard SaaS objects such as Contacts or Companies, which let apps speak a common language, with package- or workspace-defined objects that let each business tailor the app to its exact needs.
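To make the micro-frontend idea above more concrete, here is a minimal sketch of rendering a node tree produced in a sandboxed iframe into the host page via postMessage. The `SerializedNode` shape, the origin constant, and the `plugin-panel` element are simplified assumptions, not Twenty’s actual implementation.

```typescript
// Host page: receive a serialized element tree from a sandboxed iframe
// and materialize it into the real DOM. Shapes below are hypothetical.
type SerializedNode = {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: SerializedNode[];
};

const TRUSTED_SANDBOX_ORIGIN = "https://sandbox.example.com"; // assumption

function materialize(node: SerializedNode): HTMLElement {
  const el = document.createElement(node.tag);
  if (node.text) el.textContent = node.text;
  for (const [key, value] of Object.entries(node.attrs ?? {})) {
    // Only allow a small attribute whitelist; never copy event handlers.
    if (["class", "href", "title"].includes(key)) el.setAttribute(key, value);
  }
  (node.children ?? []).forEach((child) => el.appendChild(materialize(child)));
  return el;
}

window.addEventListener("message", (event) => {
  if (event.origin !== TRUSTED_SANDBOX_ORIGIN) return; // reject unknown senders
  const panel = document.getElementById("plugin-panel");
  panel?.replaceChildren(materialize(event.data as SerializedNode));
});
```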

Imperative engine vs. Declarative configuration

Fig. 3 – Prolog: a declarative programming language
Fig. 4 – Twenty’s “core engine” vs “metadata”

The multi-tenant approach solves efficiency and reuse, but building this shared core helps us address a deeper problem we’ve encountered: LLMs are bad software architects. An LLM can generate a function that works in isolation, but it has no true grasp of which abstractions matter in a large system. It doesn’t see how state flows through the layers of your app. It doesn’t remember what assumptions were made two years ago. As a result, even when the AI’s code works, it can be brittle. It tends to violate abstractions, ignore edge cases, or introduce subtle bugs. It often creates a narrow, locally-valid solution that falls apart when integrated into the whole.

You can’t fix it just by prompting the AI to “write clean code” or “use the repository’s conventions”. The best answer we’ve found is to make the multi-tenant core more opinionated. The more structure we can impose — the more we rely on clear contracts instead of convention or guesswork — the better the AI performs.

In fact, we’ve found that the vast majority of AI-related feature work has nothing to do with LLMs directly. It’s about good old software architecture: API design, data modeling, interface contracts, traceability, error handling. In other words, we spend most of our time building the scaffolding so that when an LLM is asked to fill in the blanks, it doesn’t break the whole building. We’re effectively doing the same kind of work developers have always done. But now AI exposes every architectural weakness, making strong design even more essential.
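To illustrate what “clear contracts instead of convention” can look like in practice, here is a minimal sketch of a deliberately narrow interface an AI-generated workflow step would have to satisfy. The names (`WorkflowStep`, `StepContext`) and the example step are hypothetical, not Twenty’s actual API.

```typescript
// A deliberately narrow contract: generated code can only read the
// record it was given and return a typed result. It cannot reach into
// the database, the network, or global state.
// All names here are illustrative, not Twenty's real interfaces.
interface StepContext {
  record: Record<string, unknown>; // the CRM record being processed
  log: (message: string) => void;  // the only side effect allowed
}

interface StepResult {
  status: "success" | "skipped" | "failed";
  output?: Record<string, unknown>;
}

interface WorkflowStep {
  name: string;
  run(context: StepContext): Promise<StepResult>;
}

// An AI-generated step has to fit this shape, which keeps its blast
// radius small and makes it easy to review, test, and replace.
const tagEnterpriseLeads: WorkflowStep = {
  name: "tag-enterprise-leads",
  async run({ record, log }) {
    const employees = Number(record["employees"] ?? 0);
    if (employees < 1000) return { status: "skipped" };
    log("Tagging record as enterprise");
    return { status: "success", output: { segment: "enterprise" } };
  },
};
```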

Internally, we’ve started splitting our codebase into two parts:

  • The core engine, which is a more abstract layer written in an imperative style
  • The business layer, which uses a declarative format to define business objects (e.g. we have an Opportunity object that comes pre-provisioned with a Kanban board)

Creating this distinction has helped us a lot. We’ve observed that LLMs are excellent at transcribing business use cases into declarative code. Declarative programming creates a constrained environment with few architecture decisions, so in that context we can confidently delegate entire parts of the codebase to agents; business apps will mostly be written by the community or by AI through declarative code. The core engine, on the other hand, is much more abstract and maintained by our core team; there, LLMs are used as support to write code faster, but never to handle an end-to-end task.
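To make the split concrete, here is a rough sketch of what a declaratively defined business object and the imperative engine that interprets it might look like. The format is invented for illustration; Twenty’s real metadata schema differs.

```typescript
// Declarative side: the business layer just states what exists.
// This is an invented format for illustration, not Twenty's schema.
const opportunityObject = {
  nameSingular: "opportunity",
  labelPlural: "Opportunities",
  fields: [
    { name: "name", type: "text", required: true },
    { name: "amount", type: "currency" },
    { name: "stage", type: "select", options: ["New", "Qualified", "Won", "Lost"] },
  ],
  views: [{ type: "kanban", groupBy: "stage" }], // pre-provisioned Kanban board
} as const;

// Imperative side: the core engine decides how to turn that declaration
// into tables, APIs, and UI, within safe boundaries.
function provisionObject(definition: typeof opportunityObject): void {
  // e.g. create the underlying table, register resolvers,
  // and expose the declared views to the frontend.
  console.log(`Provisioning ${definition.labelPlural}…`);
}

provisionObject(opportunityObject);
```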

Each era of software gets a new layer of abstraction, and I wouldn’t be surprised if declarative programming makes a comeback on the backend – we’ll see more full-stack frameworks emerge that let LLMs define what needs to happen in high-level terms, while the framework figures out how to do it within safe boundaries. The more we can separate business rules from their implementation, the less room there is for AI to introduce chaos.

Embedded feedback loops as the new competitive moat

The biggest opportunity in AI products today is to create robust feedback loops. Think about how human developers work: no one writes perfect code on the first try. Instead, you write, run, observe what breaks, and iterate. Yet many AI coding agents still operate in the dark—they generate code without ever running it or seeing the results. For example, tools like Cursor’s background agent don’t yet come with a seamless way to spin up Docker environments or headless browsers, making real feedback integration cumbersome. And while huge investments are being made to improve the models themselves, the supporting software infrastructure lags behind; we need that infrastructure to catch up and exploit today’s LLMs more than we need the LLMs themselves to improve.

To close this gap, we need to more tightly integrate our tools so that LLMs can receive immediate, actionable feedback from their outputs. If an error or unexpected result occurs, the system should capture this and feed it back into the AI’s process, enabling it to learn and improve with every iteration—just like a human developer does.
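A minimal sketch of such a loop, assuming a hypothetical `generate` LLM call and an `execute` sandboxed runner passed in by the caller:

```typescript
// Hypothetical helpers: an LLM call and a sandboxed runner.
type GenerateFn = (task: string, feedback?: string) => Promise<string>;
type ExecuteFn = (code: string) => Promise<{ ok: boolean; error?: string }>;

// Generate, run, observe, retry: the loop a human developer runs by hand.
async function generateWithFeedback(
  task: string,
  generate: GenerateFn,
  execute: ExecuteFn,
  maxAttempts = 3,
): Promise<string> {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = await generate(task, feedback);
    const result = await execute(code);
    if (result.ok) return code;
    // Feed the concrete failure back into the next generation pass.
    feedback = `Attempt ${attempt} failed: ${result.error}`;
  }
  throw new Error(`No working solution after ${maxAttempts} attempts`);
}
```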

But there’s a deeper, meta-level opportunity beyond just closing the loop at the tool level. It’s not enough for the agent to simply record lessons in a persistent note (like adding observations to an Agent.md file), or even to remember facts across sessions, as with ChatGPT’s memory. True self-improvement comes when the meta-software can observe patterns in tool usage and outcomes, then proactively adjust its own settings, workflows, or tool selection based on that feedback. In other words, the system should not just accumulate lessons, but adapt its configuration and processes to become more effective over time.
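As a hedged sketch of that meta-level, the snippet below records the outcome of each tool call and lets the observed success rates reorder which tool is tried first. The names and the neutral prior for unknown tools are invented for illustration.

```typescript
// Invented names and thresholds, for illustration only.
type ToolStats = { uses: number; successes: number };

const toolStats = new Map<string, ToolStats>();

function recordOutcome(tool: string, success: boolean): void {
  const stats = toolStats.get(tool) ?? { uses: 0, successes: 0 };
  stats.uses += 1;
  if (success) stats.successes += 1;
  toolStats.set(tool, stats);
}

function successRate(tool: string): number {
  const stats = toolStats.get(tool);
  return stats && stats.uses > 0 ? stats.successes / stats.uses : 0.5; // unknown tools start neutral
}

// Instead of only accumulating lessons in a note, the system adapts its
// own configuration: tools that keep working are tried first.
function rankTools(candidates: string[]): string[] {
  return [...candidates].sort((a, b) => successRate(b) - successRate(a));
}
```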

That’s why so many recent YC startups are building “Cursor for X” products: they recognize the value of feedback loops. But as with the “swipe” in dating apps, each vertical will need its own interface patterns to make giving feedback easy and engaging. The future lies in designing systems that not only collect short-term feedback, but also use it to drive meaningful, automated improvements at both the tool and meta-software levels.


Good news: AI isn’t the end of SaaS!

There’s a common narrative that LLMs will “kill” all startups: that as models get smarter, they will commoditize software and squeeze it out entirely.

It is true that rigid traditional one-size-fits-all SaaS will look increasingly outdated. But this shift creates space for a new generation of software: flexible platforms that AI can configure, extend, and personalize on the fly.

Our job as developers is evolving. Instead of shipping monolithic, rigid products, we’re now responsible for designing the right abstraction layers and assembling the right set of “Lego blocks”—the primitives and interfaces that AI and users can combine in endlessly creative ways: software that can be continuously reconfigured to fit each business, without the pain of multi-year implementation projects. AI won’t rebuild everything from scratch; it will help snap together and personalize robust, well-designed foundations. And the winners in this new era will be those who build the best blocks—and the smartest systems for assembling them.