The answer to “how do I make my gen AI workflow get smarter over time?”

January 16, 2025

I’ve scoped hundreds of applied AI projects at Fractional AI, and one of the most common questions I get from customers looking to automate workflows is “how do I make my AI system self-reinforcing?” 

This question takes different forms:

  • “Will the model continuously improve?”
  • “We’re collecting all this valuable (proprietary!) data, how can we make sure we're incorporating it?”

The answer: there’s no magic, “set it and forget it” self-learning feature built into LLMs. 

LLMs have a fixed knowledge base after initial training and don’t inherently ‘learn’ from subsequent interactions out of the box. That said, there are practical approaches to incorporating continuous feedback into your AI project.

These approaches broadly fit into three buckets: 

  1. Collect performance data and manually revisit your AI pipeline every n months
  2. Re-train any fine-tuned models in your AI pipeline
  3. Intentionally build automated reinforcement mechanisms into your AI pipeline

Table stakes for any approach is being intentional about data collection. This means thoughtful planning around what data you can actually collect, how you’ll collect and maintain it, and how you’ll eventually put that data to use.
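For instance, here is one hypothetical shape a per-run record could take, sketched in Python. The fields are illustrative rather than a prescribed schema; the point is deciding up front what gets captured on every run so the data actually exists when you want it later.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AIRunRecord:
    """One row per AI decision, written at the time the workflow runs (illustrative fields)."""
    timestamp: datetime
    input_text: str                      # what went into the model
    model_output: str                    # what the model produced
    human_decision: str | None = None    # e.g., a QA override or an end-user correction
    feedback_reason: str | None = None   # why the human disagreed, if they did
```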

1. Revisit your AI pipeline every 6 months

This approach isn’t the flashiest, but it’s the most practical and widely applicable. The idea is straightforward: launch your AI workflow, gather data consistently after deployment, and revisit the workflow at regular intervals (e.g., every six months) to iterate based on the data you’ve collected.

Example: Automating Content Moderation

Take the use case of automating content moderation, like we did for Change.org. Imagine you’ve built an AI system to automatically remove user posts that violate your website’s guidelines.

After launching the system, you’d collect ongoing feedback. Say the website allows users to flag inappropriate posts, which are then reviewed by the QA team to determine if the AI moderation system made an error and why. Every six months, you could revisit your AI pipeline to incorporate this feedback.

For example, if the AI system frequently fails to remove posts selling products or asking for money, you could refine the relevant prompts to explicitly instruct the LLMs to exclude sales-related content.
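To make that concrete, here’s a minimal sketch of mining the QA feedback to decide where prompt changes are most needed. It assumes the QA team logs each review as a JSON line with ai_decision, qa_decision, and error_category fields; those names are hypothetical placeholders for whatever your tooling actually captures.

```python
import json
from collections import Counter

def top_failure_categories(log_path: str, n: int = 5) -> list[tuple[str, int]]:
    """Count QA-confirmed moderation errors by category, most common first."""
    errors = Counter()
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            # A QA decision that differs from the AI's decision marks an error.
            if record["ai_decision"] != record["qa_decision"]:
                errors[record.get("error_category", "uncategorized")] += 1
    return errors.most_common(n)

# e.g. [("sales or solicitation", 112), ("political", 41), ...]
print(top_failure_categories("moderation_reviews.jsonl"))
```

The output is essentially a prioritized to-do list for your six-month revisit: the categories at the top are where prompt changes are most likely to pay off first.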

Pros and Cons

  • Pros:
    • This method is adaptable to nearly any AI workflow. 
    • It doesn’t require extensive development work to build complex mechanisms into your system upfront; instead, it leverages periodic updates to drive improvements.
  • Cons:
    • You're collecting data that sits unused for a while – this is a periodic bulk update, not close to real-time learning. 
    • You’re not making the best use of the record-level data. To continue with the content moderation example, if you have 100 sales-related posts that were flagged as inappropriate, you’re condensing those posts into changes to prompts to better handle sales-related posts. You’re training the developer, not the model. 
    • You still have to go back and do more dev work to take advantage of the data. It may not be immediately clear what adjustments to make to the pipeline to address the concerns in the feedback, requiring experimentation. Realistically, many orgs will say they’ll do this and then never prioritize it.

2. Re-train any fine-tuned models

This approach applies specifically to AI workflows that include a fine-tuned model: a foundation model that’s been further trained on a specific set of input/output data tailored to your use case. Regularly re-fine-tuning this model with the most up-to-date data allows your AI workflow to stay relevant and improve over time.

Example: Automating Content Moderation

Building on the content moderation example, imagine your initial system includes a step in the pipeline with a fine-tuned model that was trained on a human-labeled dataset of historical user posts, the moderation outcomes of those posts (whether they were removed from the site or allowed to stay up), and the reasons for those moderation decisions.

At regular intervals, you can re-train this model using the latest data. For instance, if the past six months included an election cycle and the fine-tuned model struggled with political content, you could re-fine-tune it with user posts from that period and their subsequent, expert-approved moderation decisions.
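As a rough sketch of what a periodic re-fine-tuning run could look like, assuming your labeled moderation data lives in JSONL files with post, decision, and reason fields (hypothetical names) and that you’re fine-tuning an OpenAI chat model; adapt the file handling and job creation to whatever provider and tooling you actually use:

```python
import json
from openai import OpenAI

SYSTEM_PROMPT = (
    "You are a content moderator. Decide whether the post should be removed "
    "and explain why."
)

def to_chat_example(record: dict) -> dict:
    # Convert one labeled post into the chat-format training example the
    # fine-tuning endpoint expects.
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": record["post"]},
            {"role": "assistant", "content": f'{record["decision"]}: {record["reason"]}'},
        ]
    }

def build_training_file(paths: list[str], out_path: str) -> None:
    # Combine the original fine-tuning set with the newly collected,
    # expert-approved labels, keeping everything in one consistent format.
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    out.write(json.dumps(to_chat_example(json.loads(line))) + "\n")

if __name__ == "__main__":
    build_training_file(
        ["original_labels.jsonl", "last_6_months_labels.jsonl"], "train.jsonl"
    )
    client = OpenAI()
    train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=train_file.id,
        model="gpt-4o-mini-2024-07-18",  # pin whichever fine-tunable base model you started with
    )
    print(job.id)
```

The same pattern works with other providers or open-weight models; the important part is keeping the original and new data in one consistent format so combining them stays trivial, which is exactly the data-collection discipline described above.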

Pros and Cons

  • Pros:
    • This approach truly and directly leverages your proprietary data asset, and it’s relatively straightforward to implement. Whereas technique 1 is more like a developer learning from the feedback and then running different experiments informed by that data to try to improve results, re-training is a systematic way to incorporate the latest data, requiring less guesswork and less developer time.
  • Cons:
    • It’s only viable if your original pipeline includes a fine-tuned model, which isn’t always going to be the case.
    • Consistent data collection is doubly essential for this to work. You need to make sure that 6 months from now, you still have access to the original fine-tuning dataset and that any new data is collected in exactly the same format. And you need to know exactly how to go in and update the model. There are ways to make this easier (e.g., using a tool like OpenPipe), but setting up these tools also requires foresight.

3. Intentionally build automated reinforcement mechanisms into your AI pipeline

This approach is harder to generalize, as it depends heavily on the specifics of your AI pipeline and use case. At its core, it involves designing steps in your workflow that directly incorporate feedback or results from prior runs to inform and improve future outputs.

Example: Automating Customer Support

Let’s say you have an AI-powered customer support bot that’s been built with a RAG approach based on an internal knowledge base of company policies. 

A traditional RAG workflow would involve breaking down content in the knowledge base into smaller chunks, vectorizing these chunks, and then comparing a given input to the most similar chunks amongst the vectorized content (e.g., what chunks of the knowledge base are most similar to a customer’s question?). 
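Stripped down to its essentials, that retrieval step might look something like the sketch below. The paragraph-based chunking and the embedding model name are illustrative assumptions, not a recommendation for any particular setup.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"  # assumed; use whatever embedding model you've standardized on

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

def build_index(knowledge_base: str) -> tuple[list[str], np.ndarray]:
    # Naive chunking: split the knowledge base on blank lines.
    chunks = [c.strip() for c in knowledge_base.split("\n\n") if c.strip()]
    return chunks, embed(chunks)

def top_chunks(question: str, chunks: list[str], vectors: np.ndarray, k: int = 3) -> list[str]:
    # Cosine similarity between the question and every chunk; return the k closest.
    q = embed([question])[0]
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

The top chunks then get stuffed into the prompt that generates the chatbot’s answer.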

If you’re thinking about continuous learning, you might build a UI that lets customers rate the chatbot’s response on a scale of 1-5 stars. If a user rates a response one star, your workflow would escalate it to QA for human review, where the reviewer writes a better response to that question. You’d then add a step to pull the new human-written response back into the database powering the RAG workflow, so the next time a customer has a similar question, the system pulls from the updated database, which now includes the better, human-written response.

Of course, there could be simpler workflows than pulling human-rewritten responses into the underlying RAG database. For example, you might see that one-star responses fall into just two categories and choose to edit the prompts covering those categories instead.
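Bolting the feedback loop described above onto the retrieval sketch could look roughly like this. The one-star threshold, the Q/A formatting of the new chunk, and escalate_to_qa (a stand-in for your human review queue) are all hypothetical choices for illustration; embed() is reused from the sketch above.

```python
import numpy as np

def escalate_to_qa(question: str) -> str:
    # Hypothetical hook: hand the question to a human reviewer and return
    # the better answer they write. Wire this up to your actual QA tooling.
    raise NotImplementedError

def handle_rating(
    question: str,
    rating: int,
    chunks: list[str],
    vectors: np.ndarray,
) -> tuple[list[str], np.ndarray]:
    # Anything above one star leaves the index untouched.
    if rating > 1:
        return chunks, vectors
    corrected_answer = escalate_to_qa(question)
    new_chunk = f"Q: {question}\nA: {corrected_answer}"
    # Fold the corrected answer back into the retrieval index so similar
    # future questions surface it.
    chunks.append(new_chunk)
    vectors = np.vstack([vectors, embed([new_chunk])[0]])
    return chunks, vectors
```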

Pros and Cons: 

  • Pros: 
    • This method offers ample flexibility and room for creativity to experiment with different AI engineering techniques based on the specifics of your workflow, and depending on how you design it, it can get you closer to something that is ‘self-reinforcing’ in real time.
  • Cons: 
    • There’s no ‘one-size-fits-all’ playbook. The specifics and implementation of this approach aren’t as repeatable, so it takes more custom AI engineering time to build in self-reinforcing steps that won’t always generate a large lift in results.

Overall, while there’s no silver bullet, there are a number of ways to incorporate feedback to continuously improve your AI workflow. If you’re unsure where to start, it’s a safe bet to focus your energy on intentional data collection from the outset – this will give you the most options down the road.

Eddie Siegel is the Chief Technology Officer at Fractional AI. Before launching Fractional AI, Eddie was CTO of Xip, CEO of Wove, and VP of Engineering at LiveRamp.
