I’ve scoped hundreds of applied AI projects at Fractional AI, and one of the most common questions I get from customers looking to automate workflows is “how do I make my AI system self-reinforcing?”
This question takes different forms, but the answer is always the same: there’s no magic, “set it and forget it” self-learning feature built into LLMs.
LLMs have a fixed knowledge base after initial training and don’t inherently ‘learn’ from subsequent interactions out of the box. That said, there are practical approaches to incorporating continuous feedback into your AI project.
These approaches broadly fit into three buckets: regular iteration informed by post-launch data collection, periodic re-fine-tuning of any fine-tuned models in your pipeline, and feedback loops built directly into the workflow.
Table stakes for any of these approaches is being intentional about data collection. This means thoughtful planning around what data you can actually collect, how you’ll collect and maintain it, and how you’ll actually use it.
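To make that concrete, here’s a minimal sketch of what a collected feedback record might look like. The schema, field names, and file path are hypothetical; you’d adapt them to whatever your workflow actually produces.

```python
# A minimal sketch of "intentional data collection": capture what the AI saw,
# what it decided, and what a human reviewer said about that decision.
# The schema and field names here are hypothetical.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackRecord:
    item_id: str             # ID of the post, ticket, document, etc.
    model_input: str         # what the AI system saw
    model_output: str        # what the AI system decided or produced
    human_label: str | None  # corrected outcome from QA review, if any
    reason: str | None       # why the reviewer agreed or disagreed
    created_at: str

def log_feedback(record: FeedbackRecord, path: str = "feedback.jsonl") -> None:
    """Append one feedback record as a JSON line, ready for later analysis."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_feedback(FeedbackRecord(
    item_id="post-123",
    model_input="Buy my handmade candles, DM me for prices!",
    model_output="allowed",
    human_label="removed",
    reason="Sales/solicitation content violates guidelines",
    created_at=datetime.now(timezone.utc).isoformat(),
))
```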
The first approach isn’t the flashiest, but it’s the most practical and widely applicable. The idea is straightforward: launch your AI workflow, gather data consistently after deployment, and revisit the workflow at regular intervals (e.g., every six months) to iterate based on the data you’ve collected.
Take the use case of automating content moderation, like we did for Change.org. Imagine you’ve built an AI system to automatically remove user posts that violate your website’s guidelines.
After launching the system, you’d collect ongoing feedback. Say the website allows users to flag inappropriate posts, which are then reviewed by the QA team to determine if the AI moderation system made an error and why. Every six months, you could revisit your AI pipeline to incorporate this feedback.
For example, if the AI system frequently fails to remove posts selling products or asking for money, you could refine the relevant prompts to explicitly instruct the LLMs to flag sales and solicitation content for removal.
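Here’s a hedged sketch of what that prompt refinement can look like in code. The base prompt and learned rules below are illustrative, not Change.org’s actual prompts; the point is that periodic review produces concrete instructions you fold back into the prompt.

```python
# A sketch of folding QA findings back into a moderation prompt.
BASE_PROMPT = (
    "You are a content moderator. Decide whether the post below violates the "
    "site guidelines. Respond with 'remove' or 'allow' and a one-sentence reason."
)

# Hypothetical findings from the last six months of QA review.
learned_rules = [
    "Posts selling products or asking for money violate the guidelines and must be removed.",
    "Posts that mention prices in a neutral, informational way are allowed.",
]

def build_moderation_prompt(post_text: str) -> str:
    """Combine the base prompt with rules learned from reviewed errors."""
    rules = "\n".join(f"- {rule}" for rule in learned_rules)
    return (
        f"{BASE_PROMPT}\n\n"
        f"Additional guidance from recent reviews:\n{rules}\n\n"
        f"Post:\n{post_text}"
    )

print(build_moderation_prompt("Buy my handmade candles, DM me for prices!"))
```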
The second approach applies specifically to AI workflows that include a fine-tuned model: a foundation model that’s been trained on a specific set of input/output data tailored to your use case. Regularly re-fine-tuning this model with the most up-to-date data allows your AI workflow to stay relevant and improve over time.
Building on the content moderation example, imagine your initial system includes a pipeline step with a fine-tuned model that was trained on a human-labeled dataset of historical user posts, the moderation outcomes of those posts (whether they were removed from or allowed on the site), and the reasons for those moderation decisions.
At regular intervals, you can re-train this model using the latest data. For instance, if the past six months included an election cycle and the fine-tuned model struggled with political content, you could re-fine-tune it with user posts from that period and their subsequent, expert-approved moderation decisions.
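Here’s a sketch of that periodic re-fine-tuning step, assuming an OpenAI-style fine-tuning API. The provider, model name, file path, and example records are assumptions for illustration; the same export-train-swap pattern applies to other fine-tuning stacks.

```python
# A sketch of periodic re-fine-tuning on the latest expert-approved decisions.
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical export of recent, expert-approved moderation decisions.
latest_decisions = [
    {"post": "Vote for candidate X or your account gets banned", "decision": "remove",
     "reason": "Threatening/coercive political content"},
    {"post": "Here's how I registered to vote in my county", "decision": "allow",
     "reason": "Neutral civic information"},
]

# Write the data in the chat-format JSONL the fine-tuning API expects.
with open("moderation_finetune.jsonl", "w") as f:
    for row in latest_decisions:
        f.write(json.dumps({"messages": [
            {"role": "system", "content": "Classify the post as 'remove' or 'allow' and give a reason."},
            {"role": "user", "content": row["post"]},
            {"role": "assistant", "content": f"{row['decision']}: {row['reason']}"},
        ]}) + "\n")

# Upload the training file and kick off the fine-tuning job.
training_file = client.files.create(
    file=open("moderation_finetune.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-4o-mini-2024-07-18"
)
print(f"Started fine-tuning job {job.id}; swap the resulting model into the pipeline once it finishes.")
```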
The third approach is harder to generalize, as it depends heavily on the specifics of your AI pipeline and use case. At its core, it involves designing steps in your workflow that directly incorporate feedback or results from prior runs to inform and improve future outputs.
Let’s say you have an AI-powered customer support bot built with a retrieval-augmented generation (RAG) approach on top of an internal knowledge base of company policies.
A traditional RAG workflow would break the knowledge base into smaller chunks, vectorize those chunks, and then retrieve the chunks most similar to a given input (e.g., which chunks of the knowledge base are most similar to a customer’s question?).
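A minimal sketch of that retrieval step, assuming OpenAI embeddings; the knowledge-base snippets here are made up.

```python
# Chunk -> embed -> retrieve by cosine similarity: the basic RAG retrieval step.
import numpy as np
from openai import OpenAI

client = OpenAI()

knowledge_base_chunks = [
    "Refunds are available within 30 days of purchase with a receipt.",
    "Shipping to addresses outside the US takes 7-14 business days.",
    "Gift cards cannot be redeemed for cash.",
]

def embed(texts: list[str]) -> np.ndarray:
    """Vectorize a list of texts with an embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

chunk_vectors = embed(knowledge_base_chunks)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base chunks most similar to the question."""
    q = embed([question])[0]
    sims = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    return [knowledge_base_chunks[i] for i in top]

print(retrieve("Can I get my money back?"))
```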
If you’re thinking about continuous learning, you might build a UI that lets customers rate the chatbot’s response on a scale of 1 to 5 stars. If a user rates a response one star, your workflow escalates it to QA for human review, and the reviewer writes a better response to that question. You’d then add a step that pulls the new human-written response back into the database powering the RAG workflow, so the next time a customer asks a similar question, retrieval draws on the updated database, which now includes the better, human-written answer. Of course, there are simpler options than pulling human-rewritten responses into the underlying RAG database. For example, you might notice that one-star responses fall into just two categories and choose to edit the prompts covering those categories instead.
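Continuing the retrieval sketch above (and reusing its embed, retrieve, knowledge_base_chunks, and chunk_vectors names), here’s a hedged example of that feedback loop; the function names and one-star threshold are illustrative, not a prescribed design.

```python
# A sketch of the feedback loop: a one-star rating triggers QA review, and the
# human-written answer is added back into the retrieval corpus.
def record_rating(question: str, bot_answer: str, stars: int) -> None:
    """Capture a customer rating; low ratings get escalated for human review."""
    if stars <= 1:
        # In a real system this would open a QA ticket; here we just print it.
        print(f"Escalating to QA: question={question!r} answer={bot_answer!r}")

def incorporate_human_answer(question: str, human_answer: str) -> None:
    """Add the QA-written answer to the knowledge base so future retrievals can use it."""
    global chunk_vectors
    new_chunk = f"Q: {question}\nApproved answer: {human_answer}"
    knowledge_base_chunks.append(new_chunk)
    chunk_vectors = np.vstack([chunk_vectors, embed([new_chunk])])

record_rating("Do you price match?", "I don't know.", stars=1)
incorporate_human_answer(
    "Do you price match?",
    "Yes, we match listed prices from major retailers within 14 days.",
)
print(retrieve("Will you match a competitor's price?"))
```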
–
Overall, while there’s no silver bullet, there are a number of ways to incorporate feedback to continuously improve your AI workflow. If you’re unsure where to start, it’s a safe bet to focus your energy on intentional data collection from the outset; this will give you the most options down the road.
–
Eddie Siegel is the Chief Technology Officer at Fractional AI. Before launching Fractional AI, Eddie was CTO of Xip, CEO of Wove, and VP of Engineering at LiveRamp.