Change.org faced a major challenge: moderating thousands of new campaigns daily to ensure they met community guidelines. The Change.org team built a strong foundation for automation, but they wanted to take their content moderation system out of spreadsheets and to the next level.
Enter Fractional AI. Through over 100 experiments, Fractional AI developed an AI system that detects 77% of content violations and reduces the false positive rate by 46%. This new system significantly reduces tedious moderation work, allowing the Change.org team to spend less time reviewing content and more time supporting changemakers.
Change.org is the world's largest platform for social change, enabling anyone to start campaigns and mobilize support.
More than half a billion people across more than 196 countries use Change.org’s petition and campaign tools to speak up on issues they’re passionate about. Approximately 70,000 petitions are created and supported on the Change.org platform every month, with 1.7 million new people joining Change.org’s global network of users every week.
That’s a whole lot of petitions and a whole lot of change.
Over 2,000 petitions are created every day, and not all of them adhere to Change.org's Community Guidelines and Terms of Service.
Keeping the Change.org community safe means keeping content violations (hate speech, online intimidation, violence, misleading information) off the platform. What makes this challenge particularly tricky is that petitions aren't black and white -- there's a large gray area. What exactly qualifies as a "misleading claim" or as "non-sensical or random content"? Any moderation system has to deftly navigate this subtlety and nuance.
To meet this challenge, the Change.org back-office team built an impressive system using Google Sheets, Zapier, and GPT. This moderation system used a series of prompts to produce a quality score to determine which petitions to send for human review.
With zero (!) code, their homegrown system automatically flagged about half of content violations. Change.org’s system was a masterclass in no-code solutions, and they wanted to take this strong foundation out of spreadsheets and to the next level.
Specifically, Change.org wanted to:
Change.org partnered with Fractional AI to build a more robust and scalable AI-powered content moderation system (goodbye spreadsheets!).
Here’s the final pipeline, which integrates directly into Change.org’s tech stack:
The new system keeps harmful content off the site, dramatically reduces human reviewer time, and seamlessly integrates into Change.org's tech stack.
Relative to the project goals:
To find the right balance across goals 1-4 (catching more violations without being so conservative that false positives climb, all while keeping costs low), we ran over 100 experiments.
Here's what we learned.
Using a fine-tuned model for Prompt 5 was key to reducing false positives.
Prompt 5 specifically asked the LLM to check for violations (“Check if the post violates any of our content guidelines around hate speech, illegal content, safety, violence, misinformation, fraud, IP infringement, spam, irrelevant to a petition or fraud”) and was yielding overly conservative results, disproportionately increasing the false positive rate.
To fine-tune a model for Prompt 5, Change.org provided us with two datasets labeled by human moderators, each covering a single, full day of petitions (the quality of this labeled data is key). We used one dataset to train and one to test.
Here’s a look under the hood at how we fine-tuned GPT 3.5 with OpenPipe.
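Change.org’s data and prompts aren’t reproduced here, but the general shape of the fine-tuning flow looks roughly like the sketch below. It uses OpenAI’s native fine-tuning API rather than OpenPipe’s tooling, and the prompt wording, file names, and placeholder training rows are all illustrative assumptions.

```python
# Illustrative sketch only: fine-tuning a GPT 3.5 class model on human-labeled
# moderation decisions. The actual pipeline used OpenPipe, which manages this
# workflow; prompt wording, file names, and placeholder rows are assumptions.
import json
from openai import OpenAI

client = OpenAI()

# Placeholder rows -- the real training set was one full day of human-moderated petitions.
labeled_day = [
    ("Example petition text ...", "allow"),
    ("Example spam petition ...", "remove"),
]

def to_example(petition_text: str, human_label: str) -> dict:
    """Convert one human-moderated petition into a chat fine-tuning example."""
    return {
        "messages": [
            {"role": "system", "content": "You are a content moderation assistant."},
            {"role": "user", "content": f"Check if the post violates our content guidelines.\n\n{petition_text}"},
            {"role": "assistant", "content": human_label},  # "remove" or "allow"
        ]
    }

with open("train_day.jsonl", "w") as f:
    for petition_text, human_label in labeled_day:
        f.write(json.dumps(to_example(petition_text, human_label)) + "\n")

# Upload the training file and start the fine-tuning job; the second labeled day
# stays held out as a test set rather than being uploaded here.
training_file = client.files.create(file=open("train_day.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job; the resulting model replaces the base model for Prompt 5
```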
Asking Anthropic’s Prompt Generator to rewrite a prompt had a moderate effect. We found this especially useful for Prompt 3.
Prompt 3 before
Prompt 3 after the Anthropic Prompt Generator
We used LangChain’s structured output parsing to coerce the LLM output for each question into a specific format, including a confidence score, rationale, and label decision (e.g., “remove”/“allow”), enabling better chain-of-thought prompting.
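As a rough sketch of that pattern, here is one way to do it with LangChain’s with_structured_output helper and a Pydantic schema; the field names, model choice, and prompt text are illustrative, not Change.org’s exact setup.

```python
# Sketch of structured output with LangChain + Pydantic. Field names, the model
# choice, and the prompt text are illustrative, not the production schema.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class ModerationDecision(BaseModel):
    rationale: str = Field(description="Step-by-step reasoning about the petition")
    confidence: float = Field(description="Confidence in the decision, from 0 to 1")
    decision: str = Field(description='Either "remove" or "allow"')

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
structured_llm = llm.with_structured_output(ModerationDecision)

result = structured_llm.invoke(
    "Check whether the following petition violates our content guidelines:\n\n"
    "<petition text here>"
)
print(result.decision, result.confidence)
```

Requiring a rationale alongside the label nudges the model to reason before it commits to a decision, which is what makes pairing structured output with chain-of-thought prompting useful.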
While certain steps require customization (e.g., fine-tuning GPT 3.5 on Change.org’s specific data), there’s a lot we could leverage ‘off the shelf.’
Specifically, the OpenAI API moderation endpoint is a tool anyone can use to check whether text is harmful along a number of common dimensions (harassment, hate, sexual content, minors, violence, self-harm). We inserted a step at the beginning of the pipeline that calls this endpoint.
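A minimal sketch of that pre-filter step is below; the routing logic around the flag is an assumption about how such a gate could be wired in, not Change.org’s exact implementation.

```python
# Minimal sketch of an off-the-shelf pre-filter using OpenAI's moderation endpoint.
# The routing logic around the flag is an assumption, not the production wiring.
from openai import OpenAI

client = OpenAI()

def flagged_by_moderation_endpoint(petition_text: str) -> bool:
    """True if the text is flagged on any standard dimension (harassment, hate, violence, ...)."""
    response = client.moderations.create(input=petition_text)
    return response.results[0].flagged

if flagged_by_moderation_endpoint("Example petition text ..."):
    ...  # route straight to human review and skip the downstream prompts
```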
Running Prompt 3 multiple times and taking the most common recommendation (e.g., if “Remove” was the recommendation 3 times and “Allow” was the recommendation 2 times, choosing “Remove”) did not provide better results.
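For reference, the voting scheme we tested looked roughly like the sketch below; the classify callable is a hypothetical stand-in for the Prompt 3 LLM call.

```python
# Minimal sketch of the majority-vote experiment. `classify` is a hypothetical
# stand-in for the Prompt 3 LLM call, which returns "Remove" or "Allow".
from collections import Counter
from typing import Callable

def majority_vote(classify: Callable[[str], str], petition_text: str, runs: int = 5) -> str:
    """Run the same classification several times and keep the most common label."""
    votes = [classify(petition_text) for _ in range(runs)]
    return Counter(votes).most_common(1)[0][0]
```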
Including example classifications of content types as part of Prompt 1 did not yield more accurate classifications.
This experiment came about as a way to test Anthropic’s Opus while minimizing cost per run by replacing 5+ sequential prompts with one large, detailed prompt. Initial results were promising, but we lost the granularity and specificity we were able to achieve with a multi-prompt pipeline.
AI for UGC - Moderating user-generated content is a strong use case for genAI.
No-Code Options as a Starting Point - Non-technical teams can build impressive automated workflows using tools like Zapier, which can then be enhanced by AI engineers.
Fine-tuning was the big winner here - Fine-tuning GPT 3.5 on Change.org’s specific data was ultimately what unlocked the reduction in false positives.
Documentation is crucial - When balancing multiple goals (e.g., catch more violations, reduce false positives, keep cost low), robust documentation of experimentation is necessary to track regressions and incremental improvement.