How Fractional AI Automated Content Moderation for Change.org

Change.org faced a major challenge: moderating thousands of new campaigns daily to ensure they met community guidelines. The Change.org team built a strong foundation for automation, but they wanted to take their content moderation system out of spreadsheets and to the next level.

Enter Fractional AI. Through over 100 experiments, Fractional AI developed an AI system that detects 77% of content violations and reduces the false positive rate by 46%. This new system significantly reduces tedious moderation work, allowing the Change.org team to spend less time reviewing content and more time supporting changemakers.

Who is Change.org?

Change.org is the world's largest platform for social change, enabling anyone to start campaigns and mobilize support.

More than half a billion people across more than 196 countries use Change.org’s petition and campaign tools to speak up on issues they’re passionate about. Approximately 70,000 petitions are created and supported on the Change.org platform every month, with 1.7 million new people joining Change.org’s global network of users every week.

That’s a whole lot of petitions and a whole lot of change.

An example petition on Change.org

Problem

With over 2,000 petitions created each day, not all of them adhere to Change.org's Community Guidelines and Terms of Service.

Keeping the Change.org community safe means keeping content violations (hate speech, online intimidation, violence, misleading information) off the platform. What makes this challenge particularly tricky is that petitions aren't black and white -- there's a large gray area. What exactly qualifies as a "misleading claim" or as "non-sensical or random content"? Any moderation system has to deftly navigate this subtlety and nuance.

To meet this challenge, the Change.org back-office team built an impressive system using Google Sheets, Zapier, and GPT. This moderation system used a series of prompts to produce a quality score to determine which petitions to send for human review. 

With zero (!) code, their homegrown system automatically flagged about half of content violations. Change.org’s system was a masterclass in no-code solutions, and they wanted to take this strong foundation out of spreadsheets and to the next level.

Goal

Specifically, Change.org wanted to:

  1. Catch more content violations: The system caught about half of violations; the other half of dangerous petitions fell through the cracks and had to be flagged by end users. 
  2. Reduce false positives: 50% of the petitions marked for manual review were false positives (petitions flagged by the homegrown system as violations that were not violations), meaning the team spent a lot of time on manual review of petitions that ultimately were harmless.
  3. Build a more robust workflow: The existing workflow was a series of steps in a Google Sheet. The team wanted a robust system integrated with their production environment.
  4. Keep costs low: The team wanted to achieve 1-3 while keeping the daily run cost immaterial. Before their homegrown system, Change.org was using a vendor that cost $5,000 a month. Their homegrown system came in around $30/day.

Solution

Change.org partnered with Fractional AI to build a more robust and scalable AI-powered content moderation system (goodbye spreadsheets!). 

Here’s the final pipeline, which integrates directly into Change.org’s tech stack:

Impact

The new system keeps harmful content off the site, dramatically reduces human reviewer time, and seamlessly integrates into Change.org's tech stack.

Relative to the project goals:

  1. Catch more violations: 77% of content violations are now caught with AI 
  2. Reduce false positives: False positive rate cut in half (previously, 1 in 2 petitions flagged by the system as violations was later deemed fine by human reviewers; now that number is 1 in 4, meaning human reviewer time is spent more efficiently)
  3. Build a more robust workflow: Replaced a spreadsheet workflow with a REST API workflow
  4. Keep costs low: The cost stayed around $30/day

Project Setup

  1. Data: Change.org supplied us with two days' worth of human-labeled data (based on 3 human moderators who manually reviewed all petitions). These datasets represented the full universe of all petitions for each of the two days. We used one day as our test dataset and one day as our training dataset (a sketch of how such labeled data can be used to score the pipeline follows this list). 
  2. Model: We used GPT 4o and the OpenAI API moderation endpoint. We fine-tuned GPT 3.5 using OpenPipe. We experimented with Gemini and Anthropic models but opted to stick with GPT 4o.
  3. Tooling: We used Langchain to stitch together our prompts and coerce LLM output into structured text, Langsmith for observability and experimentation, and the Anthropic Prompt Generator.
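To make the scoring concrete, here is a minimal sketch, under assumed field names rather than Change.org's actual schema or evaluation harness, of how a labeled test day can be turned into the false negative and false positive rates tracked throughout the experiments:

```python
# Minimal sketch: score pipeline predictions against a human-labeled test day.
# The field names ("human_label", "predicted_label") are illustrative assumptions.
def score(examples: list[dict]) -> dict:
    """Compute the two metrics tracked across experiments.

    false_negative_rate: share of true violations the pipeline missed
    false_positive_rate: share of flagged petitions that were actually fine
    """
    violations = [e for e in examples if e["human_label"] == "remove"]
    flagged = [e for e in examples if e["predicted_label"] == "remove"]

    missed = sum(1 for e in violations if e["predicted_label"] == "allow")
    harmless_but_flagged = sum(1 for e in flagged if e["human_label"] == "allow")

    return {
        "false_negative_rate": missed / len(violations) if violations else 0.0,
        "false_positive_rate": harmless_but_flagged / len(flagged) if flagged else 0.0,
    }
```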

Lessons from 100+ Experiments

To find the right balance across goals 1-4 – catching more violations without being so conservative that we created false positives, all while keeping costs low – we ran over 100 experiments.

The charts show the relative impact of certain interventions individually (upgrading to GPT 4o from GPT 3.5 [A], using the OpenAI moderation endpoint [B], improving Prompt 5 [C]) and collectively (with and without use of a fine-tuned model) on the rate of false negatives and false positives.

Here's what we learned.

What Worked

1. Fine-tuning GPT 3.5

Using a fine-tuned model for Prompt 5 was key to reducing false positives.

Prompt 5 specifically asked the LLM to check for violations ("Check if the post violates any of our content guidelines around hate speech, illegal content, safety, violence, misinformation, fraud, IP infringement, spam, irrelevant to a petition or fraud") and was yielding overly conservative results – disproportionately increasing the false positive rate.

To fine-tune a model for Prompt 5, Change.org provided us with two datasets that had been labeled by human moderators. Each dataset represented a single, full day of petitions (the quality of these datasets is key). We used one dataset to train and one to test.

Here’s a look under the hood on how we fine-tuned GPT 3.5 with OpenPipe.

The setup in OpenPipe: the labeled input and output data compared with the output from the fine-tuned model
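As a concrete illustration (a minimal sketch under an assumed labeled-data schema, not Change.org's production tooling), human-labeled petitions can be converted into the chat-format JSONL that OpenPipe and OpenAI fine-tuning accept:

```python
# Sketch: convert human-labeled petitions into chat-format JSONL for fine-tuning.
# The file names and fields ("text", "human_label") are illustrative assumptions.
import json

PROMPT_5 = (
    "Check if the post violates any of our content guidelines around hate speech, "
    "illegal content, safety, violence, misinformation, fraud, IP infringement, "
    "spam, irrelevant to a petition or fraud"
)

def to_training_example(petition: dict) -> dict:
    """One labeled petition becomes one training example: prompt in, human verdict out."""
    return {
        "messages": [
            {"role": "system", "content": PROMPT_5},
            {"role": "user", "content": petition["text"]},
            {"role": "assistant", "content": petition["human_label"]},  # "remove" or "allow"
        ]
    }

with open("labeled_training_day.json") as f, open("prompt5_train.jsonl", "w") as out:
    for petition in json.load(f):
        out.write(json.dumps(to_training_example(petition)) + "\n")
```

JSONL like this can then be uploaded as a training dataset, with the held-out day reserved for evaluation.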
2. Anthropic Prompt Generator

Asking the Anthropic Prompt Generator to rewrite a prompt had a moderate effect. We found this especially useful for Prompt 3.

Prompt 3 before

Prompt 3 after the Anthropic Prompt Generator

3. Structured output and chain-of-thought

We used Langchain's structured output parsing to coerce the LLM output for each question into a specific format, including a confidence score, rationale, and label decision (e.g., "remove"/"allow"), enabling better chain-of-thought prompting.
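Here is a minimal sketch of that idea using Langchain's structured output support with a Pydantic schema (the field names are illustrative, and the exact parser used in the production pipeline may differ). Putting the rationale before the label encourages the model to reason before deciding:

```python
# Sketch: coerce one moderation question's answer into a structured schema,
# with the rationale generated before the label to encourage chain-of-thought.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class ModerationVerdict(BaseModel):
    rationale: str = Field(description="Step-by-step reasoning about the petition")
    confidence: float = Field(description="Confidence in the decision, from 0 to 1")
    label: str = Field(description='Either "remove" or "allow"')

llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_llm = llm.with_structured_output(ModerationVerdict)

verdict = structured_llm.invoke(
    "Check if the post violates any of our content guidelines around hate speech, "
    "illegal content, safety, violence, misinformation, fraud, IP infringement, "
    "spam, irrelevant to a petition or fraud.\n\nPetition text: <petition goes here>"
)
print(verdict.label, verdict.confidence, verdict.rationale)
```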

4. Using the OpenAI API moderation endpoint

While certain steps require customization (e.g., fine-tuning GPT 3.5 on Change.org’s specific data), there’s a lot we could leverage ‘off the shelf.’ 

Specifically, the OpenAI API moderation endpoint is a tool that anyone can use to check if text is harmful on a number of common dimensions (harassment, hate, sexual content, minors, violence, self-harm). We inserted a step at the beginning of the pipeline calling to this endpoint.
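A pre-check along these lines looks roughly like the sketch below (illustrative, not the exact production call):

```python
# Sketch: call OpenAI's moderation endpoint as the first step in the pipeline.
from openai import OpenAI

client = OpenAI()

def flagged_by_moderation_endpoint(petition_text: str) -> bool:
    """Return True if the moderation endpoint flags the text on any category
    (harassment, hate, sexual content, violence, self-harm, etc.)."""
    response = client.moderations.create(input=petition_text)
    return response.results[0].flagged
```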

What Didn't Work

1. Averaging across multiple responses

Running Prompt 3 multiple times and taking the most common recommendation (e.g., if “Remove” was the recommendation 3 times and “Allow” was the recommendation 2 times, choosing “Remove”) did not provide better results.
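For reference, the experiment amounted to a simple majority vote over repeated runs of the same prompt, roughly like this sketch (the `classify` callable stands in for the Prompt 3 LLM call):

```python
# Sketch of the majority-vote experiment: run the same classifier several times
# and keep the most common recommendation.
from collections import Counter
from typing import Callable

def majority_vote(classify: Callable[[str], str], petition_text: str, n: int = 5) -> str:
    votes = [classify(petition_text) for _ in range(n)]  # e.g. ["Remove", "Allow", "Remove", ...]
    return Counter(votes).most_common(1)[0][0]
```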

2. Few-shot prompting

Including example classifications of content types as part of Prompt 1 did not yield more accurate classifications. 
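Concretely, the experiment prepended a handful of labeled examples to the classification prompt, along the lines of the sketch below (the petitions and category names shown are invented placeholders, not Change.org's real taxonomy or data):

```python
# Sketch of the few-shot variant of Prompt 1. The example petitions and
# category names below are placeholders, not real Change.org data.
FEW_SHOT_EXAMPLES = """\
Petition: "Add a crosswalk near our elementary school." -> Classification: legitimate petition
Petition: "CLICK HERE to claim your free prize!!!" -> Classification: spam
"""

PROMPT_1_FEW_SHOT = (
    "Classify the content type of the following petition.\n\n"
    "Examples:\n" + FEW_SHOT_EXAMPLES +
    "\nPetition: \"{petition_text}\" -> Classification:"
)
```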

3. Using one large prompt rather than a series of 5+ prompts

This experiment came about as a way to test Anthropic’s Opus while minimizing cost per run by replacing 5+ sequential prompts with one large, detailed prompt. Initial results were promising, but we lost the granularity and specificity we were able to achieve with a multi-prompt pipeline.

Other Tidbits

  • GPT’s translation abilities are top notch. Change.org users come from 196 countries and submit petitions in dozens of languages. The pipeline generally worked across all languages. 
  • Including steps at the beginning of the pipeline that had a low false positive rate, like calling the OpenAI moderation endpoint, allowed us to iterate on later parts of the pipeline quickly without worrying about regressions (see the sketch after this list).
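The ordering idea from the last bullet can be sketched as a short-circuiting chain of checks, cheapest and lowest-false-positive first (the function and routing names here are illustrative, not the actual pipeline):

```python
# Sketch of pipeline ordering: low-false-positive checks run first and
# short-circuit, so later prompts can be reworked without touching them.
from typing import Callable, Iterable

def moderate(petition_text: str, checks: Iterable[Callable[[str], bool]]) -> str:
    """Each check returns True if the petition should be flagged for human review."""
    for check in checks:
        if check(petition_text):
            return "flag_for_human_review"
    return "allow"

# Usage (illustrative): the moderation-endpoint check goes first, the
# fine-tuned Prompt 5 check last.
# decision = moderate(text, [flagged_by_moderation_endpoint, prompt_1_check, prompt_5_check])
```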

Key Takeaways

AI for UGC - Moderating user-generated content is a strong use case for genAI.

No-Code Options as a Starting Point - Non-technical teams can build impressive automated workflows using tools like Zapier, which can then be enhanced by AI engineers.

Fine-tuning was the big winner here - Fine-tuning GPT 3.5 with Change.org’s specific data was ultimately the unlock to reducing false positives.

Documentation is crucial - When balancing multiple goals (e.g., catch more violations, reduce false positives, keep cost low), robust documentation of experimentation is necessary to track regressions and incremental improvement.