Airbyte is the leading open-source data integration engine that helps you consolidate your data in your warehouses, lakes, and databases.
Imagine you're an e-commerce company looking to combine Shopify sales data with Zendesk customer support data to better understand customer behavior. Airbyte allows you to easily set up a data pipeline to extract customer order data from Shopify, pull customer support tickets from Zendesk, and load all this data into your Snowflake data warehouse.
Extracting this data requires building API integrations (or “connectors”) with your data sources (e.g., Shopify, Zendesk).
Airbyte already offers an impressive library of pre-built connectors, but there are thousands of connectors left to be built to support data connectivity across all data sources. Many of these are API integrations to SaaS products.
Ask anyone who has spent their day trudging through API documentation to build connectors whether they'd like someone else to handle it, and you'll get a resounding “Yes!” You have to navigate lengthy API docs – all structured differently (see examples 1, 2, and 3), dig around to find the relevant details (How do I authenticate? How does pagination work for this API?), and then manually configure these and a dozen other fields. Beyond being time-consuming and complex, this process diverts technical talent from higher-value work.
Airbyte engaged Fractional AI to help develop an AI-Powered Connector Builder, cutting down the time it takes to build a connector from hours to just a few minutes. Lowering the barrier to building connectors enables Airbyte to power even more data connectivity across more sources -- in fact, Airbyte is already seeing a marked increase in the number of connectors in the wake of the AI Assist release.
Building the end-to-end AI-Powered Connector Builder brought up a number of questions – from typical engineering considerations (e.g., How do we think about caching? Testing? Scalability?) to the broad range of AI questions necessary for production-ready AI features (e.g., How do we minimize hallucinations? How do we evaluate accuracy across connectors? How do we minimize model costs?). Read on for more detail on technical tradeoffs.
While the UI aims to make AI-assisted connector building as intuitive as possible, the AI-Powered Connector Builder is a highly complex product under the hood. The key question driving this build was:
“How can we take a vast array of inputs (e.g., documentation for any API) and reliably generate an equally broad range of outputs (e.g., the configuration for any API integration)?”
And, of course, there are edge cases. We use a different approach when an OpenAPI spec is provided, when the scraped docs don’t look right for a variety of reasons, or when we don’t find an answer in the section of the docs we’re looking at.
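To make the edge cases above concrete, here is a minimal sketch of how such branching could look. It is not the production logic: the `DocsInput` type, the `min_chars` threshold used to decide whether scraped docs "look right", the strategy names, and the `ask` callback (a stand-in for an LLM call) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DocsInput:
    """Hypothetical container for whatever we know about an API's docs."""
    openapi_spec: Optional[dict] = None
    # Scraped doc sections, assumed to be ordered most relevant first.
    sections: List[str] = field(default_factory=list)


def looks_scraped_ok(sections: List[str], min_chars: int = 200) -> bool:
    """Crude sanity check (an assumption): enough text was actually scraped."""
    return sum(len(s) for s in sections) >= min_chars


def choose_strategy(docs: DocsInput) -> str:
    """Pick an approach based on which inputs are available and usable."""
    if docs.openapi_spec is not None:
        return "openapi"        # a structured spec was provided
    if not looks_scraped_ok(docs.sections):
        return "fallback"       # scraped docs don't look right
    return "scraped_docs"       # normal path


def answer_from_sections(
    sections: List[str], ask: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Try each section in turn; move on when a section yields no answer."""
    for section in sections:
        answer = ask(section)
        if answer is not None:
            return answer
    return None
```

The point of the sketch is only that each edge case gets an explicit branch, so a missing answer in one section leads to trying another source rather than to a guess.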
Building unique variations of this flow across components of the API
Ensuring compatibility with a large range of API docs
Here’s how we go from the URL of an API’s documentation to a populated connector spec for authentication:
Step 1: Reading API Docs
Step 2: Extracting Relevant Sections
Step 3: Parsing the HTML chunks and prompting for the exact details
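The three steps can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the actual implementation: splitting the page at headings, ranking sections by keyword counts, the `AUTH_KEYWORDS` list, and the prompt wording are all hypothetical choices, and the crawling step and the LLM call itself are left out.

```python
from html.parser import HTMLParser

# Hypothetical keyword list for spotting authentication sections.
AUTH_KEYWORDS = ("authentication", "authorization", "api key", "bearer", "oauth", "token")


class SectionSplitter(HTMLParser):
    """Splits an HTML page into [heading, body] sections at h1-h3 tags."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = [["", ""]]  # text before the first heading goes here
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self.sections.append(["", ""])
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        # Index 0 collects heading text, index 1 collects body text.
        self.sections[-1][0 if self._in_heading else 1] += data


def split_sections(html):
    """Step 1 (after fetching): turn raw HTML into (heading, text) pairs."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [
        (heading.strip(), " ".join(body.split()))
        for heading, body in parser.sections
        if heading.strip() or body.strip()
    ]


def score(section, keywords=AUTH_KEYWORDS):
    """Count keyword hits, weighting hits in the heading three times higher."""
    heading, body = section[0].lower(), section[1].lower()
    return sum(3 * heading.count(k) + body.count(k) for k in keywords)


def relevant_sections(html, top_n=2):
    """Step 2: keep the highest-scoring sections that mention auth at all."""
    ranked = sorted(split_sections(html), key=score, reverse=True)
    return [s for s in ranked[:top_n] if score(s) > 0]


def build_prompt(sections):
    """Step 3: assemble the text that would be sent to a language model."""
    excerpts = "\n\n".join(f"## {heading}\n{body}" for heading, body in sections)
    return (
        "Using only the documentation excerpts below, describe how this API "
        "authenticates requests.\n\n" + excerpts
    )
```

Narrowing the docs to a few relevant chunks before prompting keeps the model's input small and focused, which is one plausible way to limit both cost and hallucination.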
This is a simplified illustration of the workflow for just authentication – as you’ll see as you use the product, the AI connector builder autogenerates not only authentication but also:
And then, as with any other product build, we had to think about deployment, permissioning, testing, scalability, user experience… the list goes on!
Supercharging developer productivity: There are many high-ROI places where the right AI applications can dramatically increase developer productivity.
Both an engineering and an AI problem: This project is a good reminder that the challenges of getting AI into production aren’t purely about wrangling LLMs. In this case, quality crawling – a problem as old as Google – was a major hurdle.
High-risk, high-reward projects: When we first connected with Airbyte less than 6 months ago, we didn’t even know whether an AI-powered connector builder accurate enough to be useful was possible. Starting with a POC helped us realize that a few months of investment could be a game changer for the future of connector building.