February: A Month of Stabilization for a New Acceleration Phase
In February, we didn’t add as many new connectors as we had in the previous few months. This is part of our plan to get to 200-300 connectors by the end of 2021. Let’s see what has happened behind the scenes.
Although our progress was not always visible, February was a very productive month on our side. It was what we call a stabilization month. After every acceleration period, you need a stabilization one. Otherwise, your product will never achieve the level of stability that your enterprise customers need, or you won’t be able to scale the product beyond a certain point.
In this article, we want to detail what we worked on during this stabilization period. But first, let’s explain why we went into it.
Being exposed to the thousand-paper-cut issues
As a reminder, we started working on this project in July 2020. So when we reached ~50 connectors in January, we had been in an acceleration period for about six months.
Data integration is a specific engineering problem. It’s a thousand-paper-cut problem, meaning that every user will use a connector differently. They might need different data, have a higher data volume, or have a different infrastructure from the one we run Airbyte on. Every use case is different and brings its own problems.
So, every time we have an inbound flow of new users, we will be exposed to more use cases, and new issues will be flooding in, as you can see in the charts below showing the number of issues created by the community:
When we launched on Hacker News, we knew that we would be exposed to a lot more use cases, and that it was time to go into a stabilization period. In the end, our ambition is to fully commoditize data integration and to become the standard for replicating data. We can only do that if Airbyte works for any use case, so it needs to be the most reliable data integration tool out there.
1. Bug fixing and strengthening connectors
In February, we closed 94 issues.
Most of them were bugs related to connectors. In addition to fixing those issues and strengthening the connectors to support new use cases, we decided to do two things:
- Build a public connector health dashboard, in order to be very transparent with the community about the status of the connectors, and
- Start certifying the connectors against a set of best practices that we’re constantly improving with the use cases we’re being exposed to.
This is why, before we can ramp up again on the number of connectors, we need a good grip on what those best practices should be, so we can apply them to the new connectors. But that’s not all.
2. Anticipating the next features to minimize risks
Best practices are good to have in mind when building new connectors, but they’re not enough. We need to anticipate what is coming in terms of features on those connectors, so when we’re ready to build new ones, we know the architecture we need to build.
So, we designed how OAuth would be implemented, how deletion would be handled (to improve the incremental append), how we would handle nested tables, etc.
Now that we have that in mind, the only big remaining step towards building 200-300 reliable and scalable connectors by the end of 2021 is to make it a lot easier to build those connectors, not only for us but for the community, too.
3. Low-code connector building
As we mentioned in our article about how you can build thousands of connectors, we need to build abstractions to address families of connectors. This is something we’ve been prototyping: enabling developers to build a connector within 15 minutes by only describing the source API.
The abstraction will facilitate building new connectors, and will also facilitate their maintenance. It will ensure that all those connectors are built following our set of best practices in a consistent manner.
To be honest, we’re not ready yet on that point; we’re still prototyping, but this is something we will be communicating more and more about. At first, the low-code process will only be able to address a set of APIs, but we hope we will be able to extend the breadth of connectors that we can address this way with time.
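To make the idea of "only describing the source API" a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and is not our actual prototype: `StreamSpec`, `read_stream`, `fake_fetch`, the `/v1/users` path, and the `data` / `next` response keys are all hypothetical names chosen for the example. The point it tries to show is that the per-connector part shrinks to a small description, while the reading and pagination logic is written once and shared.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Optional


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical declarative description of one endpoint of a source API."""
    name: str
    path: str                            # endpoint to call (assumed example value)
    records_key: str                     # response key holding the list of records
    next_page_key: Optional[str] = None  # response key holding the next page token


# A fetcher takes (path, page_token) and returns the decoded JSON response.
Fetcher = Callable[[str, Optional[str]], Dict[str, Any]]


def read_stream(spec: StreamSpec, fetch: Fetcher) -> Iterator[Dict[str, Any]]:
    """Generic reader shared by every connector described with a StreamSpec."""
    token: Optional[str] = None
    while True:
        response = fetch(spec.path, token)
        for record in response.get(spec.records_key, []):
            yield record
        # Stop when the spec declares no pagination or the API returns no token.
        token = response.get(spec.next_page_key) if spec.next_page_key else None
        if not token:
            break


def fake_fetch(path: str, token: Optional[str]) -> Dict[str, Any]:
    """Stand-in for an HTTP call so the sketch runs without a network."""
    pages = {
        None: {"data": [{"id": 1}, {"id": 2}], "next": "p2"},
        "p2": {"data": [{"id": 3}], "next": None},
    }
    return pages[token]


# The "connector" itself is only this description.
users = StreamSpec(name="users", path="/v1/users",
                   records_key="data", next_page_key="next")
records = list(read_stream(users, fake_fetch))
print(records)
```

In a sketch like this, a best practice (retries, rate limiting, and so on) would be added once inside the shared reader, and every connector described this way would pick it up, which is the consistency and maintenance benefit described above.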
We’re almost ready for the next acceleration phase
All this to say that we’re slowly moving out of this stabilization period now, especially on the core platform. That’s why we have started working on Airflow integration, CDC (change data capture), schema adaptation, etc. We’ll have a lot to announce on that part in the coming weeks.
With regard to connectors, we’re not ready yet for the next acceleration phase, but we’re approaching it. When it is time, you will need to buckle your seat belt!
We hope this article gives you a better view of what’s happening under the hood, i.e., our development and priorities. Don’t hesitate to leave a comment or ask any questions you might have.