New Feature

PyAirbyte brings the power of Airbyte to every Python developer

An open-source library that packages Airbyte connectors and makes them available directly in Python, with no hosted services required. Compatible with Airbyte Cloud and Open Source.

How does PyAirbyte work?

Enhancing Python with Airbyte connectors for flexible, local data integration.

Installation via PyPI

  • PyAirbyte is installed with pip, making it accessible in any environment that supports Python 3.9 or later.
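
As a quick sketch, installation is a single pip command; the library is published on PyPI under the package name `airbyte`:

```shell
# Install PyAirbyte into the current environment (requires Python >= 3.9).
pip install airbyte
```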

Data ingestion with one Python statement

  • PyAirbyte offers straightforward source connector configuration, with flexible data stream management and versatile caching options.
  • Extract data from hundreds of sources and load it to a variety of SQL caches, like DuckDB, Postgres, Snowflake and BigQuery.
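
The flow above can be sketched as follows. This uses `source-faker`, Airbyte's demo connector; the config values and stream name are illustrative assumptions, not required settings:

```python
import airbyte as ab

# Configure a source connector; install_if_missing fetches it on first use.
source = ab.get_source(
    "source-faker",
    config={"count": 1000},  # connector-specific config (assumed values)
    install_if_missing=True,
)
source.check()               # verify configuration and connectivity
source.select_all_streams()  # or source.select_streams(["users"])

# One statement reads all selected streams into the default local DuckDB cache.
result = source.read()

# Each cached stream is then available by name, e.g. result["users"].
df = result["users"].to_pandas()
print(len(df))
```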

Interoperability with SQL, Python libraries and AI frameworks

  • PyAirbyte cached data is compatible with various Python libraries, like Pandas and SQL-based tools, as well as popular AI frameworks like LangChain and LlamaIndex, to facilitate building LLM-powered applications.
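
For example, a cached stream can be handed to Pandas, or the same tables can be queried over SQL. The stream name below is an assumption, and the SQLAlchemy-engine accessor is a sketch of the cache's SQL interoperability:

```python
import airbyte as ab

source = ab.get_source("source-faker", config={"count": 100}, install_if_missing=True)
source.select_all_streams()
result = source.read()  # lands in the default local DuckDB cache

# Pandas: each cached stream converts to a DataFrame.
df = result["users"].to_pandas()
print(df.columns.tolist())

# SQL: the cache exposes a SQLAlchemy engine over the same cached tables,
# so any SQL-based tool can query them directly.
engine = result.cache.get_sql_engine()
```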

Compatibility with Airbyte Cloud and Open Source jobs

  • PyAirbyte lets you run existing jobs in Airbyte Cloud & OSS, providing convenient access to synchronized datasets.
  • Deploy your PyAirbyte connections as Airbyte Cloud or OSS jobs for seamless integration.
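
Running an existing hosted job might look like the sketch below. Treat the exact constructor arguments and method names as assumptions; the IDs and credentials are placeholders to fill in from your workspace:

```python
from airbyte.cloud import CloudWorkspace

# Connect to an existing Airbyte Cloud workspace (placeholder credentials).
workspace = CloudWorkspace(
    workspace_id="...",
    api_key="...",
)

# Look up an existing connection (job) by ID and trigger a sync.
connection = workspace.get_connection(connection_id="...")
sync_result = connection.run_sync()
```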

Say goodbye to custom ETL scripting

Enable rapid prototyping, minimize ETL coding, boost AI development, and integrate with best data engineering practices.

Leverage the Ubiquity of Python

PyAirbyte's use of Python simplifies integration into existing workflows, benefiting from its widespread adoption and community support.

Decrease Time to Value by Enabling Fast Prototyping

PyAirbyte speeds up the process from development to insights by enabling quick setup and iteration of data pipelines.

Reduce the Need for Custom ETL Development

PyAirbyte reduces the need for costly and error-prone custom ETL coding by providing pre-built connectors.

Facilitate AI Use Cases

PyAirbyte connects to diverse structured and unstructured data sources, simplifying the development of AI and LLM-powered applications.

Enable Data Engineering Best Practices

PyAirbyte integrates data pipelines with version control and CI/CD practices, enhancing collaboration and reliability.

Build to scale with your business

Build and test your connections in PyAirbyte for a quick proof of concept, then deploy them to Airbyte Cloud for scalability and peace of mind.

Frequently asked questions

Does PyAirbyte replace Airbyte?

No, PyAirbyte complements Airbyte by offering additional capabilities for Python environments. You can start prototyping with PyAirbyte and then transition to another Airbyte offering as your needs evolve or scale.

What is the PyAirbyte cache? Is it an Airbyte destination?

Yes, you can think of the PyAirbyte cache as a built-in destination implementation. We avoid the term “destination” to avoid confusion with our certified destinations.

Can I develop traditional ETL pipelines with PyAirbyte?

Yes, PyAirbyte supports traditional ETL pipeline development. Simply select a cache type that matches your data destination.
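
For instance, to land data directly in Postgres you would pick a Postgres-backed cache, so the "cache" doubles as the pipeline's destination. The connection parameters below are placeholders, and the field names are assumptions about the `PostgresCache` configuration:

```python
import airbyte as ab
from airbyte.caches import PostgresCache

# A cache backed by Postgres: cached tables land in the target database.
cache = PostgresCache(
    host="localhost",
    port=5432,
    username="airbyte",
    password="secret",     # placeholder credentials
    database="analytics",
)

source = ab.get_source("source-faker", config={"count": 100}, install_if_missing=True)
source.select_all_streams()
source.read(cache=cache)  # tables are created/updated in Postgres
```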

Can PyAirbyte import a source connector from a local directory that has Python project files?

Yes, PyAirbyte can use any locally installed connector that exposes a CLI, and it will automatically find connectors by name if they are on the PATH.

Can I move millions of rows or TB of data using PyAirbyte?

PyAirbyte should be able to efficiently handle large data volumes by writing to disk first and compressing data. The native database provider implementations ensure fast and memory-efficient processing.

What are some potential use cases of PyAirbyte?

PyAirbyte is ideal for data experimentation and discovery outside traditional data-warehousing, and for testing data flows before production deployment.

How does PyAirbyte handle non-breaking schema changes?

We check for schema compatibility, and we plan to soon add support for handling additional columns added upstream.

Are you planning to add support for more cache types or allow custom cache implementations?

We're open to contributions! And if there's significant user demand, we may add the feature ourselves.

Can I use PyAirbyte to develop or test when developing Airbyte sources?

Absolutely. PyAirbyte is a useful tool for development and testing of Python-based sources.

Can I run my existing Airbyte Cloud or Open Source jobs from PyAirbyte?

Yes, PyAirbyte provides full interoperability with Airbyte Cloud and OSS. You can trigger existing hosted jobs and access the resulting synced datasets. You can also deploy new jobs to Airbyte Cloud and OSS via PyAirbyte. Refer to the documentation for usage details.

Is PyAirbyte compatible with data orchestration frameworks like Airflow, Dagster, and Snowpark?

Yes, PyAirbyte is designed to work with various data orchestration frameworks.

Where does PyAirbyte store the state for incremental processing?

PyAirbyte stores the state in the _airbyte_state table, alongside the data, in databases like DuckDB, Postgres, Snowflake, BigQuery, or MotherDuck.

Is it possible to change the normalization step of a destination with PyAirbyte?

While direct modifications to property names aren't available, you can use the get_records() method to retrieve data as a Python dictionary and store it as desired.
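
As a sketch of that pattern: `get_records()` yields plain dict-like records, so renaming properties is ordinary dictionary work before you store the data. The `rename_keys` helper and the sample records below are illustrative, not part of PyAirbyte:

```python
def rename_keys(record: dict, mapping: dict) -> dict:
    """Return a copy of `record` with keys renamed per `mapping`."""
    return {mapping.get(key, key): value for key, value in record.items()}

# Stand-ins for records yielded by source.get_records("users"):
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]

# Apply a custom normalization step, e.g. renaming "id" to "user_id".
normalized = [rename_keys(r, {"id": "user_id"}) for r in records]
print(normalized[0])  # {'user_id': 1, 'name': 'Ada'}
```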