2026-03-02 21:58:51 +07:00

04-ingestion: Airbyte Data Ingestion

Airbyte OSS for data ingestion and ETL, deployed with the abctl CLI tool.

Overview

This deployment uses Airbyte's official abctl command-line tool for easy installation and management. It's configured to use shared infrastructure from 01-infra:

  • PostgreSQL: Shared database for Airbyte metadata
  • Nginx Proxy Manager: Shared reverse proxy for external access
  • Network: shared_data_network for inter-service communication

Note: abctl runs the Airbyte services inside a local kind (Kubernetes-in-Docker) cluster and deploys an internal airbyte-proxy for routing between them. External access is handled by the existing Nginx Proxy Manager in 01-infra, so no additional nginx is needed in this folder.

Prerequisites

  1. Docker Desktop installed and running
  2. Infrastructure services running (PostgreSQL from 01-infra)
  3. Linux or macOS (for Windows, install abctl manually)
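The prerequisites above can be checked before running the setup script. This is a minimal sketch, not the contents of setup-airbyte.sh; it assumes the shared PostgreSQL container is named postgres (as in the manual commands in this README), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks for the prerequisites listed above.
# Assumption: the shared PostgreSQL container from 01-infra is named "postgres".

# require_cmd NAME: succeed if NAME is on PATH, otherwise explain and fail
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

# container_running NAME: succeed if a running container has exactly this name
container_running() {
  docker ps --format '{{.Names}}' | grep -qxF "$1"
}

# preflight: stop at the first unmet prerequisite
preflight() {
  require_cmd docker || return 1
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not reachable; start Docker Desktop" >&2
    return 1
  fi
  if ! container_running postgres; then
    echo "PostgreSQL container is not running; start 01-infra first" >&2
    return 1
  fi
  echo "prerequisites OK"
}
```

Sourced into a shell, it can gate the setup: preflight && ./setup-airbyte.sh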

Installation

First Time Setup

Run the automated setup script:

cd 04-ingestion
chmod +x *.sh
./setup-airbyte.sh

This script will:

  • Check prerequisites (Docker, PostgreSQL)
  • Install abctl if not present
  • Create required databases (airbyte, temporal, temporal_visibility)
  • Install Airbyte with custom configuration
  • Configure port mapping (8030 instead of default 8000)

Installation takes approximately 10-30 minutes depending on internet speed.
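Because installation can take up to half an hour, a script can poll the UI until it answers instead of guessing. A minimal sketch, assuming the UI is served on http://localhost:8030 as configured above; wait_for_url is a hypothetical helper, not part of abctl.

```shell
#!/usr/bin/env bash
# wait_for_url URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls URL until it returns an HTTP status below 400, or the timeout expires.
wait_for_url() {
  local url="$1"
  local timeout="${2:-1800}"   # 30 minutes, the upper end of the install estimate
  local interval="${3:-15}"
  local deadline=$(( $(date +%s) + timeout ))
  while :; do
    # -f: fail on HTTP >= 400; -s: silent; -o /dev/null: discard the body
    if curl -fs -o /dev/null --max-time 10 "$url"; then
      echo "ready: $url"
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

For example: wait_for_url http://localhost:8030 && echo "Airbyte UI is up"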

Manual Installation

If you prefer manual installation:

  1. Install abctl:

    curl -LsfS https://get.airbyte.com | bash -
    
  2. Create databases:

    docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"
    docker exec postgres psql -U postgres -c "CREATE DATABASE temporal;"
    docker exec postgres psql -U postgres -c "CREATE DATABASE temporal_visibility;"
    
  3. Install Airbyte:

    abctl local install --port 8030 --insecure-cookies
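The CREATE DATABASE commands in step 2 fail if a database already exists, for example when setup is re-run. PostgreSQL has no CREATE DATABASE IF NOT EXISTS, so a common workaround is psql's \gexec: select the statement only when the database is missing, then execute the result. A sketch assuming the same postgres container and superuser as above; create_db_sql and ensure_dbs are hypothetical helper names, and the database names are interpolated unquoted, so use it only with trusted names.

```shell
#!/usr/bin/env bash
# create_db_sql NAME: print psql input that creates NAME only if it is missing.
# \gexec runs each value in the query result as an SQL statement.
create_db_sql() {
  printf '%s\n' "SELECT 'CREATE DATABASE $1' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '$1')"
  printf '%s\n' '\gexec'
}

# ensure_dbs: create the three Airbyte databases in the shared container if missing
ensure_dbs() {
  for db in airbyte temporal temporal_visibility; do
    create_db_sql "$db" | docker exec -i postgres psql -U postgres -v ON_ERROR_STOP=1 || return 1
  done
}
```

Running ensure_dbs twice is safe: the second run selects no statements and creates nothing.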
    

Usage

Start Airbyte

./start-airbyte.sh

Stop Airbyte

./stop-airbyte.sh

Uninstall Airbyte

./uninstall-airbyte.sh

Access

Production (via Nginx Proxy Manager)

  • https://ai.sriphat.com/airbyte

Development/Local

  • http://localhost:8030
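To confirm both entry points respond, a small helper can print the HTTP status of each URL. A sketch assuming the URLs used elsewhere in this README (localhost:8030 locally, ai.sriphat.com/airbyte through Nginx Proxy Manager, with https assumed because NPM terminates SSL); http_status and check_access are hypothetical names.

```shell
#!/usr/bin/env bash
# http_status URL: print the HTTP status code for URL ("000" when nothing answers)
http_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# check_access: report the local and the proxied endpoint side by side
check_access() {
  echo "local:      $(http_status http://localhost:8030)"
  echo "production: $(http_status https://ai.sriphat.com/airbyte)"
}
```

The login for the UI itself can be printed with abctl local credentials.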

Configuration

Edit .airbyte.env to customize:

  • AIRBYTE_PORT: External port (default: 8030)
  • AIRBYTE_HOST: Domain name for external access (ai.sriphat.com)
  • LOW_RESOURCE_MODE: Enabled by default; intended for systems with fewer than 4 CPU cores
  • AIRBYTE_VERSION: Airbyte version to install (default: latest stable)
  • ENABLE_BACKUP: Enables automated backups (default: enabled)
  • Database connection settings for the shared PostgreSQL instance
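Some of these settings correspond to abctl install flags (--port, --host, --insecure-cookies, --low-resource-mode). The sketch below is a hypothetical mapping, not necessarily what setup-airbyte.sh does; it assumes .airbyte.env uses plain KEY=value lines, maps only the variables that have a direct abctl flag, and DRY_RUN is an illustrative switch.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from .airbyte.env variables to abctl install flags;
# the real setup-airbyte.sh may do this differently.

# airbyte_install: build the flag list, then print it (DRY_RUN=true) or run it
airbyte_install() {
  set -- --port "${AIRBYTE_PORT:-8030}" --insecure-cookies
  if [ -n "${AIRBYTE_HOST:-}" ]; then
    set -- "$@" --host "$AIRBYTE_HOST"
  fi
  if [ "${LOW_RESOURCE_MODE:-false}" = "true" ]; then
    set -- "$@" --low-resource-mode
  fi
  if [ "${DRY_RUN:-false}" = "true" ]; then
    echo abctl local install "$@"
  else
    abctl local install "$@"
  fi
}
```

For example, set -a; . ./.airbyte.env; set +a; DRY_RUN=true airbyte_install prints the resulting command without running it. Building the list with set -- keeps each flag a separate argument without needing bash arrays.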

Authentication

Airbyte OSS does not natively support Keycloak, so authentication is handled in front of Airbyte, via Nginx Proxy Manager:

  1. Recommended: OAuth2 Proxy with Keycloak integration
  2. Alternative: Basic Authentication via nginx
  3. Simple: IP whitelist for internal access

See NGINX-SETUP.md for detailed configuration instructions.

Database

Airbyte uses three databases in the shared PostgreSQL instance:

  • airbyte: Main application database
  • temporal: Workflow engine database
  • temporal_visibility: Temporal visibility database

All databases are automatically created during setup and backed up with the main PostgreSQL instance.
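To verify that all three databases exist, the existing database names can be compared against the expected ones. A sketch; missing_dbs is a hypothetical helper that reads names on stdin so it can be fed directly from psql.

```shell
#!/usr/bin/env bash
# missing_dbs: read existing database names (one per line) on stdin and
# print each expected Airbyte database that is not among them.
missing_dbs() {
  existing=$(cat)
  for db in airbyte temporal temporal_visibility; do
    # -x: whole-line match, so "temporal" does not match "temporal_visibility"
    if ! printf '%s\n' "$existing" | grep -qxF "$db"; then
      echo "$db"
    fi
  done
}
```

Usage, assuming the postgres container used above: docker exec postgres psql -U postgres -Atc 'SELECT datname FROM pg_database' | missing_dbs prints nothing when all three are present.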

Architecture

Services Deployed by abctl

  • airbyte-server: Backend API and business logic
  • airbyte-worker: Executes sync jobs and manages connectors
  • airbyte-webapp: Web UI
  • airbyte-temporal: Workflow orchestration engine
  • airbyte-proxy: Internal reverse proxy (entrypoint behind Nginx Proxy Manager)
  • airbyte-cron: Scheduled job runner
  • airbyte-connector-builder-server: Custom connector development
  • airbyte-api-server: REST API server

Network Architecture

All services connect to shared_data_network:

Internet → Nginx Proxy Manager (01-infra) → airbyte-proxy (internal) → Airbyte Services
          ai.sriphat.com/airbyte           port 8000

Shared Resources from 01-infra:

  • Nginx Proxy Manager: External reverse proxy (handles SSL, auth, routing)
  • PostgreSQL: Database server (airbyte, temporal, temporal_visibility)
  • Keycloak: Identity provider (optional, via OAuth2 Proxy)

Airbyte Components (deployed by abctl):

  • airbyte-proxy: Internal nginx for microservice routing (NOT for external access)
  • Airbyte services: server, worker, webapp, temporal, etc.

See ARCHITECTURE.md for detailed network flow diagram.

Troubleshooting

Installation Issues

Error: "Readiness probe failed: HTTP probe failed with statuscode: 503"

  • This is expected while pods start during installation; let the installation continue.
  • If the error persists, allocate more CPU and memory to Docker Desktop.

Error: "PostgreSQL container is not running"

  • Start infrastructure first: cd ../01-infra && docker compose --env-file ../.env.global up -d

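Error: "abctl: command not found"

  • Install abctl: curl -LsfS https://get.airbyte.com | bash -
  • Make sure the directory abctl was installed into is on your PATH, then open a new shell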

Runtime Issues

Cannot access UI at localhost:8030

  • Check if Airbyte is running: abctl local status
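  • Check Docker containers: docker ps | grep airbyte (abctl's kind cluster appears as a single container, airbyte-abctl-control-plane)
  • View logs: the services run as pods, so use kubectl with the kubeconfig abctl writes: kubectl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig -n airbyte-abctl get pods, then the same flags with logs <pod-name>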
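Because abctl runs the Airbyte services as pods inside a kind cluster, pod status and logs are read with kubectl rather than docker. A sketch assuming kubectl is installed separately and abctl's default kubeconfig (~/.airbyte/abctl/abctl.kubeconfig) and namespace (airbyte-abctl); verify both on your install. ak and pod_matching are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumed abctl defaults; override via environment if your install differs.
ABCTL_KUBECONFIG="${ABCTL_KUBECONFIG:-$HOME/.airbyte/abctl/abctl.kubeconfig}"
ABCTL_NAMESPACE="${ABCTL_NAMESPACE:-airbyte-abctl}"

# ak ARGS...: run kubectl against the cluster and namespace abctl created
ak() {
  kubectl --kubeconfig "$ABCTL_KUBECONFIG" --namespace "$ABCTL_NAMESPACE" "$@"
}

# pod_matching PATTERN: print the first pod (as pod/NAME) whose name contains PATTERN
pod_matching() {
  ak get pods -o name | grep -F "$1" | head -n 1
}
```

For example, ak get pods shows pod status, and ak logs "$(pod_matching worker)" --tail=200 shows recent worker output.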

Sync jobs failing

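  • Check worker logs: kubectl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig -n airbyte-abctl logs <worker-pod-name> (the worker runs as a pod, not as a standalone Docker container)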
  • Verify database connectivity
  • Ensure sufficient disk space and memory
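Free disk space is easy to check from a script before digging into failed syncs. A POSIX sketch; the helper names and any threshold you pass are illustrative, and with Docker Desktop the space that matters is inside its VM disk, which df on the host may not reflect.

```shell
#!/usr/bin/env bash
# disk_free_gb PATH: print free space, in whole GiB, on the filesystem holding PATH
disk_free_gb() {
  # -P: one POSIX-format line per filesystem; -k: sizes in 1024-byte blocks
  df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# require_disk_gb PATH MIN_GB: fail if fewer than MIN_GB GiB are free on PATH
require_disk_gb() {
  free_gb=$(disk_free_gb "$1")
  if [ "$free_gb" -lt "$2" ]; then
    echo "only ${free_gb} GiB free on $1 (need $2)" >&2
    return 1
  fi
  echo "${free_gb} GiB free on $1"
}
```

For example, require_disk_gb / 20 fails when less than 20 GiB (an arbitrary example threshold) is free on the root filesystem.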

Low resource environments

  • Enable low-resource mode in .airbyte.env: LOW_RESOURCE_MODE=true
  • Note: Connector Builder will be disabled in low-resource mode

Upgrading

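To upgrade Airbyte to a newer version, re-run the install command with the same flags used originally; abctl upgrades the existing deployment in place:

abctl local install --port 8030 --insecure-cookies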

Backup

Automated Backups

The setup script creates backup-airbyte.sh, which backs up all three Airbyte databases:

  • airbyte - Main application database
  • temporal - Workflow engine database
  • temporal_visibility - Temporal visibility database

Manual Backup:

./backup-airbyte.sh

Automated Schedule: Add to crontab for daily backups at 2 AM:

crontab -e
# Add this line:
0 2 * * * cd /path/to/04-ingestion && ./backup-airbyte.sh

Backup Location:

  • Directory: ./backups/
  • Format: airbyte_backup_YYYYMMDD_HHMMSS.tar.gz
  • Retention: Last 7 days (older backups auto-deleted)
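The backup layout described above (one timestamped archive holding a plain-SQL dump per database, plus 7-day retention) can be sketched as follows. This is a hypothetical reconstruction, not the generated backup-airbyte.sh; it assumes the postgres container and superuser used elsewhere in this README, and BACKUP_DIR and RETENTION_DAYS are illustrative variables.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a backup routine matching the layout described above.
BACKUP_DIR="${BACKUP_DIR:-./backups}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

# backup_airbyte: pg_dump each database, then pack the dumps into one archive
backup_airbyte() {
  stamp=$(date +%Y%m%d_%H%M%S)
  workdir=$(mktemp -d) || return 1
  for db in airbyte temporal temporal_visibility; do
    if ! docker exec postgres pg_dump -U postgres "$db" > "$workdir/${db}_${stamp}.sql"; then
      rm -rf "$workdir"
      return 1
    fi
  done
  mkdir -p "$BACKUP_DIR"
  # -C: archive the .sql files without the temp-dir path, as the restore steps expect
  tar -czf "$BACKUP_DIR/airbyte_backup_${stamp}.tar.gz" -C "$workdir" .
  status=$?
  rm -rf "$workdir"
  return "$status"
}

# prune_backups: delete archives older than RETENTION_DAYS days
prune_backups() {
  find "$BACKUP_DIR" -maxdepth 1 -type f -name 'airbyte_backup_*.tar.gz' \
    -mtime +"$RETENTION_DAYS" -exec rm -f {} +
}
```

Usage: backup_airbyte && prune_backups. Pruning only after a successful backup avoids deleting old archives when the new one could not be written.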

Restore from Backup (stop Airbyte first; these commands expect the target databases to be empty):

# Extract backup
tar -xzf backups/airbyte_backup_20260227_020000.tar.gz

# Restore databases
docker exec -i postgres psql -U postgres airbyte < airbyte_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal < temporal_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal_visibility < temporal_visibility_20260227_020000.sql
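Loading a dump into a database that still contains data produces duplicate-object errors, so one approach is to stop Airbyte, recreate each database empty, then load its dump. This is a hypothetical sketch of that approach and it drops the three databases, so take a fresh backup first; it assumes the archive layout above and the postgres container, and restore_airbyte and stamp_from_archive are illustrative names.

```shell
#!/usr/bin/env bash
# stamp_from_archive PATH: extract the YYYYMMDD_HHMMSS part of a backup file name
stamp_from_archive() {
  name="${1##*/}"                 # drop any directory prefix
  name="${name#airbyte_backup_}"  # drop the fixed prefix
  echo "${name%.tar.gz}"          # drop the extension
}

# restore_airbyte ARCHIVE: recreate each database empty, then load its dump.
# Run ./stop-airbyte.sh first so no service holds connections to these databases.
restore_airbyte() {
  stamp=$(stamp_from_archive "$1")
  workdir=$(mktemp -d) || return 1
  if ! tar -xzf "$1" -C "$workdir"; then
    rm -rf "$workdir"
    return 1
  fi
  for db in airbyte temporal temporal_visibility; do
    if ! { docker exec postgres dropdb -U postgres --if-exists "$db" &&
           docker exec postgres createdb -U postgres "$db" &&
           docker exec -i postgres psql -U postgres -v ON_ERROR_STOP=1 -q "$db" \
             < "$workdir/${db}_${stamp}.sql"; }; then
      rm -rf "$workdir"
      return 1
    fi
  done
  rm -rf "$workdir"
}
```

Usage: ./stop-airbyte.sh && restore_airbyte backups/airbyte_backup_20260227_020000.tar.gz && ./start-airbyte.sh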

Additional Resources