04-ingestion: Airbyte Data Ingestion
Airbyte OSS for data ingestion and ETL, installed and managed with the abctl CLI tool.
Overview
This deployment uses Airbyte's official abctl command-line tool for easy installation and management. It's configured to use shared infrastructure from 01-infra:
- PostgreSQL: Shared database for Airbyte metadata
- Nginx Proxy Manager: Shared reverse proxy for external access
- Network: shared_data_network for inter-service communication
Note: abctl deploys an internal airbyte-proxy container for routing between Airbyte microservices. External access is handled by the existing Nginx Proxy Manager in 01-infra, so no additional nginx is needed in this folder.
Prerequisites
- Docker Desktop installed and running
- Infrastructure services running (PostgreSQL from 01-infra)
- Linux or macOS (on Windows, install abctl manually)
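The prerequisites above can be checked with a small Bash function before running anything else. This is a sketch, not the contents of setup-airbyte.sh: the function name check_prereqs is hypothetical, and the container name postgres is taken from the docker exec commands used elsewhere in this README.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check, assuming the shared PostgreSQL container
# from 01-infra is named "postgres" (as in the commands in this README).

check_prereqs() {
  local pg_container="${1:-postgres}"

  # The Docker CLI must be on PATH.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found" >&2
    return 1
  fi

  # The Docker daemon must answer (Docker Desktop or Docker Engine running).
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not reachable" >&2
    return 1
  fi

  # The shared PostgreSQL container must be in the list of running containers.
  if ! docker ps --format '{{.Names}}' | grep -qx "$pg_container"; then
    echo "PostgreSQL container '$pg_container' is not running" >&2
    return 1
  fi

  echo "prerequisites OK"
}
```

A script would typically call check_prereqs || exit 1 before starting the installation.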
Installation
First Time Setup
Run the automated setup script:
cd 04-ingestion
chmod +x *.sh
./setup-airbyte.sh
This script will:
- Check prerequisites (Docker, PostgreSQL)
- Install abctl if not present
- Create required databases (airbyte, temporal, temporal_visibility)
- Install Airbyte with custom configuration
- Configure port mapping (8030 instead of default 8000)
Installation takes approximately 10-30 minutes depending on internet speed.
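Because installation can take this long, it can help to poll the local URL until Airbyte answers instead of watching logs. A minimal Bash sketch; the function name wait_for_airbyte, the default of 60 attempts 10 seconds apart, and treating any 2xx or 3xx status as "up" are assumptions, not behavior of abctl or setup-airbyte.sh.

```shell
#!/usr/bin/env bash
# Hypothetical readiness poll for the local Airbyte URL.

wait_for_airbyte() {
  local url="${1:-http://localhost:8030}"
  local attempts="${2:-60}"
  local delay="${3:-10}"
  local i code
  for ((i = 1; i <= attempts; i++)); do
    # -s: silent, -m 5: give up on a single request after 5 s,
    # -o /dev/null: discard the body, -w: print only the HTTP status ("000" if no connection).
    code=$(curl -s -m 5 -o /dev/null -w '%{http_code}' "$url")
    # Any 2xx or 3xx (for example a redirect to a login page) counts as up.
    case "$code" in
      2??|3??) echo "Airbyte is up ($code) after $i attempt(s)"; return 0 ;;
    esac
    sleep "$delay"
  done
  echo "Airbyte did not respond at $url after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_airbyte http://localhost:8030 90 10 waits up to roughly 15 minutes after ./setup-airbyte.sh returns.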
Manual Installation
If you prefer manual installation:
- Install abctl:
curl -LsfS https://get.airbyte.com | bash -
- Create databases:
docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal_visibility;"
- Install Airbyte:
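abctl local install --port 8030 --insecure-cookies
Note: this command alone does not apply the setup script's custom configuration; without it, abctl deploys Airbyte with its own bundled PostgreSQL instead of the shared instance.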
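Re-running the CREATE DATABASE commands fails once the databases exist, and PostgreSQL has no CREATE DATABASE IF NOT EXISTS. The sketch below checks pg_database first so the step can be repeated safely. PSQL, ensure_database and ensure_airbyte_databases are hypothetical names; the default PSQL value reuses the docker exec command from the manual steps.

```shell
#!/usr/bin/env bash
# Idempotent database creation. PSQL is left unquoted on purpose so the
# default command splits into words; override it to point at another psql.
PSQL="${PSQL:-docker exec postgres psql -U postgres}"

ensure_database() {
  local db="$1"
  # -tAc: tuples only, unaligned output, run one command. Prints "1" if the DB exists.
  if [ "$($PSQL -tAc "SELECT 1 FROM pg_database WHERE datname = '$db'")" = "1" ]; then
    echo "exists: $db"
  else
    $PSQL -c "CREATE DATABASE \"$db\";" >/dev/null && echo "created: $db"
  fi
}

ensure_airbyte_databases() {
  local db
  for db in airbyte temporal temporal_visibility; do
    ensure_database "$db" || return 1
  done
}
```

Running ensure_airbyte_databases twice prints "created" for each database the first time and "exists" the second time.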
Usage
Start Airbyte
./start-airbyte.sh
Stop Airbyte
./stop-airbyte.sh
Uninstall Airbyte
./uninstall-airbyte.sh
Access
Production (via Nginx Proxy Manager)
- Domain: https://ai.sriphat.com/airbyte
- Authentication: Configured via Nginx (see NGINX-SETUP.md)
- SSL: Enabled with Let's Encrypt
Development/Local
- Localhost: http://localhost:8030
- Direct IP: http://[SERVER_IP]:8030
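- Nginx Proxy Manager authentication does not apply to local access; recent Airbyte versions installed with abctl show their own login page, and abctl local credentials prints the login details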
Configuration
Edit .airbyte.env to customize:
- AIRBYTE_PORT: External port (default: 8030)
- AIRBYTE_HOST: Domain name for external access (ai.sriphat.com)
- LOW_RESOURCE_MODE: Enabled by default on systems with fewer than 4 CPU cores
- AIRBYTE_VERSION: Uses the latest stable version
- ENABLE_BACKUP: Automated backup configuration (enabled)
- Database connection settings (uses the shared PostgreSQL)
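A typo in .airbyte.env (for example a non-numeric port) is easier to catch before running the scripts. The sketch below reads plain KEY=VALUE lines without sourcing the file, so nothing in it is executed, and then checks two settings. The function names and the validation rules are assumptions for illustration; they are not checks that abctl or the provided scripts perform.

```shell
#!/usr/bin/env bash
# Hypothetical reader/validator for a KEY=VALUE file such as .airbyte.env.

env_get() {
  local file="$1" key="$2" line
  # Last uncommented assignment wins; "# KEY=..." lines do not match.
  line=$(grep -E "^[[:space:]]*${key}=" "$file" | tail -n 1)
  line="${line#*=}"          # drop "KEY="
  line=${line%$'\r'}         # tolerate Windows line endings
  line="${line%\"}"          # strip a trailing double quote
  line="${line#\"}"          # strip a leading double quote
  printf '%s\n' "$line"
}

validate_airbyte_env() {
  local file="${1:-.airbyte.env}" port mode
  port=$(env_get "$file" AIRBYTE_PORT)
  mode=$(env_get "$file" LOW_RESOURCE_MODE)

  # AIRBYTE_PORT must be a TCP port number (10# avoids octal parsing of e.g. "08030").
  if ! [[ "$port" =~ ^[0-9]+$ ]] || (( 10#$port < 1 || 10#$port > 65535 )); then
    echo "invalid AIRBYTE_PORT: '$port'" >&2
    return 1
  fi

  # LOW_RESOURCE_MODE, if present, must be true or false.
  case "$mode" in
    ""|true|false) ;;
    *) echo "invalid LOW_RESOURCE_MODE: '$mode'" >&2; return 1 ;;
  esac

  echo "AIRBYTE_PORT=$port LOW_RESOURCE_MODE=${mode:-unset}"
}
```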
Authentication
Airbyte does not natively support Keycloak. Authentication is handled via Nginx Proxy Manager:
- Recommended: OAuth2 Proxy with Keycloak integration
- Alternative: Basic Authentication via nginx
- Simple: IP whitelist for internal access
See NGINX-SETUP.md for detailed configuration instructions.
Database
Airbyte uses three databases in the shared PostgreSQL instance:
- airbyte: Main application database
- temporal: Workflow engine database
- temporal_visibility: Temporal visibility database
All databases are automatically created during setup and backed up with the main PostgreSQL instance.
Architecture
Services Deployed by abctl
- airbyte-server: Backend API and business logic
- airbyte-worker: Executes sync jobs and manages connectors
- airbyte-webapp: Web UI
- airbyte-temporal: Workflow orchestration engine
- airbyte-proxy: Internal nginx reverse proxy (entrypoint for the Airbyte services; external traffic reaches it through Nginx Proxy Manager)
- airbyte-cron: Scheduled job runner
- airbyte-connector-builder-server: Custom connector development
- airbyte-api-server: REST API server
Network Architecture
All services connect to shared_data_network:
Internet → Nginx Proxy Manager (01-infra, ai.sriphat.com/airbyte) → airbyte-proxy (internal, port 8000) → Airbyte Services
Shared Resources from 01-infra:
- Nginx Proxy Manager: External reverse proxy (handles SSL, auth, routing)
- PostgreSQL: Database server (airbyte, temporal, temporal_visibility)
- Keycloak: Identity provider (optional, via OAuth2 Proxy)
Airbyte Components (deployed by abctl):
- airbyte-proxy: Internal nginx for microservice routing (NOT for external access)
- Airbyte services: server, worker, webapp, temporal, etc.
See ARCHITECTURE.md for detailed network flow diagram.
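To see which containers are actually attached to shared_data_network, docker network inspect can print their names with a Go template. A small Bash sketch with hypothetical function names. Because abctl runs Airbyte's services inside a Kubernetes-in-Docker (kind) node container, the names listed may differ from the service names above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around `docker network inspect`.

network_members() {
  local network="${1:-shared_data_network}"
  # Go template: print each attached container's name on its own line.
  docker network inspect "$network" \
    --format '{{range .Containers}}{{println .Name}}{{end}}'
}

is_on_network() {
  local container="$1" network="${2:-shared_data_network}"
  # Exact, whole-line match on the container name.
  network_members "$network" | grep -qx "$container"
}
```

For example, is_on_network postgres && echo "postgres is attached" confirms that the shared database is reachable on that network.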
Troubleshooting
Installation Issues
Error: "Readiness probe failed: HTTP probe failed with statuscode: 503"
- This is normal during installation. Allow installation to continue.
- You may need to allocate more CPU and memory to Docker Desktop.
Error: "PostgreSQL container is not running"
- Start infrastructure first:
cd ../01-infra && docker compose --env-file ../.env.global up -d
Error: "abctl: command not found"
- The setup script will install it automatically on Linux/macOS
- For Windows, download from: https://github.com/airbytehq/abctl/releases
Runtime Issues
Cannot access UI at localhost:8030
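- Check if Airbyte is running:
abctl local status
- Check Docker containers (abctl runs Airbyte inside a kind Kubernetes cluster, so expect a node container such as airbyte-abctl-control-plane rather than one container per service):
docker ps | grep airbyte
- View pod status and logs with kubectl, using the kubeconfig that abctl creates:
kubectl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig --namespace airbyte-abctl get pods
kubectl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig --namespace airbyte-abctl logs <pod-name>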
Sync jobs failing
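- Check worker logs (under abctl the worker runs as a Kubernetes pod, not as a standalone airbyte-worker Docker container; take the pod name from get pods):
kubectl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig --namespace airbyte-abctl get pods
kubectl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig --namespace airbyte-abctl logs <worker-pod-name>
- Verify database connectivity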
- Ensure sufficient disk space and memory
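Free disk space can be checked with df. The sketch below reports free megabytes for the filesystem holding a directory and compares it to a minimum; the function names and any threshold you pass are assumptions. On Linux, the directory printed by docker info --format '{{.DockerRootDir}}' is a reasonable target; on Docker Desktop, Docker's data lives inside a VM, so the host's df does not reflect it.

```shell
#!/usr/bin/env bash
# Hypothetical free-space check built on POSIX df output.

free_mb() {
  # -P: one POSIX-format line per filesystem; -k: sizes in 1 KiB blocks.
  # Field 4 of the data row is the available space.
  df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024) }'
}

check_free_space() {
  local dir="$1" min_mb="$2" avail
  avail=$(free_mb "$dir")
  if [ -z "$avail" ] || [ "$avail" -lt "$min_mb" ]; then
    echo "only ${avail:-?} MB free on $dir (want at least ${min_mb} MB)" >&2
    return 1
  fi
  echo "${avail} MB free on $dir"
}
```

For example, check_free_space "$(docker info --format '{{.DockerRootDir}}')" 5120 fails if less than about 5 GB is free (the 5120 MB figure is an arbitrary example).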
Low resource environments
- Enable low-resource mode in .airbyte.env: LOW_RESOURCE_MODE=true
- Note: Connector Builder will be disabled in low-resource mode
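The "fewer than 4 CPU cores" rule from the Configuration section can be applied automatically. A Bash sketch with hypothetical function names: it asks Docker first (docker info reports the CPUs available to Docker, which on Docker Desktop is the VM allocation rather than the host count) and falls back to getconf. Memory is not considered.

```shell
#!/usr/bin/env bash
# Hypothetical helper that suggests a LOW_RESOURCE_MODE value from the CPU count.

cpu_count() {
  local n
  # CPUs visible to Docker; empty or invalid if Docker is unavailable.
  n=$(docker info --format '{{.NCPU}}' 2>/dev/null)
  if ! [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
    # Fall back to the host's online CPU count, or 1 if even that fails.
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  fi
  echo "$n"
}

recommend_low_resource_mode() {
  local cores="${1:-$(cpu_count)}"
  # Threshold taken from this README: low-resource mode below 4 cores.
  if [ "$cores" -lt 4 ]; then
    echo true
  else
    echo false
  fi
}
```

For example, echo "LOW_RESOURCE_MODE=$(recommend_low_resource_mode)" prints a line suitable for .airbyte.env.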
Upgrading
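To upgrade Airbyte to a newer version, update abctl and then re-run the install with the same options used for the original installation:
curl -LsfS https://get.airbyte.com | bash -
abctl local install --port 8030 --insecure-cookies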
Backup
Automated Backups
The setup script creates backup-airbyte.sh, which backs up all Airbyte databases:
- airbyte: Main application database
- temporal: Workflow engine database
- temporal_visibility: Temporal visibility database
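The kind of backup described here can be sketched in Bash as one pg_dump per database, bundled into a timestamped archive whose file names match the restore commands further down. This is a hypothetical illustration, not the actual contents of backup-airbyte.sh. DUMP_CMD and backup_airbyte are assumed names; --clean --if-exists is a design choice that makes each dump drop existing objects before recreating them when it is replayed with psql.

```shell
#!/usr/bin/env bash
# Hypothetical backup sketch. DUMP_CMD is left unquoted on purpose so the
# default splits into words; it runs pg_dump in the shared "postgres" container.
DUMP_CMD="${DUMP_CMD:-docker exec postgres pg_dump -U postgres --clean --if-exists}"

backup_airbyte() {
  local dest="${1:-./backups}" ts tmp db
  local -a files=()
  ts=$(date +%Y%m%d_%H%M%S)
  mkdir -p "$dest" || return 1
  tmp=$(mktemp -d) || return 1

  for db in airbyte temporal temporal_visibility; do
    # One plain-SQL dump per database, e.g. temporal_20260227_020000.sql.
    if ! $DUMP_CMD "$db" > "$tmp/${db}_${ts}.sql"; then
      echo "dump of $db failed" >&2
      rm -rf "$tmp"
      return 1
    fi
    files+=("${db}_${ts}.sql")
  done

  # -C stores bare file names, so extracting the archive in the current
  # directory yields the three .sql files directly.
  if ! tar -czf "$dest/airbyte_backup_${ts}.tar.gz" -C "$tmp" "${files[@]}"; then
    rm -rf "$tmp"
    return 1
  fi
  rm -rf "$tmp"
  echo "$dest/airbyte_backup_${ts}.tar.gz"
}
```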
Manual Backup:
./backup-airbyte.sh
Automated Schedule: Add to crontab for daily backups at 2 AM:
crontab -e
# Add this line:
0 2 * * * cd /path/to/04-ingestion && ./backup-airbyte.sh
Backup Location:
- Directory: ./backups/
- Format: airbyte_backup_YYYYMMDD_HHMMSS.tar.gz
- Retention: Last 7 days (older backups are deleted automatically)
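The 7-day retention rule can be expressed with find, which is handy for pruning manually or for trying a different retention period. A Bash sketch with a hypothetical function name; it compares modification times in minutes so that "7 days" means exactly 7 × 24 hours, and only touches files matching the backup naming pattern.

```shell
#!/usr/bin/env bash
# Hypothetical retention helper for the backups directory.

prune_backups() {
  local dir="${1:-./backups}" days="${2:-7}"
  # -mmin +N: modified more than N minutes ago. -print lists what is deleted.
  find "$dir" -maxdepth 1 -type f -name 'airbyte_backup_*.tar.gz' \
    -mmin +"$((days * 24 * 60))" -print -delete
}
```

For example, prune_backups ./backups 14 keeps two weeks of archives instead of one.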
Restore from Backup:
# Extract backup
tar -xzf backups/airbyte_backup_20260227_020000.tar.gz
# Restore databases
docker exec -i postgres psql -U postgres airbyte < airbyte_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal < temporal_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal_visibility < temporal_visibility_20260227_020000.sql
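The restore steps can be wrapped in one function that derives the timestamp from the archive name, extracts into a temporary directory instead of the current one, and stops at the first failure. A hypothetical Bash sketch: restore_airbyte and PSQL_RESTORE are assumed names, Airbyte should be stopped first (./stop-airbyte.sh), and the dumps are assumed either to contain DROP statements or to target empty databases, since ON_ERROR_STOP aborts on the first SQL error.

```shell
#!/usr/bin/env bash
# Hypothetical restore sketch. PSQL_RESTORE is left unquoted on purpose; by
# default it runs psql in the shared "postgres" container, reading from stdin.
PSQL_RESTORE="${PSQL_RESTORE:-docker exec -i postgres psql -U postgres -v ON_ERROR_STOP=1}"

restore_airbyte() {
  local archive="$1" name ts tmp db
  name=$(basename "$archive")
  ts="${name#airbyte_backup_}"   # e.g. 20260227_020000.tar.gz
  ts="${ts%.tar.gz}"             # e.g. 20260227_020000
  tmp=$(mktemp -d) || return 1
  tar -xzf "$archive" -C "$tmp" || { rm -rf "$tmp"; return 1; }

  for db in airbyte temporal temporal_visibility; do
    # Feed each dump to psql connected to the matching database.
    if ! $PSQL_RESTORE "$db" < "$tmp/${db}_${ts}.sql"; then
      echo "restore of $db failed" >&2
      rm -rf "$tmp"
      return 1
    fi
    echo "restored $db"
  done
  rm -rf "$tmp"
}
```

For example, restore_airbyte backups/airbyte_backup_20260227_020000.tar.gz replays the three dumps from that archive, after which ./start-airbyte.sh brings Airbyte back up.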