# 04-ingestion: Airbyte Data Ingestion

Airbyte OSS for data ingestion and ETL, deployed with the `abctl` CLI tool.

## Overview

This deployment uses Airbyte's official `abctl` command-line tool for installation and management. It is configured to use shared infrastructure from `01-infra`:

- **PostgreSQL**: Shared database for Airbyte metadata
- **Nginx Proxy Manager**: Shared reverse proxy for external access
- **Network**: `shared_data_network` for inter-service communication

**Note**: `abctl` deploys an internal `airbyte-proxy` container for routing between Airbyte microservices. External access is handled by the existing Nginx Proxy Manager in `01-infra`, so no additional nginx is needed in this folder.

## Prerequisites

1. Docker Desktop installed and running
2. Infrastructure services running (PostgreSQL from `01-infra`)
3. Linux or macOS (on Windows, install `abctl` manually)

## Installation

### First Time Setup

Run the automated setup script:

```bash
cd 04-ingestion
chmod +x *.sh
./setup-airbyte.sh
```

This script will:

- Check prerequisites (Docker, PostgreSQL)
- Install `abctl` if it is not present
- Create the required databases (`airbyte`, `temporal`, `temporal_visibility`)
- Install Airbyte with the custom configuration
- Configure port mapping (8030 instead of the default 8000)

Installation takes approximately 10 to 30 minutes, depending on internet speed.

### Manual Installation

If you prefer to install manually:

1. Install `abctl`:

   ```bash
   curl -LsfS https://get.airbyte.com | bash -
   ```

2. Create the databases:

   ```bash
   docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"
   docker exec postgres psql -U postgres -c "CREATE DATABASE temporal;"
   docker exec postgres psql -U postgres -c "CREATE DATABASE temporal_visibility;"
   ```

3. Install Airbyte:

   ```bash
   abctl local install --port 8030 --insecure-cookies
   ```

## Usage

### Start Airbyte

```bash
./start-airbyte.sh
```

### Stop Airbyte

```bash
./stop-airbyte.sh
```

### Uninstall Airbyte

```bash
./uninstall-airbyte.sh
```

## Access

### Production (via Nginx Proxy Manager)

- **Domain**: https://ai.sriphat.com/airbyte
- **Authentication**: Configured via Nginx (see `NGINX-SETUP.md`)
- **SSL**: Enabled with Let's Encrypt

### Development/Local

- **Localhost**: http://localhost:8030
- **Direct IP**: http://[SERVER_IP]:8030
- **No authentication** is required for local access

## Configuration

Edit `.airbyte.env` to customize:

- `AIRBYTE_PORT`: External port (default: 8030)
- `AIRBYTE_HOST`: Domain name for external access (`ai.sriphat.com`)
- `LOW_RESOURCE_MODE`: **Enabled by default**; intended for systems with fewer than 4 CPU cores
- `AIRBYTE_VERSION`: Uses the latest stable version
- `ENABLE_BACKUP`: Automated backup configuration (enabled)
- Database connection settings (uses the shared PostgreSQL)

### Authentication

Airbyte does not natively support Keycloak. Authentication is handled via **Nginx Proxy Manager**, using one of these options:

1. **Recommended**: OAuth2 Proxy with Keycloak integration
2. **Alternative**: Basic Authentication via nginx
3. **Simple**: IP whitelist for internal access

See `NGINX-SETUP.md` for detailed configuration instructions.

## Database

Airbyte uses three databases in the shared PostgreSQL instance:

- `airbyte`: Main application database
- `temporal`: Workflow engine database
- `temporal_visibility`: Temporal visibility database

All three databases are created automatically during setup and are backed up together with the main PostgreSQL instance.
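To confirm that all three databases exist (for example, after a manual installation), a small helper could compare PostgreSQL's catalog against the expected names. This is a minimal sketch and not part of `setup-airbyte.sh`; the function name `missing_databases` is hypothetical, and the usage below assumes the shared container is named `postgres` with superuser `postgres`, as in the manual installation steps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which Airbyte databases are missing from the
# shared PostgreSQL instance. Not part of the existing setup scripts.

# The three databases Airbyte expects in the shared instance.
AIRBYTE_DATABASES=(airbyte temporal temporal_visibility)

# missing_databases: reads existing database names (one per line) on stdin
# and prints every expected database that is not among them.
missing_databases() {
  local existing db
  existing=$(cat)
  for db in "${AIRBYTE_DATABASES[@]}"; do
    # -x: whole-line match, so "airbyte_old" does not count as "airbyte"
    # -F: treat the name as a fixed string, not a regular expression
    if ! grep -qxF -- "$db" <<<"$existing"; then
      printf '%s\n' "$db"
    fi
  done
}
```

Pipe the database list from the shared instance into it; empty output means all three exist: `docker exec postgres psql -U postgres -tAc "SELECT datname FROM pg_database;" | missing_databases`. The `-t` flag drops headers and footers and `-A` disables column alignment, so each name arrives on its own line.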
## Architecture

### Services Deployed by abctl

- **airbyte-server**: Backend API and business logic
- **airbyte-worker**: Executes sync jobs and manages connectors
- **airbyte-webapp**: Web UI
- **airbyte-temporal**: Workflow orchestration engine
- **airbyte-proxy**: Nginx reverse proxy (public entrypoint)
- **airbyte-cron**: Scheduled job runner
- **airbyte-connector-builder-server**: Custom connector development
- **airbyte-api-server**: REST API server

### Network Architecture

**All services connect to `shared_data_network`:**

```
Internet → Nginx Proxy Manager (01-infra) → airbyte-proxy (internal) → Airbyte Services
           ai.sriphat.com/airbyte           port 8000
```

**Shared Resources from 01-infra:**

- **Nginx Proxy Manager**: External reverse proxy (handles SSL, auth, routing)
- **PostgreSQL**: Database server (`airbyte`, `temporal`, `temporal_visibility`)
- **Keycloak**: Identity provider (optional, via OAuth2 Proxy)

**Airbyte Components (deployed by abctl):**

- **airbyte-proxy**: Internal nginx for microservice routing (NOT for external access)
- **Airbyte services**: server, worker, webapp, temporal, etc.

See `ARCHITECTURE.md` for a detailed network flow diagram.

## Troubleshooting

### Installation Issues

**Error: "Readiness probe failed: HTTP probe failed with statuscode: 503"**

- This is normal during installation. Allow the installation to continue.
- You may need to allocate more resources to Docker Desktop.
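Because these 503 responses are transient, a script can poll the UI until it answers instead of treating early failures as fatal. The sketch below is a hypothetical helper (`retry_until` is not part of `abctl` or the setup scripts), and the usage URL assumes the local port 8030 configured above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.

# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry_until 180 10 curl -fsS -o /dev/null http://localhost:8030` waits roughly 30 minutes (180 attempts, 10 seconds apart), matching the upper end of the expected installation time. `curl -f` exits non-zero on HTTP errors such as 503, so those responses count as failed attempts.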
**Error: "PostgreSQL container is not running"**

- Start the infrastructure first: `cd ../01-infra && docker compose --env-file ../.env.global up -d`

**Error: "abctl: command not found"**

- The setup script installs it automatically on Linux/macOS.
- For Windows, download it from: https://github.com/airbytehq/abctl/releases

### Runtime Issues

**Cannot access UI at localhost:8030**

- Check whether Airbyte is running: `abctl local status`
- Check Docker containers: `docker ps | grep airbyte`
- View logs: `abctl local logs`

**Sync jobs failing**

- Check worker logs: `docker logs airbyte-worker`
- Verify database connectivity
- Ensure sufficient disk space and memory

**Low-resource environments**

- Enable low-resource mode in `.airbyte.env`: `LOW_RESOURCE_MODE=true`
- Note: the Connector Builder is disabled in low-resource mode

## Upgrading

To upgrade Airbyte to a newer version:

```bash
abctl local upgrade
```

## Backup

### Automated Backups

The setup script creates `backup-airbyte.sh`, which backs up all Airbyte databases:

- `airbyte`: Main application database
- `temporal`: Workflow engine database
- `temporal_visibility`: Temporal visibility database

**Manual Backup:**

```bash
./backup-airbyte.sh
```

**Automated Schedule:**

Add an entry to the crontab for daily backups at 2 AM:

```bash
crontab -e
# Add this line:
0 2 * * * cd /path/to/04-ingestion && ./backup-airbyte.sh
```

**Backup Location:**

- Directory: `./backups/`
- Format: `airbyte_backup_YYYYMMDD_HHMMSS.tar.gz`
- Retention: Last 7 days (older backups are deleted automatically)

**Restore from Backup:**

```bash
# Extract backup
tar -xzf backups/airbyte_backup_20260227_020000.tar.gz

# Restore databases
docker exec -i postgres psql -U postgres airbyte < airbyte_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal < temporal_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal_visibility < temporal_visibility_20260227_020000.sql
```

## Additional Resources

- [Airbyte Documentation](https://docs.airbyte.com/)
- [Airbyte OSS Quickstart](https://docs.airbyte.com/platform/using-airbyte/getting-started/oss-quickstart)
- [abctl CLI Reference](https://docs.airbyte.com/platform/deploying-airbyte/abctl)
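The 7-day retention described under Backup could be implemented by reading the date embedded in each archive name. The sketch below is one possible approach under that assumption, not the logic of the existing `backup-airbyte.sh`; the function name `expired_backups` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print backup archives whose embedded date is older
# than a cutoff. Assumes names like airbyte_backup_YYYYMMDD_HHMMSS.tar.gz.

# expired_backups CUTOFF_YYYYMMDD FILE...
expired_backups() {
  local cutoff=$1 f name
  local re='^airbyte_backup_([0-9]{8})_[0-9]{6}\.tar\.gz$'
  shift
  for f in "$@"; do
    name=${f##*/}   # strip any leading directory
    # Skip anything that does not follow the backup naming scheme.
    [[ $name =~ $re ]] || continue
    # Fixed-width YYYYMMDD strings sort chronologically, so a plain
    # string comparison is enough.
    if [[ ${BASH_REMATCH[1]} < $cutoff ]]; then
      printf '%s\n' "$f"
    fi
  done
}
```

With GNU `date`, the cutoff for "older than 7 days" is `date -d '7 days ago' +%Y%m%d` (on macOS: `date -v-7d +%Y%m%d`). The output can be reviewed first and then deleted, e.g. `expired_backups "$cutoff" backups/airbyte_backup_*.tar.gz | while read -r f; do rm -- "$f"; done`. Matching on the file name rather than the modification time means copied or restored archives keep their original age.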