# 04-ingestion: Airbyte Data Ingestion

Airbyte OSS for data ingestion and ETL, deployed with the `abctl` CLI tool.

## Overview

This deployment uses Airbyte's official `abctl` command-line tool for easy installation and management. It's configured to use shared infrastructure from `01-infra`:

- **PostgreSQL**: Shared database for Airbyte metadata
- **Nginx Proxy Manager**: Shared reverse proxy for external access
- **Network**: `shared_data_network` for inter-service communication

**Note**: `abctl` deploys an internal `airbyte-proxy` container for routing between Airbyte microservices. External access is handled by the existing Nginx Proxy Manager in `01-infra`, so no additional nginx is needed in this folder.

## Prerequisites

1. Docker Desktop installed and running
2. Infrastructure services running (PostgreSQL from `01-infra`)
3. Linux or macOS (for Windows, install `abctl` manually)

## Installation

### First Time Setup

Run the automated setup script:

```bash
cd 04-ingestion
chmod +x *.sh
./setup-airbyte.sh
```

This script will:
- Check prerequisites (Docker, PostgreSQL)
- Install `abctl` if not present
- Create required databases (airbyte, temporal, temporal_visibility)
- Install Airbyte with custom configuration
- Configure port mapping (8030 instead of default 8000)

Installation takes approximately 10-30 minutes depending on internet speed.
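The prerequisite checks listed above could look roughly like the following. This is a minimal sketch, not the contents of `setup-airbyte.sh`: the function names are hypothetical, and it assumes the shared PostgreSQL container is named `postgres`, as in the `docker exec` commands elsewhere in this README.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real setup script may check things differently.

# Succeeds if the given command is available on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if a container with the given name exists and is running.
container_running() {
  [ "$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Runs the checks in order and stops at the first hard failure.
check_prerequisites() {
  require_cmd docker || { echo "Docker is not installed" >&2; return 1; }
  docker info >/dev/null 2>&1 || { echo "Docker daemon is not running" >&2; return 1; }
  container_running postgres || { echo "PostgreSQL container is not running" >&2; return 1; }
  # A missing abctl is not fatal here: the setup script installs it.
  require_cmd abctl || echo "abctl not found; it would be installed in the next step"
}

# Usage: check_prerequisites && echo "ready to install"
```

Keeping each check in its own function makes it easy to report which prerequisite failed, which matches the error messages listed under Troubleshooting.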
### Manual Installation

If you prefer manual installation:

1. Install abctl:
```bash
curl -LsfS https://get.airbyte.com | bash -
```

2. Create databases:
```bash
docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal_visibility;"
```

3. Install Airbyte:
```bash
abctl local install --port 8030 --insecure-cookies
```
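`CREATE DATABASE` returns an error when the database already exists, so re-running step 2 after a partial install will report failures. One way to make that step repeatable is to check `pg_database` first. The sketch below is an assumption about how this could be done, not what the setup script is known to do; `run_psql` and `ensure_database` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a repeatable version of step 2.

# Runs psql inside the shared PostgreSQL container (assumed to be named "postgres").
run_psql() {
  docker exec postgres psql -U postgres "$@"
}

# Creates the database only if pg_database has no entry for it.
# The names used here are fixed identifiers, so they are not quoted or escaped.
ensure_database() {
  local db="$1"
  if [ "$(run_psql -tAc "SELECT 1 FROM pg_database WHERE datname = '${db}'")" = "1" ]; then
    echo "exists: ${db}"
  else
    run_psql -c "CREATE DATABASE ${db};" >/dev/null && echo "created: ${db}"
  fi
}

# Usage:
#   for db in airbyte temporal temporal_visibility; do ensure_database "$db"; done
```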
## Usage

### Start Airbyte
```bash
./start-airbyte.sh
```

### Stop Airbyte
```bash
./stop-airbyte.sh
```

### Uninstall Airbyte
```bash
./uninstall-airbyte.sh
```

## Access

### Production (via Nginx Proxy Manager)
- **Domain**: https://ai.sriphat.com/airbyte
- **Authentication**: Configured via Nginx (see NGINX-SETUP.md)
- **SSL**: Enabled with Let's Encrypt

### Development/Local
- **Localhost**: http://localhost:8030
- **Direct IP**: http://[SERVER_IP]:8030
- **No authentication** required for local access
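Because a fresh install can take a while before the UI responds, it can help to poll the local endpoint instead of refreshing the browser. The following is a minimal sketch; `wait_for_url` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: polls a URL until it answers with a successful HTTP status.

wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors such as 503.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "reachable after ${i} attempt(s): ${url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not reachable after ${attempts} attempt(s): ${url}" >&2
  return 1
}

# Usage: wait_for_url http://localhost:8030
```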
## Configuration

Edit `.airbyte.env` to customize:
- `AIRBYTE_PORT`: External port (default: 8030)
- `AIRBYTE_HOST`: Domain name for external access (ai.sriphat.com)
- `LOW_RESOURCE_MODE`: **Enabled by default** for systems with <4 CPU cores
- `AIRBYTE_VERSION`: Uses latest stable version
- `ENABLE_BACKUP`: Automated backup configuration (enabled)
- Database connection settings (uses shared PostgreSQL)
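To illustrate how these settings might feed into the install command, here is a hedged sketch. It assumes `.airbyte.env` contains plain `KEY=value` lines that a shell can source, and that `LOW_RESOURCE_MODE=true` maps to the `--low-resource-mode` flag of `abctl local install`; the function names are hypothetical and the actual scripts in this folder may work differently.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; variable names come from the list above.

# Sources the env file if present, then applies the documented defaults.
load_airbyte_env() {
  local env_file="${1:-.airbyte.env}"
  # Prefix a bare file name with ./ so that "." does not search PATH for it.
  case "$env_file" in
    */*) ;;
    *) env_file="./${env_file}" ;;
  esac
  if [ -f "$env_file" ]; then
    set -a          # export everything the file defines
    . "$env_file"
    set +a
  fi
  AIRBYTE_PORT="${AIRBYTE_PORT:-8030}"
  LOW_RESOURCE_MODE="${LOW_RESOURCE_MODE:-true}"
}

# Prints the install command implied by the loaded settings.
install_command() {
  local cmd="abctl local install --port ${AIRBYTE_PORT} --insecure-cookies"
  if [ "$LOW_RESOURCE_MODE" = "true" ]; then
    cmd="${cmd} --low-resource-mode"
  fi
  echo "$cmd"
}

# Usage: load_airbyte_env && install_command
```

Printing the command instead of executing it lets you review the flags before starting a long-running install.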
### Authentication

Airbyte does not natively support Keycloak. Authentication is handled via **Nginx Proxy Manager**:

1. **Recommended**: OAuth2 Proxy with Keycloak integration
2. **Alternative**: Basic Authentication via nginx
3. **Simple**: IP whitelist for internal access

See `NGINX-SETUP.md` for detailed configuration instructions.

## Database

Airbyte uses three databases in the shared PostgreSQL instance:
- `airbyte`: Main application database
- `temporal`: Workflow engine database
- `temporal_visibility`: Temporal visibility database

All databases are automatically created during setup and backed up with the main PostgreSQL instance.
## Architecture

### Services Deployed by abctl

- **airbyte-server**: Backend API and business logic
- **airbyte-worker**: Executes sync jobs and manages connectors
- **airbyte-webapp**: Web UI
- **airbyte-temporal**: Workflow orchestration engine
- **airbyte-proxy**: Nginx reverse proxy (internal entrypoint; external traffic arrives via Nginx Proxy Manager)
- **airbyte-cron**: Scheduled job runner
- **airbyte-connector-builder-server**: Custom connector development
- **airbyte-api-server**: REST API server

### Network Architecture

**All services connect to `shared_data_network`:**

```
Internet → Nginx Proxy Manager (01-infra) → airbyte-proxy (internal) → Airbyte Services
           ai.sriphat.com/airbyte           port 8000
```

**Shared Resources from 01-infra:**
- **Nginx Proxy Manager**: External reverse proxy (handles SSL, auth, routing)
- **PostgreSQL**: Database server (airbyte, temporal, temporal_visibility)
- **Keycloak**: Identity provider (optional, via OAuth2 Proxy)

**Airbyte Components (deployed by abctl):**
- **airbyte-proxy**: Internal nginx for microservice routing (NOT for external access)
- **Airbyte services**: server, worker, webapp, temporal, etc.

See `ARCHITECTURE.md` for detailed network flow diagram.
## Troubleshooting

### Installation Issues

**Error: "Readiness probe failed: HTTP probe failed with statuscode: 503"**
- This is normal during installation. Allow installation to continue.
- May need to allocate more resources to Docker Desktop.

**Error: "PostgreSQL container is not running"**
- Start infrastructure first: `cd ../01-infra && docker compose --env-file ../.env.global up -d`

**Error: "abctl: command not found"**
- The setup script will install it automatically on Linux/macOS
- For Windows, download from: https://github.com/airbytehq/abctl/releases

### Runtime Issues

**Cannot access UI at localhost:8030**
- Check if Airbyte is running: `abctl local status`
- Check Docker containers: `docker ps | grep airbyte`
- View logs: `abctl local logs`

**Sync jobs failing**
- Check worker logs: `docker logs airbyte-worker`
- Verify database connectivity
- Ensure sufficient disk space and memory

**Low resource environments**
- Enable low-resource mode in `.airbyte.env`: `LOW_RESOURCE_MODE=true`
- Note: Connector Builder will be disabled in low-resource mode
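## Upgrading

To upgrade Airbyte to a newer version, re-run the install command with the same flags as the original installation; `abctl local install` upgrades an existing deployment in place:

```bash
abctl local install --port 8030 --insecure-cookies
```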
## Backup

### Automated Backups

The setup script creates `backup-airbyte.sh` which backs up all Airbyte databases:
- `airbyte` - Main application database
- `temporal` - Workflow engine database
- `temporal_visibility` - Temporal visibility database

**Manual Backup:**
```bash
./backup-airbyte.sh
```

**Automated Schedule:**
Add to crontab for daily backups at 2 AM:
```bash
crontab -e
# Add this line:
0 2 * * * cd /path/to/04-ingestion && ./backup-airbyte.sh
```

**Backup Location:**
- Directory: `./backups/`
- Format: `airbyte_backup_YYYYMMDD_HHMMSS.tar.gz`
- Retention: Last 7 days (older backups auto-deleted)

**Restore from Backup:**
```bash
# Extract backup
tar -xzf backups/airbyte_backup_20260227_020000.tar.gz

# Restore databases
docker exec -i postgres psql -U postgres airbyte < airbyte_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal < temporal_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal_visibility < temporal_visibility_20260227_020000.sql
```
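The backup behaviour described in this section (dump three databases, bundle them into one timestamped archive, drop archives older than the retention window) can be sketched as follows. This is an illustration under assumptions, not the generated `backup-airbyte.sh`: the function names and the `BACKUP_DIR` and `RETENTION_DAYS` variables are hypothetical, and the file names simply mirror the ones used in the restore commands.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a dump-archive-prune backup cycle.

BACKUP_DIR="${BACKUP_DIR:-./backups}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

# Dumps each Airbyte database and bundles the dumps into one archive.
backup_databases() {
  local stamp workdir db
  stamp="$(date +%Y%m%d_%H%M%S)"
  workdir="$(mktemp -d)" || return 1
  mkdir -p "$BACKUP_DIR"
  for db in airbyte temporal temporal_visibility; do
    # Assumes the shared PostgreSQL container is named "postgres".
    docker exec postgres pg_dump -U postgres "$db" > "${workdir}/${db}_${stamp}.sql" \
      || { rm -rf "$workdir"; return 1; }
  done
  tar -czf "${BACKUP_DIR}/airbyte_backup_${stamp}.tar.gz" -C "$workdir" . \
    || { rm -rf "$workdir"; return 1; }
  rm -rf "$workdir"
}

# Deletes archives whose modification time is past the retention window.
prune_old_backups() {
  find "$BACKUP_DIR" -type f -name 'airbyte_backup_*.tar.gz' \
    -mtime +"$RETENTION_DAYS" -delete
}

# Usage: backup_databases && prune_old_backups
```

Pruning only after a successful dump means a failing backup run never removes the last good archives.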
## Additional Resources

- [Airbyte Documentation](https://docs.airbyte.com/)
- [Airbyte OSS Quickstart](https://docs.airbyte.com/platform/using-airbyte/getting-started/oss-quickstart)
- [abctl CLI Reference](https://docs.airbyte.com/platform/deploying-airbyte/abctl)