update configuration docker setup for data platform
This commit is contained in:
226
03-apiservice/README-PAGES.md
Normal file
226
03-apiservice/README-PAGES.md
Normal file
@@ -0,0 +1,226 @@
|
||||
# Web Pages Documentation
|
||||
|
||||
## Overview
|
||||
|
||||
The API service now includes web pages for easy access to all platform services and data management functionality.
|
||||
|
||||
## Pages
|
||||
|
||||
### 1. Landing Page (`/`)
|
||||
|
||||
Main dashboard with navigation to all services:
|
||||
- **Supabase**: PostgreSQL database with REST API
|
||||
- **API Docs**: Interactive API documentation
|
||||
- **Airflow**: Workflow orchestration
|
||||
- **Airbyte**: Data integration and ETL
|
||||
- **Data Management**: File upload and processing (with submenu)
|
||||
- **DBT**: Data transformation
|
||||
- **Superset**: Business intelligence and visualization
|
||||
|
||||
### 2. Finance Excel Upload (`/data-management/finance`)
|
||||
|
||||
Upload and process financial Excel files with the following features:
|
||||
|
||||
**Upload Form:**
|
||||
- Drag-and-drop or click to select Excel files (.xlsx, .xls)
|
||||
- Optional description field
|
||||
- File validation and size display
|
||||
|
||||
**Upload Processing:**
|
||||
- Files saved to `/data/uploads/` with timestamp prefix
|
||||
- Unique filename generation to prevent conflicts
|
||||
- Automatic Airflow job triggering (when configured)
|
||||
|
||||
**Upload History:**
|
||||
- Real-time list of all uploaded files
|
||||
- Status tracking: pending, processing, success, error
|
||||
- Expandable accordion for logs and error details
|
||||
- Auto-refresh every 10 seconds
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Page Routes
|
||||
|
||||
```
|
||||
GET / - Landing page
|
||||
GET /data-management/finance - Finance upload page
|
||||
POST /data-management/finance/upload - Upload file endpoint
|
||||
GET /data-management/finance/uploads - Get all uploads
|
||||
GET /data-management/finance/uploads/{upload_id} - Get specific upload
|
||||
```
|
||||
|
||||
### Upload Endpoint
|
||||
|
||||
**POST** `/data-management/finance/upload`
|
||||
|
||||
**Form Data:**
|
||||
- `file`: Excel file (.xlsx or .xls)
|
||||
- `description`: Optional description text
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"message": "File 'example.xlsx' uploaded successfully",
|
||||
"upload_id": "upload_20260306_174500",
|
||||
"filename": "20260306_174500_example.xlsx"
|
||||
}
|
||||
```
|
||||
|
||||
**Error Response:**
|
||||
```json
|
||||
{
|
||||
"detail": "Invalid file type. Only .xlsx and .xls files are allowed."
|
||||
}
|
||||
```
|
||||
|
||||
## File Storage
|
||||
|
||||
Uploaded files are stored in:
|
||||
```
|
||||
/data/uploads/YYYYMMDD_HHMMSS_original_filename.xlsx
|
||||
```
|
||||
|
||||
Example:
|
||||
```
|
||||
/data/uploads/20260306_174530_finance_report.xlsx
|
||||
```
|
||||
|
||||
## Airflow Integration
|
||||
|
||||
The upload endpoint includes a placeholder for Airflow integration. To enable:
|
||||
|
||||
1. **Configure Airflow endpoint** in `app/routes/pages.py`:
|
||||
```python
|
||||
airflow_url = "http://airflow-webserver:8080/api/v1/dags/{dag_id}/dagRuns"
|
||||
```
|
||||
|
||||
2. **Set DAG ID** for the finance processing job
|
||||
|
||||
3. **Uncomment the Airflow trigger code** in `trigger_airflow_job()` function
|
||||
|
||||
4. **Configure authentication** (username/password or API token)
|
||||
|
||||
### Example Airflow Integration
|
||||
|
||||
```python
|
||||
async def trigger_airflow_job(filepath: str, upload_id: str) -> str:
|
||||
import httpx
|
||||
|
||||
airflow_url = "http://airflow-webserver:8080/api/v1/dags/finance_processing/dagRuns"
|
||||
headers = {"Content-Type": "application/json"}
|
||||
auth = ("airflow", "airflow")
|
||||
|
||||
payload = {
|
||||
"conf": {
|
||||
"filepath": filepath,
|
||||
"upload_id": upload_id
|
||||
}
|
||||
}
|
||||
|
||||
async with httpx.AsyncClient() as client:
|
||||
response = await client.post(
|
||||
airflow_url,
|
||||
json=payload,
|
||||
headers=headers,
|
||||
auth=auth
|
||||
)
|
||||
response.raise_for_status()
|
||||
result = response.json()
|
||||
return result["dag_run_id"]
|
||||
```
|
||||
|
||||
## Upload Status Tracking
|
||||
|
||||
Upload records include:
|
||||
- `id`: Unique identifier
|
||||
- `filename`: Original filename
|
||||
- `filepath`: Full path to saved file
|
||||
- `uploaded_at`: ISO timestamp
|
||||
- `description`: User-provided description
|
||||
- `status`: pending | processing | success | error
|
||||
- `job_id`: Airflow DAG run ID (when triggered)
|
||||
- `logs`: Error logs or processing information
|
||||
|
||||
## Database Storage
|
||||
|
||||
Currently uses in-memory storage (`uploads_db` list). For production:
|
||||
|
||||
1. **Create database table:**
|
||||
```sql
|
||||
CREATE TABLE uploads (
|
||||
id VARCHAR(100) PRIMARY KEY,
|
||||
filename VARCHAR(255) NOT NULL,
|
||||
filepath VARCHAR(500) NOT NULL,
|
||||
uploaded_at TIMESTAMP NOT NULL,
|
||||
description TEXT,
|
||||
status VARCHAR(20) NOT NULL,
|
||||
job_id VARCHAR(100),
|
||||
logs TEXT,
|
||||
created_at TIMESTAMP DEFAULT NOW()
|
||||
);
|
||||
```
|
||||
|
||||
2. **Replace in-memory storage** with SQLAlchemy model
|
||||
|
||||
3. **Add database queries** in route handlers
|
||||
|
||||
## Access URLs
|
||||
|
||||
- **Local**: http://localhost:8040/apiservice/
|
||||
- **Production**: https://ai.sriphat.com/apiservice/
|
||||
|
||||
## Security Considerations
|
||||
|
||||
1. **File validation**: Only .xlsx and .xls files allowed
|
||||
2. **Filename sanitization**: Spaces replaced with underscores
|
||||
3. **Unique filenames**: Timestamp prefix prevents conflicts
|
||||
4. **File size limits**: Configure in nginx/FastAPI if needed
|
||||
5. **Authentication**: Add authentication middleware for production
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
1. **Database persistence**: Replace in-memory storage
|
||||
2. **File preview**: Show Excel data preview before processing
|
||||
3. **Batch upload**: Support multiple file uploads
|
||||
4. **Download processed files**: Allow downloading results
|
||||
5. **User management**: Track uploads by user
|
||||
6. **Email notifications**: Notify on processing completion
|
||||
7. **Scheduled jobs**: Automatic periodic processing
|
||||
8. **Data validation**: Validate Excel structure before processing
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Upload fails with 500 error
|
||||
- Check `/data/uploads/` directory exists and is writable
|
||||
- Verify volume mount in docker-compose.yml
|
||||
- Check container logs: `docker logs apiservice`
|
||||
|
||||
### Files not persisting after container restart
|
||||
- Ensure volume mount is configured: `./data/uploads:/data/uploads`
|
||||
- Check file permissions on host directory
|
||||
|
||||
### Airflow job not triggering
|
||||
- Verify Airflow is accessible from container
|
||||
- Check network connectivity: `docker network inspect shared_data_network`
|
||||
- Verify Airflow credentials and DAG ID
|
||||
- Check logs in upload history
|
||||
|
||||
## Development
|
||||
|
||||
To add new data management pages:
|
||||
|
||||
1. Create HTML template in `app/templates/`
|
||||
2. Add route in `app/routes/pages.py`
|
||||
3. Update landing page menu in `index.html`
|
||||
4. Add submenu item if needed
|
||||
|
||||
Example:
|
||||
```python
|
||||
@router.get("/data-management/hr", response_class=HTMLResponse)
|
||||
async def hr_page(request: Request):
|
||||
return templates.TemplateResponse(
|
||||
"data_management_hr.html",
|
||||
{"request": request, "root_path": settings.ROOT_PATH}
|
||||
)
|
||||
```
|
||||
Reference in New Issue
Block a user