Files
sriphat-dataplatform/03-apiservice/README-PAGES.md

227 lines
6.2 KiB
Markdown

# Web Pages Documentation
## Overview
The API service now includes web pages for easy access to all platform services and data management functionality.
## Pages
### 1. Landing Page (`/`)
Main dashboard with navigation to all services:
- **Supabase**: PostgreSQL database with REST API
- **API Docs**: Interactive API documentation
- **Airflow**: Workflow orchestration
- **Airbyte**: Data integration and ETL
- **Data Management**: File upload and processing (with submenu)
- **DBT**: Data transformation
- **Superset**: Business intelligence and visualization
### 2. Finance Excel Upload (`/data-management/finance`)
Upload and process financial Excel files with the following features:
**Upload Form:**
- Drag-and-drop or click to select Excel files (.xlsx, .xls)
- Optional description field
- File validation and size display
**Upload Processing:**
- Files saved to `/data/uploads/` with timestamp prefix
- Unique filename generation to prevent conflicts
- Automatic Airflow job triggering (when configured)
**Upload History:**
- Real-time list of all uploaded files
- Status tracking: pending, processing, success, error
- Expandable accordion for logs and error details
- Auto-refresh every 10 seconds
## API Endpoints
### Page Routes
```
GET / - Landing page
GET /data-management/finance - Finance upload page
POST /data-management/finance/upload - Upload file endpoint
GET /data-management/finance/uploads - Get all uploads
GET /data-management/finance/uploads/{upload_id} - Get specific upload
```
### Upload Endpoint
**POST** `/data-management/finance/upload`
**Form Data:**
- `file`: Excel file (.xlsx or .xls)
- `description`: Optional description text
**Response:**
```json
{
"success": true,
"message": "File 'example.xlsx' uploaded successfully",
"upload_id": "upload_20260306_174500",
"filename": "20260306_174500_example.xlsx"
}
```
**Error Response:**
```json
{
"detail": "Invalid file type. Only .xlsx and .xls files are allowed."
}
```
## File Storage
Uploaded files are stored in:
```
/data/uploads/YYYYMMDD_HHMMSS_original_filename.xlsx
```
Example:
```
/data/uploads/20260306_174530_finance_report.xlsx
```
## Airflow Integration
The upload endpoint includes a placeholder for Airflow integration. To enable:
1. **Configure Airflow endpoint** in `app/routes/pages.py`:
```python
airflow_url = "http://airflow-webserver:8080/api/v1/dags/{dag_id}/dagRuns"
```
2. **Set DAG ID** for the finance processing job
3. **Uncomment the Airflow trigger code** in `trigger_airflow_job()` function
4. **Configure authentication** (username/password or API token)
### Example Airflow Integration
```python
async def trigger_airflow_job(filepath: str, upload_id: str) -> str:
import httpx
airflow_url = "http://airflow-webserver:8080/api/v1/dags/finance_processing/dagRuns"
headers = {"Content-Type": "application/json"}
auth = ("airflow", "airflow")
payload = {
"conf": {
"filepath": filepath,
"upload_id": upload_id
}
}
async with httpx.AsyncClient() as client:
response = await client.post(
airflow_url,
json=payload,
headers=headers,
auth=auth
)
response.raise_for_status()
result = response.json()
return result["dag_run_id"]
```
## Upload Status Tracking
Upload records include:
- `id`: Unique identifier
- `filename`: Original filename
- `filepath`: Full path to saved file
- `uploaded_at`: ISO timestamp
- `description`: User-provided description
- `status`: pending | processing | success | error
- `job_id`: Airflow DAG run ID (when triggered)
- `logs`: Error logs or processing information
## Database Storage
Currently uses in-memory storage (`uploads_db` list). For production:
1. **Create database table:**
```sql
CREATE TABLE uploads (
id VARCHAR(100) PRIMARY KEY,
filename VARCHAR(255) NOT NULL,
filepath VARCHAR(500) NOT NULL,
uploaded_at TIMESTAMP NOT NULL,
description TEXT,
status VARCHAR(20) NOT NULL,
job_id VARCHAR(100),
logs TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
```
2. **Replace in-memory storage** with SQLAlchemy model
3. **Add database queries** in route handlers
## Access URLs
- **Local**: http://localhost:8040/apiservice/
- **Production**: https://ai.sriphat.com/apiservice/
## Security Considerations
1. **File validation**: Only .xlsx and .xls files allowed
2. **Filename sanitization**: Spaces replaced with underscores
3. **Unique filenames**: Timestamp prefix prevents conflicts
4. **File size limits**: Configure in nginx/FastAPI if needed
5. **Authentication**: Add authentication middleware for production
## Future Enhancements
1. **Database persistence**: Replace in-memory storage
2. **File preview**: Show Excel data preview before processing
3. **Batch upload**: Support multiple file uploads
4. **Download processed files**: Allow downloading results
5. **User management**: Track uploads by user
6. **Email notifications**: Notify on processing completion
7. **Scheduled jobs**: Automatic periodic processing
8. **Data validation**: Validate Excel structure before processing
## Troubleshooting
### Upload fails with 500 error
- Check `/data/uploads/` directory exists and is writable
- Verify volume mount in docker-compose.yml
- Check container logs: `docker logs apiservice`
### Files not persisting after container restart
- Ensure volume mount is configured: `./data/uploads:/data/uploads`
- Check file permissions on host directory
### Airflow job not triggering
- Verify Airflow is accessible from container
- Check network connectivity: `docker network inspect shared_data_network`
- Verify Airflow credentials and DAG ID
- Check logs in upload history
## Development
To add new data management pages:
1. Create HTML template in `app/templates/`
2. Add route in `app/routes/pages.py`
3. Update landing page menu in `index.html`
4. Add submenu item if needed
Example:
```python
@router.get("/data-management/hr", response_class=HTMLResponse)
async def hr_page(request: Request):
return templates.TemplateResponse(
"data_management_hr.html",
{"request": request, "root_path": settings.ROOT_PATH}
)
```