227 lines
6.2 KiB
Markdown
227 lines
6.2 KiB
Markdown
# Web Pages Documentation
|
|
|
|
## Overview
|
|
|
|
The API service now includes web pages for easy access to all platform services and data management functionality.
|
|
|
|
## Pages
|
|
|
|
### 1. Landing Page (`/`)
|
|
|
|
Main dashboard with navigation to all services:
|
|
- **Supabase**: PostgreSQL database with REST API
|
|
- **API Docs**: Interactive API documentation
|
|
- **Airflow**: Workflow orchestration
|
|
- **Airbyte**: Data integration and ETL
|
|
- **Data Management**: File upload and processing (with submenu)
|
|
- **DBT**: Data transformation
|
|
- **Superset**: Business intelligence and visualization
|
|
|
|
### 2. Finance Excel Upload (`/data-management/finance`)
|
|
|
|
Upload and process financial Excel files with the following features:
|
|
|
|
**Upload Form:**
|
|
- Drag-and-drop or click to select Excel files (.xlsx, .xls)
|
|
- Optional description field
|
|
- File validation and size display
|
|
|
|
**Upload Processing:**
|
|
- Files saved to `/data/uploads/` with timestamp prefix
|
|
- Unique filename generation to prevent conflicts
|
|
- Automatic Airflow job triggering (when configured)
|
|
|
|
**Upload History:**
|
|
- Real-time list of all uploaded files
|
|
- Status tracking: pending, processing, success, error
|
|
- Expandable accordion for logs and error details
|
|
- Auto-refresh every 10 seconds
|
|
|
|
## API Endpoints
|
|
|
|
### Page Routes
|
|
|
|
```
|
|
GET / - Landing page
|
|
GET /data-management/finance - Finance upload page
|
|
POST /data-management/finance/upload - Upload file endpoint
|
|
GET /data-management/finance/uploads - Get all uploads
|
|
GET /data-management/finance/uploads/{upload_id} - Get specific upload
|
|
```
|
|
|
|
### Upload Endpoint
|
|
|
|
**POST** `/data-management/finance/upload`
|
|
|
|
**Form Data:**
|
|
- `file`: Excel file (.xlsx or .xls)
|
|
- `description`: Optional description text
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"success": true,
|
|
"message": "File 'example.xlsx' uploaded successfully",
|
|
"upload_id": "upload_20260306_174500",
|
|
"filename": "20260306_174500_example.xlsx"
|
|
}
|
|
```
|
|
|
|
**Error Response:**
|
|
```json
|
|
{
|
|
"detail": "Invalid file type. Only .xlsx and .xls files are allowed."
|
|
}
|
|
```
|
|
|
|
## File Storage
|
|
|
|
Uploaded files are stored in:
|
|
```
|
|
/data/uploads/YYYYMMDD_HHMMSS_original_filename.xlsx
|
|
```
|
|
|
|
Example:
|
|
```
|
|
/data/uploads/20260306_174530_finance_report.xlsx
|
|
```
|
|
|
|
## Airflow Integration
|
|
|
|
The upload endpoint includes a placeholder for Airflow integration. To enable:
|
|
|
|
1. **Configure Airflow endpoint** in `app/routes/pages.py`:
|
|
```python
|
|
airflow_url = "http://airflow-webserver:8080/api/v1/dags/{dag_id}/dagRuns"
|
|
```
|
|
|
|
2. **Set DAG ID** for the finance processing job
|
|
|
|
3. **Uncomment the Airflow trigger code** in `trigger_airflow_job()` function
|
|
|
|
4. **Configure authentication** (username/password or API token)
|
|
|
|
### Example Airflow Integration
|
|
|
|
```python
|
|
async def trigger_airflow_job(filepath: str, upload_id: str) -> str:
|
|
import httpx
|
|
|
|
airflow_url = "http://airflow-webserver:8080/api/v1/dags/finance_processing/dagRuns"
|
|
headers = {"Content-Type": "application/json"}
|
|
auth = ("airflow", "airflow")
|
|
|
|
payload = {
|
|
"conf": {
|
|
"filepath": filepath,
|
|
"upload_id": upload_id
|
|
}
|
|
}
|
|
|
|
async with httpx.AsyncClient() as client:
|
|
response = await client.post(
|
|
airflow_url,
|
|
json=payload,
|
|
headers=headers,
|
|
auth=auth
|
|
)
|
|
response.raise_for_status()
|
|
result = response.json()
|
|
return result["dag_run_id"]
|
|
```
|
|
|
|
## Upload Status Tracking
|
|
|
|
Upload records include:
|
|
- `id`: Unique identifier
|
|
- `filename`: Original filename
|
|
- `filepath`: Full path to saved file
|
|
- `uploaded_at`: ISO timestamp
|
|
- `description`: User-provided description
|
|
- `status`: pending | processing | success | error
|
|
- `job_id`: Airflow DAG run ID (when triggered)
|
|
- `logs`: Error logs or processing information
|
|
|
|
## Database Storage
|
|
|
|
Currently uses in-memory storage (`uploads_db` list). For production:
|
|
|
|
1. **Create database table:**
|
|
```sql
|
|
CREATE TABLE uploads (
|
|
id VARCHAR(100) PRIMARY KEY,
|
|
filename VARCHAR(255) NOT NULL,
|
|
filepath VARCHAR(500) NOT NULL,
|
|
uploaded_at TIMESTAMP NOT NULL,
|
|
description TEXT,
|
|
status VARCHAR(20) NOT NULL,
|
|
job_id VARCHAR(100),
|
|
logs TEXT,
|
|
created_at TIMESTAMP DEFAULT NOW()
|
|
);
|
|
```
|
|
|
|
2. **Replace in-memory storage** with SQLAlchemy model
|
|
|
|
3. **Add database queries** in route handlers
|
|
|
|
## Access URLs
|
|
|
|
- **Local**: http://localhost:8040/apiservice/
|
|
- **Production**: https://ai.sriphat.com/apiservice/
|
|
|
|
## Security Considerations
|
|
|
|
1. **File validation**: Only .xlsx and .xls files allowed
|
|
2. **Filename sanitization**: Spaces replaced with underscores
|
|
3. **Unique filenames**: Timestamp prefix prevents conflicts
|
|
4. **File size limits**: Configure in nginx/FastAPI if needed
|
|
5. **Authentication**: Add authentication middleware for production
|
|
|
|
## Future Enhancements
|
|
|
|
1. **Database persistence**: Replace in-memory storage
|
|
2. **File preview**: Show Excel data preview before processing
|
|
3. **Batch upload**: Support multiple file uploads
|
|
4. **Download processed files**: Allow downloading results
|
|
5. **User management**: Track uploads by user
|
|
6. **Email notifications**: Notify on processing completion
|
|
7. **Scheduled jobs**: Automatic periodic processing
|
|
8. **Data validation**: Validate Excel structure before processing
|
|
|
|
## Troubleshooting
|
|
|
|
### Upload fails with 500 error
|
|
- Check `/data/uploads/` directory exists and is writable
|
|
- Verify volume mount in docker-compose.yml
|
|
- Check container logs: `docker logs apiservice`
|
|
|
|
### Files not persisting after container restart
|
|
- Ensure volume mount is configured: `./data/uploads:/data/uploads`
|
|
- Check file permissions on host directory
|
|
|
|
### Airflow job not triggering
|
|
- Verify Airflow is accessible from container
|
|
- Check network connectivity: `docker network inspect shared_data_network`
|
|
- Verify Airflow credentials and DAG ID
|
|
- Check logs in upload history
|
|
|
|
## Development
|
|
|
|
To add new data management pages:
|
|
|
|
1. Create HTML template in `app/templates/`
|
|
2. Add route in `app/routes/pages.py`
|
|
3. Update landing page menu in `index.html`
|
|
4. Add submenu item if needed
|
|
|
|
Example:
|
|
```python
|
|
@router.get("/data-management/hr", response_class=HTMLResponse)
|
|
async def hr_page(request: Request):
|
|
return templates.TemplateResponse(
|
|
"data_management_hr.html",
|
|
{"request": request, "root_path": settings.ROOT_PATH}
|
|
)
|
|
```
|