Document Service
Generic document management service with S3 storage and PDF field discovery.
Features
- Multi-format support: PDF, DOCX, XLSX, JPG, JPEG, PNG, GIF
- S3 storage: Configurable S3-compatible storage (MinIO, AWS S3, etc.)
- PDF field discovery: Extract form fields from PDF documents
- Organization-based access control: Documents scoped to organizations
- File size limits: Configurable per document type
- Content type detection: Automatic detection using python-magic
- Comprehensive logging: All operations logged for audit trail
API Endpoints
Upload Document
```http
POST /api/v1/documents/upload
Content-Type: multipart/form-data
Authorization: Bearer <token>
```
Form data:
- file: (required) Document file
- uploaded_by: (optional) User who uploaded the document
Response:
```json
{
  "document_id": "uuid",
  "metadata": {...},
  "download_url": "presigned-url"
}
```
Rewrite Document
```http
PUT /api/v1/documents/{document_id}
Content-Type: multipart/form-data
Authorization: Bearer <token>
```
Form data:
- file: (required) New document file
- uploaded_by: (optional) User who uploaded the document
Response:
```json
{
  "document_id": "uuid",
  "metadata": {...},
  "download_url": "presigned-url"
}
```
Get Document Metadata
```http
GET /api/v1/documents/{document_id}
Authorization: Bearer <token>
```
Response:
```json
{
  "document_id": "uuid",
  "org_id": "org-id",
  "uploaded_by": "user",
  "document_type": "pdf",
  "filename": "document.pdf",
  "content_type": "application/pdf",
  "file_size": 12345,
  "s3_key": "documents/org-id/uuid/document.pdf",
  "created_at": "2024-01-01T00:00:00",
  "updated_at": "2024-01-01T00:00:00"
}
```
Get Download URL
```http
GET /api/v1/documents/{document_id}/download-url?expires_in=3600
Authorization: Bearer <token>
```
Response:
```json
{
  "download_url": "presigned-url",
  "s3_key": "documents/org-id/uuid/document.pdf",
  "expires_in": 3600
}
```
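Requesting a download URL is a plain authenticated GET with an `expires_in` query parameter. This is a minimal sketch, assuming `expires_in` is in seconds (3600 would then be one hour) and must be positive; the server's actual bounds are not documented. `build_download_url_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request


def build_download_url_request(
    base_url: str, token: str, document_id: str, expires_in: int = 3600
) -> urllib.request.Request:
    """Build a GET request for a presigned download URL."""
    if expires_in <= 0:
        # Client-side sanity check; an assumption, not a documented rule.
        raise ValueError("expires_in must be positive")
    doc = urllib.parse.quote(document_id, safe="")
    query = urllib.parse.urlencode({"expires_in": expires_in})
    return urllib.request.Request(
        f"{base_url}/api/v1/documents/{doc}/download-url?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

The returned `download_url` is a presigned S3 URL. Its signature travels in the query string, so fetch it directly without the bearer token.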
Get PDF Fields
```http
GET /api/v1/documents/{document_id}/fields
Authorization: Bearer <token>
```
Response:
```json
{
  "document_id": "uuid",
  "document_type": "pdf",
  "fields": [
    {
      "field": "field_name",
      "label": "Field Name",
      "type": "string",
      "required": false,
      "options": null
    }
  ]
}
```
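PDF form fields are stored as AcroForm field dictionaries with keys such as `/FT` (field type), `/TU` (user-facing name), `/Ff` (flags) and `/Opt` (choice options). Libraries such as pypdf expose a comparable mapping through `PdfReader.get_fields()`. The sketch below shows how such dictionaries could map onto the response schema above. Only the `"string"` type is confirmed by the example response; the other type names, the label fallback and `describe_fields` itself are hypothetical.

```python
from typing import Any

# Hypothetical mapping from PDF field types (/FT) to the schema's "type" values.
_TYPE_MAP = {"/Tx": "string", "/Btn": "boolean", "/Ch": "choice", "/Sig": "signature"}

# In the PDF field flags (/Ff), bit position 2 marks a field as required.
REQUIRED_FLAG = 1 << 1


def describe_field(name: str, field: dict[str, Any]) -> dict[str, Any]:
    """Convert one AcroForm field dictionary into the endpoint's field schema."""
    flags = int(field.get("/Ff", 0))
    options = field.get("/Opt")
    return {
        "field": name,
        # Prefer the PDF's user-facing name; otherwise prettify the field name.
        "label": field.get("/TU") or name.replace("_", " ").title(),
        "type": _TYPE_MAP.get(field.get("/FT"), "string"),
        "required": bool(flags & REQUIRED_FLAG),
        # /Opt entries are kept as-is (they may be strings or [export, display] pairs).
        "options": list(options) if options else None,
    }


def describe_fields(fields: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Describe every field, preserving the input order."""
    return [describe_field(name, f) for name, f in fields.items()]
```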
Delete Document
```http
DELETE /api/v1/documents/{document_id}
Authorization: Bearer <token>
```
Response:
```json
{
  "message": "Document deleted successfully"
}
```
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| S3_ENDPOINT | S3 endpoint URL | http://localhost:9000 |
| S3_ACCESS_KEY | S3 access key | minioadmin |
| S3_SECRET_KEY | S3 secret key | minioadmin |
| S3_BUCKET | S3 bucket name | document-bucket |
| S3_REGION | S3 region | us-east-1 |
| HOST | Service host | 0.0.0.0 |
| PORT | Service port | 8082 |
| TEST_UPLOADER | Default uploader for testing | test-user |
| LOG_LEVEL | Logging level | INFO |
File Size Limits
| Document Type | Default Limit |
|---|---|
| PDF | 50MB |
| DOCX | 25MB |
| XLSX | 25MB |
| JPG/JPEG | 10MB |
| PNG | 10MB |
| GIF | 10MB |
| Other | 10MB |
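A per-type limit check might look like the following. This is a sketch that assumes "MB" means 1024 × 1024 bytes and that a file exactly at the limit is accepted; neither is stated above. `check_size` and `FileTooLarge` are hypothetical names, with the error carrying the 413 status the API returns for oversized files.

```python
MB = 1024 * 1024  # assumed meaning of "MB"

# Default limits from the table above, keyed by lowercase document type.
DEFAULT_LIMITS = {
    "pdf": 50 * MB,
    "docx": 25 * MB,
    "xlsx": 25 * MB,
    "jpg": 10 * MB,
    "jpeg": 10 * MB,
    "png": 10 * MB,
    "gif": 10 * MB,
}
OTHER_LIMIT = 10 * MB


class FileTooLarge(Exception):
    """Hypothetical error the API would translate to 413 Payload Too Large."""

    status_code = 413


def check_size(document_type: str, size: int) -> None:
    """Raise FileTooLarge if size (in bytes) exceeds the type's limit."""
    limit = DEFAULT_LIMITS.get(document_type.lower(), OTHER_LIMIT)
    if size > limit:
        raise FileTooLarge(f"{document_type} file is {size} bytes; limit is {limit} bytes")
```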
Authentication
The service uses JWT tokens for authentication. The org_id is extracted from the token claims and used for organization-based access control.
Note: Currently, the auth middleware includes a mock implementation for testing. In production, this should be replaced with proper Zitadel integration.
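For local testing, in the spirit of the mock middleware, the organization can be read straight from a JWT's payload segment. The sketch below does exactly that and nothing more: it does not verify the signature, issuer, audience or expiry, so it must not stand in for real Zitadel validation. The claim name `org_id` is an assumption, and `unverified_org_id` is a hypothetical helper.

```python
import base64
import json


def unverified_org_id(token: str) -> str:
    """Read the org_id claim from a JWT payload WITHOUT verifying it."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("not a JWT: expected three dot-separated segments") from None
    # JWT segments are unpadded base64url; restore padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(claims, dict):
        raise ValueError("JWT payload is not a JSON object")
    org_id = claims.get("org_id")
    if not isinstance(org_id, str) or not org_id:
        raise ValueError("token has no org_id claim")
    return org_id
```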
Development
Setup
This project uses uv2nix for reproducible Python dependency management with Nix.
```bash
# Enter the development shell (uses uv2nix)
nix develop
# The development shell includes:
# - Python with all dependencies from uv.lock
# - uv tool for package management
# - pyright for type checking
# - file package (provides libmagic for content type detection)
```
Running the Service
```bash
# Start the development server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8082
# Access API documentation
open http://localhost:8082/docs
```
Adding Dependencies
```bash
# Add a new dependency
uv add <package-name>
# Add a development dependency
uv add --dev <package-name>
# Update the lock file
uv lock
```
Testing
```bash
# Run tests
pytest
# Run with coverage
pytest --cov=app
```
Linting
```bash
# Run ruff
ruff check app/
# Format code
ruff format app/
```
Building Production Package
```bash
# Build the production package
nix build
# The package will be available at ./result
```
Deployment
Using Helm
```bash
# Install chart
helm install document-service ./ops/chart
# Upgrade chart
helm upgrade document-service ./ops/chart
# Uninstall
helm uninstall document-service
```
Configuration
Edit ops/chart/values.yaml to customize deployment settings.
S3 Path Structure
Documents are stored in S3 using the following path structure:
documents/{org_id}/{document_id}/{filename}
Example:
documents/org-123/abc-456-def-789/policy_document.pdf
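Building keys in this layout, and checking that a key belongs to an organization, can be sketched as follows. Reducing the client-supplied filename to its last path component is an assumed policy (the service's actual filename handling is not documented), and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def build_s3_key(org_id: str, document_id: str, filename: str) -> str:
    """Return documents/{org_id}/{document_id}/{filename}."""
    # Keep only the last component so a filename cannot add extra "/" segments.
    name = PurePosixPath(filename.replace("\\", "/")).name
    for part in (org_id, document_id, name):
        if part in ("", ".", "..") or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"documents/{org_id}/{document_id}/{name}"


def key_belongs_to_org(s3_key: str, org_id: str) -> bool:
    """True if the key lives under the organization's prefix."""
    # The trailing slash stops "org-1" from matching keys under "org-12".
    return s3_key.startswith(f"documents/{org_id}/")
```

Scoping every key under `documents/{org_id}/` means an organization check reduces to a prefix comparison.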
Logging
All operations are logged with the following information:
- Operation type (upload, download, delete, etc.)
- Document ID
- Organization ID
- User ID
- Timestamp
- Success/failure status
Error Handling
The service returns appropriate HTTP status codes:
- 200: Success
- 201: Created
- 400: Bad Request
- 401: Unauthorized
- 403: Forbidden
- 404: Not Found
- 413: Payload Too Large (file size exceeded)
- 415: Unsupported Media Type
- 500: Internal Server Error
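A common way to produce these codes is to raise domain exceptions that carry their status and translate them in one place. The sketch below uses a hypothetical exception hierarchy; the service's real error types are not documented. Anything unrecognized falls back to 500.

```python
class DocumentError(Exception):
    """Base class for hypothetical service errors; subclasses set status_code."""

    status_code = 500


class DocumentNotFound(DocumentError):
    status_code = 404


class WrongOrganization(DocumentError):
    status_code = 403


class FileTooLarge(DocumentError):
    status_code = 413


class UnsupportedType(DocumentError):
    status_code = 415


def status_for(exc: Exception) -> int:
    """Map an exception to the HTTP status code the API would return."""
    if isinstance(exc, DocumentError):
        return exc.status_code
    return 500  # unexpected errors become Internal Server Error
```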
TODO
- Implement proper Zitadel authentication
- Add document listing endpoint
- Add document search functionality
- Add document versioning support
- Add document conversion capabilities
- Add comprehensive test coverage
- Add API rate limiting
- Add metrics and monitoring