
Document Service

Generic document management service with S3 storage and PDF field discovery.

Features

  • Multi-format support: PDF, DOCX, XLSX, JPG, JPEG, PNG, GIF
  • S3 storage: Configurable S3-compatible storage (MinIO, AWS S3, etc.)
  • PDF field discovery: Extract form fields from PDF documents
  • Organization-based access control: Documents scoped to organizations
  • File size limits: Configurable per document type
  • Content type detection: Automatic detection using python-magic
  • Comprehensive logging: All operations logged for audit trail

API Endpoints

Upload Document

POST /api/v1/documents/upload
Content-Type: multipart/form-data
Authorization: Bearer <token>

Form data:
- file: (required) Document file
- uploaded_by: (optional) User who uploaded the document

Response:
{
  "document_id": "uuid",
  "metadata": {...},
  "download_url": "presigned-url"
}

Rewrite Document

PUT /api/v1/documents/{document_id}
Content-Type: multipart/form-data
Authorization: Bearer <token>

Form data:
- file: (required) New document file
- uploaded_by: (optional) User who uploaded the document

Response:
{
  "document_id": "uuid",
  "metadata": {...},
  "download_url": "presigned-url"
}

Get Document Metadata

GET /api/v1/documents/{document_id}
Authorization: Bearer <token>

Response:
{
  "document_id": "uuid",
  "org_id": "org-id",
  "uploaded_by": "user",
  "document_type": "pdf",
  "filename": "document.pdf",
  "content_type": "application/pdf",
  "file_size": 12345,
  "s3_key": "documents/org-id/uuid/document.pdf",
  "created_at": "2024-01-01T00:00:00",
  "updated_at": "2024-01-01T00:00:00"
}

Get Download URL

GET /api/v1/documents/{document_id}/download-url?expires_in=3600
Authorization: Bearer <token>

Response:
{
  "download_url": "presigned-url",
  "s3_key": "documents/org-id/uuid/document.pdf",
  "expires_in": 3600
}
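
As a rough illustration of calling this endpoint, the sketch below builds the request with the standard library only. The helper name download_url_request and the base URL are assumptions made for the example; only the path, the expires_in query parameter, and the Bearer header come from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def download_url_request(
    base_url: str, document_id: str, token: str, expires_in: int = 3600
) -> Request:
    """Build (but do not send) a GET request for a presigned download URL.

    Hypothetical client helper, not part of the service itself.
    """
    query = urlencode({"expires_in": expires_in})
    url = f"{base_url.rstrip('/')}/api/v1/documents/{document_id}/download-url?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


req = download_url_request("http://localhost:8082", "some-document-id", "my-token", 600)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen and decoding the body as JSON should yield the response shape shown above, assuming the service is reachable at that base URL.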

Get PDF Fields

GET /api/v1/documents/{document_id}/fields
Authorization: Bearer <token>

Response:
{
  "document_id": "uuid",
  "document_type": "pdf",
  "fields": [
    {
      "field": "field_name",
      "label": "Field Name",
      "type": "string",
      "required": false,
      "options": null
    }
  ]
}
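
A client might turn this response into typed objects along the following lines. This is a minimal sketch: the PdfField class and parse_fields function are hypothetical names, and the assumption that options is either null or a list is inferred from the sample response, not confirmed by the service.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PdfField:
    field: str
    label: str
    type: str
    required: bool
    options: Optional[list]  # assumption: null, or a list of allowed values


def parse_fields(response: dict[str, Any]) -> list[PdfField]:
    """Convert the "fields" array of the response into PdfField objects."""
    return [
        PdfField(
            field=item["field"],
            label=item["label"],
            type=item["type"],
            required=bool(item.get("required", False)),
            options=item.get("options"),
        )
        for item in response.get("fields", [])
    ]


sample = {
    "document_id": "uuid",
    "document_type": "pdf",
    "fields": [
        {"field": "field_name", "label": "Field Name", "type": "string",
         "required": False, "options": None}
    ],
}
for f in parse_fields(sample):
    print(f.field, f.type, f.required)
```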

Delete Document

DELETE /api/v1/documents/{document_id}
Authorization: Bearer <token>

Response:
{
  "message": "Document deleted successfully"
}

Configuration

Environment Variables

Variable        Description                    Default
S3_ENDPOINT     S3 endpoint URL                http://localhost:9000
S3_ACCESS_KEY   S3 access key                  minioadmin
S3_SECRET_KEY   S3 secret key                  minioadmin
S3_BUCKET       S3 bucket name                 document-bucket
S3_REGION       S3 region                      us-east-1
HOST            Service host                   0.0.0.0
PORT            Service port                   8082
TEST_UPLOADER   Default uploader for testing   test-user
LOG_LEVEL       Logging level                  INFO
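
One way to read these variables with their documented defaults is sketched below. The Settings class and load_settings function are illustrative names, not necessarily how the service loads its configuration; only the variable names and defaults are taken from the table.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    s3_endpoint: str
    s3_access_key: str
    s3_secret_key: str
    s3_bucket: str
    s3_region: str
    host: str
    port: int
    test_uploader: str
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (os.environ by default), using the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        s3_endpoint=env.get("S3_ENDPOINT", "http://localhost:9000"),
        s3_access_key=env.get("S3_ACCESS_KEY", "minioadmin"),
        s3_secret_key=env.get("S3_SECRET_KEY", "minioadmin"),
        s3_bucket=env.get("S3_BUCKET", "document-bucket"),
        s3_region=env.get("S3_REGION", "us-east-1"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8082")),  # PORT arrives as a string
        test_uploader=env.get("TEST_UPLOADER", "test-user"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


settings = load_settings({"PORT": "9090"})
print(settings.s3_endpoint, settings.port)
```

Accepting an explicit mapping keeps the loader easy to exercise in tests without touching the real process environment.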

File Size Limits

Document Type   Default Limit
PDF             50MB
DOCX            25MB
XLSX            25MB
JPG/JPEG        10MB
PNG             10MB
GIF             10MB
Other           10MB
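
The defaults above could be enforced with a lookup like the following sketch. The names are hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption; the service may count megabytes differently.

```python
MB = 1024 * 1024  # assumption: binary megabytes

# Default limits from the table, keyed by lowercase document type.
DEFAULT_LIMITS = {
    "pdf": 50 * MB,
    "docx": 25 * MB,
    "xlsx": 25 * MB,
    "jpg": 10 * MB,
    "jpeg": 10 * MB,
    "png": 10 * MB,
    "gif": 10 * MB,
}
OTHER_LIMIT = 10 * MB  # the "Other" row


def max_size_for(document_type: str) -> int:
    """Return the size limit in bytes for a document type."""
    return DEFAULT_LIMITS.get(document_type.lower(), OTHER_LIMIT)


def exceeds_limit(document_type: str, file_size: int) -> bool:
    """True when file_size (bytes) is over the limit, i.e. a 413 case."""
    return file_size > max_size_for(document_type)


print(exceeds_limit("pdf", 60 * MB))
print(exceeds_limit("png", 2 * MB))
```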

Authentication

The service uses JWT tokens for authentication. The org_id is extracted from the token claims and used for organization-based access control.

Note: The auth middleware currently ships with a mock implementation intended for testing only. Replace it with a real Zitadel integration before running in production.

Development

Setup

This project uses uv2nix for reproducible Python dependency management with Nix.

# Enter the development shell (uses uv2nix)
nix develop

# The development shell includes:
# - Python with all dependencies from uv.lock
# - uv tool for package management
# - pyright for type checking
# - file package (provides libmagic for content type detection)

Running the Service

# Start the development server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8082

# Access API documentation
open http://localhost:8082/docs

Adding Dependencies

# Add a new dependency
uv add <package-name>

# Add a development dependency
uv add --dev <package-name>

# Update the lock file
uv lock

Testing

# Run tests
pytest

# Run with coverage
pytest --cov=app

Linting

# Run ruff
ruff check app/

# Format code
ruff format app/

Building Production Package

# Build the production package
nix build

# The package will be available at ./result

Deployment

Using Helm

# Install chart
helm install document-service ./ops/chart

# Upgrade chart
helm upgrade document-service ./ops/chart

# Uninstall
helm uninstall document-service

Configuration

Edit ops/chart/values.yaml to customize deployment settings.

S3 Path Structure

Documents are stored in S3 using the following path structure:

documents/{org_id}/{document_id}/{filename}

Example:

documents/org-123/abc-456-def-789/policy_document.pdf
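
The path structure above can be expressed as a small helper. This is a sketch with a hypothetical function name; stripping directory components from the filename is an added precaution assumed here, not something the service is confirmed to do.

```python
def build_s3_key(org_id: str, document_id: str, filename: str) -> str:
    """Return the object key documents/{org_id}/{document_id}/{filename}."""
    # Keep only the final path component so a client-supplied filename
    # cannot introduce extra path segments into the key.
    safe_name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    return f"documents/{org_id}/{document_id}/{safe_name}"


print(build_s3_key("org-123", "abc-456-def-789", "policy_document.pdf"))
```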

Logging

All operations are logged with the following information:

  • Operation type (upload, download, delete, etc.)
  • Document ID
  • Organization ID
  • User ID
  • Timestamp
  • Success/failure status
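
A log entry carrying the fields listed above might be assembled as in the sketch below. The logger name, the JSON encoding, and the key names are assumptions chosen for illustration; the service's actual log format may differ.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("document_service.audit")  # hypothetical logger name


def audit_record(
    operation: str, document_id: str, org_id: str, user_id: str, success: bool
) -> dict:
    """Build one audit entry with the fields listed above."""
    return {
        "operation": operation,
        "document_id": document_id,
        "org_id": org_id,
        "user_id": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "success": success,
    }


def log_operation(
    operation: str, document_id: str, org_id: str, user_id: str, success: bool
) -> dict:
    """Emit the audit entry as one JSON line; failures are logged at ERROR."""
    record = audit_record(operation, document_id, org_id, user_id, success)
    level = logging.INFO if success else logging.ERROR
    logger.log(level, json.dumps(record))
    return record


logging.basicConfig(level=logging.INFO)
log_operation("upload", "abc-456-def-789", "org-123", "test-user", True)
```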

Error Handling

The service responds with the following HTTP status codes:

  • 200 - Success
  • 201 - Created
  • 400 - Bad Request
  • 401 - Unauthorized
  • 403 - Forbidden
  • 404 - Not Found
  • 413 - Payload Too Large (file size exceeded)
  • 415 - Unsupported Media Type
  • 500 - Internal Server Error

TODO

  • Implement proper Zitadel authentication
  • Add document listing endpoint
  • Add document search functionality
  • Add document versioning support
  • Add document conversion capabilities
  • Add comprehensive test coverage
  • Add API rate limiting
  • Add metrics and monitoring