Initial commit of document-service
README.md
@@ -0,0 +1,286 @@
# Document Service

Generic document management service with S3 storage and PDF field discovery.

## Features

- **Multi-format support**: PDF, DOCX, XLSX, JPG, JPEG, PNG, GIF
- **S3 storage**: Configurable S3-compatible storage (MinIO, AWS S3, etc.)
- **PDF field discovery**: Extract form fields from PDF documents
- **Organization-based access control**: Documents are scoped to organizations
- **File size limits**: Configurable per document type
- **Content type detection**: Automatic detection using python-magic
- **Comprehensive logging**: All operations are logged for an audit trail

## API Endpoints

### Upload Document
```
POST /api/documents/upload
Content-Type: multipart/form-data
Authorization: Bearer <token>

Form data:
- file: (required) Document file
- uploaded_by: (optional) User who uploaded the document

Response:
{
  "document_id": "uuid",
  "metadata": {...},
  "download_url": "presigned-url"
}
```
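
As a rough illustration of the upload call, the sketch below builds (but does not send) the multipart request using only the Python standard library. The helper name `build_upload_request`, the base URL, and the `application/octet-stream` part type are assumptions made for this example; only the path, method, headers, and form field names come from the endpoint description above.

```python
import urllib.request
import uuid


def build_upload_request(base_url, token, filename, content, uploaded_by=None):
    """Build a multipart/form-data POST for /api/documents/upload.

    Hypothetical client helper: it only constructs the request object.
    `content` must be bytes; `uploaded_by` is the optional form field.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if uploaded_by is not None:
        # Optional plain-text form field.
        parts.append((
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="uploaded_by"\r\n\r\n'
            f"{uploaded_by}\r\n"
        ).encode())
    # Required file part; the part content type here is an assumption.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{base_url}/api/documents/upload",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body should then be the JSON object shown above.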

### Rewrite Document
```
PUT /api/documents/{document_id}
Content-Type: multipart/form-data
Authorization: Bearer <token>

Form data:
- file: (required) New document file
- uploaded_by: (optional) User who uploaded the document

Response:
{
  "document_id": "uuid",
  "metadata": {...},
  "download_url": "presigned-url"
}
```

### Get Document Metadata
```
GET /api/documents/{document_id}
Authorization: Bearer <token>

Response:
{
  "document_id": "uuid",
  "org_id": "org-id",
  "uploaded_by": "user",
  "document_type": "pdf",
  "filename": "document.pdf",
  "content_type": "application/pdf",
  "file_size": 12345,
  "s3_key": "documents/org-id/uuid/document.pdf",
  "created_at": "2024-01-01T00:00:00",
  "updated_at": "2024-01-01T00:00:00"
}
```

### Get Download URL
```
GET /api/documents/{document_id}/download-url?expires_in=3600
Authorization: Bearer <token>

Response:
{
  "download_url": "presigned-url",
  "s3_key": "documents/org-id/uuid/document.pdf",
  "expires_in": 3600
}
```
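
For callers assembling this URL by hand, a minimal sketch follows. The function name `download_url_endpoint` and the rejection of non-positive lifetimes are assumptions for illustration; the service's own validation rules for `expires_in` are not documented here.

```python
from urllib.parse import quote, urlencode


def download_url_endpoint(base_url, document_id, expires_in=3600):
    """Return the request URL for the download-url endpoint.

    Hypothetical helper; `expires_in` is the requested lifetime in seconds.
    """
    if expires_in <= 0:
        # Assumed client-side guard, not a documented server rule.
        raise ValueError("expires_in must be a positive number of seconds")
    path = f"/api/documents/{quote(document_id, safe='')}/download-url"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'expires_in': expires_in})}"
```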

### Get PDF Fields
```
GET /api/documents/{document_id}/fields
Authorization: Bearer <token>

Response:
{
  "document_id": "uuid",
  "document_type": "pdf",
  "fields": [
    {
      "field": "field_name",
      "label": "Field Name",
      "type": "string",
      "required": false,
      "options": null
    }
  ]
}
```
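
To show how a client might consume this response, here is a small sketch that parses the sample payload and summarizes each field. The helper `summarize_fields` is hypothetical; the keys it reads (`field`, `type`, `required`) are the ones shown in the response above.

```python
import json

# Sample payload copied from the response shape documented above.
SAMPLE_RESPONSE = """
{
  "document_id": "uuid",
  "document_type": "pdf",
  "fields": [
    {
      "field": "field_name",
      "label": "Field Name",
      "type": "string",
      "required": false,
      "options": null
    }
  ]
}
"""


def summarize_fields(payload):
    """Return (name, type, required) tuples for each discovered field."""
    return [
        (field["field"], field["type"], field["required"])
        for field in payload.get("fields", [])
    ]


summary = summarize_fields(json.loads(SAMPLE_RESPONSE))
```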

### Delete Document
```
DELETE /api/documents/{document_id}
Authorization: Bearer <token>

Response:
{
  "message": "Document deleted successfully"
}
```

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `S3_ENDPOINT` | S3 endpoint URL | `http://localhost:9000` |
| `S3_ACCESS_KEY` | S3 access key | `minioadmin` |
| `S3_SECRET_KEY` | S3 secret key | `minioadmin` |
| `S3_BUCKET` | S3 bucket name | `document-bucket` |
| `S3_REGION` | S3 region | `us-east-1` |
| `HOST` | Service host | `0.0.0.0` |
| `PORT` | Service port | `8082` |
| `TEST_UPLOADER` | Default uploader for testing | `test-user` |
| `LOG_LEVEL` | Logging level | `INFO` |
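
One way to read these variables with their documented defaults is sketched below. The `Settings` dataclass and `load_settings` function are illustrative names, not necessarily how the service itself loads configuration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the table above."""
    s3_endpoint: str
    s3_access_key: str
    s3_secret_key: str
    s3_bucket: str
    s3_region: str
    host: str
    port: int
    test_uploader: str
    log_level: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        s3_endpoint=env.get("S3_ENDPOINT", "http://localhost:9000"),
        s3_access_key=env.get("S3_ACCESS_KEY", "minioadmin"),
        s3_secret_key=env.get("S3_SECRET_KEY", "minioadmin"),
        s3_bucket=env.get("S3_BUCKET", "document-bucket"),
        s3_region=env.get("S3_REGION", "us-east-1"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8082")),  # the port is the only numeric value
        test_uploader=env.get("TEST_UPLOADER", "test-user"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing an explicit mapping, as in `load_settings({"PORT": "9000"})`, keeps the loader easy to test without touching the process environment.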

### File Size Limits

| Document Type | Default Limit |
|---------------|---------------|
| PDF | 50MB |
| DOCX | 25MB |
| XLSX | 25MB |
| JPG/JPEG | 10MB |
| PNG | 10MB |
| GIF | 10MB |
| Other | 10MB |
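
The limits in the table could be enforced with a lookup like the sketch below. It assumes 1 MB means 1024 * 1024 bytes and that document types are keyed by lowercase extension; both are assumptions, since the README does not say how the service defines either.

```python
MB = 1024 * 1024  # assumption: binary megabytes

# Default limits from the table above, keyed by lowercase document type.
DEFAULT_LIMITS = {
    "pdf": 50 * MB,
    "docx": 25 * MB,
    "xlsx": 25 * MB,
    "jpg": 10 * MB,
    "jpeg": 10 * MB,
    "png": 10 * MB,
    "gif": 10 * MB,
}
OTHER_LIMIT = 10 * MB  # the "Other" row


def max_size_for(document_type):
    """Return the size limit in bytes for a document type."""
    return DEFAULT_LIMITS.get(document_type.lower(), OTHER_LIMIT)


def is_within_limit(document_type, file_size):
    """True if file_size (bytes) does not exceed the limit for its type."""
    return file_size <= max_size_for(document_type)
```

A file that fails this check would correspond to the `413` status listed under Error Handling.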

## Authentication

The service authenticates requests with JWT bearer tokens. The `org_id` is read from the token claims and used to scope access to the caller's organization.

**Note**: The auth middleware currently includes a mock implementation for testing. Replace it with a proper Zitadel integration before deploying to production.
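
To make the claim lookup concrete, the sketch below decodes a JWT payload and reads `org_id` from it. It deliberately does not verify the signature, so it is suitable only for illustration or tests, much like the mock middleware. The claim key `org_id` and both function names are assumptions.

```python
import base64
import json


def unverified_claims(token):
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def org_id_from_token(token):
    """Return the org_id claim, assuming it is stored under the key 'org_id'."""
    org_id = unverified_claims(token).get("org_id")
    if not org_id:
        raise PermissionError("token carries no org_id claim")
    return org_id
```

A production integration would verify the token's signature and expiry against the identity provider before trusting any claim.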

## Development

### Setup

This project uses [uv2nix](https://pyproject-nix.github.io/uv2nix/) for reproducible Python dependency management with Nix.

```bash
# Enter the development shell (uses uv2nix)
nix develop

# The development shell includes:
# - Python with all dependencies from uv.lock
# - uv tool for package management
# - pyright for type checking
# - file package (provides libmagic for content type detection)
```

### Running the Service

```bash
# Start the development server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8082

# Access API documentation
open http://localhost:8082/docs
```

### Adding Dependencies

```bash
# Add a new dependency
uv add <package-name>

# Add a development dependency
uv add --dev <package-name>

# Update the lock file
uv lock
```

### Testing

```bash
# Run tests
pytest

# Run with coverage
pytest --cov=app
```

### Linting

```bash
# Run ruff
ruff check app/

# Format code
ruff format app/
```

### Building Production Package

```bash
# Build the production package
nix build

# The package will be available at ./result
```

## Deployment

### Using Helm

```bash
# Install chart
helm install document-service ./ops/chart

# Upgrade chart
helm upgrade document-service ./ops/chart

# Uninstall
helm uninstall document-service
```

### Configuration

Edit `ops/chart/values.yaml` to customize deployment settings.

## S3 Path Structure

Documents are stored in S3 using the following path structure:

```
documents/{org_id}/{document_id}/{filename}
```

Example:
```
documents/org-123/abc-456-def-789/policy_document.pdf
```
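
A key following this layout could be assembled as in the sketch below. The function name `build_s3_key` and the rule rejecting empty components or components containing `/` are assumptions added for safety in this example; the README does not state how the service sanitizes these values.

```python
def build_s3_key(org_id, document_id, filename):
    """Build an object key of the form documents/{org_id}/{document_id}/{filename}."""
    for name, value in (("org_id", org_id), ("document_id", document_id), ("filename", filename)):
        # Assumed guard: a slash would silently change the key's path depth.
        if not value or "/" in value:
            raise ValueError(f"{name} must be non-empty and must not contain '/'")
    return f"documents/{org_id}/{document_id}/{filename}"


key = build_s3_key("org-123", "abc-456-def-789", "policy_document.pdf")
```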

## Logging

All operations are logged with the following information:
- Operation type (upload, download, delete, etc.)
- Document ID
- Organization ID
- User ID
- Timestamp
- Success/failure status
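
The fields listed above map naturally onto a structured log record. The sketch below emits one JSON line per operation using the standard `logging` module; the logger name, the key names, and the JSON format are assumptions, not the service's actual log schema.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name for audit events.
logger = logging.getLogger("document_service.audit")


def audit_record(operation, document_id, org_id, user_id, success):
    """Assemble the audit fields listed above into a dictionary."""
    return {
        "operation": operation,
        "document_id": document_id,
        "org_id": org_id,
        "user_id": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "success": success,
    }


def log_operation(operation, document_id, org_id, user_id, success):
    """Log one audit record as a JSON line and return it."""
    record = audit_record(operation, document_id, org_id, user_id, success)
    level = logging.INFO if success else logging.ERROR
    logger.log(level, json.dumps(record))
    return record
```

Emitting one JSON object per line keeps the audit trail easy to filter by organization or document ID in most log aggregators.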

## Error Handling

The service returns standard HTTP status codes:

- `200` - Success
- `201` - Created
- `400` - Bad Request
- `401` - Unauthorized
- `403` - Forbidden
- `404` - Not Found
- `413` - Payload Too Large (file size limit exceeded)
- `415` - Unsupported Media Type
- `500` - Internal Server Error

## TODO

- [ ] Implement proper Zitadel authentication
- [ ] Add document listing endpoint
- [ ] Add document search functionality
- [ ] Add document versioning support
- [ ] Add document conversion capabilities
- [ ] Add comprehensive test coverage
- [ ] Add API rate limiting
- [ ] Add metrics and monitoring