Container Images

Algorithms must be packaged as Docker container images. This ensures consistency, reproducibility, and isolation across different environments.

Why Containers?

Containers provide several benefits:

Language agnostic - Use Python, Java, C++, or any language
Dependency management - Include all libraries and tools
Reproducibility - Same image produces same results
Isolation - Secure execution environment
Resource management - Controlled CPU, memory, and GPU allocation

Container Requirements

Execution Environment

Your container will run in a restricted environment:

No network access - Cannot make HTTP requests or connect to external services
No host access - Cannot access the underlying host filesystem
Read-only filesystem - Except for specified input/output directories
Time-limited - Must complete within the timeout period
Resource-constrained - Limited to requested CPU/memory/GPU
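
Because only the input and output directories are writable, code that spills temporary files elsewhere may fail at runtime. The following is a minimal sketch, assuming the output directory is writable and that scratch files should not be left behind; `scratch_dir` is a hypothetical helper, not part of the platform.

```python
import tempfile
from pathlib import Path


def scratch_dir(output_path: str) -> tempfile.TemporaryDirectory:
    """Create a self-cleaning scratch directory under the writable output directory.

    Assumption: output_path is the writable directory the platform provides.
    """
    out = Path(output_path)
    out.mkdir(parents=True, exist_ok=True)
    # The directory and its contents are removed when the context manager exits.
    return tempfile.TemporaryDirectory(dir=out, prefix=".scratch-")
```

Used as `with scratch_dir(output_path) as tmp: ...`, intermediate files live under the output directory for the duration of the block and are deleted afterwards.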

Must Include

Your container image must:

Include your algorithm implementation
Include all dependencies (libraries, models, data files)
Respond to the command specified in the manifest
Read input from the file whose path is given in the ALGORITHM_INPUT_PATH environment variable
Write output to the directory specified by output_path in the input file
Exit with code 0 on success, non-zero on failure
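
The input, output, and exit-code requirements above can be sketched as a small Python entry point. This is an illustration under assumptions, not the platform's reference implementation: the field names (`output_path`, `config.parameters`) and the output file name `algo_output.json` are taken from the local testing example later on this page, and the "algorithm" is a placeholder that echoes its parameters.

```python
import json
import os
import sys
from pathlib import Path


def run(input_path: str) -> Path:
    """Load the input file, do the work, and write a result file."""
    with open(input_path, encoding="utf-8") as f:
        algo_input = json.load(f)

    # Assumption: the input file names the writable output directory.
    output_dir = Path(algo_input["output_path"])
    output_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder for the real algorithm: echo the parameters back.
    result = {"parameters": algo_input.get("config", {}).get("parameters", {})}

    output_file = output_dir / "algo_output.json"
    output_file.write_text(json.dumps(result), encoding="utf-8")
    return output_file


def main() -> int:
    """Return 0 on success and 1 on any failure."""
    input_path = os.environ.get("ALGORITHM_INPUT_PATH")
    if not input_path:
        print("ALGORITHM_INPUT_PATH is not set", file=sys.stderr)
        return 1
    try:
        run(input_path)
    except Exception as exc:  # any failure must surface as a non-zero exit code
        print(f"algorithm failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

In a real script the last line would be `sys.exit(main())`, so that the return value becomes the container's exit code.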

Creating a Dockerfile

Basic Structure

# Start with a base image
FROM python:3.9-slim

# Set working directory
WORKDIR /app

# Copy and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy algorithm code
COPY algorithm.py .
COPY utils/ ./utils/

# Copy any additional files (models, configs, etc.)
COPY models/ ./models/

# Container will be invoked via manifest command
# No CMD needed - specified in manifest

Python Example

For a Python algorithm:

FROM python:3.9-slim

WORKDIR /app

# Install system dependencies if needed
RUN apt-get update && apt-get install -y \
    libgeos-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy algorithm code
COPY device_visits.py .
COPY __init__.py .

# Algorithm will be invoked via command in manifest:
# ["python", "/app/device_visits.py"]

Including ML Models

If your algorithm uses machine learning models:

FROM python:3.9-slim

WORKDIR /app

# Install ML frameworks
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model weights
COPY models/vehicle_detector_v2.onnx /app/models/
COPY models/config.json /app/models/

# Copy algorithm code
COPY detect_objects.py .

# Model will be loaded from /app/models/ at runtime
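
On the Python side, loading the bundled files might look like the sketch below. It assumes the file names and the `/app/models` destination from the Dockerfile above; `load_model_files` is a hypothetical helper, and the contents of `config.json` are not specified here, so the config is returned as a plain dictionary.

```python
import json
from pathlib import Path

# Matches the COPY destination in the Dockerfile above.
MODEL_DIR = Path("/app/models")


def load_model_files(model_dir: Path = MODEL_DIR) -> tuple[Path, dict]:
    """Return the weights path and parsed config, failing fast if either is missing."""
    weights = model_dir / "vehicle_detector_v2.onnx"
    config_file = model_dir / "config.json"
    for path in (weights, config_file):
        if not path.is_file():
            raise FileNotFoundError(f"expected model file not found: {path}")
    config = json.loads(config_file.read_text(encoding="utf-8"))
    return weights, config
```

The returned path can then be handed to whichever inference runtime is listed in requirements.txt. Failing fast on a missing file gives a clearer error than a failure deep inside the model loader.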

Multi-Stage Builds

For smaller images, use multi-stage builds:

# Build stage
FROM python:3.9 AS builder

WORKDIR /build

COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Runtime stage
FROM python:3.9-slim

WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /root/.local /root/.local

# Copy algorithm code
COPY algorithm.py .

# Update PATH
ENV PATH=/root/.local/bin:$PATH

Building Images

Build Command

# Build with a tag
docker build -t myorg/device-visits:1.0.0 .

# Build with multiple tags
docker build \
  -t myorg/device-visits:1.0.0 \
  -t myorg/device-visits:latest \
  .

# Build for specific platform
docker build --platform linux/amd64 -t myorg/device-visits:1.0.0 .

Tagging Strategy

Use semantic versioning for tags:

# Full version (the source tag, created by docker build)
# myorg/device-visits:1.0.0

# Major.minor version
docker tag myorg/device-visits:1.0.0 myorg/device-visits:1.0

# Major version
docker tag myorg/device-visits:1.0.0 myorg/device-visits:1

# Latest (not recommended for production)
docker tag myorg/device-visits:1.0.0 myorg/device-visits:latest
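
If tags are generated by a release script, the set of semantic-version tags can be derived from the full version. A minimal sketch; `image_tags` is a hypothetical helper, and it deliberately omits `latest` and rejects pre-release suffixes.

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def image_tags(repository: str, version: str) -> list[str]:
    """Expand MAJOR.MINOR.PATCH into the full, major.minor, and major tags."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = match.groups()
    return [
        f"{repository}:{major}.{minor}.{patch}",
        f"{repository}:{major}.{minor}",
        f"{repository}:{major}",
    ]
```

For example, `image_tags("myorg/device-visits", "1.0.0")` yields the three tags shown above, each of which can be passed to `docker tag` as the target.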

Image Digests

For maximum reproducibility, use image digests:

# Get image digest
docker images --digests myorg/device-visits

# Output:
# REPOSITORY              TAG    DIGEST
# myorg/device-visits     1.0.0  sha256:abc123...

# Reference by digest in manifest
"image": "myorg/device-visits@sha256:abc123..."

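A pre-publish check can confirm that a manifest's image reference is pinned by digest rather than by a mutable tag. The sketch below assumes sha256 digests written as 64 lowercase hex characters; `is_pinned_by_digest` is a hypothetical helper and does not validate the repository name itself.

```python
import re

# name@sha256:<64 lowercase hex characters>
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image: str) -> bool:
    """Return True if the image reference ends in a full sha256 digest."""
    return DIGEST_REF.match(image) is not None
```

A tag such as `myorg/device-visits:1.0.0` fails this check, as does an abbreviated digest.
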
Pushing to Registry

Docker Hub

# Log in
docker login

# Push image
docker push myorg/device-visits:1.0.0

# Push all tags
docker push --all-tags myorg/device-visits

Private Registry

# Log in to private registry
docker login registry.example.com

# Tag for private registry
docker tag myorg/device-visits:1.0.0 registry.example.com/myorg/device-visits:1.0.0

# Push
docker push registry.example.com/myorg/device-visits:1.0.0

Registry Requirements

Your container registry must be:

Publicly accessible - Platform needs to pull without authentication
Reliable - High availability for production algorithms
Versioned - Support multiple tags per image

Testing Locally

Test with Sample Input

Create a test input file:

# Create test directories
mkdir -p /tmp/test/input /tmp/test/output

# Create sample algo_input.json
cat > /tmp/test/algo_input.json << 'EOF'
{
  "version": "0.1.0",
  "input_data_path": "/work/input",
  "output_path": "/work/output",
  "config": {
    "parameters": {
      "look_back_time": 3600
    }
  },
  "input_data": []
}
EOF

# Create sample input data
# (Create appropriate parquet/csv files in /tmp/test/input)

Run Container

# Run with mounted volumes
docker run \
  --rm \
  -v /tmp/test:/work \
  -e ALGORITHM_INPUT_PATH=/work/algo_input.json \
  myorg/device-visits:1.0.0 \
  python /app/device_visits.py

# Check output
cat /tmp/test/output/algo_output.json
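
For repeatable local tests, the manual `cat` can be replaced by a small check that the output file exists and parses as JSON. A minimal sketch; `check_output` is a hypothetical helper, and it makes no assumption about the output schema beyond it being valid JSON.

```python
import json
from pathlib import Path


def check_output(output_dir: str, filename: str = "algo_output.json"):
    """Return the parsed output, raising if the file is missing or not valid JSON."""
    output_file = Path(output_dir) / filename
    if not output_file.is_file():
        raise FileNotFoundError(f"algorithm produced no {filename} in {output_dir}")
    # json.JSONDecodeError (a ValueError) propagates if the file is malformed.
    return json.loads(output_file.read_text(encoding="utf-8"))
```

After the `docker run` above, `check_output("/tmp/test/output")` returns the parsed result or raises with a message that points at the missing file.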

Test with Resource Limits

Test that your algorithm works within specified resources:

# Run with memory and CPU limits
docker run \
  --rm \
  --memory=5g \
  --cpus=0.2 \
  -v /tmp/test:/work \
  -e ALGORITHM_INPUT_PATH=/work/algo_input.json \
  myorg/device-visits:1.0.0 \
  python /app/device_visits.py

# Monitor resource usage
docker stats

Optimizing Images

Reduce Image Size

# Use slim/alpine base images (python:3.9-slim instead of python:3.9)
FROM python:3.9-slim

# Install and clean up in the same RUN so the package lists never land in a layer
RUN apt-get update && apt-get install -y \
    package1 package2 \
    && rm -rf /var/lib/apt/lists/*

# Use .dockerignore
# Contents of the .dockerignore file:
__pycache__
*.pyc
.git
.gitignore
tests/
docs/

Optimize Layers

# Bad: Creates many layers
COPY file1.py .
COPY file2.py .
COPY file3.py .

# Good: Single layer
COPY file1.py file2.py file3.py ./

# Better: Copy directory
COPY src/ ./src/

Cache Dependencies

# Good: Dependencies change less often than code
COPY requirements.txt .
RUN pip install -r requirements.txt
# Code changes don't rebuild dependencies
COPY algorithm.py .

# Bad: Code changes rebuild dependencies
COPY . .
RUN pip install -r requirements.txt

Security Best Practices

Run as Non-Root User

# Create non-root user
RUN useradd -m -u 1000 algorithm

# Set ownership
RUN chown -R algorithm:algorithm /app

# Switch to non-root user
USER algorithm

# Subsequent commands run as 'algorithm' user

Minimize Installed Packages

# Only install what you need
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgeos-dev \
    && rm -rf /var/lib/apt/lists/*

# Avoid installing:
# - Shells (if not needed)
# - Network tools (curl, wget)
# - Text editors
# - Compilers (if not needed for runtime)

Pin Dependency Versions

# requirements.txt - Pin exact versions
pandas==1.5.3
numpy==1.24.2
pyarrow==11.0.0

# Avoid:
# pandas        (no version)
# pandas>=1.5   (range allows changes)
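
Pinning can be enforced before each build with a short check over requirements.txt. The sketch below is a simplification under stated assumptions: it treats only `name==version` (optionally with extras) as pinned, skips pip option lines such as `-r other.txt`, and does not handle environment markers or URL requirements. `unpinned_requirements` is a hypothetical helper.

```python
import re

# name, optional [extras], then an exact == pin
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned with ==."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

An empty result means every requirement is pinned; a build script could exit non-zero otherwise.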

Troubleshooting

Container Won't Start

# Check image exists
docker images | grep device-visits

# Test interactively
docker run -it --rm myorg/device-visits:1.0.0 /bin/bash

# Check entrypoint/cmd
docker inspect myorg/device-visits:1.0.0 | grep -A 5 '"Cmd"'

Algorithm Fails

# View logs
docker logs <container-id>

# Run with debugging
docker run \
  -e DEBUG=1 \
  -v /tmp/test:/work \
  -e ALGORITHM_INPUT_PATH=/work/algo_input.json \
  myorg/device-visits:1.0.0 \
  python -u /app/device_visits.py  # -u for unbuffered output

Out of Memory

# Monitor memory usage
docker stats

# Increase memory limit
docker run --memory=10g ...

# Profile memory usage in code
import tracemalloc
tracemalloc.start()
# ... your code ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
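
tracemalloc only tracks memory allocated through Python's own allocator, so memory used by native libraries can go unreported. As a complement, the process-wide peak resident set size is available from the standard library on Unix-like systems. A minimal sketch; `peak_rss_mb` is a hypothetical helper name.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

Logging this value at the end of a run gives a figure that is closer to what a container memory limit is measured against than the tracemalloc peak.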

Image Too Large

# Check layer sizes
docker history myorg/device-visits:1.0.0

# Use dive to analyze image
dive myorg/device-visits:1.0.0

# Common culprits:
# - Large model files (consider model compression)
# - Unnecessary build tools (use multi-stage builds)
# - Large datasets (should be external, not in image)

Advanced Patterns

GPU Algorithms

FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# Install Python (Ubuntu 22.04 provides Python 3.10 as python3)
RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install GPU-enabled frameworks
RUN pip3 install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Copy algorithm
COPY detect_objects.py /app/

WORKDIR /app

Manifest specifies GPU requirement:

{
  "container_parameters": {
    "image": "myorg/gpu-detector:1.0.0",
    "resource_request": {
      "gpu": 1,
      "memory_gb": 16,
      "cpu_millicore": 2000
    }
  }
}
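
To test locally with the same limits, the `resource_request` block can be translated into `docker run` flags. This sketch rests on assumptions the page does not confirm: that 1000 millicores equal one CPU (the common convention), that `memory_gb` is a whole number, and that the host has GPU support configured for `--gpus`. `docker_resource_flags` is a hypothetical helper.

```python
def docker_resource_flags(resource_request: dict) -> list[str]:
    """Map a manifest resource_request to docker run resource flags."""
    flags = []
    if "memory_gb" in resource_request:
        flags.append(f"--memory={resource_request['memory_gb']}g")
    if "cpu_millicore" in resource_request:
        # Assumption: 1000 millicores correspond to one full CPU.
        flags.append(f"--cpus={resource_request['cpu_millicore'] / 1000}")
    if resource_request.get("gpu"):
        flags.append(f"--gpus={resource_request['gpu']}")
    return flags
```

The resulting list can be spliced into the `docker run` invocation used for local testing.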

Multiple Binaries

If your algorithm uses multiple languages:

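FROM python:3.9-slim

# Keep the Python wrapper and the JAR in the same directory
WORKDIR /app

# Install additional runtime
RUN apt-get update && apt-get install -y openjdk-11-jre-headless \
    && rm -rf /var/lib/apt/lists/*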

# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy Python wrapper
COPY algorithm.py .

# Copy Java application
COPY target/processor.jar /app/

# Python script will invoke Java:
# subprocess.run(['java', '-jar', '/app/processor.jar'])
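
The wrapper should pass the child process's exit status on, so that a Java failure still produces a non-zero container exit code. A minimal sketch of such a wrapper; `run_tool` is a hypothetical helper, and the optional timeout is an illustration rather than a platform requirement.

```python
import subprocess
import sys
from typing import List, Optional


def run_tool(command: List[str], timeout_s: Optional[float] = None) -> int:
    """Run an external program, relay its output, and return its exit code."""
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s
    )
    if completed.stdout:
        print(completed.stdout, end="")
    if completed.returncode != 0:
        # Surface the child's error output so it appears in the container logs.
        print(completed.stderr, file=sys.stderr, end="")
    return completed.returncode
```

The Python entry point could then end with `sys.exit(run_tool(["java", "-jar", "/app/processor.jar"]))`.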

Next Steps

Learn how to Register Algorithms and publish your container to the platform
Review Creating Algorithms for the complete workflow
See Algorithm Input/Output for runtime behavior