Algorithm Manifests

Complete guide to defining algorithm manifests

An algorithm manifest is a JSON document that describes your algorithm's requirements, capabilities, and behavior. The platform validates this manifest when you register an algorithm.

Complete Example

Here's a complete manifest for a device visits algorithm:

{
  "manifest_version": "0.1.0",
  "metadata": {
    "description": "Produce a list of AOI visits per device, based on geolocation device pings.",
    "tags": ["device_visits", "foot_traffic"],
    "version": "0.0.1"
  },
  "inputs": [
    {
      "data_type_name": "pings",
      "min_count": 1,
      "max_count": 1
    }
  ],
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count"]
  },
  "container_parameters": {
    "image": "orbitalinsight/device_visits:13131a4e",
    "resource_request": {
      "gpu": 0,
      "memory_gb": 5,
      "cpu_millicore": 200
    },
    "command": [
      "python",
      "/app/device_visits.py"
    ]
  },
  "interface": {
    "interface_type": "FILESYSTEM_TASK_WORKER"
  },
  "parameters": [
    {
      "name": "look_back_time",
      "type": "integer",
      "units": "seconds",
      "description": "Number of seconds to look back before the first recorded ping for each device.",
      "min": 0,
      "max": 2592000,
      "default": 3600
    },
    {
      "name": "look_forward_time",
      "type": "integer",
      "units": "seconds",
      "description": "Number of seconds to look forward after the last recorded ping for each device.",
      "min": 0,
      "max": 2592000,
      "default": 3600
    },
    {
      "name": "override_visit_time",
      "description": "If enabled, set the look back and look forward timestamp to be equal to the start and end of the given observation.",
      "type": "boolean",
      "default": false
    }
  ]
}

Manifest Structure

manifest_version (required)

Specifies which version of the manifest schema to use.

{
  "manifest_version": "0.1.0"
}

Current version: "0.1.0"

The platform validates your manifest against the schema for this version.

metadata (required)

Provides high-level information about your algorithm.

{
  "metadata": {
    "description": "A brief description of what the algorithm does",
    "tags": ["category1", "category2"],
    "version": "1.0.0"
  }
}

Fields:

  • description (string, required) - Clear description of the algorithm's purpose
  • tags (array of strings, optional) - Keywords for categorization and search
  • version (string, required) - Semantic version of this algorithm release (e.g., "1.0.0")

Best practices:

  • Keep description concise but informative
  • Use tags that match platform conventions
  • Follow semantic versioning (major.minor.patch)
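The field rules above can be checked locally before registration. The following is a minimal sketch, not the platform's validator: check_metadata and is_semantic_version are hypothetical helper names, and the pattern accepts only plain major.minor.patch strings (pre-release and build suffixes are deliberately left out).

```python
import re

# Plain major.minor.patch only; pre-release/build suffixes are not handled here.
SEMVER_PATTERN = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_semantic_version(version):
    """Return True if version looks like major.minor.patch."""
    return SEMVER_PATTERN.fullmatch(version) is not None


def check_metadata(metadata):
    """Collect problems with a metadata section (hypothetical local helper)."""
    problems = []
    # description and version are the two required string fields.
    for field in ("description", "version"):
        value = metadata.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"metadata.{field} must be a non-empty string")
    version = metadata.get("version")
    if isinstance(version, str) and version.strip() and not is_semantic_version(version):
        problems.append("metadata.version should follow major.minor.patch")
    # tags is optional, but when present it must be a list of strings.
    tags = metadata.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
        problems.append("metadata.tags must be an array of strings")
    return problems
```

Calling check_metadata on the metadata block shown above returns an empty list; a version such as "v1" or "1.0" would be reported.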

inputs (required)

Declares the Data Types your algorithm consumes.

{
  "inputs": [
    {
      "data_type_name": "pings",
      "min_count": 1,
      "max_count": 1
    },
    {
      "data_type_name": "aoi_metadata",
      "min_count": 0,
      "max_count": 1
    }
  ]
}

Fields per input:

  • data_type_name (string, required) - Name of the Data Type (must exist in platform)
  • min_count (integer, optional) - Minimum number of data sources required (default: 1)
  • max_count (integer, optional) - Maximum number of data sources allowed (default: 1)

Notes:

  • Most algorithms have a single input Data Type
  • Use multiple inputs if your algorithm combines different data types
  • Set min_count: 0 for optional inputs
  • The platform will validate that specified Data Types exist
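To illustrate how min_count and max_count constrain the data sources attached to each input, here is a rough sketch. It assumes a hypothetical sources_by_type mapping from Data Type name to a list of data source identifiers; the platform's actual matching logic is not described in this guide and may differ.

```python
def check_input_counts(inputs, sources_by_type):
    """Compare attached data sources against each input's min/max counts.

    sources_by_type is a hypothetical mapping: Data Type name -> list of
    data source identifiers. Both counts default to 1 when omitted.
    """
    problems = []
    for spec in inputs:
        name = spec["data_type_name"]
        min_count = spec.get("min_count", 1)
        max_count = spec.get("max_count", 1)
        count = len(sources_by_type.get(name, []))
        if count < min_count:
            problems.append(f"{name}: need at least {min_count} data source(s), got {count}")
        elif count > max_count:
            problems.append(f"{name}: allows at most {max_count} data source(s), got {count}")
    return problems


inputs = [
    {"data_type_name": "pings", "min_count": 1, "max_count": 1},
    {"data_type_name": "aoi_metadata", "min_count": 0, "max_count": 1},
]
```

With these inputs, supplying one pings source and no aoi_metadata source passes, because aoi_metadata is optional (min_count: 0).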

outputs (required)

Declares the Data Types your algorithm produces.

{
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count", "total_dwell_time"]
  }
}

Fields:

  • output_data_types (array of strings, required) - Data Types your algorithm outputs
  • observation_value_columns (array of strings, required) - Summary metrics your algorithm calculates

Understanding observation_value_columns:

These are aggregated values included in each observation, separate from the detailed measurement data. For example:

{
  "observation_start_ts": 1614607200,
  "measurement_path": "visits_abc123.parquet",
  "observation_values": [{
    "visit_count": 42,
    "total_dwell_time": 3600
  }]
}

Every key that appears in observation_values (here, visit_count and total_dwell_time) must be listed in observation_value_columns.

Best practices:

  • Include counts, sums, averages, or other aggregate metrics
  • Keep observation values simple (numbers, strings, booleans)
  • Use consistent naming across algorithms
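The relationship between observation_values and observation_value_columns can be sketched as a simple key check. This is an illustrative helper (check_observation is a hypothetical name), assuming observations shaped like the example above.

```python
def check_observation(observation, declared_columns):
    """Report observation value keys that are undeclared or not simple values."""
    problems = []
    declared = set(declared_columns)
    for row in observation.get("observation_values", []):
        for key, value in row.items():
            if key not in declared:
                problems.append(f"'{key}' is not listed in observation_value_columns")
            elif not isinstance(value, (int, float, str, bool)):
                # Observation values should stay simple: numbers, strings, booleans.
                problems.append(f"'{key}' should be a number, string, or boolean")
    return problems


observation = {
    "observation_start_ts": 1614607200,
    "measurement_path": "visits_abc123.parquet",
    "observation_values": [{"visit_count": 42, "total_dwell_time": 3600}],
}
```

Checking this observation against ["visit_count", "total_dwell_time"] yields no problems; checking it against ["visit_count"] alone flags total_dwell_time.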

parameters (required)

Defines configurable parameters users can set when creating an Algorithm Config.

{
  "parameters": [
    {
      "name": "confidence_threshold",
      "type": "float",
      "description": "Minimum confidence score for detections",
      "min": 0.0,
      "max": 1.0,
      "default": 0.7,
      "units": "probability"
    },
    {
      "name": "enable_filtering",
      "type": "boolean",
      "description": "Apply post-processing filters",
      "default": true
    },
    {
      "name": "model_variant",
      "type": "string",
      "description": "Which model variant to use",
      "default": "standard",
      "allowed_values": ["fast", "standard", "accurate"]
    }
  ]
}

Fields per parameter:

  • name (string, required) - Parameter identifier
  • type (string, required) - Data type: integer, float, string, boolean
  • description (string, required) - Clear explanation of what the parameter does
  • default (any, required) - Value used when the user doesn't override the parameter
  • min (number, optional) - Minimum allowed value (for numeric types)
  • max (number, optional) - Maximum allowed value (for numeric types)
  • units (string, optional) - Units for the parameter (e.g., "seconds", "meters")
  • allowed_values (array, optional) - Restricted set of allowed values

Parameter types:

  • integer - Whole numbers (e.g., counts, thresholds)
  • float - Decimal numbers (e.g., confidence scores, distances)
  • string - Text values (e.g., model names, modes)
  • boolean - True/false flags (e.g., enable/disable features)

Best practices:

  • Provide sensible defaults
  • Use min/max to prevent invalid values
  • Include units for clarity
  • Write clear descriptions explaining the impact of each parameter
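As a sketch of how a parameter definition could be applied to user overrides, the helper below falls back to default and then enforces type, min, max, and allowed_values. The function name resolve_parameter is hypothetical, and treating whole numbers as acceptable float values is an assumption rather than documented platform behavior.

```python
def resolve_parameter(definition, overrides):
    """Return the effective value for one parameter, or raise if it is invalid."""
    name = definition["name"]
    value = overrides.get(name, definition["default"])
    expected = definition["type"]

    # bool is a subclass of int in Python, so exclude it from numeric types.
    if expected == "boolean":
        ok = isinstance(value, bool)
    elif expected == "integer":
        ok = isinstance(value, int) and not isinstance(value, bool)
    elif expected == "float":
        # ASSUMPTION: whole numbers are accepted where a float is expected.
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif expected == "string":
        ok = isinstance(value, str)
    else:
        raise ValueError(f"unknown parameter type {expected!r}")
    if not ok:
        raise TypeError(f"{name}: expected {expected}, got {type(value).__name__}")

    if "min" in definition and value < definition["min"]:
        raise ValueError(f"{name}: {value} is below min {definition['min']}")
    if "max" in definition and value > definition["max"]:
        raise ValueError(f"{name}: {value} is above max {definition['max']}")
    if "allowed_values" in definition and value not in definition["allowed_values"]:
        raise ValueError(f"{name}: {value!r} is not an allowed value")
    return value


threshold = {"name": "confidence_threshold", "type": "float",
             "min": 0.0, "max": 1.0, "default": 0.7}
variant = {"name": "model_variant", "type": "string", "default": "standard",
           "allowed_values": ["fast", "standard", "accurate"]}
```

With no overrides, resolve_parameter(threshold, {}) returns the default 0.7, while an override of 1.5 raises ValueError because it exceeds max.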

container_parameters (required)

Specifies the Docker container that runs your algorithm.

{
  "container_parameters": {
    "image": "myorg/my-algorithm:v1.0.0",
    "resource_request": {
      "gpu": 0,
      "memory_gb": 8,
      "cpu_millicore": 1000
    },
    "command": [
      "python",
      "-u",
      "/app/main.py"
    ]
  }
}

Fields:

  • image (string, required) - Full container image reference (registry/name:tag)
  • resource_request (object, required) - Resource requirements
  • command (array of strings, required) - Container command in exec form

Resource request fields:

  • gpu (integer, required) - Number of GPUs (0 for CPU-only)
  • memory_gb (number, required) - Memory in gigabytes
  • cpu_millicore (integer, required) - CPU in millicores (1000 = 1 core)

Command format:

  • Must use exec form: ["python", "script.py"]
  • NOT shell form: "python script.py"
  • Can include flags: ["python", "-u", "script.py", "--verbose"]
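If you already have a shell-style command string, Python's standard shlex module can split it into exec form. The helper names below (to_exec_form, is_exec_form) are illustrative only and are not part of any platform tooling.

```python
import shlex


def is_exec_form(command):
    """True when command is a non-empty list of strings, e.g. ["python", "script.py"]."""
    return (
        isinstance(command, list)
        and len(command) > 0
        and all(isinstance(part, str) for part in command)
    )


def to_exec_form(command):
    """Convert a shell-form string into an exec-form list; pass valid lists through."""
    if isinstance(command, str):
        # shlex.split honors quoting, so "a 'b c'" becomes ["a", "b c"].
        return shlex.split(command)
    if is_exec_form(command):
        return list(command)
    raise TypeError("command must be a string or a non-empty list of strings")
```

For example, to_exec_form("python -u /app/main.py") returns ["python", "-u", "/app/main.py"], which can be placed directly in the command field.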

Best practices:

  • Use specific image tags, not latest
  • Include image digest for reproducibility
  • Request resources conservatively but adequately
  • Use -u flag with Python for unbuffered output

Resource sizing guidance:

  • Start conservative, increase if needed
  • Monitor actual usage during test runs
  • Consider parallelization (more instances with fewer resources each)
  • GPU algorithms require careful memory sizing

interface (required)

Specifies how the algorithm communicates with the platform.

{
  "interface": {
    "interface_type": "FILESYSTEM_TASK_WORKER"
  }
}

Current interface type:

  • FILESYSTEM_TASK_WORKER - Algorithm reads from and writes to filesystem

This is the standard interface for algorithms. Under this interface:

  • The platform supplies input in an algo_input.json file
  • The algorithm writes its output to an algo_output.json file
  • The environment variable ALGORITHM_INPUT_PATH points to the input file
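A worker entry point for this interface might look roughly like the sketch below. Only ALGORITHM_INPUT_PATH and the two file names come from this guide; the location of algo_output.json (assumed here to sit next to the input file) and the contents of both files are assumptions, so confirm them against the interface reference before relying on this.

```python
import json
import os
from pathlib import Path


def run_worker(process):
    """Read the input file, run process(input) and write the result as JSON.

    process is your own function: it takes the parsed input and returns a
    JSON-serializable result. The file contents are treated as opaque here.
    """
    input_path = Path(os.environ["ALGORITHM_INPUT_PATH"])
    with input_path.open() as handle:
        algo_input = json.load(handle)

    algo_output = process(algo_input)

    # ASSUMPTION: the output file belongs in the same directory as the input.
    output_path = input_path.with_name("algo_output.json")
    with output_path.open("w") as handle:
        json.dump(algo_output, handle)
    return output_path
```

In a container started with a command such as ["python", "-u", "/app/main.py"], main.py would call run_worker with the function that implements the algorithm.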

Validation

When you register an algorithm, the platform validates:

  1. Schema compliance - Manifest matches the schema for manifest_version
  2. Data Type existence - All referenced Data Types exist
  3. Data Source compatibility - Data sources support the specified Data Types
  4. Parameter validity - Parameters have valid types and constraints
  5. Resource constraints - Resources are within platform limits

Common validation errors:

Invalid manifest: 'gpu' must be a non-negative integer
  → Fix: Set gpu to 0 or a positive integer

Data Type 'custom_type' not found
  → Fix: Create the Data Type first or use an existing one

Parameter 'threshold' missing required field 'default'
  → Fix: Add a default value to the parameter definition
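Some of these errors can be caught before you register. The following pre-flight check is a local sketch only: preflight is a hypothetical helper, its messages merely imitate the examples above, and it cannot verify that Data Types exist or that resources fit platform limits.

```python
REQUIRED_SECTIONS = (
    "manifest_version", "metadata", "inputs", "outputs",
    "parameters", "container_parameters", "interface",
)


def preflight(manifest):
    """Return a list of problems that can be detected without the platform."""
    problems = [
        f"missing required section '{section}'"
        for section in REQUIRED_SECTIONS
        if section not in manifest
    ]

    resources = manifest.get("container_parameters", {}).get("resource_request")
    if resources is not None:
        gpu = resources.get("gpu")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(gpu, bool) or not isinstance(gpu, int) or gpu < 0:
            problems.append("'gpu' must be a non-negative integer")

    for parameter in manifest.get("parameters", []):
        if "default" not in parameter:
            problems.append(
                f"Parameter '{parameter.get('name')}' missing required field 'default'"
            )
    return problems
```

An empty list means none of these local checks failed; the platform's own validation remains the final authority.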

Advanced Manifest Features

Multiple Input Data Types

If your algorithm needs multiple types of input:

{
  "inputs": [
    {
      "data_type_name": "satellite_imagery",
      "min_count": 1,
      "max_count": 1
    },
    {
      "data_type_name": "ground_truth_labels",
      "min_count": 0,
      "max_count": 1
    }
  ]
}

Multiple Output Data Types

If your algorithm produces multiple types of output:

{
  "outputs": {
    "output_data_types": ["object_detections", "classification_results"],
    "observation_value_columns": ["detection_count", "classification_accuracy"]
  }
}

Complex Parameters

A string parameter can restrict its values with allowed_values. In this example, every permitted comma-separated combination is listed explicitly:

{
  "parameters": [
    {
      "name": "detection_classes",
      "type": "string",
      "description": "Comma-separated list of object classes to detect",
      "default": "car,truck,bus",
      "allowed_values": ["car", "truck", "bus", "car,truck", "car,bus", "truck,bus", "car,truck,bus"]
    }
  ]
}
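Writing out every combination by hand becomes error-prone as the number of classes grows. As a sketch, the list can be generated with itertools.combinations; comma_separated_subsets is a hypothetical helper you would run when authoring the manifest, not something the platform provides.

```python
from itertools import combinations


def comma_separated_subsets(classes):
    """List every non-empty subset of classes as a comma-separated string.

    Subsets keep the order of the input list and are grouped by size,
    smallest first.
    """
    subsets = []
    for size in range(1, len(classes) + 1):
        for combo in combinations(classes, size):
            subsets.append(",".join(combo))
    return subsets


allowed_values = comma_separated_subsets(["car", "truck", "bus"])
# allowed_values == ["car", "truck", "bus", "car,truck", "car,bus",
#                    "truck,bus", "car,truck,bus"]
```

The number of entries is 2^n - 1 for n classes, so this pattern only stays manageable for a small set of classes.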

Manifest Versioning

When you update your algorithm:

  1. Minor changes (bug fixes, performance improvements):

    • Increment metadata.version: "1.0.0" → "1.0.1"
    • Keep same manifest_version
    • Keep same inputs/outputs
    • Register as new Algorithm Version
  2. Major changes (new features, changed behavior):

    • Increment metadata.version: "1.0.0" → "2.0.0"
    • May change parameters
    • May change resource requirements
    • Register as new Algorithm Version
  3. Breaking changes (changed inputs/outputs):

    • Increment metadata.version: "1.0.0" → "2.0.0"
    • Update inputs/outputs
    • Consider creating new Algorithm instead
    • Register as new Algorithm Version

Remember: Algorithm Versions are immutable once registered!
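The version increments listed above can be sketched as a small helper. bump_version is a hypothetical name, and the change labels use the standard semantic versioning terms: "patch" corresponds to the bug-fix case (1.0.0 to 1.0.1) and "major" to the major and breaking-change cases (1.0.0 to 2.0.0).

```python
def bump_version(version, change):
    """Return the next major.minor.patch string for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # New features with changed behavior, or changed inputs/outputs.
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Bug fixes and performance improvements.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type {change!r}")
```

Because registered Algorithm Versions are immutable, each bump means registering a new Algorithm Version with the updated metadata.version rather than editing an existing one.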
