Algorithm Manifests
Complete guide to defining algorithm manifests
An algorithm manifest is a JSON document that describes your algorithm's requirements, capabilities, and behavior. The platform validates this manifest when you register an algorithm.
Complete Example
Here's a complete manifest for a device visits algorithm:
{
"manifest_version": "0.1.0",
"metadata": {
"description": "Produce a list of AOI visits per device, based on geolocation device pings.",
"tags": ["device_visits", "foot_traffic"],
"version": "0.0.1"
},
"inputs": [
{
"data_type_name": "pings",
"min_count": 1,
"max_count": 1
}
],
"outputs": {
"output_data_types": ["device_visits"],
"observation_value_columns": ["visit_count"]
},
"container_parameters": {
"image": "orbitalinsight/device_visits:13131a4e",
"resource_request": {
"gpu": 0,
"memory_gb": 5,
"cpu_millicore": 200
},
"command": [
"python",
"/app/device_visits.py"
]
},
"interface": {
"interface_type": "FILESYSTEM_TASK_WORKER"
},
"parameters": [
{
"name": "look_back_time",
"type": "integer",
"units": "seconds",
"description": "Number of seconds to look back before the first recorded ping for each device.",
"min": 0,
"max": 2592000,
"default": 3600
},
{
"name": "look_forward_time",
"type": "integer",
"units": "seconds",
"description": "Number of seconds to look forward after the last recorded ping for each device.",
"min": 0,
"max": 2592000,
"default": 3600
},
{
"name": "override_visit_time",
"description": "If enabled, set the look back and look forward timestamp to be equal to the start and end of the given observation.",
"type": "boolean",
"default": false
}
]
}
Manifest Structure
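Every top-level section described below is marked (required). Before registering, you can catch a missing section locally. The sketch below is a hypothetical pre-check, not the platform's validator; `REQUIRED_SECTIONS` simply mirrors the sections this page labels as required, and `missing_sections` is an illustrative helper name.

```python
import json

# The top-level sections this page marks as "(required)".
# Hypothetical local pre-check; the platform's own validator may check more.
REQUIRED_SECTIONS = (
    "manifest_version",
    "metadata",
    "inputs",
    "outputs",
    "parameters",
    "container_parameters",
    "interface",
)


def missing_sections(manifest: dict) -> list[str]:
    """Return the required top-level sections the manifest lacks, in order."""
    return [key for key in REQUIRED_SECTIONS if key not in manifest]


# Usage: a deliberately incomplete manifest parsed from a JSON string.
partial = json.loads('{"manifest_version": "0.1.0", "metadata": {}}')
print(missing_sections(partial))
# ['inputs', 'outputs', 'parameters', 'container_parameters', 'interface']
```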
manifest_version (required)
Specifies which version of the manifest schema to use.
{
"manifest_version": "0.1.0"
}
Current version: "0.1.0"
The platform validates your manifest against the schema for this version.
metadata (required)
Provides high-level information about your algorithm.
{
"metadata": {
"description": "A brief description of what the algorithm does",
"tags": ["category1", "category2"],
"version": "1.0.0"
}
}
Fields:
- description (string, required) - Clear description of the algorithm's purpose
- tags (array of strings, optional) - Keywords for categorization and search
- version (string, required) - Semantic version of this Algorithm Version (e.g., "1.0.0")
Best practices:
- Keep description concise but informative
- Use tags that match platform conventions
- Follow semantic versioning (major.minor.patch)
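A quick local check of the metadata block can catch a malformed version string before registration. This is a sketch under the assumption that metadata.version should be a plain major.minor.patch string (no pre-release or build suffixes); the platform's actual rules are not stated here, and `check_metadata` is a hypothetical helper.

```python
import re

# Assumption: strict major.minor.patch with no leading zeros and no suffixes.
SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def check_metadata(metadata: dict) -> list[str]:
    """Return human-readable problems found in a manifest's metadata block."""
    problems = []
    if not metadata.get("description"):
        problems.append("metadata.description is required")
    version = metadata.get("version")
    # fullmatch so that strings like "1.0.0-beta" or "1.0" are rejected
    if not isinstance(version, str) or not SEMVER.fullmatch(version):
        problems.append(f"metadata.version {version!r} is not major.minor.patch")
    tags = metadata.get("tags", [])  # tags are optional
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("metadata.tags must be a list of strings")
    return problems


print(check_metadata({"description": "Device visits", "tags": ["foot_traffic"], "version": "0.0.1"}))
# []
```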
inputs (required)
Declares the Data Types your algorithm consumes.
{
"inputs": [
{
"data_type_name": "pings",
"min_count": 1,
"max_count": 1
},
{
"data_type_name": "aoi_metadata",
"min_count": 0,
"max_count": 1
}
]
}
Fields per input:
- data_type_name (string, required) - Name of the Data Type (must exist in the platform)
- min_count (integer, optional) - Minimum number of data sources required (default: 1)
- max_count (integer, optional) - Maximum number of data sources allowed (default: 1)
Notes:
- Most algorithms have a single input Data Type
- Use multiple inputs if your algorithm combines different data types
- Set min_count: 0 for optional inputs
- The platform will validate that specified Data Types exist
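The min_count/max_count rules, including their documented default of 1, can be expressed as a range check per input. This is a sketch only: how the platform actually supplies data sources is not described on this page, so the `provided` mapping (Data Type name to number of data sources) and both helper names are hypothetical.

```python
def count_in_range(input_spec: dict, n_sources: int) -> bool:
    """True if n_sources data sources satisfy one entry of "inputs"."""
    # Documented defaults: min_count and max_count both default to 1.
    low = input_spec.get("min_count", 1)
    high = input_spec.get("max_count", 1)
    return low <= n_sources <= high


def check_inputs(inputs: list[dict], provided: dict[str, int]) -> list[str]:
    """provided maps data_type_name -> number of data sources (hypothetical shape)."""
    errors = []
    for spec in inputs:
        name = spec["data_type_name"]
        n = provided.get(name, 0)  # a Data Type not supplied at all counts as 0
        if not count_in_range(spec, n):
            errors.append(
                f"{name}: got {n} data source(s), "
                f"expected {spec.get('min_count', 1)}..{spec.get('max_count', 1)}"
            )
    return errors
```

With the inputs example above, supplying one pings source and no aoi_metadata source passes, because aoi_metadata has min_count: 0.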
outputs (required)
Declares the Data Types your algorithm produces.
{
"outputs": {
"output_data_types": ["device_visits"],
"observation_value_columns": ["visit_count", "total_dwell_time"]
}
}
Fields:
- output_data_types (array of strings, required) - Data Types your algorithm outputs
- observation_value_columns (array of strings, required) - Summary metrics your algorithm calculates
Understanding observation_value_columns:
These are aggregated values included in each observation, separate from the detailed measurement data. For example:
{
"observation_start_ts": 1614607200,
"measurement_path": "visits_abc123.parquet",
"observation_values": [{
"visit_count": 42,
"total_dwell_time": 3600
}]
}
The visit_count and total_dwell_time keys must be listed in observation_value_columns.
Best practices:
- Include counts, sums, averages, or other aggregate metrics
- Keep observation values simple (numbers, strings, booleans)
- Use consistent naming across algorithms
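To confirm that every key your algorithm writes into observation_values is declared, you can diff the two sets. A minimal sketch, using the observation example from this section; `undeclared_values` is a hypothetical helper, not a platform API.

```python
def undeclared_values(observation: dict, observation_value_columns: list[str]) -> set[str]:
    """Keys used in observation["observation_values"] but not declared in the manifest."""
    declared = set(observation_value_columns)
    used = set()
    for row in observation.get("observation_values", []):
        used.update(row.keys())
    return used - declared


observation = {
    "observation_start_ts": 1614607200,
    "measurement_path": "visits_abc123.parquet",
    "observation_values": [{"visit_count": 42, "total_dwell_time": 3600}],
}
print(undeclared_values(observation, ["visit_count"]))  # {'total_dwell_time'}
print(undeclared_values(observation, ["visit_count", "total_dwell_time"]))  # set()
```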
parameters (required)
Defines configurable parameters users can set when creating an Algorithm Config.
{
"parameters": [
{
"name": "confidence_threshold",
"type": "float",
"description": "Minimum confidence score for detections",
"min": 0.0,
"max": 1.0,
"default": 0.7,
"units": "probability"
},
{
"name": "enable_filtering",
"type": "boolean",
"description": "Apply post-processing filters",
"default": true
},
{
"name": "model_variant",
"type": "string",
"description": "Which model variant to use",
"default": "standard",
"allowed_values": ["fast", "standard", "accurate"]
}
]
}
Fields per parameter:
- name (string, required) - Parameter identifier
- type (string, required) - Data type: integer, float, string, boolean
- description (string, required) - Clear explanation of what the parameter does
- default (any, required) - Default value used if the user doesn't override it
- min (number, optional) - Minimum allowed value (for numeric types)
- max (number, optional) - Maximum allowed value (for numeric types)
- units (string, optional) - Units for the parameter (e.g., "seconds", "meters")
- allowed_values (array, optional) - Restricted set of allowed values
Parameter types:
- integer - Whole numbers (e.g., counts, thresholds)
- float - Decimal numbers (e.g., confidence scores, distances)
- string - Text values (e.g., model names, modes)
- boolean - True/false flags (e.g., enable/disable features)
Best practices:
- Provide sensible defaults
- Use min/max to prevent invalid values
- Include units for clarity
- Write clear descriptions explaining the impact of each parameter
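The fields above are enough to resolve a user's Algorithm Config against the manifest: start from each default, apply overrides, then enforce type, min/max, and allowed_values. This is a sketch of one reasonable interpretation, not the platform's documented behavior; in particular, rejecting unknown parameter names and accepting whole numbers for float parameters are assumptions, and `resolve_parameters` is a hypothetical helper.

```python
# Map manifest parameter types to Python types.
# Assumption: a JSON integer such as 1 is acceptable for a "float" parameter.
PY_TYPES = {
    "integer": int,
    "float": (int, float),
    "string": str,
    "boolean": bool,
}


def resolve_parameters(definitions: list[dict], overrides: dict) -> dict:
    """Merge user overrides onto manifest defaults and enforce the constraints."""
    unknown = set(overrides) - {d["name"] for d in definitions}
    if unknown:  # assumption: unknown names are an error rather than ignored
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    resolved = {}
    for d in definitions:
        name = d["name"]
        value = overrides.get(name, d["default"])
        expected = PY_TYPES[d["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if not isinstance(value, expected) or (d["type"] != "boolean" and isinstance(value, bool)):
            raise TypeError(f"{name}: expected {d['type']}, got {value!r}")
        if "min" in d and value < d["min"]:
            raise ValueError(f"{name}: {value} is below min {d['min']}")
        if "max" in d and value > d["max"]:
            raise ValueError(f"{name}: {value} is above max {d['max']}")
        if "allowed_values" in d and value not in d["allowed_values"]:
            raise ValueError(f"{name}: {value!r} not in {d['allowed_values']}")
        resolved[name] = value
    return resolved


# Usage with the parameters example above (descriptions omitted for brevity).
definitions = [
    {"name": "confidence_threshold", "type": "float", "min": 0.0, "max": 1.0, "default": 0.7},
    {"name": "enable_filtering", "type": "boolean", "default": True},
    {"name": "model_variant", "type": "string", "default": "standard",
     "allowed_values": ["fast", "standard", "accurate"]},
]
print(resolve_parameters(definitions, {"model_variant": "fast"}))
# {'confidence_threshold': 0.7, 'enable_filtering': True, 'model_variant': 'fast'}
```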
container_parameters (required)
Specifies the Docker container that runs your algorithm.
{
"container_parameters": {
"image": "myorg/my-algorithm:v1.0.0",
"resource_request": {
"gpu": 0,
"memory_gb": 8,
"cpu_millicore": 1000
},
"command": [
"python",
"-u",
"/app/main.py"
]
}
}
Fields:
- image (string, required) - Full container image reference (registry/name:tag)
- resource_request (object, required) - Resource requirements
- command (array of strings, required) - Container command in exec form
Resource request fields:
- gpu (integer, required) - Number of GPUs (0 for CPU-only)
- memory_gb (number, required) - Memory in gigabytes
- cpu_millicore (integer, required) - CPU in millicores (1000 = 1 core)
Command format:
- Must use exec form: ["python", "script.py"]
- NOT shell form: "python script.py"
- Can include flags: ["python", "-u", "script.py", "--verbose"]
Best practices:
- Use specific image tags, not latest
- Include the image digest for reproducibility
- Request resources conservatively but adequately
- Use the -u flag with Python for unbuffered output
Resource sizing guidance:
- Start conservative, increase if needed
- Monitor actual usage during test runs
- Consider parallelization (more instances with fewer resources each)
- GPU algorithms require careful memory sizing
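Several of these rules can be checked before you register. The sketch below flags an image without a pinned tag or digest, resource values of the wrong type, and a shell-form command. It is a hypothetical local lint, not the platform's validator: requiring memory_gb and cpu_millicore to be positive is an assumption, and an untagged image is treated as implicitly latest, following Docker's default.

```python
def check_container(container: dict) -> list[str]:
    """Return problems found in a manifest's container_parameters block."""
    problems = []
    image = container.get("image", "")
    # The tag lives after the last "/"; "registry:5000/org/name" has no tag.
    name_part = image.rsplit("/", 1)[-1]
    if "@" not in image and (":" not in name_part or name_part.endswith(":latest")):
        problems.append(f"image {image!r} should use a specific tag or digest, not latest")

    res = container.get("resource_request", {})
    gpu = res.get("gpu")
    if not isinstance(gpu, int) or isinstance(gpu, bool) or gpu < 0:
        problems.append("'gpu' must be a non-negative integer")
    mem = res.get("memory_gb")
    if not isinstance(mem, (int, float)) or isinstance(mem, bool) or mem <= 0:
        problems.append("'memory_gb' must be a positive number")  # assumption: > 0
    cpu = res.get("cpu_millicore")  # e.g. 200 millicores = 0.2 of a core
    if not isinstance(cpu, int) or isinstance(cpu, bool) or cpu <= 0:
        problems.append("'cpu_millicore' must be a positive integer")  # assumption: > 0

    cmd = container.get("command")
    # Exec form: a non-empty JSON array of strings, never a single shell string.
    if not isinstance(cmd, list) or not cmd or not all(isinstance(c, str) for c in cmd):
        problems.append("command must be a non-empty list of strings (exec form)")
    return problems
```

Run against the complete example at the top of this page, the function returns an empty list; a manifest with "image": "myorg/alg:latest", "gpu": -1 and "command": "python main.py" produces three problems.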
interface (required)
Specifies how the algorithm communicates with the platform.
{
"interface": {
"interface_type": "FILESYSTEM_TASK_WORKER"
}
}
Current interface type:
- FILESYSTEM_TASK_WORKER - Algorithm reads from and writes to the filesystem
This is the standard interface for algorithms. The platform provides:
- Input via algo_input.json file
- Output via algo_output.json file
- Environment variable ALGORITHM_INPUT_PATH pointing to the input file
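A FILESYSTEM_TASK_WORKER entrypoint can be sketched as: read the path from ALGORITHM_INPUT_PATH, load the JSON, run your logic, and write algo_output.json. This page does not say where algo_output.json must be written or what either file contains, so this sketch assumes the output goes next to the input file and treats both payloads as opaque dictionaries; see Algorithm Input/Output for the real schema. `run` and `process` are hypothetical names.

```python
import json
import os
from pathlib import Path
from typing import Callable


def run(process: Callable[[dict], dict]) -> Path:
    """Read algo_input.json via ALGORITHM_INPUT_PATH, call process(), write algo_output.json."""
    input_path = Path(os.environ["ALGORITHM_INPUT_PATH"])
    with input_path.open() as f:
        algo_input = json.load(f)

    algo_output = process(algo_input)

    # Assumption: the platform picks up algo_output.json from the input file's directory.
    output_path = input_path.with_name("algo_output.json")
    with output_path.open("w") as f:
        json.dump(algo_output, f)
    return output_path
```

Inside the container, the script named in container_parameters.command (for example /app/main.py) would call run() with a function that implements your algorithm.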
Validation
When you register an algorithm, the platform validates:
- Schema compliance - Manifest matches the schema for manifest_version
- Data Type existence - All referenced Data Types exist
- Data Source compatibility - Data sources support the specified Data Types
- Parameter validity - Parameters have valid types and constraints
- Resource constraints - Resources are within platform limits
Common validation errors:
Invalid manifest: 'gpu' must be a non-negative integer
→ Fix: Set gpu to 0 or a positive integer
Data Type 'custom_type' not found
→ Fix: Create the Data Type first or use an existing one
Parameter 'threshold' missing required field 'default'
→ Fix: Add a default value to the parameter definition
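The last two errors can be reproduced locally to catch them before registration. In this sketch, `known_data_types` stands in for however you look up registered Data Types (a hypothetical input, since this page does not describe that API), and `registration_errors` is an illustrative helper; the platform's exact checks and message wording may differ.

```python
# Parameter fields this page marks as required.
REQUIRED_PARAM_FIELDS = ("name", "type", "description", "default")


def registration_errors(manifest: dict, known_data_types: set[str]) -> list[str]:
    """Approximate two of the registration checks described above."""
    errors = []
    # Data Types are referenced by both inputs and outputs.
    referenced = [i["data_type_name"] for i in manifest.get("inputs", [])]
    referenced += manifest.get("outputs", {}).get("output_data_types", [])
    for name in referenced:
        if name not in known_data_types:
            errors.append(f"Data Type '{name}' not found")
    for p in manifest.get("parameters", []):
        for field in REQUIRED_PARAM_FIELDS:
            if field not in p:
                errors.append(f"Parameter '{p.get('name', '?')}' missing required field '{field}'")
    return errors
```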
Advanced Manifest Features
Multiple Input Data Types
If your algorithm needs multiple types of input:
{
"inputs": [
{
"data_type_name": "satellite_imagery",
"min_count": 1,
"max_count": 1
},
{
"data_type_name": "ground_truth_labels",
"min_count": 0,
"max_count": 1
}
]
}
Multiple Output Data Types
If your algorithm produces multiple types of output:
{
"outputs": {
"output_data_types": ["object_detections", "classification_results"],
"observation_value_columns": ["detection_count", "classification_accuracy"]
}
}
Complex Parameters
Parameters can have constraints and documentation:
{
"parameters": [
{
"name": "detection_classes",
"type": "string",
"description": "Comma-separated list of object classes to detect",
"default": "car,truck,bus",
"allowed_values": ["car", "truck", "bus", "car,truck", "car,bus", "truck,bus", "car,truck,bus"]
}
]
}
Manifest Versioning
When you update your algorithm:
- Minor changes (bug fixes, performance improvements):
  - Increment metadata.version: "1.0.0" → "1.0.1"
  - Keep the same manifest_version
  - Keep the same inputs/outputs
  - Register as a new Algorithm Version
- Major changes (new features, changed behavior):
  - Increment metadata.version: "1.0.0" → "2.0.0"
  - May change parameters
  - May change resource requirements
  - Register as a new Algorithm Version
- Breaking changes (changed inputs/outputs):
  - Increment metadata.version: "1.0.0" → "2.0.0"
  - Update inputs/outputs
  - Consider creating a new Algorithm instead
  - Register as a new Algorithm Version
Remember: Algorithm Versions are immutable once registered!
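Because each registered version is immutable, it helps to decide the version bump before registering. The sketch below maps manifest differences onto the three categories above and computes the next metadata.version using the same increments as the examples ("1.0.0" → "1.0.1" for minor, "1.0.0" → "2.0.0" for major or breaking). It only sees what changed in the manifest: a new image whose behavior changed while the manifest stayed the same is classified as minor, so treat the result as a lower bound. Both helper names are hypothetical.

```python
def _resources(manifest: dict):
    return manifest.get("container_parameters", {}).get("resource_request")


def classify_change(old: dict, new: dict) -> str:
    """Classify manifest differences as "breaking", "major", or "minor"."""
    if old.get("inputs") != new.get("inputs") or old.get("outputs") != new.get("outputs"):
        return "breaking"  # changed inputs/outputs
    if old.get("parameters") != new.get("parameters") or _resources(old) != _resources(new):
        return "major"  # changed parameters or resource requirements
    return "minor"  # manifest-visible contract unchanged


def next_version(version: str, change: str) -> str:
    """Apply the increments used in the examples above to a major.minor.patch string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if change in ("breaking", "major"):
        return f"{major + 1}.0.0"
    return f"{major}.{minor}.{patch + 1}"
```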
Next Steps
- Learn about Algorithm Input/Output to understand the data your algorithm will process
- See Container Images for packaging your algorithm
- Follow Registering Algorithms to publish your algorithm