Algorithm Input Schema

Complete specification of algorithm input format

This page gives the complete JSON Schema for algo_input.json, the file provided to each algorithm at runtime.

Schema Overview

The input file contains:

  • version - Schema version
  • input_data_path - Path the input downloader uses for the input data (/work/input in the example input file)
  • output_path - Directory for writing output
  • config - User-specified configuration (parameters, data sources)
  • input_data - Array of data to process (one entry per area of interest, AOI)
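
The five top-level fields can be read into a small typed container. The following is a minimal Python sketch rather than anything shipped with the platform: AlgoInput, parse_algo_input, and load_algo_input are hypothetical names, and the file path is supplied by the caller because this page does not say how an algorithm locates algo_input.json.

```python
import json
from dataclasses import dataclass
from typing import Any

# Top-level keys listed in the schema overview.
REQUIRED_KEYS = ("version", "input_data_path", "output_path", "config", "input_data")


@dataclass
class AlgoInput:
    """Hypothetical container mirroring the top-level input fields."""
    version: str
    input_data_path: str
    output_path: str
    config: dict[str, Any]
    input_data: list[dict[str, Any]]


def parse_algo_input(raw: dict[str, Any]) -> AlgoInput:
    """Build an AlgoInput, failing early if a required key is absent."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"algo input is missing required keys: {missing}")
    return AlgoInput(**{key: raw[key] for key in REQUIRED_KEYS})


def load_algo_input(path: str) -> AlgoInput:
    """Read and parse the input file at the caller-supplied path."""
    with open(path, encoding="utf-8") as handle:
        return parse_algo_input(json.load(handle))
```

Failing on a missing key before any processing starts keeps the error close to its cause instead of surfacing later as a KeyError deep inside the algorithm.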

Complete JSON Schema

{
  "title": "Algorithm Input Schema",
  "description": "JSON schema to define an algorithm input file",
  "type": "object",
  "required": [
    "version",
    "input_data_path",
    "output_path",
    "config",
    "input_data"
  ],
  "properties": {
    "version": {
      "description": "The algorithm input schema version",
      "type": "string",
      "example": "v1.0.0"
    },
    "input_data_path": {
      "description": "The file path that the input downloader will use to download the data",
      "type": "string",
      "example": "/work/algo_input_data.json"
    },
    "output_path": {
      "description": "The file path that the algorithm container should output its results to",
      "type": "string",
      "example": "/work/output/0"
    },
    "config": {
      "description": "The config used to run the algorithm. Consists of algorithm parameters and data sources.",
      "type": "object",
      "example": {
        "parameters": {
          "stream": false,
          "grouping": {
            "frequency": "DAILY",
            "value": 1
          },
          "sampling": {
            "min_count": 1,
            "max_count": 1
          }
        }
      }
    },
    "input_data": {
      "$ref": "#/$defs/input_data"
    }
  },
  "$defs": {
    "input_data": {
      "description": "A list of input data for the algorithm",
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": [
          "aoi_version",
          "aoi_wkb",
          "time_ranges",
          "data_sources"
        ],
        "properties": {
          "aoi_version": {
            "type": "integer",
            "example": 3980
          },
          "aoi_wkb": {
            "type": "string",
            "example": "AQMAAAABAAAABQAAAF0mwc/WdFjA..."
          },
          "time_ranges": {
            "$ref": "#/$defs/time_ranges"
          },
          "data_sources": {
            "$ref": "#/$defs/data_sources"
          }
        }
      }
    },
    "time_ranges": {
      "description": "A list of time_ranges for each observation based on the Algorithm Configuration Grouping and TOI Specifications. Each item in the list corresponds to a single observation, with each observation containing one or more time ranges.",
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "array",
        "minItems": 1,
        "items": {
          "type": "object",
          "required": [
            "start_local",
            "finish_local"
          ],
          "properties": {
            "start_local": {
              "type": "string",
              "description": "Start time of the time range in ISO-8601 format.",
              "example": "2020-01-01T08:00:00Z"
            },
            "finish_local": {
              "type": "string",
              "description": "Finish time of the time range in ISO-8601 format.",
              "example": "2020-01-08T08:00:00Z"
            }
          }
        }
      }
    },
    "data_sources": {
      "description": "A list of data sources",
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": [
          "data_source_ids",
          "data"
        ],
        "properties": {
          "data_source_ids": {
            "type": "array",
            "minItems": 1,
            "items": {
              "type": "string",
              "example": "planet_SkySatCollect"
            }
          },
          "data": {
            "$ref": "#/$defs/data"
          }
        }
      }
    },
    "data": {
      "description": "A list of data provided for the data source",
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": [
          "file_path",
          "details"
        ],
        "properties": {
          "file_path": {
            "type": "string",
            "example": "/work/safegraph_safegraph_1423d3da-a5fc-4b46-b654-fda37a659051.parquet",
            "description": "References the path to the data file. Should conform to the Schema of the data_source's associated data_type."
          },
          "details": {
            "type": "object",
            "description": "Additional metadata/details about the data view. Should conform to the schema of the associated data_source's data_details schema.",
            "example": {
              "geom_wkb": "AQMAAAABAAAABQAAAF0mwc/WdFjA..."
            }
          }
        }
      }
    }
  }
}
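
To show how the keywords used in this schema fit together, here is a hedged Python sketch of a validator that understands only the subset appearing above: $ref into $defs, type, required, properties, items, and minItems. The validate function is a hypothetical helper written for illustration; a full validator (for example the third-party jsonschema package) is the more robust choice in practice.

```python
from typing import Any

# Map JSON Schema type names to the Python types json.load produces.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def validate(instance: Any, schema: dict, root: dict, path: str = "$") -> list[str]:
    """Return a list of error messages; an empty list means no problems found.

    Handles only $ref (to #/$defs/...), type, required, properties,
    items, and minItems. All other keywords are ignored.
    """
    if "$ref" in schema:
        name = schema["$ref"].removeprefix("#/$defs/")
        return validate(instance, root["$defs"][name], root, path)

    errors: list[str] = []
    expected = _TYPES.get(schema.get("type"))
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for "integer".
        wrong_bool = expected is int and isinstance(instance, bool)
        if not isinstance(instance, expected) or wrong_bool:
            return [f"{path}: expected {schema['type']}"]

    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors += validate(instance[key], subschema, root, f"{path}.{key}")

    if isinstance(instance, list):
        if len(instance) < schema.get("minItems", 0):
            errors.append(f"{path}: fewer than {schema['minItems']} items")
        for index, item in enumerate(instance):
            if "items" in schema:
                errors += validate(item, schema["items"], root, f"{path}[{index}]")

    return errors
```

With the schema and an input file both loaded through json.load, calling validate(algo_input, schema, schema) returns every problem found, each prefixed with the path of the offending value.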

Example Input File

{
  "version": "0.1.0",
  "input_data_path": "/work/input",
  "output_path": "/work/output",
  "config": {
    "parameters": {
      "look_back_time": 3600,
      "look_forward_time": 3600
    }
  },
  "input_data": [
    {
      "aoi_version": 36810008,
      "aoi_wkb": "AQMAAAABAAAABAAAANZvJqYLeV7A...",
      "time_ranges": [
        [
          {
            "start_local": "2021-03-01T08:00:00Z",
            "finish_local": "2021-03-08T08:00:00Z"
          }
        ]
      ],
      "data_sources": [
        {
          "data_source_ids": ["safegraph_pings"],
          "data": [
            {
              "file_path": "/work/input/safegraph_pings_abc123.parquet",
              "details": {
                "id": "abc123",
                "geom_wkb": "AQMAAAABAAAABAAAANZv...",
                "time_ranges": [
                  {
                    "start": "2021-03-01T08:00:00Z",
                    "end": "2021-03-08T08:00:00Z"
                  }
                ],
                "aoi_identifier": {
                  "id": "8a68ae09-f4e3-4c38-93b0-0e2cc21d7529",
                  "version": 36810008
                }
              }
            }
          ]
        }
      ]
    }
  ]
}
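
The nesting in the example (input_data, then data_sources, then data) can be flattened with three loops. This is a minimal sketch assuming the input has already been parsed into a dict; iter_data_files is a hypothetical helper name.

```python
from typing import Any, Iterator


def iter_data_files(algo_input: dict[str, Any]) -> Iterator[tuple[int, list[str], str]]:
    """Yield (aoi_version, data_source_ids, file_path) for every data entry."""
    for aoi in algo_input["input_data"]:
        for source in aoi["data_sources"]:
            for entry in source["data"]:
                yield aoi["aoi_version"], source["data_source_ids"], entry["file_path"]
```

Applied to the example input file, the generator yields a single tuple holding the AOI version 36810008, the list containing safegraph_pings, and the parquet file path.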

Key Points

version

  • Current version: "0.1.0"
  • Identifies the schema version that the file is validated against

config

  • Contains user-specified parameters from Algorithm Config
  • Structure varies by algorithm
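
Because the structure of config varies by algorithm, each algorithm has to decide how to handle absent or unexpected parameters. One possible approach is sketched below in Python. The parameter names come from the example input file, but the default values of 0 are placeholders and resolve_parameters is a hypothetical helper, not a platform function.

```python
from typing import Any

# Placeholder defaults for the parameters shown in the example input file.
# The real defaults, if any, are defined by each algorithm.
DEFAULT_PARAMETERS = {"look_back_time": 0, "look_forward_time": 0}


def resolve_parameters(config: dict[str, Any]) -> dict[str, Any]:
    """Merge user-supplied parameters over the algorithm's own defaults."""
    supplied = config.get("parameters", {})
    unknown = sorted(set(supplied) - set(DEFAULT_PARAMETERS))
    if unknown:
        raise ValueError(f"unrecognized parameters: {unknown}")
    return {**DEFAULT_PARAMETERS, **supplied}
```

Rejecting unrecognized names is a design choice that catches typos early; an algorithm that must tolerate extra keys could drop that check.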

input_data array

  • One entry per AOI being processed
  • Contains the AOI version (aoi_version) and the AOI geometry (aoi_wkb, WKB format)
  • Lists time ranges for observations
  • References data files
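
The aoi_wkb values in the examples appear to be base64-encoded WKB: they begin with AQMAAAAB, which decodes to a little-endian polygon header. That encoding is inferred from the example values and is not stated by the schema, so treat the following Python sketch as an assumption. It reads only the WKB header; wkb_geometry_type is a hypothetical helper, and a geometry library would be needed to parse the coordinates.

```python
import base64
import struct

# WKB geometry type codes from the OGC Simple Features specification.
WKB_TYPES = {
    1: "Point",
    2: "LineString",
    3: "Polygon",
    4: "MultiPoint",
    5: "MultiLineString",
    6: "MultiPolygon",
    7: "GeometryCollection",
}


def wkb_geometry_type(aoi_wkb: str) -> str:
    """Return the geometry type name from a base64 WKB string (assumed encoding)."""
    raw = base64.b64decode(aoi_wkb)
    if len(raw) < 5:
        raise ValueError("WKB is too short to contain a header")
    # Byte 0 is the byte-order flag: 0 = big-endian, 1 = little-endian.
    endian = "<" if raw[0] == 1 else ">"
    (type_code,) = struct.unpack(endian + "I", raw[1:5])
    return WKB_TYPES.get(type_code, f"unknown ({type_code})")
```

The truncated example strings on this page cannot be decoded as written, so the function needs the complete value from a real input file.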

time_ranges structure

  • Outer array: One element per observation
  • Inner array: One or more time ranges per observation
  • Lets a single observation span several separate time ranges, which supports complex recurrence patterns
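
The two-level nesting can be walked as follows. This Python sketch sums the duration covered by each observation; parse_timestamp and observation_durations are hypothetical helper names, and the sum assumes that the ranges within one observation do not overlap.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only accepts 'Z' from Python 3.11 onward,
    # so rewrite it as an explicit offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def observation_durations(time_ranges: list[list[dict[str, str]]]) -> list[timedelta]:
    """Return the total covered duration of each observation.

    The outer list has one element per observation; each inner list holds
    that observation's time ranges, whose durations are summed.
    """
    durations = []
    for observation in time_ranges:
        total = timedelta()
        for time_range in observation:
            start = parse_timestamp(time_range["start_local"])
            finish = parse_timestamp(time_range["finish_local"])
            total += finish - start
        durations.append(total)
    return durations
```

For the example input file, which has one observation with a single range from 2021-03-01 to 2021-03-08, the result is a one-element list containing a duration of seven days.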

data_sources array

  • Groups data by data source
  • Each source has a list of data files
  • Files conform to the Data Type schema

file_path

  • Points to the actual data file (Parquet, CSV, etc.)
  • The file format is determined by the Data Type

details object

  • Additional metadata about the data
  • Structure varies by data source
  • Often includes geometry and time information
