Data Types

How Data Types define the structure of algorithm inputs and outputs

Understanding Data Types

Data Types are fundamental to the Elements Platform. They define the structure and format of the data that flows through algorithms, so that every producer and consumer of a dataset agrees on its columns and types.

What is a Data Type?

A Data Type is similar to an interface in programming languages. It specifies:

  • A named set of columns
  • The data type of each column (string, integer, float, boolean, datetime, etc.)
  • The expected format of data files

Think of Data Types as contracts between algorithms and the platform:

  • Input Data Types tell an algorithm what format to expect
  • Output Data Types tell the platform what format the algorithm will produce
  • Algorithms declare which Data Types they consume and produce in their manifest
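To make the contract idea concrete, a Data Type can be pictured as a named list of typed columns. The sketch below is illustrative only: `Column`, `DataTypeSpec`, and `missing_columns` are hypothetical names chosen for this example and are not part of the Elements SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str  # e.g. "string", "float", "datetime"


@dataclass(frozen=True)
class DataTypeSpec:
    """Hypothetical stand-in for a Data Type: a named set of typed columns."""
    name: str
    columns: tuple

    def missing_columns(self, record: dict) -> list:
        """Return the names of declared columns that the record lacks."""
        return [c.name for c in self.columns if c.name not in record]


pings = DataTypeSpec(
    name="pings",
    columns=(
        Column("device_id", "string"),
        Column("latitude", "float"),
        Column("longitude", "float"),
        Column("timestamp", "datetime"),
    ),
)

# A record that breaks the contract: it has no timestamp column.
record = {"device_id": "abc", "latitude": 51.5, "longitude": -0.12}
print(pings.missing_columns(record))  # ['timestamp']
```

A consumer that receives a record with no missing columns can rely on the declared names without inspecting the data first, which is the point of the contract.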

Built-in Data Types

The Elements Platform includes many Data Types out of the box. Here are some commonly used ones:

Pings Data Type

Represents geolocation data from mobile devices.

Columns:

  • device_id (string) - Unique identifier for the device
  • latitude (float) - Latitude coordinate
  • longitude (float) - Longitude coordinate
  • timestamp (datetime) - UTC timestamp in ISO 8601 format (e.g., "2007-04-05T14:30Z")

Use cases:

  • Foot traffic analysis
  • Device visit detection
  • Movement pattern analysis
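As a rough illustration of the column types above, the following sketch coerces one raw record into the pings shape using only the Python standard library. The function name `parse_ping`, the sample values, and the coordinate range check are assumptions made for this example; the platform's own parsing may differ.

```python
from datetime import datetime, timezone


def parse_ping(raw: dict) -> dict:
    """Coerce one raw record to the column types listed for pings."""
    # Replace a trailing "Z" so the parser also works on Python < 3.11.
    # A timestamp without any offset would be interpreted as local time.
    ts = datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00"))
    lat, lon = float(raw["latitude"]), float(raw["longitude"])
    # Assumed sanity check, not a documented platform rule.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return {
        "device_id": str(raw["device_id"]),
        "latitude": lat,
        "longitude": lon,
        "timestamp": ts.astimezone(timezone.utc),
    }


ping = parse_ping({
    "device_id": "device-001",
    "latitude": "40.7128",
    "longitude": "-74.0060",
    "timestamp": "2007-04-05T14:30Z",
})
print(ping["timestamp"].isoformat())  # 2007-04-05T14:30:00+00:00
```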

Device Visits Data Type

Represents processed visits derived from ping data.

Columns:

  • aoi_id (string) - Area of Interest identifier
  • device_id (string) - Unique identifier for the device
  • start (datetime) - Visit start time in ISO 8601 format
  • finish (datetime) - Visit end time in ISO 8601 format

Use cases:

  • Dwell time analysis
  • Visitor counting
  • Traffic pattern analysis
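The first two use cases follow directly from the four columns. The sketch below computes dwell time per visit and distinct visitors per AOI with the standard library; the three sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows in the device_visits shape.
visits = [
    {"aoi_id": "aoi-1", "device_id": "d1",
     "start": "2024-01-01T09:00:00+00:00", "finish": "2024-01-01T09:45:00+00:00"},
    {"aoi_id": "aoi-1", "device_id": "d2",
     "start": "2024-01-01T10:00:00+00:00", "finish": "2024-01-01T10:15:00+00:00"},
    {"aoi_id": "aoi-2", "device_id": "d1",
     "start": "2024-01-01T12:00:00+00:00", "finish": "2024-01-01T12:30:00+00:00"},
]

dwell_minutes = defaultdict(list)  # aoi_id -> dwell time of each visit
visitors = defaultdict(set)        # aoi_id -> distinct device_ids

for v in visits:
    start = datetime.fromisoformat(v["start"])
    finish = datetime.fromisoformat(v["finish"])
    dwell_minutes[v["aoi_id"]].append((finish - start).total_seconds() / 60)
    visitors[v["aoi_id"]].add(v["device_id"])

for aoi_id in sorted(dwell_minutes):
    mean_dwell = sum(dwell_minutes[aoi_id]) / len(dwell_minutes[aoi_id])
    print(f"{aoi_id}: {len(visitors[aoi_id])} visitors, mean dwell {mean_dwell:.1f} min")
```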

Other Common Data Types

The platform also includes Data Types for many other use cases, such as:

  • Object detection results (vehicles, ships, aircraft, buildings)
  • Land use classification
  • Supply chain tracking
  • GPS interference detection

Working with Data Types

Listing Available Data Types

To see all available Data Types:

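import asyncio

from elements.sdk.elements_sdk import ElementsSDK


async def main():
    sdk = ElementsSDK()

    # List all data types and print each name with its description
    data_types = await sdk.data_type.list()
    for data_type in data_types:
        print(f"{data_type.name}: {data_type.description}")


# `await` is only valid inside a coroutine, so run the calls via asyncio
asyncio.run(main())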

Getting Data Type Details

To get detailed information about a specific Data Type:

# Get information about the pings data type
# (uses the `sdk` object created earlier; `await` must run inside an async function)
pings_type = await sdk.data_type.get(name="pings")

print(f"Name: {pings_type.name}")
print(f"Description: {pings_type.description}")
print(f"Schema: {pings_type.schema}")
print(f"Associated Data Sources: {pings_type.data_source_ids}")

The schema lists the exact column names and types that data of this type must have.

Creating Custom Data Types

If the built-in Data Types don't meet your needs, you can create custom ones:

# Define your custom schema
custom_schema = {
    "columns": [
        {"name": "observation_id", "type": "string"},
        {"name": "value", "type": "float"},
        {"name": "confidence", "type": "float"},
        {"name": "timestamp", "type": "datetime"}
    ]
}

# Create the data type (uses the `sdk` object created earlier, inside an async function)
new_type = await sdk.data_type.create(
    name="custom_observations",
    description="Custom observation data with confidence scores",
    schema=custom_schema
)
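Before sending a schema to the platform, it can help to run a quick local sanity check. The sketch below is a hypothetical helper, not an SDK feature: `schema_problems` and `KNOWN_TYPES` are names invented here, and the type list only covers the column types named earlier on this page, so the platform may accept others.

```python
# Column types named on this page; the platform may support more.
KNOWN_TYPES = {"string", "integer", "float", "boolean", "datetime"}


def schema_problems(schema: dict) -> list:
    """Return human-readable problems found in a schema dict (empty if none)."""
    problems = []
    columns = schema.get("columns", [])
    if not columns:
        problems.append("schema has no columns")
    seen = set()
    for column in columns:
        name, type_name = column.get("name"), column.get("type")
        if not name:
            problems.append("column without a name")
        elif name in seen:
            problems.append(f"duplicate column: {name}")
        seen.add(name)
        if type_name not in KNOWN_TYPES:
            problems.append(f"{name}: unrecognised type {type_name!r}")
    return problems


custom_schema = {
    "columns": [
        {"name": "observation_id", "type": "string"},
        {"name": "value", "type": "float"},
        {"name": "confidence", "type": "float"},
        {"name": "timestamp", "type": "datetime"},
    ]
}
print(schema_problems(custom_schema))  # []
```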

Data Type Flow in Algorithms

Data Types flow through an algorithm in six steps:

1. Algorithm Declares Data Types

In the algorithm manifest, you specify:

{
  "inputs": [
    {
      "data_type_name": "pings",
      "min_count": 1,
      "max_count": 1
    }
  ],
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count"]
  }
}

This declares:

  • The algorithm consumes pings data
  • The algorithm produces device_visits data
  • The algorithm calculates a visit_count observation value
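The same three facts can be read out of the manifest programmatically. This sketch parses the JSON shown above with the standard library; it assumes nothing beyond the keys that appear in that example.

```python
import json

manifest_text = """
{
  "inputs": [
    {"data_type_name": "pings", "min_count": 1, "max_count": 1}
  ],
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count"]
  }
}
"""

manifest = json.loads(manifest_text)

# Data Types the algorithm consumes and produces, and its observation values.
consumed = [entry["data_type_name"] for entry in manifest["inputs"]]
produced = manifest["outputs"]["output_data_types"]
observation_values = manifest["outputs"]["observation_value_columns"]

print(f"consumes: {consumed}")
print(f"produces: {produced}")
print(f"observation values: {observation_values}")
```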

2. Platform Validates Data Types

When you register an algorithm, the platform validates:

  • All specified Data Types exist
  • Input and output declarations are complete
  • Data Types are compatible with specified data sources
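The first two checks can be approximated locally before registering. The helper below is a hypothetical sketch, not the platform's validator: `check_manifest` and the `known_types` set are invented for this example, and the data source compatibility check is not modelled.

```python
def check_manifest(manifest: dict, known_types: set) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    inputs = manifest.get("inputs", [])
    outputs = manifest.get("outputs", {}).get("output_data_types", [])

    # Completeness: both sides of the contract must be declared.
    if not inputs:
        problems.append("no inputs declared")
    if not outputs:
        problems.append("no output data types declared")

    # Existence: every referenced Data Type must be known.
    declared = [entry.get("data_type_name") for entry in inputs] + list(outputs)
    for name in declared:
        if name not in known_types:
            problems.append(f"unknown Data Type: {name}")
    return problems


manifest = {
    "inputs": [{"data_type_name": "pings", "min_count": 1, "max_count": 1}],
    "outputs": {"output_data_types": ["device_visits"]},
}
print(check_manifest(manifest, {"pings", "device_visits"}))  # []
print(check_manifest(manifest, {"pings"}))  # ['unknown Data Type: device_visits']
```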

3. Data is Filtered and Prepared

When a computation runs:

  • Input data is filtered by AOI (spatial filter)
  • Input data is filtered by TOI (temporal filter)
  • Data is formatted according to the Data Type schema
  • File paths to the data are provided in algo_input.json
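Conceptually, the two filters reduce to a spatial predicate and a temporal predicate applied to each record. The sketch below assumes a rectangular AOI given as a bounding box and a half-open time window; real AOIs may be arbitrary polygons, and the function names and sample pings are invented for illustration.

```python
from datetime import datetime, timezone


def in_aoi(ping: dict, bbox: tuple) -> bool:
    """Spatial filter: bbox is (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon <= ping["longitude"] <= max_lon
            and min_lat <= ping["latitude"] <= max_lat)


def in_toi(ping: dict, start: datetime, end: datetime) -> bool:
    """Temporal filter: keep timestamps in the half-open interval [start, end)."""
    return start <= ping["timestamp"] < end


pings = [
    {"device_id": "d1", "latitude": 40.71, "longitude": -74.00,
     "timestamp": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"device_id": "d2", "latitude": 34.05, "longitude": -118.24,
     "timestamp": datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc)},
    {"device_id": "d3", "latitude": 40.72, "longitude": -74.01,
     "timestamp": datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)},
]

aoi_bbox = (-74.05, 40.68, -73.95, 40.75)
toi_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
toi_end = datetime(2024, 1, 2, tzinfo=timezone.utc)

selected = [p for p in pings
            if in_aoi(p, aoi_bbox) and in_toi(p, toi_start, toi_end)]
print([p["device_id"] for p in selected])  # ['d1']
```

Here `d2` falls outside the bounding box and `d3` falls outside the time window, so only `d1` would reach the algorithm.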

4. Algorithm Reads Input Data

Your algorithm reads data files that conform to the input Data Type:

import pandas as pd

# Read input data (format guaranteed by Data Type).
# input_file_path is one of the file paths provided in algo_input.json.
pings_df = pd.read_parquet(input_file_path)

# Columns are guaranteed to exist per the Data Type.
# itertuples() is considerably faster than iterrows() on large frames.
for ping in pings_df.itertuples(index=False):
    device_id = ping.device_id
    lat = ping.latitude
    lon = ping.longitude
    timestamp = ping.timestamp
    # Process the ping...

5. Algorithm Writes Output Data

Your algorithm writes results that conform to the output Data Type:

# Create output data matching the device_visits Data Type
visits_df = pd.DataFrame({
    'aoi_id': aoi_ids,
    'device_id': device_ids,
    'start': start_times,
    'finish': finish_times
})

# Write to parquet file
visits_df.to_parquet(output_file_path)

6. Platform Validates Output

The platform validates that:

  • Output files conform to the declared Data Type schema
  • All required columns are present
  • Column types match the Data Type specification
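Running a similar check locally before writing output can catch mistakes early. The following sketch is an approximation under stated assumptions, not the platform's validator: `validate_rows`, the `PYTHON_TYPES` mapping, and the sample rows are all invented for this example.

```python
from datetime import datetime

# Assumed mapping from schema type names to Python types.
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "datetime": datetime,
}

DEVICE_VISITS_COLUMNS = {
    "aoi_id": "string",
    "device_id": "string",
    "start": "datetime",
    "finish": "datetime",
}


def validate_rows(rows: list, columns: dict) -> list:
    """Check each row for missing columns and mismatched value types."""
    errors = []
    for i, row in enumerate(rows):
        for name, type_name in columns.items():
            if name not in row:
                errors.append(f"row {i}: missing column '{name}'")
            elif not isinstance(row[name], PYTHON_TYPES[type_name]):
                errors.append(f"row {i}: column '{name}' is not {type_name}")
    return errors


rows = [
    {"aoi_id": "aoi-1", "device_id": "d1",
     "start": datetime(2024, 1, 1, 9, 0), "finish": datetime(2024, 1, 1, 9, 45)},
    # Invalid: start is a string and finish is absent.
    {"aoi_id": "aoi-1", "device_id": "d2", "start": "2024-01-01T09:00:00Z"},
]
for error in validate_rows(rows, DEVICE_VISITS_COLUMNS):
    print(error)
```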

Data Types in Analyses

When building an Analysis with multiple algorithms:

  1. Output-to-Input Matching

    • Algorithm A's output Data Type must match Algorithm B's input Data Type
    • This is validated when you register the Analysis
  2. Data Flow Example

Algorithm 1: Object Detection
  Input: satellite_imagery
  Output: object_detections
         ↓
Algorithm 2: Object Tracking
  Input: object_detections  ← Matches output above
  Output: object_tracks
         ↓
Algorithm 3: Route Analysis
  Input: object_tracks      ← Matches output above
  Output: route_patterns

  3. Branching Analyses

    • One algorithm's output can feed multiple downstream algorithms
    • Multiple algorithms' outputs can be consumed by a single algorithm (if it accepts multiple inputs)
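The output-to-input rule can be sketched as a walk over the algorithms in order, tracking which Data Types are available at each step. The structure below is a simplified, hypothetical model of an Analysis (one output per algorithm, algorithms listed in execution order) and does not reflect the platform's actual registration format.

```python
steps = [
    {"name": "Object Detection", "inputs": ["satellite_imagery"],
     "output": "object_detections"},
    {"name": "Object Tracking", "inputs": ["object_detections"],
     "output": "object_tracks"},
    {"name": "Route Analysis", "inputs": ["object_tracks"],
     "output": "route_patterns"},
]


def unmatched_inputs(steps: list, external_inputs: set) -> list:
    """Return (algorithm, data_type) pairs whose input nothing upstream provides."""
    available = set(external_inputs)
    problems = []
    for step in steps:
        for data_type in step["inputs"]:
            if data_type not in available:
                problems.append((step["name"], data_type))
        # The step's output becomes available to every later step,
        # which also covers branching to several consumers.
        available.add(step["output"])
    return problems


print(unmatched_inputs(steps, {"satellite_imagery"}))  # []
# Dropping the first algorithm leaves Object Tracking without its input.
print(unmatched_inputs(steps[1:], {"satellite_imagery"}))
```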

File Formats

Data Type files are typically stored as:

  • Parquet files - Efficient columnar format (recommended)
  • CSV files - For simple text data
  • GeoJSON - For geographic data with complex geometries

The format is determined by the data source and Data Type specification.

Best Practices

When Creating Algorithms

  1. Use existing Data Types when possible

    • Leverage built-in types for interoperability
    • Only create custom types when necessary
  2. Keep Data Types simple

    • Include only essential columns
    • Use clear, descriptive column names
    • Document the purpose of each column
  3. Consider downstream consumers

    • Design output Data Types that other algorithms can use
    • Follow naming conventions consistent with the platform

When Creating Custom Data Types

  1. Document thoroughly

    • Provide clear descriptions
    • Explain the meaning of each column
    • Include example data
  2. Version carefully

    • If you need to change a Data Type, create a new version
    • Don't modify existing Data Types that algorithms depend on
  3. Test with sample data

    • Validate your Data Type with real data before registering
    • Ensure algorithms can successfully read and write the format

Data Type vs Data Source

It's important to distinguish between these concepts:

Data Type:

  • Defines the schema (columns and types)
  • Describes what the data represents
  • Multiple data sources can provide the same Data Type

Data Source:

  • Identifies where the data comes from
  • Examples: "safegraph_pings", "planet_imagery", "ais_ship_positions"
  • Each data source produces data conforming to a specific Data Type

Example:

Data Type: pings
  - device_id (string)
  - latitude (float)
  - longitude (float)
  - timestamp (datetime)

Data Sources that provide "pings":
  - safegraph_pings
  - cuebiq_pings
  - mobilewalla_pings

All three data sources produce data with the same schema (the pings Data Type), but the data comes from different providers.
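The relationship is many-to-one: each data source maps to exactly one Data Type, while a Data Type can have several sources. A minimal sketch using only the names from the example above:

```python
from collections import defaultdict

# Each data source conforms to exactly one Data Type.
SOURCE_TO_TYPE = {
    "safegraph_pings": "pings",
    "cuebiq_pings": "pings",
    "mobilewalla_pings": "pings",
}

# Invert the mapping: one Data Type, many possible sources.
sources_by_type = defaultdict(list)
for source, data_type in SOURCE_TO_TYPE.items():
    sources_by_type[data_type].append(source)

print(sources_by_type["pings"])
# ['safegraph_pings', 'cuebiq_pings', 'mobilewalla_pings']
```

An algorithm that declares `pings` as its input can therefore run against any of the three sources without code changes.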

Next Steps

Now that you understand Data Types: