Understanding Data Types

Data Types are fundamental to the Elements Platform. They define the structure and format of the data that flows into and out of algorithms, so that algorithms and the platform agree on what each data file contains.

What is a Data Type?

A Data Type is similar to an interface in programming languages. It specifies:

A named set of columns
The data type of each column (string, integer, float, boolean, datetime, etc.)
The expected format of data files

Think of Data Types as contracts between algorithms and the platform:

Input Data Types tell an algorithm what format to expect
Output Data Types tell the platform what format the algorithm will produce
Algorithms declare which Data Types they consume and produce in their manifest

Built-in Data Types

The Elements Platform includes many Data Types out of the box. Here are some commonly used ones:

Pings Data Type

Represents geolocation data from mobile devices.

Columns:

device_id (string) - Unique identifier for the device
latitude (float) - Latitude coordinate
longitude (float) - Longitude coordinate
timestamp (datetime) - UTC timestamp in ISO 8601 format (e.g., "2007-04-05T14:30Z")

Use cases:

Foot traffic analysis
Device visit detection
Movement pattern analysis
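As a minimal sketch of handling the timestamp column, the Python standard library can parse the example value above into a timezone-aware datetime. The parse_ping_timestamp helper is hypothetical and not part of the SDK. It rewrites the trailing "Z" to "+00:00" because datetime.fromisoformat only accepts the "Z" suffix directly on Python 3.11 and later.

```python
from datetime import datetime, timezone


def parse_ping_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as "2007-04-05T14:30Z"."""
    # Hypothetical helper: normalise the "Z" suffix for older Python versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Treat naive values as UTC, since the Data Type specifies UTC timestamps.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


ts = parse_ping_timestamp("2007-04-05T14:30Z")
print(ts.isoformat())
```

Working with timezone-aware values throughout avoids mixing naive and aware datetimes when pings are later compared against a time window.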

Device Visits Data Type

Represents processed visits derived from ping data.

Columns:

aoi_id (string) - Area of Interest identifier
device_id (string) - Unique identifier for the device
start (datetime) - Visit start time in ISO 8601 format
finish (datetime) - Visit end time in ISO 8601 format

Use cases:

Dwell time analysis
Visitor counting
Traffic pattern analysis
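To illustrate the first two use cases, the sketch below computes average dwell time and a unique visitor count from rows shaped like the Device Visits Data Type. The sample rows and the dwell_time helper are made up for illustration; real visit data would come from the platform.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows shaped like the Device Visits Data Type.
visits = [
    {"aoi_id": "aoi-1", "device_id": "dev-a",
     "start": "2024-01-01T09:00:00+00:00", "finish": "2024-01-01T09:45:00+00:00"},
    {"aoi_id": "aoi-1", "device_id": "dev-b",
     "start": "2024-01-01T10:00:00+00:00", "finish": "2024-01-01T10:15:00+00:00"},
]


def dwell_time(visit: dict) -> timedelta:
    """Return how long a single visit lasted (finish minus start)."""
    start = datetime.fromisoformat(visit["start"])
    finish = datetime.fromisoformat(visit["finish"])
    return finish - start


# Dwell time analysis: average visit duration across all rows.
total = sum((dwell_time(v) for v in visits), timedelta())
average = total / len(visits)

# Visitor counting: number of distinct devices seen.
unique_visitors = len({v["device_id"] for v in visits})

print(f"average dwell: {average}, visitors: {unique_visitors}")
```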

Other Common Data Types

The platform includes many more Data Types for various use cases:

Object detection results (vehicles, ships, aircraft, buildings)
Land use classification
Supply chain tracking
GPS interference detection
And many more...

Working with Data Types

Listing Available Data Types

To see all available Data Types:

from elements.sdk.elements_sdk import ElementsSDK

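sdk = ElementsSDK()

# List all data types.
# The SDK calls are awaited: top-level `await` works in a notebook or an
# async REPL; in a script, wrap these calls in an `async def` function and
# run it with asyncio.run().
data_types = await sdk.data_type.list()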

for data_type in data_types:
    print(f"{data_type.name}: {data_type.description}")

Getting Data Type Details

To get detailed information about a specific Data Type:

# Get information about the pings data type (reuses the `sdk` client created earlier)
pings_type = await sdk.data_type.get(name="pings")

print(f"Name: {pings_type.name}")
print(f"Description: {pings_type.description}")
print(f"Schema: {pings_type.schema}")
print(f"Associated Data Sources: {pings_type.data_source_ids}")

The schema will show you the exact column names and types expected.

Creating Custom Data Types

If the built-in Data Types don't meet your needs, you can create custom ones:

# Define your custom schema
custom_schema = {
    "columns": [
        {"name": "observation_id", "type": "string"},
        {"name": "value", "type": "float"},
        {"name": "confidence", "type": "float"},
        {"name": "timestamp", "type": "datetime"}
    ]
}

# Create the data type
new_type = await sdk.data_type.create(
    name="custom_observations",
    description="Custom observation data with confidence scores",
    schema=custom_schema
)
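Before calling create, it can help to check the schema dictionary locally. The sketch below is one possible pre-flight check, not something the SDK provides: check_schema and ASSUMED_TYPES are hypothetical names, and the set of allowed types is taken from the list earlier on this page, so the platform may accept more.

```python
# Column types listed earlier on this page; the platform may accept others,
# so treat this set as an assumption.
ASSUMED_TYPES = {"string", "integer", "float", "boolean", "datetime"}


def check_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a schema dict (empty if none)."""
    problems = []
    columns = schema.get("columns", [])
    if not columns:
        problems.append("schema has no columns")
    seen = set()
    for column in columns:
        name, col_type = column.get("name"), column.get("type")
        if not name:
            problems.append("column without a name")
        elif name in seen:
            problems.append(f"duplicate column: {name}")
        seen.add(name)
        if col_type not in ASSUMED_TYPES:
            problems.append(f"unknown type for {name}: {col_type}")
    return problems


good_schema = {
    "columns": [
        {"name": "observation_id", "type": "string"},
        {"name": "value", "type": "float"},
    ]
}
bad_schema = {
    "columns": [
        {"name": "value", "type": "float"},
        {"name": "value", "type": "decimal"},
    ]
}

print(check_schema(good_schema))
print(check_schema(bad_schema))
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a single pass.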

Data Type Flow in Algorithms

Data Types flow through an algorithm in six steps:

1. Algorithm Declares Data Types

In the algorithm manifest, you specify:

{
  "inputs": [
    {
      "data_type_name": "pings",
      "min_count": 1,
      "max_count": 1
    }
  ],
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count"]
  }
}

This declares:

The algorithm consumes pings data
The algorithm produces device_visits data
The algorithm calculates a visit_count observation value
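The same three facts can be read back out of the manifest programmatically. This is a small sketch using only the standard json module and only the keys shown in the fragment above; a real manifest likely contains additional fields.

```python
import json

# The manifest fragment from above, embedded as text for illustration.
manifest_text = """
{
  "inputs": [
    {"data_type_name": "pings", "min_count": 1, "max_count": 1}
  ],
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count"]
  }
}
"""

manifest = json.loads(manifest_text)

# Data Types the algorithm consumes, produces, and the observation values it reports.
consumed = [entry["data_type_name"] for entry in manifest["inputs"]]
produced = manifest["outputs"]["output_data_types"]
observation_columns = manifest["outputs"]["observation_value_columns"]

print(f"consumes: {consumed}")
print(f"produces: {produced}")
print(f"observation values: {observation_columns}")
```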

2. Platform Validates Data Types

When you register an algorithm, the platform validates:

All specified Data Types exist
Input and output declarations are complete
Data Types are compatible with specified data sources
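The first two checks can be approximated locally before registering. The sketch below is an assumption about what such checks might look like, not the platform's actual implementation: validate_manifest and the known_types set are hypothetical, and the data source compatibility check is not covered.

```python
def validate_manifest(manifest: dict, known_types: set[str]) -> list[str]:
    """Return a list of validation errors (empty if the manifest looks sound)."""
    errors = []
    inputs = manifest.get("inputs", [])
    outputs = manifest.get("outputs", {}).get("output_data_types", [])

    # Input and output declarations must be present.
    if not inputs:
        errors.append("no inputs declared")
    if not outputs:
        errors.append("no output data types declared")

    # Every referenced Data Type must exist.
    for entry in inputs:
        name = entry.get("data_type_name")
        if name not in known_types:
            errors.append(f"unknown input data type: {name}")
        if entry.get("min_count", 0) > entry.get("max_count", float("inf")):
            errors.append(f"min_count exceeds max_count for {name}")
    for name in outputs:
        if name not in known_types:
            errors.append(f"unknown output data type: {name}")
    return errors


# Hypothetical registry of Data Type names known to the platform.
known = {"pings", "device_visits"}
manifest = {
    "inputs": [{"data_type_name": "pings", "min_count": 1, "max_count": 1}],
    "outputs": {"output_data_types": ["device_visits"]},
}

print(validate_manifest(manifest, known))
```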

3. Data is Filtered and Prepared

When a computation runs:

Input data is filtered by AOI (spatial filter)
Input data is filtered by TOI (temporal filter)
Data is formatted according to the Data Type schema
File paths to the data are provided in algo_input.json
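The platform performs this filtering for you, but the idea can be sketched in a few lines. The example below assumes the AOI is a simple bounding box and the TOI is a half-open time interval; real AOIs are likely arbitrary polygons, and the in_aoi and in_toi helpers are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical pings shaped like the Pings Data Type.
pings = [
    {"device_id": "a", "latitude": 40.0, "longitude": -74.0,
     "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)},
    {"device_id": "b", "latitude": 51.5, "longitude": -0.1,
     "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)},
    {"device_id": "c", "latitude": 40.1, "longitude": -74.1,
     "timestamp": datetime(2024, 2, 1, 12, 0, tzinfo=timezone.utc)},
]

# Assumed AOI: a bounding box (min_lat, min_lon, max_lat, max_lon).
aoi = (39.5, -75.0, 41.0, -73.0)
# Assumed TOI: a half-open interval [start, end).
toi = (datetime(2024, 1, 1, tzinfo=timezone.utc),
       datetime(2024, 1, 2, tzinfo=timezone.utc))


def in_aoi(ping: dict, box: tuple) -> bool:
    """Spatial filter: keep pings inside the bounding box."""
    min_lat, min_lon, max_lat, max_lon = box
    return (min_lat <= ping["latitude"] <= max_lat
            and min_lon <= ping["longitude"] <= max_lon)


def in_toi(ping: dict, interval: tuple) -> bool:
    """Temporal filter: keep pings with start <= timestamp < end."""
    start, end = interval
    return start <= ping["timestamp"] < end


filtered = [p for p in pings if in_aoi(p, aoi) and in_toi(p, toi)]
print([p["device_id"] for p in filtered])
```

Only device "a" passes both filters here: "b" falls outside the box and "c" falls outside the time window.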

4. Algorithm Reads Input Data

Your algorithm reads data files that conform to the input Data Type:

import pandas as pd

# Read input data (format guaranteed by Data Type).
# input_file_path is one of the file paths provided in algo_input.json.
pings_df = pd.read_parquet(input_file_path)

# Columns are guaranteed to exist per the Data Type.
# itertuples() is considerably faster than iterrows() on large ping files.
for row in pings_df.itertuples(index=False):
    device_id = row.device_id
    lat = row.latitude
    lon = row.longitude
    timestamp = row.timestamp
    # Process the ping...

5. Algorithm Writes Output Data

Your algorithm writes results that conform to the output Data Type:

# Create output data matching the device_visits Data Type
visits_df = pd.DataFrame({
    'aoi_id': aoi_ids,
    'device_id': device_ids,
    'start': start_times,
    'finish': finish_times
})

# Write to parquet file
visits_df.to_parquet(output_file_path)

6. Platform Validates Output

The platform validates that:

Output files conform to the declared Data Type schema
All required columns are present
Column types match the Data Type specification
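Running a similar check yourself before writing output can catch mistakes early. The following is a rough sketch under stated assumptions, not the platform's validator: validate_rows is hypothetical, and the mapping from Data Type column types to Python types is a guess based on the type names used on this page.

```python
from datetime import datetime

# Assumed mapping from Data Type column types to Python types.
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "datetime": datetime,
}

# Columns of the device_visits Data Type as described on this page.
DEVICE_VISITS_COLUMNS = {
    "aoi_id": "string",
    "device_id": "string",
    "start": "datetime",
    "finish": "datetime",
}


def validate_rows(rows: list[dict], columns: dict[str, str]) -> list[str]:
    """Check that every row has each required column with the expected type."""
    errors = []
    for index, row in enumerate(rows):
        for name, type_name in columns.items():
            if name not in row:
                errors.append(f"row {index}: missing column {name}")
            elif not isinstance(row[name], PYTHON_TYPES[type_name]):
                errors.append(f"row {index}: {name} is not {type_name}")
    return errors


good_rows = [{"aoi_id": "x", "device_id": "d",
              "start": datetime(2024, 1, 1), "finish": datetime(2024, 1, 1, 1)}]
bad_rows = [{"aoi_id": "x", "device_id": 1, "start": datetime(2024, 1, 1)}]

print(validate_rows(good_rows, DEVICE_VISITS_COLUMNS))
print(validate_rows(bad_rows, DEVICE_VISITS_COLUMNS))
```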

Data Types in Analyses

When building an Analysis with multiple algorithms:

Output-to-Input Matching

- Algorithm A's output Data Type must match Algorithm B's input Data Type
- This is validated when you register the Analysis

Data Flow Example

Algorithm 1: Object Detection
  Input: satellite_imagery
  Output: object_detections
         ↓
Algorithm 2: Object Tracking
  Input: object_detections  ← Matches output above
  Output: object_tracks
         ↓
Algorithm 3: Route Analysis
  Input: object_tracks      ← Matches output above
  Output: route_patterns

Branching Analyses

- One algorithm's output can feed multiple downstream algorithms
- Multiple algorithms' outputs can be consumed by a single algorithm (if it accepts multiple inputs)
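For the linear data flow example above, output-to-input matching amounts to comparing each step's output with the next step's input. The sketch below shows that check with a hypothetical find_mismatches helper. It handles only a simple chain; a branching Analysis would need a graph of steps instead of a list.

```python
# Hypothetical description of the three-step analysis shown above.
steps = [
    {"name": "Object Detection", "input": "satellite_imagery", "output": "object_detections"},
    {"name": "Object Tracking", "input": "object_detections", "output": "object_tracks"},
    {"name": "Route Analysis", "input": "object_tracks", "output": "route_patterns"},
]


def find_mismatches(chain: list[dict]) -> list[tuple[str, str]]:
    """Return (upstream, downstream) name pairs whose Data Types do not line up."""
    mismatches = []
    # Pair each step with the one that follows it.
    for upstream, downstream in zip(chain, chain[1:]):
        if upstream["output"] != downstream["input"]:
            mismatches.append((upstream["name"], downstream["name"]))
    return mismatches


print(find_mismatches(steps))
```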

File Formats

Data Type files are typically stored as:

Parquet files - Efficient columnar format (recommended)
CSV files - For simple text data
GeoJSON - For geographic data with complex geometries

The format is determined by the data source and Data Type specification.

Best Practices

When Creating Algorithms

Use existing Data Types when possible
- Leverage built-in types for interoperability
- Only create custom types when necessary

Keep Data Types simple
- Include only essential columns
- Use clear, descriptive column names
- Document the purpose of each column

Consider downstream consumers
- Design output Data Types that other algorithms can use
- Follow naming conventions consistent with the platform

When Creating Custom Data Types

Document thoroughly
- Provide clear descriptions
- Explain the meaning of each column
- Include example data

Version carefully
- If you need to change a Data Type, create a new version
- Don't modify existing Data Types that algorithms depend on

Test with sample data
- Validate your Data Type with real data before registering
- Ensure algorithms can successfully read and write the format

Data Type vs Data Source

It's important to distinguish between these concepts:

Data Type:

Defines the schema (columns and types)
Describes what the data represents
Multiple data sources can provide the same Data Type

Data Source:

Identifies where the data comes from
Examples: "safegraph_pings", "planet_imagery", "ais_ship_positions"
Each data source produces data conforming to a specific Data Type

Example:

Data Type: pings
  - device_id (string)
  - latitude (float)
  - longitude (float)
  - timestamp (datetime)

Data Sources that provide "pings":
  - safegraph_pings
  - cuebiq_pings
  - mobilewalla_pings

All three data sources produce data with the same schema (the pings Data Type), but the data comes from different providers.
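The many-to-one relationship can be pictured as a lookup from data source to Data Type. The sketch below uses the example names from this page. The dictionaries are illustrative only and do not reflect how the platform stores this mapping.

```python
from collections import defaultdict

# Data source -> Data Type, using the example names from this page.
SOURCE_TO_TYPE = {
    "safegraph_pings": "pings",
    "cuebiq_pings": "pings",
    "mobilewalla_pings": "pings",
}

# Invert the mapping to find every source that provides a given Data Type.
sources_by_type = defaultdict(list)
for source, data_type in SOURCE_TO_TYPE.items():
    sources_by_type[data_type].append(source)

print(sources_by_type["pings"])
```

An algorithm that declares pings as its input can therefore run against any of the three sources without code changes.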

Next Steps

Now that you understand Data Types:

Learn how to Create Algorithms that use Data Types
Review the Data Types API Reference for detailed SDK methods
Explore available Data Types using the SDK's data_type.list() method