Data Types
How Data Types define the structure of algorithm inputs and outputs
Understanding Data Types
Data Types are fundamental to the Elements Platform. They define the structure and format of data that flows through algorithms, ensuring consistency and compatibility.
What is a Data Type?
A Data Type is similar to an interface in programming languages. It specifies:
- A named set of columns
- The data type of each column (string, integer, float, boolean, datetime, etc.)
- The expected format of data files
Think of Data Types as contracts between algorithms and the platform:
- Input Data Types tell an algorithm what format to expect
- Output Data Types tell the platform what format the algorithm will produce
- Algorithms declare which Data Types they consume and produce in their manifest
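To make the interface analogy concrete, here is a minimal sketch that models a row shape as a Python `TypedDict`. The class name `Ping` and the helper `column_types` are hypothetical and not part of the Elements SDK; the columns mirror the pings Data Type described later in this guide.

```python
from datetime import datetime
from typing import TypedDict, get_type_hints


class Ping(TypedDict):
    """Hypothetical row shape mirroring the columns of the pings Data Type."""
    device_id: str
    latitude: float
    longitude: float
    timestamp: datetime


def column_types(row_type) -> dict:
    """Return {column name: Python type name} for a TypedDict row shape."""
    return {name: tp.__name__ for name, tp in get_type_hints(row_type).items()}


columns = column_types(Ping)
print(columns)
```

Like an interface, the `TypedDict` only names the columns and their types; it says nothing about where the data comes from.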
Built-in Data Types
The Elements Platform includes many Data Types out of the box. Here are some commonly used ones:
Pings Data Type
Represents geolocation data from mobile devices.
Columns:
- device_id (string) - Unique identifier for the device
- latitude (float) - Latitude coordinate
- longitude (float) - Longitude coordinate
- timestamp (datetime) - UTC timestamp in ISO 8601 format (e.g., "2007-04-05T14:30Z")
Use cases:
- Foot traffic analysis
- Device visit detection
- Movement pattern analysis
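If the timestamp column arrives as ISO 8601 text, it can be parsed with the standard library. This is a sketch only: the helper name `parse_utc_timestamp` is hypothetical, and treating an offset-free value as UTC is an assumption rather than documented platform behavior.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as "2007-04-05T14:30Z"."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


ts = parse_utc_timestamp("2007-04-05T14:30Z")
print(ts.isoformat())
```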
Device Visits Data Type
Represents processed visits derived from ping data.
Columns:
- aoi_id (string) - Area of Interest identifier
- device_id (string) - Unique identifier for the device
- start (datetime) - Visit start time in ISO 8601 format
- finish (datetime) - Visit end time in ISO 8601 format
Use cases:
- Dwell time analysis
- Visitor counting
- Traffic pattern analysis
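As a rough illustration of the dwell time and visitor counting use cases, the sketch below aggregates a few made-up Device Visits rows per AOI. The sample values and the function name `summarize_visits` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Made-up rows shaped like the Device Visits columns, for illustration only.
visits = [
    {"aoi_id": "aoi-1", "device_id": "dev-a",
     "start": "2024-01-01T09:00:00+00:00", "finish": "2024-01-01T09:45:00+00:00"},
    {"aoi_id": "aoi-1", "device_id": "dev-b",
     "start": "2024-01-01T10:00:00+00:00", "finish": "2024-01-01T10:30:00+00:00"},
    {"aoi_id": "aoi-1", "device_id": "dev-a",
     "start": "2024-01-01T13:00:00+00:00", "finish": "2024-01-01T13:15:00+00:00"},
    {"aoi_id": "aoi-2", "device_id": "dev-c",
     "start": "2024-01-01T09:00:00+00:00", "finish": "2024-01-01T09:20:00+00:00"},
]


def summarize_visits(rows):
    """Return {aoi_id: (unique visitor count, total dwell time)}."""
    devices = defaultdict(set)
    dwell = defaultdict(timedelta)
    for row in rows:
        start = datetime.fromisoformat(row["start"])
        finish = datetime.fromisoformat(row["finish"])
        devices[row["aoi_id"]].add(row["device_id"])
        dwell[row["aoi_id"]] += finish - start
    return {aoi: (len(devices[aoi]), dwell[aoi]) for aoi in devices}


summary = summarize_visits(visits)
for aoi, (visitors, total) in summary.items():
    print(f"{aoi}: {visitors} visitors, total dwell {total}")
```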
Other Common Data Types
The platform includes many more Data Types for various use cases:
- Object detection results (vehicles, ships, aircraft, buildings)
- Land use classification
- Supply chain tracking
- GPS interference detection
- And many more...
Working with Data Types
Listing Available Data Types
To see all available Data Types:
from elements.sdk.elements_sdk import ElementsSDK
sdk = ElementsSDK()
# List all data types.
# Note: `await` must run inside an async function, or in a notebook that supports top-level await.
data_types = await sdk.data_type.list()
for data_type in data_types:
    print(f"{data_type.name}: {data_type.description}")
Getting Data Type Details
To get detailed information about a specific Data Type:
# Get information about the pings data type
pings_type = await sdk.data_type.get(name="pings")
print(f"Name: {pings_type.name}")
print(f"Description: {pings_type.description}")
print(f"Schema: {pings_type.schema}")
print(f"Associated Data Sources: {pings_type.data_source_ids}")
The schema will show you the exact column names and types expected.
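For a more readable view of a schema, a small helper can list each column with its type. This sketch assumes the schema is a dict with the same `{"columns": [...]}` shape used in the custom schema example in this guide; the actual object returned by the SDK may differ, and `describe_schema` is a hypothetical helper.

```python
def describe_schema(schema: dict) -> list:
    """Format each column of a schema dict as "name (type)"."""
    return [f"{col['name']} ({col['type']})" for col in schema.get("columns", [])]


# Assumption: a pings schema expressed in the {"columns": [...]} shape.
pings_schema = {
    "columns": [
        {"name": "device_id", "type": "string"},
        {"name": "latitude", "type": "float"},
        {"name": "longitude", "type": "float"},
        {"name": "timestamp", "type": "datetime"},
    ]
}

lines = describe_schema(pings_schema)
for line in lines:
    print(line)
```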
Creating Custom Data Types
If the built-in Data Types don't meet your needs, you can create custom ones:
# Define your custom schema
custom_schema = {
    "columns": [
        {"name": "observation_id", "type": "string"},
        {"name": "value", "type": "float"},
        {"name": "confidence", "type": "float"},
        {"name": "timestamp", "type": "datetime"}
    ]
}
# Create the data type
new_type = await sdk.data_type.create(
    name="custom_observations",
    description="Custom observation data with confidence scores",
    schema=custom_schema
)
Data Type Flow in Algorithms
Understanding how Data Types flow through an algorithm:
1. Algorithm Declares Data Types
In the algorithm manifest, you specify:
{
  "inputs": [
    {
      "data_type_name": "pings",
      "min_count": 1,
      "max_count": 1
    }
  ],
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count"]
  }
}
This declares:
- The algorithm consumes pings data
- The algorithm produces device_visits data
- The algorithm calculates a visit_count observation value
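The same reading of the manifest can be done programmatically. The sketch below parses the manifest fragment shown above and collects what it declares; the function name `summarize_manifest` and the keys of its return value are hypothetical.

```python
import json

# The manifest fragment from this section, as JSON text.
manifest_text = """
{
  "inputs": [{"data_type_name": "pings", "min_count": 1, "max_count": 1}],
  "outputs": {
    "output_data_types": ["device_visits"],
    "observation_value_columns": ["visit_count"]
  }
}
"""


def summarize_manifest(manifest: dict) -> dict:
    """Collect the Data Type names and observation columns a manifest declares."""
    outputs = manifest.get("outputs", {})
    return {
        "consumes": [item["data_type_name"] for item in manifest.get("inputs", [])],
        "produces": list(outputs.get("output_data_types", [])),
        "observation_values": list(outputs.get("observation_value_columns", [])),
    }


summary = summarize_manifest(json.loads(manifest_text))
print(summary)
```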
2. Platform Validates Data Types
When you register an algorithm, the platform validates:
- All specified Data Types exist
- Input and output declarations are complete
- Data Types are compatible with specified data sources
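Running similar checks locally before registering can catch simple mistakes early. The sketch below is not the platform's validator: `validate_manifest` and the `known_types` set are hypothetical, the `min_count`/`max_count` comparison is an extra sanity check, and data source compatibility is not covered.

```python
def validate_manifest(manifest: dict, known_types: set) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    inputs = manifest.get("inputs")
    outputs = manifest.get("outputs")
    # Input and output declarations must be present.
    if not inputs:
        problems.append("manifest declares no inputs")
    if not outputs or not outputs.get("output_data_types"):
        problems.append("manifest declares no output data types")
    # Every referenced Data Type must exist.
    for item in inputs or []:
        name = item.get("data_type_name")
        if name not in known_types:
            problems.append(f"unknown input data type: {name}")
        if item.get("min_count", 0) > item.get("max_count", float("inf")):
            problems.append(f"min_count exceeds max_count for input: {name}")
    for name in (outputs or {}).get("output_data_types", []):
        if name not in known_types:
            problems.append(f"unknown output data type: {name}")
    return problems


known = {"pings", "device_visits"}
good = {
    "inputs": [{"data_type_name": "pings", "min_count": 1, "max_count": 1}],
    "outputs": {"output_data_types": ["device_visits"]},
}
bad = {
    "inputs": [{"data_type_name": "pingz", "min_count": 1, "max_count": 1}],
    "outputs": {"output_data_types": []},
}

good_problems = validate_manifest(good, known)
bad_problems = validate_manifest(bad, known)
print(good_problems)
print(bad_problems)
```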
3. Data is Filtered and Prepared
When a computation runs:
- Input data is filtered by AOI (spatial filter)
- Input data is filtered by TOI (temporal filter)
- Data is formatted according to the Data Type schema
- File paths to the data are provided in algo_input.json
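To picture what the spatial and temporal filters do, here is a simplified sketch applied to ping rows. It assumes the AOI is a bounding box and the TOI is a half-open interval; the platform's actual filtering may use arbitrary geometries and different boundary rules. It does not read algo_input.json, whose layout is not described here, and all function names are hypothetical.

```python
from datetime import datetime, timezone


def in_aoi(row, bbox):
    """bbox is (min_lon, min_lat, max_lon, max_lat), a simplification of a real AOI."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon <= row["longitude"] <= max_lon
            and min_lat <= row["latitude"] <= max_lat)


def in_toi(row, start, end):
    """Keep rows whose timestamp falls in the half-open interval [start, end)."""
    return start <= row["timestamp"] < end


def filter_pings(rows, bbox, start, end):
    """Apply the spatial filter, then the temporal filter."""
    return [row for row in rows if in_aoi(row, bbox) and in_toi(row, start, end)]


utc = timezone.utc
pings = [
    # Inside the box and inside the time window.
    {"device_id": "dev-a", "latitude": 40.75, "longitude": -73.99,
     "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=utc)},
    # Outside the box.
    {"device_id": "dev-b", "latitude": 34.05, "longitude": -118.24,
     "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=utc)},
    # Inside the box but exactly at the end of the window, so excluded.
    {"device_id": "dev-c", "latitude": 40.70, "longitude": -74.00,
     "timestamp": datetime(2024, 1, 2, 0, 0, tzinfo=utc)},
]

kept = filter_pings(
    pings,
    bbox=(-74.1, 40.6, -73.8, 40.9),
    start=datetime(2024, 1, 1, tzinfo=utc),
    end=datetime(2024, 1, 2, tzinfo=utc),
)
print([row["device_id"] for row in kept])
```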
4. Algorithm Reads Input Data
Your algorithm reads data files that conform to the input Data Type:
import pandas as pd
# Read input data (format guaranteed by Data Type)
pings_df = pd.read_parquet(input_file_path)
# Columns are guaranteed to exist per the Data Type
for _, row in pings_df.iterrows():
    device_id = row['device_id']
    lat = row['latitude']
    lon = row['longitude']
    timestamp = row['timestamp']
    # Process the ping...
5. Algorithm Writes Output Data
Your algorithm writes results that conform to the output Data Type:
# Create output data matching the device_visits Data Type
visits_df = pd.DataFrame({
    'aoi_id': aoi_ids,
    'device_id': device_ids,
    'start': start_times,
    'finish': finish_times
})
# Write to parquet file
visits_df.to_parquet(output_file_path)
6. Platform Validates Output
The platform validates that:
- Output files conform to the declared Data Type schema
- All required columns are present
- Column types match the Data Type specification
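A similar check can be run locally before writing output. The sketch below validates in-memory rows against a schema dict; the mapping from schema type names to Python types is an assumption, and `validate_rows` is a hypothetical helper rather than the platform's validator.

```python
from datetime import datetime

# Assumed mapping from schema type names to Python types.
# Note: isinstance treats bool as a kind of int; a stricter check would special-case it.
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "datetime": datetime,
}

device_visits_schema = {
    "columns": [
        {"name": "aoi_id", "type": "string"},
        {"name": "device_id", "type": "string"},
        {"name": "start", "type": "datetime"},
        {"name": "finish", "type": "datetime"},
    ]
}


def validate_rows(rows, schema):
    """Return a list of problems: missing columns or values of the wrong type."""
    problems = []
    for index, row in enumerate(rows):
        for column in schema["columns"]:
            name = column["name"]
            expected = PYTHON_TYPES[column["type"]]
            if name not in row:
                problems.append(f"row {index}: missing column {name}")
            elif not isinstance(row[name], expected):
                problems.append(f"row {index}: column {name} is not {column['type']}")
    return problems


good_rows = [{"aoi_id": "aoi-1", "device_id": "dev-a",
              "start": datetime(2024, 1, 1, 9, 0), "finish": datetime(2024, 1, 1, 9, 45)}]
bad_rows = [{"aoi_id": "aoi-1", "device_id": 42,
             "start": datetime(2024, 1, 1, 9, 0)}]

good_problems = validate_rows(good_rows, device_visits_schema)
bad_problems = validate_rows(bad_rows, device_visits_schema)
print(good_problems)
print(bad_problems)
```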
Data Types in Analyses
When building an Analysis with multiple algorithms:
- Output-to-Input Matching
  - Algorithm A's output Data Type must match Algorithm B's input Data Type
  - This is validated when you register the Analysis
- Data Flow Example
  Algorithm 1: Object Detection
    Input: satellite_imagery
    Output: object_detections
      ↓
  Algorithm 2: Object Tracking
    Input: object_detections ← Matches output above
    Output: object_tracks
      ↓
  Algorithm 3: Route Analysis
    Input: object_tracks ← Matches output above
    Output: route_patterns
- Branching Analyses
  - One algorithm's output can feed multiple downstream algorithms
  - Multiple algorithms' outputs can be consumed by a single algorithm (if it accepts multiple inputs)
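The output-to-input rule for a linear chain can be checked in a few lines. This sketch uses the three algorithms from the data flow example; the dict layout and the function `find_mismatches` are hypothetical, and branching analyses would need a graph rather than a list.

```python
pipeline = [
    {"name": "Object Detection", "input": "satellite_imagery", "output": "object_detections"},
    {"name": "Object Tracking", "input": "object_detections", "output": "object_tracks"},
    {"name": "Route Analysis", "input": "object_tracks", "output": "route_patterns"},
]


def find_mismatches(steps):
    """Return (upstream, downstream) name pairs whose Data Types do not line up."""
    return [
        (up["name"], down["name"])
        for up, down in zip(steps, steps[1:])
        if up["output"] != down["input"]
    ]


# A deliberately broken variant: the last step expects pings instead of object_tracks.
broken = pipeline[:2] + [
    {"name": "Route Analysis", "input": "pings", "output": "route_patterns"}
]

ok_result = find_mismatches(pipeline)
broken_result = find_mismatches(broken)
print(ok_result)
print(broken_result)
```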
File Formats
Data Type files are typically stored as:
- Parquet files - Efficient columnar format (recommended)
- CSV files - For simple text data
- GeoJSON - For geographic data with complex geometries
The format is determined by the data source and Data Type specification.
Best Practices
When Creating Algorithms
- Use existing Data Types when possible
  - Leverage built-in types for interoperability
  - Only create custom types when necessary
- Keep Data Types simple
  - Include only essential columns
  - Use clear, descriptive column names
  - Document the purpose of each column
- Consider downstream consumers
  - Design output Data Types that other algorithms can use
  - Follow naming conventions consistent with the platform
When Creating Custom Data Types
- Document thoroughly
  - Provide clear descriptions
  - Explain the meaning of each column
  - Include example data
- Version carefully
  - If you need to change a Data Type, create a new version
  - Don't modify existing Data Types that algorithms depend on
- Test with sample data
  - Validate your Data Type with real data before registering
  - Ensure algorithms can successfully read and write the format
Data Type vs Data Source
It's important to distinguish between these concepts:
Data Type:
- Defines the schema (columns and types)
- Describes what the data represents
- Multiple data sources can provide the same Data Type
Data Source:
- Identifies where the data comes from
- Examples: "safegraph_pings", "planet_imagery", "ais_ship_positions"
- Each data source produces data conforming to a specific Data Type
Example:
Data Type: pings
- device_id (string)
- latitude (float)
- longitude (float)
- timestamp (datetime)
Data Sources that provide "pings":
- safegraph_pings
- cuebiq_pings
- mobilewalla_pings
All three data sources produce data with the same schema (the pings Data Type), but the data comes from different providers.
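The many-to-one relationship can be pictured as a mapping from data source to Data Type. In the sketch below, the three pings sources come from the example above; pairing planet_imagery with satellite_imagery is an assumption made only for illustration, and `sources_by_type` is a hypothetical helper.

```python
from collections import defaultdict

# Each data source maps to exactly one Data Type.
source_to_type = {
    "safegraph_pings": "pings",
    "cuebiq_pings": "pings",
    "mobilewalla_pings": "pings",
    "planet_imagery": "satellite_imagery",  # assumed pairing, for illustration
}


def sources_by_type(mapping):
    """Invert the mapping: {data type: sorted list of sources providing it}."""
    grouped = defaultdict(list)
    for source, data_type in mapping.items():
        grouped[data_type].append(source)
    return {data_type: sorted(sources) for data_type, sources in grouped.items()}


by_type = sources_by_type(source_to_type)
print(by_type["pings"])
```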
Next Steps
Now that you understand Data Types:
- Learn how to Create Algorithms that use Data Types
- Review the Data Types API Reference for detailed SDK methods
- Explore available Data Types using the SDK's data_type.list() method