Formats
Delivered to your pipeline.
Format-native means training-ready on arrival — no conversion step, no preprocessing overhead on your side. Format is specified and agreed at scoping, before capture begins.
Supported formats
Three native formats, one sample structure
We support the primary standards used in Physical AI research and development.
Hugging Face ecosystem · PyTorch-native
The Hugging Face LeRobot format is the emerging standard for robot learning datasets. Load directly with the datasets library — no custom data loaders required. Compatible with the full HF ecosystem: hub upload, versioning, model training pipelines.
from datasets import load_dataset
ds = load_dataset("gaitlabs/kitchen-manipulation-001")
# Episode structure (per frame):
{
"observation.image": tensor(H, W, 3), # RGB frame
"observation.depth": tensor(H, W), # metric depth
"observation.state": tensor(14), # joint positions
"annotation.hand_pose": tensor(42), # MANO keypoints
"annotation.segmentation": tensor(H, W), # instance mask
"annotation.qa_score": float, # clip-level score
"annotation.contact": int, # 0=none 1=pre 2=contact 3=grasp
"action": tensor(7)
}TensorFlow · JAX · Reverb-compatible
RLDS (Robot Learning Dataset Specification) is designed for TensorFlow and JAX training pipelines. Native integration with tf.data, Reverb replay buffers, and distributed training. Used by major embodied AI research groups for large-scale pretraining.
import tensorflow_datasets as tfds
ds = tfds.load("gaitlabs_kitchen_001")
# Episode metadata:
{
"episode_id": str,
"task": str, # "manipulation/pick_place"
"embodiment": str, # "custom_rig_v2"
"environment": str, # "kitchen"
"qa_score": float, # 0.94
"n_frames": int,
"annotation_layers": list, # ["depth", "pose", "seg", "contact"]
"rights_cleared": bool,
"gdpr_compliant": bool
}Cross-embodiment training · VLA pretraining
The Open X-Embodiment format enables training across multiple robot embodiments and data sources. Designed for foundation model pretraining — combine GaitLabs egocentric human data with robot demonstrations for richer pretraining signal.
{
"dataset_name": "gaitlabs_kitchen_001",
"embodiment": "human",
"modality": ["vision", "depth", "proprioception"],
"task_type": "manipulation",
"language_instruction": "pick up the red cup",
"license": "gaitlabs-commercial-v1",
"format": "open_x_embodiment",
"annotation_layers": {
"depth": "per_frame_metric",
"hand_pose": "mano_42kp",
"segmentation": "instance_mask",
"contact": "phase_label"
}
}Additional formats
HDF5, zarr, WebDataset, and custom
Hierarchical file format for large-scale numerical data. Supported for teams with existing HDF5 pipelines.
Cloud-native chunked array storage. Efficient for distributed access and large annotation arrays (masks, depth volumes).
Tar-based streaming format for high-throughput data loading. Suitable for very large datasets and distributed training.
Bespoke schema, field naming, or packaging? We agree format spec at scoping. If it's trainable, we can deliver it.
Don’t see your format? Custom schemas, field naming, and packaging are agreed at scoping. If your training pipeline can consume it, we can deliver it. Tell us what you need →
Format-native means the dataset arrives in the form your pipeline expects. No conversion step. No preprocessing overhead. Just load and train.
Ready to specify your dataset?
Format, volume, and annotation requirements are all agreed at scoping.