Early access · First cohort

Training data scoped to your model, not pulled from a catalog.

We design the dataset around your task, environment, and annotation requirements — then deliver it in your format with a QA report attached.

Use cases

What teams train with our data

Egocentric video with structured annotation — applicable across the core tasks in physical AI development.

Manipulation policy training

Demonstrations for pick-and-place, tool use, assembly, and fine-motor tasks. Egocentric view with hand pose + depth annotation for contact-rich manipulation.

Hand pose · Depth · Segmentation · Contact timing

Imitation learning

High-quality, consistent demonstrations across diverse conditions: same task, varied environment, varied operator. The variation is intended to help policies generalize beyond the conditions seen in training.

Full-body pose · Segmentation · Structured metadata

Foundation model pretraining

Large-scale diverse egocentric video with rich annotation structure — suitable for VLA pretraining and world model initialization across multiple embodiment types.

Depth · Pose · Segmentation · Activity labels

World model training

Temporally coherent egocentric sequences with depth and object-level annotation. Captures physical interactions, state changes, and environment transitions.

Depth · Segmentation · Contact timing · Metadata

Why GaitLabs

Five things that matter to a robotics data lead

Design-partner approach

We build the dataset with you, not for a generic catalog. Scoped to your task, environment, and model requirements — before capture begins.

Format-native delivery

LeRobot, RLDS, Open X-Embodiment, or a custom schema. No conversion step. No preprocessing overhead. Directly into your training pipeline.

Rights-cleared globally

Licensed real-world data. Multi-jurisdiction rights clearance, GDPR-aligned. Not synthetic, not scraped. Safe for commercial model training.

Technical transparency

We show the annotation methodology, QA scoring, and provenance upfront. You know exactly what you're training on before you commit.

Transparent scoping

One short call to confirm scope and timeline. No long RFP cycle, no black-box pricing. You get a dataset design proposal before any work starts.

What you get

Every dataset includes

Annotation layers

RGB, depth maps, pose estimation, segmentation masks, contact timing, and structured metadata — per clip.

QA report

Per-clip annotation accuracy scores and an overall dataset quality summary, delivered with the data.

Format-native delivery

LeRobot, RLDS, Open X-Embodiment, or custom. No conversion step on your side.

Provenance record

Full chain-of-custody: capture date, environment, consent reference, annotation version, QA score.

Questions about annotation methodology or QA scoring? Ask us →

Ready to scope your dataset?

Tell us what you’re training. We’ll propose a dataset design.