Training data scoped to your model, not a catalog.
We design the dataset around your task, environment, and annotation requirements — then deliver it in your format with a QA report attached.
Use cases
What teams train with our data
Egocentric video with structured annotation — applicable across the core tasks in physical AI development.
Manipulation policy training
Demonstrations for pick-and-place, tool use, assembly, and fine-motor tasks. Egocentric view with hand-pose and depth annotation for contact-rich manipulation.
Imitation learning
High-quality, consistent demonstrations across diverse conditions: same task, varied environment, varied operator. The variation helps policies generalize beyond the capture conditions.
Foundation model pretraining
Large-scale, diverse egocentric video with rich annotation structure, suitable for vision-language-action (VLA) pretraining and world model initialization across multiple embodiment types.
World model training
Temporally coherent egocentric sequences with depth and object-level annotation. Captures physical interactions, state changes, and environment transitions.
Why GaitLabs
Five things that matter to a robotics data lead
Design-partner approach
We build the dataset with you, not for a generic catalog. Scoped to your task, environment, and model requirements — before capture begins.
Format-native delivery
LeRobot, RLDS, Open X-Embodiment, or a custom schema. No conversion step. No preprocessing overhead. Directly into your training pipeline.
Rights-cleared globally
Licensed real-world data. Multi-jurisdiction rights clearance, GDPR-aligned. Not synthetic, not scraped. Safe for commercial model training.
Technical transparency
We show the annotation methodology, QA scoring, and provenance up front. You know exactly what you're training on before you commit.
Transparent scoping
One short call to confirm scope and timeline. No long RFP cycle, no black-box pricing. You get a dataset design proposal before any work starts.
What you get
Every dataset includes
Annotation layers
RGB, depth maps, pose estimation, segmentation masks, contact timing, and structured metadata — per clip.
QA report
Per-clip annotation accuracy scores and an overall dataset quality summary, delivered with the data.
Format-native delivery
LeRobot, RLDS, Open X-Embodiment, or custom. No conversion step on your side.
Provenance record
Full chain-of-custody: capture date, environment, consent reference, annotation version, QA score.
Questions about annotation methodology or QA scoring? Ask us →
Ready to scope your dataset?
Tell us what you’re training. We’ll propose a dataset design.