Data quality
Quality you can verify.
Six annotation layers, multi-stage QA, and full provenance on every dataset. Here’s exactly how the pipeline works.
Annotation layers
More than video
Every clip ships with six structured annotation layers, all cross-referenced and all QA-verified.
RGB Video
Multi-view + egocentric, calibrated intrinsics
Why it matters
Foundation layer — all other annotations reference the RGB frames. Calibrated camera models enable metric reconstruction.
How it’s produced
Captured with egocentric rigs at 24–60 fps. Camera intrinsics and extrinsics calibrated per device. Multi-view setups provide stereo depth reference.
Depth Maps
Per-frame metric depth, RGB-aligned
Why it matters
Spatial awareness for manipulation — grasp distance, object height, workspace geometry. Enables contact prediction and 3D trajectory learning.
How it’s produced
Depth from stereo or structured-light sensors, aligned to the RGB frames at metric scale. Holes and artifacts are flagged in QA.
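Because the depth maps are metric and RGB-aligned, every valid pixel can be lifted to a 3D point with the calibrated intrinsics, which is what makes metric reconstruction possible. The sketch below assumes a standard pinhole model with hypothetical parameters `fx`, `fy`, `cx`, `cy`, a depth map in metres stored as nested lists, and holes encoded as 0 or NaN. The delivered file format and hole encoding may differ.

```python
# Sketch: back-project an RGB-aligned metric depth map to camera-frame 3D points
# using a pinhole model. Assumptions (hypothetical): depth in metres as nested
# lists, and 0, negative, None or NaN marks a hole with no valid depth.
import math
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


def _is_hole(z: Optional[float]) -> bool:
    """True when a pixel carries no usable depth value."""
    return z is None or math.isnan(z) or z <= 0.0


def backproject(
    depth: List[List[Optional[float]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[List[Optional[Point3D]]]:
    """Return (X, Y, Z) in metres for each pixel (u, v), or None for holes."""
    points: List[List[Optional[Point3D]]] = []
    for v, row in enumerate(depth):
        out_row: List[Optional[Point3D]] = []
        for u, z in enumerate(row):
            if _is_hole(z):
                out_row.append(None)
                continue
            # Pinhole model: pixel offset from the principal point, scaled by depth.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            out_row.append((x, y, z))
        points.append(out_row)
    return points


def hole_fraction(depth: List[List[Optional[float]]]) -> float:
    """Fraction of pixels with missing depth, a simple per-frame QA signal."""
    total = sum(len(row) for row in depth)
    holes = sum(1 for row in depth for z in row if _is_hole(z))
    return holes / total if total else 0.0
```

A QA step could threshold `hole_fraction` per frame; the cut-off would be a project choice rather than something this page specifies.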
Pose Estimation
Full-body + hand/wrist keypoints, SMPL-compatible
Why it matters
Human motion imitation requires accurate joint-level representation. Hand keypoints are critical for dexterous manipulation policies.
How it’s produced
Full-body pose via lifting from egocentric video. Hand pose via a dedicated hand tracker (42 keypoints, MANO-compatible). Confidence score per keypoint.
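Per-keypoint confidence scores let a training pipeline ignore unreliable joints instead of learning from them. The sketch below assumes the 42 hand keypoints are 21 per hand (a common convention), delivered as hypothetical `(x, y, z, confidence)` tuples with the left hand first; the thresholds 0.5 and 0.8 are illustrative only.

```python
# Sketch: split and confidence-filter the 42 hand keypoints of one frame.
# Assumptions (hypothetical, not the delivered format): (x, y, z, confidence)
# tuples, first 21 = left hand, last 21 = right hand, illustrative thresholds.
from typing import List, Optional, Sequence, Tuple

Keypoint = Tuple[float, float, float, float]  # (x, y, z, confidence)
NUM_PER_HAND = 21  # assumption: 42 keypoints = 21 per hand x 2 hands


def split_hands(kps: Sequence[Keypoint]) -> Tuple[List[Keypoint], List[Keypoint]]:
    """Split a 42-keypoint frame into (left, right) hands."""
    if len(kps) != 2 * NUM_PER_HAND:
        raise ValueError(f"expected {2 * NUM_PER_HAND} keypoints, got {len(kps)}")
    return list(kps[:NUM_PER_HAND]), list(kps[NUM_PER_HAND:])


def mask_low_confidence(
    kps: Sequence[Keypoint], min_conf: float = 0.5
) -> List[Optional[Keypoint]]:
    """Replace keypoints below min_conf with None so a loss can skip them."""
    return [kp if kp[3] >= min_conf else None for kp in kps]


def hand_is_usable(
    hand: Sequence[Keypoint], min_conf: float = 0.5, min_fraction: float = 0.8
) -> bool:
    """True if at least min_fraction of the hand's keypoints are confident."""
    if not hand:
        return False
    confident = sum(1 for kp in hand if kp[3] >= min_conf)
    return confident / len(hand) >= min_fraction
```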
Segmentation Masks
Object + scene segmentation, temporally consistent
Why it matters
Object-centric training, affordance learning, and scene understanding. Temporal consistency enables tracking-based training objectives.
How it’s produced
Instance and semantic segmentation on each frame. Temporal propagation for consistency. Object IDs maintained across clips.
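Temporal propagation can be illustrated with a generic technique: match each new mask to the previous frame's instance with the highest overlap and keep its ID. The sketch below uses greedy intersection-over-union matching as a stand-in; it is not necessarily the method this pipeline uses. Masks are hypothetical sets of `(row, col)` pixels, and `min_iou=0.5` is an illustrative threshold.

```python
# Sketch: carry instance IDs from one frame to the next by greedy mask IoU.
# A generic stand-in for temporal propagation, not the pipeline's actual
# method; masks are hypothetical frozensets of (row, col) pixel coordinates.
from typing import Dict, FrozenSet, List, Set, Tuple

Mask = FrozenSet[Tuple[int, int]]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def propagate_ids(
    prev: Dict[int, Mask],
    current: List[Mask],
    next_id: int,
    min_iou: float = 0.5,
) -> Tuple[Dict[int, Mask], int]:
    """Assign IDs to current-frame masks; unmatched masks get fresh IDs.

    Returns the ID -> mask mapping for this frame and the next unused ID.
    """
    # Score every (current mask, previous ID) pair, best overlaps first.
    pairs = sorted(
        (
            (iou(mask, prev_mask), i, pid)
            for i, mask in enumerate(current)
            for pid, prev_mask in prev.items()
        ),
        reverse=True,
    )
    assigned: Dict[int, Mask] = {}
    matched: Set[int] = set()
    for score, i, pid in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if i in matched or pid in assigned:
            continue  # each mask and each previous ID is used at most once
        assigned[pid] = current[i]
        matched.add(i)
    # Objects that did not match anything (e.g. newly visible) get new IDs.
    for i, mask in enumerate(current):
        if i not in matched:
            assigned[next_id] = mask
            next_id += 1
    return assigned, next_id
```

Greedy matching is simple and deterministic; an optimal assignment (for example, the Hungarian algorithm) would be the usual upgrade when many objects overlap.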
Contact Timing
Grasp initiation, release, and contact-state labels
Why it matters
Contact-rich manipulation requires knowing exactly when and how the hand contacts the object, through the pre-contact, contact, grasp, and release phases.
How it’s produced
Annotated via a combination of hand-pose signals and reviewer marking. Per-frame contact-state label: none / pre-contact / contact / grasp / release.
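Per-frame contact states can be collapsed into timed phases, which is where grasp initiation and release times come from. The sketch below uses the five label values listed above; the `ContactState` enum, the segment tuple layout, and the helper names are hypothetical.

```python
# Sketch: turn per-frame contact-state labels into phases and event frames.
# The label strings match the five states above; the enum and helper names
# are hypothetical illustrations, not a delivered API.
from enum import Enum
from itertools import groupby
from typing import List, Tuple


class ContactState(Enum):
    NONE = "none"
    PRE_CONTACT = "pre-contact"
    CONTACT = "contact"
    GRASP = "grasp"
    RELEASE = "release"


Segment = Tuple[ContactState, int, int]  # (state, first_frame, last_frame), inclusive


def to_segments(labels: List[ContactState]) -> List[Segment]:
    """Run-length encode consecutive identical labels into phases."""
    segments: List[Segment] = []
    frame = 0
    for state, run in groupby(labels):
        n = sum(1 for _ in run)
        segments.append((state, frame, frame + n - 1))
        frame += n
    return segments


def event_frames(labels: List[ContactState], state: ContactState) -> List[int]:
    """Frames where a run of `state` begins, e.g. grasp initiation or release."""
    return [start for s, start, _ in to_segments(labels) if s is state]


def frame_to_seconds(frame: int, fps: float) -> float:
    """Convert a frame index to a timestamp at the clip's capture rate."""
    return frame / fps
```

Usage: parse labels with `ContactState("pre-contact")`, then `event_frames(labels, ContactState.GRASP)` gives the frame at which each grasp starts.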
Structured Metadata
Environment, task, embodiment, QA score — per clip
Why it matters
Enables filtering, curriculum learning, and dataset management. Task-level labels connect clips to training objectives.
How it’s produced
Annotated at capture time and reviewed post-annotation. Labels: environment category, task type, embodiment type, QA score, capture date.
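The per-clip labels above are also what automated metadata schema validation would check. The sketch below is one way to express that check; the snake_case field names, the assumption that QA scores lie in [0, 1], and the ISO 8601 date format are all hypothetical choices, not the delivered schema.

```python
# Sketch: a per-clip metadata record and a schema check for it.
# Field names, the [0, 1] QA-score range and ISO 8601 dates are assumptions.
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, List

REQUIRED_FIELDS = (
    "environment_category",
    "task_type",
    "embodiment_type",
    "qa_score",
    "capture_date",
)
TEXT_FIELDS = ("environment_category", "task_type", "embodiment_type")


@dataclass(frozen=True)
class ClipMetadata:
    environment_category: str
    task_type: str
    embodiment_type: str
    qa_score: float
    capture_date: date


def validate_metadata(raw: Dict[str, Any]) -> List[str]:
    """Return a list of schema errors; an empty list means the record passes."""
    errors = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in raw]
    if errors:
        return errors
    for k in TEXT_FIELDS:
        if not isinstance(raw[k], str) or not raw[k].strip():
            errors.append(f"{k} must be a non-empty string")
    score = raw["qa_score"]
    # bool is a subclass of int, so reject it explicitly; NaN fails the range test.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        errors.append("qa_score must be a number in [0, 1]")
    try:
        date.fromisoformat(raw["capture_date"])
    except (TypeError, ValueError):
        errors.append("capture_date must be an ISO 8601 date (YYYY-MM-DD)")
    return errors


def parse_metadata(raw: Dict[str, Any]) -> ClipMetadata:
    """Validate a raw record and convert it to a typed ClipMetadata."""
    errors = validate_metadata(raw)
    if errors:
        raise ValueError("; ".join(errors))
    return ClipMetadata(
        environment_category=raw["environment_category"],
        task_type=raw["task_type"],
        embodiment_type=raw["embodiment_type"],
        qa_score=float(raw["qa_score"]),
        capture_date=date.fromisoformat(raw["capture_date"]),
    )
```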
QA methodology
Four-stage review before delivery
No clip reaches delivery without passing automated checks, human review, and a scoring threshold.
Automated check
Frame completeness, blur detection, annotation file integrity, metadata schema validation. Clips failing automated checks are flagged before human review.
Human review
Reviewers check annotation accuracy, task completion, and content quality. Rejection criteria: incomplete task, misaligned annotations, out-of-spec content.
QA scoring
Each approved clip receives a per-layer accuracy score. The dataset-level QA score is the weighted mean across layers and clips, and it must meet a threshold before delivery.
Final report
A per-dataset QA report is generated covering clip count, rejection rate, per-layer scores, and a provenance summary. It is delivered alongside the dataset.
Example QA score: 0.94
Clips scoring below the threshold are rejected before delivery. The exact threshold is agreed at scoping.
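One plausible reading of the scoring rule: each clip gets a weighted mean of its per-layer scores, clips below the threshold are rejected, and the dataset score is the weighted mean of all layer scores across the approved clips, that is, the sum of w_l * s_(c,l) divided by the sum of w_l over every approved clip c and layer l it carries. The layer weights, layer names, and threshold in the sketch are hypothetical; the real values are agreed at scoping.

```python
# Sketch: clip-level and dataset-level QA scoring as weighted means.
# Layer names, weights and the threshold are hypothetical; layers missing from
# a clip are simply left out of both numerator and denominator.
from typing import Dict, List, Mapping, Optional


def clip_score(layer_scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of one clip's per-layer scores."""
    num = sum(weights[layer] * s for layer, s in layer_scores.items())
    den = sum(weights[layer] for layer in layer_scores)
    if den == 0:
        raise ValueError("clip has no weighted layers")
    return num / den


def qa_report(
    clips: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float],
    threshold: float,
) -> Dict[str, object]:
    """Reject clips below threshold, then score the approved set."""
    approved: Dict[str, Mapping[str, float]] = {}
    rejected: List[str] = []
    for clip_id, scores in clips.items():
        if clip_score(scores, weights) >= threshold:
            approved[clip_id] = scores
        else:
            rejected.append(clip_id)
    # Dataset score: weighted mean over every (approved clip, layer) pair.
    num = sum(weights[l] * s for scores in approved.values() for l, s in scores.items())
    den = sum(weights[l] for scores in approved.values() for l in scores)
    dataset_score: Optional[float] = num / den if den else None
    return {
        "clip_count": len(approved),
        "rejected": sorted(rejected),
        "rejection_rate": len(rejected) / len(clips) if clips else 0.0,
        "dataset_qa_score": dataset_score,
    }
```

When every approved clip carries every layer, this dataset score equals the plain mean of the clip scores; weighting at the layer level only matters when some clips lack a layer.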
Provenance
Full audit trail on every dataset
Every dataset includes a provenance record — traceable, inspection-ready, no gaps.
Capture context
Capture date, environment category, and task type — recorded at time of capture.
Collector consent reference
Consent record ID for each collector. Verifiable without exposing personal identity.
Annotation pipeline version
Which annotation pipeline version processed each clip — reproducible and auditable.
QA record
Reviewer ID, review date, pass/fail outcome, and score for every reviewed clip.
Rights clearance status
Global rights clearance status per clip — confirmed before delivery.
Delivery manifest
Full manifest of files, checksums, and layer availability delivered with each dataset.
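A delivery manifest lets the recipient confirm that every file arrived intact and see which layers are present. The sketch below uses SHA-256 from Python's standard `hashlib` and assumes, hypothetically, one top-level directory per annotation layer; the actual checksum algorithm and directory layout are not specified on this page.

```python
# Sketch: build and verify a delivery manifest with SHA-256 checksums.
# Assumptions (hypothetical): one top-level directory per annotation layer,
# SHA-256 as the checksum, and the manifest written outside the dataset root.
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large video files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, layer_dirs: Iterable[str]) -> Dict[str, object]:
    """Map each file (relative POSIX path) to its checksum and record layer availability."""
    files = {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    layers = {name: (root / name).is_dir() for name in layer_dirs}
    return {"files": files, "layers": layers}


def verify_manifest(root: Path, manifest: Dict[str, object]) -> List[str]:
    """Return files that are missing or whose checksum no longer matches."""
    files: Dict[str, str] = manifest["files"]  # type: ignore[assignment]
    return sorted(
        rel
        for rel, digest in files.items()
        if not (root / rel).is_file() or sha256_file(root / rel) != digest
    )


if __name__ == "__main__":
    # Demo on a throwaway directory containing one hypothetical layer file.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "depth").mkdir()
        (root / "depth" / "clip_0001.bin").write_bytes(b"abc")
        manifest = build_manifest(root, ["rgb", "depth"])
        print(json.dumps(manifest, indent=2))
        print("mismatches:", verify_manifest(root, manifest))
```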
See what we can capture or request a sample to review annotation quality first-hand.