PaXini Omnimodal Embodied Intelligence Dataset

OmniSharingDB

OmniSharing DB is an omni-modal embodied dataset covering tactile, visual, audio, text, and joint data, providing comprehensive support for embodied AI models from physical perception to semantic cognition. It is China’s first government-supported, fully compliant embodied AI cross-border data project, strictly adhering to the Cybersecurity Law, Data Security Law, and PIPL.

Legally Exportable

Strictly compliant with China’s Cybersecurity Law, Data Security Law, and Personal Information Protection Law (PIPL), with official approval for cross-border data transfer.

High-speed Transfer

PaXini collaborates with leading global cloud service partners to build a “fast lane” for large-scale data delivery.

Compliance Channel

Establishes a compliant pipeline from collection, processing, and certification to cross-border transfer, enabling global circulation for embodied data.

Dataset Features

Comprehensive data diversity for embodied AI development

Tactile Module

Records high-precision, distributed tactile interaction data.

Joint Module

Records joint angles from the collection glove and tracks global trajectory data.

Visual Module

Provides synchronized multi-view, high-frame-rate video streams that comprehensively capture the task scenario.

Spatial Trajectory

Processes multi-view synchronized video streams and annotates the corresponding spatial trajectories using estimation algorithms.

Voice Module

Voice data collected before task execution, describing the task.

Text Module

Structured textual information obtained by converting the collected Voice data.

Collection Devices

PXCap V Ultra

A proprietary multi-modal hand sensing system that integrates a sensing glove, a camera array, and an IPC to acquire hand kinematics, full-domain tactile data, and task videos.

82 DoF · Full-hand tactile coverage · Ultra-compact anti-magnetic encoder

PXCap III

PXCap III combines isomorphic data collection and execution with high-precision tactile and motion capture. It provides reliable, high-quality data for embodied AI in real-world scenarios.

Isomorphic Collection and Execution · High-Precision Sensing · Strong Actuation Performance · Human-Centric

Collection Solutions

In-Domain Collection

Targeting dexterous manipulation scenarios, human-centric high-precision sampling generates high-fidelity haptic data to close model data loops.

Operation-focused · Human-Centric · Precision-guaranteed

In the Wild Collection

Open collection builds on in-the-wild data collection: it relaxes scenario constraints to enrich the model’s environmental awareness, improves long-sequence task planning performance, and evolves iteratively to diversify data collection solutions.

Egocentric Data · Low-Cost Large-Scale Collection · Real-Environment Collection

Scene Types

15+N

Task Categories

100+

Task Categories

1000+

All Scenes

Office

Home

Healthcare

Medical

Rehabilitation

Logistics

Education

Entertainment

3C Manufacturing

Restaurant

Agriculture

Business

Retail

Disaster Rescue

Automotive

Industrial

Meta Action

Data Generalization

Cross-embodiment capabilities and comprehensive data characteristics

Data Features

Simulation Env

High-fidelity physical sensing & simulation platform

Digital Assets

High-fidelity 3D Gaussian splatting

Multi-Adaptability

High-accuracy retargeting & response

Different Methods

Comparing different data collection approaches for embodied AI

Approaches are compared on Efficiency, Motion Fidelity, Tactile Fidelity, and Cross-embodiment Naturalness:

Human-Centric Data Collection (PaXini PMEC Hyper-collection Technology)

Teleoperation (Traditional Technology)

Simulated Data Synthesis (Traditional Technology)

Data Format

Hierarchical dataset organization across all processing stages

DF-1

The overall input: raw data after preprocessing and quality inspection

DF-2

1st output: DF-1 with encoder and tactile data parsed; adds bimanual and object poses; includes both action and observation

The observation layer in DF-2 and DF-2R is the state data in the episode, while action is one frame behind observation. To ensure both arrays have equal length, the last frame in action is repeated.
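The length-matching rule above can be sketched in a few lines. This is a minimal illustration rather than the toolkit's implementation: it assumes the action at frame t is the state observed at frame t + 1 (the source does not spell out the direction of the shift), and the name `build_action` is hypothetical.

```python
from typing import List, Sequence


def build_action(observation: Sequence[Sequence[float]]) -> List[List[float]]:
    """Derive an action array with the same length as `observation`.

    Assumption (not confirmed by the dataset documentation): the action at
    frame t is the state observed at frame t + 1. The shifted array is one
    frame short, so its last frame is repeated to restore equal length.
    """
    if not observation:
        return []
    # Drop the first frame so that index t holds the state of frame t + 1.
    shifted = [list(frame) for frame in observation[1:]]
    # Repeat the final frame; a single-frame episode reuses its only frame.
    last = shifted[-1] if shifted else list(observation[-1])
    shifted.append(list(last))
    return shifted


# Three frames of a two-joint state, with purely illustrative values.
obs = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
act = build_action(obs)
```

With this convention `obs[t]` and `act[t]` can be zipped directly into training pairs, and the repeated final frame acts as a "hold" action at the end of the episode.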

DF-2R

2nd output: DF-2 retargeted to a dexterous hand model

DF-3

3rd output: converts DF-2R to the LeRobot dataset format; can be used for VLA model training

Dataset Structure DF-1

task_list
  task_XXX
    episode_0_HHMMSS_CollectionDate_RoomID_CollectorsID.hdf5
      dataset
        meta (Task Description)
        observation
          audio (Compressed Audio Stream, Including Text Description)
          state
            left_hand
              encoder
                data
                timestamp
              tactile
                data
                timestamp
            right_hand
              encoder
                data
                timestamp
              tactile
                data
                timestamp
          image
            rgbd_rgb_extrinsic
            RGB_CameraXXX
              timestamp
              data
              extrinsics
              intrinsics
            RGBD_CameraXXX
              timestamp
              extrinsics
              inner_extrinsics
              color
                data
                intrinsics
              depth
                data
              left
                data
                intrinsics
              right
                data
                intrinsics
    episode_1_HHMMSS_CollectionDate_RoomID_CollectorsID.hdf5
    episode_2_HHMMSS_CollectionDate_RoomID_CollectorsID.hdf5
    ......
  task_XXX
  ......
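The episode file names in the structure above follow the pattern `episode_<index>_HHMMSS_CollectionDate_RoomID_CollectorsID.hdf5`, so their metadata can be recovered without opening the file. The sketch below is one way to do that under stated assumptions: no field contains an underscore, HHMMSS is six digits, and the date and ID formats are left as opaque strings because the source does not specify them. `EpisodeName` and `parse_episode_name` are hypothetical names.

```python
import re
from typing import NamedTuple


class EpisodeName(NamedTuple):
    index: int
    time: str          # HHMMSS, kept as a string
    date: str          # CollectionDate; format not specified in the source
    room_id: str
    collector_id: str


# Assumes no field contains an underscore; adjust if RoomID or
# CollectorsID can contain one.
_PATTERN = re.compile(
    r"^episode_(?P<index>\d+)_(?P<time>\d{6})_(?P<date>[^_]+)"
    r"_(?P<room>[^_]+)_(?P<collector>[^_]+)\.hdf5$"
)


def parse_episode_name(filename: str) -> EpisodeName:
    """Split an episode file name into its metadata fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected episode file name: {filename!r}")
    return EpisodeName(
        index=int(match["index"]),
        time=match["time"],
        date=match["date"],
        room_id=match["room"],
        collector_id=match["collector"],
    )


# Hypothetical file name; the date and ID values are made up for illustration.
info = parse_episode_name("episode_0_143052_20250101_R01_C007.hdf5")
```

Rejecting names that do not match, instead of guessing, keeps mislabeled files from silently entering an index of episodes.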

OmniSharing Toolkit

PX Pose

The pose estimation and processing module

Raw visual data parsing

Parses RGBD camera information from DF-1 data and temporally aligns the multi-camera streams to provide input for pose estimation.
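The source does not describe how the multi-camera streams are aligned in time. As an illustration only, a common approach is nearest-timestamp matching against a reference camera, with a tolerance beyond which a frame counts as missing. The function names and the `max_gap` parameter below are assumptions, not part of PX Pose.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Index of the entry in sorted, non-empty `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier frame.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i


def align_to_reference(
    reference: Sequence[float],
    other: Sequence[float],
    max_gap: float,
) -> List[Optional[int]]:
    """Match each reference timestamp to the closest frame index in `other`.

    Returns None for a reference frame when the closest frame is more than
    `max_gap` seconds away, or when `other` is empty.
    """
    if not other:
        return [None] * len(reference)
    aligned: List[Optional[int]] = []
    for t in reference:
        j = nearest_index(other, t)
        aligned.append(j if abs(other[j] - t) <= max_gap else None)
    return aligned


# Illustrative timestamps in seconds for two cameras.
cam_a = [0.000, 0.033, 0.066, 0.100]
cam_b = [0.002, 0.030, 0.071, 0.250]
pairs = align_to_reference(cam_a, cam_b, max_gap=0.010)
```

Here the last reference frame has no partner within 10 ms, so it is reported as `None` and not paired with a stale frame.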

Pose Estimation of Wristbands and Objects

Uses binocular infrared depth estimation. Wristband poses are detected automatically, while objects must be manually labeled with masks before their poses can be estimated.

PX Post-Process

Parses and converts the data once all raw data has been obtained.

Parsing

Processes the raw data output from PX Pose and generates the DF-2 data format.

Retargeting

Retargets DF-2 data to the embodiment and outputs DF-2R.

Conversion

Processes DF-2R data and exports it in the LeRobot data format.
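The three post-processing steps form a linear chain over the data formats (DF-1 → DF-2 → DF-2R → DF-3). The lookup below is a small sketch of that ordering, handy for checking which steps a given file still needs. The table layout and the helper `steps_between` are illustrative and not part of the toolkit's interface.

```python
from typing import List

# Format order and the step that produces each format, as described in the text.
FORMATS = ["DF-1", "DF-2", "DF-2R", "DF-3"]
PRODUCED_BY = {
    "DF-2": "Parsing",       # also consumes the pose output of PX Pose
    "DF-2R": "Retargeting",
    "DF-3": "Conversion",    # exports to the LeRobot data format
}


def steps_between(current: str, target: str) -> List[str]:
    """List the post-processing steps needed to go from `current` to `target`."""
    start, end = FORMATS.index(current), FORMATS.index(target)
    if end < start:
        raise ValueError(f"cannot go backwards from {current} to {target}")
    return [PRODUCED_BY[fmt] for fmt in FORMATS[start + 1 : end + 1]]


remaining = steps_between("DF-1", "DF-3")
```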

Data Closed Loop

Real & Simulated Data Verification

Publications

Paper · Dataset · Code

OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing

Paper

TacCompress: A Benchmark for Multi-Point Tactile Data Compression in Dexterous Manipulation

PaXini · Shenzhen

+86 755 2357 4593

PaXini · Shanghai

+86 21 5456 1536

PaXini · Tianjin

+86 22 8241 1882

mkt@paxini.com (China)

sales_global@paxini.com (Overseas)

hr@paxini.com