BUILD. TEST. DEPLOY

UnitPort

A free, unified, open-source gateway for the robotics community.
One tool to rule them all.

Demo Available - Windows ✔ - Linux ✔ - MacOS ⚠ (WIP)

ROS2 & MORE

# ☕️ Updates: v1.0.0 Released

# =========== 2026-05-20 =========== #

> First stable public release;

> Mature Stable-Baselines3 on MuJoCo;

> IsaacLab/IsaacSim support is present and wired, including PPO and AMP workflows, but remains beta for advanced features;

> Exported artifacts (manifest, policy, and deploy/runtime contract) can be loaded on another machine without rebuilding them;

> Basic ROS2 connectivity, plus deployment for both greenfield and brownfield projects.

> UAVs, waypoint autonomy, rich scene interaction, VLA are not part of the supported v1.0.0 training envelope.

> ⚠ Bipedal / humanoid training requires detailed instructions and expert knowledge.


# =========== WHAT'S NEXT =========== #

> Fixing rough edges discovered during use.

> Refine the quadruped training framework to a stable level.

> Fill the key gaps in the humanoid training framework and add DeepMimic-style motion tracking.

> Improve device connectivity and provide an actuator-model calibration interface.

What is UnitPort?

Making robotics training and deployment more accessible to everyone.

Code + Visual Nodes

Build robot training pipelines visually with nodes, while keeping full control of the source code when needed.

Design, edit, and debug workflows from a unified workspace.

Community Driven & Open Source

No paywalls, locked workflows, or gated core features. UnitPort Studio is built as a community-friendly robotics studio with free and open development at its core.

Multi-Robot Training, easy switch

Switch between robots, policies, assets, and training environments with minimal setup. Build reusable workflows and run sim2sim transfers across different robotics stacks.

Easy Sim2Sim Deployment

Train and transfer policies across IsaacLab, MuJoCo, and real robot environments with streamlined sim2sim workflows and deployment tools.

Sim2Sim Design

How we deal with the IsaacLab & MuJoCo sim2sim Gap

1. Unified Joint Schema
2. Stance Auto Calibration
3. Mass Matrix Adaptive PD Gains
4. Closed Loop Validation

Why this matters

Most sim2real failures attributed to "the reality gap" are in fact sim2sim failures hiding upstream — the policy never executed faithfully even on the second simulator, let alone hardware.

By making joint topology, initial state, and actuator dynamics provably consistent across engines, UnitPort reduces the deployment problem to a single remaining unknown: contact and friction modeling, which is the honest reality gap and the right place to spend domain randomization budget.

Composite Rewards

1. Reward Decomposition

Composite Per-Motion Rewards — Design Notes: A locomotion policy trained against a flat-sum reward routinely finds local optima that exploit terms shaped for a different motion regime — a standing policy farming gait-cadence reward designed for running is the textbook failure. We decompose the per-step return into a command-conditioned mixture over motion items, each carrying its own term bag with locally-calibrated shaping coefficients.

Instead of the flat sum R(s,a) = Σᵢ wᵢ · rᵢ(s,a), we factor the return through a per-item mixture indexed by motion k (stand / walk / run / turn / strafe / …):

R(s, a, c) = Σ_k  α_k(c) · ( Σ_t  w_{k,t} · r_t(s, a) )

where c ∈ ℝ³ is the velocity command [v_x, v_y, ω_yaw], and α_k(c) is a unit-sum mixture weight recomputed each step. Shaping coefficients w_{k,t} are calibrated per motion: the gait-cadence weight for run can be aggressive without leaking into stand, because α_run(c) falls to zero whenever the command says stop.
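As a minimal sketch of the factored return above, assuming the mixture weights α_k(c) have already been computed, the per-step reward can be evaluated as follows. The function name `composite_reward`, the dictionary layout, and the example term names and values are illustrative assumptions, not UnitPort's actual API.

```python
from typing import Dict


def composite_reward(
    alpha: Dict[str, float],
    weights: Dict[str, Dict[str, float]],
    terms: Dict[str, float],
) -> float:
    """Evaluate R(s, a, c) = sum_k alpha_k(c) * sum_t w_{k,t} * r_t(s, a).

    alpha   : mixture weight per motion item (assumed to sum to 1)
    weights : per-item term bag, weights[item][term] = w_{k,t}
    terms   : raw term values r_t(s, a) for the current step
    """
    total = 0.0
    for item, a_k in alpha.items():
        # Inner sum over the term bag that belongs to this motion item.
        item_return = sum(w * terms[t] for t, w in weights[item].items())
        total += a_k * item_return
    return total


# Hypothetical term values and per-item weights for illustration only.
terms = {"gait_cadence": 1.0, "upright": 0.5}
weights = {
    "stand": {"upright": 2.0},
    "run": {"gait_cadence": 4.0, "upright": 1.0},
}

# Command says stop: all mixture mass sits on "stand".
r_stop = composite_reward({"stand": 1.0, "run": 0.0}, weights, terms)
# Command says run: all mixture mass sits on "run".
r_run = composite_reward({"stand": 0.0, "run": 1.0}, weights, terms)
```

With these made-up numbers, the aggressive `gait_cadence` weight contributes only when the run item is active; at a stop command the reward comes entirely from the stand term bag.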

2. Trapezoidal Membership with Adaptive Width

Each motion item declares per-channel command ranges [lo_{k,j}, hi_{k,j}]. The per-axis membership is a clipped-linear ramp; the full item score is the product across constrained axes:

	m_{k,j}(c_j) = clip( (c_j − lo_{k,j}) / w_{k,j} , 0, 1 )
	· clip( (hi_{k,j} − c_j) / w_{k,j} , 0, 1 )

	w_{k,j} = min( w_global , (hi_{k,j} − lo_{k,j}) / 2 )
	α_k(c) = ( ∏_j  m_{k,j}(c_j) )  /  Z(c) ,
	Z(c) = Σ_k    ∏_j  m_{k,j}(c_j)

The adaptive width w_{k,j} = min(w_global, ½·range) is the load-bearing detail. (Here w_{k,j} is the ramp width on command axis j, not a reward shaping coefficient w_{k,t}.) A naïve fixed-width ramp lets wide items dominate near the origin: a stand item with range ±0.05 would lose to a walk item [0.1, 0.6] at c = 0 because walk's edges sit further from any single command point. Capping w_{k,j} at each item's half-range forces every item to saturate at membership 1 inside its own hyperbox, restoring scale-invariance. When Z(c) → 0 (the command lies outside every item's hyperbox by more than a few blend widths), the resolver falls back to the single nearest item, chosen by argmin of the L2 distance to each item's range centre, because a uniform smear over unrelated motions corrupts the reward signal more than a single misattribution does.
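The membership formulas and the nearest-item fallback can be sketched in plain Python as below. This is an illustration under stated assumptions, not the project's resolver: the names `axis_membership` and `mixture`, the `eps` threshold used to detect Z(c) ≈ 0, the use of `None` for an unconstrained axis, and the requirement that every declared range has hi > lo are all choices made for this example.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Range = Tuple[float, float]


def axis_membership(c: float, lo: float, hi: float, w_global: float) -> float:
    """Clipped-linear ramp on one axis, with width capped at the half-range."""
    width = min(w_global, (hi - lo) / 2.0)  # adaptive width (assumes hi > lo)
    rise = min(max((c - lo) / width, 0.0), 1.0)
    fall = min(max((hi - c) / width, 0.0), 1.0)
    return rise * fall


def mixture(
    cmd: Sequence[float],
    items: Dict[str, Sequence[Optional[Range]]],
    w_global: float = 0.1,
    eps: float = 1e-9,
) -> Dict[str, float]:
    """Return alpha_k(c) for every item; None marks an unconstrained axis."""
    scores = {}
    for name, ranges in items.items():
        score = 1.0
        for c, rng in zip(cmd, ranges):
            if rng is not None:
                score *= axis_membership(c, rng[0], rng[1], w_global)
        scores[name] = score

    z = sum(scores.values())
    if z > eps:
        return {name: s / z for name, s in scores.items()}

    # Z(c) ~ 0: fall back to the nearest item by L2 distance to range centre.
    def centre_distance(ranges: Sequence[Optional[Range]]) -> float:
        return math.sqrt(sum(
            (c - (rng[0] + rng[1]) / 2.0) ** 2
            for c, rng in zip(cmd, ranges) if rng is not None
        ))

    nearest = min(items, key=lambda name: centre_distance(items[name]))
    return {name: 1.0 if name == nearest else 0.0 for name in items}


# One constrained axis (v_x) with the ranges quoted in the text.
items = {"stand": [(-0.05, 0.05)], "walk": [(0.1, 0.6)]}
alpha_stop = mixture((0.0,), items)   # inside the stand range
alpha_far = mixture((2.0,), items)    # outside every range, uses fallback
```

In this sketch the stand item reaches full membership at c = 0 because its ramp width shrinks to its own half-range, and a command far outside every range is attributed entirely to the item whose range centre is closest.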

3. Critic Augmentation for Boundary Stability

If the value function V(s) cannot predict the reward cross-fade at item boundaries, the value loss oscillates whenever c traverses an α_k(c) ramp. We extend the observation with the command magnitude and the active-item mixture vector itself, rather than only its argmax:

	o_t = [ … base_obs … , ‖c_t‖₂ , α_1(c_t), α_2(c_t), … , α_K(c_t) ]

The critic now sees the same blend distribution the reward uses, so V can be represented as a smooth interpolation between per-item value tables, V(s) ≈ Σ_k α_k(c) · V_k(s_{base}), instead of having to fit a discontinuous step function across the command space.
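The observation layout above amounts to a simple concatenation. The sketch below assumes flat Python sequences and a fixed item ordering for the mixture vector; `augment_critic_obs` is a hypothetical helper name, and a real pipeline would likely operate on batched tensors instead.

```python
import math
from typing import List, Sequence


def augment_critic_obs(
    base_obs: Sequence[float],
    cmd: Sequence[float],
    alpha: Sequence[float],
) -> List[float]:
    """Build o_t = [base_obs, ||c_t||_2, alpha_1(c_t), ..., alpha_K(c_t)]."""
    cmd_norm = math.sqrt(sum(c * c for c in cmd))
    # The mixture vector is appended in full, not reduced to its argmax.
    return list(base_obs) + [cmd_norm] + list(alpha)


# Illustrative values: two base features, a 3-D command, two motion items.
obs = augment_critic_obs([0.1, 0.2], [3.0, 4.0, 0.0], [0.25, 0.75])
```

The item order in `alpha` has to stay the same for the whole of training, otherwise the critic would see the mixture entries permuted between steps.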

4. Fail-Loud Wiring Contract

Composite reward setups fail in subtle ways when half-wired: a canvas with per-item reward edges but no motion-item command envelopes looks like a composite setup but silently falls back to the global term bag (a category of bug that has bitten every prior iteration of this pipeline). The contract is therefore enforced by an explicit raise rather than a fallback:

if rewards_cfg.terms_by_item and not motion_cfg.training_items:
    raise RuntimeError(
        "spec.rewards.terms_by_item is non-empty but "
        "spec.motion.training_items is missing — cannot map "
        "the command vector to an active item."
    )

A static scale-skew check (R_REWARD_SCALE) complements this at compile-time: a per-item Σ|w| ratio above 3:1 raises a WARNING flagging cross-item budget imbalance, the failure mode where the policy preferentially chases the high-budget item even on commands that should activate a different one.
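One way to picture the scale-skew check is the sketch below, which compares the largest and smallest per-item Σ|w| budgets against the 3:1 threshold. The function names, the warning text, and the treatment of a zero-budget item as an infinite ratio are assumptions for illustration; only the rule id R_REWARD_SCALE and the 3:1 limit come from the description above.

```python
import math
import warnings
from typing import Dict

SKEW_LIMIT = 3.0  # the 3:1 threshold quoted in the text


def reward_budget_ratio(weights: Dict[str, Dict[str, float]]) -> float:
    """Ratio of the largest to the smallest per-item sum of |w|.

    Assumes at least one item; a zero-budget item yields an infinite ratio.
    """
    budgets = [sum(abs(w) for w in bag.values()) for bag in weights.values()]
    lo, hi = min(budgets), max(budgets)
    return math.inf if lo == 0.0 else hi / lo


def check_reward_scale(weights: Dict[str, Dict[str, float]]) -> bool:
    """Emit a warning and return True when the budget ratio exceeds 3:1."""
    ratio = reward_budget_ratio(weights)
    if ratio > SKEW_LIMIT:
        warnings.warn(
            f"R_REWARD_SCALE: per-item reward budget ratio {ratio:.2f}:1 "
            f"exceeds {SKEW_LIMIT:.0f}:1"
        )
        return True
    return False


# Hypothetical term bags: stand has budget 3.0, run has budget 10.0.
skewed = {
    "stand": {"upright": 2.0, "joint_motion": -1.0},
    "run": {"gait_cadence": 8.0, "upright": 2.0},
}
balanced = {
    "stand": {"upright": 2.0, "joint_motion": -1.0},
    "run": {"gait_cadence": 4.0, "upright": 2.0},
}
skewed_ratio = reward_budget_ratio(skewed)
```

Absolute values are used so that penalty terms with negative weights still count toward an item's budget.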

5. Per-Term & Per-Item Telemetry

Without per-term breakdowns, debugging a composite policy is guesswork — an aggregate reward curve cannot distinguish "walk learnt poorly, stand learnt well" from the inverse. Each step writes a structured breakdown into the info dict, which a callback streams to TensorBoard:

info["reward/<term>"]          # weighted contribution per term
info["reward/item/<item_id>"]  # blended total per motion item
info["reward/active_item"]     # argmax-α item id this step

# Aggregated scalars in TensorBoard:
rollout/reward/<term>
rollout/reward/item/<item_id>
rollout/reward/active_item_frac/<item_id>

The active_item_frac family surfaces the empirical command distribution the policy actually sees during training — the difference between "trained on all items uniformly" and "spent 90% of rollout near c = 0" is exactly the kind of silent bias that flat-sum debugging cannot catch.
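A rough sketch of how the breakdown and the active_item_frac scalars could be produced is given below. It is written under assumptions: `reward_breakdown` and `active_item_fractions` are hypothetical helpers, per-term contributions are taken to be α-blended and summed across items, and the callback that streams values to TensorBoard is not shown.

```python
from collections import Counter
from typing import Any, Dict, List


def reward_breakdown(
    alpha: Dict[str, float],
    weights: Dict[str, Dict[str, float]],
    terms: Dict[str, float],
) -> Dict[str, Any]:
    """Build the per-step info entries for one composite reward evaluation."""
    info: Dict[str, Any] = {}
    for item, a_k in alpha.items():
        item_total = 0.0
        for term, w in weights[item].items():
            contribution = a_k * w * terms[term]
            key = f"reward/{term}"
            info[key] = info.get(key, 0.0) + contribution
            item_total += contribution
        info[f"reward/item/{item}"] = item_total
    # Item with the largest mixture weight this step.
    info["reward/active_item"] = max(alpha, key=alpha.get)
    return info


def active_item_fractions(infos: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of rollout steps on which each item was the active one."""
    counts = Counter(step["reward/active_item"] for step in infos)
    return {
        f"rollout/reward/active_item_frac/{item}": n / len(infos)
        for item, n in counts.items()
    }


# Illustrative step: the command blends 25% stand with 75% run.
terms = {"gait_cadence": 1.0, "upright": 0.5}
weights = {
    "stand": {"upright": 2.0},
    "run": {"gait_cadence": 4.0, "upright": 1.0},
}
info = reward_breakdown({"stand": 0.25, "run": 0.75}, weights, terms)
fractions = active_item_fractions(
    [info, info, {"reward/active_item": "stand"}, info]
)
```

Under this convention the per-term entries and the per-item entries both sum to the same step reward, which makes the two views easy to cross-check in a dashboard.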

For more information and details, contact us at:

contact@unitport.ai


License

Made with love by UnitPort Team.
