Privacy-centric Motion Retargeting

Anonymizing skeleton-based motion data while preserving action utility through adversarial deep learning

ICCV 2025 - International Conference on Computer Vision
Read Paper · View on GitHub · Documentation

Overview

Protecting privacy in skeleton-based motion capture while maintaining utility

Privacy Protection

Reduces re-identification risk from 87.8% to 7.8% by masking personally identifiable information like body shape, gait patterns, and limb lengths.

Action Preservation

Maintains 35.7% action recognition accuracy, significantly outperforming baseline anonymization methods (2-3%).

Motion Retargeting

Transfers motion from original skeleton to dummy skeleton, effectively replacing identity while preserving movement semantics.

Adversarial Learning

Uses cooperative and adversarial training to disentangle identity from motion without prior knowledge of attack models.

Fast Inference

Processes 75 frames (~2.5 s of motion at NTU RGB+D's 30 fps capture rate) in just 0.006 seconds, roughly 400× faster than real time, enabling real-time anonymization applications.

Robust Evaluation

Tested on NTU RGB+D 60 and 120 datasets with comprehensive privacy and utility metrics.

Architecture

Two-encoder/one-decoder paradigm with adversarial and cooperative learning

flowchart TB
    subgraph Input
        S[Original Skeleton]
        D[Dummy Skeleton]
    end
    subgraph Encoders
        EM[Motion Encoder]
        EP[Privacy Encoder]
    end
    subgraph Decoder
        DEC[Decoder]
    end
    subgraph Classifiers
        M[Motion Classifier]
        P[Privacy Classifier]
        Q[Quality Controller]
    end
    S --> EM
    S --> EP
    D --> EP
    EM --> DEC
    EP --> DEC
    EM -.-> M
    EP -.-> M
    EP -.-> P
    EM -.-> P
    DEC --> Q
    DEC --> OUT[Anonymized Skeleton]
    style EM fill:#6366f1
    style EP fill:#8b5cf6
    style DEC fill:#10b981
    style M fill:#f59e0b
    style P fill:#f59e0b
    style Q fill:#ef4444
    style OUT fill:#10b981

PMR architecture showing the flow from input skeletons through encoders, decoder, and classifiers to produce anonymized output.

Cross-Reconstruction Data Flow

flowchart LR
    A[Original Skeleton] --> B[Motion Embedding]
    C[Dummy Skeleton] --> D[Privacy Embedding]
    B --> E[Decoder]
    D --> E
    E --> F[Anonymized Skeleton]
    style A fill:#6366f1
    style B fill:#6366f1
    style C fill:#8b5cf6
    style D fill:#8b5cf6
    style E fill:#10b981
    style F fill:#10b981

Motion from original skeleton (via EM), privacy/structure from dummy skeleton (via EP).
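The same flow in code: a minimal PyTorch sketch of the cross-reconstruction step. The module handles and the cross_reconstruct helper are illustrative, not the repository's actual API; the shapes follow the component specs below.

import torch
import torch.nn as nn

def cross_reconstruct(motion_encoder: nn.Module,
                      privacy_encoder: nn.Module,
                      decoder: nn.Module,
                      original: torch.Tensor,
                      dummy: torch.Tensor) -> torch.Tensor:
    z_motion = motion_encoder(original)           # what is being done, (B, 256, 32)
    z_privacy = privacy_encoder(dummy)            # who appears to do it, (B, 256, 32)
    z = torch.cat([z_motion, z_privacy], dim=1)   # decoder input, (B, 512, 32)
    return decoder(z)                             # anonymized skeleton, (B, 75, 25, 3)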

Cooperative and Adversarial Training

flowchart LR
    S[Skeleton] --> EM[Motion Encoder]
    S --> EP[Privacy Encoder]
    EM -->|Cooperative| M[Motion Classifier]
    EM -.->|Adversarial| P[Privacy Classifier]
    EP -->|Cooperative| P
    EP -.->|Adversarial| M
    EM --> D[Decoder]
    EP --> D
    D --> Q[Quality Controller]
    D --> OUT[Output]
    style S fill:#6366f1
    style EM fill:#6366f1
    style EP fill:#8b5cf6
    style M fill:#f59e0b
    style P fill:#f59e0b
    style D fill:#10b981
    style Q fill:#ef4444
    style OUT fill:#10b981

Cooperative (solid): EM ↔ M (motion), EP ↔ P (privacy)
Adversarial (dashed): EM ⟷ P, EP ⟷ M
This disentangles motion from identity information.
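In loss terms, each encoder is rewarded when its own classifier succeeds and penalized when the other one does. A sketch under assumed names (only the 0.5 adversarial weight comes from the implementation details below); the classifiers themselves are updated separately to minimize their own cross-entropy.

import torch.nn.functional as F

def encoder_losses(m_logits_from_em, p_logits_from_ep,
                   p_logits_from_em, m_logits_from_ep,
                   action_labels, actor_labels, alpha_adv=0.5):
    # Cooperative: EM should make M succeed, EP should make P succeed.
    coop = (F.cross_entropy(m_logits_from_em, action_labels) +
            F.cross_entropy(p_logits_from_ep, actor_labels))
    # Adversarial: EM is penalized when P recovers identity from its
    # embedding, and EP when M recovers the action from its embedding.
    adv = (F.cross_entropy(p_logits_from_em, actor_labels) +
           F.cross_entropy(m_logits_from_ep, action_labels))
    return coop - alpha_adv * adv  # minimized by the encoders only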

Stage 1: Autoencoder Warm-up

EM: TRAIN · EP: TRAIN · D: TRAIN · M: FROZEN · P: FROZEN · Q: FROZEN

25 epochs (5 paired + 20 unpaired)
Trains encoders and decoder to reconstruct skeletons using reconstruction and smoothness losses.
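One common formulation of these two losses, sketched in PyTorch; the codebase's exact definitions may differ.

import torch

def reconstruction_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred/target: (B, T, J, 3) skeleton sequences
    return torch.mean((pred - target) ** 2)

def smoothness_loss(pred: torch.Tensor) -> torch.Tensor:
    # Penalize frame-to-frame jitter via first differences along time.
    velocity = pred[:, 1:] - pred[:, :-1]
    return torch.mean(velocity ** 2)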

Stage 2: Classifier Pre-training

EM: FROZEN · EP: FROZEN · D: FROZEN · M: TRAIN · P: TRAIN · Q: FROZEN

70 epochs (20 paired + 50 unpaired)
Pre-trains motion and privacy classifiers on frozen encoder embeddings.

Stage 3: Adversarial Training

EM: TRAIN · EP: TRAIN · D: TRAIN · M: TRAIN · P: TRAIN · Q: TRAIN

100 epochs (unpaired)
Cooperative and adversarial training. M cooperates with EM but adversarially trains against EP, and vice versa for P.

Stage 4: Motion Retargeting

EM: TRAIN · EP: TRAIN · D: TRAIN · M: FROZEN · P: FROZEN · Q: TRAIN

100 epochs (paired)
Cross-reconstruction with triplet loss, latent consistency, and end-effector losses. Fine-tunes for anonymization quality.
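Across the four stages, components simply toggle between trainable and frozen. A sketch of how such a schedule can be driven in PyTorch (the set_stage helper and module handles are illustrative):

import torch.nn as nn

def set_stage(stage: int, em: nn.Module, ep: nn.Module, dec: nn.Module,
              m: nn.Module, p: nn.Module, q: nn.Module) -> None:
    trainable = {
        1: {em, ep, dec},           # autoencoder warm-up
        2: {m, p},                  # classifier pre-training
        3: {em, ep, dec, m, p, q},  # adversarial training
        4: {em, ep, dec, q},        # motion retargeting
    }[stage]
    for module in (em, ep, dec, m, p, q):
        for param in module.parameters():
            param.requires_grad = module in trainable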

Motion Encoder

EM

Purpose: Extract action-specific temporal information

Architecture: 4-layer CNN with reflection padding and max pooling

Output: Motion embedding (256 × 32)
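For intuition, a plausible encoder with this shape signature. Channel widths, kernel sizes, and the adaptive pooling used to land on the documented 256 × 32 output are assumptions, not the repository's implementation; EP (below) would share the same structure.

import torch
import torch.nn as nn

class SkeletonEncoder(nn.Module):
    def __init__(self, in_channels: int = 75, out_channels: int = 256):
        super().__init__()  # in_channels = 25 joints x 3 coords
        def block(c_in, c_out):
            return nn.Sequential(nn.ReflectionPad1d(1),
                                 nn.Conv1d(c_in, c_out, kernel_size=3),
                                 nn.ReLU())
        self.net = nn.Sequential(
            block(in_channels, 128),
            block(128, 128),
            nn.MaxPool1d(2),           # 75 time steps -> 37
            block(128, 256),
            block(256, out_channels),
            nn.AdaptiveMaxPool1d(32),  # -> the documented 256 x 32
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T=75, J=25, 3) -> channels-first (B, 75, T) for Conv1d
        x = x.flatten(2).transpose(1, 2)
        return self.net(x)             # (B, 256, 32)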

Privacy Encoder

EP

Purpose: Extract skeleton structure and style (PII)

Architecture: Identical to EM, but trained to capture identity and body structure rather than motion

Output: Privacy embedding (256 × 32)

Decoder

D

Purpose: Reconstruct skeleton from embeddings

Architecture: 4-layer transpose CNN with upsampling

Input: Concatenated embeddings (512 × 32)

Motion Classifier

M

Purpose: Predict action from embeddings

Training: Cooperative with EM, adversarial with EP

Output: Action probabilities (60 or 120 classes)

Privacy Classifier

P

Purpose: Predict actor ID from embeddings

Training: Cooperative with EP, adversarial with EM

Output: Actor probabilities (40 or 106 classes)

Quality Controller

Q

Purpose: Distinguish real vs generated skeletons

Architecture: GAN-style discriminator

Output: Real/fake probability
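A GAN-style discriminator implies the standard two-sided objective: Q learns to separate real from generated sequences, while the encoders and decoder are rewarded for fooling it. A sketch with illustrative names:

import torch
import torch.nn.functional as F

def quality_controller_loss(q_real_logits, q_fake_logits):
    # Q's own objective: score real skeletons as 1, generated ones as 0.
    real = F.binary_cross_entropy_with_logits(
        q_real_logits, torch.ones_like(q_real_logits))
    fake = F.binary_cross_entropy_with_logits(
        q_fake_logits, torch.zeros_like(q_fake_logits))
    return real + fake

def generator_quality_loss(q_fake_logits):
    # The encoders/decoder are rewarded when Q mistakes fakes for real.
    return F.binary_cross_entropy_with_logits(
        q_fake_logits, torch.ones_like(q_fake_logits))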

Results

Comprehensive evaluation on NTU RGB+D datasets

Performance Comparison on NTU RGB+D 60

| Method | MSE ↓ | AR Top-1 ↑ | AR Top-5 ↑ | Re-ID Top-1 ↓ | Re-ID Top-5 ↓ | Gender ↓ | Linkage ↓ |
|---|---|---|---|---|---|---|---|
| Original | – | 82.2% | 85.0% | 87.8% | 97.3% | 88.7% | 69.6% |
| UNet (Moon et al.) | 0.0834 | 2.6% | 11.1% | 3.0% | 26.8% | 3.0% | 50.0% |
| ResNet (Moon et al.) | 0.2988 | 1.8% | 11.2% | 9.1% | 34.1% | 9.0% | 50.8% |
| DMR (Baseline) | 0.0071 | 49.1% | 73.1% | 25.7% | 60.3% | 25.7% | 50.0% |
| PMR (Ours) | 0.0138 | 35.7% | 63.0% | 7.8% | 26.4% | 7.8% | 50.0% |

Key Findings

Optimal Privacy-Utility Trade-off

PMR achieves the best balance between privacy protection (7.8% Top-1 re-ID) and utility preservation (35.7% Top-1 action recognition).

Superior to Baselines

Outperforms UNet by 13.7× in action recognition (35.7% vs 2.6%) while maintaining strong privacy protection.

Strong Privacy Guarantees

Reduces re-identification from 87.8% to 7.8% (Top-1) and linkage attacks to random chance (50.0%).

Low Reconstruction Error

MSE of 0.0138 ensures visually plausible skeletons while anonymizing identity, balancing between DMR (0.0071) and Moon's methods (0.0834+).

Embedding Space Visualizations

Motion Embedding Clustering
Motion embeddings clustered by action class, showing clear separation of different actions.
Privacy Embedding Clustering
Privacy embeddings showing that actor identity information is captured by EP rather than leaking into the motion embedding.
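A sketch of how such clustering plots can be produced; t-SNE is one reasonable choice of 2-D projection, not necessarily the one used in the paper.

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(embeddings, labels, title):
    # embeddings: (N, 256, 32) array -> flatten to (N, 8192), project to 2-D
    flat = embeddings.reshape(len(embeddings), -1)
    xy = TSNE(n_components=2, perplexity=30).fit_transform(flat)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap='tab20', s=4)
    plt.title(title)
    plt.savefig(title.lower().replace(' ', '_') + '.png', dpi=150)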

Classifier Performance

Utility Classifier Accuracy
Motion classifier accuracy on motion vs privacy embeddings, demonstrating successful disentanglement.
Privacy Classifier Accuracy
Privacy classifier accuracy showing identity information is removed from motion embeddings.

Privacy-Utility Trade-off

Results Scatter Plot
Scatter plot showing PMR achieves optimal balance between privacy protection and utility preservation.

Ablation Study: Impact of Components (NTU-60 CV)

| Configuration | MSE ↓ | Re-ID ↓ | AR ↑ |
|---|---|---|---|
| DMR (Baseline) | 0.0060 | 25.7% | 49.1% |
| PMR w/o M & P | 0.0081 | 15.8% | 37.5% |
| PMR w/o L_smooth | 0.0117 | 13.6% | 27.4% |
| PMR w/o L_latent | 0.0144 | 12.6% | 32.6% |
| PMR w/o Q | 0.0139 | 9.3% | 29.6% |
| PMR (Full) | 0.0138 | 7.8% | 35.6% |

Key Insights

Adversarial Training Critical

Removing M & P classifiers increases re-ID from 7.8% to 15.8%, showing adversarial training is essential for privacy.

Smoothness Loss Matters

Without L_smooth, action recognition drops to 27.4%, indicating smoothness is critical for temporal coherence.

Latent Consistency Helps

Removing L_latent increases re-ID to 12.6%, showing it helps maintain the privacy-utility balance.

Quality Controller Improves Realism

Without Q, re-ID increases to 9.3%, showing the quality controller helps prevent identity leakage.

Getting Started

Quick start guide and usage examples

Installation

# Clone the repository
git clone https://github.com/Thomasc33/Privacy-Retargeting.git
cd Privacy-Retargeting

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Requirements

Python 3.8+

Modern Python version with type hints support

PyTorch 2.0+

Deep learning framework with CUDA support

CUDA 11.8+

For GPU acceleration (optional but recommended)
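A quick sanity check that the environment meets these requirements, using standard PyTorch calls:

import torch

print(torch.__version__)                  # expect 2.0+
print(torch.cuda.is_available())          # True if CUDA is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. an RTX 3090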

Training PMR Model

# Basic training on NTU-60
python cli.py train --dataset ntu60 --model pmr

# Training with custom parameters
python cli.py train \
    --dataset ntu60 \
    --model pmr \
    --batch-size 32 \
    --lr 1e-5 \
    --device cuda:0 \
    --checkpoint-dir checkpoints \
    --use-mlflow

# Training on NTU-120
python cli.py train --dataset ntu120 --model pmr

# Training DMR baseline (without adversarial learning)
python cli.py train --dataset ntu60 --model dmr

Training Stages

PMR training consists of 4 stages totaling ~295 epochs:

  • Stage 1: Autoencoder warm-up (25 epochs)
  • Stage 2: Classifier pre-training (70 epochs)
  • Stage 3: Adversarial training (100 epochs)
  • Stage 4: Motion retargeting (100 epochs)

Training takes approximately 6.5 hours on an NVIDIA RTX 3090 GPU.

Evaluation

# Evaluate trained model
python cli.py evaluate \
    --model-path checkpoints/pmr_best.pt \
    --dataset ntu60 \
    --output results.json

# Anonymize a skeleton sequence
python cli.py anonymize \
    --model-path checkpoints/pmr_best.pt \
    --input data/sample.pkl \
    --output data/anonymized.pkl

# Visualize results
python cli.py visualize \
    --input data/anonymized.pkl \
    --output video.gif

# Create comparison video
python cli.py compare \
    --original data/sample.pkl \
    --anonymized data/anonymized.pkl \
    --output comparison.gif

Evaluation Metrics

Utility Metrics

MSE: Reconstruction quality
Action Recognition: Top-1 and Top-5 accuracy

Privacy Metrics

Re-identification: Top-1 and Top-5 accuracy
Gender Classification: Binary accuracy
Linkage Attack: Matching accuracy
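Both metric families reduce to Top-k accuracy over a classifier's predictions; a minimal helper (names illustrative). For the privacy metrics the same number is computed, but lower is better.

import torch

def top_k_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 1) -> float:
    # logits: (N, num_classes), labels: (N,)
    topk = logits.topk(k, dim=1).indices             # (N, k) predicted classes
    hits = (topk == labels.unsqueeze(1)).any(dim=1)  # true label among the top k?
    return hits.float().mean().item()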

Python API

from models.pmr import PMRModel
import torch

# Load model
model = PMRModel(T=75, encoded_channels=(256, 32))
checkpoint = torch.load('checkpoints/pmr_best.pt', map_location='cpu')  # move to GPU as needed
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Anonymize skeleton
original = torch.randn(1, 75, 25, 3)  # (batch, frames, joints, coords)
dummy = torch.randn(1, 75, 25, 3)

with torch.no_grad():
    anonymized = model.cross_reconstruct(original, dummy)
    # Get embeddings for analysis (inference only, so keep inside no_grad)
    motion_emb, privacy_emb = model.get_embeddings(original)

print(f"Motion embedding shape: {motion_emb.shape}")
print(f"Privacy embedding shape: {privacy_emb.shape}")
print(f"Anonymized skeleton shape: {anonymized.shape}")

Configuration

from configs.default_config import Config

# Load default config
config = Config()

# Customize parameters
config.training.batch_size = 64
config.training.lr = 5e-5
config.training.alpha_emb = 1.0  # Increase adversarial strength

# Use custom config in training
from training.trainer import PMRTrainer
trainer = PMRTrainer(config)
trainer.train()

Implementation Details

Technical specifications from the paper

Hardware

GPU: NVIDIA RTX 3090 (24GB)
Training Time: ~6.5 hours
Inference: 0.006s per sequence

Framework

Deep Learning: PyTorch 2.0+
Optimization: Adam optimizer
Learning Rate: 1e-5

Data

Datasets: NTU RGB+D 60/120
Sequence Length: 75 frames
Joints: 25 per skeleton

Architecture

Encoders: 4-layer CNN
Decoder: 4-layer Transpose CNN
Embedding: 256 × 32

Loss Function Weights

| Loss Component | Weight (α) | Purpose |
|---|---|---|
| Reconstruction | 2.0 | Ensure accurate skeleton reconstruction |
| Smoothness | 3.0 | Temporal consistency between frames |
| Cross-reconstruction | 0.1 | Motion transfer quality |
| Triplet | 1.0 | Embedding space structure |
| Latent Consistency | 10.0 | Consistent embeddings across views |
| End-effector | 1.0 | Preserve hand/foot positions |
| Adversarial (Embedding) | 0.5 | Disentangle motion and identity |
| Quality Controller | 0.5 | Realistic skeleton generation |
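A sketch of how these terms combine into a single training objective; the dictionary keys are illustrative, while the weights are the α values from the table above.

import torch

ALPHAS = {
    'reconstruction': 2.0, 'smoothness': 3.0, 'cross_reconstruction': 0.1,
    'triplet': 1.0, 'latent_consistency': 10.0, 'end_effector': 1.0,
    'adversarial_embedding': 0.5, 'quality': 0.5,
}

def total_loss(losses: dict) -> torch.Tensor:
    # losses maps each component name to its scalar loss tensor for a batch.
    return sum(ALPHAS[name] * value for name, value in losses.items())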

Citation

If you use this work in your research, please cite our ICCV 2025 paper

@InProceedings{Carr_2025_ICCV,
    author    = {Carr, Thomas and Xu, Depeng and Yuan, Shuhan and Lu, Aidong},
    title     = {Privacy-centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {13162-13170}
}

Publication Details

Conference: IEEE/CVF International Conference on Computer Vision (ICCV) 2025

Pages: 13162-13170

Date: October 2025

View on CVF Open Access