Recent advances have achieved realistic virtual try-on (VTON) through localized garment inpainting with latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we define a virtual dressing (VD) task focused on generating freely editable human images with fixed garments and optional conditions. Meanwhile, we design a comprehensive affinity metric index (CAMI) to evaluate the consistency between generated images and reference garments. We then propose IMAGDressing-v1, which incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE. We present a hybrid attention module, comprising a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet, ensuring that users can control different scenes through text. IMAGDressing-v1 can be combined with other extension plugins, such as ControlNet and IP-Adapter, to enhance the diversity and controllability of generated images. Furthermore, to address the lack of data, we release the interactive garment pairing (IGPair) dataset, containing over 300,000 pairs of clothing and dressed images, and establish a standard pipeline for data assembly. Extensive experiments demonstrate that IMAGDressing-v1 achieves state-of-the-art human image synthesis performance under various controlled conditions. The code and model will be available at https://github.com/muzishen/IMAGDressing.
IGPair includes multiple models for each clothing item. It is the first dataset with a resolution exceeding 2K×2K, and the first publicly available dataset that includes textual descriptions, diverse scenes, and various styles. Specifically, IGPair contains 86,873 garments, categorized into 18 types, and 324,857 image pairs in total.
We now release the body masks (in folder './body_mask/'), clothes (in folder './clothes/'), DensePose annotations (in folder './densepose/'), OpenPose annotations (in folder './openpose/'), the IGPair test data (in folder './IGPair_Test/'), and the annotations ('./IGPair.json') for IGPair.
Each annotation record follows the format below, with field names and example content:

```json
{
    "text": "caption",
    "image_file": "model_path",
    "cloth_file": "cloth_path",
    "cloth_type": "type"
}
```
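As a sketch, records in this format can be loaded and filtered with standard tooling. The sample values and the assumption that the annotation file is a JSON list of such records are ours, for illustration only:

```python
import json

# Hypothetical sample records mirroring the annotation fields above.
records = [
    {"text": "a model wearing a dress",
     "image_file": "images/dress_01601.jpg",
     "cloth_file": "clothes/dress_01601.jpg",
     "cloth_type": "dress"},
    {"text": "a model in an upper-body garment",
     "image_file": "images/upper_body_00001.jpg",
     "cloth_file": "clothes/upper_body_00001.jpg",
     "cloth_type": "upper_body"},
]

def load_annotations(path):
    """Load annotation records from a JSON file (assumed to be a list of dicts)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def filter_by_type(recs, cloth_type):
    """Return only the records belonging to one garment category."""
    return [r for r in recs if r["cloth_type"] == cloth_type]

dresses = filter_by_type(records, "dress")
print(len(dresses), dresses[0]["cloth_file"])
```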
The test set contains 2,000 images in total: 1,600 upper-body garment, 200 dress, and 200 lower-body garment images. Images are named in the format upper_body_00001.jpg, dress_01601.jpg, and lower_body_01801.jpg.
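Given this naming scheme, a test filename can be split back into its category and index. A minimal sketch (the helper name and the five-digit-index assumption are ours):

```python
import re

# Test-image names such as "upper_body_00001.jpg": a category prefix
# followed by a zero-padded five-digit index.
NAME_RE = re.compile(r"^(upper_body|dress|lower_body)_(\d{5})\.jpg$")

def parse_test_name(filename):
    """Split a test-image filename into (category, index)."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return m.group(1), int(m.group(2))

print(parse_test_name("dress_01601.jpg"))  # ('dress', 1601)
```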
```shell
# Reassemble the split archive into a single zip file
cd IGPair/IGPair_dataset
cat IGPair.zip.* > IGPair.zip
```
```bibtex
@article{shen2024IMAGDressing-v1,
  title={IMAGDressing-v1: Customizable Virtual Dressing},
  author={Shen, Fei and Jiang, Xin and He, Xin and Ye, Hu and Wang, Cong and Du, Xiaoyu and Li, Zechao and Tang, Jinhui},
  journal={arXiv preprint arXiv:2407.12705},
  year={2024}
}
```