Telegram Group & Telegram Channel
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

7 Mar 2025 · Yuxuan Bian, Zhaoyang Zhang, Xuan Ju, Mingdeng Cao, Liangbin Xie, Ying Shan, Qiang Xu ·

Video inpainting, which aims to restore corrupted video content, has experienced substantial progress. Despite these advances, existing methods, whether propagating unmasked region pixels through optical flow and receptive field priors, or extending image-inpainting models temporally, face challenges in generating fully masked objects or balancing the competing objectives of background context preservation and foreground generation in one model, respectively. To address these limitations, we propose a novel dual-stream paradigm VideoPainter that incorporates an efficient context encoder (comprising only 6% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues to any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target region ID resampling technique that enables any-length video inpainting, greatly enhancing our practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision understanding models, contributing VPData and VPBench to facilitate segmentation-based inpainting training and assessment, the largest video inpainting dataset and benchmark to date with over 390K diverse clips. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video editing pair data generation, demonstrating competitive performance and significant practical potential. Extensive experiments demonstrate VideoPainter's superior performance in both any-length video inpainting and editing, across eight key metrics, including video quality, mask region preservation, and textual coherence.

Paper: https://arxiv.org/pdf/2503.05639v2.pdf

Code: https://github.com/TencentARC/VideoPainter

Datasets: VPData - VPBench

@Machine_learn



tg-me.com/Machine_learn/3491
Create:
Last Update:

VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control

7 Mar 2025 · Yuxuan Bian, Zhaoyang Zhang, Xuan Ju, Mingdeng Cao, Liangbin Xie, Ying Shan, Qiang Xu ·

Video inpainting, which aims to restore corrupted video content, has experienced substantial progress. Despite these advances, existing methods, whether propagating unmasked region pixels through optical flow and receptive field priors, or extending image-inpainting models temporally, face challenges in generating fully masked objects or balancing the competing objectives of background context preservation and foreground generation in one model, respectively. To address these limitations, we propose a novel dual-stream paradigm VideoPainter that incorporates an efficient context encoder (comprising only 6% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues to any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target region ID resampling technique that enables any-length video inpainting, greatly enhancing our practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision understanding models, contributing VPData and VPBench to facilitate segmentation-based inpainting training and assessment, the largest video inpainting dataset and benchmark to date with over 390K diverse clips. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video editing pair data generation, demonstrating competitive performance and significant practical potential. Extensive experiments demonstrate VideoPainter's superior performance in both any-length video inpainting and editing, across eight key metrics, including video quality, mask region preservation, and textual coherence.

Paper: https://arxiv.org/pdf/2503.05639v2.pdf

Code: https://github.com/TencentARC/VideoPainter

Datasets: VPData - VPBench

@Machine_learn

BY Machine learning books and papers




Share with your friend now:
tg-me.com/Machine_learn/3491

View MORE
Open in Telegram


Machine learning books and papers Telegram | DID YOU KNOW?

Date: |

However, analysts are positive on the stock now. “We have seen a huge downside movement in the stock due to the central electricity regulatory commission’s (CERC) order that seems to be negative from 2014-15 onwards but we cannot take a linear negative view on the stock and further downside movement on the stock is unlikely. Currently stock is underpriced. Investors can bet on it for a longer horizon," said Vivek Gupta, director research at CapitalVia Global Research.

Machine learning books and papers from us


Telegram Machine learning books and papers
FROM USA