tg-me.com/RIMLLab/144
Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.
This Week's Presentation:
Title: Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control
Presenter: Arshia Hemmat
Abstract:
This presentation covers recent work on compositional challenges in text-to-image (T2I) generation. Current diffusion models often fail to bind attributes to the objects they describe in the text prompt. To measure this, the paper introduces an Edge Prediction Vision Transformer (EPViT) for improved image-text alignment evaluation. To improve generation, a Focused Cross-Attention (FCA) mechanism uses syntactic constraints from the input sentence to sharpen the visual attention maps, and DisCLIP embeddings disentangle the multimodal text embeddings to strengthen attribute-object alignment. These components plug into state-of-the-art diffusion models and improve T2I generation quality without any additional model training.
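To build intuition for the attention-focusing idea before the session: in a diffusion model's cross-attention, image positions (queries) attend over text tokens (keys). A syntax-guided mechanism like FCA constrains an attribute token to the image regions its bound noun occupies. The toy NumPy sketch below is not the paper's implementation; the function name, the quantile-based region threshold, and the hand-supplied `bindings` map (attribute token index → noun token index, which the paper derives from a syntactic parse) are all assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def focused_cross_attention(q, k, bindings, top_frac=0.5):
    """Toy sketch of syntax-focused cross-attention.

    q: (N, d) image-position queries; k: (T, d) text-token keys.
    bindings: {attribute token index: noun token index}. Each attribute
    token is restricted to the spatial positions where its bound noun
    attends most strongly (the top `top_frac` fraction of positions).
    """
    logits = q @ k.T / np.sqrt(q.shape[1])        # (N, T) attention logits
    attn = softmax(logits, axis=1)                # unconstrained attention
    for attr, noun in bindings.items():
        noun_col = attn[:, noun]                  # where the noun "lives"
        thresh = np.quantile(noun_col, 1 - top_frac)
        region = noun_col >= thresh               # the noun's spatial region
        logits[~region, attr] = -1e9              # block attribute elsewhere
    return softmax(logits, axis=1)                # re-normalized attention
```

The key design point this illustrates: the constraint is applied to the logits before re-normalization, so each image position still distributes a full unit of attention over the text tokens, but an attribute token can only receive it inside its noun's region.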
Paper: Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control
Session Details:
- Date: Sunday
- Time: 5:00-6:00 PM
- Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation!
BY RIML Lab
