Compositional Learning Journal Club
Join us this week for an in-depth discussion on Compositional Learning in the context of cutting-edge text-to-image generative models. We will explore recent breakthroughs and challenges, focusing on how these models handle compositional tasks and where improvements can be made.
This Week's Presentation:
Title: GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Presenter: Dr Rohban
Abstract:
GenArtist is a framework that targets two limitations of current image generation models: difficulty handling intricate text prompts and a lack of reliability, which it addresses through verification and self-correction mechanisms. Coordinated by a multimodal large language model (MLLM) agent, GenArtist integrates a diverse library of tools, enabling task decomposition, step-by-step execution, and systematic self-correction. Using tree-structured planning and position-related inputs, GenArtist achieves state-of-the-art performance, outperforming models such as SDXL and DALL-E 3. This session will examine the system's architecture and its potential for advancing image generation and editing.
Paper: GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Session Details:
- Date: Wednesday
- Time: 3:30-4:30 PM
- Location: Online at vc.sharif.edu/ch/rohban
We look forward to your participation!
By RIML Lab
