OmniGen: An open source AI model that lets you edit images in conversation

This is Decrypt's co-founder Josh Quittner having a casual meeting with his friend Vitalik Buterin.

No, not really. They have never met, let alone been in the same place at the same time. This image is a fake, which is not surprising. What's surprising is that it took us less than a minute to create, using two photos and a simple prompt: “The man in picture 1 and the man in picture 2 posing for the cameras at a BBQ.” Pretty impressive.

The model is OmniGen, and it is much more than just an image generator. Instead, it focuses on image editing and context understanding, allowing users to tweak their generations simply by chatting with the model, rather than loading standalone third-party tools. It is able to “reason” and understand commands thanks to its built-in LLM.

Researchers at the Beijing Academy of Artificial Intelligence have finally released the weights (the executable AI models that users can run on their own computers) of this new type of AI model, which can serve as an all-in-one tool for creating images. Unlike its predecessors, which functioned as single-purpose task executors (making artists load separate image generators, ControlNets, IP adapters, inpainting models, and so on), OmniGen functions as a comprehensive creative suite. It handles everything from basic image editing to complex visual reasoning tasks in a single, streamlined framework.

OmniGen is based on two core components: a Variational Autoencoder (the same old VAE that all AI artists are so familiar with), which breaks images down into their fundamental building blocks, and a Transformer model that processes varied inputs with remarkable flexibility. This lean approach eliminates the need for the additional modules that often bog down other image generation systems.

Trained on a dataset of one billion images, called X2I (anything-to-image), OmniGen handles tasks ranging from text-to-image generation and sophisticated photo editing to more nuanced operations such as inpainting and depth map manipulation. Perhaps most striking is its ability to understand context. For example, when asked to identify a place to wash hands, it instantly recognizes and highlights sinks in images, showing a level of reasoning that approaches human understanding.

In other words, unlike any other image generator currently available, users can “talk” to OmniGen much as they would interact with ChatGPT to generate and modify images. There is no need to deal with segmentation, masking or other complex techniques, because the model is able to work everything out from plain commands.

So imagine telling an open-source model to create a herringbone winter coat, add fur trim and adjust the length, all in one go. If you don't like it, you can just prompt “make the coat white” and it will handle the task without you having to manually select the coat, load a new model, prompt “white coat” and pray that the coat looks similar to your original generation, or open Photoshop and deal with some color manipulation.

This is quite a significant breakthrough.

One of the exciting achievements of this new model is that OmniGen has Microsoft's Phi-3 LLM built in, and the researchers trained the model to apply a chain-of-thought approach to image generation, breaking down complex creative tasks into smaller, more manageable steps, similar to how human artists work. This methodical process allows for unprecedented control over the creative workflow, although the researchers note that the quality of the results currently matches rather than exceeds standard generation methods.

Looking ahead, researchers are already exploring ways to improve OmniGen’s capabilities. Future iterations may include improved handling of text-heavy images and more sophisticated reasoning abilities, which may lead to an even more natural interaction between human creators and AI tools.

How to run OmniGen

OmniGen is open source, so users can run it locally. However, Hugging Face (the world's largest open-source AI community and repository) offers a few free generations, so users can test the model on its servers if they don't have the necessary hardware.

Those who don't want to bother much with setup can go to this free Hugging Face space and play with the model. A very intuitive user interface will open.

Basically, the model can handle up to three context images and a good amount of text input. The interface also shows a very detailed set of instructions for generating or editing images. If you are new to this, don't worry too much about all the parameters. Simply load the image (or images) you want to edit or use as inspiration and prompt the model the same way you would ChatGPT, using natural language.
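
For readers who would rather script this than type into the web interface, here is a minimal sketch of how a natural-language request that refers to several context images could be assembled. The `<img><|image_N|></img>` placeholder syntax is an assumption based on our reading of the project's README, and the helper names (`image_tag`, `build_prompt`) are hypothetical, so check both against the repository before relying on them.

```python
# Limit mentioned above for the demo interface.
MAX_CONTEXT_IMAGES = 3


def image_tag(index: int) -> str:
    """Return the placeholder for the N-th context image (assumed syntax)."""
    return f"<img><|image_{index}|></img>"


def build_prompt(template: str, num_images: int) -> str:
    """Fill {image_1}, {image_2}, ... fields in a plain-language template."""
    if not 1 <= num_images <= MAX_CONTEXT_IMAGES:
        raise ValueError(f"expected 1 to {MAX_CONTEXT_IMAGES} context images")
    tags = {f"image_{i}": image_tag(i) for i in range(1, num_images + 1)}
    return template.format(**tags)


prompt = build_prompt(
    "The man in {image_1} and the man in {image_2} "
    "posing for the cameras at a BBQ.",
    num_images=2,
)
print(prompt)
```

The point of the template approach is that the request stays readable as ordinary English, while each image reference is swapped for a tag the model can match to an uploaded picture.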

However, those who want to generate images locally will need to download the weights and some libraries. Given its capabilities, the model can be expected to require a lot of VRAM. Some reports suggest that it runs well on 12GB of VRAM and is currently only compatible with Nvidia cards.
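
Before downloading anything, it may be worth checking whether your machine clears that bar. The sketch below asks the `nvidia-smi` tool for each card's total memory and compares it with the 12GB figure. That figure comes from the user reports mentioned above, not from an official requirement, and the function names and the 5% slack are our own choices.

```python
import shutil
import subprocess

REQUIRED_GIB = 12  # figure from user reports, not an official requirement


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one total-memory value (in MiB) per line, skipping non-numeric lines."""
    return [int(s) for s in (line.strip() for line in smi_output.splitlines())
            if s.isdigit()]


def query_vram_mib() -> list[int]:
    """Return total memory per Nvidia GPU, or an empty list if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


def meets_requirement(vram_mib: list[int], required_gib: int = REQUIRED_GIB) -> bool:
    """True if at least one GPU has roughly the required memory."""
    # 5% slack: cards sold as 12GB can report slightly under 12288 MiB.
    threshold = required_gib * 1024 * 0.95
    return any(mib >= threshold for mib in vram_mib)


if __name__ == "__main__":
    gpus = query_vram_mib()
    if not gpus:
        print("No Nvidia GPU detected; the hosted demo may be the better option.")
    elif meets_requirement(gpus):
        print(f"Detected {gpus} MiB; this should be enough to try a local run.")
    else:
        print(f"Detected {gpus} MiB; this is below the reported {REQUIRED_GIB}GB.")
```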

To install the model locally, simply follow the instructions provided on the GitHub page. Basically, you create a new installation folder, clone the GitHub repository, install the dependencies and you're good to go. To have a nice UI instead of just using text, install the Gradio interface by following the steps provided on the GitHub page. Alternatively, you can follow this tutorial if you prefer video instructions.
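
Once the repository and its dependencies are installed, a local generation can be scripted in a few lines. What follows is a sketch under stated assumptions rather than the project's definitive API: the `OmniGen` import path, the `OmniGenPipeline` class, the `Shitao/OmniGen-v1` model ID and the argument names are taken from our reading of the project's README, and the guidance values are only illustrative, so verify all of them against the GitHub page.

```python
def generation_kwargs(prompt, input_images=None, size=1024, seed=0):
    """Collect the arguments for one generation (argument names are assumed)."""
    kwargs = {
        "prompt": prompt,
        "height": size,
        "width": size,
        "guidance_scale": 2.5,  # illustrative value
        "seed": seed,
    }
    if input_images:
        # Context images for editing or subject-driven prompts.
        kwargs["input_images"] = list(input_images)
        kwargs["img_guidance_scale"] = 1.6  # illustrative value
    return kwargs


def generate(prompt, input_images=None, output_path="output.png"):
    """Run one generation and save the first image (assumed API)."""
    # Imported lazily so the helper above works without the package installed.
    from OmniGen import OmniGenPipeline  # assumed import path

    pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  # assumed model ID
    images = pipe(**generation_kwargs(prompt, input_images))
    images[0].save(output_path)
    return output_path


# Example call (downloads the weights on first use):
# generate("A herringbone winter coat with fur trim")
```

Keeping the argument assembly separate from the pipeline call makes it easy to reuse the same settings across a text-only request and a follow-up edit that passes the previous result as a context image.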

If you're a bit more experienced, you can use ComfyUI to generate images. To install OmniGen, simply go to the ComfyUI Manager, search for the OmniGen node, and install it. Then restart ComfyUI and you're done. When executed, the node itself will download the weights.

I was able to test the model, and it takes a lot longer to generate images than SD 3.5 or Flux. Its strength is not quality but accuracy, meaning some images may lack detail or realism but will exhibit high levels of prompt adherence, especially when dealing with natural language requests in edits.

In its current state, OmniGen is not a good image generator for those looking for a model capable of surpassing Flux or SD 3.5. However, this model is not intended to be that.

For those looking for an AI-based image editor, this is probably one of the most powerful and easy-to-use options available today. With simple prompts, it achieves results similar to what professional AI artists achieve with highly complex workflows built on highly specialized tools.

Overall, the model is a great alternative for beginners testing the open-source AI waters. However, it could also be great for professional AI artists if they fold its powerful capabilities into their own workflows. It could drastically simplify those workflows, going from dozens of different nodes to a single generation pass with fewer elements to run and load.

For example, using it as a first stage to blend different elements into a composition, and then running that result through a second pass with a stronger AI model, could prove to be a very good and versatile way to obtain great generations.
