From Noise to Masterpiece: A Friendly Deep Dive into the Midjourney Image Generation Process for Creative Souls
If you have ever scrolled through Discord or Instagram and seen beautiful, imaginative, or strikingly realistic images, there is a good chance they were made with Midjourney, an application that lets users create images using artificial intelligence. Many people think of Midjourney as just a "Google image search on steroids": they type a request, get an image back, and have no idea how it was made. However, if you have tried creating your own images with Midjourney, you will have noticed that when you submit a prompt, you get four small boxes that start out blurry and gradually sharpen until they appear as complete works of art. That is the Midjourney image generation process at work: the application is actually generating the final image from scratch rather than "finding" an existing one.
Why does it start with that “fuzzy” static?

The first thing you will notice after hitting ENTER on a prompt is a chaos of static. This is not a loading screen or a glitch. That grainy mess of random pixels is the conceptual "canvas" the AI is working from. Think of it as Michelangelo looking at a large block of marble: to him, it is not a rock but the statue hiding inside, and his job is to chip away everything that doesn't belong until only the statue remains.
In the Midjourney image generation process, the AI begins with exactly that kind of canvas: pure random noise. It then treats the noise as if your image, say, a cyberpunk Kuching city, were already hidden inside it, and works step by step to uncover and refine that picture rather than pasting pixels together. Watching the progress bar crawl from 0% to 100% as the static resolves into an actual image is deeply satisfying: you are watching an AI create in real time.
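To make the "chipping away" idea concrete, here is a deliberately tiny toy sketch of iterative refinement. It is not Midjourney's actual algorithm: a real diffusion model uses a trained neural network to predict the noise at each step, whereas this toy simply nudges a noisy canvas toward a known target so you can see how many small denoising steps turn static into a picture.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 4x4 "target" image the toy denoiser will recover.
target = np.linspace(0.0, 1.0, 16).reshape(4, 4)

# Start from pure random noise -- the fuzzy static you see at 0%.
image = rng.normal(0.0, 1.0, size=(4, 4))

for step in range(50):
    # In a real diffusion model, a trained network predicts the noise to
    # remove; here we fake that prediction, since we know the target.
    predicted_noise = image - target
    image = image - 0.1 * predicted_noise

# After enough steps, almost no noise is left and the image has emerged.
error = np.abs(image - target).mean()
print(round(error, 4))
```

Each pass removes only 10% of the remaining noise, which is why the preview sharpens gradually instead of snapping into focus.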
Understanding the technology behind the Midjourney image generation process
Next, let's look at the engine that powers it all: the diffusion model. The word "diffusion" may conjure up images of a chemistry lab, but the idea is simple once you see it as "reversing disorder." Imagine taking a clear photo of nasi lemak, sprinkling a small amount of "digital dust" on it, then adding a little more, and a little more, until the photo is nothing but a mound of dust and you can no longer see any of the rice or sambal. That is forward diffusion.
Midjourney's diffusion model is trained on millions of images that were progressively converted to noise in exactly this way. As a result, when a user submits a prompt, the AI starts from pure noise and repeatedly asks itself, "If I removed a little of this dust, would more of what the user described appear in this image?" This is how Midjourney generates images that are far more coherent and artistically pleasing than those of earlier AI approaches.
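The nasi lemak analogy can be sketched in a few lines of code. This is a simplified illustration, not the schedule real diffusion models use (they scale the image and noise according to a carefully tuned formula): here we just pile Gaussian "digital dust" onto a stand-in photo and measure how quickly it stops resembling the original.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a clear photo (values in [0, 1]) -- picture the nasi lemak shot.
image = rng.random((32, 32))
noisy = image.copy()

# Forward diffusion: sprinkle a little Gaussian "digital dust" at each step.
correlations = []
for step in range(10):
    noisy = noisy + rng.normal(0.0, 0.3, size=noisy.shape)
    # How much of the original photo is still visible in the noisy version?
    corr = np.corrcoef(image.ravel(), noisy.ravel())[0, 1]
    correlations.append(corr)

# The resemblance to the original fades as the dust piles up.
print(round(correlations[0], 2), ">", round(correlations[-1], 2))
```

Training runs this process forward on millions of images; generation is the model learning to run it backward.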
The “Secret Sauce” of the Midjourney image generation process

You may be wondering, "How does it know how my written words relate to particular shapes and colors?" This is where what we call the "bridge" comes into play. The AI does not read words the way we do. When you type "Stunning sunset over Mount Kinabalu," the system converts the words into numerical representations called vectors. These vectors act like GPS coordinates on the AI's map of everything it has ever seen, a map known as the latent space.
Within this latent space there are coordinates for sunsets, landscapes, specific locations, and even qualities like high-quality photography. Your prompt's coordinates tell the AI which neighbourhood of the map to denoise toward. This is why prompting Midjourney is considered an art: you are not simply ordering a pizza; you are handing the AI the coordinates of your preferred flavour somewhere in a sea of noise.
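The "GPS coordinates" idea can be shown with a toy latent map. The vectors below are invented purely for illustration; real text encoders learn vectors with hundreds of dimensions from data rather than having them written by hand. The point is only that related concepts sit near each other on the map, and a prompt becomes a point in the same space.

```python
import numpy as np

# Toy 3-D "coordinates" for a few concepts -- invented numbers, for
# illustration only; real systems learn much higher-dimensional vectors.
latent_map = {
    "sunset":   np.array([0.9, 0.2, 0.1]),
    "dusk":     np.array([0.8, 0.3, 0.2]),
    "mountain": np.array([0.1, 0.9, 0.3]),
    "laptop":   np.array([0.1, 0.1, 0.9]),
}

def similarity(a, b):
    # Cosine similarity: close to 1.0 means "nearby" on the latent map.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related ideas land near each other; unrelated ones sit far apart.
near = similarity(latent_map["sunset"], latent_map["dusk"])
far = similarity(latent_map["sunset"], latent_map["laptop"])
print(round(near, 2), ">", round(far, 2))

# A prompt becomes a point on the map -- here, crudely, the average of
# its words -- and that point steers where the denoiser aims.
prompt_point = (latent_map["sunset"] + latent_map["mountain"]) / 2
```

A prompt for a "sunset over a mountain" thus lands between the sunset and mountain neighbourhoods, which is the region of the map the denoising process steers toward.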
How the Midjourney neural network image model “thinks”
Under the hood, the Midjourney image generation process runs on a neural network, and each layer of that network responds to a different aspect of the image being generated, from the broad shape of the horizon down to how smoke curls through the air.
Besides producing high-quality images very quickly, Midjourney's output is often described as generative art because the system has its own checks and balances: some parts of the network propose details while other parts, the "critics" (sometimes called discriminators or post-checkers), push back. While one part suggests what a lens flare should look like, the critic argues over whether it actually looks like a real lens flare before your final digital artwork is produced.
Making it work for you in the real world

In the end, the key to effectively using creative AI tools is collaboration: you supply the prompt (the soul), and the Midjourney visual AI system does the math and produces the pixels (the technical work). Midjourney's digital art generation is designed to be unpredictable. Because the system starts from random noise, entering the same prompt twice can produce very different results, much like fishing in a digital ocean: you know what you are looking for, but the catch is always a surprise. The next time you wait for that bar to crawl from 0% to 100%, remember that you are watching an advanced AI text-to-image system "hallucinate" a reality shaped by your input. There is serious mathematical computation and data processing at work, and a considerable amount of "magic" as well.
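The "same prompt, different catch" behaviour comes from the random starting noise, which is usually controlled by a seed number. Here is a toy stand-in generator (not Midjourney's real code; the `generate` function and its crude "prompt influence" are invented for illustration) showing that reusing a seed reproduces the result while a new seed gives a new surprise.

```python
import numpy as np

def generate(prompt: str, seed: int) -> np.ndarray:
    # Toy stand-in for an image generator: the prompt fixes the "direction",
    # but the starting noise (set by the seed) still shapes the final result.
    rng = np.random.default_rng(seed)
    start_noise = rng.normal(0.0, 1.0, size=(4, 4))
    prompt_bias = (len(prompt) % 7) / 10.0  # crude, invented prompt influence
    return start_noise * 0.3 + prompt_bias

a = generate("cyberpunk Kuching city", seed=1)
b = generate("cyberpunk Kuching city", seed=2)
c = generate("cyberpunk Kuching city", seed=1)

# Same seed repeats the image exactly; a different seed gives a new one.
print(np.allclose(a, c), np.allclose(a, b))  # prints: True False
```

This is why image tools often let you pin a seed when you want to reproduce or iterate on a result you like.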