Hello Dev.to! I want to share my experience of working with neural networks. Lately they have been generating a lot of buzz, and there are many examples of high-quality, well-crafted generated images on the internet. Inspired by these examples, I decided to test Midjourney, Stable Diffusion, and Kandinsky on a real project to identify their strengths and weaknesses and to understand which neural network would best suit my future work. The main goal was to generate a button for a landing page shaped like a classic Enter key but visually more interesting, with highlights, neon, and a futuristic feel. Below, I describe my process and the results each neural network produced.
First attempt
I took a simple approach and wrote a basic prompt, listing all the characteristics I wanted to see in the image. But the result was far from what I had in mind, and the button looked nothing like Enter.
Prompt: enter button black color, luminous inscription, white background, strongly detailed, top view
I ran this prompt many times, but the result did not improve. Midjourney and Kandinsky generated a beautiful button with many details, but it was far from what I wanted. Stable Diffusion generated anything but keyboard buttons. I realized that more generations would not change anything, so I decided to try a different tactic.
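Rerunning a prompt by hand quickly gets tedious, so if you prefer to script this kind of batch generation, here is a minimal sketch using the Hugging Face diffusers library. This is my own assumption of a setup, not what I used in the web UIs: it relies on the public runwayml/stable-diffusion-v1-5 checkpoint and a CUDA GPU, and it only mirrors the Stable Diffusion part of the experiment, since Midjourney and Kandinsky were used through their hosted services.

```python
# Minimal sketch: batch-generate several candidates for one prompt with diffusers.
# Assumes a CUDA GPU and the public runwayml/stable-diffusion-v1-5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "enter button black color, luminous inscription, white background, strongly detailed, top view"
images = pipe(prompt, num_images_per_prompt=4, num_inference_steps=30).images

for i, img in enumerate(images):
    img.save(f"enter_button_{i}.png")
```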
Reference image
Since none of the neural networks seemed to understand what a classic Enter button looks like, I decided to show them one. I found an image that matched my idea and fed it to each neural network: for Midjourney, I provided a link to the image along with the prompt; for Kandinsky, I used the "Image Variation" mode; and for Stable Diffusion, I used the ControlNet extension. Then I waited.
Prompt: black enter button, luminous inscription, white background, strongly detailed, top view
Midjourney was far from the desired shape, but it generated a button with some angles and inscriptions. Stable Diffusion made a button that was very similar to the reference image, but the result was very boring. Kandinsky generated something abstract, and I needed to experiment more with it.
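The web tools hide what they actually do with a reference image, so purely as a rough local analogue of the "reference plus prompt" idea, here is an image-to-image sketch with diffusers. This is not the exact workflow of any of the three services; the file name enter_reference.png and the strength value are my assumptions.

```python
# Rough img2img analogue of "reference image + prompt" (not the exact web-UI workflow).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

reference = Image.open("enter_reference.png").convert("RGB").resize((512, 512))
prompt = "black enter button, luminous inscription, white background, strongly detailed, top view"

# strength controls how far the model may drift from the reference layout
result = pipe(prompt=prompt, image=reference, strength=0.6, guidance_scale=7.5).images[0]
result.save("enter_button_variation.png")
```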
Realizing that each neural network required its own approach to achieve a better result, I started experimenting with each of them separately, hoping to find the perfect method.
Midjourney
I continued to use the original reference image with a slightly modified prompt. For some of the good results, I requested additional variations, hoping to get something more interesting. Out of the huge number of variations, only a few were close to what I had in mind. I still needed to work on them.
Prompt: black enter button, irregular shape, white background, super detailed, top view
Stable Diffusion
Since this neural network showed good results when working through ControlNet, I repeated the process with the reference image of the button and simplified the prompt to "black button, luminous inscription, white background, strongly detailed, top view." I had to experiment with the Preprocessor modes in ControlNet: sometimes the neural network saw only the outline of the button, sometimes it could reproduce the volume, and sometimes it drew the button with simple straight lines (for clarity, I displayed the image on the left). Funnily enough, despite the reference image, it sometimes added new and strange objects: a portrait, another button, random symbols, and so on.
I liked the high speed of work and the ability to adjust the number of results, but Stable Diffusion generated a lot of irrelevant images.
Preprocessor/Model settings tried: normal_map, canny, depth, hed, mlsd, segmentation
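For anyone who wants to reproduce the ControlNet part outside the web UI, here is a minimal sketch of the canny variant with diffusers. The model name lllyasviel/sd-controlnet-canny, the reference file enter_reference.png, and the CUDA GPU are my assumptions; the other preprocessors work the same way, only the conditioning image changes.

```python
# Sketch of the ControlNet "canny" workflow: edges extracted from the reference guide the generation.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Build the conditioning image: grayscale -> Canny edges -> 3-channel image
reference = np.array(Image.open("enter_reference.png").convert("RGB"))
gray = cv2.cvtColor(reference, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

prompt = "black button, luminous inscription, white background, strongly detailed, top view"
images = pipe(prompt, image=control_image, num_images_per_prompt=4).images
for i, img in enumerate(images):
    img.save(f"controlnet_canny_{i}.png")
```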
Kandinsky
At first, I fed it the reference image several times to see what would happen. Since this mode does not allow adding a prompt, it generated anything but what I wanted.
I managed to get several images that looked visually good but were far from what I had in mind. Then I switched to the usual method of generating images from a prompt, changing and simplifying it in every way to find some pattern, but I never found one.
Prompt: black enter button with luminous inscription
Prompt: black enter button with glowing inscription
Prompt: enter button from the keyboard, black color, white background, glowing inscription
Prompt: enter button shape, black color, white background, luminous inscription
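These experiments went through the official Kandinsky web interface, but the model family is also available locally through diffusers. A minimal text-to-image sketch, assuming the public kandinsky-community/kandinsky-2-2-decoder weights and a CUDA GPU (which may not match the exact model version behind the web service), looks like this:

```python
# Minimal local Kandinsky text-to-image sketch via diffusers.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

prompt = "black enter button with luminous inscription"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("kandinsky_enter_button.png")
```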
In the end
Initially, I had a specific shape in mind for the Enter button with a few added elements and symbols, but after generating images in different neural networks, I decided to use the Midjourney result as a basis and refine it in Figma.
I had to start the entire drawing process from scratch: I created the shape and volume, changed the button color from black to gray, added highlights, shadows, glowing icons, and suitable thematic text.
Conclusion
There are no good or bad neural networks; from my personal experience, each has its own purpose. Midjourney and Kandinsky are very good for creating creative images and covers, or for finding ideas, and Midjourney has the undoubted advantage of generating new variations based on a previous result. As for Stable Diffusion, in combination with ControlNet you can get a more predictable result, and you can draw a reference shape in any graphic editor in a minute, which is what I ended up doing later.