Learning new concepts like Machine Learning (ML) can often feel superficial. But when you encounter these concepts in real life, they become fun and memorable. I want to share the story of how a simple request from my better half, Tina, led me to apply these ML concepts:
- Fine Tuning
- Data Preparation
- Hyper Parameters
- Over Fitting
NOTE: Jump to "Fun Images" at the bottom if you want to skip the text.
Back story
About 6 months ago, I got curious about ML again. The first thing I tried was Stable Diffusion (SD1.5 and SDXL) in ComfyUI. I believe that to better understand ML, I have to be a user of ML-enabled apps. Another way for me to understand it is to explain ML to Tina, especially why it can be useful. While I was generating images, Tina casually remarked, "Well, let's see if you can make images like me". I tried a few things: face swap and control nets. The results were OK, but she was very unimpressed. So I left it and moved on to other boring things like RAG, text classification, and watching more AI YouTube content, until a few weeks ago.
Fine Tuning
When I started to hear about Flux.1, I tried it and was very impressed. With Flux.1 I was able to generate images through prompt engineering alone, without complicated ComfyUI workflows. I tried to mimic my photos using text-to-image and image-to-image. After watching Matt Wolfe's fine-tuning video on Flux, and seeing Levelsio's date-night Flux posts on X/Twitter, I tried fine tuning it myself. The ability to fine tune meant I could use Tina's photos to train the AI to generate images in her likeness. I gathered 23 photos of Tina and started fine tuning using Low-rank adaptation (LoRA) with a Flux-dev model at replicate.com. Most of the images were just fine, but a few were very impressive. Tina was starting to get impressed. Progress! She posted the images on FB, and some of her close friends, even my mom, didn't realize it wasn't her until they read my comments.
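For readers who want to try the same thing, the fine-tune boils down to sending a zip of photos plus a few settings to a LoRA trainer. Here is a minimal Python sketch of assembling that payload. The parameter names (`input_images`, `trigger_word`, `steps`, `lora_rank`), the trainer name, and the URLs are my assumptions, not verified against Replicate's current schema; check the model page before running.

```python
# Sketch of the input payload for a Flux LoRA fine-tune.
# All parameter names here are assumptions; verify against the
# trainer's documented input schema on replicate.com.

def build_training_input(photos_zip_url: str, trigger_word: str,
                         steps: int = 1000, lora_rank: int = 16) -> dict:
    """Assemble the input for a LoRA training run.

    The training images go in as a single zip archive; the trigger
    word is the token you later put in prompts to invoke the subject.
    """
    return {
        "input_images": photos_zip_url,  # zip of the 23 (later 57) photos
        "trigger_word": trigger_word,    # e.g. "TINA"
        "steps": steps,                  # total training steps
        "lora_rank": lora_rank,          # LoRA rank: capacity vs. size trade-off
    }

payload = build_training_input("https://example.com/tina-photos.zip", "TINA")

# With the replicate client installed and an API token set, the run
# could then be started along these lines (untested sketch):
#   import replicate
#   training = replicate.trainings.create(
#       version="ostris/flux-dev-lora-trainer:<version-id>",
#       input=payload,
#       destination="<your-username>/tina-lora",
#   )
```

The key design point is the trigger word: it ties the trained concept to a unique token so the base model's other knowledge stays usable.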
Data Preparation
There were issues, specifically when I generated full-body images. Tina, usually nearby, would see some of the images and comment on the ones that didn't resemble her. Some comments were hilariously blunt and not politically correct, to the point that I won't share them. I realized I needed better Data Preparation to generate images of Tina in varying compositions true to her likeness. I increased my initial 23 photos to 57. Not only did I increase the quantity, I also got a better variety of poses and framings. So I gave it another try and trained a version 2. Version 2 generated better images of her, including the likeness of her body, expressions and poses. Tina is a frustrated fashion model: she had aspired to be one but never went for it. So one of the things she wanted to see was what she would look like as a fashion model. Not all the images were great, but some were very impressive, and plausibly her. The model also picked up inspiration from the clothes she wore and her poses, not just her facial features.
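The "variety of poses and framings" point is easy to get wrong by eyeballing a folder, so here is a small sketch of how the coverage check could be automated before zipping the set. The folder layout and filename convention (`closeup_01.jpg`, `halfbody_02.jpg`, `fullbody_03.jpg`) are my own invention for illustration, not something any trainer requires.

```python
# Minimal sketch of preparing a LoRA training set: count how many
# photos of each composition we have, then zip everything for upload.
# The filename convention (category_number.jpg) is an assumption.
import zipfile
from collections import Counter
from pathlib import Path

def prepare_dataset(photo_dir: str, out_zip: str) -> Counter:
    """Zip all .jpg photos and report coverage per composition.

    Returns a Counter like {'closeup': 20, 'halfbody': 22,
    'fullbody': 15} so gaps (e.g. too few full-body shots)
    are visible before training starts.
    """
    photos = sorted(Path(photo_dir).glob("*.jpg"))
    coverage = Counter(p.stem.split("_")[0] for p in photos)
    with zipfile.ZipFile(out_zip, "w") as zf:
        for p in photos:
            zf.write(p, arcname=p.name)
    return coverage
```

A report like this would have flagged the version-1 problem up front: plenty of half-body photos, almost no full-body ones.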
Hyper Parameters
Once Tina and I found an image with a composition we liked, I experimented with the hyper parameters. To get the image closer to what Tina looks like, I took note of the random seed of a particular image and reused it for consistent image generation. I played around with inference steps, guidance scale and LoRA scale to tweak the image to our liking. I found that increasing inference steps usually made the face closer to Tina's features. Increasing the guidance scale made the image follow the prompt more closely, while increasing the LoRA scale made it closer to Tina's general looks. I don't think you should just max out the parameters, as that usually creates worse images. I usually went back to the defaults or slightly higher settings and unset the random seed when generating the next set of images. I would also sometimes lower them, especially the guidance scale, if I could accept some flexibility.
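The tuning loop above can be sketched in a few lines: pin the seed so only one knob changes between images, sweep that knob, then unset the seed for fresh variations. The parameter names below mirror common Flux inputs on Replicate, but treat them as assumptions and check the model page for the exact schema.

```python
# Sketch of the one-knob-at-a-time tuning loop: a pinned seed keeps
# everything else constant, so differences come only from the knob
# being swept. Parameter names are assumptions.

BASE = {
    "prompt": "photo of TINA as a fashion model on a runway",
    "seed": 1234,               # pinned: same composition every run
    "num_inference_steps": 28,  # more steps -> face closer to the subject
    "guidance_scale": 3.5,      # higher -> closer adherence to the prompt
    "lora_scale": 1.0,          # higher -> stronger pull toward the LoRA subject
}

def variants(base: dict, knob: str, values: list) -> list:
    """One input dict per value of a single knob, everything else fixed."""
    return [{**base, knob: v} for v in values]

sweep = variants(BASE, "lora_scale", [0.8, 1.0, 1.2])

# Each dict in `sweep` would be passed as the model input; once a
# favorite is found, unset the seed to explore new compositions:
final = {**BASE, "seed": None}
```

Sweeping one knob at a time is the whole trick: with the seed pinned, any change in the output is attributable to that single parameter.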
NOTE: Hyper parameters are more complicated than my current understanding of them. Please share your experience or a deeper explanation of hyper parameters in the comments below.
Over Fitting
One of the things in ML that can easily be overlooked is over fitting. Fine tuning the model also had the effect of skewing the images so that everyone looks like Tina. In practical terms, the model generates a world filled with Tinas. Over fitting in image generation can be fun, but in models with bigger implications or impact, overfitting can be a serious issue.
NOTE: Does anybody have a solution for this overfitting issue? I’d like to train using LoRA with more than one subject/style.
Closing Remarks
Tina is slightly impressed, but I have more work to do. What she really wants is to clone herself. I need Artificial General Intelligence (AGI) to be released. When is it coming? I need it now, so I can fine-tune Tina on it. Although I think the safety team will likely not allow this fine-tuning to happen - for the sake of the world! :)
Fun Images
Control Nets in ComfyUI
I searched for the simplest images I had of Tina so the control net would have the best chance of success. Here is Tina standing on the Te Paki sand dunes, wearing a plain dress, with not much detail in the background. Meh results.
Fine Tuning using initial dataset
Very impressive half-body images, as I had several in my initial training photos. I also made them look as if I had taken them with my camera in a low-aperture, low-light scenario. The low light also helped set the mood and hide some telltale signs of AI image generation.
Fine Tuning with better prepared data and tweaking hyper parameters
Super impressive; it even gets Tina's body shape and poses. With some tweaking of the parameters we made some images closer to her likeness.
Overfitting
Each version of Tina (each member of the band) has her own pose: the peace/V fingers, the hand on the hips, the side pose. Every suspect in the fine-tuned world looks like Tina. Can the real Tina please step forward?
Dataset for training
The images that were used for training. This is where the model got Tina's facial features, poses and expressions.