AI Art Generation Handbook/FLUX/Fine-tuning

From Wikibooks, open books for an open world

This page is under construction and currently addresses mainly prompting.

This page uses LoRas trained on the site Weights, and images generated there, as examples. On Weights, image generation is free and unlimited but queued, and up to seven custom LoRas can be trained on-site per week for free. Training on Weights is intended to be beginner-friendly: it exposes few technical settings, and a dataset is limited to 30 images. The datasets of LoRas trained on Weights are public, so how a given LoRa was trained can be inspected.

Flux is great at replicating styles on its own once it has been fine-tuned. The following image was generated with the "Hazbin Hotel Style Animation" LoRa by worst pres, which was trained on screenshots from the animated series Hazbin Hotel and is among the most popular LoRas on Weights. The images in the dataset are low-quality and have vague descriptions, such as "Anthropomorphic character with pink hair and large smile, sitting at a desk." for an image of the character Alastor. The style is still reproduced, but the dataset's low quality makes the LoRa less reliable, as evidenced by the artifacts in the example below and the more severe ones in many images in the gallery on the LoRa's page.


Prompt: "Javan rhinoceroses wearing a business suit and walking in the city of Jakarta, Indonesia; Hazbin Hotel animation style"
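Because datasets on Weights are public, the points above can be checked before relying on a LoRa. The following is a minimal sketch of such a check, not a Weights tool: it assumes the captions have been copied by hand into a list, and the `Entry` type, the file names, the second caption, and the list of expected terms are all hypothetical. Only the 30-image limit and the first caption come from the text above.

```python
from dataclasses import dataclass

MAX_IMAGES = 30  # dataset limit stated above for Weights


@dataclass
class Entry:
    filename: str  # hypothetical image file name
    caption: str   # description paired with the image


def audit(entries, expected_terms):
    """Return a list of human-readable warnings for a caption dataset."""
    warnings = []
    if len(entries) > MAX_IMAGES:
        warnings.append(
            f"dataset has {len(entries)} images; the limit is {MAX_IMAGES}"
        )
    for entry in entries:
        text = entry.caption.lower()
        if not text.strip():
            warnings.append(f"{entry.filename}: empty caption")
        elif not any(term.lower() in text for term in expected_terms):
            # The caption mentions none of the terms we expected,
            # which is one rough signal of a vague description.
            warnings.append(f"{entry.filename}: no expected term in caption")
    return warnings


dataset = [
    # Caption quoted from the dataset discussed above.
    Entry("alastor_desk.png",
          "Anthropomorphic character with pink hair and large smile, "
          "sitting at a desk."),
    # Hypothetical, more specific caption for comparison.
    Entry("alastor_named.png",
          "Alastor sitting at a desk, Hazbin Hotel animation style."),
]

for warning in audit(dataset, ["Alastor", "Hazbin Hotel"]):
    print(warning)
```

Matching on expected terms is only a crude stand-in for judging vagueness; a flagged caption still needs to be read by a person.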

Despite fine-tuning, Flux often needs additional prompting to produce the intended output, even for styles. The following image was generated with the "The Murder of Sonic the Hedgehog style" LoRa by PrincessPandaAI, which was trained on sprites from the official Sonic the Hedgehog visual novel The Murder of Sonic the Hedgehog. With the LoRa's trigger word, "tmosth style", specified in the prompt, the image achieves the style's black outlines, black solid shading, and texture. However, it presents cel and soft shading using highlights and shadows, whereas the original style aims for flat colors.


Prompt: "Javan rhinoceroses wearing a business suit; tmosth style"

The following image was generated with the same LoRa, but with the prompt also asking for flat colors and black solid shading. It is now almost properly in the style of The Murder of Sonic the Hedgehog, missing only the visible texture and the effect of the coloring going beyond the lines (the dataset's descriptions do not specify this effect). The rhino character even slightly resembles a Sonic character.


Prompt: "Javan rhinoceroses wearing a business suit; tmosth style, flat colors, black cel shading"
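The two "tmosth style" prompts above differ only in the modifiers appended after the trigger word. As a minimal sketch of that pattern, the hypothetical helper `build_prompt` below assembles both prompts from a subject, a trigger word, and optional style modifiers. The "; " and ", " separators simply follow the example prompts on this page and are not known to be required by Flux or Weights.

```python
def build_prompt(subject, trigger, modifiers=()):
    """Join a subject, a LoRa trigger word, and optional style modifiers."""
    style_part = ", ".join([trigger, *modifiers])
    return f"{subject}; {style_part}"


subject = "Javan rhinoceroses wearing a business suit"

# Trigger word only, as in the first "tmosth style" example.
basic = build_prompt(subject, "tmosth style")

# Trigger word plus explicit style modifiers, as in the second example.
refined = build_prompt(
    subject, "tmosth style", ["flat colors", "black cel shading"]
)

print(basic)
print(refined)
```

Keeping the modifiers in a separate list makes it easy to add or drop one descriptor at a time and compare the resulting images.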

When Flux replicates a style, it reproduces the overall look of the dataset images, such as the appearance of the line art and coloring. If a fine-tuned model is supposed to mimic a character design style, it is therefore important that its dataset focuses on the characters. For example, the following image was generated by a LoRa trained mostly on scene artwork by the late children's author Richard Scarry. The image mimics not only the line art and watercolor painting style, but also the colorfulness of Scarry's work.


Prompt: "Richard Scarry style, ink and watercolor illustration, storybook illustration, thin lines, cityscape of Jakarta, Indonesia at sunset"

The following image used the same LoRa in an attempt to replicate Scarry's trademark anthropomorphic animal character design. However, even though the prompt spells out the traits of the character design in detail, the attempt failed.


Prompt: "Richard Scarry style, ink and watercolor illustration, storybook illustration, thin lines, anthropomorphic Javan rhinoceroses wearing a business suit, short, thick, somewhat pear-shaped body, a large, expressive head relative to his body size, short, stubby limbs, small, round eyes with white sclera and black dots for the pupils, prominent rhino snout, barefoot, short, hooved feet"

On the other hand, the following image was generated with a LoRa trained on images focusing on animal characters from Scarry's books. The prompt still specifies the traits, because Flux requires more prompting for character replication than for style replication, especially for a character that it does not know at all.


Prompt: "Richard Scarry-style anthropomorphic animal character, Javan rhinoceroses wearing a business suit, large size, thick, somewhat pear-shaped body, short torso relative to body size, a large, expressive head relative to his body size, short, stubby limbs, small, round eyes with white sclera and black dots for the pupils, prominent rhino snout, hands are rhino hooves, barefoot, short, hooved feet"