Stable Diffusion LoRA Dataset Guide: How Many Images, What Quality, and What to Avoid
Most failed LoRA training runs are not a settings problem. They are a dataset problem that no amount of tuning will fix.
I have trained a lot of LoRAs — style LoRAs, character LoRAs, object LoRAs — and the single biggest predictor of a good result is whether the dataset was built carefully. This guide covers exactly that: what makes a good dataset, what tanks one, and the specific decisions you need to make before you run a single training step.
How Many Images Do You Actually Need?
The honest answer: fewer than you think, but only if they are good.
A character LoRA trained on 15–25 high-quality, diverse images will beat one trained on 150 scraped images that all show the same pose and background. Volume does not compensate for diversity.
Here are rough targets that work in practice:
| LoRA type | Minimum | Sweet spot | Diminishing returns after |
|---|---|---|---|
| Character | 15 | 25–40 | 60 |
| Style | 20 | 40–80 | 100 |
| Object/concept | 10 | 15–25 | 40 |
| Face (specific person) | 20 | 30–50 | 70 |
More images become useful only if they add genuinely new information — new lighting, new angle, new expression, new context. If image 30 looks like a slight variation of image 12, it is not helping.
Resolution and Aspect Ratios
Training resolution matters more than most guides admit.
For SDXL LoRAs: 1024×1024 is the standard. You can use mixed aspect ratios (768×1280, 1280×768, etc.) if your trainer supports bucketing.
For SD1.5 LoRAs: 512×512 is the standard; 768×768 gives full-body shots more room, since at 512 the face often ends up too small to resolve cleanly.
The practical rule: your training images should be at or above the target resolution. Upscaling small images (say, 400px face crops to 1024px) introduces blur and softness that the model learns. The LoRA will generate soft results because it was trained on soft data.
If you only have lower-resolution source images, use a good upscaler (RealESRGAN, 4x-UltraSharp) before training — not after.
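A quick Pillow script can flag undersized images before you train. This is a sketch: the 1024px target matches SDXL, and the file-extension filter and folder layout are assumptions to adjust for your setup.

```python
from pathlib import Path
from PIL import Image

TARGET = 1024  # SDXL training resolution; use 512 or 768 for SD1.5

def find_undersized(folder: str, target: int = TARGET) -> list[str]:
    """Return filenames whose shorter side is below the target resolution."""
    flagged = []
    for path in sorted(Path(folder).glob("*")):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        with Image.open(path) as img:
            if min(img.size) < target:
                flagged.append(f"{path.name}: {img.size[0]}x{img.size[1]}")
    return flagged
```

Run it on your dataset folder and either upscale or drop anything it flags before training.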
On cropping: Square crops are not mandatory. Most modern trainers support aspect ratio bucketing, which means you can feed in a portrait image at 768×1024 and the trainer will handle it correctly. Forcing everything into a square by squashing or adding black bars is worse than just using bucketed aspect ratios.
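To make the bucketing idea concrete, here is a minimal sketch of how a trainer might assign each image to the bucket whose aspect ratio is closest to its own. The bucket list is illustrative, not the exact set any particular trainer uses.

```python
# Illustrative SDXL-style buckets: (width, height) pairs around 1024x1024.
BUCKETS = [(1024, 1024), (768, 1280), (1280, 768), (832, 1216), (1216, 832)]

def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio best matches the image's."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))
```

A 768×1024 portrait lands in a portrait bucket rather than being squashed into a square, which is exactly why bucketing beats forced square crops.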
What Diversity Actually Means
"Diverse dataset" gets repeated like a mantra, but nobody says what it means in practice. Here is the breakdown:
Pose diversity — Do not let 80% of your images be front-facing standing shots. Include sitting, three-quarter view, looking away, hands in frame, close-up face, full body. Each distinct pose the model sees during training is a pose it can generate during inference.
Lighting diversity — Flat studio lighting, natural window light, warm indoor light, outdoor midday, shadows. A LoRA trained only on soft-lit images will fail when prompted for harsh shadows or backlighting.
Background diversity — This matters especially for style and character LoRAs. If every image has a white background, the LoRA will associate your subject with white backgrounds. The model has not learned to separate the subject from its environment.
Expression/emotion diversity — For character or person LoRAs: neutral, smiling, serious, looking off-camera. A dataset of 30 identical "looking at viewer, smile" images will produce a monotonous LoRA.
Clothing and context diversity — For character LoRAs: vary outfits if the character wears different ones. For style LoRAs: vary subject matter (landscape, portrait, object) to ensure the model learns the style, not the content.
Images to Remove From Your Dataset
Bad images in the dataset are not just useless — they actively harm training. Remove:
Watermarked images. The model will learn to generate subtle watermark artifacts. They are surprisingly hard to get rid of once trained in.
Blurry or low-detail images. Any image where the key feature (the face, the specific style detail, the object edge) is soft or unresolved. The model cannot learn what it cannot see clearly.
Images with heavy compression artifacts. JPEG artifacts at low quality settings create a specific visual signature. If your training set is 50% heavily compressed JPEGs, your LoRA outputs will look compressed.
Images where the subject is too small. If you are training a character LoRA and the character occupies 10% of a landscape image, the model barely sees the character. Crop in closer, or drop the image.
Duplicates and near-duplicates. Running the same image (or one with trivial differences) multiple times skews the training distribution. Use a deduplication tool or just sort your images visually and remove runs of similar shots.
Images with other faces, if training a person LoRA. Group photos are trouble. The model will learn that the person appears near other people, and will often generate those people automatically. Crop or mask out other faces.
The Cropping Step Most People Skip
Raw images almost never frame the subject the way you want the model to learn it. Before tagging or training anything, go through every image and:
- Crop tightly enough that the subject fills at least 40–60% of the frame (for character/face LoRAs)
- Ensure the crop is at or above your training resolution
- Remove dead space — large empty sky or background areas that add no signal
For face LoRAs specifically: a close crop showing mainly the face is more valuable per image than a wide shot where the face is small. The model has limited capacity per training step. Make it count.
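If you annotate a rough subject box per image (by hand or with a detector), a small helper can apply the 40–60% framing consistently. The function and its defaults are a sketch, not part of any trainer.

```python
from PIL import Image

def crop_around(img: Image.Image, box: tuple[int, int, int, int],
                fill_fraction: float = 0.5) -> Image.Image:
    """Crop a square around a subject box so the subject's longer side
    fills roughly `fill_fraction` of the frame.
    `box` is (left, top, right, bottom) from annotation or a detector."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    side = max(right - left, bottom - top) / fill_fraction
    half = side / 2
    # Clamp the crop to the image bounds
    l = max(0, int(cx - half))
    t = max(0, int(cy - half))
    r = min(img.width, int(cx + half))
    b = min(img.height, int(cy + half))
    return img.crop((l, t, r, b))
```

For a 200px face in a 2000px photo, `fill_fraction=0.5` yields a 400px square crop centered on the face, which you can then check against your training resolution.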
A Word on Image Quality vs Image Count
I would rather train on 15 exceptional images than 100 mediocre ones.
Exceptional means: sharp, well-lit, clearly showing the features you want to capture, varied in angle and context. If you cannot find 15 images that meet that bar, the problem is not your dataset size — it is your source material.
For original characters (OCs) or AI-generated training images: generate your dataset intentionally. Use a base model to generate 40 images with explicit variation in pose, lighting, and angle. Filter down to the best 20. Then tag and train. This gives you far more control than scraping random images.
Putting It Together: A Practical Pre-Training Checklist
Before you start a training run, go through this:
- Every image is at or above training resolution (no upscaling from tiny sources)
- No watermarks visible
- No heavy compression artifacts
- No duplicate or near-duplicate images
- Subject occupies a reasonable portion of each frame
- Pose variety: at least 4–5 distinct poses represented
- Lighting variety: at least 2–3 distinct lighting conditions
- Background variety: subject is not always on the same background
- Other faces removed from frame (for person/character LoRAs)
- Total image count is in the appropriate range for LoRA type
If your dataset passes this list, the training run has a genuine chance of producing a good result. If it does not, fix the dataset before touching your learning rate.
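The mechanical items on this list (resolution, exact duplicates, image count) are easy to automate; variety checks still need a human eye. The thresholds and folder conventions in this sketch are assumptions to adjust per LoRA type.

```python
import hashlib
from pathlib import Path
from PIL import Image

def audit(folder: str, min_res: int = 1024,
          count_range: tuple[int, int] = (15, 60)) -> list[str]:
    """Report mechanical dataset problems: undersized images,
    byte-identical duplicates, and an image count outside the target range."""
    problems, seen = [], {}
    paths = [p for p in sorted(Path(folder).glob("*"))
             if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}]
    for p in paths:
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        if digest in seen:
            problems.append(f"duplicate: {p.name} == {seen[digest]}")
        seen[digest] = p.name
        with Image.open(p) as img:
            if min(img.size) < min_res:
                problems.append(f"undersized: {p.name} {img.size}")
    lo, hi = count_range
    if not lo <= len(paths) <= hi:
        problems.append(f"count {len(paths)} outside {lo}-{hi}")
    return problems
```

An empty return means the mechanical checks pass; pose, lighting, and background variety you still verify by looking.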
What Comes After the Dataset
Once your images are ready, the next step is tagging — assigning Danbooru-style text labels to each image so the model knows what it is looking at. I have a separate guide on how to tag images for LoRA training that covers WD14 tagger, trigger words, and tag blacklists in detail. I also built a free browser-based tagging tool that runs the tagger locally without needing Python or AUTOMATIC1111 installed.
Dataset quality and tag quality together determine 80% of your result. Get those right and the training parameters become much more forgiving.