# Performance Tuning

This guide covers the key levers for maximizing VIESUS throughput across all on-premise interfaces (CLI, PDF Enhancer, Node.js module).

***

## The three variables that matter most

1. **Worker / instances count** — how many images process simultaneously
2. **Configuration (viesusini.json)** — which features are active; AI features are significantly slower
3. **Image size** — larger images take proportionally longer

Everything else (disk speed, CPU model, RAM bandwidth) is secondary on modern hardware.

***

## CPU: Instances count

### CLI

The CLI is single-threaded per invocation: it processes one image at a time. To use more cores, run several CLI instances at once, each with its own image list:

```bash
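# Split the image list into 16 line-aligned parts and process them in parallel.
# A fresh temp directory keeps stale batch files from earlier runs out of the glob.
BATCH_DIR=$(mktemp -d)
split -n l/16 images.lst "$BATCH_DIR/batch_"
for f in "$BATCH_DIR"/batch_*; do
    viesus -g "$GUID" -l "$f" -s -p config.json &
done
wait  # block until every background instance has finished
rm -r "$BATCH_DIR"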
```
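
When the image list is produced by a Node.js script rather than a file on disk, the same partitioning step can be sketched in JavaScript. This is a minimal illustration, not part of the VIESUS tooling: `splitIntoBatches` is a hypothetical helper, and it deals paths out round-robin (comparable to GNU `split -n r/16`) rather than in contiguous chunks, which tends to balance batches better when the list is sorted by size or directory.

```js
// Hypothetical helper: distribute image paths across `batchCount` batches,
// round-robin, so each CLI instance gets a similar share of the work.
function splitIntoBatches(paths, batchCount) {
  if (!Number.isInteger(batchCount) || batchCount < 1) {
    throw new RangeError('batchCount must be a positive integer');
  }
  const batches = Array.from({ length: batchCount }, () => []);
  paths.forEach((path, i) => batches[i % batchCount].push(path));
  return batches;
}

const batches = splitIntoBatches(['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg'], 2);
console.log(batches); // [ [ 'a.jpg', 'c.jpg', 'e.jpg' ], [ 'b.jpg', 'd.jpg' ] ]
```

Each batch can then be written to its own list file and passed to a separate CLI instance with `-l`.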

### Node.js module

Set `UV_THREADPOOL_SIZE` to the number of physical CPU cores (libuv's default is 4 threads). Hyperthreading provides only marginal benefit for VIESUS's compute workload:

```bash
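# Count physical cores (unique core/socket pairs); plain `nproc` reports logical CPUs
export UV_THREADPOOL_SIZE=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)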
node server.js
```

***

## GPU: one worker per GPU

GPU processing requires one process/worker per GPU. Multiple processes sharing a GPU cause VRAM contention:

**CLI:**

```bash
CUDA_VISIBLE_DEVICES=0 viesus -g "$GUID" -l batch1.lst -s -p config.json &
CUDA_VISIBLE_DEVICES=1 viesus -g "$GUID" -l batch2.lst -s -p config.json &
wait
```

**Node.js:**

```js
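// Assumes StaticPool comes from the node-worker-threads-pool package
const { StaticPool } = require('node-worker-threads-pool');

const nGPUs = 2; // one worker per GPU
const pool = new StaticPool({
  size: nGPUs,
  task: './worker.js',
  workerData: process.env.VIESUS_GUID, // passed to each worker at startup
});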
```

***

## Configuration impact on throughput

Features in `viesusini.json` have very different costs:

| Feature                                | CPU cost  | GPU required | Notes                               |
| -------------------------------------- | --------- | ------------ | ----------------------------------- |
| Base enhancement                       | Low       | No           | Always active                       |
| Noise reduction                        | Low       | No           |                                     |
| Face detection                         | Low       | No           | Adds overhead only when faces found |
| JPEG artifact removal (`ARmode: 0`)    | Low       | No           | Classical detection                 |
| JPEG artifact removal (`ARmode: 1`)    | High      | Yes          | AI detection and removal            |
| Background handling (`BGmode: 1`)      | High      | Yes          | AI segmentation                     |
| Classical resize (`ResizeMode` 0–4, 6) | Medium    | No           | CPU interpolation, no AI upscaling  |
| AI Upscaling (`ResizeMode` 5, 7–12)    | Very high | Yes          | 5–50× slower than classical         |

**Profile before optimizing:** run a sample batch with `WriteResultFiles: 1` and measure actual per-image times. Don't disable features without knowing their actual cost.
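
One way to turn such a profiling run into a number is to compare per-image times from two runs of the same batch, one with the feature off and one with it on. The sketch below assumes you have already collected per-image timings in milliseconds by whatever means you use. `median` and `featureCost` are hypothetical helpers, and the timings are made-up illustration values, not measurements.

```js
// Median is less sensitive than the mean to a few unusually slow images.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Compare per-image times (ms) for the same images with a feature off and on.
function featureCost(baselineMs, withFeatureMs) {
  if (baselineMs.length === 0 || withFeatureMs.length === 0) {
    throw new RangeError('both runs need at least one timing');
  }
  const base = median(baselineMs);
  const feat = median(withFeatureMs);
  return { addedMs: feat - base, slowdown: feat / base };
}

// Illustrative timings only.
const cost = featureCost([100, 120, 110], [330, 350, 310]);
console.log(cost); // { addedMs: 220, slowdown: 3 }
```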

***

## Quality vs. speed

The `ResizeMode` you choose trades quality against throughput:

| Scenario                           | Recommended `ResizeMode` | Notes                                              |
| ---------------------------------- | ------------------------ | -------------------------------------------------- |
| Highest quality, time not critical | `5` (SR ×4 quality)      | Best results                                       |
| Production batch — balanced        | `7` (SR ×2 / ×4 auto)    | Auto-selects 2× or 4× based on resize factor       |
| Maximum throughput                 | `9` (SR ×4 fast)         | \~2–3× faster than mode 5, small quality trade-off |
| CPU-only (no GPU)                  | `6` (classical, no SR)   | Avoids slow GPU-dependent models                   |
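
If the mode is chosen programmatically, the table can be expressed as a small lookup. This is only a sketch: `pickResizeMode` and its `hasGpu` / `priority` options are hypothetical names, and the returned numbers are simply the `ResizeMode` values recommended in the table.

```js
// ResizeMode values for GPU-backed scenarios, taken from the table above.
const GPU_MODES = new Map([
  ['quality', 5],    // SR x4 quality
  ['balanced', 7],   // SR x2 / x4 auto
  ['throughput', 9], // SR x4 fast
]);

// Hypothetical helper: pick a ResizeMode for a given scenario.
function pickResizeMode({ hasGpu, priority }) {
  if (!hasGpu) return 6; // classical resize, no SR
  const mode = GPU_MODES.get(priority);
  if (mode === undefined) {
    throw new RangeError(`unknown priority: ${priority}`);
  }
  return mode;
}

console.log(pickResizeMode({ hasGpu: true, priority: 'balanced' })); // 7
console.log(pickResizeMode({ hasGpu: false, priority: 'quality' })); // 6
```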

***

## Hardware recommendations

| Use case                     | Recommended hardware              | Why                                                                                |
| ---------------------------- | --------------------------------- | ---------------------------------------------------------------------------------- |
| AI Upscaling (production)    | Nvidia RTX A4000 / A5000 or newer | 16+ GB VRAM handles Extra Large images; current architecture for model support     |
| AI Upscaling (development)   | Nvidia RTX 3060 / 4060 or newer   | 12 GB VRAM is enough for Small–Large images; good iteration speed                  |
| Traditional enhancement only | CPU (8+ cores)                    | Run multiple parallel instances to use all cores; no GPU required                  |
| Mixed AI + traditional       | Single GPU + CPU                  | AI features use the GPU; the traditional pipeline runs on CPU — they don't contend |

***

## Memory sizing

Plan system RAM as: `workers × memory_per_worker + OS overhead`

| Configuration              | RAM per worker        |
| -------------------------- | --------------------- |
| CPU, base enhancement only | 200–500 MB            |
| CPU, with AI features      | 1–2 GB                |
| GPU worker                 | 500 MB RAM + GPU VRAM |
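
As a worked example of the sizing formula, the sketch below plugs in the upper bounds from the table. The function names are hypothetical and the 2048 MB OS overhead is an assumed placeholder. Substitute the figure you actually observe on your hosts.

```js
// workers x memory_per_worker + OS overhead, all in MB.
function planSystemRamMB(workers, perWorkerMB, osOverheadMB) {
  return workers * perWorkerMB + osOverheadMB;
}

// Inverse question: how many workers fit in a given amount of RAM?
function maxWorkersForRam(totalMB, perWorkerMB, osOverheadMB) {
  return Math.max(0, Math.floor((totalMB - osOverheadMB) / perWorkerMB));
}

const OS_OVERHEAD_MB = 2048; // assumption for illustration only

// 16 CPU workers, base enhancement only (500 MB upper bound each)
console.log(planSystemRamMB(16, 500, OS_OVERHEAD_MB)); // 10048

// 32 GB host, CPU workers with AI features (2 GB upper bound each)
console.log(maxWorkersForRam(32768, 2048, OS_OVERHEAD_MB)); // 15
```

Sizing against the upper bound of each range leaves headroom for occasional large images.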

NVIDIA GPU VRAM requirements:

* AI upscaling: \~4–6 GB per instance
* Background Handling: \~2–4 GB per instance
* Combined AI features: \~6–8 GB per instance; requires ≥8 GB VRAM card

***

## Benchmarking methodology

Always benchmark with:

1. **Representative images** — same resolution, format, and quality mix as production
2. **Warm runs** — discard the first run (cold caches, lazy GPU init)
3. **Steady-state measurement** — measure throughput over 500+ images, not a handful
4. **All features enabled** — benchmark the configuration you'll run in production

Measure both images/second and seconds/image. The former measures throughput; the latter measures user-facing latency.

```bash
# Simple throughput benchmark
START=$(date +%s%N)
viesus -g "$GUID" -l 1000_images.lst -s -p config.json
END=$(date +%s%N)
ELAPSED=$(( (END - START) / 1000000 ))  # ms
echo "1000 images in ${ELAPSED}ms = $(echo "scale=2; 1000000/$ELAPSED" | bc) img/sec"
```
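
With several workers running in parallel, seconds/image is no longer simply the reciprocal of images/second, so it helps to derive the two metrics separately. The sketch below assumes you record a per-image duration alongside the total wall-clock time. `summarizeRun` is a hypothetical helper and the numbers are illustrative.

```js
// durationsMs: per-image processing times in milliseconds, one entry per image
// wallClockMs: total elapsed time for the whole run, across all workers
function summarizeRun(durationsMs, wallClockMs) {
  if (durationsMs.length === 0 || wallClockMs <= 0) {
    throw new RangeError('need at least one image and a positive wall-clock time');
  }
  const totalMs = durationsMs.reduce((sum, ms) => sum + ms, 0);
  return {
    imagesPerSecond: durationsMs.length / (wallClockMs / 1000), // throughput
    secondsPerImage: totalMs / durationsMs.length / 1000,       // mean latency
  };
}

// Illustrative run: 4 images on 2 workers, finished in 1 s of wall-clock time.
const summary = summarizeRun([400, 600, 500, 500], 1000);
console.log(summary); // { imagesPerSecond: 4, secondsPerImage: 0.5 }
```

Here two workers deliver 4 images/second in aggregate even though each image still takes 0.5 s on average.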

***

## Reference benchmarks

Measured throughput figures live on the [Benchmarks](/operations/benchmarks.md) page.

