Benchmarking DirectML on Windows vs Metal (MPS) on macOS for an AMD GPU

I have always wondered whether the AMD 5500M GPU (Navi 14) in my MacBook Pro could avoid being entirely useless for deep learning. Back in 2020, not many solutions existed besides PlaidML running over OpenCL (which was, unsurprisingly, very unstable). Fast forward to 2023: the software ecosystem is much more mature, and there are user-friendly solutions on both Windows and macOS. But which one performs better? I couldn’t find anything online, so I decided to find out myself.

(Of course, there’s a third, most “natural” solution: running ROCm on Linux. This GPU is not officially supported, but it should work; I haven’t tested it yet because it’s still nontrivial to boot Linux on this laptop. Might try it later.)

The benchmark uses sd-webui with the model v1-5-pruned-emaonly.safetensors and the flag --medvram (otherwise DirectML uses too much VRAM). Hopefully this makes it directly comparable to others’ benchmarks!

First setup: Windows 10, Python 3.10 + DirectML, using the auto-installer from https://github.com/lshqqytiger/stable-diffusion-webui-directml
Result: 1.94 s/it, or about 38 seconds per image (20 steps).
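
As a sanity check that the GPU is actually doing the work, here is a minimal smoke-test sketch of how PyTorch reaches the card through DirectML. It assumes the torch-directml package (which is what the DirectML fork of the webui builds on) is installed; the tensor sizes are arbitrary.

```python
# Minimal DirectML smoke test (assumes the torch-directml package is installed).
import torch
import torch_directml

dml = torch_directml.device()           # default DirectML adapter, i.e. the 5500M here
print(torch_directml.device_name(0))    # should report the AMD GPU

x = torch.randn(1024, 1024, device=dml)
y = x @ x                               # matmul executes on the GPU via DirectML
print(y.device)
```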

The GPU hits 90°C before thermal throttling kicks in. However, inference time seems to be the same regardless of whether the run starts hot or cold.

Second setup: macOS 13.1, Python 3.10 + Metal (MPS), using the auto-installer from https://github.com/AUTOMATIC1111/stable-diffusion-webui
Result: 1.61 s/it, or about 32 seconds per image (20 steps).
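
The equivalent check on macOS goes through PyTorch’s built-in MPS backend; a minimal sketch, assuming a PyTorch build with MPS support (1.12 or later on macOS 12.3+):

```python
# Minimal MPS smoke test (assumes PyTorch >= 1.12 on macOS 12.3+).
import torch

assert torch.backends.mps.is_available()  # Metal-capable GPU detected
mps = torch.device("mps")

x = torch.randn(1024, 1024, device=mps)
y = x @ x                                 # matmul executes on the GPU via Metal
print(y.device)                           # mps:0
```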

The first few runs are noticeably slower, suggesting some cache warm-up cost (not observed on Windows). Otherwise, it is slightly faster than DirectML. If you want to reproduce the numbers yourself, make sure warm-up runs don’t skew the average; see the sketch below.
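
A hypothetical timing loop illustrating the idea (step_fn, warmup, and iters are my own names, not part of the webui):

```python
# Illustrative benchmark loop: discard warm-up iterations so one-time
# shader compilation / cache-fill costs don't inflate the s/it figure.
import time

def seconds_per_iteration(step_fn, warmup=3, iters=20):
    for _ in range(warmup):
        step_fn()                         # warm-up runs, not timed
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    return (time.perf_counter() - start) / iters
```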

Conclusion: both work fine and performance is close enough, so just pick whichever environment you prefer! Keep in mind that DirectML does seem to carry higher VRAM overhead.

