Benchmarking DirectML on Windows vs Metal (MPS) on macOS for an AMD GPU

I have always wondered whether the AMD 5500M GPU (Navi 14) in my MacBook Pro could avoid being entirely useless for deep learning. Back in 2020, not many solutions existed besides PlaidML running over OpenCL (which was, unsurprisingly, very unstable). Fast forward to 2023: the software ecosystem is much more mature, and there are user-friendly solutions on both Windows and macOS. But which one performs better? I couldn’t find anything online, so I decided to find out myself.

(Of course, there’s a third, most “natural” solution: running ROCm on Linux. This GPU is not officially supported, but it should work; I haven’t tested it yet because it’s still nontrivial to boot Linux on this laptop. Might try it later.)

The benchmark uses sd-webui with the model v1-5-pruned-emaonly.safetensors and the flag --medvram (otherwise DirectML uses too much VRAM). Hopefully this makes it directly comparable to others’ benchmarks!

First setup: Windows 10, Python 3.10 + DirectML, using the auto-installer from https://github.com/lshqqytiger/stable-diffusion-webui-directml
Result: 1.94 s/it, or about 38 seconds per image (20 steps).
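
As a sanity check that the GPU is actually doing the work, here is a minimal smoke-test sketch of how PyTorch reaches the card through DirectML. It assumes the torch-directml package (which is what the DirectML fork of the webui builds on) is installed; the tensor sizes are arbitrary.

```python
# Minimal DirectML smoke test (assumes the torch-directml package is installed).
import torch
import torch_directml

dml = torch_directml.device()           # default DirectML adapter, i.e. the 5500M here
print(torch_directml.device_name(0))    # should report the AMD GPU

x = torch.randn(1024, 1024, device=dml)
y = x @ x                               # matmul executes on the GPU via DirectML
print(y.device)
```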

The GPU hits 90°C before thermal throttling kicks in. However, inference time seems to be the same regardless of whether the run starts hot or cold.

Second setup: macOS 13.1, Python 3.10 + Metal (MPS), using the auto-installer from https://github.com/AUTOMATIC1111/stable-diffusion-webui
Result: 1.61 s/it, or about 32 seconds per image (20 steps).
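
The equivalent check on macOS goes through PyTorch’s built-in MPS backend; a minimal sketch, assuming a PyTorch build with MPS support (1.12 or later on macOS 12.3+):

```python
# Minimal MPS smoke test (assumes PyTorch >= 1.12 on macOS 12.3+).
import torch

assert torch.backends.mps.is_available()  # Metal-capable GPU detected
mps = torch.device("mps")

x = torch.randn(1024, 1024, device=mps)
y = x @ x                                 # matmul executes on the GPU via Metal
print(y.device)                           # mps:0
```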

The first few runs are noticeably slower, suggesting some cache warm-up cost (not observed on Windows). Otherwise, it is slightly faster than DirectML. If you want to reproduce the numbers yourself, make sure warm-up runs don’t skew the average; see the sketch below.
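
A hypothetical timing loop illustrating the idea (step_fn, warmup, and iters are my own names, not part of the webui):

```python
# Illustrative benchmark loop: discard warm-up iterations so one-time
# shader compilation / cache-fill costs don't inflate the s/it figure.
import time

def seconds_per_iteration(step_fn, warmup=3, iters=20):
    for _ in range(warmup):
        step_fn()                         # warm-up runs, not timed
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    return (time.perf_counter() - start) / iters
```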

Conclusion: both work fine and performance is close enough, so just pick whichever environment you prefer! Keep in mind that DirectML does seem to carry higher VRAM overhead.

