How usable are AMD GPUs?

RandomLegend [He/Him]@lemmy.dbzer0.com · edit-2 1 year ago

How usable are AMD GPUs?

rufus@discuss.tchncs.de · edit-2 1 year ago

You might also want to have a look at Koboldcpp or llama.cpp performance with ROCm. LLMs seem to be constrained mainly by memory bandwidth anyway, not raw compute performance.
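A rough way to see why bandwidth matters: if generating each token requires reading all of the model weights from VRAM once, then bandwidth divided by model size gives a ceiling on tokens per second. This is only a back-of-the-envelope sketch under that assumption; the function name and the example numbers are hypothetical, not measurements of any card.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed, assuming every generated token
    reads all model weights from memory exactly once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical numbers for illustration only:
# 500 GB/s of memory bandwidth and a 7 GB quantized model.
print(f"{max_tokens_per_second(500.0, 7.0):.1f} tokens/s at most")
```

Real throughput will land below this ceiling, but it explains why two cards with similar bandwidth can end up close to each other even if their compute differs.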

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

Will do. Ty

Rogers@lemmy.ml · 1 year ago

Wondering the same thing. All the VRAM in the 7900 XTX looks nice

ɐɥO@lemmy.ohaa.xyz · 1 year ago

Been using an RX 6600 XT for a year now and I’ve never had a single issue with drivers. Used ROCm once and it worked without issues

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

I’m not asking about drivers or such. I’m asking about performance specifically in Oobabooga and/or StableDiffusion.

Have you done anything with those?

ɐɥO@lemmy.ohaa.xyz · 1 year ago

used stable diffusion and it took like 6 seconds to generate a 512x512px image

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

That’s sadly not that descriptive. It depends on your iterations … Can you tell me how many it/s you get and which sampler you use? That would make it much easier for me to compare
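To illustrate why "6 seconds per image" alone is hard to compare, here is a small sketch that turns a wall-clock time into it/s for a few step counts. The step counts are assumptions for illustration (the original figure did not say how many steps were used), and the calculation ignores model loading and VAE decode overhead.

```python
def iterations_per_second(steps: int, seconds: float) -> float:
    """Average sampler speed, given the step count and total sampling time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return steps / seconds


# The same 6-second image implies very different speeds
# depending on how many sampling steps were configured.
for steps in (20, 30, 50):
    print(f"{steps} steps in 6 s -> {iterations_per_second(steps, 6.0):.2f} it/s")
```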

ɐɥO@lemmy.ohaa.xyz · edit-2 1 year ago

Made this video a while ago. Should contain all the relevant info

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

Thanks

I was hoping to see console output that shows the iterations per second for the specific sampler. But I guess that suffices

Thanks again!

ɐɥO@lemmy.ohaa.xyz · 1 year ago

can send you a video with my terminal once I get home

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

That’d be awesome. No hurry though

EddyBot@feddit.de · 1 year ago

You probably want to use the AMD driver that comes out of the box with your Linux distro + ROCm, instead of whatever AMD gives you as a driver download on their landing page

Gaming-wise the AMD card would win in rasterization performance, but PyTorch is made for CUDA (Nvidia only) first rather than OpenCL/HIP (which AMD uses)
I couldn’t get my AMD card to run reliably in half precision (fp16), and having to fall back to no-half or fp32 actually hurts performance A LOT

Interestingly enough, setting up AMD cards with ROCm is actually easier on Linux than on Windows

Anyway, my experience is mostly Stable Diffusion and some early gpt4all stuff, but oobabooga uses PyTorch too so it’s probably similar

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

I’ve had AMD cards my whole life and only switched to NVidia 3 years ago, when that whole local LLM and ImageAI thing wasn’t even on the table…now i am just pissed that NVidia gives us so little VRAM to play with unless you pay the same price as a used car -.-

AMD drivers are available from within the kernel so yeah, i won’t do any downloading for AMD drivers on Linux^^

Oobabooga and Automatic1111 are my main questions - i could actually live with a downgrade in terms of performance if i can then at least run the bigger models due to having way more VRAM. Can’t even run 17b models on my current 8GB VRAM card…can’t even make 1024x1024 images on Auto1111 without getting issues as well. If i can do those things but a bit slower, that’s fine for me^^

micheal65536@lemmy.micheal65536.duckdns.org · 1 year ago

What sort of issues are you getting trying to generate 1024x1024 images in Stable Diffusion? I’ve generated up to 1536x1024 without issue on a 1070 (although it takes a few minutes) and could probably go even larger (this was in img2img mode which uses more VRAM as well - although at that size you usually won’t get good results with txt2img anyway). What model are you using?

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

That’s outside the scope of this post and not the goal of it.

I don’t want to start troubleshooting my NVidia stable diffusion setup in an LLM post about AMD :D thanks for trying to help but this isn’t the right place to do that

micheal65536@lemmy.micheal65536.duckdns.org · 1 year ago

Fair enough but if your baseline for comparison is wrong then you can’t make good assessments of the capabilities of different GPUs. And it’s possible that you don’t actually need a new GPU/more VRAM anyway, if your goal is to generate 1024x1024 in Stable Diffusion and run a 13B LLM both of which I can do with 8 GB of VRAM.
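For a sense of scale on the 13B figure, a rough sketch of how much memory the weights alone take at different precisions. It assumes decimal gigabytes and ignores the KV cache, activations and framework overhead, so actual usage will be higher; the function name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# A 13B model at common precisions (weights only, overhead not included).
for bits in (16, 8, 4):
    print(f"13B at {bits}-bit: {weight_memory_gb(13, bits):.1f} GB")
```

Under these assumptions only the 4-bit variant leaves headroom on an 8 GB card, which is why a 13B model there typically means a quantized one.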

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

This is correct, yes. But I want a new GPU because I want to get away from NVidia…

i CAN use 13b models and I can create 1024x1024 but not without issues, not without making sure nothing else uses VRAM and I run out of memory quite often.

I want to make it more stable. And open the door to use bigger models or make bigger images

micheal65536@lemmy.micheal65536.duckdns.org · 1 year ago

Yes, that makes more sense. I was concerned initially that you were looking to buy a new GPU with more VRAM for the sole reason of being unable to do something that you should already be able to do, and that this would be an unnecessary spend of money and/or not actually fix the problem, that you would be somewhat mad at yourself if you found out afterwards that “oh, I just needed to change this setting”.

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

thanks for the concern but no worries, i did my fair share of optimization for my config and i believe i got everything out of it… i will 100% switch to AMD so my question basically just aims at: Can i sell my 3070 or do i have to keep it and put it into a “server” on which i can run StableDiffusion and oobabooga because AMD is still too wonky for that…

That’s all. My decision is not depending on whether this AI stuff works, but it just accelerates it if AMD can run this, because i can sell my old card to get the money quicker.

EddyBot@feddit.de · 1 year ago

I only ever used 7b large language models on my RX 6950 XT, but PyTorch had or still has some nasty AMD VRAM bugs which kept it from fully utilizing my VRAM (more like only a quarter of it)

it seems the sad truth is that high-performance use/training of models is just not good on AMD cards as of now

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

Interesting

Do you only use LLMs or also stable diffusion ?

huskypenguin@sh.itjust.works · 1 year ago

If you don’t need cuda or ai, the 7900 is great.

turbodrooler@lemmy.world · 1 year ago

You can run CUDA apps on ROCm HIP. It’s easy.
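One concrete case is PyTorch: its ROCm builds reuse the `torch.cuda` API, so scripts written against CUDA devices generally run without code changes. A minimal sketch for checking which backend an installed PyTorch was built for (the function name is made up, and this says nothing about arbitrary CUDA binaries):

```python
def describe_torch_backend() -> str:
    """Report whether the installed PyTorch targets ROCm/HIP, CUDA, or CPU only."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    hip_version = getattr(torch.version, "hip", None)
    cuda_version = getattr(torch.version, "cuda", None)
    if hip_version:
        backend = f"ROCm/HIP {hip_version}"
    elif cuda_version:
        backend = f"CUDA {cuda_version}"
    else:
        backend = "CPU-only build"

    # On ROCm builds the GPU is still exposed through the torch.cuda namespace.
    if torch.cuda.is_available():
        return f"{backend}, device: {torch.cuda.get_device_name(0)}"
    return f"{backend}, no GPU visible"


print(describe_torch_backend())
```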

huskypenguin@sh.itjust.works · 1 year ago

Whoa, news to me. I’ll have to dig in on that.

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

Well that’s the question…

What do you mean by “not needing ai”? I mean oobabooga and stable diffusion have AMD installers, and that’s exactly what I am asking about. That’s why I posted in this community…

To find out how well those AIs run on AMD

huskypenguin@sh.itjust.works · 1 year ago

Oops. I wasn’t looking at the community just my main feed. Ok, so from what I understand the amd installer is a bit of a pain on Linux. If you’re on windows it’s probably a different story.

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

I am on Linux, but I can live with a painful install. I wanted to hear if it performs on par with nvidia

huskypenguin@sh.itjust.works · edit-2 1 year ago

Again, apologies for the confusion. I had thought my initial comment was on a gaming community. Here are Puget Systems’ benchmarks, and they don’t look great - https://www.pugetsystems.com/labs/articles/stable-diffusion-performance-nvidia-geforce-vs-amd-radeon/#Automatic_1111

“Although this is our first look at Stable Diffusion performance, what is most striking is the disparity in performance between various implementations of Stable Diffusion: up to 11 times the iterations per second for some GPUs. NVIDIA offered the highest performance on Automatic 1111, while AMD had the best results on SHARK, and the highest-end GPU on their respective implementations had relatively similar performance.”

turbodrooler@lemmy.world · 1 year ago

Sorry, not trying to come at you, but I’m just trying to provide a bit of fact checking. In this link, they tested on Windows which would have to be using DirectML which is super slow. Did Linus Tech Tips do this? Anyway, the cool kids use ROCm on Linux. Much, much faster.

huskypenguin@sh.itjust.works · 1 year ago

Haha, you’re not, I definitely stumbled into this. These guys mainly build edit systems for post companies, so they stick to windows. Good to know about ROCm, got something to read up on.

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

Yeah that was what i was worried about after reading the article; I’ve heard about the different backends…

Do you have AMD + Linux + Auto1111 / Oobabooga? Can you give me some real-life feedback? :D

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

No worries

Interesting article. Never heard about SHARK, seems worth a look then

turbodrooler@lemmy.world · edit-2 1 year ago

Using a 6800xt on Linux for several months and I’m super happy with it. There hasn’t been anything I haven’t been able to do. AMA.

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

what models are you using and how many iterations /s do you get on average with them?

Do you also use StableDiffusion (Auto1111)? If yes, same question as above for that^^

turbodrooler@lemmy.world · 1 year ago

I use a ton of different ones. I can test specific models if you like.

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

The good ol’ Anything v3 and DPM++ 2M Karras

that would give me a good baseline. Thanks! :)

turbodrooler@lemmy.world · 1 year ago

Does the resolution or steps or anything else matter?

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

512x512 and 1024x1024 would be interesting

and 50 steps

That’d be awesome!

turbodrooler@lemmy.world · edit-2 1 year ago

I ran these last night, but didn’t have the correct VAE, so I’m not sure if that affects anything. 512x512 was about 7.5it/s. 1024x1024 was about 1.3s/it (iirc). I used somebody else’s prompt which used loras and embeddings, so I’m not sure how that affects things either. I’m not a professional benchmarker so consider these numbers anecdotal at best. Hope that helps.

Edit: formatting
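Note that the two figures above use different units (it/s vs. s/it). A small sketch that puts them on the same scale and estimates the time for the requested 50 steps; it only redoes the arithmetic on the anecdotal numbers from this comment and ignores VAE decode and other overhead.

```python
def to_it_per_s(value: float, unit: str) -> float:
    """Normalize a sampler speed reported as 'it/s' or 's/it' to it/s."""
    if unit == "it/s":
        return value
    if unit == "s/it":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit}")


steps = 50
small = to_it_per_s(7.5, "it/s")  # reported for 512x512
large = to_it_per_s(1.3, "s/it")  # reported for 1024x1024

print(f"512x512:   {small:.2f} it/s, about {steps / small:.1f} s for {steps} steps")
print(f"1024x1024: {large:.2f} it/s, about {steps / large:.1f} s for {steps} steps")
print(f"Slowdown:  {small / large:.2f}x for 4x the pixels")
```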

RandomLegend [He/Him]@lemmy.dbzer0.com · 1 year ago

7.5it/s for 512x512 is what i was looking for! On par with NVidia (actually even faster than my 3070)!

Thank you very much! And how / what exactly did you use to install?
