Check out the LocalLLaMA community. Lots of info there.
I use oobabooga + exllama.
Things are a bit budget dependent. If you can afford an RTX 3090 off eBay you can run some decent models (30B) at very good speed. I ended up with a 3090 + 4090. You can use system RAM with GGML, but it's slow. A Mac M1 isn't bad for this either.
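If you do go the GGML route, here's a rough sketch of what that looks like with llama-cpp-python -- the model file, layer count, and prompt are just placeholders, not a specific recommendation:

    # Minimal sketch: running a GGML-quantized model with llama-cpp-python.
    # Model path, n_gpu_layers, and the prompt below are hypothetical -- adjust for your setup.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-30b.q4.bin",  # placeholder local GGML file
        n_gpu_layers=40,   # offload as many layers as fit in VRAM; 0 = pure CPU (slow)
        n_ctx=2048,        # context window
    )

    out = llm("Q: What's a good local 30B model? A:", max_tokens=64)
    print(out["choices"][0]["text"])

The n_gpu_layers knob is the main thing: anything you can't offload spills into system RAM, which is where the speed drops off.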
Where did you get the reddit dataset?