
I’ve been running Ollama on my Mac Studio for local AI experiments. I followed advice to try oMLX instead, and it’s ludicrously faster, maybe 5-10x on both time to first token and total time to finish a response. I haven’t benchmarked it, but subjectively it feels like when I replaced a hard drive with an SSD.
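If I ever want to put actual numbers behind that impression, a rough script like this would do it: stream a chat completion from each server’s OpenAI-compatible endpoint and record time to first token plus total completion time. Ollama’s URL below is its standard OpenAI-compatible endpoint; the oMLX URL and both model names are placeholders I haven’t verified, so treat this as a sketch rather than a ready-to-run benchmark.

```python
import json
import time
import requests


def measure(base_url: str, model: str, prompt: str) -> tuple[float, float]:
    """Stream a chat completion and return (time_to_first_token, total_time) in seconds."""
    start = time.perf_counter()
    first_token = None
    with requests.post(
        f"{base_url}/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        # OpenAI-style streaming sends SSE lines prefixed with "data: ", ending in "data: [DONE]".
        for line in resp.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            chunk = json.loads(payload)
            choices = chunk.get("choices") or []
            if not choices:
                continue
            delta = choices[0].get("delta", {}).get("content")
            if delta and first_token is None:
                first_token = time.perf_counter() - start
    return first_token, time.perf_counter() - start


if __name__ == "__main__":
    prompt = "Explain the difference between a hard drive and an SSD in two sentences."
    # Ollama's default OpenAI-compatible endpoint is real; the oMLX URL and models are guesses.
    targets = [
        ("ollama", "http://localhost:11434/v1", "llama3.1:8b"),
        ("omlx", "http://localhost:8080/v1", "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"),
    ]
    for name, url, model in targets:
        ttft, total = measure(url, model, prompt)
        print(f"{name}: time to first token {ttft:.2f}s, total {total:.2f}s")
```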
