LLaMA Now Goes Faster on CPUs
https://justine.lol/matmul/
I wrote 84 new matmul kernels to improve llamafile CPU performance.
My kernels run 2x faster than MKL for matrices that fit in L2 cache. They are still a work in progress, since the speedup works best for prompts with fewer than 1,000 tokens.
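To make the L2 cache condition concrete, here is a rough back-of-the-envelope sketch. It is not taken from the post: the function names, the 2 MiB L2 size, the 2-byte element width, and the matrix dimensions are all assumptions chosen for illustration. It simply adds up the bytes of the three matrices in C = A × B and compares the total against the cache size. During prompt evaluation the token count typically sets one of the matrix dimensions, so longer prompts mean a larger working set.

```python
def matmul_working_set_bytes(m, n, k, bytes_per_element):
    """Bytes occupied by A (m x k), B (k x n), and C (m x n) together."""
    return (m * k + k * n + m * n) * bytes_per_element


def fits_in_cache(m, n, k, bytes_per_element, cache_bytes):
    """True if all three matrices fit in the given cache at the same time."""
    return matmul_working_set_bytes(m, n, k, bytes_per_element) <= cache_bytes


# Hypothetical values, not measurements from the post.
L2_BYTES = 2 * 1024 * 1024   # assumed 2 MiB L2 cache
F16_BYTES = 2                # assumed 16-bit elements

small = fits_in_cache(512, 512, 512, F16_BYTES, L2_BYTES)
large = fits_in_cache(2048, 2048, 2048, F16_BYTES, L2_BYTES)

print(matmul_working_set_bytes(512, 512, 512, F16_BYTES), small)
print(matmul_working_set_bytes(2048, 2048, 2048, F16_BYTES), large)
```

Under these assumptions the 512-sized problem fits and the 2048-sized one does not. Real kernels work on tiles rather than whole matrices, so this is only a coarse way to see why performance can depend on problem size.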