Is there a resource to benchmark LLM performance on different hardware?

As the title suggests, I have a few LLM models and want to see how they perform on different hardware (CPU-only instances, GPUs: T4, V100, A100). Ideally it's to get an idea of performance versus overall price (VM hourly rate / efficiency).

Currently I've written a script to calculate ms per token, RAM usage (via memory-profiler), and total time taken.
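For reference, here's a minimal sketch of the kind of harness described above (ms per token, peak memory, total time), assuming a generic `generate_fn` callable that returns a list of tokens; `fake_generate` is just a stub to make it runnable, so swap in your actual model call. Note `tracemalloc` only tracks Python-heap allocations, not GPU memory or RSS:

```python
import time
import tracemalloc

def benchmark_generate(generate_fn, prompt, runs=3):
    """Time a generation callable over several runs.

    generate_fn: hypothetical callable taking a prompt and returning
    the generated tokens -- replace with your model's API.
    """
    results = []
    for _ in range(runs):
        tracemalloc.start()
        start = time.perf_counter()
        tokens = generate_fn(prompt)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append({
            "total_s": elapsed,
            "ms_per_token": 1000 * elapsed / max(len(tokens), 1),
            "peak_py_mem_mb": peak / 1e6,  # Python heap only, not GPU/RSS
        })
    return results

# Stub "model" emitting 50 fake tokens, purely for illustration.
def fake_generate(prompt):
    return ["tok"] * 50

stats = benchmark_generate(fake_generate, "hello")
```

Averaging over multiple runs (and discarding the first as warm-up) helps smooth out cold-cache and model-load effects, which matter a lot when comparing VM types.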

Wanted to check if there are better methods or tools. Thanks!


Thanks. Does this also run compute benchmarks? It looks like it's more focused on model accuracy (if I'm not wrong).

Seems like it. Keep an eye out; when I run across one I'll post it, usually to the model's community.

Sure, thank you!