Yeah, this seems less focused on creativity; there are a lot of really good models out there tuned for storytelling that will far exceed generalized SoTA models
Better finetuning is such an important factor. I feel like the future is all of us having our own personal tunes of models that work well with our lives, and iterating to learn more basically every day is also really helpful, so the more barriers we can take down the better!
Hmm, I've had interesting results from both of those base models, but haven't tried the combo yet; I'll start some exllamav2 quants to test
What's it doing well at?
Quant link for anyone who may want it: https://huggingface.co/bartowski/OpenHermes-2.5-neural-chat-7b-v3-1-7B-exl2
I use text-generation-webui mostly. If you're only using GGUF files (llama.cpp), koboldcpp is a really good option
A lot of it is the automatic prompt formatting: there are probably 5-10 specific formats in use, and using the right one for your model is very important for optimal output. TheBloke usually lists the prompt format in his model cards, which is handy
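As a rough illustration, here's what two of the common formats look like side by side (templates written from memory as a sketch; always check the model card for the exact template a given finetune expects):

```python
# Two common prompt templates, sketched for illustration. The exact
# strings matter: a model tuned on one format often degrades on another.

def chatml(system: str, user: str) -> str:
    # ChatML-style format (used by e.g. OpenHermes-style tunes)
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

def alpaca(instruction: str) -> str:
    # Alpaca-style instruction format
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

print(chatml("You are a helpful assistant.", "Hi!"))
print(alpaca("Summarize this article."))
```

Frontends like text-generation-webui pick the template for you when they recognize the model, which is exactly the convenience being described above.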
RoPE and YaRN refer to extending the default context of a model through hacky (but functional) methods, and probably deserve their own write-up
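The simplest of these tricks, linear RoPE scaling, is just arithmetic: compress the positions so the longer context fits inside the range the model saw during training (YaRN does something more involved per frequency band; this is only the linear case):

```python
# Linear RoPE scaling sketch: positions are compressed by this factor
# so target_ctx positions map into the trained position range.

def linear_rope_scale(trained_ctx: int, target_ctx: int) -> float:
    return trained_ctx / target_ctx

# Stretching a 4096-context model to 16384 tokens of context:
print(linear_rope_scale(4096, 16384))  # 0.25
```

In llama.cpp that factor is what you'd hand to a flag like `--rope-freq-scale`, at some cost to quality versus a model natively trained at the longer length.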
Yeah, so those are mixed; it's definitely not quantizing each individual weight to 2 bits, because as you said that's very small. I don't think it even averages out to 2 bits, but more like 2.56
You can read some details on bits per weight here: https://huggingface.co/TheBloke/LLaMa-30B-GGML/blob/8c7fb5fb46c53d98ee377f841419f1033a32301d/README.md#explanation-of-the-new-k-quant-methods
Unfortunately this is not the whole story either, as they get further combined with other bits per weight: Q2_K, for example, uses Q4_K for some of the weights and Q2_K for others, resulting in more like 2.8 bits per weight
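The averaging itself is just a weighted sum. As a toy illustration of how a "2-bit" mix ends up closer to 2.8 bpw (the 85/15 split and per-group bit costs here are made up for the example, NOT the real Q2_K tensor layout):

```python
# Toy bits-per-weight average for a mixed quant. The split and the
# per-group bit costs are illustrative, not the actual Q2_K recipe.

def avg_bpw(groups: list[tuple[float, float]]) -> float:
    # groups: (fraction of total weights, effective bits per weight)
    return sum(frac * bits for frac, bits in groups)

# Say 85% of weights sit in ~2.56 bpw blocks and 15% in ~4.5 bpw blocks:
print(round(avg_bpw([(0.85, 2.56), (0.15, 4.5)]), 2))  # 2.85
```

The block overhead (scales and minimums stored per block) is also why even the "pure" groups cost more than their nominal bit count.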
Generally speaking you'll want to use Q4_K_M unless going smaller really benefits you (like being able to fit the whole thing on GPU)
Also, the bigger the model (70B vs 7B), the lower you can go in quantization bits before it degrades into complete garbage
If you're using llama.cpp, chances are you're already using a quantized model; if not, then yes, you should be. Unfortunately, without crazy-fast RAM you're basically limited to 7B models if you want any amount of speed (5-10 tokens/s)
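The reason RAM speed dominates is that each generated token has to stream roughly the whole quantized model through memory once, so a back-of-envelope estimate is bandwidth divided by model size (the bandwidth and size numbers below are illustrative):

```python
# Back-of-envelope decode speed: every token reads (roughly) the whole
# quantized model from RAM. Bandwidth/size numbers are illustrative.

def approx_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

# ~4 GB Q4 7B model on ~50 GB/s dual-channel desktop RAM:
print(approx_tokens_per_sec(50, 4))   # 12.5
# ~40 GB Q4 70B model on the same machine:
print(approx_tokens_per_sec(50, 40))  # 1.25
```

That's why 7B is the practical ceiling on ordinary RAM, and why GPUs (with hundreds of GB/s of bandwidth) change the picture so much.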
Great little machine on paper. I was hoping for a bit more out of the RAM, but sadly it's really low throughput due to a narrow bus width. The CPU is quite impressive though, as is the cooling
I'm looking forward to trying it today. I think this might make a good RAG model based on the Orca 2 paper, but testing will be needed
According to the config it looks like it's only 4096, and they specify in the arXiv paper that they kept the training data under that length, so it must be 4096... I'm sure people will extend it soon like they have with others
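You can check this yourself by reading the model's config.json; `max_position_embeddings` is the standard Hugging Face field for the trained context length (the config snippet below is a minimal stand-in, not the real file):

```python
import json

# Minimal stand-in for a model repo's config.json; in practice you'd
# read the real file from the repo on the Hub.
config_text = '{"max_position_embeddings": 4096, "hidden_size": 4096}'

config = json.loads(config_text)
print(config["max_position_embeddings"])  # 4096
```

Context-extended finetunes bump this field (and usually add RoPE scaling settings alongside it), which is how the "expanded" versions advertise their longer windows.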
@noneabove1182@sh.itjust.works