!gai@sopuli.xyz
cross-posted from: https://lemmy.world/post/17926715
y2u.be/aVvkUuskmLY
Llama 3.1 (405B) seems 👍. It and Claude 3.5 Sonnet are my go-to large language models. I use chat.lmsys.org. OpenAI may be scrambling now to release ChatGPT-5?
https://y2u.be/sDIi95CqTiM
The new Siri vs the Rabbit R1 and Humane AI Pin.
Rabbit R1: https://youtu.be/ddTV12hErTc?si=tLR_GSXyRFtpgpJb
Humane AI Pin: https://youtu.be/TitZV6k8zfA?si=vI4mZMhN...
Hey, so first off, this is my first time dabbling with LLMs, and I found most of this information myself by rummaging through GitHub repos.
I have a fairly modest set-up: an older gaming laptop with an RTX 3060 video card with 6 GB VRAM. I run inside WSL2.
I have had some success running FastChat with the Vicuna 7B model, but it's extremely slow, at roughly 1 word every 2-3 seconds of output with --load-8bit, lest I get a CUDA out-of-memory error. It starts faster, at 1-2 words per second, but slows to a crawl later on (I suspect because it also uses some of the 'shared video RAM', according to the task manager).

So I heard about quantization, which is supposed to compress models at the cost of some accuracy. I tried ready-quantized models (compatible with the FastChat implementation) from huggingface.co, but ran into an issue: whenever I'd ask something, the output would be repeated many times. Say I'd write 'hello', and I'd get 200 'Hello!'s in response. I then tried quantizing a model myself with exllamav2 (using some .parquet wikitext files, also from Hugging Face, for calibration) and running it through FastChat, but the problem persists: endless repeated output. The actual generation is faster, though, so at least that part is going well.
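For context on why --load-8bit is unavoidable here: a 7B model at fp16 needs more than twice the available 6 GB just for its weights. A back-of-the-envelope sketch (weights only, ignoring the KV cache and activations, which add more on top):

```python
def weight_vram_gb(n_params_billion, bits_per_weight):
    """Rough VRAM needed just to hold the model weights."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3  # bytes -> GiB

for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: ~{weight_vram_gb(7, bits):.1f} GB")
# 16-bit: ~13.0 GB, 8-bit: ~6.5 GB, 4-bit: ~3.3 GB
```

Even at 8-bit the weights alone (~6.5 GB) exceed a 6 GB card once the KV cache is counted, which is consistent with the spill into shared video RAM described above; 4-bit quantization is what actually fits comfortably.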
Any ideas on what I'm doing wrong?
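On the endless-repetition symptom: with quantized models this is often a decoding-settings problem rather than a quantization artifact, e.g. a missing repetition penalty or the wrong conversation template for the model. As a toy illustration of what a repetition penalty does inside a sampler (a sketch of the common CTRL-style rule, not FastChat's or exllamav2's actual code):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style repetition penalty: for every token already generated,
    divide its logit if positive, multiply if negative, so repeats
    become less likely at the next sampling step."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

logits = [2.0, 0.5, -1.0]
# token 0 was already emitted, so its logit shrinks from 2.0 to 1.0
print(apply_repetition_penalty(logits, [0], penalty=2.0))  # [1.0, 0.5, -1.0]
```

If the loader exposes it, raising repetition_penalty slightly above 1.0, and double-checking that the prompt uses the model's expected conversation template, are usually the first things to try.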
https://twitter.com/LakshyAAAgrawal/status/1671498941009997828
@gai Adobe Firefly cannibalizes stock photo market for creators https://venturebeat.com/ai/adobe-stock-creators-arent-happy-with-firefly-the-companys-commercially-safe-gen-ai-tool/
With minimal tweaking, just giving relatively simple prompts to these, would you say one is measurably better than the other? In what ways? Or is it more of a subjective judgement?
A nice fork from a main dev: https://github.com/henk717/KoboldAI
Main release: https://github.com/KoboldAI/KoboldAI-Client
Thoughts? Ideas? How do we align these systems? Some food for thought: when we have these systems do chain-of-thought reasoning, or other methods of logically working through problems to reach conclusions, we've found that they are telling "lies" about their method; the stated reasoning isn't what actually drove the answer, even when it is coherent and makes sense on its own.
Here's the study I'm poorly explaining; read that instead: https://arxiv.org/abs/2305.04388