@Kerfuffle@sh.itjust.works

I recently ran into an issue where I wanted to use Any for slices. However, it only allows 'static types (based on what I read, this is because you get the same TypeId regardless of lifetimes).

I came up with this workaround which I think is safe:
use std::{
    any::{Any, TypeId},
    marker::PhantomData,
};

#[derive(Clone, Debug)]
pub struct AnySlice<'a> {
    tid: TypeId,
    len: usize,
    ptr: *const (),
    marker: PhantomData<&'a ()>,
}

impl<'a> AnySlice<'a> {
    pub fn from_slice<T: Any>(s: &'a [T]) -> Self {
        Self {
            len: s.len(),
            ptr: s.as_ptr() as *const (),
            tid: TypeId::of::<T>(),
            marker: PhantomData,
        }
    }

    pub fn as_slice<T: Any>(&self) -> Option<&'a [T]> {
        if TypeId::of::<T>() != self.tid {
            return None;
        }
        Some(unsafe { std::slice::from_raw_parts(self.ptr as *const T, self.len) })
    }

    pub fn is<T: Any>(&self) -> bool {
        TypeId::of::<T>() == self.tid
    }
}
edit: Unfortunately it seems like Lemmy insists on mangling the code block. See the playground link below.
T: Any ensures T is also 'static. The lifetime is preserved with PhantomData. Here's a playground link with some simple tests and a mut version: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3116a404c28317c46dbba6ed6824c8a9
It seems to pass Miri, including the mut version (which requires a bit more care to ensure there can only be one mutable reference). Any problems with doing this?
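For anyone who wants to poke at it without the playground, here's a condensed, runnable version of the pattern (immutable side only, and omitting the is helper) with a quick usage check at the end:

```rust
use std::{
    any::{Any, TypeId},
    marker::PhantomData,
};

#[derive(Clone, Debug)]
pub struct AnySlice<'a> {
    tid: TypeId,
    len: usize,
    ptr: *const (),
    marker: PhantomData<&'a ()>,
}

impl<'a> AnySlice<'a> {
    pub fn from_slice<T: Any>(s: &'a [T]) -> Self {
        Self {
            tid: TypeId::of::<T>(),
            len: s.len(),
            ptr: s.as_ptr() as *const (),
            marker: PhantomData,
        }
    }

    pub fn as_slice<T: Any>(&self) -> Option<&'a [T]> {
        if TypeId::of::<T>() != self.tid {
            return None;
        }
        // SAFETY: the TypeId check guarantees T is the original element
        // type, and the PhantomData lifetime keeps the source borrowed.
        Some(unsafe { std::slice::from_raw_parts(self.ptr as *const T, self.len) })
    }
}

fn main() {
    let nums = [1i32, 2, 3];
    let any = AnySlice::from_slice(&nums);
    // Downcasting to the right element type succeeds...
    assert_eq!(any.as_slice::<i32>(), Some(&nums[..]));
    // ...and the wrong element type just gets None back.
    assert!(any.as_slice::<u8>().is_none());
}
```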
Even though green coffee beans tend to be heavier due to their higher water content, it's generally cheaper to roast your own than to buy them pre-roasted.
You can roast the same beans at different levels to get some variety without having to go out and buy a new batch.
It's kind of fun and a decent conversation topic.
Don't be scared by how long this post is. It basically just comes down to: spread the beans on a cookie sheet, put them in a preheated oven, wait around 12-15 minutes, then take them out and cool them.
Since we're talking about roasting beans, naturally you're going to need a grinder to actually use them.
The process will create some smoke, even with a light roast; basically, the darker the roast, the more smoke. So far I've mainly done pretty light roasts, and even though my kitchen doesn't have much ventilation (and my oven doesn't have fancy modern contraptions like, you know, a light or a fan), it hasn't been an issue.
Your oven should be reasonably clean if you don't want the roasted coffee to taste like random stuff.
If you're a super coffee snob and it has to be perfect, this may not be for you. It's pretty easy, but odds are the first few tries aren't going to be perfect especially if you like darker roasts.
You're going to want something like a large metal mixing bowl and colander for the cooling process. My colander is plastic, so you can probably get away with that if you don't put the red hot beans in it directly out of the oven.
You'll also probably need access to an outside area where bits of coffee chaff blowing around aren't going to bother people. I don't think there's really an easy way to deal with coffee chaff indoors.
By the way, don't try to grind green coffee beans in a normal grinder. They are insanely, and I mean insanely, hard and tough. You'll destroy your grinder unless it is an absolute tank. (I'd say it's also not really worth trying; green coffee didn't taste very good to me.)
Here's the process:

1. Preheat the oven.
2. Spread the green beans on a cookie sheet.
3. Roast for around 12-15 minutes, depending on how dark you want them.
4. Take them out and cool them, tossing them between the metal bowl and colander (outside, so the chaff can blow away).

One thing to note is you don't want to actually grind/use the beans for at least 12 hours. It might seem unintuitive, but from what I've read, as freshly roasted as possible isn't necessarily best. Depending on the beans/roast level, the coffee might reach its optimal tastiness even a couple weeks after roasting.
I'm far from an expert, but feel free to ask questions in the comments if you want. I can recommend a grinder/beans to get started with if anyone needs information like that.
This subject is kind of niche, but hey... It's new content of some kind at least! Also just want to be up front: These projects may have reached the point of usefulness (in some cases) but they're also definitely not production ready.
GGML is the machine learning library that makes llama.cpp work. If you're interested in LLMs, you've probably already heard of llama.cpp by now. If not, this one is probably irrelevant to you!
ggml-sys-bleedingedge is a set of low-level bindings to GGML which are automatically generated periodically. Theoretically it also supports stuff like CUDA, OpenCL and Metal via feature flags, but this is not really tested.
Repo: https://github.com/KerfuffleV2/ggml-sys-bleedingedge
Crate: https://crates.io/crates/ggml-sys-bleedingedge
You may or may not already know this: When you evaluate an LLM, you don't get any specific answer back. LLMs have a list of tokens they understand, which is referred to as their "vocabulary". For LLaMA models, this is about 32,000 tokens. So once you're done evaluating the LLM, you get a list of ~32,000 f32s out of it, representing the probability for each token.
The naive approach of just picking the most probable token ("greedy sampling") actually doesn't work that well, so there are various approaches to filtering, sorting and selecting tokens that produce better results. llm-samplers implements a number of these sampling methods.
Repo: https://github.com/KerfuffleV2/llm-samplers
Crate: https://crates.io/crates/llm-samplers
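To make the idea concrete, here's a toy sketch (not the llm-samplers API, just an illustration of the concept) of turning logits into probabilities and then filtering them, using greedy selection and a simple top-k cutoff over a four-token "vocabulary":

```rust
// Convert raw logits into a probability distribution.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

// Greedy sampling: just take the index of the most probable token.
fn greedy(probs: &[f32]) -> usize {
    probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

// Top-k filtering: keep the k most probable tokens, zero out the rest,
// then renormalize so the survivors sum to 1 again.
fn top_k(probs: &[f32], k: usize) -> Vec<f32> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let keep: std::collections::HashSet<usize> = idx.into_iter().take(k).collect();
    let mut out: Vec<f32> = probs
        .iter()
        .enumerate()
        .map(|(i, &p)| if keep.contains(&i) { p } else { 0.0 })
        .collect();
    let sum: f32 = out.iter().sum();
    for p in &mut out {
        *p /= sum;
    }
    out
}

fn main() {
    let logits = [1.0f32, 3.0, 0.5, 2.0];
    let probs = softmax(&logits);
    assert!((probs.iter().sum::<f32>() - 1.0).abs() < 1e-6);
    assert_eq!(greedy(&probs), 1); // token 1 has the highest logit
    let filtered = top_k(&probs, 2);
    assert_eq!(filtered[2], 0.0); // the least probable tokens are gone
}
```

Real samplers chain more steps than this (temperature, repetition penalties, top-p, etc.), but they're all variations on reshaping and truncating this distribution before picking a token.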
Higher level bindings built on the ggml-sys-bleedingedge crate. Not too much to say about this one: if you want to use GGML in Rust, there aren't that many options, and using low level bindings directly isn't all that pleasant.
I'm actually using this one in the next project, but it's very, very alpha.
Repo: https://github.com/KerfuffleV2/rusty-ggml
Crate: https://crates.io/crates/rusty-ggml
If you're interested in LLMs, most (maybe all) of the models you know about, like LLaMA, ChatGPT, etc., are based on the Transformer paradigm. RWKV is a different approach to building large language models: https://github.com/BlinkDL/RWKV-LM
This project started out "smol" as an attempt to teach myself about LLMs, but I've gradually added features and backends. It's mostly useful as a learning aid/example of some of the other projects I made. In addition to being able to run inference using ndarray (pretty slow), it now supports GGML as a backend, and I'm in the process of adding llm-samplers support.
Repo: https://github.com/KerfuffleV2/smolrsrwkv
Last (and possibly least) is repugnant-pickle. As far as I know, it is the only Rust crate available that will let you deal with PyTorch files (which are basically zipped up Python pickles). smolrsrwkv also uses this one to allow loading PyTorch RWKV models directly, without having to convert them first.
If that's not enough of a description: Pickle is the default Python data serialization format. It was designed by crazy people, though: it is extremely difficult to interoperate with unless you're Python, because it's basically a little stack based virtual machine and can call into Python classes. Existing Rust crates don't fully support it.
repugnant-pickle takes the approach of best-effort scraping of pickled data rather than trying to be 100% correct, and it can deal with weird pickle stuff that other crates throw their hands up at.
Repo: https://github.com/KerfuffleV2/repugnant-pickle
Crate: TBD
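To make the "little stack based virtual machine" point concrete, here's a toy decoder (nothing to do with repugnant-pickle's internals, just an illustration) that interprets the three opcodes in the pickle for the Python int 42: 0x80 is PROTO, 'K' is BININT1 (push a one-byte unsigned int), and '.' is STOP (return the top of the stack). Python's pickle.dumps(42, protocol=2) produces exactly these five bytes.

```rust
// Interpret a tiny subset of pickle opcodes: PROTO, BININT1, STOP.
// Anything else is treated as unsupported and returns None.
fn read_small_int_pickle(data: &[u8]) -> Option<u8> {
    let mut stack: Vec<u8> = Vec::new();
    let mut i = 0;
    while i + 1 <= data.len() {
        match data[i] {
            0x80 => i += 2,             // PROTO <version>: skip the version byte
            b'K' => {                   // BININT1: push a one-byte unsigned int
                stack.push(*data.get(i + 1)?);
                i += 2;
            }
            b'.' => return stack.pop(), // STOP: top of stack is the result
            _ => return None,           // unsupported opcode for this toy
        }
    }
    None
}

fn main() {
    // pickle.dumps(42, protocol=2) == b"\x80\x02K*." (b'*' is 42)
    let data = [0x80, 0x02, b'K', 42, b'.'];
    assert_eq!(read_small_int_pickle(&data), Some(42));
    assert_eq!(read_small_int_pickle(&[0xFF]), None);
}
```

A real pickle stream adds dozens more opcodes, memoization, and GLOBAL opcodes that instantiate arbitrary Python classes, which is why full interoperability from Rust is so painful.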