I know we are moving away from Reddit. However, if I don't link, I feel like we may miss out on good threads on r/machinelearning. Moreover, the authors don't only post arxiv links; they also post other stuff such as summaries, key points, ... (e.g. this).
So can I at least put them in the posts instead of posting in a comment?
The idea is similar to BLIP-2. Both papers use learnable tokens as queries for a transformer decoder. The decoder uses these trainable queries, together with the prompt, to attend over the vision feature space.
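To make the learnable-query idea concrete, here is a minimal numpy sketch of a single cross-attention step: a fixed set of trainable query tokens attends over frozen vision features and distills them into a fixed-size output. All sizes (`d`, `num_queries`, `num_patches`) are illustrative assumptions, not values from either paper, and real models would use multi-head attention with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 64             # hidden size (illustrative)
num_queries = 32   # learnable query tokens, as in BLIP-2's Q-Former
num_patches = 196  # patch features from a frozen image encoder

queries = rng.normal(size=(num_queries, d))  # trainable parameters
vision = rng.normal(size=(num_patches, d))   # frozen encoder output

# Cross-attention: each query token attends over all vision features,
# so the variable-length vision input is compressed into num_queries tokens.
attn = softmax(queries @ vision.T / np.sqrt(d))
out = attn @ vision
print(out.shape)  # (32, 64)
```

The point is that the output size depends only on the number of query tokens, not on the image resolution, which is what lets a language model consume the result as a short, fixed-length prefix.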
I also want to share some resources.
For PyTorch,
For TPU,
@KingsmanVince
@kbin.social