I know we are moving away from Reddit. However, if I don't link, I feel like we may miss out on good threads on r/machinelearning. Moreover, the authors don't only post arxiv links; they also post other stuff such as summaries, key points, ... (e.g. this).
So can I at least put them in the posts instead of posting in a comment?
The idea is similar to BLIP-2. Both papers use learnable tokens as queries for a transformer decoder. The decoder uses these trainable queries, together with the prompt, to attend over the vision feature space.
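To make the learnable-query idea concrete, here is a minimal numpy sketch of a single cross-attention step: a fixed set of trainable query tokens attends over frozen vision features and distills them into a fixed-size output. All sizes (`d`, `num_queries`, `num_patches`) are illustrative assumptions, not values from either paper, and real models would use multi-head attention with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 64             # hidden size (illustrative)
num_queries = 32   # learnable query tokens, as in BLIP-2's Q-Former
num_patches = 196  # patch features from a frozen image encoder

queries = rng.normal(size=(num_queries, d))  # trainable parameters
vision = rng.normal(size=(num_patches, d))   # frozen encoder output

# Cross-attention: each query token attends over all vision features,
# so the variable-length vision input is compressed into num_queries tokens.
attn = softmax(queries @ vision.T / np.sqrt(d))
out = attn @ vision
print(out.shape)  # (32, 64)
```

The point is that the output size depends only on the number of query tokens, not on the image resolution, which is what lets a language model consume the result as a short, fixed-length prefix.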
I also want to share some resources.
For PyTorch,
For TPU,
@KingsmanVince
@kbin.social