OpenOrca, an open-source dataset and series of instruct-tuned language models

https://erichartford.com/openorca

Today I am announcing OpenOrca, an open-source dataset and series of instruct-tuned language models. As I read Orca: Progressive Learning from Complex Explanation Traces of GPT-4 by Mukherjee et al. of Microsoft, I had to consider the implications f...

OpenOrca

I concluded that while Microsoft would probably release their LLaMA-13b-based model (as of this writing they still haven't), they might not release the dataset. Therefore, I resolved to replicate their efforts, download the data myself, and train the model myself, so that OpenOrca can be released on other sizes of LLaMA as well as on other foundational models such as Falcon, OpenLLaMA, RedPajama, MPT, and RWKV.