@Mojeek
@lemmy.ml if you look at the repo, they give thanks to:
"The commoncrawl organization for crawling the web and making the dataset readily available. Even though we have our own crawler now, commoncrawl has been a huge help in the early stages of development."
There is nothing I can find that says how much of the index is CC and how much is their own. If there's a decent amount of CC, bear in mind that it was originally meant for researchers etc., so it's not the best resource in the world for a search index: https://commoncrawl.org/
That being said, speaking as an independent search engine, it's always good to see people take on the massive task of actually building an index rather than becoming a proxy.
Thanks for mentioning us. Here's a good number of search engines, with information on their sources, if it is of use: https://www.searchenginemap.com/
ah no bother at all, not everyone is gonna be across every single kind of company and they are functionally very similar!
thank you kindly, that is great to hear; we've got this new and even better (so far testing is showing that) algo ready to go too: https://www.mojeek.com/eval so hopefully it'll be even more fantastic-er soon :D