They aren't behind a login or anything else that would stop it. So yeah, I expect they're already being indexed.
That's a good point. The same content exists on multiple instances. I think Lemmy should set a canonical URL in the HTML <head>. The canonical URL of each post should point to the instance where the post originates.
Seems like that is not implemented in Lemmy. I also checked Mastodon, and it doesn't have a canonical tag either.
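To make the canonical URL idea concrete, here is a minimal sketch of the tag an instance could emit in the page <head>. The function name and the example post URL are hypothetical, and this is not how Lemmy currently renders pages; it only illustrates the standard rel="canonical" link element.

```python
from html import escape


def canonical_link_tag(origin_url: str) -> str:
    """Build a <link rel="canonical"> element for a post.

    origin_url is assumed to be the post's URL on the instance where it
    was originally created (hypothetical input, not a Lemmy API value).
    """
    # Escape the URL so characters like & and " are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(origin_url, quote=True)}">'


# Example with a made-up post URL on the originating instance.
print(canonical_link_tag("https://lemmy.ml/post/123"))
```

Every federated copy of the post would emit the same tag, so a crawler that honors it could treat the copies as one page.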
In the browser, I see a little fediverse icon next to every post/comment that links to the canonical version. I don't think traditional HTML search engines know how to index it, though. It would probably be better if we had our own Lemmy search engine, like browse.feddit.de
You can search posts on lemmy using Google already. They are indexed as separate sites, so you may have to use "site:lemmy.ml" or "site:beehaw.org" in order to find a post. I do wonder if major search engines will try to handle federation more comprehensively in the future, though.
Here's an example Google search, with these operators:
(site:lemmy.world OR site:lemmy.ml OR site:beehaw.org OR site:feddit.de OR site:sh.itjust.works OR site:lemmy.one OR site:lemmy.ca)
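If you want to generate that kind of query for your own list of instances, a small sketch along these lines should work. The function names are made up for illustration, and the search URL assumes Google's usual `q` query parameter.

```python
from urllib.parse import quote_plus


def site_query(terms: str, instances: list[str]) -> str:
    """Combine search terms with a parenthesized 'site:a OR site:b' filter."""
    site_filter = " OR ".join(f"site:{host}" for host in instances)
    return f"{terms} ({site_filter})"


def google_search_url(query: str) -> str:
    """URL-encode the query for a Google search link (assumes the `q` parameter)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


instances = ["lemmy.world", "lemmy.ml", "beehaw.org"]
query = site_query("search engine indexing", instances)
print(query)
print(google_search_url(query))
```

The same query string can be pasted into DDG or any other engine that supports the site: operator.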
Yes, actually it's already getting indexed. For example, you can try searching for site:lemmy.ml on DDG or Google. It'll probably take a while, though, before search engines deem Lemmy instances "popular enough" for posts to show up in regular search queries (assuming that'll even happen at all).
There is a similar topic on beehaw.
Yes, Lemmy posts can be indexed and found, but there are disadvantages compared to big, centralized services. I just found some posts on page 3 of Ecosia's results.
I'm not sure if posts from instances without 'lemmy' in their name would show up when somebody searches for "something lemmy".
I checked my instance, and here are the contents of its robots.txt file:
User-Agent: *
Disallow: /login
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
Legitimate search engines will index everything except what's disallowed. Of course, the robots.txt could be changed to block all indexing by legitimate search engines.
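To check how a well-behaved crawler would read those rules, here is a rough sketch using Python's standard urllib.robotparser. The rules are copied from the listing above; the bot name "ExampleBot" and the test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt listing above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /login
Disallow: /settings
Disallow: /create_community
Disallow: /create_post
Disallow: /create_private_message
Disallow: /inbox
Disallow: /setup
Disallow: /admin
Disallow: /password_change
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules allow `agent` (a hypothetical bot) to fetch `path`."""
    return parser.can_fetch(agent, path)


# Illustrative paths: a post page versus the disallowed pages.
for path in ["/post/123", "/login", "/search/?q=lemmy"]:
    print(path, is_crawlable(path))
```

Disallow rules match by path prefix, so post and comment pages stay crawlable while the login, settings, and search pages do not. Blocking everything would only take a single "Disallow: /" line under "User-Agent: *".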