Lemmy Online

Downtime Explanation - Updated 7/24 9pm CST

My apologies for the past day or so of downtime.

I had a work conference all of last week. On the last morning, around 4am, before I headed back to my timezone, "something" inside my kubernetes cluster took a dump.

While I can remotely reboot nodes, and even access them, the scope of what went wrong was far beyond what I could accomplish remotely from my phone.

After returning home yesterday evening, I started plugging away and quickly realized something was seriously wrong with the cluster. From previous experience, I knew it would be quicker to just tear it down, rebuild it, and restore from backups, so I started that process.

However, since I had not seen my wife in a week, I felt spending some time with her was slightly more important at the time. I was able to finish getting everything restored today.

Due to these issues, I will be rebuilding some areas of my infrastructure to be slightly more redundant.

Whereas before I had bare-metal machines running Ubuntu, going forward I will be leveraging Proxmox for compute clustering and HA, along with Ceph for storage HA.
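
For anyone curious, the rebuild roughly boils down to the standard Proxmox/Ceph bootstrap below. This is just a sketch; node IPs, the network CIDR, and the pool name are placeholders, not my actual layout:

```sh
# On the first node: create the cluster.
pvecm create homelab

# On each additional node: join it to the cluster (IP of an existing member).
pvecm add 10.0.0.11

# On every node: install and initialize Ceph, then add the NVMe drives as OSDs.
pveceph install
pveceph init --network 10.0.0.0/24
pveceph mon create
pveceph osd create /dev/nvme0n1

# Finally, create a replicated pool for VM disks.
pveceph pool create vmdata
```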

That being said, sometime soon I will have Ansible playbooks set up to get everything pushed out and running.
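
Nothing fancy is planned there; the idea is just small playbooks pushed out with ansible-playbook, something along these lines (file, group, and package names are made up for illustration):

```sh
# Write a tiny playbook and push it to the nodes.
cat > site.yml <<'EOF'
- hosts: pve_nodes
  become: true
  tasks:
    - name: Ensure baseline packages are installed
      apt:
        name: [chrony, ifupdown2]
        state: present
        update_cache: true
EOF
ansible-playbook -i inventory.ini site.yml
```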

Again, my apologies for the downtime. It was completely unexpected and came out of the blue. I honestly still have no idea what happened.

My best suspicion is a disk failure... and yet, after rebooting the machine, it came back to life?

Regardless, I will work to improve this moving forward. Also, I don't plan on being out of town again soon, so that will help too.

There may be some slight downtime later on as I work on and move things around. If that is the case, it will be short. For now, the goal is just restoring my other services and getting back up and running.

Update 2023-07-23 CST

There are still a few kinks being worked out. I have noticed that things are still occasionally disconnecting.

Working on ironing out the issues still. Please bear with me.

(This issue appears to be due to a single Realtek NIC in the cluster... Realtek = bad.)

Update 9:30pm CST

Well, it has been a "fun" evening. I have been finding issues left and right.

  1. A piece of bad fiber cable.
  2. The aforementioned server with a Realtek NIC, which was bringing down the entire cluster.
  3. STP/RSTP issues, likely caused by the above two issues.

Still working and improving...

Update 2023-07-24

Update 9am CST

Still working out a few minor kinks. The finish line is in sight.

Update 5pm CST

Happened to find an SFP+ module that was in the process of dying. Swapped it out with a new one and... magically, many of the spotty network issues went away.

Have new fiber ordered, will install later this week.
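
(Side note for anyone chasing similar gremlins: most NICs can dump the optic's digital diagnostics via ethtool, which makes a dying module much easier to spot. The interface name below is a placeholder, and the module has to support DOM reporting.)

```sh
# Dump the SFP+ module EEPROM/diagnostics; falling RX power or alarm flags
# usually point at a dying optic or bad fiber.
ethtool -m enp1s0f0
```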

Update 9pm CST

  1. Broken/Intermittent SFP+ Module replaced.
  2. Server with the crappy Realtek NIC removed. Re-added the server with 10G SFP+ connectivity.
  3. Clustered servers moved to dedicated switch.
  4. New fiber stuff ordered to replace longer-distance (50ft) 10G copper runs.

I am aware of the current performance issues. These will start going away as I expand the cluster. For now, I am still focused on rebuilding everything to a working state.

Lemmyonline is back!

Lemmyonline.com is back under a new owner! In the coming days I'll be migrating the servers etc. but other than that, everything should stay the same, so feel free to enjoy the instance as you used to!

Lemmyonline.com will go offline 2023-09-04

As promised, I said that if I ever brought the instance offline, I would give you a heads up in advance. Consider this that heads up.

Here are the reasons behind this decision:

Moderation / Administration

Lemmy has absolutely ZERO administration tools, other than the ability to create a report. This makes it extremely difficult to properly administer anything.

As well, other than manually running reports and queries against the local database, I have no insight into anything. I can't even see a list of which users are registered on this instance without running a query on the database.
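
For example, just listing the accounts registered here means going straight to Postgres. A rough sketch, assuming the stock 0.18.x schema where local accounts are rows in the person table with local = true (the database and role names are the defaults, not necessarily mine):

```sh
psql -U lemmy -d lemmy -c \
  "SELECT name, published FROM person WHERE local = true ORDER BY published;"
```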

Personal Liability

I host lemmyonline.com on some of my personal infrastructure. It shares servers, storage, etc. It is powered via my home solar setup, and actually doesn't cost much to keep online.

However, for a project which compensates me exactly $0.00 USD (no, I still don't take donations), it is NOT worth the additional liability I am taking on.

That liability being: trolls/attackers are currently uploading child porn to lemmy. Thumbnails and content get synced to this instance, and at that point I am on the hook for that content. This also goes back to the problem of having basically no moderation capabilities.

Once something is posted, it is sent everywhere.

Here in the US, they like to send out no-knock raids. That is no bueno.

Project Inefficiencies

One issue I have noticed: every single image/thumbnail appears to get cached by pictrs. This data is never cleaned up or purged, so it will just keep growing and growing. The growth isn't drastic, around 10-30G of new data per week (roughly 0.5-1.5T per year at that rate), but it isn't sustainable, especially since, again, this project compensates me nothing. Hosting 100G of content isn't going to be a problem. When we start looking at 1T, 10T, etc., that costs money.
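
For reference, tracking that growth is just a matter of checking the size of the pict-rs data volume over time (the path below is made up):

```sh
# Snapshot the pict-rs data directory size; run weekly to see the growth rate.
du -sh /srv/pict-rs/files
```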

It's not as simple as tossing another disk into my cluster. The storage needs redundancy, so you need multiple disks there.

Then, you need backups. A few more disks here.

Then, we need offsite backups. These cost $/TB stored.

I don't mind putting some resources up front to host something that takes a nominal amount of resources. However, based on my stats, it's going to continue to grow forever, as there is no purge/timeout/lifespan attached to these objects.

I don't enjoy lemmy enough to want to put up with the above headaches.

Let's face it. You have already seen me complain about the general negativity around lemmy.

The quality of content here just isn't the same. I have posted lots of interesting content to try and get collaboration going, but it just doesn't happen.

I just don't see nearly as much interesting content that I want to interact with.

Summary

I get no benefit from hosting lemmy online. It was a fun side project for a while. I refuse to attempt to monetize it as well.

As such, since I don't enjoy it, and the process of keeping on top of the latest attacks each week is time-consuming and tiresome, the plan is simple.

The servers will go offline 2023-09-04.

If you wish to migrate your account to another instance, here is a recently released tool:

https://github.com/gusVLZ/lemmy_handshake

Pictrs disabled

A heads up....

Since attackers/etc. are now uploading CSAM (child porn...) to lemmy, which gets federated to other instances...

Because I really don't want any reason for the feds to come knocking on my door, as of this time, pictrs is now disabled.

This means that if you try to post an image, it will fail. You may notice other issues as well.
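
Roughly speaking, one way to take pict-rs out of the loop in a k8s deployment is simply scaling it to zero, so uploads and thumbnail fetches have nothing to talk to. A sketch; the namespace and deployment names are hypothetical, and the Docker Compose line is the stock deployment's equivalent:

```sh
# Kubernetes: scale the pict-rs deployment down to zero replicas.
kubectl -n lemmy scale deployment pictrs --replicas=0

# Docker Compose equivalent:
# docker compose stop pictrs
```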

Driver for this: https://lemmyonline.com/post/454050

This is a hobby for me. Given the complete and utter lack of moderation tools to help me properly filter content, the nuclear approach is the only approach here.

Negativity on Lemmy

I am just wondering... is it me, or is there a LOT of general negativity here?

Every other post I see is...

  1. America is bad.
  2. Capitalism is bad. Socialism/Communism is good.
  3. If you don't like communism, you are a fascist nazi.

Honestly, it's kind of killing my mood with Lemmy. There are a few decent communities/subs here, but the quality of content appears to be falling.

I mean, FFS. It can't just be me that is noticing this. It honestly feels like I am supporting a communist platform here.

I am on social media to post and read about things related to technology, automation, race cars, etc.

Every other technology post is somebody bashing on Elon Musk (actually, that is deserved), or talking about Reddit (let it go. Seriously. We are here, it is there).

As for my hobby of liking race cars: I guess half of the people on lemmy feel it is OK to vandalize a car for being too big... and car-hate culture is pretty big here.

All of this is really souring my mood on lemmy.

LemmyOnline Updated to 0.18.4

Sorry for the ~30 seconds of downtime earlier; however, we are now updated to version 0.18.4.

Base Lemmy Changes:

https://github.com/LemmyNet/lemmy/compare/0.18.3...0.18.4

Lemmy UI Changes:

https://github.com/LemmyNet/lemmy-ui/compare/0.18.3...0.18.4

Official patch notes: https://join-lemmy.org/news/2023-08-08_-_Lemmy_Release_v0.18.4

Lemmy

  • Fix fetch instance software version from nodeinfo (#3772)
  • Correct logic to meet join-lemmy requirement, don’t have closed signups. Allows Open and Applications. (#3761)
  • Fix ordering when doing a comment_parent type list_comments (#3823)

Lemmy-UI

  • Mark post as read when clicking “Expand here” on the preview image on the post listing page (#1600) (#1978)
  • Update translation submodule (#2023)
  • Fix comment insertion from context views. Fixes #2030 (#2031)
  • Fix password autocomplete (#2033)
  • Fix suggested title " " spaces (#2037)
  • Expanded the RegEx to check if the title contains new line characters. Should fix issue #1962 (#1965)
  • ES-Lint tweak (#2001)
  • Upgrading deps, running prettier. (#1987)
  • Fix document title of admin settings being overwritten by tagline and emoji forms (#2003)
  • Use proper modifier key in markdown text input on macOS (#1995)

LemmyOnline Updated 0.18.3

Lemmy v0.18.3 Release - Lemmy Online

https://lemmyonline.com/post/150849

## What is Lemmy?

Lemmy is a self-hosted social link aggregation and discussion platform. It is completely free and open, and not controlled by any company. This means that there is no advertising, tracking, or secret algorithms. Content is organized into communities, so it is easy to subscribe to topics that you are interested in, and ignore others. Voting is used to bring the most interesting items to the top.

## Major Changes

This version brings major optimizations to the database queries, which significantly reduces CPU usage. There is also a change to the way federation activities are stored, which reduces database size by around 80%. Special thanks to @phiresky for their work on DB optimizations.

The federation code now includes a check for dead instances which is used when sending activities. This helps to reduce the amount of outgoing POST requests, and also reduce server load.

In terms of security, Lemmy now performs HTML sanitization on all messages which are submitted through the API or received via federation. Together with the tightened content-security-policy from 0.18.2, cross-site scripting attacks are now much more difficult.

Other than that, there are numerous bug fixes and minor enhancements.

## Support development

@dessalines and @nutomic are working full-time on Lemmy to integrate community contributions, fix bugs, optimize performance and much more. This work is funded exclusively through donations. If you like using Lemmy, and want to make sure that we will always be available to work full time building it, consider donating to support its development [https://join-lemmy.org/donate]. No one likes recurring donations, but they’ve proven to be the only way that open-source software like Lemmy can stay independent and alive.

  • Liberapay [https://liberapay.com/Lemmy] (preferred option)
  • Open Collective [https://opencollective.com/lemmy]
  • Patreon [https://www.patreon.com/dessalines]
  • Cryptocurrency [https://join-lemmy.org/donate] (scroll to bottom of page)

## Upgrade instructions

Follow the upgrade instructions for ansible [https://github.com/LemmyNet/lemmy-ansible#upgrading] or docker [https://join-lemmy.org/docs/en/administration/install_docker.html#updating]. There are no config or API changes with this release. This upgrade takes ~5 minutes for the database migrations to complete. You may need to run sudo chown 1000:1000 lemmy.hjson if you have any permissions errors.

If you need help with the upgrade, you can ask in our support forum [https://lemmy.ml/c/lemmy_support] or on the Matrix Chat [https://matrix.to/#/#lemmy-admin-support-topics:discuss.online].

The issue, and next steps

Turns out... it's the ceph storage.

Despite having 7x OSDs on bare-metal NVMe... despite having DEDICATED 10G network connectivity... it's having significant performance issues.

Any spikes in IO (large file transfers, backups, even copying files to a different server) would cause huge IO delays, causing things to break or drop offline.

There are no errors shown. The configuration is pretty standard. I have no idea why it is having so many issues.
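
For anyone debugging something similar, the usual first checks are just the standard Ceph tooling, nothing specific to this cluster (the pool name below is a placeholder):

```sh
ceph status                                  # overall health, slow ops, stuck PGs
ceph osd perf                                # per-OSD commit/apply latency
ceph tell osd.0 bench                        # synthetic write benchmark on one OSD
rados bench -p vmdata 30 write --no-cleanup  # 30s pool-level write throughput test
```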

I have cleared off a new NVMe and will move this server to it tomorrow, hopefully ending all of the issues from this week... assuming I have any users left here. (I wouldn't blame you for leaving; it has been a really bad week for LemmyOnline.)

IF my assumptions are incorrect, then f-it, I will just run lemmy on a bare-metal server I have on standby.

Update

Server migrated to local storage. It was nearly unnoticeable, unless you did something in the 3-minute window it took to clone/restore/etc.
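
For reference, a disk move like this can be done from the Proxmox CLI while the VM stays online. A sketch with a hypothetical VM ID and storage name (older PVE releases call the subcommand qm move_disk):

```sh
# Move the VM's root disk off the Ceph pool onto local NVMe storage,
# deleting the source copy once the move completes.
qm disk move 101 scsi0 local-nvme --delete 1
```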

Migrated to new server

Just finished migrating to a different server... hopefully this helps some.

Downtime Followup

As a continuation of the FIRST POST:

As you have likely noticed, there are still issues.

To summarize the first post: a catastrophic software/hardware failure, which meant needing to restore from backups.

I decided to take the opportunity to rebuild newer and better. As such, I decided to give Proxmox a try, with a Ceph storage backend.

After getting a simple k8s environment back up and running on the cluster and restoring the backups, LemmyOnline was mostly back in business using the existing manifests.

Well, the problem is... when heavy backend IO occurs (during backups, big operations, installing large software...), the longhorn.io storage used in the k8s environment kind of... "dies".

And, as I have seen today, this is not an infrequent issue. I have had to bounce the VM multiple times today to restore operations.
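
If you run Longhorn yourself and want to see when this is happening, the degraded/faulted state shows up directly on its CRDs (the namespace below assumes a default install):

```sh
kubectl -n longhorn-system get volumes.longhorn.io
kubectl -n longhorn-system get replicas.longhorn.io
```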

I am currently working on building out a new VM specifically for LemmyOnline, to separate it from the temporary k8s environment. Once this is up and running, things should return to stable and normal.