The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Ever wonder about that mysterious Content-Type tag? You know, the one you’re supposed to put in HTML and you never quite know what it should be? Did you ever get an email from your friends in…

Oh boy, this article is 20 years old. Still relevant, though, so go read it if you still aren't sure about Unicode!

I would take this with a grain of salt. None of the programming languages I have used require this level of in-depth knowledge. Certainly not modern C++.

Did you see the date on the article?

No. I can only read dates in the YYYY/MM/DD format.

It's not so much about the programming language you use, it's about what data you're taking in, what you're doing with it, and where you're passing the data off to next.
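To make the "taking in / passing off" point concrete, here is a minimal Python sketch. It assumes you actually know the incoming encoding (for example, from a header or a file spec). It decodes once on the way in, works with characters in between, and encodes explicitly for whatever consumes the data next. The helper name `transcode` is hypothetical, not from the article.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Hypothetical helper: decode input with its declared encoding,
    then encode the resulting text for the next consumer."""
    text = data.decode(source_encoding)   # bytes in -> characters
    # ... any processing happens here on `text`, which is a str ...
    return text.encode(target_encoding)   # characters -> bytes out


# A Windows-1252 file containing "café", handed off to a UTF-8 consumer.
legacy_bytes = b"caf\xe9"
print(transcode(legacy_bytes, "cp1252", "utf-8"))  # b'caf\xc3\xa9'
```

If `source_encoding` is wrong, the decode step either fails or silently produces the wrong characters, which is exactly the failure mode the thread is describing.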

If everything uses the same encoding, or all your data is ANSI, you never have to think about it. It's only when your program runs across systems or regions that things get screwed up.
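A quick Python sketch of why "all the same" data is safe and anything else isn't. One assumption: "ANSI" is read here as Windows code page 1252 (`cp1252`), the usual "ANSI" code page on Western-European Windows. Pure ASCII text comes out as identical bytes in every encoding listed, but a single accented letter does not. `encodings_agree` is a hypothetical helper written for this example.

```python
def encodings_agree(text: str, encodings=("ascii", "utf-8", "cp1252", "latin-1")) -> bool:
    """Hypothetical helper: True if every listed encoding yields the same bytes for `text`."""
    results = set()
    for enc in encodings:
        try:
            results.add(text.encode(enc))
        except UnicodeEncodeError:
            # This encoding cannot represent the text at all.
            return False
    return len(results) == 1


print(encodings_agree("hello, world"))  # True: ASCII bytes are the same everywhere
print(encodings_agree("café"))          # False: ASCII can't encode "é" at all

# Even where both can encode it, the bytes differ:
for enc in ("utf-8", "cp1252"):
    print(enc, "café".encode(enc))
# utf-8 b'caf\xc3\xa9'
# cp1252 b'caf\xe9'
```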

You make a good point. Data and programming can be seen as separate entities. I'd disagree slightly, though, because some programming languages are friendlier to some data types than others.

With that said, my main disagreement is with the claim that every programmer must know what is being pitched here, which is what I would take with a block of salt.

It's still true that if you have a bunch of bytes and don't know their encoding, you don't really know what characters you have.

Good thing print debugging is still going strong!

I code across multiple systems, particularly Windows and Linux, with a lot of data involved, and these encoding issues are rare. Granted, I'm more on the back end of things and more driven by numerical data. I'd expect a completely different set of headaches for web dev and such. But that just highlights the issue I have with the claim that every programmer must know this, hence my comment about taking it with a grain of salt.

I have seen a lot of people who are interested in learning get turned away by this level of complexity. This, imo, is unnecessary. Programming is first and foremost about logic; requiring this level of in-depth knowledge is exactly what higher-level programming languages have been doing away with.