ASCII vs Unicode

Dolly Desir
Mar 20, 2021 · 3 min read


Up until this point I’ve spent about two weeks understanding and trying to solidify searching algorithms. Before ever diving into algos, I thought throwing .sort() on the end of an array was enough. Little did I know, it doesn’t always work the way you would expect. I am working on being really patient with myself for not knowing everything, so I took a deeper look into the .sort() method. Reading up on it made me realize that I often forget, when I’m coding, that I’m writing in a way a computer can understand. Writing code is so much more than just manipulating the DOM; it’s super important to understand what’s happening under the hood.

Referencing developer.mozilla.org: “.sort() sorts the elements of an array in place and returns the sorted array. The default sort order is ascending, built upon converting the elements into strings, then comparing their sequences of UTF-16 code unit values.” The .sort() method works as expected when the array contains strings, but what about when the array contains numbers? “If compareFunction is not supplied, all non-undefined array elements are sorted by converting them to strings and comparing strings in UTF-16 code units order. For example, "banana" comes before "cherry". In a numeric sort, 9 comes before 80, but because numbers are converted to strings, "80" comes before "9" in the Unicode order. All undefined elements are sorted to the end of the array.”
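To see this for myself, here’s a quick sketch (the array names are just my own examples) comparing the default string-based sort with a numeric compare function:

```javascript
// Default sort: elements are coerced to strings and compared by
// UTF-16 code units, so strings sort as expected...
const fruits = ["cherry", "banana", "apple"];
console.log(fruits.sort()); // ["apple", "banana", "cherry"]

// ...but numbers end up in "dictionary" order.
const numbers = [80, 9, 700, 40];
console.log([...numbers].sort()); // [40, 700, 80, 9] (since "80" < "9" as strings)

// Supplying a compare function restores numeric ordering.
console.log([...numbers].sort((a, b) => a - b)); // [9, 40, 80, 700]
```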

This discussion isn’t really about the .sort() method, though; it’s about character sets and the differences between them. A character set is a list of characters that your computer can understand, and ‘A’ and ‘a’ are understood very differently by our computers. In the early days of computer science and programming, only the numbers 0–9, English letters, and a few punctuation symbols were considered. ASCII, the American Standard Code for Information Interchange, was the system used to represent each of those characters.
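A quick way to peek at those numeric codes in JavaScript is charCodeAt; the values below are the standard ASCII/Unicode code points:

```javascript
// Each character maps to a numeric code; uppercase and lowercase
// letters get completely different values.
console.log("A".charCodeAt(0)); // 65
console.log("a".charCodeAt(0)); // 97

// Going the other way: build a character from its code.
console.log(String.fromCharCode(66)); // "B"

// This is also why "Z" (90) sorts before "a" (97) in a default sort.
console.log(["a", "Z"].sort()); // ["Z", "a"]
```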

ASCII can represent every character on a keyboard using the numbers 0–127. Numbers 0–31 represent control characters; for instance, number 7, ‘Bell’, would make your computer beep. As technology evolved and spread across the world, computer manufacturers started making use of an extended ASCII table that represented accented characters for different languages, along with various simple shapes used to produce basic graphics. The problem was that each manufacturer used its own version of this extended table, specific to its operating system. There was no universal standard that all manufacturers followed, and once the World Wide Web made it possible for data to move from one computer to another, there was conflict. An IBM computer might not be able to read or display characters that came from a Microsoft machine, because each vendor had its own version of the extended table, resulting in a corrupted file.

Finally, in 1991, a universal standard was created: Unicode. This standard gives each individual character a unique number, no matter the platform, operating system, or application. It is maintained by the Unicode Consortium, which includes organizations such as Apple, Google, IBM, Netflix, and Adobe, to name a few. Unicode builds on ASCII (its first 128 code points are the same), but with some important differences.

ASCII:

  • A character encoding standard for electronic communication.
  • Supports 128 characters.
  • Uses 7 bits to represent a character.
  • Uses less memory.

Unicode:

  • Computing industry standard for consistent encoding, representation, and handling of text in most of the world’s writing systems.
  • Supports a much wider range of characters (well over 100,000).
  • Uses 8, 16, or 32 bits per code unit, depending on the encoding (UTF-8, UTF-16, or UTF-32).
  • Generally uses more memory (the snippet below shows how).
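As a rough illustration of that memory trade-off, here’s a small sketch using the TextEncoder API (available in browsers and Node) to show how many UTF-8 bytes the same characters take, compared with how many UTF-16 code units a JavaScript string uses:

```javascript
// TextEncoder always produces UTF-8; .length counts UTF-16 code units.
const encoder = new TextEncoder();

const samples = ["A", "é", "€", "😀"];
for (const s of samples) {
  console.log(s, {
    utf8Bytes: encoder.encode(s).length, // 1, 2, 3, 4
    utf16Units: s.length,                // 1, 1, 1, 2
  });
}
```

Plain ASCII characters like "A" still fit in a single byte, while characters outside that range need more space, which is exactly why ASCII uses less memory when it’s all you need.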

I never imagined that .sort() would take me down this rabbit hole of computer science, but I realize that knowing these basic fundamentals of programming will only help me improve as a developer. I hope you learned something new today! Happy coding!!
