ASCII & Unicode

Both are character sets: each one assigns a numeric code to every character it supports.

ASCII

ASCII-7 (the original 7-bit ASCII) has a total of 128 available characters (2 to the 7th), with codes 0-127. ASCII-8 (the 8-bit version, often called extended ASCII) has 256 available characters (2 to the 8th), with codes 0-255.
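
As a quick check of those counts, here's a small Python sketch. The function name code_count is just something made up for this example.

```python
def code_count(bits: int) -> int:
    """Number of distinct codes that fit in the given number of bits."""
    return 2 ** bits

seven_bit = code_count(7)   # 128 codes, numbered 0 to 127
eight_bit = code_count(8)   # 256 codes, numbered 0 to 255
print(seven_bit, eight_bit)
```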

Similar characters are grouped together and numbered consecutively. If A is 65, then B is 66, C is 67, enough said. Upper-case and lower-case characters are grouped separately (A is 65, but a is 97).
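
To see the grouping for yourself, Python's built-in ord() and chr() convert between a character and its code. A minimal sketch (the variable names are only for illustration):

```python
# ord() gives the code of a character, chr() goes the other way.
upper_codes = [ord(c) for c in "ABC"]   # codes for the upper-case letters
lower_codes = [ord(c) for c in "abc"]   # codes for the lower-case letters
next_letter = chr(ord("A") + 1)         # the character one code after A

print(upper_codes)   # [65, 66, 67]
print(lower_codes)   # [97, 98, 99]
print(next_letter)   # B
```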

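When you compare the binary codes of the upper-case and lower-case versions of a letter, only the 6th bit (counting from the right, worth 32) changes. In lower-case letters, the 6th bit is 1. In upper-case, it's 0. For example, A is 1000001 and a is 1100001.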
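
Here's a rough sketch of that bit in action. The helper names (CASE_BIT, show_bits, toggle_case) are made up for this example, and toggle_case assumes it is only ever given a plain ASCII letter.

```python
CASE_BIT = 0b100000  # the sixth bit from the right, worth 32

def show_bits(ch: str) -> str:
    """Return the 7-bit binary code of a character as a string."""
    return format(ord(ch), "07b")

def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by flipping the sixth bit.

    Assumes ch is one of A-Z or a-z; other characters give meaningless results.
    """
    return chr(ord(ch) ^ CASE_BIT)

print(show_bits("A"))    # 1000001
print(show_bits("a"))    # 1100001
print(toggle_case("A"))  # a
print(toggle_case("a"))  # A
```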

ASCII's main disadvantage is its limited character set. Even ASCII-8 proves to be insufficient. Only so many languages and special characters can be supported in 256 slots.

The purpose of Unicode is to allow for a greater number of characters.

Unicode

The codes of the first 128 characters in Unicode are identical to the codes found in ASCII, so Unicode is backwards compatible without much fuss.
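
A quick way to check this in Python, whose strings are Unicode. The sketch below uses UTF-8, which is one particular Unicode encoding that isn't covered in this article, so treat that part as an extra assumption.

```python
# All 128 ASCII codes as raw bytes.
raw = bytes(range(128))

# Decoding those bytes as ASCII or as UTF-8 gives the same text.
as_ascii = raw.decode("ascii")
as_utf8 = raw.decode("utf-8")
same_text = as_ascii == as_utf8

# Each resulting character has the same code it had in ASCII.
same_codes = all(ord(ch) == i for i, ch in enumerate(as_utf8))

print(same_text, same_codes)
```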

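16-bit Unicode can handle 65,536 characters (2 to the 16th). 32-bit Unicode can handle about 4.3 billion values (2 to the 32nd), although the Unicode standard itself only defines code points up to U+10FFFF (1,114,112 in total).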
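
To make the sizes concrete, here's a small sketch. It takes UTF-16 and UTF-32 as the 16-bit and 32-bit forms, which is my assumption since the text above doesn't name specific encodings.

```python
letter = "A"

# The "-be" variants are used so Python doesn't add a byte-order mark.
sixteen = letter.encode("utf-16-be")     # A stored in one 16-bit unit
thirty_two = letter.encode("utf-32-be")  # A stored in one 32-bit unit

bits_16 = len(sixteen) * 8
bits_32 = len(thirty_two) * 8
print(bits_16, bits_32)

# How many different values fit in each size.
values_16 = 2 ** 16
values_32 = 2 ** 32
print(values_16, values_32)
```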

With ASCII, characters are referred to with a denary or hexadecimal value. Unicode characters are referred to with "U+" followed by a hexadecimal number of at least 4 digits (padded with leading zeros where needed). Since 65 is 41 in hexadecimal, A is referred to as U+0041 in Unicode terminology. These values are called code points.
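
A minimal sketch of building that notation in Python. The function name code_point is made up for this example.

```python
def code_point(ch: str) -> str:
    """Format a single character in U+ notation, padded to at least 4 hex digits."""
    return f"U+{ord(ch):04X}"

print(code_point("A"))           # U+0041
print(code_point("\u20ac"))      # U+20AC (the euro sign)
print(code_point("\U0001F600"))  # U+1F600 (an emoji, which needs 5 digits)
```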

A few goals of Unicode:

Create a universal standard
Better coding system for characters (compared to ASCII)
Uniform number of bits per character (16 or 32, depending on the version of Unicode)
Unambiguous encoding between different versions of Unicode (i.e. the same character has the same code in 16-bit and 32-bit Unicode)
Reserve part of the code for the user to fill
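
On the last goal: the reserved part is what Unicode calls a Private Use Area (a term not used above). As a rough sketch, Python's standard unicodedata module reports the category of a code point, and private-use code points come back as "Co".

```python
import unicodedata

# U+E000 is the first code point of the main Private Use Area.
private = "\ue000"

private_category = unicodedata.category(private)  # "Co" = Other, private use
letter_category = unicodedata.category("A")       # "Lu" = Letter, upper-case

print(private_category, letter_category)
```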

This article was written on 22/09/2023. If you have any thoughts, feel free to send me an email with them. Have a nice day!