ASCII & Unicode

ASCII and Unicode are character sets: each one assigns a numeric code to every character it supports.

ASCII

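ASCII-7 (standard ASCII) uses 7 bits, giving 128 characters (2 to the 7th) with codes 0-127. ASCII-8, usually called extended ASCII, uses 8 bits, giving 256 characters (2 to the 8th) with codes 0-255. The extra codes (128-255) differ between the various extended versions.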
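Both counts come from raising 2 to the number of bits. A small Python sketch works them out; the helper name `code_range` is just illustrative:

```python
def code_range(bits: int) -> tuple[int, int, int]:
    """Return (number of codes, lowest code, highest code) for an n-bit set."""
    count = 2 ** bits           # each extra bit doubles the number of codes
    return count, 0, count - 1  # codes start at 0, so the top code is count - 1


for name, bits in [("ASCII-7", 7), ("ASCII-8", 8)]:
    count, low, high = code_range(bits)
    print(f"{name}: {count} characters, codes {low}-{high}")
# ASCII-7: 128 characters, codes 0-127
# ASCII-8: 256 characters, codes 0-255
```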

Related characters sit at consecutive codes. A is 65, B is 66, C is 67, and so on up to Z at 90. Upper-case and lower-case letters form separate blocks: lower-case runs from a (97) to z (122), so A is not next to a.
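Python's built-in `ord` and `chr` convert between a character and its code, which makes the consecutive layout easy to check. A minimal sketch using the standard `string` module's letter constants:

```python
import string

# Codes for A-Z and a-z, in alphabetical order.
upper_codes = [ord(c) for c in string.ascii_uppercase]
lower_codes = [ord(c) for c in string.ascii_lowercase]

print(upper_codes[:3])                  # [65, 66, 67]
print(upper_codes[0], upper_codes[-1])  # 65 90
print(lower_codes[0], lower_codes[-1])  # 97 122

# Each letter's code is exactly one more than the previous letter's.
consecutive = all(b - a == 1 for a, b in zip(upper_codes, upper_codes[1:]))
print(consecutive)                      # True

# So adding to a code steps through the alphabet.
print(chr(ord("A") + 2))                # C
```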

Comparing the binary forms of a letter's two cases shows that only one bit changes: the 6th bit from the right, which is worth 32. It is 1 in lower-case letters and 0 in upper-case ones. For example, A is 1000001 (65) and a is 1100001 (97).
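Because the two cases differ only in the bit worth 32, flipping that one bit with XOR switches a letter's case. This is an illustrative sketch (`show_bits` and `toggle_case` are hypothetical helper names) that only handles the 52 ASCII letters:

```python
CASE_BIT = 0b0100000  # 6th bit from the right (value 32)


def show_bits(ch: str) -> str:
    """Return the 7-bit binary form of an ASCII character's code."""
    return format(ord(ch), "07b")


def toggle_case(ch: str) -> str:
    """Switch an ASCII letter between upper and lower case by flipping CASE_BIT."""
    if len(ch) != 1 or not ch.isascii() or not ch.isalpha():
        raise ValueError("expected a single ASCII letter")
    return chr(ord(ch) ^ CASE_BIT)  # XOR flips just that one bit


print(show_bits("A"), show_bits("a"))      # 1000001 1100001
print(toggle_case("A"), toggle_case("q"))  # a Q
```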

ASCII's main disadvantage is its limited character set. Even ASCII-8 is insufficient: 256 slots can only hold so many languages' letters and special characters.
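One way to see the limit is to try encoding text. Python raises `UnicodeEncodeError` when a character has no code in the target encoding. This sketch uses Latin-1 as one example of an 8-bit extended set (an assumption for illustration; other extended versions cover different characters), and `fits` is a hypothetical helper name:

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character in text has a code in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for sample in ["Hello", "café", "Привет", "日本語"]:
    print(sample, fits(sample, "ascii"), fits(sample, "latin-1"))
# Hello True True
# café False True
# Привет False False
# 日本語 False False
```

Latin-1 gains accented Western European letters with its extra 128 codes, but Cyrillic and Chinese still do not fit.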

Unicode was created to provide a much larger number of characters.

Unicode

The first 128 characters in Unicode have exactly the same codes as in ASCII (0-127), so Unicode is backwards compatible with ASCII without much fuss.
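The overlap can be checked directly. For every code from 0 to 127, ASCII and UTF-8 (one common way of storing Unicode text, used here as an example) produce the same single byte. Characters beyond 127 need more than one byte in UTF-8:

```python
# Every code 0-127 means the same character in ASCII and Unicode,
# and both encodings store it as that single byte value.
for code in range(128):
    ch = chr(code)  # chr() takes a Unicode code point
    assert ch.encode("ascii") == ch.encode("utf-8") == bytes([code])

print("Hi!".encode("utf-8"))     # b'Hi!'  (same bytes as ASCII)
print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'  (é is code 233, stored as two bytes)
```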

A 16-bit Unicode encoding has room for 65,536 codes (2 to the 16th). A 32-bit encoding has room for about 4.3 billion (2 to the 32nd), although Unicode itself limits its range to 1,114,112 codes (0 to 1,114,111).
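These capacities are plain powers of two. The sketch below also uses Python's `sys.maxunicode`, which reports the highest code point Python supports, and shows that UTF-16 (a 16-bit encoding) needs two 16-bit units for a character above 65,535, while UTF-32 always uses four bytes per character:

```python
import sys

print(2 ** 16)         # 65536
print(2 ** 32)         # 4294967296
print(sys.maxunicode)  # 1114111, i.e. U+10FFFF, the highest code point

smile = "\U0001F600"   # code point 128512, above 65,535
print(len("A".encode("utf-16-le")))    # 2 bytes: one 16-bit unit
print(len(smile.encode("utf-16-le")))  # 4 bytes: two 16-bit units
print(len(smile.encode("utf-32-le")))  # 4 bytes: every character takes 32 bits
```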

In ASCII, a character is usually referred to by its code in denary or hexadecimal. In Unicode, it is written as "U+" followed by the code in hexadecimal, padded to at least four digits. Since 65 is 41 in hexadecimal, A is U+0041. The code assigned to each character is called its code point.
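Producing the U+ notation is a matter of formatting the code point in upper-case hexadecimal with at least four digits. In this sketch, `code_point_label` and `from_label` are hypothetical helper names:

```python
def code_point_label(ch: str) -> str:
    """Format a character's code point as U+ followed by at least 4 hex digits."""
    return f"U+{ord(ch):04X}"


def from_label(label: str) -> str:
    """Turn a label such as 'U+0041' back into its character."""
    if not label.upper().startswith("U+"):
        raise ValueError("expected a label like U+0041")
    return chr(int(label[2:], 16))


print(ord("A"), hex(ord("A")))         # 65 0x41
print(code_point_label("A"))           # U+0041
print(code_point_label("\u00e9"))      # U+00E9
print(code_point_label("\U0001F600"))  # U+1F600 (five digits, since it is above U+FFFF)
print(from_label("U+0042"))            # B
```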

A few goals of Unicode:

This article was written on 22/09/2023. If you have any thoughts, feel free to send me an email with them. Have a nice day!