ASCII Encoding Explained: Characters, Control Codes, and Extended ASCII
ASCII is the foundation of modern text encoding. Learn how ASCII maps characters to numbers, what control characters do, how extended ASCII grew from 128 to 256 characters, and how Unicode superseded it.
What Is ASCII?
ASCII (the American Standard Code for Information Interchange) is a 7-bit character encoding standard first published in 1963. It maps 128 character codes (0–127) to letters, digits, punctuation, and control characters.
For two computer systems to exchange text, they must agree on which binary value represents each character. ASCII provided that agreement and became the dominant text encoding of early computing.
The ASCII Table Structure
The 128 code points are divided into three groups:
| Range | Type | Examples |
|---|---|---|
| 0–31 | Control characters | NUL, LF, CR, TAB, ESC |
| 32–126 | Printable characters | Space, A–Z, a–z, 0–9, punctuation |
| 127 | Control character | DEL (delete) |
The printable characters are arranged so that:
- Uppercase letters A–Z occupy codes 65–90
- Lowercase letters a–z occupy codes 97–122
- Digits 0–9 occupy codes 48–57
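The ranges above are enough to classify any 7-bit code with a few comparisons. Below is a minimal sketch. It assumes the input is a numeric code such as the value returned by charCodeAt. The function name asciiClass and its category labels are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: classify a numeric code using the ASCII ranges listed above.
function asciiClass(code) {
  if (code < 0 || code > 127) return "not ASCII";
  if (code < 32 || code === 127) return "control";   // 0–31 and DEL
  if (code >= 48 && code <= 57) return "digit";      // '0'–'9'
  if (code >= 65 && code <= 90) return "uppercase";  // 'A'–'Z'
  if (code >= 97 && code <= 122) return "lowercase"; // 'a'–'z'
  return "other printable";                          // space and punctuation
}

asciiClass("A".charCodeAt(0)); // "uppercase"
asciiClass("7".charCodeAt(0)); // "digit"
asciiClass(10);                // "control" (LF)
```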
There is a useful pattern for letters. The uppercase and lowercase forms of a letter differ only in bit 5 (value 32, or 0x20). Converting between them is therefore just adding or subtracting 32, or XORing with 0x20:

```javascript
"A".charCodeAt(0); // 65
"a".charCodeAt(0); // 97 = 65 + 32

// Toggle case using XOR
String.fromCharCode("A".charCodeAt(0) ^ 32); // "a"
String.fromCharCode("a".charCodeAt(0) ^ 32); // "A"
```
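XOR with 32 flips bit 5 of any code, not just letters. Applied blindly, it also turns '@' (64) into '`' (96) and '[' (91) into '{' (123). Below is a minimal sketch that toggles only ASCII letters, assuming plain ASCII text. The name toggleCase is hypothetical.

```javascript
// Hypothetical helper: swap the case of ASCII letters and leave every other character alone.
function toggleCase(str) {
  let out = "";
  for (const ch of str) {
    const code = ch.charCodeAt(0);
    // Setting bit 5 maps 'A'–'Z' onto 'a'–'z', so one range test detects both cases.
    const lowered = code | 0x20;
    const isLetter = lowered >= 97 && lowered <= 122;
    out += isLetter ? String.fromCharCode(code ^ 0x20) : ch;
  }
  return out;
}

toggleCase("Hello, World @ 42"); // "hELLO, wORLD @ 42"
```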
Digits 0–9 (codes 48–57) have the useful property that the numeric value equals code - 48:

```javascript
"7".charCodeAt(0) - 48; // 7
"0".charCodeAt(0) - 48; // 0
```
Control Characters
The first 32 ASCII codes are control characters. Originally intended to control teletype machines and serial terminals, most are now used only in specific contexts:
| Code | Name | Notation | Use |
|---|---|---|---|
| 0 | NUL | \0 | String terminator in C |
| 7 | BEL | \a | Terminal bell sound |
| 8 | BS | \b | Backspace |
| 9 | HT | \t | Horizontal tab |
| 10 | LF | \n | Newline (Unix line endings) |
| 13 | CR | \r | Carriage return (part of Windows CRLF) |
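| 26 | SUB | ^Z | End-of-file marker in CP/M and DOS text files; Ctrl+Z also ends console input on Windows |
| 27 | ESC | \e | Start of ANSI escape sequences (\e is a GCC/Clang extension; \x1b or \033 is portable C) |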
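Terminals and editors often display control characters in caret notation, which is where the ^Z above comes from. Code 26 is shown as ^Z because 26 + 64 = 90 ('Z'), and DEL (127) is shown as ^?. Flipping bit 6 (XOR 0x40) performs both mappings. Below is a minimal sketch that assumes ASCII input. The name showControls is hypothetical.

```javascript
// Hypothetical helper: render ASCII control characters in caret notation.
function showControls(str) {
  let out = "";
  for (const ch of str) {
    const code = ch.charCodeAt(0);
    if (code < 32 || code === 127) {
      // XOR 0x40: 0 -> '@', 1 -> 'A', ..., 26 -> 'Z', 27 -> '[', 127 -> '?'
      out += "^" + String.fromCharCode(code ^ 0x40);
    } else {
      out += ch;
    }
  }
  return out;
}

showControls("a\tb\r\n\x1b[0m"); // "a^Ib^M^J^[[0m"
```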
ASCII in Programming
C-style strings
C strings are arrays of bytes terminated by a NUL byte (\0). The length of a C string is the number of bytes before the first NUL. This is why C string functions such as strlen and strcpy mishandle binary data: a NUL byte inside the data makes the string appear to end prematurely.
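The truncation is easy to see by modelling strlen in JavaScript. Below is a rough sketch. cStrlen is a hypothetical helper that counts bytes in a Uint8Array up to the first zero byte. Unlike the real strlen, it also stops at the end of the array instead of reading past it.

```javascript
// Hypothetical helper mimicking C's strlen(): count bytes before the first NUL (0).
// Bounds-checked, unlike real strlen, which keeps reading until it finds a zero byte.
function cStrlen(bytes) {
  let n = 0;
  while (n < bytes.length && bytes[n] !== 0) n++;
  return n;
}

// A buffer holding "abc", a NUL byte, then "def" (7 bytes in total)
const buf = new TextEncoder().encode("abc\0def");
buf.length;   // 7
cStrlen(buf); // 3: the bytes after the NUL are invisible to C string functions
```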
Character arithmetic
Because ASCII is a dense, ordered mapping, many operations reduce to arithmetic:
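```c
/* Check if a character is an ASCII digit */
int isDigit(char c) { return c >= '0' && c <= '9'; }

/* Parse a decimal digit; assumes isDigit(c) is true */
int digitValue(char c) { return c - '0'; }

/* Convert an ASCII lowercase letter to uppercase; other characters pass through */
char toUpper(char c) {
    if (c >= 'a' && c <= 'z') return c - 32; /* 'a' - 'A' == 32 */
    return c;
}
```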
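Applying the digit rule once per character yields a complete decimal parser, roughly what C's atoi does internally. Below is a minimal sketch of the same arithmetic in JavaScript. The name parseDecimal is hypothetical. It accepts only unsigned ASCII digits and does not guard against results above Number.MAX_SAFE_INTEGER.

```javascript
// Hypothetical helper: parse a non-negative decimal integer using only code - 48.
function parseDecimal(str) {
  if (str.length === 0) throw new Error("empty string");
  let value = 0;
  for (let i = 0; i < str.length; i++) {
    const digit = str.charCodeAt(i) - 48; // '0' is code 48
    if (digit < 0 || digit > 9) throw new Error(`not a digit at index ${i}`);
    value = value * 10 + digit; // shift earlier digits one decimal place left
  }
  return value;
}

parseDecimal("2024"); // 2024
```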
Extended ASCII: Codes 128–255
The original ASCII standard used only 7 bits. On machines with 8-bit bytes, that left the high bit free, and it was sometimes used for parity. As personal computers became widespread in the 1980s, vendors started using codes 128–255 for additional characters: accented Latin letters, box-drawing characters, and symbols.
The problem was that each vendor and standards body defined its own mapping. IBM's Code Page 437 (used in DOS) had box-drawing characters, while ISO Latin-1 (ISO 8859-1) had Western European accented letters. The two are incompatible: byte 0xE9 is 'é' in ISO 8859-1 but 'Θ' in Code Page 437.
This proliferation of incompatible "extended ASCII" encodings was a major source of mojibake, the garbled text produced when bytes written in one encoding are decoded with another.
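Mojibake is easy to reproduce: encode text one way and decode it another. The sketch below encodes "café" as UTF-8 and then decodes each byte as ISO 8859-1 (Latin-1). Because Latin-1 byte values coincide with Unicode code points U+0000–U+00FF, String.fromCharCode decodes Latin-1 exactly. The helper name decodeLatin1 is hypothetical.

```javascript
// Hypothetical helper: decode bytes as ISO 8859-1, where byte value N is code point U+00NN.
function decodeLatin1(bytes) {
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

const utf8Bytes = new TextEncoder().encode("café"); // [99, 97, 102, 195, 169]
decodeLatin1(utf8Bytes); // "cafÃ©": the two-byte 'é' is misread as 'Ã' followed by '©'
```

The mismatch here is UTF-8 versus Latin-1 rather than two extended-ASCII code pages, but the failure mode is the same.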
Unicode: The Successor to ASCII
Unicode was designed to be a universal character set. Crucially, Unicode is a superset of ASCII: the first 128 Unicode code points are identical to ASCII. Code point U+0041 is 'A', code point U+000A is the newline character, and so on.
The most common Unicode encoding, UTF-8, has a further useful property: ASCII characters are encoded as a single byte identical to their ASCII value. This means any ASCII file is also a valid UTF-8 file.
```javascript
// 'A' has the same value in ASCII, Unicode, and UTF-8
const a = "A";
a.codePointAt(0);            // 65, i.e. 0x41 (code point U+0041)
new TextEncoder().encode(a); // Uint8Array [65]: one byte, 0x41, same as ASCII
```
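UTF-8 protects ASCII in a second way: every byte of a multi-byte sequence is 0x80 or above, so a byte below 0x80 always stands for the ASCII character with that value. The sketch below uses a hypothetical isAscii helper to check whether a byte buffer is pure 7-bit ASCII, and therefore decodes identically as ASCII or UTF-8. It also shows how many bytes a few characters need.

```javascript
// Hypothetical helper: true if every byte is below 0x80, i.e. the data is pure 7-bit ASCII.
function isAscii(bytes) {
  return bytes.every((b) => b < 0x80);
}

const enc = new TextEncoder();
enc.encode("A").length; // 1
enc.encode("é").length; // 2 (0xC3 0xA9)
enc.encode("€").length; // 3 (0xE2 0x82 0xAC)

isAscii(enc.encode("Hello\n")); // true: identical bytes in ASCII and UTF-8
isAscii(enc.encode("café"));    // false
```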
For new code, always use UTF-8. Reach for an extended-ASCII code page only when reading or writing legacy data that is actually encoded in one.