Jean LeLoup & Bob Ponterio, SUNY Cortland © 2003, 2017
One bit can have 2 possible states: $2^1 = 2$. 0 or 1.
Two bits can have 4 possible states: $2^2 = 4$. 00, 01, 10, 11 (i.e. 0-3).
Four bits can have 16 possible states: $2^4 = 16$. 0000, 0001, 0010, 0011, etc. (i.e. 0-15).
Seven bits can have 128 possible states: $2^7 = 128$. 0000000, 0000001, 0000010, etc. (i.e. 0-127).
Eight bits can have 256 possible states: $2^8 = 256$. 00000000, 00000001, 00000010, etc. (i.e. 0-255).
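The rule above (n bits give 2 ** n states, numbered 0 to 2 ** n - 1) can be checked with a short Python sketch. The helper names `states` and `patterns` are hypothetical, chosen only for this illustration.

```python
def states(n_bits: int) -> int:
    """Number of distinct patterns that n bits can hold: 2 ** n."""
    return 2 ** n_bits


def patterns(n_bits: int) -> list[str]:
    """Every bit pattern of width n_bits, in counting order (zero-padded binary)."""
    return [format(i, f"0{n_bits}b") for i in range(states(n_bits))]


for n in (1, 2, 4, 7, 8):
    p = patterns(n)
    # First pattern, last pattern, and the decimal range they cover.
    print(f"{n} bit(s): {states(n)} states, {p[0]} ... {p[-1]} (i.e. 0-{states(n) - 1})")
```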
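Eight bits are called a byte. One-byte character sets can therefore contain at most 256 characters. The current standard, though, is Unicode, which aims to represent all characters in all of the world's writing systems in a single set. Unicode was originally designed around two bytes per character (65,536 possible characters), but it has since grown beyond that limit; today its characters are usually stored with variable-length encodings such as UTF-8 (one to four bytes per character) or UTF-16 (two or four bytes per character).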
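To see why "two bytes" is no longer the whole story, this Python sketch prints each sample character's Unicode code point and how many bytes it takes in UTF-8 and UTF-16, using only the standard `str.encode` method. The helper `encoded_sizes` is a hypothetical name for illustration, and the sample characters are arbitrary choices.

```python
def encoded_sizes(ch: str) -> tuple[str, int, int]:
    """Return (code point, UTF-8 length, UTF-16 length) for one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))  # big-endian, so no byte-order mark is added
    return code_point, utf8_len, utf16_len


# "A", "é", the euro sign, and the musical G clef (outside the original 16-bit range)
for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
    print(ch, *encoded_sizes(ch))
```

The G clef needs four bytes in both encodings, which a strict two-bytes-per-character scheme could not hold.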
The original ASCII was a 7-bit character set (128 possible characters) with no accented letters. It was used in teletype machines. (The eighth bit was originally used to check parity, a way to detect transmission errors.) IBM and Apple both created extended character sets for their personal computers, using the eighth bit to double the number of characters. As competitors, they didn't put the same characters in the same positions in their sets. Thus 8-bit character sets and incompatibility were born. For example, the old IBM PC character set used by MS-DOS placed é at character 130, but the old Macs used character 142. Character 130 on the Mac was Ç. Today's standards have reduced such problems.
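The clash described above can be reproduced with Python's built-in codecs. This sketch assumes that Python's `cp437` codec (code page 437, the original IBM PC set) and `mac_roman` codec (the classic Macintosh set) stand in for the two legacy sets.

```python
# One byte value, two legacy 8-bit character sets.
byte_130 = bytes([130])
on_ibm_pc = byte_130.decode("cp437")       # 'é' on an IBM PC running MS-DOS
on_old_mac = byte_130.decode("mac_roman")  # 'Ç' on a classic Mac

# Where does é sit in each set?
e_acute_ibm = "é".encode("cp437")[0]      # 130
e_acute_mac = "é".encode("mac_roman")[0]  # 142

print(on_ibm_pc, on_old_mac, e_acute_ibm, e_acute_mac)  # é Ç 130 142
```

A file written on one machine and read on the other with no conversion would turn every é into Ç.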
On the Internet, much of the early email infrastructure was designed to carry only 7-bit codes. To send more complex data (e.g. 8-bit text or binary files such as graphics), encoding schemes were developed to translate it into something that could fit through a 7-bit pipeline. One such family of schemes is MIME (Multipurpose Internet Mail Extensions), which actually includes many different encoding schemes. For MIME to work, two elements need to be defined for the content: a content format or character set (which characters or other content are to be represented) and an encoding scheme (which codes will be used to represent those characters).
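Base64 is one of the transfer encodings MIME defines for squeezing binary data through a 7-bit channel. This Python sketch uses the standard `base64` module on a few arbitrary bytes (here, the 8-byte signature that begins every PNG image file) to show that the result contains only 7-bit ASCII and decodes back unchanged.

```python
import base64

# Arbitrary 8-bit data: the signature at the start of a PNG image file.
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

encoded = base64.b64encode(binary)           # only ASCII letters, digits, '+', '/', '='
fits_in_7_bits = all(b < 128 for b in encoded)
round_trip = base64.b64decode(encoded)       # back to the original bytes

print(encoded.decode("ascii"), fits_in_7_bits, round_trip == binary)  # iVBORw0KGgo= True True
```

The cost is size: every 3 bytes of input become 4 ASCII characters.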
A common encoding used for accented characters is Quoted-Printable. Each extended character (code higher than 127) is encoded as a string of three symbols: an equals sign followed by the character's code written as two hexadecimal digits. For example, in ISO-8859-1 é is written =E9. 8BIT (raw, unencoded 8-bit character data) is also a valid MIME encoding and is widely used to send accented characters today.
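The three-symbol rule can be sketched in a few lines of Python. The function `qp_encode_simple` is a hypothetical, deliberately simplified encoder: it escapes every byte above 127 plus the equals sign itself (which must also be escaped so it isn't mistaken for the start of a code), and it ignores the line-length and trailing-whitespace rules that full Quoted-Printable also imposes. The standard `quopri` module is shown alongside for comparison.

```python
import quopri


def qp_encode_simple(text: str, charset: str = "iso-8859-1") -> str:
    """Simplified Quoted-Printable: '=' plus two uppercase hex digits for
    every byte above 127 and for '=' itself; all other bytes pass through."""
    out = []
    for b in text.encode(charset):
        if b > 127 or b == ord("="):
            out.append(f"={b:02X}")
        else:
            out.append(chr(b))
    return "".join(out)


print(qp_encode_simple("café"))                          # caf=E9
print(quopri.encodestring("café".encode("iso-8859-1")))  # b'caf=E9'
print(quopri.decodestring(b"caf=E9"))                    # b'caf\xe9'
```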
To let text move between different kinds of machines with different operating systems and different built-in character sets, we all have to agree on standard character sets into which we will translate. The International Organization for Standardization (ISO) has established such standards. For instance, the standard character set for Western European languages is ISO Latin-1 (ISO-8859-1). As long as a computer knows which character set is being used, it can be programmed to translate and display those characters, no matter what the computer's native character set may be. The é is character 233 (hexadecimal E9) in ISO Latin-1.
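Translating through a shared standard can be sketched in Python, assuming the built-in `cp437`, `iso-8859-1`, and `mac_roman` codecs stand in for the conversion tables a real system would use. An é typed on an old MS-DOS machine is converted to its ISO-8859-1 code for transmission, and a Mac then converts that into its own native code.

```python
# é as stored on an old MS-DOS machine (code page 437).
dos_bytes = bytes([130])

# Sender: native bytes -> character -> shared ISO-8859-1 bytes.
text = dos_bytes.decode("cp437")
latin1_bytes = text.encode("iso-8859-1")
print(text, latin1_bytes[0], hex(latin1_bytes[0]))  # é 233 0xe9

# Receiver: shared ISO-8859-1 bytes -> character -> native Mac bytes.
mac_bytes = latin1_bytes.decode("iso-8859-1").encode("mac_roman")
print(mac_bytes[0])  # 142
```

Neither machine needs to know the other's native set; each only converts to and from the agreed standard.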
A MIME-compliant email program uses the email headers to keep track of which character set and which encoding scheme apply to each message, and a web browser does the same. This lets the program convert the data and display the characters correctly on any given machine, so the entire encoding system is transparent to the user (the user doesn't notice it). For MIME Quoted-Printable in Western European languages, these headers might look like:

    Content-Type: text/plain; charset=ISO-8859-1
    Content-Transfer-Encoding: quoted-printable
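A minimal Python sketch, assuming the standard library's `email.message.EmailMessage` class, shows a program setting both headers (character set and transfer encoding) for a short French sentence and then using them to decode the body again. The sample text and subject are arbitrary.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Bonjour"
# Ask for ISO-8859-1 text sent with the Quoted-Printable transfer encoding.
msg.set_content("Un café, s'il vous plaît.", charset="iso-8859-1", cte="quoted-printable")

print(msg["Content-Type"])               # carries the charset parameter
print(msg["Content-Transfer-Encoding"])  # quoted-printable
print(msg.get_payload())                 # encoded body: é becomes =E9, î becomes =EE
print(msg.get_content())                 # decoded back using the headers
```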
These same MIME headers are also used for web pages, so that a web browser such as Internet Explorer, Chrome, or Firefox knows how to display each page, no matter where it was created and no matter where it is being viewed. As long as the computer knows which character set is in use, it knows which character to display.
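What happens when the label is right or wrong can be sketched in Python. The helper `charset_from_content_type` is hypothetical; it borrows the standard `email.message.Message` class to read the `charset` parameter from a Content-Type header, and its ISO-8859-1 fallback for unlabeled pages is an assumption of this sketch.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "iso-8859-1") -> str:
    """Return the (lowercased) charset parameter of a Content-Type value."""
    m = Message()
    m["Content-Type"] = header_value
    return m.get_content_charset(default)


body = "Café".encode("utf-8")  # the bytes a server might actually send

right = body.decode(charset_from_content_type("text/html; charset=UTF-8"))
wrong = body.decode(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(right)  # Café
print(wrong)  # CafÃ©
```

The bytes are identical in both cases; only the declared character set changes what the reader sees.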
Keyboards
You also have to get the characters into the computer in the first place. Windows and Mac have long allowed this to be done using keyboard shortcuts. The best way to enter characters in Windows is to select a keyboard layout that includes the characters you want to type. For typing Western European languages on American keyboards, the safest bet, and the easiest to use if you already know how to type on an American keyboard, is the US-International keyboard. In Windows 7, look for the Keyboards and Languages tab in the Region and Language control panel to change or add keyboards; in Windows 8 and 10, the equivalent options are in the Language settings. Although many programs have their own built-in keyboard shortcuts, the advantage of using a keyboard layout provided by the operating system (Windows, Mac) is that it works in all programs.
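The US-International layout produces accented letters with dead keys: you press the accent key first, then the letter. This Python sketch models that idea with a hypothetical `DEAD_KEYS` table and `compose` function, combining the letter with a Unicode combining mark and normalizing the pair to a single character. It is a simplification; the real layout has exceptions (on Windows, for instance, ' followed by c gives ç).

```python
import unicodedata

# Hypothetical subset of dead keys, each mapped to a Unicode combining mark.
DEAD_KEYS = {
    "'": "\u0301",  # combining acute accent   (' then e -> é)
    "`": "\u0300",  # combining grave accent   (` then a -> à)
    '"': "\u0308",  # combining diaeresis      (" then u -> ü)
    "^": "\u0302",  # combining circumflex     (^ then o -> ô)
    "~": "\u0303",  # combining tilde          (~ then n -> ñ)
}


def compose(dead_key: str, letter: str) -> str:
    """Combine a dead key with the next keystroke."""
    if letter == " ":
        return dead_key  # dead key followed by space types the accent mark itself
    composed = unicodedata.normalize("NFC", letter + DEAD_KEYS[dead_key])
    # If no single precomposed character exists, output both keys as typed.
    return composed if len(composed) == 1 else dead_key + letter


print(compose("'", "e"), compose("~", "n"), compose('"', "u"), compose("'", "q"))  # é ñ ü 'q
```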
For online help with keyboards see:
Non-English Keyboards: Windows 7; General for Windows & Mac; Windows 10
Character set lookup tables (including HTML codes for web pages)
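HTML codes let a web page name a character by number (or, for common letters, by name) using only plain ASCII. A small Python sketch, assuming the standard `html` module and the `xmlcharrefreplace` error handler of `str.encode`, converts accented letters to numeric references and decodes named and numeric forms back.

```python
import html

text = "Voilà un café"

# Numeric character references: every non-ASCII character becomes &#NNN;
numeric = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(numeric)  # Voil&#224; un caf&#233;

# Named, decimal, and hexadecimal references all decode to the same é.
decoded = html.unescape("caf&eacute; / caf&#233; / caf&#xE9;")
print(decoded)  # café / café / café
```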