Help:Entering special characters

Many special characters (those not on a standard computer keyboard) are useful, and sometimes necessary, in Wikipedia articles. Even articles that use only English words may need punctuation such as an em dash (—) and symbols such as a section sign (§) or registered mark (®). Articles that discuss or mention European persons or places may use many extended Latin characters, and articles about other persons and places may require characters from entirely different alphabets. This page describes several methods for entering such characters.

Entry methods

There are several ways to enter a special character into wikitext.

Special character link

Use a special-character link to enter a Unicode (UTF-8) character. Links are available under Special characters above the edit window, and below the buttons at the bottom of the edit window. Clicking a special-character link enters that character at the current position of the cursor in the edit window, so you need to position the cursor where you want it before clicking the link.

Clicking the arrow to the left of Special characters above the edit window opens a list of groups of images of special characters (see Figure 1 below); clicking again on the arrow (which now points down) closes the list. Click on a group name (e.g., Symbols) to display that group; click on the image of the appropriate character to enter that character at the current cursor position in the edit window. Some of the images of different characters are very similar in appearance, so it is important to use the correct image. For example, the images for the closing single quotation mark (’) and closing double quotation mark (”) are very similar to the images for the single prime (′) and double prime (″) characters (the latter two are located after the image of the degree symbol).

File:SpecialCharsAbove.png
Figure 1. Special-character links above edit window: Symbol group
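When two images look alike, one way to confirm which character was actually entered is to look up its Unicode name. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is an illustrative assumption, not part of any Wikipedia tool.

```python
import unicodedata

# Look-alike characters from the special-character lists:
# closing single quote, single prime, closing double quote, double prime.
LOOKALIKES = ["\u2019", "\u2032", "\u201D", "\u2033"]


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(ch, describe(ch))
```

Pasting a character copied from the edit window into describe shows whether it is, for example, a quotation mark or a prime.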


Groups for the special-character links below the edit window are displayed one at a time; the default group is Insert, which includes punctuation and some other common symbols (see Figure 2 below), but another group may be shown if you have previously selected it. Click the down-pointing arrow at the right of this box to display other groups; click on the appropriate group to select it. When the cursor is passed over a special-character link, the link is underlined; clicking on the underlined link enters that character at the current cursor position in the edit window.

File:SpecialCharsBelow.png
Figure 2. Special-character links below edit window: default Insert group


Russian letters are in the Cyrillic group; most other European letters are in the Latin group. You may need to look through several groups in both places to find your special character, especially if it’s non-alphabetic: mathematical symbols may be under Symbols, Insert, or Math and logic (the latter two appear only in the links below the edit window), or at Wikipedia:Mathematical symbols and its linked articles.

Some character images and links include pairs of opening and closing quotation marks. By default, the character pair is entered at the current cursor position; if a passage of text is selected before the image or link is clicked, the quotation marks are entered at the beginning and end of the selection.
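The pair-insertion behavior described above can be modeled in a few lines. This is only an illustrative sketch: the function insert_pair and its parameters are hypothetical and do not reflect how the edit window is actually implemented.

```python
def insert_pair(text, start, end, opening="\u201C", closing="\u201D"):
    """Hypothetical model of inserting a quotation-mark pair.

    start and end are the selection bounds. When start == end there is
    no selection, so the pair is inserted at the cursor position.
    Otherwise the marks are placed around the selected passage.
    """
    return text[:start] + opening + text[start:end] + closing + text[end:]


if __name__ == "__main__":
    print(insert_pair("say hello now", 4, 4))  # cursor only, no selection
    print(insert_pair("say hello now", 4, 9))  # "hello" is selected
```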

Alt code or Option key

Enter a Unicode character using an Alt code (Windows operating system) or the Option key (Macintosh computer).

Under Windows, the Alt key is pressed and held down while a decimal character code is entered on the numeric keypad; the Alt key is then released and the character appears. The numerical code corresponds to the character’s code point in the Windows-1252 code page, with a leading zero; for example, an en dash (–) is entered using Alt+0150. The leading zero is required; if it is omitted, the character at that code point in the default OEM code page is entered instead. For example, if the OEM default is code page 437, Alt+150 gives û.
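The effect of the leading zero can be reproduced by decoding the same byte value with two different code pages. The sketch below uses Python's built-in cp1252 and cp437 codecs; the function alt_code_char is a hypothetical model of the behavior described above, not a Windows API. It is simplified: codes below 32 and the few byte values that are unassigned in Windows-1252 are not handled the way the keyboard handles them.

```python
def alt_code_char(digits, oem_code_page="cp437"):
    """Model which character an Alt code produces.

    digits is the string typed on the numeric keypad, e.g. "0150".
    A leading zero selects Windows-1252; otherwise the OEM code page
    (assumed here to be 437 by default) is used.
    """
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only models single-byte codes")
    codec = "cp1252" if digits.startswith("0") else oem_code_page
    # Unassigned Windows-1252 bytes raise UnicodeDecodeError here.
    return bytes([value]).decode(codec)


if __name__ == "__main__":
    print(alt_code_char("0150"))  # en dash via Windows-1252
    print(alt_code_char("150"))   # u with circumflex via code page 437
```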

On a Macintosh computer, the Option key (Opt), sometimes together with another modifier key, is pressed and held down while another key is pressed; the keys are then released, and the character appears. For example, an en dash is entered using Opt+-; an em dash (—) is entered using Shift+Opt+-.

Lists of Alt codes and Option key combinations are given in sources linked under External links.

On some other operating systems, a Compose key provides similar functionality.

External application

Enter a Unicode character by copying and pasting from an application such as Character Map, or a text-editing application that supports Unicode (e.g., Microsoft Word). Whenever pasting from an external application, it is important to preview the edit before saving to ensure that the pasted material displays correctly.

HTML character reference

Use an HTML character reference. The reference can be either named or numeric; either type begins with an ampersand (&) and ends with a semicolon (;). A named reference is of the form &name;; for example, &agrave; refers to a lower-case Latin a with grave accent (à). Because the names are reasonably mnemonic, they are usually easier to remember than numerical codes, and accordingly are easier for other editors to recognize.
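To check which character a named reference stands for, the name can be resolved programmatically. The following is a minimal sketch using Python's standard html module; the helper resolve_named is an illustrative assumption.

```python
import html
from html.entities import name2codepoint


def resolve_named(name):
    """Return the character for an HTML 4 entity name such as 'agrave'."""
    return chr(name2codepoint[name])


if __name__ == "__main__":
    print(resolve_named("agrave"))     # via the entity-name table
    print(html.unescape("&agrave;"))   # via the full reference syntax
```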

Some Unicode characters, such as Turkish letters, do not have HTML names, so a numerical reference is sometimes the only option using HTML. An HTML numeric character reference is of the form &#D; or &#xH;, where D and H are the character’s Unicode code point in decimal and hexadecimal, respectively. For example, either &#8212; or &#x2014; can be entered to give U+2014, em dash (—). Because a character’s Unicode code point is usually given in hexadecimal with a prefixed “U+”, the hexadecimal code is arguably more convenient. When a name exists, a named reference (e.g., &mdash; for an em dash) is usually more convenient (and more easily recognized) than either numerical code.
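The decimal and hexadecimal forms can both be derived from a character's code point. The sketch below builds them with Python's standard library; the function character_references is hypothetical, and the name lookup relies on html.entities.codepoint2name, which covers only the HTML 4 entity names.

```python
from html.entities import codepoint2name


def character_references(ch):
    """Return the HTML character references available for one character."""
    code_point = ord(ch)
    refs = {
        "decimal": f"&#{code_point};",
        "hex": f"&#x{code_point:X};",
    }
    # Add the named form only if the HTML 4 table defines one.
    name = codepoint2name.get(code_point)
    if name is not None:
        refs["named"] = f"&{name};"
    return refs


if __name__ == "__main__":
    print(character_references("\u2014"))  # em dash: three forms
    print(character_references("\u011F"))  # Turkish g with breve: numeric only
```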

HTML character names (and the corresponding hexadecimal and decimal codes) are given in List of XML and HTML character entity references.

Because a character reference uses only ASCII characters, it does not require that a Web browser support Unicode, and it is unambiguous when a Web page does not announce its character encoding, when the browser’s encoding has been manually set incorrectly, and even when the character does not display properly with some browsers. Accordingly, it is usually the most “Web safe” approach. However, character references are distracting for many editors, and they may cause difficulties with searches in Wikipedia (see below).
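As an illustration of the ASCII-only property, Python's built-in xmlcharrefreplace error handler converts every non-ASCII character in a string to a decimal numeric reference. This is a sketch of the idea only; the function name to_ascii_references is an assumption, and nothing here is a recommendation to convert article text this way.

```python
import html


def to_ascii_references(text):
    """Replace each non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    encoded = to_ascii_references("Odiliënberg")
    print(encoded)
    # Decoding the references restores the original text.
    print(html.unescape(encoded))
```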

Special characters and searches

Wikipedia searches are easier if a special character is entered as Unicode. If an HTML entity is used, a word like Odiliënberg (entered as Odili&euml;nberg) can be found only by searching for Odili, euml, nberg, or a combination thereof. This is a bug that should be fixed: the entities should be folded into their raw character equivalents so that all searches on them are equivalent. See also Help:Searching.
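The difference between indexing the raw wikitext and indexing text with entities folded can be sketched as follows. Both functions are hypothetical and use a deliberately naive word splitter; they do not represent the actual Wikipedia search implementation.

```python
import html
import re


def naive_terms(wikitext):
    """Split raw wikitext into word-like terms, leaving entities unresolved."""
    return re.findall(r"\w+", wikitext)


def folded_terms(wikitext):
    """Resolve character references first, then split into terms."""
    return re.findall(r"\w+", html.unescape(wikitext))


if __name__ == "__main__":
    print(naive_terms("Odili&euml;nberg"))   # entity breaks the word apart
    print(folded_terms("Odili&euml;nberg"))  # word stays whole
```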

See also

References

External links