In our recent discussion of foreign loan words, specifically how to render them in Microsoft Word with their diacritics (or “accent marks”) intact, we’ve drawn a distinction between the ninety-four typographical characters available on a standard QWERTY keyboard and the thousands of others that are not.
But from your computer’s perspective, that distinction is arbitrary. The numbers and letters on your keyboard are not your computer’s native language. Rendering even a single letter on the screen requires a process of translation from binary code.
Typing a capital “A,” for instance, initiates a chain of commands and retrievals that must be completed before your software changes white pixels to black to draw on your screen that familiar peaked roof and crossbar. In practical terms, it’s no simpler or harder for Word to render an “A” than an “Ɐ.” Storing a typographical character in a file may take as little as one byte of data or as much as four, depending on the character and the encoding. The number that identifies a given character, and that your software translates into a particular glyph (the shape actually drawn on your screen), is called a “code point.”
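For the curious, here’s a small Python sketch (Python is just my choice for illustration; Word doesn’t depend on it) that prints the code point of each character mentioned in this post and how many bytes it occupies when saved in UTF-8, the encoding most web pages use. It assumes Python 3.8 or later.

```python
import unicodedata

# Characters mentioned in this post: A, ø, Ɐ, and the peach emoji.
samples = ["A", "\u00f8", "\u2c6f", "\U0001F351"]

for ch in samples:
    encoded = ch.encode("utf-8")  # the bytes actually stored in a UTF-8 file
    # Print only ASCII text so the output displays in any console.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{len(encoded)} byte(s) -> {encoded.hex(' ')}")
```

Run it and you should see one, two, three, and four bytes respectively: the “A” is as cheap as it gets, and the peach is as expensive.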
Given that processing text is an essential function for most users, industry protocols eventually developed to handle the code points associated with the characters of all the world’s written languages. The Unicode Consortium, a nonprofit based in Mountain View, California, was founded in 1991 to reconcile the various ad hoc encoding systems used by different companies and in different countries and to institute a truly global, cross-platform standard, meaning that a given code point will correspond to the same character in any application, search engine, or operating system.
The move toward standardization has been gradual and not without hiccups along the way. The screenshot above is from a piece I wrote in 2008, when my writing first started appearing regularly on the internet. The content management system WordPress was not yet fully Unicode-compliant and did not automatically handle smart quotation marks or em dashes; I therefore hand-coded those marks into the source HTML using the variant encoding system in place for WordPress at the time. A couple of years later, when WordPress updated to full Unicode compliance, dozens of painstakingly formatted articles suddenly looked like glitches in the Matrix.
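I no longer have a record of exactly which codes I typed back then, so take this as an illustration under an assumption: that hand-coding meant standard HTML numeric character references, in which each mark is written as its code point in decimal. Python’s standard `html.unescape` shows how a browser turns such references back into characters.

```python
import html

# Hypothetical hand-coded HTML: curly quotes and an em dash written as
# decimal numeric character references (an assumption about the old markup).
source = "&#8220;Smart quotes&#8221; &#8212; coded by hand"
rendered = html.unescape(source)  # what a browser would display

# Each reference is simply the character's code point written in decimal.
for ch in ("\u201c", "\u201d", "\u2014"):
    print(f"&#{ord(ch)}; is U+{ord(ch):04X}")
```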
The Unicode standard provides room for well over a million unique code points, 1,112,064 to be precise, of which the consortium has so far assigned roughly 145,000. Code points currently assigned correspond to characters from 159 modern and historic writing systems, mathematical operators, decorative and symbolic marks, and non-visual formatting codes. (The Unicode Consortium also approves and administers emojis, meaning they’re the ones who signed off on that suggestively sexy peach. Rrrrrowrrr!)
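Where does 1,112,064 come from? A quick arithmetic check in Python: the Unicode codespace runs from U+0000 to U+10FFFF, and the 2,048 code points from U+D800 to U+DFFF are set aside as “surrogates” for the UTF-16 encoding and can never be assigned to characters.

```python
import sys

# Unicode code points run from U+0000 through U+10FFFF.
total = 0x10FFFF + 1              # 1,114,112 positions in the codespace
# U+D800..U+DFFF are surrogates, reserved for UTF-16 and never assigned.
surrogates = 0xDFFF - 0xD800 + 1  # 2,048
usable = total - surrogates

print(f"{total:,} - {surrogates:,} = {usable:,}")  # 1,114,112 - 2,048 = 1,112,064

# Python's own upper limit on code points agrees with the standard.
assert sys.maxunicode == 0x10FFFF
```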
You’ve borne with me through these five paragraphs of tech-talk and may be wondering, “So what? I’m not a computer programmer. I’m a literature student who wants to write a term paper about the crime novels of Jo Nesbø, and I’d just like to spell his surname properly!” Well, we’ve looked at a couple of ways you could add that “ø” glyph. The drawback to the methods we’ve examined so far is that they require you to interrupt the flow of typing to open what is essentially a second app, scroll through a list of characters, and select, copy, and paste the one you want. You’ll get the job done, but it’s inconvenient.
But those Unicode code points? Many of them, including many of the accented letters and typographical marks you’re likely to need, can be typed directly using a combination of keystrokes on the standard 47-key QWERTY keyboard. And there’s a simple utility built into Windows (other operating systems have their own equivalents) that will show you which key combo corresponds to a given glyph, when one exists. Let’s take a look at Character Map.
On a Windows machine, open Character Map by either pressing the Windows key or clicking the Windows icon at the bottom left of your screen, then scrolling down to Windows Accessories. Click to open the drop-down menu, then select Character Map. The utility will open in a pop-up window that looks like this.
So far, this looks a lot like Word’s own Insert function. And indeed, you can use it the same way, by clicking on the glyph you want, clicking Select, clicking Copy, then going back into your Word window to paste.
But Character Map has two major advantages over Insert. First, it’s searchable in plain language. Let’s say you need the trademark sign (™); typing “trade” in the “Search for” box at the bottom of the pop-up window (tick the Advanced view box if you don’t see it) brings it up instantly.
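Character Map’s search looks characters up by their descriptive names (the official Unicode name of ™ is TRADE MARK SIGN). I haven’t checked Character Map’s exact matching rules, so treat this Python sketch as an analogy rather than a reimplementation: it scans every code point and returns those whose official name contains the search term. The helper `search_names` is a hypothetical name of my own.

```python
import sys
import unicodedata

def search_names(term, limit=10):
    """Hypothetical helper: list code points whose Unicode name contains `term`."""
    term = term.upper()  # official Unicode names are written in capitals
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if term in name:
            hits.append(f"U+{cp:04X} {name}")
            if len(hits) >= limit:
                break
    return hits

print(search_names("trade"))  # includes U+2122 TRADE MARK SIGN
```

Scanning all 1.1 million code points takes a moment; a real tool would build an index of names once rather than searching from scratch each time.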
And look what else happens when we click on that “ø” glyph:
Down in the lower right-hand corner, where I’ve circled in red, Character Map gives you the instructions for typing the “ø” glyph yourself. “Alt+0248” simply means to hold down the Alt key (there’s one on either side of your space bar) and type 0, 2, 4, and 8 in order. Use the numeric keypad to the right of your keyboard, with Num Lock turned on; on a standard Windows setup, the row of numeral keys above your alphabet keys won’t produce the character. Here’s how it looks in practice on my keyboard; I’ll hold down the left-hand Alt key and punch numbers with my right.
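Why 0248? Alt codes with a leading zero predate Unicode: they index the Windows “ANSI” code page, which on US and Western European setups is Windows-1252 (I’m assuming that setup here). For ø the two systems happen to agree, since it sits at 248 in both, but they don’t always. The hypothetical helper `alt_code_char` in this Python sketch decodes a single byte the way that code page would.

```python
import unicodedata

def alt_code_char(number, codepage="cp1252"):
    """Hypothetical helper: the character an Alt+0<number> code maps to,
    assuming the system's ANSI code page is Windows-1252."""
    return bytes([number]).decode(codepage)

oslash = alt_code_char(248)     # Alt+0248
trademark = alt_code_char(153)  # Alt+0153

# 248 is also o-with-stroke's Unicode code point (U+00F8)...
print(f"Alt+0248 -> U+{ord(oslash):04X} {unicodedata.name(oslash)}")
# ...but 153 is not: the trademark sign lives at U+2122 in Unicode.
print(f"Alt+0153 -> U+{ord(trademark):04X} {unicodedata.name(trademark)}")
```

So the number Character Map shows you is a keyboard shortcut into a legacy table, not always the character’s Unicode code point.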
Now, I’m a two-finger typist (three on a good day), and when I get on a roll I hate to interrupt it. For me, these Unicode keystroke shortcuts are a godsend. Having committed just a few to memory, I can handle my most commonly encountered diacritic characters and irksome typographical marks on the fly, saving me time.
Now that we’ve covered some background, next time we’ll look at a few Unicode shortcuts that are particularly worth knowing and explain how and when they should be deployed. By the end, you’ll have the information you need to compile your own handy cheat sheet. See you next month!