As I’ve watched folks have the same “issues” for years, and made the same explanations for years, I’ve decided to just put it in one posting where I can “toss a link” instead of recreating the same painful comments each time.
(Painful in that I not only need to use the same level of HTML “magic” that they need to learn, but I also need to take it a layer or two beyond that to display what one types without having WordPress turn it into the actual HTML. In other words, the work to explain it is much harder than the work to do it.)
First up, a couple of definitions:
HTML – Hyper Text Markup Language. Those funny coding characters people use to make web pages and comments do “special things” like italics or bold or strike out text. They are “captured” by the WordPress engine and turned into the effect itself.
URL – Uniform Resource Locator. That thing in the “address bar” of your browser that finds things on the internet. So, for example, “https://duckduckgo.com” (that is the address of a much more polite search engine than Google, IMHO.) Now you will notice that just shows up as plain text, not an active link. That is because the quote marks ” ” around it told WordPress “do not add the HTML to make this an active link like:” https://duckduckgo.com/
Unicode – An encoding system that lets you represent darned near any character in the computer universe as a series of very common characters that pretty much everyone has. There are three common ‘encodings’ used. Hexadecimal numbers (because computers calculate in binary and hexadecimal, or ‘hex’, is an easy way to represent it – so a lot of converting hex to something else and back by programmers can be avoided with a direct hex mapping), Decimal numbers (because real people often don’t like thinking in hex or dealing with it), and Text (since most folks can’t remember a string of numbers all that well for common things.) So, for example, if I want the ‘cent sign’, it is not on my keyboard. So how to get ¢? Well, I used the Unicode for “cent sign” that has the text value “cent” wrapped in an escape sequence: &cent;
Escape – When you are in one mode, such as ‘use as a URL’, and wish to have that not happen, you can ‘escape’ that mode and into another, typically with the use of specially defined “escape characters”. These can be various kinds of quoting (as shown above with the duckduckgo URL example) or other special characters. Since there are hundreds of kinds of “modes” in Geekdom and various languages and systems, that all have different ideas about what is “special”, the “escape sequence” for any one mode varies dramatically by what computer system, language, or mode is in use. So sometimes it is a / slash, or a \ backslash, or a ” ” pair, or even an & or more. For HTML in a comment, you are trying to ‘escape’ (temporarily) from text entry mode and into HTML mode. This is usually done with the < sign. The escape is ended with the > sign. This also means attempts to use them as “just text” can fail when WordPress decides you meant them as an “escape to HTML” directive and stole them to make HTML out of it. Moral: Don’t start with a leading < and end with a > anything you want to ever see again ;-)
Guess how I typed those < > without them being captured as HTML? Yes, Yet Another Escape Sequence. In this case, the & is used to mark an “escape to Unicode for a particular character” and that escape ends with a semicolon ; and between them you can put the Unicode coding for a particular character. For Less Than sign and Greater Than sign, there are several options, including a text value, a hexadecimal number, or a decimal number. The easy to remember text form is lt for less than and gt for greater than. So the whole sequence for < is &lt; Easy, no? (Now figure out how I got THAT to print… ;-)
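If you would rather let a program do the swapping for you, here is a minimal sketch in Python. It assumes the standard library `html` module; the function name `show_literally` is just something I made up for illustration, and this is not what WordPress itself runs.

```python
import html

def show_literally(text: str) -> str:
    """Swap &, < and > for their named escapes so they display as typed."""
    # quote=False leaves " and ' alone; only &, < and > are converted.
    return html.escape(text, quote=False)

print(show_literally("if a < b and b > c"))  # if a &lt; b and b &gt; c
```

Paste the output into a comment and the < and > ought to survive the trip.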
Some folks try to use the square brackets [ and ] and those are not special to WordPress. Oddly, there’s also an “Angle Bracket” that is a lot like the < and > signs, but isn’t it either. More on that below… Just realize folks often call those < and > characters “angle brackets” when they mean something else…
WordPress – Is, of course, this wonderful platform that lets us have blogs for free and hides all the technical details from us… except when it takes something to be Unicode and "fixes it" when it wasn't, or takes something to NOT be a URL when it is, and breaks the URL. Then we are thrust kicking and screaming into the hidden world of hypertext and markup languages against our will… (Well, most of us. Geeks like me kind of like it. I type my HTML "long hand" and don't use their visual editor 'tools'…)
Some Unicode and HTML
OK, like it or not, you must learn a couple of escape sequences, and a little bit of Unicode. The very good news is that the Unicode for all sorts of characters is easily available for you on a web site (URL) so you don’t have to remember the strange ones. You can just look them up.
This URL gives a high level entry into their listings. Choose the “A-Z Index” for most things (or put a target in the “search” box).
Here is an example URL for the “Full Stop” character (aka the ‘period’).
What does an entry look like? Well, a lot of stuff mostly of interest to computer geeks and programmers, but a bit near the middle you care about.
HTML Entity (decimal) &#46;
HTML Entity (hex) &#x2e;
How to type in Microsoft Windows
The Full Stop does not have a ‘text’ representation, so no easy to remember form for it…
First off, notice that “46” shows up in a few places. And that “2e” comes around. That’s a very common pattern. The Hex and the Decimal values are common to many ways of using that Unicode; only the “wrapping” changes. We care about the HTML way of using it. So &#46; for the decimal, or we could put in an ‘x’ that says “Hex coming” and use the Hexadecimal value of “2e” as &#x2e; (2 in the “16s place” for 32, plus e in the 1’s place that is “14”, add them, golly, it’s 46. 1 2 3 4 5 6 7 8 9 A B C D E F are 1 through 15 in Hex) The “Geeky” will just remember that period is 46 and translate to hex as needed. The UberGeek will remember it is 2e and translate that to decimal if ever needed ;-)
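For anyone who wants to check the hex arithmetic rather than take my word for it, a few lines of Python (built-in functions only, nothing assumed beyond the language itself) walk through the same sum:

```python
# Hex digits 0-9 then a-f stand for the values 0 through 15.
sixteens, ones = int("2", 16), int("e", 16)
value = sixteens * 16 + ones

print(sixteens, ones, value)   # 2 14 46
print(int("2e", 16), hex(46))  # 46 0x2e
print(chr(46))                 # .
```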
OK, we know it’s 46. Now what? In Windows (which you ought not need to use unless in some place other than WordPress), the “Escape” is “Alt” then you use 46 with a leading zero, or not. For Hex, you use a plus sign and mandatory two zeros. Just lumpy and wrong. Lucky for us, what we care about is the HTML version.
For that, the escape is the & character and the close escape the ; and we just wrap those around the 46 value with a number sign stuck in front of it to say it is a decimal number. Easy. Just as easy, use the # followed by an x to say it is a Hex number and use the Hex value for 46 – 2e.
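The wrapping described above is mechanical enough to sketch in Python. The helper name `entity_forms` is hypothetical (mine, not from any library); `html.unescape` from the standard library is used only to confirm that both forms decode back to the same character.

```python
import html

def entity_forms(ch: str) -> tuple[str, str]:
    """Return the decimal and hex HTML escapes for a single character."""
    code = ord(ch)                      # the Unicode number, e.g. 46 for "."
    return f"&#{code};", f"&#x{code:x};"

decimal_form, hex_form = entity_forms(".")
print(decimal_form, hex_form)           # &#46; &#x2e;
print(html.unescape(decimal_form) == html.unescape(hex_form))  # True
```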
Now go to the upper left of that Unicode page and type “ampersand” in the search box. You will get a listing of URLs and the top one ought to be
It is interesting for a couple of reasons. First off, you get an example of a ‘text’ mapping. Second, it is the “magic” of escaping the escape…
HTML Entity (decimal) &#38;
HTML Entity (hex) &#x26;
HTML Entity (named) &amp;
We’ll be ignoring the Microsoft entries from here on out for the simple reason I never have needed to use them.
Notice that last entry? “(named)”? That is the text, or ‘named’ value of that Unicode character. So you can type any of those three and get the same thing:
&#38; gives &
&#x26; gives &
&amp; gives &
It is also the clue to how to ‘escape the escape’. Instead of typing an & and having WordPress decide if it is part of a Unicode sequence, or not, I can type the Unicode FOR the & and explicitly say “use this as a character, not an escape”. Then just type the rest of the Unicode sequence (that is not being interpreted as Unicode then). &#38; Though for some, like the Hex value, WordPress is even harder to get past and you must ‘escape’ the Hash Sign as well with the &#35; value for it.
No, I will not be showing that whole escape value chain here. You end up in Recursion Hell constantly needing to add more values of ‘escaping’ to show how the prior escaping was done, rinse and repeat.. (From The Devils DP Dictionary: Recursion – See Recursion. )
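For the curious who want to watch one layer of that recursion from a safe distance, here is a small Python sketch. It assumes the standard library `html` module stands in for what a blog engine does when it decodes a comment; WordPress itself may well behave differently.

```python
import html

shown = "<"                                   # what the reader should see
typed_once = html.escape(shown, quote=False)  # what to type to get "<"
typed_twice = html.escape(typed_once, quote=False)  # what to type to SHOW typed_once

print(typed_once)                   # &lt;
print(typed_twice)                  # &amp;lt;
print(html.unescape("&#38;lt;"))    # &lt;
```

Each added layer of “showing how it was typed” costs one more round of escaping the &, which is the whole of the Recursion Dance.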
Ok, hopefully at this point you have a handle on how to stop your < signs from being stolen, and how to keep all the text between a < and a > from being stolen (particularly a problem when doing math formulas or posting programming text). Also how to put in “special characters” like &euro; € and &cent; ¢ and the currency pound &pound; £ and even how to find others.
Some Special Cases
There are a few things WordPress allows easily, and a few that it causes ongoing problems by “helping”. I’m just going to list a few of them here and how to get around them.
If there are particular tricks you know, feel free to add them in a comment.
The “Strike out text” is a fun one. Just use “strike” inside lt and gt and /strike to end it.
I want a big Scotch on the rocks Tea is Fine, thanks!
Is typed as:
<strike>I want a big Scotch on the rocks</strike> Tea is Fine, thanks!
Similarly bold using b and /b or Italics as i and /i along with “block quote” done with blockquote and /blockquote markers. Then there is underlining that uses u and /u markers.
<b>To Get Bold</b>
<i>To Get Italics</i>
<u>To Get Underline</u>
<blockquote>To Get Blockquotes</blockquote>
OK, a “sidebar” on Angle Brackets. Technically, they are a different character and different Unicode from the LT and GT symbols.
This is the Left Angle Bracket:
HTML Entity (decimal) &#12296;
HTML Entity (hex) &#x3008;
So compare the LT < to 〈 the left angle bracket. So just remember to use the LT, ok?
While the right angle bracket is:
HTML Entity (decimal) &#12297;
HTML Entity (hex) &#x3009;
So compare the GT > to 〉 the right angle bracket. So just remember to use the GT, ok?
The URL Period Problem
Folks will regularly post a URL with “…” in it somewhere. For unknown reasons, some places like to put ‘dots’ in their URLs. WordPress, to help folks who post a URL at the end of a sentence and end it with a period, takes that first ‘dot’ to be ‘end of sentence’ and then the rest of the URL becomes plain text. As that doesn’t work as a URL that’s “not helping”…
To “fix it”, just remember to replace any “dots” in a URL with the Unicode for “full stop” that we saw above: &#46;
Try this one:
That ends up broken. Change those ‘dots’ to &#46; and it works:
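Swapping dots by hand gets tedious in a long URL, so here is a minimal Python sketch of the same fix. The function name `protect_dots` and the example URL are made up for illustration; it simply follows the “replace any dots” advice above and makes no claim about how WordPress parses links.

```python
import html

def protect_dots(url: str) -> str:
    """Replace every '.' in a URL with the decimal escape for full stop."""
    return url.replace(".", "&#46;")

example = "http://example.com/a...b.html"   # hypothetical URL with a run of dots
safe = protect_dots(example)

print(safe)  # http://example&#46;com/a&#46;&#46;&#46;b&#46;html
print(html.unescape(safe) == example)       # True
```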
Another Minor Annoyance
Just as a minor point, if you use Google to find a familiar paper, and just copy the “URL” from the Google Listing, they now hand you a Google Search URL (that is long and looks ugly) instead of the actual article URL. It is best, then, to click that link to the actual article, and THEN copy the real URL from the URL line of your browser.
I suspect this is done as a way to drive more traffic through Google and “get their numbers up” while tracking more people and what they do. “Just say No”. Either copy the final target link (by doing the extra work) or use a more “user respectful” search engine.
Personally, I like DuckDuckGo since they don’t do such shenanigans and are not prone to tracking folks, nor stuffing you into an interest box by profiling you, a practice called “bubbling”. (That is, you get more ‘variety’ of hits, as they do not customize the results to the person, so you spend less time “sucking your own exhaust”.)
If you find yourself posting a URL to a paper that starts with “http://google...” You have most likely been trapped by their traffic feeding gimmick. Try again…
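If you have already copied one of those long redirect links, a short Python sketch can sometimes dig the real address back out. This is very much an assumption on my part: it supposes the link has the path `/url` and carries the target in a `url` or `q` parameter, which is a format Google can change at any time. The name `unwrap_google_link` and the sample link are hypothetical. When in doubt, clicking through and copying from the address bar remains the reliable way.

```python
from urllib.parse import urlparse, parse_qs

def unwrap_google_link(link: str) -> str:
    """Return the target of a Google redirect link, or the link unchanged."""
    parts = urlparse(link)
    host = parts.hostname or ""
    is_google = host == "google.com" or host.endswith(".google.com")
    if is_google and parts.path == "/url":
        params = parse_qs(parts.query)      # also undoes the %3A%2F%2F encoding
        for key in ("url", "q"):            # assumed parameter names
            if key in params:
                return params[key][0]
    return link

wrapped = "https://www.google.com/url?sa=t&url=https%3A%2F%2Fexample.org%2Fpaper.pdf&usg=xyz"
print(unwrap_google_link(wrapped))  # https://example.org/paper.pdf
```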
OK, that’s enough for now. Tea time is calling my name ;-)
If I think of any other interesting ones, I’ll add them to this posting over time. Realistically, though, that ought to get most folks past most of the aggravation most of the time, and save me from needing to do the “Unicode Recursion Dance” to explain to folks one at a time how to do it.
Updates & Additions
A couple of tables shamelessly lifted from WUWT on the page made by Ric Werme (with h/t to Tckev & Gail in comments). I’ve expanded the Superscript / Subscript blocks a little as the explanation was not as clear as straight examples. The different choices for a Superscript 1, for example, giving slight font variations (that may not show in all browsers or all browser settings).
|b (bold)||This is <b>bold</b> text||This is bold text|
|i (italics)||This is <i>italicized</i> text||This is italicized text|
|a (anchor)||See <a href=http://wermenh.com>My (Ric’s) home page</a>||See My (Ric’s) home page|
|blockquote (indent text)||My text<blockquote>quoted text</blockquote>More of my text||My text quoted text More of my text|
|strike||This is <strike>text with strike</strike>||This is text with strike|
|code (use for monospace display)||<code>Wordpress handles this completely differently</code>||Wordpress handles this completely differently|
And this block on “special characters”. (Though do note that they call LT and GT the same as Angle Brackets, when Unicode has a distinct code for Angle Brackets – even if most folks treat LT and GT as Angle Brackets. So be careful when looking up Unicode characters to get the name precisely right. Otherwise, for most purposes, it doesn’t matter much.)
Nevertheless, there are very useful characters that are most reliably entered as HTML entities:
|Type this||To get||Notes|
|&lt;||<||Less than sign. Left angle bracket|
|&deg;||°||Degree (Use with C and F, but not K (kelvins)). Alt + numeric keypad 0176 also works|
|Superscripts (use 8304, 185, 178-179, 8308-8313 for digits 0-9)|
|Subscripts (use 8320-8329 for digits 0-9)|
|&ntilde;||ñ||For La Niña & El Niño. Alt + numeric keypad 164 (or 0241) also works|
|&plusmn;||±||Plus or minus|
|&nbsp;|| ||Like a space, with no special processing (i.e. word wrapping or multiple
|&gt;||>||Greater than sign. Right angle bracket. Generally not needed|
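The superscript and subscript numbers in that table are easy to get wrong by hand, since the superscript digits are not in one tidy run. A small Python sketch, using only the decimal codes listed in the table, can generate the escapes for you. The names `SUPERSCRIPT_CODES`, `SUBSCRIPT_CODES` and `digit_entities` are mine, made up for this example.

```python
import html

# Decimal codes for digits 0-9, taken from the table above.
SUPERSCRIPT_CODES = [8304, 185, 178, 179, 8308, 8309, 8310, 8311, 8312, 8313]
SUBSCRIPT_CODES = list(range(8320, 8330))

def digit_entities(digits: str, codes: list[int]) -> str:
    """Turn a string of digits into a run of decimal HTML escapes."""
    return "".join(f"&#{codes[int(d)]};" for d in digits)

co2 = "CO" + digit_entities("2", SUBSCRIPT_CODES)
area = "m" + digit_entities("2", SUPERSCRIPT_CODES)

print(co2, html.unescape(co2))    # CO&#8322; CO₂
print(area, html.unescape(area))  # m&#178; m²
```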
Now to add to it…
Has a list of Unicode characters for superscripts and subscripts with a display of the characters. For me in this browser, some of the letters come out as the ‘MISSING LETTER’ character, so the browser I am using doesn’t know that bit of Unicode. For unusual characters, different browsers may or may not implement them. Still, I can find the Unicode and put it in a page, even if this browser doesn’t ‘see’ it on display. (That is sort of the purpose of Unicode, to let you handle characters you can not type and may not be able to display… via an alternate encoding.)
For further exploration, you can wander categories of characters at the fileformat.info site, such as this list of ‘numbers, other’:
Which includes some numbers in other odd languages / scripts like Telugu or Malayalam, and interesting ones like the ‘number in a circle’ series:
&#9316; giving: ⑤
There are also fractions, some named, so, for example, to get the “vulgar fraction 1/4” (where ‘vulgar’ means common, as opposed to some of the other language / culture fractions also available):
&frac14; or &#188; giving ¼
&frac12; or &#189; giving ½
&frac34; or &#190; giving ¾
Then un-named fractions in another block:
&#8531; giving ⅓
&#8532; giving ⅔
&#8533; giving ⅕
&#8534; giving ⅖
&#8535; giving ⅗
&#8536; giving ⅘
&#8537; giving ⅙
&#8538; giving ⅚
&#8539; giving ⅛
&#8540; giving ⅜
&#8541; giving ⅝
&#8542; giving ⅞
&#8543; giving ⅟
That last one is interesting since with subscripts you ought to be able to construct other fractions.
⅟₂₃ made from: &#8543;&#8322;&#8323;
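That construction generalizes to any “one over N”. As a sketch only (the helper name `one_over` is hypothetical, and it assumes N is a positive whole number), Python can string together the fraction numerator one, decimal 8543, with subscript digits starting at decimal 8320:

```python
import html

def one_over(n: int) -> str:
    """Build the HTML escapes for 1/n from U+215F plus subscript digits."""
    if n <= 0:
        raise ValueError("n must be a positive whole number")
    subscripts = "".join(f"&#{8320 + int(d)};" for d in str(n))
    return "&#8543;" + subscripts

typed = one_over(23)
print(typed)                 # &#8543;&#8322;&#8323;
print(html.unescape(typed))  # ⅟₂₃
```

How nicely the pieces line up on screen depends on the browser and font, same as the other unusual characters above.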
There are also the whole set of accent marks and all, but those will be added “on another day” ;-)