Typography
Section titles and ordinals
Section ordinals in the body text are set in Roman numerals.
Section ordinals in a file’s <title> element are set in Arabic numerals.
Section titles are titlecased according to the output of se titlecase. Section titles are not all-caps or small-caps.
Section titles do not have trailing periods.
Chapter titles omit the word Chapter, unless the word used is a stylistic choice for prose style purposes. Chapters with unique identifiers (i.e. not Chapter, but something unique to the style of the book, like Book or Stave) do include that unique identifier in the title.
In special cases it may be desirable to retain Chapter for clarity. For example, Frankenstein has “Chapter” in titles to differentiate between the “Letter” sections.
Italics
Using both italics and quotes (outside of the context of quoted dialog) is usually not necessary. Either one or the other is used, with rare exceptions.
Words and phrases that require emphasis are italicized with the <em> element.
Strong emphasis, like shouting, may be set in small caps with the <strong> element.
When a short phrase within a longer clause is italicized, trailing punctuation that may belong to the containing clause is not italicized.
When an entire clause is italicized, trailing punctuation is italicized, unless that trailing punctuation is a comma at the end of dialog.
Words written to be read as sounds are italicized with <i>.
Italicizing individual letters
Individual letters that are used in context as a grapheme are italicized with an <i epub:type="z3998:grapheme"> element. They are typically lowercased and not followed by periods.
Individual letters that are not graphemes (for example letters that might be referring to names, the shapes of the letters themselves, or concepts) are not italicized.
The ordinal nth is set with an italicized n, without a hyphen.
Italicizing non-English words and phrases
Non-English words and phrases that are not in Merriam-Webster are italicized, unless they are in a non-Roman script like Chinese or Japanese.
Non-English words that are proper names, or are in proper names, are not italicized, unless the name itself would be italicized according to the rules for italicizing or quoting names and titles. Such words are wrapped in a <span xml:lang="LANGUAGE"> element, to assist screen readers with pronunciation.
If certain non-English words are used so frequently in the text that italicizing them at each instance would be distracting to the reader, then only the first instance is italicized. Subsequent instances are wrapped in a <span xml:lang="LANGUAGE"> element.
Words and phrases that are non-English in origin, but that can now be found in Merriam-Webster, are not italicized.
Inline-level italics are set using the <i> element with an xml:lang attribute corresponding to the correct IETF language tag.
Block-level italics are set using an xml:lang attribute on the closest encompassing block element, with the style of font-style: italic.
In this example, note the additional namespace declaration, and that we target descendants of the <body> element; otherwise, the entire <body> element would receive italics!
Words that are in a non-English “alien” language (i.e. one that is made up, like in a science fiction or fantasy work) are italicized and given an IETF language tag in a custom namespace. Custom namespaces consist of x-TAG, where TAG is a custom descriptor of 8 characters or less.
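As a minimal sketch of the x-TAG rule above, the following Python builds a custom language tag and wraps inline text in an <i> element. The function names, the choice to lowercase the descriptor, and the restriction to ASCII letters and digits are assumptions made for illustration; they are not part of this manual.

```python
import re
from html import escape


def private_use_tag(descriptor: str) -> str:
    """Build an x-TAG language tag from a descriptor of 1 to 8 characters.

    Restricting the descriptor to ASCII letters and digits, and lowercasing
    it, are assumptions of this sketch.
    """
    if not re.fullmatch(r"[A-Za-z0-9]{1,8}", descriptor):
        raise ValueError("descriptor must be 1 to 8 ASCII letters or digits")
    return "x-" + descriptor.lower()


def italicize_foreign(text: str, lang_tag: str) -> str:
    """Wrap inline text in an <i> element carrying an xml:lang attribute."""
    return f'<i xml:lang="{escape(lang_tag)}">{escape(text, quote=False)}</i>'
```

For instance, italicize_foreign("Dolm", private_use_tag("arcturan")) yields an <i> element whose xml:lang attribute is x-arcturan.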
Italicizing or quoting newly-used English words
When introducing new terms, non-English or technical terms are italicized, but terms composed of common English are set in quotation marks.
English neologisms in works where a special vocabulary is a regular part of the narrative are not italicized. For example science fiction works may necessarily contain made-up English technology words, and those are not italicized.
Italics in names and titles
Place names, like pubs, bars, or buildings, are not quoted.
The names of publications, music, and art that can stand alone are italicized; additionally, the names of transport vessels are italicized. These include, but are not limited to:
Periodicals like magazines, newspapers, and journals.
Publications like books, novels, plays, and pamphlets, except “holy texts,” like the Bible or books within the Bible.
Long poems and ballads, like the Iliad, that are book-length.
Long musical compositions or audio, like operas, music albums, or radio shows.
Long visual art, like films or a TV show series.
Visual art, like paintings or sculptures.
Transport vessels, like ships.
The names of short publications, music, or art, that cannot stand alone and are typically part of a larger collection or work, are quoted. These include, but are not limited to:
Short musical compositions or audio, like pop songs, arias, or an episode in a radio series.
Short prose like novellas, short stories, or short (i.e. not epic) poems.
Chapter titles in a prose work.
Essays or individual articles in a newspaper or journal.
Short visual art, like short films or episodes in a TV series.
Examples
Taxonomy
Binomial names (generic, specific, and subspecific) are italicized with an <i> element having the z3998:taxonomy semantic inflection.
Family, order, class, phylum or division, and kingdom names are capitalized but not italicized.
If a taxonomic name is the same as the common name, it is not italicized.
The second part of the binomial name follows the capitalization style of the source text. Modern usage requires lowercase, but older texts may set it in uppercase.
Capitalization
In general, capitalization follows modern English style. Some very old works frequently capitalize nouns that today are no longer capitalized. These archaic capitalizations are removed, unless doing so would change the meaning of the work.
Titlecasing, or the capitalization of titles, follows the formula used in the se titlecase tool.
Text in all caps is almost never correct typography. Instead, such text is changed to the correct case and surrounded with a semantically-meaningful element like <em> (for emphasis), <strong> (for strong emphasis, like shouting) or <b> (for unsemantic formatting required by the text). <strong> and <b> are styled in small-caps by default in Standard Ebooks.
When something is addressed as an apostrophe, O is capitalized.
Names followed by a generational suffix, like Junior or Senior, have the suffix capitalized if the suffix is part of the person’s name.
Occasionally, junior or senior may be used to refer to a younger or elder person having the same last name, but not necessarily the same first name. In these cases, the suffix is lowercased, as it is not part of their name but rather describes their generational relation.
Indentation
Paragraphs that directly follow another paragraph are indented by 1em.
The first line of body text in a section, or any text following a visible break in text flow (like a header, a scene break, a figure, etc.), is not indented, with the exception of block quotations.
Body text following a block quotation is indented only if the text begins a new semantic paragraph. Otherwise, if the body text following a block quotation is semantically part of the paragraph preceding the block quotation, it is not indented. Such non-indented paragraphs have
class="continued"
, which removes the default indentation.
Headers
Titles or subtitles that are entirely non-English-language are not italicized. However, they do have an
xml:lang
attribute to assist screen readers in pronunciation. Titles or subtitles that are in English but contain non-English components have those components italicized according to the general rules for italics.
Chapter headers
Epigraphs in chapters have the quote source set in small caps, without a leading em-dash and without a trailing period.
Ligatures
Ligatures are two or more letters that are combined into a single letter, usually for stylistic purposes. In general they are not used in modern English spelling, and are replaced with their expanded characters.
Words in non-English languages like French may use ligatures to differentiate words or pronunciations. In these cases, ligatures are retained.
Punctuation and spacing
Sentences are single-spaced.
Periods and commas are placed within quotation marks; i.e. American-style punctuation is used, not logical (AKA “British” or “new”) style.
Ampersands are preceded by a no-break space (U+00A0).
Some older works include spaces in common contractions; these spaces are removed.
Quotation marks
“Curly” or typographer’s quotes, both single and double, are always used instead of straight quotes. This is known as “American-style” quotation, which is different from British-style quotation which is also commonly found in both older and modern books.
Quotation marks that are directly side-by-side are separated by a hair space (U+200A) character.
Words with missing letters represent the missing letters with a right single quotation mark (’ or U+2019) character to indicate elision.
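The hair space rule for side-by-side quotation marks can be sketched in Python as follows. The function name is hypothetical, and the sketch treats every pair of adjacent curly quotes the same way, including a right single quotation mark used for elision.

```python
import re

HAIR_SPACE = "\u200a"
# Left and right single quotes, then left and right double quotes.
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"


def separate_adjacent_quotes(text: str) -> str:
    """Insert a hair space between any two directly adjacent curly quotes."""
    pattern = f"([{CURLY_QUOTES}])(?=[{CURLY_QUOTES}])"
    return re.sub(pattern, lambda match: match.group(1) + HAIR_SPACE, text)
```

Running the function twice gives the same result as running it once, because the quotes are no longer adjacent after the first pass.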
Ellipses
The ellipsis glyph (… or U+2026) is used for ellipses, instead of consecutive or spaced periods.
When ellipses are used as suspension points (for example, to indicate dialog that pauses or trails off), the ellipses are not preceded by a comma.
Ellipses used to indicate missing words in a quotation require keeping surrounding punctuation, including commas, as that punctuation is in the original quotation.
A hair space (U+200A) glyph is located before all ellipses that are not directly preceded by punctuation, or that are directly preceded by an em-dash or a two- or three-em-dash.
A regular space is located after all ellipses that are not followed by punctuation.
A hair space (U+200A) glyph is located between an ellipsis and any punctuation that follows directly after the ellipsis, unless that punctuation is a quotation mark, in which case there is no space at all between the ellipsis and the quotation mark.
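A partial sketch of the ellipsis rules in Python, assuming plain-text input: it converts three consecutive or spaced periods into the ellipsis glyph and places a hair space before an ellipsis that follows a word character. The function name is hypothetical, and the rules about em-dashes, trailing spaces, and following punctuation are deliberately left out.

```python
import re

HAIR_SPACE = "\u200a"
ELLIPSIS = "\u2026"


def normalize_ellipses(text: str) -> str:
    """Apply two of the ellipsis rules; the others are not handled here."""
    # Three periods, consecutive or separated by single spaces, become one glyph.
    text = re.sub(r"\.\s?\.\s?\.", ELLIPSIS, text)
    # After a word character, any whitespace before the ellipsis is replaced
    # by exactly one hair space (or one is inserted if there was none).
    return re.sub(r"(?<=\w)\s*" + ELLIPSIS, HAIR_SPACE + ELLIPSIS, text)
```

An ellipsis that directly follows punctuation, such as an opening quotation mark, is left without a preceding hair space, which matches the rule above.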
Dashes
There are many kinds of dashes, and the run-of-the-mill hyphen is often not the correct dash to use. In particular, hyphens are not used for things like date ranges, phone numbers, or negative numbers.
Dashes of all types do not have white space around them.
Figure dashes (‒ or U+2012) are used to indicate a dash in numbers that aren’t a range, like phone numbers.
Hyphens (- or U+002D) are used to join words, including double-barrel names, or to separate syllables in a word.
Minus sign glyphs (− or U+2212) are used to indicate negative numbers, and are used in mathematical equations instead of hyphens to represent the “subtraction” operator.
En-dashes (– or U+2013) are used to indicate a numerical or date range; to indicate a relationship where two concepts are connected by the word “to,” for example a distance between locations or a range between numbers; or to indicate a connection in location between two places.
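The dash choices above can be illustrated with a few small Python helpers. The helper names are hypothetical, and the sketch assumes the caller already knows which kind of value is being formatted.

```python
FIGURE_DASH = "\u2012"
MINUS_SIGN = "\u2212"
EN_DASH = "\u2013"


def number_range(start: int, end: int) -> str:
    """Join the two ends of a numerical or date range with an en-dash."""
    return f"{start}{EN_DASH}{end}"


def signed_number(value: int) -> str:
    """Render a negative integer with the minus sign glyph, not a hyphen."""
    return f"{MINUS_SIGN}{abs(value)}" if value < 0 else str(value)


def phone_number(*groups: str) -> str:
    """Join digit groups that are not a range with figure dashes."""
    return FIGURE_DASH.join(groups)
```

None of the helpers adds white space around the dash, in line with the rule that dashes of all types are set closed.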
Em-dashes
Em-dashes (— or U+2014) are typically used to offset parenthetical phrases.
Em-dashes are preceded by the invisible word joiner glyph (U+2060).
Interruption in dialog is set by a single em-dash, not two em-dashes or a two-em-dash.
Partially-obscured words
Em-dashes are used for partially-obscured years.
A regular hyphen is used in partially obscured years where only the last number is obscured.
A two-em-dash (⸺ or U+2E3A) preceded by a word joiner glyph (U+2060) is used in partially obscured words.
A three-em-dash (⸻ or U+2E3B) is used for completely obscured words.
Numbers, measurements, and math
Coordinates are set with the prime (′ or U+2032) or double prime (″ or U+2033) glyphs, not single or double quotes.
Ordinals for Arabic numbers are as follows: st, nd, rd, th.
Numbers in a non-mathematical context are spelled out if they are less than or equal to 100. Numbers over 100 are set with digits.
If a series of numbers is close together in a sentence, and one would be spelled out but another wouldn’t, spell out all numbers within that context to maintain visual consistency.
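The threshold of 100 can be sketched in Python for non-negative integers. The function names and the spelling “one hundred” (rather than “a hundred”) are assumptions of this sketch, and the consistency rule for a series of nearby numbers is not covered.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_out(number: int) -> str:
    """Spell out an integer from 0 to 100 inclusive."""
    if not 0 <= number <= 100:
        raise ValueError("only 0 through 100 are spelled out")
    if number == 100:
        return "one hundred"
    if number < 20:
        return ONES[number]
    tens, ones = divmod(number, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def set_number(number: int) -> str:
    """Spell out 0 through 100; set larger non-negative integers in digits."""
    return spell_out(number) if 0 <= number <= 100 else str(number)
```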
Roman numerals
Roman numerals are not followed by trailing periods, except for grammatical reasons.
Roman numerals are set using ASCII, not the Unicode Roman numeral glyphs.
Roman numerals are not followed by ordinal indicators.
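One way to replace Unicode Roman numeral glyphs with ASCII letters is compatibility normalization, sketched below in Python. The function name is hypothetical, and normalization is applied only to the code points U+2160 through U+217F so that other characters, such as fraction glyphs, are left alone.

```python
import unicodedata


def ascii_roman_numerals(text: str) -> str:
    """Expand Unicode Roman numeral glyphs into their ASCII letters."""
    return "".join(
        # NFKC maps each glyph in this range to its ASCII letter sequence.
        unicodedata.normalize("NFKC", char) if "\u2160" <= char <= "\u217f" else char
        for char in text
    )
```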
Fractions
Fractions are set in their appropriate Unicode glyph, if a glyph is available; for example ½, ¼, and ¾ (U+00BC–U+00BE), and the glyphs in the range U+2150–U+2189.
If a fraction doesn’t have a corresponding Unicode glyph, it is composed using the fraction slash Unicode glyph (⁄ or U+2044) and superscript/subscript Unicode numbers. See this Wikipedia entry for more details.
There is no space between a whole number and its fraction.
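The fraction rules can be sketched in Python as a lookup with a fallback. The function names are hypothetical, and the lookup table only covers the glyphs in U+00BC–U+00BE and U+2150–U+215E, which is a subset of the ranges named above.

```python
import unicodedata

FRACTION_SLASH = "\u2044"
SUPERSCRIPT = str.maketrans(
    "0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
SUBSCRIPT = str.maketrans(
    "0123456789", "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089")


def _precomposed_fractions() -> dict:
    """Map (numerator, denominator) to a single-glyph fraction."""
    table = {}
    for code_point in [*range(0x00BC, 0x00BF), *range(0x2150, 0x215F)]:
        # NFKC expands a fraction glyph into digits around a fraction slash.
        expanded = unicodedata.normalize("NFKC", chr(code_point))
        numerator, denominator = expanded.split(FRACTION_SLASH)
        table[(int(numerator), int(denominator))] = chr(code_point)
    return table


PRECOMPOSED = _precomposed_fractions()


def fraction(numerator: int, denominator: int) -> str:
    """Return a single glyph if one is known, else a composed fraction."""
    glyph = PRECOMPOSED.get((numerator, denominator))
    if glyph is not None:
        return glyph
    return (str(numerator).translate(SUPERSCRIPT) + FRACTION_SLASH
            + str(denominator).translate(SUBSCRIPT))


def mixed_number(whole: int, numerator: int, denominator: int) -> str:
    """Set a whole number and its fraction with no space between them."""
    return f"{whole}{fraction(numerator, denominator)}"
```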
Measurements
Dimension measurements are set using the Unicode multiplication glyph (× or U+00D7), not the ASCII letter x or X.
Feet and inches in shorthand are set using the prime (′ or U+2032) or double prime (″ or U+2033) glyphs (not single or double quotes), with a no-break space (U+00A0) separating consecutive feet and inch measurements.
When forming a compound of a number and unit of measurement in which the measurement is abbreviated, the number and unit of measurement are separated with a no-break space (U+00A0), not a dash. For exceptions in money, see 8.8.7.3.1.
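A short Python sketch of the feet-and-inches shorthand and the number-plus-unit compound described above. Both helper names are hypothetical.

```python
NO_BREAK_SPACE = "\u00a0"
PRIME = "\u2032"
DOUBLE_PRIME = "\u2033"


def feet_inches(feet: int, inches: int) -> str:
    """Set feet and inches with primes, separated by a no-break space."""
    return f"{feet}{PRIME}{NO_BREAK_SPACE}{inches}{DOUBLE_PRIME}"


def with_unit(number: int, abbreviated_unit: str) -> str:
    """Join a number and an abbreviated unit with a no-break space."""
    return f"{number}{NO_BREAK_SPACE}{abbreviated_unit}"
```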
Punctuation in abbreviated measurements
See Initials and abbreviations for general abbreviation rules that also apply to measurements.
Abbreviated SI units are set in lowercase without periods. They are not initialisms.
Abbreviated English, Imperial, or US customary units that are one word are set in lowercase with a trailing period. They are not initialisms.
The one exception is G (i.e. G-force), which is an initialism that is set without a period.
Abbreviated English, Imperial, or US customary units that are more than one word (like hp for horse power or mph for miles per hour) are set in lowercase without periods. They are not initialisms.
Math
In works that are not math-oriented or that don’t have a significant amount of mathematical equations, equations are set using regular HTML and Unicode.
Operators and operands in mathematical equations are separated by a space.
Operators like subtraction (− or U+2212), multiplication (× or U+00D7), and equivalence (≡ or U+2261) are set using their corresponding Unicode glyphs, not a hyphen or x. Almost all mathematical operators have a corresponding special Unicode glyph.
In works that are math-oriented or that have a significant amount of math, all variables, equations, and other mathematical objects are set using MathML.
When MathML is used in a file, the m namespace is declared at the top of the file and used for all subsequent MathML code.
This namespace is declared and used even if there is just a single MathML equation in a file.
When possible, Content MathML is used over Presentational MathML. (This may not always be possible depending on the complexity of the work.)
Each <m:math> element has an alttext attribute.
The alttext attribute describes the contents of the element in plain-text Unicode according to the rules in this specification.
Operators in the alttext attribute are surrounded by a single space.
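The namespace and alttext rules above can be checked mechanically. The Python sketch below parses a small, made-up XHTML fragment that declares the m prefix on the root element and lists any <m:math> element lacking an alttext attribute. The sample equation and the function name are illustrative assumptions, and the sample uses Presentational MathML only to keep it short.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# The m prefix is bound to the MathML namespace on the root element.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:m="http://www.w3.org/1998/Math/MathML">
<body>
<p><m:math alttext="x + 1 = y"><m:mi>x</m:mi><m:mo>+</m:mo><m:mn>1</m:mn><m:mo>=</m:mo><m:mi>y</m:mi></m:math></p>
</body>
</html>"""


def math_without_alttext(xhtml: str) -> list:
    """Return the MathML math elements that have no alttext attribute."""
    root = ET.fromstring(xhtml)
    return [
        element
        for element in root.iter(f"{{{MATHML_NS}}}math")
        if not element.get("alttext")
    ]
```

Note that the operators in the sample alttext value are each surrounded by a single space.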
When using Presentational MathML, <m:mrow> is used to group subexpressions, but only when necessary. Many elements in MathML, like <m:math> and <m:mtd>, imply <m:mrow>, and redundant elements are not desirable. See this section of the MathML spec for more details.
If a Presentational MathML expression contains a function, the invisible Unicode function application glyph (U+2061) is used as an operator between the function name and its operand. This element looks exactly like the following, including the comment for readability:
<m:mo><!--hidden U+2061 function application--></m:mo>
(Note that the preceding element contains an invisible Unicode character! It can be revealed with the unicode-names tool.)
Expressions grouped by parentheses or brackets are wrapped in an <m:mrow> element, and fence characters are set using the <m:mo fence="true"> element. Separators are set using the <m:mo separator="true"> element.
<m:mfenced>, which used to imply both fences and separators, is deprecated in the MathML spec and thus is not used.
If a MathML variable includes an overline, it is set by combining the variable’s normal Unicode glyph and the Unicode overline glyph (‾ or U+203E) in an <m:mover> element. However, in the alttext attribute, the Unicode overline combining mark (U+0305) is used to represent the overline in Unicode.
Money
Typographically-correct symbols are used for currency symbols.
Currency symbols are not abbreviations.
£sd shorthand
£sd shorthand is a way of denoting pre-decimal currencies (pounds, shillings, and pence) common in England and other parts of the world until the 1970s.
There is no white space between a number and an £sd currency symbol.
Letters used in £sd shorthand are wrapped in
<abbr>
elements.
Latinisms
Latinisms (except sic) that can be found in a modern dictionary are not italicized. Examples include e.g., i.e., ad hoc, viz., ibid., and etc. The exception is sic, which is always italicized.
Whole passages of Latin language and Latinisms that aren’t found in a modern dictionary are italicized.
&c. is not used, and is replaced with etc.
For Ibid., see Endnotes.
Latinisms that are abbreviations are set in lowercase with periods between words and no spaces between them, except BC, AD, BCE, and CE, which are set without periods, in small caps, and wrapped with <abbr class="era">.
Initials and abbreviations
Acronyms (terms made up of initials and pronounced as one word, like NASA, SCUBA, or NATO) are set in small caps, without periods, and are wrapped in an <abbr class="acronym"> element with corresponding CSS.
Initialisms (terms made up of initials in which each initial is pronounced separately, like M.P., P.S., or U.S.S.R.) are set with periods and without spaces (with some exceptions that follow) and are wrapped in an <abbr class="initialism"> element.
When an abbreviation that is not an acronym contains a terminal period, its <abbr> element has the additional eoc class (End of Clause) if the terminal period is also the last period in the clause. Such sentences do not have two consecutive periods.
Initials of people’s names are each separated by periods and spaces. The group of initials is wrapped in an <abbr class="name"> element.
Academic degrees are wrapped in an <abbr class="degree"> element. Degrees that consist of initials are set with a period between each initial. Degrees that consist of initials followed by abbreviated words are set with a hair space before the word.
Some degrees are exceptions: LL.D. does not have a period in LL, because it indicates the plural Legum.
Postal codes and abbreviated US states are set in all caps, without periods or spaces, and are wrapped in an <abbr class="postal"> element.
Abbreviations that are abbreviations of a single word, and that are not acronyms or initialisms (like Mr., Mrs., or lbs.) are set with <abbr>.
Abbreviations ending in a lowercase letter are set without spaces between the letters, and have a trailing period.
Abbreviations without lowercase letters are set without spaces and without a trailing period.
Abbreviations that describe the next word, like Mr., Mrs., Mt., and St., are set with a no-break space (U+00A0) between the abbreviation and its target.
Compass points are separated by periods and spaces. The group of points is wrapped in an <abbr class="compass"> element.
Exceptions that are not abbreviations
The following are not abbreviations, and are set without periods or spaces.
OK
SOS
The following are initialisms, but are set without periods or spaces:
TV, i.e. television.
AC and DC, when referring to electrical current.
G, when used in the sense of G-force. Also see 8.8.5.4.2.
Stock ticker symbols.
The following are abbreviations, but are not initialisms. Unlike almost all other abbreviations, they are in all caps and only have a period at the end.
MS. (manuscript)
MSS. (manuscripts)
M. (Monsieur)
MM. (Messieurs)
A.B.C., when used in the sense of the alphabet, is not an abbreviation, and is set with periods between the letters. But other uses, like A.B.C. shops, are abbreviations. (The abbreviation in A.B.C. shop stands for “Aerated Bread Company.”)
Other exceptions
The abbreviations 1D, 2D, 3D, and 4D, meaning first, second, third, and fourth dimensions, are abbreviations but do not have a trailing period.
The words recto and verso are sometimes abbreviated with an initial and a superscript o. They are regular abbreviations, set without periods, and the o is superscripted with <sup>.
Times
Times in a.m. and p.m. format are set in lowercase, with periods, and without spaces.
a.m. and p.m. are wrapped in an <abbr class="time"> element.
Times as digits
Digits in times are separated by a colon, not a period or comma.
Times written in digits followed by a.m. or p.m. are set with a no-break space (U+00A0) between the digit and a.m. or p.m.
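The rules for times as digits can be combined into one small Python sketch that emits the markup for a clock time. The function name and the accepted input spellings (such as "PM" or "a.m.") are assumptions; the eoc class for a time that ends a clause is not handled.

```python
NO_BREAK_SPACE = "\u00a0"


def clock_time(hour: int, minute: int, period: str) -> str:
    """Format a time in digits followed by a wrapped a.m. or p.m."""
    letters = period.replace(".", "").replace(" ", "").lower()
    if letters not in ("am", "pm"):
        raise ValueError("period must be a form of a.m. or p.m.")
    # Lowercase, with periods, and without spaces.
    abbreviation = f"{letters[0]}.m."
    return (f"{hour}:{minute:02d}{NO_BREAK_SPACE}"
            f'<abbr class="time">{abbreviation}</abbr>')
```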
Times as words
Words in a spelled-out time are separated by spaces, unless they appear before a noun, where they are separated by a hyphen.
Times written in words followed by a.m. or p.m. are set with a regular space between the time and a.m. or p.m.
Military times that are spelled out (for example, in dialog) are set with dashes. Leading zeros are spelled out as oh.
Chemicals and compounds
Molecular compounds are set in Roman, without spaces, and wrapped in an <abbr class="compound"> element.
Elements in a molecular compound are capitalized according to their listing in the periodic table.
Amounts of an element in a molecular compound are set in subscript with a <sub> element.
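As an illustration of the compound markup, the Python sketch below takes a plain formula such as H2O, removes spaces, subscripts the digit runs, and wraps the result. The function name is hypothetical, and the sketch assumes the element symbols in the input are already correctly capitalized.

```python
import re


def compound_markup(formula: str) -> str:
    """Wrap a plain-text molecular formula in compound markup."""
    body = re.sub(
        r"\d+",
        lambda match: f"<sub>{match.group(0)}</sub>",
        formula.replace(" ", ""),
    )
    return f'<abbr class="compound">{body}</abbr>'
```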
Temperatures
The minus sign glyph (− or U+2212), not the hyphen glyph, is used to indicate negative numbers.
Either the degree glyph (° or U+00B0) or the word degrees is acceptable. Works that use both are normalized to use the dominant method.
Abbreviated units of temperature
Units of temperature measurement, like Fahrenheit or Celsius, may be abbreviated to F or C.
Units of temperature measurement do not have trailing periods.
If an abbreviated unit of temperature measurement is preceded by a number, the unit of measurement is first preceded by a hair space (U+200A).
Abbreviated units of measurement are set in small caps.
Abbreviated units of measurement are wrapped in an <abbr class="temperature"> element.
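A minimal Python sketch of the temperature rules, assuming the degree glyph is used and the unit is abbreviated. The function name is hypothetical; small caps styling is left to CSS and is not shown.

```python
HAIR_SPACE = "\u200a"
MINUS_SIGN = "\u2212"
DEGREE = "\u00b0"


def temperature_markup(value: int, unit: str) -> str:
    """Format a temperature with an abbreviated, wrapped unit."""
    sign = MINUS_SIGN if value < 0 else ""
    return (f"{sign}{abs(value)}{DEGREE}{HAIR_SPACE}"
            f'<abbr class="temperature">{unit.upper()}</abbr>')
```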
Scansion
Scansion is the representation of the metrical stresses in lines of verse.
× (U+00D7) indicates an unstressed syllable and / (U+002F) indicates a stressed syllable. They are separated from each other with no-break spaces (U+00A0).
Lines of poetry listed on a single line (like in a quotation) are separated by a space, then a forward slash, then a space. Capitalization is preserved for each line.
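Both scansion rules can be sketched in a few lines of Python. The function names and the ASCII input convention (x for unstressed, / for stressed) are assumptions made for illustration.

```python
NO_BREAK_SPACE = "\u00a0"
UNSTRESSED = "\u00d7"
STRESSED = "/"


def format_scansion(pattern: str) -> str:
    """Convert a pattern like 'x/x/' into spaced scansion marks."""
    marks = []
    for char in pattern:
        if char in ("x", "X", UNSTRESSED):
            marks.append(UNSTRESSED)
        elif char == STRESSED:
            marks.append(STRESSED)
        else:
            raise ValueError(f"unexpected scansion mark: {char!r}")
    return NO_BREAK_SPACE.join(marks)


def join_verse_lines(lines: list) -> str:
    """Set several lines of verse on one line, keeping capitalization."""
    return " / ".join(lines)
```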
Legal cases and terms
Legal cases are set in italics.
Either versus or v. is acceptable in the name of a legal case; if using v, a period follows it, and the abbreviation is wrapped in an <abbr> element.
Morse code
Any Morse code that appears in a book is changed to fit Standard Ebooks’ format.
American Morse Code
Middle dot glyphs (· or U+00B7) are used for the short mark or dot.
En dashes (– or U+2013) are used for the longer mark or short dash.
Em dashes (— or U+2014) are used for the long dash (the letter L).
If two en dashes are placed next to each other, a hair space (U+200A) is placed between them to keep the glyphs from merging into a longer dash.
Only in American Morse Code, there are internal gaps used between glyphs in the letters C, O, R, or Z. No-break spaces (U+00A0) are used for these gaps.
En spaces (U+2002) are used between letters.
Em spaces (U+2003) are used between words.
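The glyph and spacing rules for Morse code can be sketched in Python as a converter from an ASCII transcription. The input convention (periods and hyphens for marks, single spaces between letters, " / " between words) and the function name are assumptions of this sketch; the long dash for L and the internal gaps in C, O, R, and Z are not handled.

```python
import re

MIDDLE_DOT = "\u00b7"
EN_DASH = "\u2013"
HAIR_SPACE = "\u200a"
EN_SPACE = "\u2002"
EM_SPACE = "\u2003"


def format_morse(ascii_morse: str) -> str:
    """Convert ASCII dots and dashes into the glyphs described above."""
    words = []
    for word in ascii_morse.strip().split(" / "):
        letters = []
        for letter in word.split():
            marks = letter.replace(".", MIDDLE_DOT).replace("-", EN_DASH)
            # Keep adjacent en dashes from merging into a longer dash.
            marks = re.sub(f"{EN_DASH}(?={EN_DASH})", EN_DASH + HAIR_SPACE, marks)
            letters.append(marks)
        # En spaces separate letters; em spaces separate words.
        words.append(EN_SPACE.join(letters))
    return EM_SPACE.join(words)
```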
Citations
Citations are wrapped in a <cite> element.
Citations that are the source of a quote are preceded by a space and an em dash, within the <cite> element.
Citations within a <blockquote> element have the <cite> element as the last direct child of the <blockquote> parent.