Section titles and ordinals
Section ordinals in the body text are set in Roman numerals.
Section ordinals in a file’s
<title> element are set in Arabic numerals.
Section titles are titlecased according to the output of
se titlecase. Section titles are not all-caps or small-caps.
Section titles do not have trailing periods.
Chapter titles omit the word
Chapter, unless the word used is a stylistic choice for prose style purposes. Chapters with unique identifiers (i.e. not
Chapter, but something unique to the style of the book, like
Stave) do include that unique identifier in the title.
In special cases it may be desirable to retain
Chapter for clarity. For example, Frankenstein has “Chapter” in titles to differentiate chapters from the “Letter” sections.
Using both italics and quotes (outside of the context of quoted dialog) is usually not necessary. Either one or the other is used, with rare exceptions.
Words and phrases that require emphasis are italicized with the <em> element.
Strong emphasis, like shouting, may be set in small caps with the <strong> element.
When a short phrase within a longer clause is italicized, trailing punctuation that may belong to the containing clause is not italicized.
When an entire clause is italicized, trailing punctuation is italicized, unless that trailing punctuation is a comma at the end of dialog.
Words written to be read as sounds are italicized with
Italicizing individual letters
Individual letters that are used in context as graphemes are italicized with an
<i epub:type="z3998:grapheme"> element. They are typically lowercased and not followed by periods.
Individual letters that are not graphemes (for example letters that might be referring to names, the shapes of the letters themselves, or concepts) are not italicized.
nth is set with an italicized
n, without a hyphen.
Italicizing non-English words and phrases
Non-English words and phrases that are not in Merriam-Webster are italicized, unless they are in a non-Roman script like Chinese or Japanese.
Non-English words that are proper names, or are in proper names, are not italicized, unless the name itself would be italicized according to the rules for italicizing or quoting names and titles. Such words are wrapped in a
<span xml:lang="LANGUAGE"> element, to assist screen readers with pronunciation.
If certain non-English words are used so frequently in the text that italicizing them at each instance would be distracting to the reader, then only the first instance is italicized. Subsequent instances are wrapped in a
Words and phrases that are non-English in origin, but that can now be found in Merriam-Webster, are not italicized.
Inline-level italics are set using the
<i> element with an
xml:lang attribute corresponding to the correct IETF language tag.
Block-level italics are set using an
xml:lang attribute on the closest encompassing block element, with the style of
In this example, note the additional namespace declaration, and that we target descendants of the
<body> element; otherwise, the entire
<body> element would receive italics!
Words that are in a non-English “alien” language (i.e. one that is made up, like in a science fiction or fantasy work) are italicized and given an IETF language tag in a custom namespace. Custom namespaces consist of x-TAG, where
TAG is a custom descriptor of 8 characters or less.
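As a rough illustration of the custom language tag rule, the following Python sketch builds and validates such a tag. It assumes the custom namespace takes the private-use form x-TAG defined for IETF language tags, and that the descriptor is restricted to ASCII letters and digits. The function name alien_lang_tag is hypothetical and not part of any Standard Ebooks tool.

```python
import re


def alien_lang_tag(descriptor: str) -> str:
    """Return a private-use language tag such as 'x-martian'.

    Assumption: the descriptor is 1 to 8 ASCII letters or digits,
    mirroring the "8 characters or less" rule in the text.
    """
    tag = descriptor.lower()
    if not re.fullmatch(r"[a-z0-9]{1,8}", tag):
        raise ValueError("descriptor must be 1-8 ASCII letters or digits")
    return f"x-{tag}"
```

The returned string would then be used as the value of the xml:lang attribute on the italicized element.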
Italicizing or quoting newly-used English words
When introducing new terms, non-English or technical terms are italicized, but terms composed of common English are set in quotation marks.
English neologisms in works where a special vocabulary is a regular part of the narrative are not italicized. For example, science fiction works may necessarily contain made-up English technology words, and those are not italicized.
Italics in names and titles
Place names, like pubs, bars, or buildings, are not quoted.
The names of publications, music, and art that can stand alone are italicized; additionally, the names of transport vessels are italicized. These include, but are not limited to:
Periodicals like magazines, newspapers, and journals.
Publications like books, novels, plays, and pamphlets, except “holy texts,” like the Bible or books within the Bible.
Long poems and ballads, like the Iliad, that are book-length.
Long musical compositions or audio, like operas, music albums, or radio shows.
Long visual art, like films or a TV show series.
Visual art, like paintings or sculptures.
Transport vessels, like ships.
The names of short publications, music, or art, that cannot stand alone and are typically part of a larger collection or work, are quoted. These include, but are not limited to:
Short musical compositions or audio, like pop songs, arias, or an episode in a radio series.
Short prose like novellas, short stories, or short (i.e. not epic) poems.
Chapter titles in a prose work.
Essays or individual articles in a newspaper or journal.
Short visual art, like short films or episodes in a TV series.
Binomial names (generic, specific, and subspecific) are italicized with a
<i> element having the
Family, order, class, phylum or division, and kingdom names are capitalized but not italicized.
If a taxonomic name is the same as the common name, it is not italicized.
The second part of the binomial name follows the capitalization style of the source text. Modern usage requires lowercase, but older texts may set it in uppercase.
In general, capitalization follows modern English style. Some very old works frequently capitalize nouns that today are no longer capitalized. These archaic capitalizations are removed, unless doing so would change the meaning of the work.
Titlecasing, or the capitalization of titles, follows the formula used in the
Text in all caps is almost never correct typography. Instead, such text is changed to the correct case and surrounded with a semantically-meaningful element like
<strong> (for strong emphasis, like shouting) or
<b> (for unsemantic formatting required by the text).
<b> elements are styled in small-caps by default in Standard Ebooks.
When something is addressed as an apostrophe,
Paragraphs that directly follow another paragraph are indented by 1em.
The first line of body text in a section, or any text following a visible break in text flow (like a header, a scene break, a figure, a block quotation, etc.), is not indented.
For example: in a block quotation, there is a margin before the quotation and after the quotation. Thus, the first line of the quotation is not indented, and the first line of body text after the block quotation is also not indented.
Epigraphs in chapters have the quote source set in small caps, without a leading em-dash and without a trailing period.
Ligatures are two or more letters that are combined into a single letter, usually for stylistic purposes. In general they are not used, and are replaced with their respective characters.
Punctuation and spacing
Sentences are single-spaced.
Periods and commas are placed within quotation marks; i.e. American-style punctuation is used, not logical (AKA “British” or “new”) style.
Ampersands in names of things, like firms, are surrounded by no-break spaces (U+00A0).
Some older works include spaces in common contractions; these spaces are removed.
“Curly” or typographer’s quotes, both single and double, are always used instead of straight quotes. This is known as “American-style” quotation, which differs from the British-style quotation also commonly found in both older and modern books.
Quotation marks that are directly side-by-side are separated by a hair space (U+200A) character.
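The rule for adjacent quotation marks can be sketched in a few lines of Python. This is only an illustration: the function name separate_adjacent_quotes is hypothetical, and it treats every curly single quote as a quotation mark, so an apostrophe that sits next to a quotation mark would also receive a hair space.

```python
import re

HAIR_SPACE = "\u200a"
# Left and right double quotes, then left and right single quotes.
QUOTES = "\u201c\u201d\u2018\u2019"


def separate_adjacent_quotes(text: str) -> str:
    """Insert a hair space at every point where two quotation marks touch."""
    return re.sub(f"(?<=[{QUOTES}])(?=[{QUOTES}])", HAIR_SPACE, text)
```

The pattern matches the empty position between two quotation marks, so nothing in the original text is removed.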
Words with missing letters represent the missing letters with a right single quotation mark (
’ or U+2019) character to indicate elision.
The ellipses glyph (
… or U+2026) is used for ellipses, instead of consecutive or spaced periods.
When ellipses are used as suspension points (for example, to indicate dialog that pauses or trails off), the ellipses are not preceded by a comma.
Ellipses used to indicate missing words in a quotation require keeping surrounding punctuation, including commas, as that punctuation is in the original quotation.
A hair space (U+200A) glyph is located before all ellipses that are not directly preceded by punctuation, or that are directly preceded by an em-dash or a two- or three-em-dash.
A regular space is located after all ellipses that are not followed by punctuation.
A hair space (U+200A) glyph is located between an ellipses glyph and any punctuation that follows directly after the ellipses, unless that punctuation is a quotation mark, in which case there is no space at all between the ellipses and the quotation mark.
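The ellipses spacing rules above can be approximated with a handful of regular expressions. The following Python sketch is an illustration under simplifying assumptions, not a complete implementation: the function name normalize_ellipses is hypothetical, only a small set of punctuation marks is recognized, and a right single quote is always treated as a closing quotation mark.

```python
import re

ELLIPSIS = "\u2026"
HAIR_SPACE = "\u200a"


def normalize_ellipses(text: str) -> str:
    # Three consecutive or spaced periods become the single ellipses glyph.
    text = re.sub(r"\.(?: ?\.){2}", ELLIPSIS, text)
    # Drop existing white space between a word and the glyph.
    text = re.sub(r"(?<=\w)[ \u00a0\u200a]+\u2026", ELLIPSIS, text)
    # Hair space before the glyph after a word character or a long dash.
    text = re.sub(r"(?<=[\w\u2014\u2e3a\u2e3b])\u2026", HAIR_SPACE + ELLIPSIS, text)
    # No space at all before a following closing quotation mark.
    text = re.sub(r"\u2026[ \u200a]*([\u201d\u2019])", ELLIPSIS + r"\1", text)
    # Hair space before any other following punctuation.
    text = re.sub(r"\u2026[ \u200a]*([,;:!?.])", ELLIPSIS + HAIR_SPACE + r"\1", text)
    # Regular space when a word follows the glyph directly.
    text = re.sub(r"\u2026(?=\w)", ELLIPSIS + " ", text)
    return text
```

Because each step handles one rule, the order matters: the glyph is created first, then spacing before it is fixed, then spacing after it.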
There are many kinds of dashes, and the run-of-the-mill hyphen is often not the correct dash to use. In particular, hyphens are not used for things like date ranges, phone numbers, or negative numbers.
Dashes of all types do not have white space around them.
Figure dashes (
‒ or U+2012) are used to indicate a dash in numbers that aren’t a range, like phone numbers.
Hyphens (- or U+002D) are used to join words, including double-barrel names, or to separate syllables in a word.
Minus sign glyphs (
− or U+2212) are used to indicate negative numbers, and are used in mathematical equations instead of hyphens to represent the “subtraction” operator.
En-dashes (– or U+2013) are used to indicate a numerical or date range; to indicate a relationship where two concepts are connected by the word “to,” for example a distance between locations or a range between numbers; or to indicate a connection in location between two places.
Em-dashes (— or U+2014) are typically used to offset parenthetical phrases.
Em-dashes are preceded by the invisible word joiner glyph (U+2060).
Interruption in dialog is set by a single em-dash, not two em-dashes or a two-em-dash.
Em-dashes are used for partially-obscured years.
A regular hyphen is used in partially obscured years where only the last number is obscured.
A two-em-dash (
⸺ or U+2E3A) preceded by a word joiner glyph (U+2060) is used in partially obscured words.
A three-em-dash (
⸻ or U+2E3B) is used for completely obscured words.
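A few of the dash rules above lend themselves to mechanical checks. The following Python sketch covers three of them: four-digit year ranges, negative numbers, and em-dash spacing. It is a simplified illustration with a hypothetical function name, normalize_dashes. It does not attempt to detect phone numbers, obscured years, or obscured words, which need human judgment.

```python
import re

WORD_JOINER = "\u2060"
EM_DASH = "\u2014"
EN_DASH = "\u2013"
MINUS = "\u2212"


def normalize_dashes(text: str) -> str:
    # A hyphen between two four-digit years becomes an en-dash.
    text = re.sub(r"(?<=\d{4})-(?=\d{4})", EN_DASH, text)
    # A hyphen before a digit, and not after a word character, is a minus sign.
    text = re.sub(r"(?<!\w)-(?=\d)", MINUS, text)
    # Remove white space around em-dashes and ensure one preceding word joiner.
    text = re.sub(rf"\s*{WORD_JOINER}?{EM_DASH}\s*", WORD_JOINER + EM_DASH, text)
    return text
```

Ordinary hyphenated words are left alone, because the hyphen in them always follows a word character and is never followed by a four-digit year.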
Numbers, measurements, and math
Coordinates are set with the prime (
′ or U+2032) or double prime (
″ or U+2033) glyphs, not single or double quotes.
Ordinals for Arabic numbers are as follows:
Roman numerals are not followed by trailing periods, except for grammatical reasons.
Roman numerals are set using ASCII, not the Unicode Roman numeral glyphs.
Roman numerals are not followed by ordinal indicators.
Fractions are set in their appropriate Unicode glyph, if a glyph is available; for example,
¾ and the other glyphs in U+00BC–U+00BE and U+2150–U+2189.
If a fraction doesn’t have a corresponding Unicode glyph, it is composed using the fraction slash Unicode glyph (
⁄ or U+2044) and superscript/subscript Unicode numbers. See this Wikipedia entry for more details.
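The two fraction rules can be combined in one helper: use a precomposed glyph when one exists, and otherwise compose the fraction from superscript digits, the fraction slash, and subscript digits. The Python sketch below is illustrative only. The function name set_fraction is hypothetical, and the lookup table lists just a handful of the precomposed glyphs rather than the full ranges.

```python
# A partial table of precomposed fraction glyphs (assumption: extend as needed).
PRECOMPOSED = {
    (1, 4): "\u00bc", (1, 2): "\u00bd", (3, 4): "\u00be",
    (1, 3): "\u2153", (2, 3): "\u2154",
    (1, 8): "\u215b", (3, 8): "\u215c", (5, 8): "\u215d", (7, 8): "\u215e",
}
SUPERSCRIPT = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
SUBSCRIPT = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)
FRACTION_SLASH = "\u2044"


def set_fraction(numerator: int, denominator: int) -> str:
    """Return a precomposed glyph if known, else a composed fraction."""
    glyph = PRECOMPOSED.get((numerator, denominator))
    if glyph is not None:
        return glyph
    return (
        str(numerator).translate(SUPERSCRIPT)
        + FRACTION_SLASH
        + str(denominator).translate(SUBSCRIPT)
    )
```

The table is keyed on the fraction as written, so a fraction such as 2/4 is composed rather than silently reduced to the glyph for one half.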
Dimension measurements are set using the Unicode multiplication glyph (
× or U+00D7), not the ASCII letter x.
Feet and inches in shorthand are set using the prime (
′ or U+2032) or double prime (
″ or U+2033) glyphs (not single or double quotes), with a no-break space (U+00A0) separating consecutive feet and inch measurements.
When forming a compound of a number and unit of measurement in which the measurement is abbreviated, the number and unit of measurement are separated with a no-break space (U+00A0), not a dash. For exceptions in money, see the rules for currency.
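The spacing rules for shorthand feet and inches and for abbreviated units are easy to get wrong by hand, because the no-break space is invisible. The Python sketch below shows one way to produce the expected strings. The function names feet_inches and with_unit are hypothetical.

```python
NBSP = "\u00a0"          # no-break space
PRIME = "\u2032"         # feet
DOUBLE_PRIME = "\u2033"  # inches


def feet_inches(feet: int, inches: int) -> str:
    """Set a shorthand feet-and-inches measurement."""
    return f"{feet}{PRIME}{NBSP}{inches}{DOUBLE_PRIME}"


def with_unit(number: str, unit: str) -> str:
    """Join a number and an abbreviated unit with a no-break space."""
    return f"{number}{NBSP}{unit}"
```

For example, feet_inches(5, 11) yields a 5, a prime, a no-break space, an 11, and a double prime, and with_unit("10", "kg") yields the number and unit joined by a no-break space.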
Punctuation in abbreviated measurements
See here for general abbreviation rules that also apply to measurements.
Abbreviated SI units are set in lowercase without periods. They are not initialisms.
Abbreviated English, Imperial, or US customary units that are one word are set in lowercase with a trailing period. They are not initialisms.
The one exception is G (
G-force), which is an initialism that is set without a period.
Abbreviated English, Imperial, or US customary units that are more than one word (like
miles per hour) are set in lowercase without periods. They are not initialisms.
In works that are not math-oriented or that don’t have a significant amount of mathematical equations, equations are set using regular HTML and Unicode.
Operators and operands in mathematical equations are separated by a space.
Operators like subtraction (
− or U+2212), multiplication (
× or U+00D7), and equivalence (
≡ or U+2261) are set using their corresponding Unicode glyphs, not a hyphen or
x. Almost all mathematical operators have a corresponding special Unicode glyph.
In works that are math-oriented or that have a significant amount of math, all variables, equations, and other mathematical objects are set using MathML.
When MathML is used in a file, the
m namespace is declared at the top of the file and used for all subsequent MathML code, as follows:
This namespace is declared and used even if there is just a single MathML equation in a file.
When possible, Content MathML is used over Presentational MathML. (This may not always be possible depending on the complexity of the work.)
Each <m:math> element has an
alttext attribute that describes the contents of the element in plain-text Unicode according to the rules in this specification.
Operators in the
alttext attribute are surrounded by a single space.
When using Presentational MathML,
<m:mrow> is used to group subexpressions, but only when necessary. Many elements in MathML imply an
<m:mrow>, and redundant elements are not desirable. See this section of the MathML spec for more details.
If a Presentational MathML expression contains a function, the invisible Unicode function application glyph (U+2061) is used as an operator between the function name and its operand. This element looks exactly like the following, including the comment for readability:
<m:mo><!--hidden U+2061 function application--></m:mo>. (Note that the preceding element contains an invisible Unicode character! It can be revealed with the
Expressions grouped by parenthesis or brackets are wrapped in an
<m:mrow> element, and fence characters are set using the
<m:mo fence="true"> element. Separators are set using the
<m:mfenced>, which used to imply both fences and separators, is deprecated in the MathML spec and thus is not used.
If a MathML variable includes an overline, it is set by combining the variable’s normal Unicode glyph and the Unicode overline glyph (
‾ or U+203E) in a
<m:mover> element. However, in the
alttext attribute, the Unicode overline combining mark (U+0305) is used to represent the overline in Unicode.
Typographically-correct symbols are used for currency symbols.
Currency symbols are not abbreviations.
£sd shorthand is a way of denoting pre-decimal currencies (pounds, shillings, and pence) common in England and other parts of the world until the 1970s.
There is no white space between a number and an £sd currency symbol.
Letters used in £sd shorthand are wrapped in
Latinisms (except sic) that can be found in a modern dictionary are not italicized. Examples include
etc. The exception is
sic, which is always italicized.
Whole passages of Latin language and Latinisms that aren’t found in a modern dictionary are italicized.
&c. is not used, and is replaced with etc.
For Ibid., see Endnotes.
Latinisms that are abbreviations are set in lowercase with periods between words and no spaces between them, except
CE, which are set without periods, in small caps, and wrapped with
Initials and abbreviations
Acronyms (terms made up of initials and pronounced as one word, like
NATO) are set in small caps, without periods, and are wrapped in an
<abbr class="acronym"> element with corresponding CSS.
Initialisms (terms made up of initials in which each initial is pronounced separately, like
U.S.S.R.) are set with periods and without spaces (with some exceptions that follow) and are wrapped in an
When an abbreviation that is not an acronym contains a terminal period, its
<abbr> element has the additional
eoc class (End of Clause) if the terminal period is also the last period in the clause. Such sentences do not have two consecutive periods.
Initials of people’s names are each separated by periods and spaces. The group of initials is wrapped in an
Academic degrees are wrapped in an
<abbr class="degree"> element. Degrees that consist of initials are set with a period between each initial. Degrees that consist of initials followed by abbreviated words are set with a hair space before the word.
Some degrees are exceptions:
LL.D. does not have a period in
LL, because it indicates the plural
Postal codes and abbreviated US states are set in all caps, without periods or spaces, and are wrapped in an
Abbreviations that are abbreviations of a single word, and that are not acronyms or initialisms (like
lbs.) are set with
Abbreviations ending in a lowercase letter are set without spaces between the letters, and have a trailing period.
Abbreviations without lowercase letters are set without spaces and without a trailing period.
Abbreviations that describe the next word, like
St., are set with a no-break space (U+00A0) between the abbreviation and its target.
Compass points are separated by periods and spaces. The group of points is wrapped in an
Exceptions that are not abbreviations
The following are not abbreviations, and are set without periods or spaces.
The following are initialisms, but are set without periods or spaces:
G, when used in the sense of
G-force. Also see the rules for punctuation in abbreviated measurements.
Stock ticker symbols.
The following are abbreviations, but are not initialisms. Unlike almost all other abbreviations, they are in all caps and only have a period at the end.
A.B.C., when used in the sense of the alphabet, is not an abbreviation, and is set with periods between the letters. But other uses, like
A.B.C. shops, are abbreviations.
Times in a.m. and p.m. format are set in lowercase, with periods, and without spaces.
a.m. and p.m. are wrapped in an
Times as digits
Digits in times are separated by a colon, not a period or comma.
Times written in digits followed by a.m. or
p.m. are set with a no-break space (U+00A0) between the digit and the abbreviation.
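The rules for times written as digits can be applied mechanically. The Python sketch below rewrites common variants into the colon-separated form, with lowercase a.m. or p.m. preceded by a no-break space. It is an illustration only: the function name normalize_digit_time is hypothetical, and the pattern is deliberately loose, so its matches should be reviewed by hand.

```python
import re

NBSP = "\u00a0"

# Hour, a separator, two-digit minutes, then a.m. or p.m. in various spellings.
TIME_PATTERN = re.compile(
    r"\b(\d{1,2})[.,:](\d{2})\s*([ap])\.?\s?m\.?",
    re.IGNORECASE,
)


def normalize_digit_time(text: str) -> str:
    def replace(match: re.Match) -> str:
        hour, minute, meridiem = match.group(1), match.group(2), match.group(3)
        return f"{hour}:{minute}{NBSP}{meridiem.lower()}.m."

    return TIME_PATTERN.sub(replace, text)
```

Wrapping the abbreviation in markup is left out of the sketch, since it only handles plain text.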
Times as words
Words in a spelled-out time are separated by spaces, unless they appear before a noun, where they are separated by a hyphen.
Times written in words followed by a.m. or
p.m. are set with a regular space between the time and the abbreviation.
Military times that are spelled out (for example, in dialog) are set with dashes. Leading zeros are spelled out as
Chemicals and compounds
Molecular compounds are set in Roman, without spaces, and wrapped in an
Elements in a molecular compound are capitalized according to their listing in the periodic table.
Amounts of an element in a molecular compound are set in subscript with a
The minus sign glyph (
− or U+2212), not the hyphen glyph, is used to indicate negative numbers.
Either the degree glyph (
° or U+00B0) or the word
degrees is acceptable. Works that use both are normalized to use the dominant method.
Abbreviated units of temperature
Units of temperature measurement, like Fahrenheit or Celsius, may be abbreviated to F or C.
Units of temperature measurement do not have trailing periods.
If an abbreviated unit of temperature measurement is preceded by a number, the unit of measurement is first preceded by a hair space (U+200A).
Abbreviated units of measurement are set in small caps.
Abbreviated units of measurement are wrapped in an
Scansion is the representation of the metrical stresses in lines of verse.
× (U+00D7) indicates an unstressed syllable and
/ (U+002F) indicates a stressed syllable. They are separated from each other with no-break spaces (U+00A0).
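A scansion line can be generated from a simple list of stresses. The Python sketch below is a minimal illustration with a hypothetical function name, scansion. It takes one boolean per syllable, where True means stressed.

```python
NBSP = "\u00a0"
UNSTRESSED = "\u00d7"  # multiplication sign
STRESSED = "/"         # U+002F


def scansion(stresses) -> str:
    """Join one mark per syllable with no-break spaces."""
    return NBSP.join(STRESSED if stressed else UNSTRESSED for stressed in stresses)
```

For two iambic feet, scansion([False, True, False, True]) produces the four marks separated by no-break spaces.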
Lines of poetry listed on a single line (like in a quotation) are separated by a space, then a forward slash, then a space. Capitalization is preserved for each line.
Legal cases and terms
Legal cases are set in italics.
Either versus or v. is acceptable in the name of a legal case; if using
v., a period follows the
v., and it is wrapped in an
Any Morse code that appears in a book is changed to fit Standard Ebooks’ format.
American Morse Code
Middle dot glyphs (
· or U+00B7) are used for the short mark or dot.
En dashes (
– or U+2013) are used for the longer mark or short dash.
Em dashes (
— or U+2014) are used for the long dash (the letter L).
If two en dashes are placed next to each other, a hair space (U+200A) is placed between them to keep the glyphs from merging into a longer dash.
Only in American Morse Code, there are internal gaps used between glyphs in the letters C, O, R, or Z. No-break spaces (U+00A0) are used for these gaps.
En spaces (U+2002) are used between letters.
Em spaces (U+2003) are used between words.
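The glyph and spacing rules for American Morse Code can be sketched as a small converter. The Python example below works from a made-up plain-text notation that is purely an assumption for illustration: a period for a dot, a hyphen for a short dash, an equals sign for the long dash, an underscore for an internal gap, a space between letters, and a spaced slash between words. The function names set_letter and set_morse are hypothetical, and the sketch does not include a table mapping letters to code.

```python
import re

HAIR_SPACE = "\u200a"
NBSP = "\u00a0"
EN_SPACE = "\u2002"
EM_SPACE = "\u2003"
EN_DASH = "\u2013"

# Hypothetical input notation mapped to the glyphs named in the rules.
MARKS = {".": "\u00b7", "-": EN_DASH, "=": "\u2014", "_": NBSP}


def set_letter(raw: str) -> str:
    """Convert one letter, keeping adjacent en dashes apart with a hair space."""
    glyphs = "".join(MARKS[mark] for mark in raw)
    return re.sub(f"(?<={EN_DASH})(?={EN_DASH})", HAIR_SPACE, glyphs)


def set_morse(raw: str) -> str:
    """Convert a message: en spaces between letters, em spaces between words."""
    words = raw.split(" / ")
    return EM_SPACE.join(
        EN_SPACE.join(set_letter(letter) for letter in word.split())
        for word in words
    )
```

An unknown character in the input raises a KeyError, which is a reasonable way to surface typos in the hypothetical notation.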
Citations are wrapped in a
Citations that are the source of a quote are preceded by a space and an em dash, within the
Citations within a
<blockquote> element have the
<cite> element as the last direct child of the