Source Data Format for OpenType Layout Tables¶
Source Format Definition¶
This document assumes familiarity with the OpenType specification where it introduces the Layout tables. Detailed knowledge of the inner workings of these tables is not needed, as long as one understands what they do and how to put them to use.
Beginning in 1999 Monotype needed the ability to create, view and debug fonts with OpenType Layout tables. A text representation of the OpenType tables (GSUB, GPOS, and GDEF) was created that follows the actual structure of the tables so that these source files could be compiled into binary format and vice versa. This document describes the format of such source files. With appropriate tools such files can be converted to binary data and inserted into fonts.
In formulating the source format, the choice was made to use a text representation that follows the actual structure of the tables, with machine-readable data translated into human-readable form as much as possible. Lookups are therefore kept separate from each other and from the assignment of lookups to features and of features to languages and scripts, rather than using a description that mixes these elements together. When a GSUB or GPOS source is compiled and then decompiled, the results should match. There may be differences in the ordering within lookups, however, because the binary encoding usually forces a certain glyph index ordering, which does not need to be present in the text source.
For the most part the OpenType Layout tables consist of rather simple lists: glyph substitution lists, glyph enumeration or classification lists, and glyphs with values. These lists lend themselves very well to being viewed and edited in rows and columns. The text sources are therefore made up of tab-delimited lines (ASCII 9, ‘\t’) that can easily be exchanged with spreadsheet documents, without mark-up tags or keywords getting in the way. However, care should be taken to remove trailing tabs from text files before compiling: tabs are often used to count the number of glyphs or values in a line, so a compiler can be confused by “empty entries” in a line. Most text editors allow a Replace All operation that turns the sequence “tab-return” into “return”. This needs to be repeated until 0 replacements are reported. In this document tabs are referred to by <T>.
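The trailing-tab cleanup can also be scripted. The following is a minimal sketch, not part of any Monotype tool; the function name `strip_trailing_tabs` is hypothetical. A single regular-expression pass removes whole runs of trailing tabs, so no repetition is needed.

```python
import re

def strip_trailing_tabs(text: str) -> str:
    """Remove every run of tabs that sits directly before a line break
    or at the very end of the text. Tabs between entries are kept."""
    return re.sub(r"\t+(?=\r?\n|$)", "", text)

sample = "a\tb\t\t\nc\td\t\n"
print(repr(strip_trailing_tabs(sample)))
```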
The OpenType Layout (OTL) source files may be used to represent three tables: GSUB, GPOS, and GDEF.
- The first line of a text file representing a GSUB is:
FontDame GSUB table
- The first line of a text file representing a GPOS is:
FontDame GPOS table
- The first line of a text file representing a GDEF is:
FontDame GDEF table
These first lines are case sensitive. Other keywords are not case sensitive. However one is encouraged to use lowercase for legibility!
GPOS and GSUB tables consist of three types of elements: one script list per table, one feature list, and any number of lookups. Each of these elements is represented by a block starting with a keyword. Any line outside these blocks that does not start with a keyword is ignored by the compiler and can be used for comments.
There is one additional keyword for the GPOS table only. The EM keyword at the beginning of the file, before the first lookup, declares the EM square that the font unit values refer to, which allows values to be transferred between fonts with different EM resolutions. Without an EM declaration in the GPOS text file, the EM is assumed to equal that of the font the file is used for. The format is:
EM <T> decimal value
The inclusion of the EM value is informational. It also allows the sharing of GPOS source between TTF and OTF fonts that use different EM values, if used with a compiler that re-scales values appropriately.
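As an illustration of the re-scaling a compiler might perform, here is a minimal sketch. The function name `rescale` and the rounding rule are assumptions; the source format does not prescribe how a tool rounds.

```python
def rescale(value: int, source_em: int, target_em: int) -> int:
    """Scale a font-unit value from the EM declared in the source file
    to the EM of the target font. Uses Python's round(), which rounds
    exact halves to the nearest even integer (an assumption, not a rule
    taken from the format)."""
    return round(value * target_em / source_em)

# A kern of -50 units declared with "EM <T> 1000", compiled into a 2048-unit font.
print(rescale(-50, 1000, 2048))
```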
Glyphs are usually represented by their name from the ‘post’ table or ‘CFF ‘ table. If there is no format 2 post table or no names stored in the ‘CFF ‘ table then it is possible to refer to glyphs by Unicode or by glyph index.
To refer to a glyph by Unicode use U or u followed by a space, followed by the hexadecimal Unicode. To refer to a glyph by index use a # followed by a space followed by the index in decimal. For example: in a normal Latin font in standard order the lowercase a can be referred to in the following ways:
a
U 0061
u 0061
# 68
When decompiling a table to OTL Source format, a tool should use the ‘post’ or ‘CFF ‘ table name when possible. If no name can be obtained from either of those, it next attempts to use a Unicode reference. If a Unicode cannot be obtained, the last resort is using the glyph index. In this document glyphs in any of these 3 formats are referred to as <Gl>.
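A tool reading these references has to tell the three forms apart. The sketch below shows one way to do so; `parse_glyph_ref` and its return shape are hypothetical and not taken from any existing compiler.

```python
def parse_glyph_ref(ref: str):
    """Classify a <Gl> reference as ('unicode', int), ('index', int)
    or ('name', str), following the three forms described above."""
    parts = ref.split(" ")
    if len(parts) == 2 and parts[0] in ("U", "u"):
        return ("unicode", int(parts[1], 16))  # hexadecimal code point
    if len(parts) == 2 and parts[0] == "#":
        return ("index", int(parts[1]))        # decimal glyph index
    return ("name", ref)

for ref in ("a", "U 0061", "u 0061", "# 68"):
    print(ref, "->", parse_glyph_ref(ref))
```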
In OpenType Layout tables all information regarding substitutions and positioning is stored in so-called lookup tables: lists with glyphs and values. In text format, each of these lists begins with:
lookup <T> label <T> lookup type
and ends with:
lookup end
- Label
A label can be anything which doesn’t have a tab or return in it, a description of the content for example. In decompiled lookups it is the index number of the lookup. These labels are used in the feature table to tie the lookup to a feature. They are also used in context or chained context lookups to tie a rule to a lookup.
- Lookup Type
Describes the type of lookup. The following keywords are used to identify the various lookup types:
- single: GSUB lookup type 1 or GPOS lookup type 1
- multiple: GSUB lookup type 2
- alternate: GSUB lookup type 3
- ligature: GSUB lookup type 4
- pair or kernset: GPOS lookup type 2
- cursive: GPOS lookup type 3
- mark to base: GPOS lookup type 4
- mark to ligature: GPOS lookup type 5
- mark to mark: GPOS lookup type 6
- context: GSUB lookup type 5 or GPOS lookup type 7
- chained: GSUB lookup type 6 or GPOS lookup type 8
- reversechained: GSUB lookup type 8
- Lookup Flags
Each lookup has four flag values and an optional mark attachment type associated with it, which can be set at the beginning of the lookup block:
FlagType <T> FlagSetting
The flag settings are yes and no. Flag types are:
- RightToLeft: When set to yes, indicates that the lookup applies to right to left scripts such as Arabic. As of OpenType 1.3 only to be used for cursive attachment lookups, but it is often set for Arabic in older fonts.
- IgnoreBaseGlyphs: When set to yes, all glyphs classified as base in the GDEF table are ignored when the lookup is applied.
- IgnoreLigatures: When set to yes, all glyphs classified as ligature in the GDEF table are ignored when the lookup is applied.
- IgnoreMarks: When set to yes, all glyphs classified as mark in the GDEF table are ignored when the lookup is applied.
If a flag is not listed in the lookup, the default value no is used.
- Mark Attachment
The Mark Attachment Type is an optional part of the lookup flags which can be used to have marks ignored except for one specified class of marks:
markattachmenttype <T> mark class number
For example, in Telugu, this is used where subscript consonant marks need to be attached to base consonant glyphs with a mark to base lookup, while between base and subscripts there can be other types of marks. By setting the mark attachment type to the class of the subscript marks, the other marks are ignored. By having markattachmenttype specifying a class, you have an IgnoreMarks setting which is between yes and no: “IgnoreMarks other than these”.
In order to use this you need a GDEF with a MarkAttachmentClass table.
- Mark Filtering
Another optional lookup flag is markfiltertype. This key, followed by a <T> and a single number representing a glyph set index (defined in the GDEF table), allows behavior similar to mark attachment classes. The advantage of this over mark attachment classes is that a glyph may belong to multiple glyph sets.
Note that the use of this is mutually exclusive with markattachmenttype per lookup. As such, if both flags are encountered for a single lookup, only the markfiltertype should be used.
The set number specified for markfiltertype must be defined in the GDEF. If the set number is out of range, a warning should be issued and no filtering should be applied.
- Subtables
Use subtable end to mark a subtable boundary. Use lookup end (not subtable end) to terminate the final subtable of a lookup (or lookups with only one subtable).
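The flag lines of a lookup block correspond to the 16-bit lookupFlag field of the OpenType specification. The sketch below shows how a compiler might combine them. The bit values are those of the OpenType specification. The lowercase flag keywords, the function name `lookup_flag_value`, and its return shape are assumptions made for illustration.

```python
# Bit values from the OpenType lookupFlag definition.
FLAG_BITS = {
    "righttoleft": 0x0001,
    "ignorebaseglyphs": 0x0002,
    "ignoreligatures": 0x0004,
    "ignoremarks": 0x0008,
}
USE_MARK_FILTERING_SET = 0x0010  # the set index is stored separately

def lookup_flag_value(lines):
    """Turn 'FlagType<T>FlagSetting' lines into (flag value, filter set)."""
    flag, mark_class, filter_set = 0, None, None
    for line in lines:
        key, value = line.split("\t")
        key = key.lower()  # keywords are not case sensitive
        if key in FLAG_BITS:
            if value.lower() == "yes":
                flag |= FLAG_BITS[key]
        elif key == "markattachmenttype":
            mark_class = int(value)
        elif key == "markfiltertype":
            filter_set = int(value)
        else:
            raise ValueError(f"unknown flag: {key}")
    if filter_set is not None:
        # markfiltertype takes precedence when both are present.
        flag |= USE_MARK_FILTERING_SET
    elif mark_class is not None:
        flag |= mark_class << 8  # class number lives in the high byte
    return flag, filter_set

print(lookup_flag_value(["IgnoreLigatures\tyes", "markattachmenttype\t2"]))
```

Unlisted flags stay at their default of no simply because their bits are never set.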
The feature table is used to assign lookups to features. These assignments are done within a block starting with:
feature table begin
and ending with:
feature table end
Within the block assignments are done in one line for each feature:
index <T> feature tag <T> list of lookups
The index is a number starting with 0 for the first entry. These numbers are ignored by the compiler but can be used as a reminder when the features are assigned to a script or language in the script table.
The feature tag is one of the four character feature tags as they are published in the OpenType specification.
The list of lookups contains the labels of one or more lookup tables, separated by commas. If a feature needs to be declared without lookups (this is sometimes necessary), a hyphen takes the place of the list.
Normally features are expected to be listed in alphabetical order in feature lists, in the same way that they are stored in the font. But it is often easier to have a source file with the features in the order in which they are applied, i.e. following the lookup order. Features are expected to be sorted when the feature list is compiled.
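Because sorting changes feature positions, a compiler that sorts also has to remap the indices used in the script table. The sketch below illustrates this under the assumption that a feature's index is simply its position in the block; `parse_feature_lines` and the sample labels are hypothetical.

```python
def parse_feature_lines(lines):
    """Parse 'index<T>tag<T>lookups' lines into (tag, [labels]) pairs.
    The index column is ignored; a hyphen means 'no lookups'."""
    features = []
    for line in lines:
        _index, tag, lookups = line.split("\t")
        if lookups.strip() == "-":
            labels = []
        else:
            labels = [label.strip() for label in lookups.split(",")]
        features.append((tag, labels))
    return features

source = ["0\tliga\tlig-lookup", "1\tkern\tkern-a, kern-b", "2\tcalt\t-"]
features = parse_feature_lines(source)

# Sort positions by tag, then map each source index to its compiled index.
order = sorted(range(len(features)), key=lambda i: features[i][0])
remap = {old: new for new, old in enumerate(order)}
print(remap)
```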
The script table is used to assign features to scripts or languages. These assignments are done within a block starting with:
script table begin
and (predictably) ending with:
script table end
Within the block assignments are done in one line for each script/language:
script tag <T> language tag <T> requiredFeature <T> list of features
The script tag is one of the four character script tags as they are published in the OpenType specification.
The language tag is one of the 3 character language system tags as published in the OpenType specification. If the features need to be applied to the script in general instead of one language in particular, use the keyword default.
The required feature entry can contain one index number of one of the features in the feature table. However in most cases this field is empty, as it is an OpenType recommendation not to use required features. Make sure an extra <T> for an empty required feature is present.
The list of features contains one or more indices of feature definitions as they are used in the feature table. Entries are separated by commas. It is sometimes necessary to have an entry of a particular script in either GPOS or GSUB, although there are no features using that table, but without the entry the features in the other table are ignored. In that case the entry in the list of features field can be empty as well.
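The need for the extra <T> becomes obvious when a line is split into fields. A minimal sketch follows; `parse_script_line` is a hypothetical helper, and the `default` language keyword in the sample is an assumption.

```python
def parse_script_line(line: str):
    """Parse 'script<T>language<T>requiredFeature<T>features'.
    Empty required-feature and feature-list fields are allowed,
    but all three tabs must be present."""
    fields = line.split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    script, language, required, feature_list = fields
    required_index = int(required) if required.strip() else None
    indices = [int(f) for f in feature_list.split(",") if f.strip()]
    return script, language, required_index, indices

print(parse_script_line("latn\tdefault\t\t0, 1"))
```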
single (substitution)¶
Two entries per line; replaces the input glyph by the output glyph:
<input Gl> <T> <output Gl>
multiple (substitution)¶
At least two entries per line; replaces the input glyph by one or more output glyphs:
<input Gl> <T> <output Gl 1> <T> <output Gl 2> ....
alternate (substitution)¶
At least two entries per line; replaces the input glyph by one of the output glyphs:
<input Gl> <T> <output Gl 1> <T> <output Gl 2> ....
ligature (substitution)¶
At least two entries per line; replaces the input glyphs by one ligature (the first entry):
<ligature Gl> <T> <input Gl 1> <T> <input Gl 2> ....
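Note that the ligature line lists the result first and the inputs after it. A small sketch of reading such a line follows; `parse_ligature_line` is hypothetical. It also rejects the “empty entries” that a trailing tab would produce.

```python
def parse_ligature_line(line: str):
    """Split 'ligature<T>input1<T>input2...' into (components, ligature)."""
    ligature, *components = line.split("\t")
    if not components or "" in components:
        # No inputs at all, or an empty entry caused by a stray tab.
        raise ValueError(f"bad ligature line: {line!r}")
    return tuple(components), ligature

print(parse_ligature_line("f_f_i\tf\tf\ti"))
```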
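single (positioning)¶
Keyword and two entries per line. The type of positioning is specified by the keyword, the amount by the value:
position type <T> <Gl> <T> value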
pair or kernset (positioning)¶
It is possible to do pair kerning in two formats: one is a list of kern pair glyphs in the same way as we do in kern tables or in AFM files. This is most suited for small tables. The second is assigning glyphs to classes. Then the kerning values are declared for combinations of these classes. This format is best suited for larger kern tables with many accented characters sharing metrics. The glyph list format uses lines in this format:
position type <T> <LeftGl> <T> <RightGl> <T> value
Keyword and three entries per line. The type of positioning (advance modification of x/y movement) is specified by the keyword, the amount by the value. The Left glyph needs to be interpreted as being the first in the text (in case of right-to-left fonts it is therefore the right glyph).
The class format starts by two lists assigning glyphs to left/first and right/second classes. It is then followed by a list similar to the glyph list format, with class indices taking the place of glyph references:
firstclass definition begin
<Gl> <T> class
class definition end
secondclass definition begin
<Gl> <T> class
class definition end
position type <T> firstClass <T> secondClass <T> value
Class definitions have two entries per line, glyph reference and class index. Start class values at 1. The class value 0 refers to all glyphs that aren’t assigned to another class.
kernset is a combination of the glyph list and class formats: a subtable of glyph pairs precedes a subtable of class pairs, with the glyph pairs acting as “exceptions” to the class pair definitions.
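The way glyph pairs override class pairs can be sketched as a two-step lookup. This is a simplified model with made-up glyph names and values. It ignores the coverage rules a real shaping engine applies when walking subtables, and `kern_value` is a hypothetical name.

```python
def kern_value(left, right, glyph_pairs, first_class, second_class, class_pairs):
    """Return the kerning for (left, right): glyph pairs are consulted
    first, then class pairs. Unclassified glyphs fall into class 0."""
    if (left, right) in glyph_pairs:
        return glyph_pairs[(left, right)]
    c1 = first_class.get(left, 0)
    c2 = second_class.get(right, 0)
    return class_pairs.get((c1, c2), 0)

first_class = {"A": 1, "Aacute": 1, "T": 2}
second_class = {"V": 1, "W": 1, "a": 2}
class_pairs = {(1, 1): -80, (2, 2): -100}
glyph_pairs = {("Aacute", "W"): -60}  # exception to the class value

for pair in [("A", "V"), ("Aacute", "W"), ("T", "a"), ("x", "y")]:
    print(pair, kern_value(*pair, glyph_pairs, first_class, second_class, class_pairs))
```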
cursive (positioning)¶
The Anchor type keyword is either entry, for where a stroke enters the glyph, or exit, for where a stroke leaves the glyph. The anchor coordinate is the location of the anchor. The index of the point of the outline at the coordinate is optional:
Anchor type <T> <Gl> <T> anchorCoordinate <T> pointIndex
mark to base and mark to mark (positioning)¶
The GlyphType keyword is either base or mark. It is followed by the glyph reference and the Anchor class, which is a decimal index number. The coordinate of the anchor point is required, but the index of the point in the outline at the coordinate is optional:
GlyphType <T> <Gl> <T> anchorClass <T> anchorCoordinate <T> pointIndex
In a mark to mark lookup, the marks that other marks are attached to (“base marks”) use the base keyword.
In scripts such as the Indics where ligatures are used as single entities, accent positioning on these glyphs can be done with a mark to base rather than a mark to ligature lookup.
Within a lookup marks can belong to one class only. If there is more than one class of marks, all base glyphs are expected to have an anchor for each class. For marks with several anchors or when base glyphs do not have anchors for all classes of marks, it is recommended to use separate lookups for the classes.
mark to ligature (positioning)¶
This type of lookup differs from mark to base in that it allows marks to be positioned over a specific component part of a ligature glyph. The declaration of attachment points on marks is identical to that in mark to base lookups:
mark <T> <Gl> <T> anchorClass <T> anchorCoordinate <T> pointIndex
The mark keyword is followed by the glyph reference and the Anchor class. The coordinate of the anchor point is required, but the index of the point in the outline at the coordinate is optional.
The declarations for the attachment points include the index of the component, and the total number of components which the ligature represents:
ligature <T> <Gl> <T> componentIndex <T> componentCount <T> anchorClass <T> anchorCoordinate <T> pointIndex
The index used for the first component is 1.
Fonts can support any number of anchor classes. However marks can only have 1 anchor point and belong to 1 class only within a single lookup. Use separate lookups for cases where marks have more than one attachment point for use in a mark to ligature table.
If the lowest anchor number isn’t 0 or if there are gaps in the class numbering, the font tool should renumber or this needs to be done manually.
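Renumbering anchor classes so that they start at 0 without gaps is mechanical. The following sketch shows one possible approach; `renumber_classes` and the sample glyph names are hypothetical.

```python
def renumber_classes(classes):
    """Map the class numbers in use onto 0, 1, 2, ... preserving order.
    `classes` maps a mark glyph name to its anchor class number."""
    used = sorted(set(classes.values()))
    mapping = {old: new for new, old in enumerate(used)}
    return {glyph: mapping[cls] for glyph, cls in classes.items()}

print(renumber_classes({"acute": 1, "cedilla": 3, "grave": 1}))
```

The same mapping would have to be applied to the anchor classes on the base or ligature lines of the lookup.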
context (substitution and positioning)¶
This type of lookup is not used to directly list positioning or substitution actions. In a context table a set of rules is listed, where separate lookups are applied to certain glyph sequences. The general format of a rule is:
context type <T> inputSequence <T> action1 <T> action2 ...
First the type of context is declared. The input sequence consists of two or more entries, each representing a glyph or one of a group of glyphs. There can be one or more action items, which are used to declare what lookup is applied to which member of the sequence. These lookups follow the context lookup they are referenced by.
There are three types of contexts, each having its own keyword.
glyph: In this type the input sequence is a list of two or more glyph references separated by commas. For example to apply a kerning positioning to a F in the context in which F is followed by a period and a quoteright:
glyph <T> F, period, quoteright <T> 1,F-kern-lookup-label
In this type of context no additional information is required.
class: In this type the input sequence is a list of two or more glyph classes separated by commas. The list of rules is preceded by a class definition. For example to apply a kerning positioning to a F in the context in which F is followed by a period or a comma and a quoteright or a quotesingle:
class definition begin
F <T> 1
period <T> 2
comma <T> 2
quoteleft <T> 3
quotesingle <T> 3
class definition end
class <T> 1, 2, 3 <T> 1,F-kern-lookup-label
Start numbering classes at 1. Class 0 can be used to refer to any glyph not classified otherwise.
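To make the role of the classes concrete, here is a minimal sketch of testing whether a class rule matches at a given text position. It uses the class table from the example as written; `matches_class_rule` is a hypothetical helper, not a description of how any particular engine is implemented.

```python
CLASS_OF = {"F": 1, "period": 2, "comma": 2, "quoteleft": 3, "quotesingle": 3}

def matches_class_rule(glyphs, start, rule_classes, class_of=CLASS_OF):
    """True if the glyphs beginning at `start` have exactly the classes
    listed in the rule. Glyphs without a class count as class 0."""
    window = glyphs[start:start + len(rule_classes)]
    if len(window) != len(rule_classes):
        return False
    return all(class_of.get(g, 0) == c for g, c in zip(window, rule_classes))

text = ["A", "F", "comma", "quotesingle"]
print(matches_class_rule(text, 1, [1, 2, 3]))
print(matches_class_rule(text, 0, [1, 2, 3]))
```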
coverage: In this type it is possible to have one rule per sub-table only. The rule is preceded by two or more coverage tables, which are lists of glyphs. The input sequence refers to these lists. This works a bit like the class format, with the difference that a glyph can be in more than one list/class. The example with the F would look like this:
coverage definition begin <T> 0
F
coverage definition end
coverage definition begin <T> 1
period
comma
coverage definition end
coverage definition begin <T> 2
quoteleft
quotesingle
coverage definition end
coverage <T> 1,F-kern-lookup-label
Note that in the rule there is no entry for the sequence, because there is only one sequence possible. Each coverage table has an index number, starting with zero for the first.
chained (substitution and positioning)¶
This type of lookup is not used to directly list positioning or substitution actions. In a chained context table a set of rules is listed, where separate lookups are applied to certain glyph sequences. It differs from the context lookup in that the context is stored separately from the glyphs that are modified: a chain of contexts rather than one context. The general format of a rule is:
context type <T> backtrackSequence <T> inputSequence <T> lookaheadSequence <T> action1 <T> action2 ...
First the type of context is declared. The input sequence consists of one or more entries, each representing a glyph or one of a group of glyphs. These are the glyphs that the actions are applied to. The backtrack and lookahead sequences give the context of the glyphs that are modified, but elements in these sequences cannot be modified themselves. A context can have both a backtrack and a lookahead, or just one of them. If only one is used, the entry for the other can remain empty in the rule, but the number of <T>’s has to be complete. There can be one or more action items, which are used to declare what lookup is applied to which member of the sequence. These lookups follow the context lookup they are referenced by.
There are three types of contexts, each having its own keyword.
glyph: In this type the sequences are lists of one or more glyph references separated by commas. For example to apply a kerning positioning to a period in the context in which F is followed by a period and a quoteright:
glyph <T> F <T> period <T> quoteright <T> 1,F-kern-lookup-label
In this type of context no additional information is required.
class-chain: In this type the sequences are lists of one or more glyph classes separated by commas. The list of rules is preceded by a set of class definitions. For example to apply a kerning positioning to a comma or period in the context in which F is followed by a period or a comma and a quoteright or a quotesingle:
backtrackclass definition begin
F <T> 1
class definition end
class definition begin
period <T> 1
comma <T> 1
class definition end
lookaheadclass definition begin
quoteleft <T> 1
quotesingle <T> 1
class definition end
class-chain <T> 1 <T> 1 <T> 1 <T> 1,F-kern-lookup-label
Start numbering classes at 1. Class 0 can be used to refer to any glyph not classified otherwise. The backtrack and lookahead definitions are optional, but at least one of them has to be present. See “More about chained context” below for details.
coverage: In this type it is possible to have one rule per sub-table only. The rule is preceded by two or more coverage tables, which are lists of glyphs. The sequences refer to these lists. This works a bit like the class format, with the difference that a glyph can be in more than one list/class. The example with the F would look like this:
backtrackcoverage definition begin
F
coverage definition end
inputcoverage definition begin
period
comma
coverage definition end
lookaheadcoverage definition begin
quoteleft
quotesingle
coverage definition end
coverage <T> 1,F-kern-lookup-label
Note that in the rule there is no entry for the sequence, because there is only one sequence possible. There can be any number of these coverage definitions. At least one inputcoverage table and at least one of the others are required.
reversechained (substitution)¶
This format combines chained context rules with single substitutions within a lookup. A backtrack coverage definition, a lookahead coverage definition, or both define the glyphs for the contexts, followed by single (glyph) substitution definitions:
backtrackcoverage definition begin
<Gl1>
...
<Gln>
coverage definition end
lookaheadcoverage definition begin
<Gl1>
...
<Gln>
coverage definition end
<input Gl1> <T> <output Gl1>
...
<input Gln> <T> <output Gln>
It is important to note that for reversechained lookups, processing of the input glyph sequence goes from end to start, the opposite of other lookup types. This type is designed for Arabic and scripts like it where the shape of a glyph is determined by a following glyph. See the OpenType specification for more details.
More about chained context¶
Suppose we have an input context of three glyphs, where we need to change the middle if it is surrounded by two others:
rule 1: If ABC then ACC
rule 2: If CCC then CDC
These input sequences can appear separately, but they can also overlap, in which case
xxxxABCCxxxx needs to become xxxxACDCxxxx
The problem with normal context lookups is that the pointer is moved past the input context sequence once a match has been found. After ABC has been turned into ACC the pointer is moved to investigate the next input sequence Cxx and the CCC sequence is overlooked.
In the chained context lookup you can separate the input part from the rest of the context. This allows you to look for context while moving the pointer one step at a time, by having a single glyph as the input. That means that when you check the next input sequence, you can take into account the change just made, because the previous input sequence has become the context of the next.
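The difference in pointer movement can be simulated with single-letter glyphs. The sketch below is a toy model of the two behaviours described above, not an implementation of OpenType shaping. The function names and the rule encoding (a three-glyph key mapped to the replacement for the middle glyph) are assumptions made for the illustration.

```python
RULES = {("A", "B", "C"): "C", ("C", "C", "C"): "D"}

def apply_context(text, rules):
    """Plain context: the whole three-glyph sequence is the input, so the
    pointer skips past all three glyphs after a match."""
    out, i = list(text), 0
    while i < len(out):
        seq = tuple(out[i:i + 3])
        if seq in rules:
            out[i + 1] = rules[seq]
            i += 3
        else:
            i += 1
    return "".join(out)

def apply_chained(text, rules):
    """Chained context: the input is the single middle glyph, its
    neighbours are backtrack and lookahead, so the pointer advances
    by one and sees the change just made."""
    out = list(text)
    for i in range(1, len(out) - 1):
        key = (out[i - 1], out[i], out[i + 1])
        if key in rules:
            out[i] = rules[key]
    return "".join(out)

print(apply_context("xxxxABCCxxxx", RULES))  # CCC is overlooked
print(apply_chained("xxxxABCCxxxx", RULES))  # both rules fire
```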
In a context lookup we have one set of classes, which you use to define the context for the rules. But in a chained context lookup in class format we have also a set of classes for the context before the input (the backtrack classes) and a set for what comes after (the lookahead classes).
Of course you can have both lookahead and backtrack, or just one of the two.
Within these separate class tables you count the classes separately. In the sample above we see three class 1 items in the sequence: these are class 1 items from different classifications. It isn’t saying bird #1, bird #1, bird #1. It is saying duck #1, parrot #1, seagull #1.
The number 1 in the rule is a different number altogether: it doesn’t refer to glyphs. It refers to the position in the input sequence. The input sequence contains 1 glyph (out of the possibly several glyphs in the class list). So the rule says:
If you have an input sequence of one of the glyphs of class 1 in the input class table, surrounded by one of class 1 of the backtrack and lookahead class tables, then apply lookup “F-kern-lookup-label” to the first element of the input sequence. Then move the text pointer by the length of the input sequence, which is one.
The typical way to generate a GDEF is to compile an MTap table.
The GDEF table is where numerous glyph properties, independent of the GPOS and GSUB, are defined. This includes things like glyph categorization (Base, Mark, Ligature, etc.), anchor points, ligature carets, and mark categorization.
Similar to GSUB and GPOS, the first line of a text file representing a GDEF is:
FontDame GDEF table
Glyphs defined in GDEF glyph classes are what are used in the GSUB and GPOS lookup flags (IgnoreBaseGlyphs, IgnoreLigatures, and IgnoreMarks).
- 1 - Base
- 2 - Ligature
- 3 - Mark
- 4 - Component (note, these are not referenced by the GSUB or GPOS)
Glyphs not explicitly assigned to a class are considered to be in class 0 (“no class”). The syntax for defining glyph classes is:
class definition begin
<Gl> <T> <classNumber>
class definition end
Attachment Point List¶
Attachment (anchor) points are point numbers on the glyph outline which can be used as references for positioning glyphs. Define anchor points with a glyph reference followed by one or more point numbers, separated by <T>:
attachment list begin
<Gl> <T> <point1> <T> <point2> ...
attachment list end
Ligature Caret Lists¶
Ligature Carets are x-coordinates on a glyph, designating where the caret (cursor) should be placed when selecting a component of a ligature glyph. For example, an ‘fi’ ligature might contain a single ligature caret somewhere between the right side of the ‘f’ and the left side of the ‘i’ portion of the glyph.
Ligature carets are defined with a glyph reference, followed by a number designating the number of carets, then that number of integers which are x-coordinate values, defining the caret positions:
carets begin
<Gl> <T> <numberOfCarets> <T> X1 <T> X2 ...
carets end
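Since each caret line carries its own count, a reader can check the line for consistency. A minimal sketch follows; `parse_caret_line` and the sample coordinates are hypothetical.

```python
def parse_caret_line(line: str):
    """Parse 'glyph<T>count<T>X1<T>X2...' and verify that the declared
    number of carets matches the number of x-coordinates given."""
    glyph, count, *coords = line.split("\t")
    xs = [int(c) for c in coords]
    if int(count) != len(xs):
        raise ValueError(f"{glyph}: expected {count} carets, found {len(xs)}")
    return glyph, xs

print(parse_caret_line("f_f_i\t2\t300\t600"))
```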
Mark Attachment Classes¶
Mark Attachment Classes are an additional level of specifying marks. These can be used in conjunction with the markattachmenttype lookup flag to enable lookups to ignore marks other than those that are part of the class specified by the markattachmenttype flag, as described above. A mark can only belong to a single mark attachment class.
Create mark attachment classes with a glyph reference and a class number, separated by <T>:
mark attachment class definition begin
<Gl> <T> <markClassNumber>
class definition end
Mark Glyph Sets (Mark Filtering Sets)¶
Mark glyph sets, also known as mark filter sets, are somewhat similar to mark attachment classes in that they allow a specific sub-category of marks to be ignored. The main difference is that a glyph can belong to any number of mark glyph sets. This allows greater flexibility when using sets in lookups with the markfiltertype flag as described above. However, like mark attachment classes, only a single set may be specified per lookup.
To define mark filter sets, use a glyph reference followed by a set number. Begin set numbering at 0. Font tools should automatically renumber and report if you forget, but most important is that any references to mark glyph sets in GSUB or GPOS lookups must reference the renumbered sets. Syntax:
markfilter set definition begin
<Gl> <T> <markSetNumber>
set definition end
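Because a glyph may appear on several lines with different set numbers, the natural in-memory form is a mapping from set number to a set of glyphs. A minimal sketch, with hypothetical names and sample glyphs:

```python
from collections import defaultdict

def build_mark_sets(lines):
    """Collect 'glyph<T>setNumber' lines into {set number: {glyphs}}.
    Unlike mark attachment classes, one glyph may land in many sets."""
    sets = defaultdict(set)
    for line in lines:
        glyph, number = line.split("\t")
        sets[int(number)].add(glyph)
    return dict(sets)

print(build_mark_sets(["acute\t0", "grave\t0", "acute\t1"]))
```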
Other table source files can be represented as Font Chef tables. These files begin with a first line indicating Font Chef format and the table tag as follows:
Font Chef Table <tag>
Following the tag entry on the first line are cmap subtable definitions. A subtable definition begins with a declaration of the form:
cmap subtable N
Next are a few lines indicating the platform, encoding, and language IDs and the format:
platformID <T> N
encodingID <T> N
format <T> N
language <T> N
Following the header information are 2 <T>-separated entries to designate the code-to-glyph relationship:
0xXXXX <T> <glyphName>
An optional 3rd column appears in dump files to indicate the glyph index. This column is ignored when loading:
At the end of the subtable is the token:
Copyright © 2015 Monotype Imaging Inc. http://www.monotype.com/ All rights reserved.