Source Data Format for OpenType Layout Tables

Source Format Definition

Copyright © 2015 Monotype Imaging Inc. http://www.monotype.com/ All rights reserved. This document is available under a Creative Commons License.

Note

This document assumes familiarity with the OpenType specification where it introduces the Layout tables. Detailed knowledge of the inner workings of these tables is not needed, as long as one understands what they do and how to put them to use.

Introduction

Beginning in 1999 Monotype needed the ability to create, view and debug fonts with OpenType Layout tables. A text representation of the OpenType tables (GSUB, GPOS, and GDEF) was created that follows the actual structure of the tables so that these source files could be compiled into binary format and vice versa. This document describes the format of such source files. With appropriate tools such files can be converted to binary data and inserted into fonts.

In formulating the source format, the choice was made to use a text representation that follows the actual structure of the tables, with the machine-readable data translated into human-readable form as much as possible. Lookups are therefore kept separate from each other and from the assignment of lookups to features and of features to languages and scripts, rather than being described in a way that mixes these elements together. When a GSUB or GPOS source is compiled and then decompiled, the results should match. The ordering within lookups may differ, however, because the binary encoding usually forces a certain glyph index ordering, which does not need to be present in the text source.

For the most part the OpenType Layout tables consist of rather simple lists: glyph substitution lists, glyph enumeration or classification lists, glyphs with values. These lists lend themselves very well to being viewed and edited in rows and columns. The text sources are therefore made up of tab-delimited lines (ASCII 9, ‘\t’) that can easily be exchanged with spreadsheet documents, without mark-up tags or keywords getting in the way. However, care should be taken to remove trailing tabs from text files before compiling, since the tabs are often used to count the number of glyphs or values in a line and it is possible to confuse a compiler with “empty entries” in a line. Most text editors allow a Replace All operation that turns the sequence “tab-return” into “return”. This needs to be repeated until 0 replacements are reported. In this document tabs are referred to by <T>.
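The trailing-tab cleanup described above can also be scripted. The following is a minimal Python sketch, not part of any Monotype tool; the function name strip_trailing_tabs is made up for illustration. A single regular-expression pass removes whole runs of tabs, so the repeat-until-zero loop needed in a text editor is not required here.

```python
import re


def strip_trailing_tabs(text: str) -> str:
    """Remove every run of tabs that sits directly before a line end.

    Hypothetical helper: it handles both "\n" and "\r\n" line endings
    and also a run of tabs at the very end of the text.
    """
    return re.sub(r"\t+(?=\r?\n|$)", "", text)
```

Tabs between entries are left alone, so only the “empty entries” at the end of a line disappear.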

OpenType Layout Source Files

The OpenType Layout (OTL) source files may be used to represent three tables: GSUB, GPOS, and GDEF.

The first line of a text file representing a GSUB is:
FontDame GSUB table
The first line of a text file representing a GPOS is:
FontDame GPOS table
The first line of a text file representing a GDEF is:
FontDame GDEF table

These first lines are case sensitive. Other keywords are not case sensitive. However one is encouraged to use lowercase for legibility!
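As an illustration of the case-sensitive first line, here is a small Python sketch that identifies which table a source file represents. The function name table_tag and the choice to raise ValueError are assumptions made for this example only.

```python
def table_tag(first_line: str) -> str:
    """Return "GSUB", "GPOS" or "GDEF" for a valid first line.

    The comparison is deliberately case sensitive, as the first line of
    a source file is the one place where case matters.
    """
    prefix, suffix = "FontDame ", " table"
    line = first_line.rstrip("\r\n")
    if line.startswith(prefix) and line.endswith(suffix):
        tag = line[len(prefix):-len(suffix)]
        if tag in ("GSUB", "GPOS", "GDEF"):
            return tag
    raise ValueError("not an OTL source file: %r" % first_line)
```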

GPOS and GSUB tables consist of three types of elements: one script list per table, one feature list and any number of lookups. Each of these elements is represented by a block starting with a keyword. Any line outside these blocks, not starting with a keyword, is ignored by the compiler and can be used for comments.

There is one additional keyword for the GPOS table only. The EM keyword, placed at the beginning of the file before the first lookup, declares the EM square that the font unit values refer to, which allows values to be transferred between fonts with different EM resolutions. If the EM is not declared within the GPOS text file, it is assumed to equal that of the font the file is used for. The format is:

EM <T> decimal value

Note

The inclusion of the EM value is informational. It also allows the sharing of GPOS source between TTF and OTF fonts that use different EM values, if used with a compiler that re-scales values appropriately.
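A compiler that re-scales values could work along the following lines. This is only a sketch in Python: the function names parse_em and rescale are hypothetical, and rounding to the nearest integer with Python's built-in round (which rounds exact halves to the even neighbour) is an assumption, since this document does not prescribe a rounding rule.

```python
from typing import Optional


def parse_em(line: str) -> Optional[int]:
    """Return the value of an 'EM <T> value' line, or None for any other line."""
    key, sep, value = line.rstrip("\r\n").partition("\t")
    # Keywords other than the first line of the file are not case sensitive.
    if not sep or key.strip().lower() != "em":
        return None
    return int(value.strip())


def rescale(value: int, source_em: int, target_em: int) -> int:
    """Scale a font-unit value from the declared EM to the font's EM."""
    return round(value * target_em / source_em)
```

For example, a kerning value of -50 declared against an EM of 1000 becomes -102 in a font with an EM of 2048.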

Comments and Whitespace

Anything outside a block describing a lookup, the script table or the feature table is ignored. That means you can use lines between these blocks as a place to put comments, such as reminders of what lookups do. If needed, it is possible to insert comments inside lookup blocks by having lines start with the percent sign:

% this is a comment.

Blank lines and white space are ignored.

Glyph References

Glyphs are usually represented by their name from the ‘post’ table or ‘CFF ‘ table. If there is no format 2 post table or no names stored in the ‘CFF ‘ table then it is possible to refer to glyphs by Unicode or by glyph index.

To refer to a glyph by Unicode use U or u followed by a space, followed by the hexadecimal Unicode. To refer to a glyph by index use a # followed by a space followed by the index in decimal. For example: in a normal Latin font in standard order the lowercase a can be referred to in the following ways:

a
U 0061
u 0061
# 68

When decompiling a table to OTL Source format, a tool should use the ‘post’ or ‘CFF ‘ table name when possible. If no name can be obtained from either of those, it next attempts to use a Unicode reference. If a Unicode cannot be obtained, the last resort is using the glyph index. In this document glyphs in any of these 3 formats are referred to as <Gl>.
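The three glyph reference forms can be told apart with a few string checks. The Python sketch below is illustrative only; parse_glyph_ref and the tuple it returns are assumptions of this example, not part of the format.

```python
from typing import Tuple, Union


def parse_glyph_ref(ref: str) -> Tuple[str, Union[str, int]]:
    """Classify a <Gl> entry as a name, a Unicode value or a glyph index."""
    ref = ref.strip()
    if ref[:2] in ("U ", "u "):
        # U or u, a space, then the hexadecimal Unicode value.
        return ("unicode", int(ref[2:], 16))
    if ref.startswith("# "):
        # A number sign, a space, then the glyph index in decimal.
        return ("index", int(ref[2:]))
    return ("name", ref)
```

Because the space after U, u or # is required, glyph names that merely start with one of those characters, such as uni0061, are still treated as names.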

Lookups

In OpenType Layout tables all information regarding substitutions and positioning is stored in so-called lookup tables: lists with glyphs and values. In text format each of these lists begins with:

lookup <T> label <T> lookup type

and ends with:

lookup end
Label
A label can be anything which doesn’t have a tab or return in it, a description of the content for example. In decompiled lookups it is the index number of the lookup. These labels are used in the feature table to tie the lookup to a feature. They are also used in context or chained context lookups to tie a rule to a lookup.
Lookup Type

Describes the type of lookup. The following keywords are used to identify the various lookup types:

Lookup Flags

Each lookup has four flag values and an optional mark attachment type associated with it, which can be set at the beginning of the lookup block:

FlagType <T> FlagSetting

The flag settings are yes or no. Flag types are:

  • RightToLeft

    When set to yes, indicates that the lookup applies to right to left scripts such as Arabic. As of OpenType 1.3 only to be used for cursive attachment lookups, but it is often set for Arabic in older fonts.

  • IgnoreBaseGlyphs

    When the lookup is applied, all glyphs classified as base in the GDEF table are ignored, when set to yes.

  • IgnoreLigatures

    When the lookup is applied, all glyphs classified as ligature in the GDEF table are ignored, when set to yes.

  • IgnoreMarks

    When the lookup is applied, all glyphs classified as mark in the GDEF table are ignored, when set to yes.

If a flag is not listed in the lookup, the default value no is used.
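The flag lines map onto the low bits of the binary lookupFlag field. The bit values below (0x0001, 0x0002, 0x0004, 0x0008) are those defined by the OpenType specification; the Python function lookup_flag_bits and the way it skips unrelated lines are assumptions of this sketch.

```python
from typing import Iterable

# Bit values of the lookupFlag field as defined in the OpenType specification.
FLAG_BITS = {
    "righttoleft": 0x0001,
    "ignorebaseglyphs": 0x0002,
    "ignoreligatures": 0x0004,
    "ignoremarks": 0x0008,
}


def lookup_flag_bits(lines: Iterable[str]) -> int:
    """Combine 'FlagType <T> FlagSetting' lines into one integer.

    Flags that are not listed, or are set to no, contribute nothing,
    which matches the default value of no.
    """
    bits = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2:
            continue  # not a flag line
        key, setting = fields[0].strip().lower(), fields[1].strip().lower()
        if key in FLAG_BITS and setting == "yes":
            bits |= FLAG_BITS[key]
    return bits
```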

Mark Attachment

The Mark Attachment Type is an optional part of the lookup flags which can be used to have marks ignored except for one specified class of marks:

markattachmenttype <T> mark class number

For example, in Telugu, this is used where subscript consonant marks need to be attached to base consonant glyphs with a mark to base lookup, while between base and subscripts there can be other types of marks. By setting the mark attachment type to the class of the subscript marks, the other marks are ignored. By having IgnoreMarks no and markattachmenttype specifying a class, you have an IgnoreMarks setting which is between yes and no: “IgnoreMarks other than these”.

In order to use this you need a GDEF with a MarkAttachmentClass table.

Mark Filtering

Another optional lookup flag is markfiltertype. This key, followed by a <T> and a single number representing a glyph set index (defined in the GDEF table) allows behavior similar to mark attachment classes. The advantage of this over mark attachment classes is that a glyph may belong to multiple glyph sets.

Note that the use of this is mutually exclusive with markattachmenttype per lookup. As such, if both flags are encountered for a single lookup, only the markfiltertype should be used.

The set number specified for markfiltertype must be defined in the GDEF. If the set number is out-of-range, a warning should be issued and no filtering should be applied.
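One way to read the two rules above is sketched below in Python. The function resolve_mark_option, its dictionary argument and the use of the warnings module are all assumptions of this example. In particular, falling back to no filtering at all when the set number is out of range, even if a markattachmenttype is also present, is this sketch's interpretation of the text.

```python
import warnings
from typing import Dict, Optional, Tuple


def resolve_mark_option(options: Dict[str, int],
                        gdef_set_count: int) -> Optional[Tuple[str, int]]:
    """Pick the mark option that applies to one lookup.

    `options` holds the numbers given for "markfiltertype" and/or
    "markattachmenttype"; `gdef_set_count` is the number of mark glyph
    sets defined in the GDEF.
    """
    if "markfiltertype" in options:
        index = options["markfiltertype"]
        if 0 <= index < gdef_set_count:
            return ("markfiltertype", index)
        warnings.warn("mark filter set %d is not defined in GDEF" % index)
        return None  # out of range: apply no filtering
    if "markattachmenttype" in options:
        return ("markattachmenttype", options["markattachmenttype"])
    return None
```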

Subtables
Use % subtable or subtable end to mark a subtable boundary. Use lookup end (not subtable end) to terminate the final subtable of a lookup (or lookups with only one subtable).

Feature Table

The feature table is used to assign lookups to features. These assignments are done within a block starting with:

feature table begin

and ending with:

feature table end

Within the block assignments are done in one line for each feature:

index <T> feature tag <T> list of lookups

The index is a number starting with 0 for the first entry. These numbers are ignored by the compiler but can be used as a reminder when the features are assigned to a script or language in the script table.

The feature tag is one of the four character feature tags as they are published in the OpenType specification.

The list of lookups contains the label of one or more lookup tables, separated by commas. If it is necessary to declare a feature without lookups (as it sometimes is), a hyphen takes the place of the list.
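A line of the feature table can be read as follows. This Python sketch is illustrative: the function name parse_feature_line is hypothetical, and trimming spaces around lookup labels is an assumption made so that lists written as “a, b” work.

```python
from typing import List, Tuple


def parse_feature_line(line: str) -> Tuple[str, List[str]]:
    """Split 'index <T> feature tag <T> list of lookups' into tag and labels.

    The leading index is discarded, mirroring the fact that a compiler
    ignores it. A hyphen in the last field means "no lookups".
    """
    _index, tag, lookups = line.rstrip("\r\n").split("\t")
    if lookups.strip() == "-":
        return tag, []
    return tag, [label.strip() for label in lookups.split(",")]
```

The lookup labels used when calling it, for example lig-basic, are made up for illustration.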

Feature table example

Note

Normally features are expected to be listed in alphabetical order in feature lists, in the same way that they are stored in the font. But it is often easier to have a source file with the features in the order in which they are applied, i.e. following the lookup order. Features are expected to be sorted when the feature list is compiled.

Script Table

The script table is used to assign features to scripts or languages. These assignments are done within a block starting with:

script table begin

and (predictably) ending with:

script table end

Within the block assignments are done in one line for each script/language:

script tag <T> language tag <T> requiredFeature <T> list of features

The script tag is one of the four character script tags as they are published in the OpenType specification.

The language tag is one of the 3 character language system tags as published in the OpenType specification. If the features need to be applied to the script in general instead of one language in particular, use the keyword default.

The required feature entry can contain one index number of one of the features in the feature table. However in most cases this field is empty, as it is an OpenType recommendation not to use required features. Make sure an extra <T> for an empty required feature is present.

The list of features contains one or more indices of feature definitions as they are used in the feature table. Entries are separated by commas. It is sometimes necessary to have an entry for a particular script in either GPOS or GSUB even though no features use that table, because without the entry the features in the other table are ignored. In that case the list of features field can be empty as well.
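The following Python sketch reads one script table line, including the empty required feature and empty feature list cases. The name parse_script_line is hypothetical. Padding short lines back to four fields is an assumption added here because removing trailing tabs, as advised earlier, shortens a line whose last fields are empty.

```python
from typing import List, Optional, Tuple


def parse_script_line(line: str) -> Tuple[str, str, Optional[int], List[int]]:
    """Parse 'script tag <T> language tag <T> requiredFeature <T> list of features'."""
    fields = line.rstrip("\r\n").split("\t")
    fields += [""] * (4 - len(fields))  # tolerate stripped trailing tabs
    script, language, required, features = fields[:4]
    required_index = int(required) if required.strip() else None
    feature_indices = [int(f) for f in features.split(",") if f.strip()]
    return script, language, required_index, feature_indices
```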

Script table example


single (substitution)

Two entries per line; replaces the input glyph by the output glyph:

<input Gl> <T> <output Gl>

GSUB single lookup example

multiple (substitution)

At least two entries per line; replaces the input glyph by one or more output glyphs:

<input Gl> <T> <output Gl 1> <T> <output Gl 2> ....

GSUB multiple lookup example

alternate (substitution)

At least two entries per line; replaces the input glyph by one of the output glyphs:

<input Gl> <T> <output Gl 1> <T> <output Gl 2> ....

GSUB alternate lookup example

ligature (substitution)

At least two entries per line; replaces the input glyphs by one ligature (the first entry):

<ligature Gl> <T> <input Gl 1> <T> <input Gl 2> ....

GSUB ligature lookup example
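Because the ligature comes first on the line and its components follow, a reader of this format has to reverse the order to get an input-to-output mapping. A minimal Python sketch, with the hypothetical name parse_ligature_line and made-up glyph names in the usage:

```python
from typing import Tuple


def parse_ligature_line(line: str) -> Tuple[Tuple[str, ...], str]:
    """Return (input glyph sequence, ligature glyph) for one lookup line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        raise ValueError("a ligature line needs at least two entries")
    # First entry is the output ligature; the rest are the input glyphs.
    return tuple(fields[1:]), fields[0]
```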

single (positioning)

Keyword and two entries per line. The type of positioning (advance modification or x/y movement) is specified by the keyword, the amount by the value:

position type <T> <Gl> <T> value

GPOS single lookup example

pair or kernset (positioning)

It is possible to do pair kerning in two formats: one is a list of kern pair glyphs in the same way as we do in kern tables or in AFM files. This is most suited for small tables. The second is assigning glyphs to classes. Then the kerning values are declared for combinations of these classes. This format is best suited for larger kern tables with many accented characters sharing metrics. The glyph list format uses lines in this format:

position type <T> <LeftGl> <T> <RightGl> <T> value

Keyword and three entries per line. The type of positioning (advance modification or x/y movement) is specified by the keyword, the amount by the value. The left glyph needs to be interpreted as being the first in the text (in the case of right-to-left fonts it is therefore the right glyph).

The class format starts by two lists assigning glyphs to left/first and right/second classes. It is then followed by a list similar to the glyph list format, with class indices taking the place of glyph references:

firstclass definition begin
<Gl> <T> class
class definition end

secondclass definition begin
<Gl> <T> class
class definition end

position type <T> firstClass <T> secondClass <T> value

Class definitions have two entries per line, glyph reference and class index. Start class values at 1. The class value 0 refers to all glyphs that aren’t assigned to another class.

A kernset is a combination of the glyph and class formats: a subtable of glyph pairs precedes a subtable of class pairs, with the glyph pairs acting as “exceptions” to the class pair definitions.
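The “exceptions first” behaviour of a kernset can be sketched as a two-step lookup. The Python below is a simplification under stated assumptions: kern_value and its dictionary arguments are hypothetical, only one value per pair is modelled, and subtable coverage is not modelled at all. Class 0 is used for any glyph that is not listed in a class definition.

```python
from typing import Dict, Tuple


def kern_value(first: str, second: str,
               glyph_pairs: Dict[Tuple[str, str], int],
               first_classes: Dict[str, int],
               second_classes: Dict[str, int],
               class_pairs: Dict[Tuple[int, int], int]) -> int:
    """Look up a pair adjustment, glyph pairs taking priority over classes."""
    if (first, second) in glyph_pairs:
        return glyph_pairs[(first, second)]  # explicit exception
    # Glyphs without an assignment fall into class 0.
    key = (first_classes.get(first, 0), second_classes.get(second, 0))
    return class_pairs.get(key, 0)
```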

GPOS pair (glyph) lookup example

GPOS pair (class) lookup example

GPOS kernset lookup example

cursive (positioning)

The Anchor type keyword is either entry, for where a stroke enters the glyph, or exit, for where a stroke leaves the glyph. The anchor coordinate is the location of the anchor. The index of the point of the outline at the coordinate is optional:

Anchor type <T> <Gl> <T> anchorCoordinate <T> pointIndex

GPOS cursive lookup example

mark to base and mark to mark (positioning)

The GlyphType keyword is either base or mark. It is followed by the glyph reference and the Anchor class, which is a decimal index number. The coordinate of the anchor point is required, but the index of the point in the outline at the coordinate is optional:

GlyphType <T> <Gl> <T> anchorClass <T> anchorCoordinate <T> pointIndex
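A line in this form can be taken apart as shown in the Python sketch below. The function parse_anchor_line and the dictionary keys it returns are assumptions of this example, and the glyph names in any call are for illustration only. The optional point index is returned as None when it is absent.

```python
from typing import Dict, Optional, Union


def parse_anchor_line(line: str) -> Dict[str, Union[str, int, None]]:
    """Parse 'GlyphType <T> <Gl> <T> anchorClass <T> anchorCoordinate <T> pointIndex'."""
    fields = line.rstrip("\r\n").split("\t")
    glyph_type, glyph, anchor_class, coordinate = fields[:4]
    # The coordinate is an x and y pair separated by a comma, e.g. 238,1058.
    x_text, y_text = coordinate.split(",")
    point: Optional[int] = None
    if len(fields) > 4 and fields[4].strip():
        point = int(fields[4])
    return {
        "type": glyph_type.strip().lower(),
        "glyph": glyph,
        "class": int(anchor_class),
        "x": int(x_text),
        "y": int(y_text),
        "point": point,
    }
```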

In a mark to mark lookup, the marks that other marks are attached to (“base marks”) use the base keyword.

In scripts such as the Indic scripts, where ligatures are used as single entities, accent positioning on these glyphs can be done with a mark to base rather than a mark to ligature lookup.

Within a lookup marks can belong to one class only. If there is more than one class of marks, all base glyphs are expected to have an anchor for each class. For marks with several anchors or when base glyphs do not have anchors for all classes of marks, it is recommended to use separate lookups for the classes.

GPOS mark to base/mark lookup example

mark to ligature (positioning)

This type of lookup differs from mark to base in that it allows marks to be positioned over a specific component part of a ligature glyph. The declaration of attachment points on marks is identical to that in mark to base lookups:

mark <T> <Gl> <T> anchorClass <T> anchorCoordinate <T> pointIndex

The mark keyword is followed by the glyph reference and the Anchor class. The coordinate of the anchor point is required, but the index of the point in the outline at the coordinate is optional.

The declarations for the attachment points include the index of the component, and the total number of components which the ligature represents:

ligature <T> <Gl> <T> componentIndex <T> componentCount <T> anchorClass <T> anchorCoordinate <T> pointIndex

The index used for the first component is 1.

Fonts can support any number of anchor classes. However marks can only have 1 anchor point and belong to 1 class only within a single lookup. Use separate lookups for cases where marks have more than one attachment point for use in a mark to ligature table.

If the lowest anchor number isn’t 0 or if there are gaps in the class numbering, the font tool should renumber or this needs to be done manually.
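The renumbering mentioned above amounts to mapping the class numbers in use onto 0, 1, 2 and so on while keeping their order. A minimal Python sketch, with the hypothetical name renumber_classes:

```python
from typing import Dict, List, Tuple


def renumber_classes(classes: List[int]) -> Tuple[List[int], Dict[int, int]]:
    """Close gaps in anchor class numbers so that they start at 0.

    Returns the renumbered list together with the old-to-new mapping,
    which is needed to update anything else that refers to the classes.
    """
    mapping = {old: new for new, old in enumerate(sorted(set(classes)))}
    return [mapping[c] for c in classes], mapping
```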

context (substitution and positioning)

This type of lookup is not used to directly list positioning or substitution actions. In a context table a set of rules is listed, where separate lookups are applied to certain glyph sequences. The general format of a rule is:

context type <T> inputSequence <T> action1 <T> action2 ...

First the type of context is declared. The input sequence consists of two or more entries, each representing a glyph or one of a group of glyphs. There can be one or more action items, which are used to declare what lookup is applied to which member of the sequence. These lookups follow the context lookup they are referenced by.

There are three types of contexts, each having its own keyword.

glyph: In this type the input sequence is a list of two or more glyph references separated by commas. For example, to apply a kerning positioning to an F in the context in which F is followed by a period and a quoteright:

glyph <T> F, period, quoteright <T> 1,F-kern-lookup-label

In this type of context no additional information is required.

class: In this type the input sequence is a list of two or more glyph classes separated by commas. The list of rules is preceded by a class definition. For example, to apply a kerning positioning to an F in the context in which F is followed by a period or a comma and a quoteright or a quotesingle:

class definition begin
F <T> 1
period <T> 2
comma <T> 2
quoteright <T> 3
quotesingle <T> 3
class definition end

class <T> 1, 2, 3 <T> 1,F-kern-lookup-label

Start numbering classes at 1. Class 0 can be used to refer to any glyph not classified otherwise.

coverage: In this type it is possible to have one rule per sub-table only. The rule is preceded by two or more coverage tables, which are lists of glyphs. The input sequence refers to these lists. This works a bit like the class format, with the difference that a glyph can be in more than one list/class. The example with the F would look like this:

coverage definition begin <T> 0
F
coverage definition end

coverage definition begin <T> 1
period
comma
coverage definition end

coverage definition begin <T> 2
quoteright
quotesingle
coverage definition end

coverage <T> 1,F-kern-lookup-label

Note that in the rule there is no entry for the sequence, because there is only one sequence possible. Each coverage table has an index number, starting with zero for the first.

context (class) lookup example

context (coverage) lookup example

chained (substitution and positioning)

This type of lookup is not used to directly list positioning or substitution actions. In a chained context table a set of rules is listed, where separate lookups are applied to certain glyph sequences. It differs from the context lookup in that the context of the glyphs that are modified is stored separately from those glyphs: a chain of contexts rather than one context. The general format of a rule is:

context type <T> backtrackSequence <T> inputSequence <T> lookaheadSequence <T> action1 <T> action2 ...

First the type of context is declared. The input sequence consists of one or more entries, each representing a glyph or one of a group of glyphs. These are the glyphs that the actions are applied to. The backtrack and lookahead sequences are used to give the context of the glyphs that are modified, but elements in these sequences cannot be modified themselves. A context can have both a backtrack and a lookahead, or just one of them. If only one is present, the entry for the other can remain empty in the rule, but the number of <T>’s has to be complete. There can be one or more action items, which are used to declare what lookup is applied to which member of the sequence. These lookups follow the context lookup they are referenced by.
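The requirement that the number of tabs stays complete, even when a sequence is empty, is what lets a reader split a rule by position. The Python sketch below illustrates this for rules that carry their sequences on the line (the coverage type has no sequence fields and is not handled). The names split_sequence and parse_chained_rule are hypothetical.

```python
from typing import Dict, List, Union


def split_sequence(field: str) -> List[str]:
    """Split a comma-separated sequence; an empty field gives an empty list."""
    return [item.strip() for item in field.split(",") if item.strip()]


def parse_chained_rule(line: str) -> Dict[str, Union[str, List[str]]]:
    """Parse 'type <T> backtrack <T> input <T> lookahead <T> action1 ...'."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected a keyword, three sequences and an action")
    keyword, backtrack, input_seq, lookahead = fields[:4]
    return {
        "type": keyword.strip().lower(),
        "backtrack": split_sequence(backtrack),
        "input": split_sequence(input_seq),
        "lookahead": split_sequence(lookahead),
        "actions": fields[4:],  # left unparsed: 'index,label' strings
    }
```

A rule without a backtrack sequence still has two tabs between the keyword and the input sequence, so the fields do not shift.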

There are three types of contexts, each having its own keyword.

glyph: In this type the sequences are lists of one or more glyph references separated by commas. For example to apply a kerning positioning to a period in the context in which F is followed by a period and a quoteright:

glyph <T> F <T> period <T> quoteright <T> 1,F-kern-lookup-label

In this type of context no additional information is required.

class-chain: In this type the sequences are lists of one or more glyph classes separated by commas. The list of rules is preceded by a set of class definitions. For example, to apply a kerning positioning to a comma or period in the context in which F is followed by a period or a comma and a quoteright or a quotesingle:

backtrackclass definition begin
F <T> 1
class definition end

class definition begin
period <T> 1
comma <T> 1
class definition end

lookaheadclass definition begin
quoteright <T> 1
quotesingle <T> 1
class definition end

class-chain <T> 1 <T> 1 <T> 1 <T> 1,F-kern-lookup-label

Start numbering classes at 1. Class 0 can be used to refer to any glyph not classified otherwise. The backtrack and lookahead definitions are optional, but at least one of them has to be present. See “More about chained context” below for details.

coverage: In this type it is possible to have one rule per sub-table only. The rule is preceded by two or more coverage tables, which are lists of glyphs. The sequences refer to these lists. This works a bit like the class format, with the difference that a glyph can be in more than one list/class. The example with the F would look like this:

backtrackcoverage definition begin
F
coverage definition end

inputcoverage definition begin
period
comma
coverage definition end

lookaheadcoverage definition begin
quoteright
quotesingle
coverage definition end

coverage <T> 1,F-kern-lookup-label

Note that in the rule there is no entry for the sequence, because there is only one sequence possible. There can be any number of these coverage definitions. At least one inputcoverage table and at least one of the others are required.

chained (class) lookup example

chained (coverage) lookup example

reversechained (substitution)

This format combines chained context rules with single substitutions within a lookup. Either a backtrack coverage definition or lookahead coverage definition (or both) defining glyphs for the contexts, followed by single (glyph) substitution definitions:

backtrackcoverage definition begin
<Gl1>
...
<Gln>
coverage definition end

lookaheadcoverage definition begin
<Gl1>
...
<Gln>
coverage definition end

<input Gl1> <T> <output Gl1>
...
<input Gln> <T> <output Gln>

It is important to note that for reversechained lookups, processing of the input glyph sequence goes from end to start, the opposite of other lookup types. This type is designed for Arabic and scripts like it, where the shape of a glyph is determined by a following glyph. See the OpenType specification for more information.
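The effect of processing from end to start can be seen in a small simulation. The Python sketch below is a simplified model, not an implementation of the OpenType algorithm: it assumes at most one backtrack coverage and one lookahead coverage, each describing a single neighbouring glyph, and it ignores lookup flags. The function name and the glyph names in the usage note are made up.

```python
from typing import Dict, List, Optional, Set


def apply_reverse_chain(glyphs: List[str],
                        backtrack: Optional[Set[str]],
                        lookahead: Optional[Set[str]],
                        substitutions: Dict[str, str]) -> List[str]:
    """Apply single substitutions while walking the glyph run backwards."""
    out = list(glyphs)
    for i in range(len(out) - 1, -1, -1):  # last glyph first
        if out[i] not in substitutions:
            continue
        if backtrack is not None and (i == 0 or out[i - 1] not in backtrack):
            continue
        if lookahead is not None and (i + 1 >= len(out) or out[i + 1] not in lookahead):
            continue
        out[i] = substitutions[out[i]]
    return out
```

With the substitution a to a.alt and a lookahead coverage of a.alt and z, the run a a a z becomes a.alt a.alt a.alt z: each substitution creates the lookahead context for the glyph before it, which is why the pass has to run backwards.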

GSUB reversechained lookup example

Shared by various lookups

Value
A value in Font Units.
Position Type

The following keywords are used to describe the various positioning types used in GPOS table entries:

positionType<T>...

Position types are:

  • x advance

    increase or decrease of advance width (hmtx)

  • y advance

    increase or decrease of vertical advance width (vmtx)

  • x placement

    shift character left or right without alteration of advance

  • y placement

    shift character up or down without alteration of advance

In lookups with kern pairs these keywords are extended with left or right in front of them, referring to the first and second element in the current reading direction. For normal kerning we would therefore use left x advance.

Anchor coordinate
Coordinate of an anchor point in font units as an x and y pair, separated by a comma. For example: 238,1058 for horizontal 238 and vertical 1058.
Context action rule

This consists of an index into a context sequence, followed by the label of the lookup which is applied to the glyph(s) that might be in the position of that index. Index and label are separated by a comma. If there is more than one lookup applied to a sequence the actions are separated by a <T>. The position index of the first element of the sequence is 1. Please be aware: in class context tables it is easy to confuse these index numbers with the class numbers used in the context sequence. The number in the action rule applies to the position in the sequence, not to the classes in the sequence.
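An action entry can be split at its first comma, which keeps the rest of the entry intact as the label. The Python sketch below is illustrative; the name parse_action is hypothetical, and trimming spaces around the label is an assumption of this example. The range check reflects the point made above that the number is a 1-based position in the sequence, not a class number.

```python
from typing import Tuple


def parse_action(field: str, sequence_length: int) -> Tuple[int, str]:
    """Split 'index,label' and check the index against the sequence length."""
    index_text, sep, label = field.partition(",")
    if not sep:
        raise ValueError("an action needs an index and a label: %r" % field)
    index = int(index_text)
    if not 1 <= index <= sequence_length:
        raise ValueError("position %d is outside the sequence" % index)
    return index, label.strip()
```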

When a context lookup is decompiled the label of the lookup is the index number of the lookup.

More about chained context

Suppose we have an input context of three glyphs, where we need to change the middle if it is surrounded by two others:

rule 1: If ABC then ACC

and

rule 2: If CCC then CDC

These input sequences can appear separately, but also together, in which case

xxxxABCCxxxx needs to become xxxxACDCxxxx

The problem with normal context lookups is that the pointer is moved past the input context sequence once a match has been found. After ABC has been turned into ACC the pointer is moved to investigate the next input sequence Cxx and the CCC sequence is overlooked.

In the chained context lookup you can separate the input part from the rest of the context. This allows you to look for context while moving the pointer one step at a time, by having a single glyph as the input. That means that when you check the next input sequence, you can take into account the change just made, because the previous input sequence has become the context of the next.
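The difference in pointer movement can be reproduced with a toy simulation of the two rules. The Python below is a sketch under simplifying assumptions: every rule covers exactly three glyphs and replaces the middle one, and the functions apply_context and apply_chained are hypothetical models, not the OpenType algorithm.

```python
from typing import Dict, List, Tuple

Rules = Dict[Tuple[str, str, str], str]  # (before, target, after) -> replacement


def apply_context(glyphs: List[str], rules: Rules) -> List[str]:
    """Context lookup: all three glyphs are input, so a match skips all three."""
    out, i = list(glyphs), 0
    while i < len(out):
        window = tuple(out[i:i + 3])
        if window in rules:
            out[i + 1] = rules[window]
            i += 3  # pointer moves past the whole input sequence
        else:
            i += 1
    return out


def apply_chained(glyphs: List[str], rules: Rules) -> List[str]:
    """Chained lookup: one input glyph with one backtrack and one lookahead glyph."""
    out = list(glyphs)
    for i in range(1, len(out) - 1):  # pointer moves one glyph at a time
        key = (out[i - 1], out[i], out[i + 1])
        if key in rules:
            out[i] = rules[key]
    return out
```

Run on xxxxABCCxxxx with rules 1 and 2, the context version yields xxxxACCCxxxx, while the chained version yields xxxxACDCxxxx.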

In a context lookup we have one set of classes, which you use to define the context for the rules. But in a chained context lookup in class format we have also a set of classes for the context before the input (the backtrack classes) and a set for what comes after (the lookahead classes).

Of course you can have both lookahead and backtrack, or just one of the two.

Within these separate class tables you count the classes separately. In the sample above we see three class 1 items in the sequence: these are class 1 items from different classifications. It isn’t saying bird #1, bird #1, bird #1. It is saying duck #1, parrot #1, seagull #1.

The number 1 in the rule is a different number altogether; it doesn’t refer to glyphs. It refers to the position in the input sequence. The input sequence contains 1 glyph (out of the possibly more than one glyph in the class list). So the rule says:

If you have an input sequence of one of the glyphs of class 1 in the input class table, surrounded by one of class 1 of the lookahead and backtrack class tables, then apply lookup “F-kern-lookup-label” to the first element of the input sequence. Then move the text pointer by the length of the input sequence, which is one.

GDEF Table

Note

The typical way to generate a GDEF is to compile an MTap table.

The GDEF table is where numerous glyph properties, independent of the GPOS and GSUB, are defined. This includes things like glyph categorization (Base, Mark, Ligature, etc.), anchor points, ligature carets, and mark categorization.

Similar to GSUB and GPOS, the first line of a text file representing a GDEF is:

FontDame GDEF table

Glyph Classes

Glyphs defined in GDEF glyph classes are what are used in the GSUB and GPOS lookup flags (ignoreBaseGlyphs, ignoreLigatures, ignoreMarks). Classes can be a number from 1 to 4:

  • 1 - Base
  • 2 - Ligature
  • 3 - Mark
  • 4 - Component (note, these are not referenced by the GSUB or GPOS)

Glyphs not explicitly assigned to a class are considered to be in class 0 (“no class”). The syntax for defining glyph classes is:

class definition begin
<Gl> <T> <classNumber>
class definition end

glyph class definition example

Attachment Point List

Attachment (anchor) points are point numbers on the glyph outline which can be used as references for positioning glyphs. Define anchor points with a glyph reference followed by one or more point numbers, separated by <T>:

attachment list begin
<Gl> <T> <point1> <T> <point2> ...
attachment list end

attachment point definition example

Ligature Caret Lists

Ligature Carets are x-coordinates on a glyph, designating where the caret (cursor) should be placed when selecting a component of a ligature glyph. For example, an ‘fi’ ligature might contain a single ligature caret somewhere between the right side of the ‘f’ and the left side of the ‘i’ portion of the glyph.

Ligature carets are defined with a glyph reference, followed by a number designating the number of carets, then that number of integers which are x-coordinate values, defining the caret positions:

carets begin
<Gl> <T> <numberOfCarets> <T> X1 <T> X2 ...
carets end
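Since each line states its own caret count, a reader can verify the count against the number of coordinates that follow. A minimal Python sketch, with the hypothetical name parse_caret_line and made-up values in any call:

```python
from typing import List, Tuple


def parse_caret_line(line: str) -> Tuple[str, List[int]]:
    """Parse '<Gl> <T> numberOfCarets <T> X1 <T> X2 ...' and check the count."""
    fields = line.rstrip("\r\n").split("\t")
    glyph, count = fields[0], int(fields[1])
    positions = [int(value) for value in fields[2:]]
    if len(positions) != count:
        raise ValueError("%s: expected %d carets, found %d"
                         % (glyph, count, len(positions)))
    return glyph, positions
```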

ligature caret definition example

Mark Attachment Classes

Mark Attachment Classes are an additional level of specifying marks. These can be used in conjunction with the markattachmenttype lookup flag to enable lookups to ignore marks other than those that are part of the class specified by the markattachmenttype flag as described above. A mark can only belong to a single mark attachment class.

Create mark attachment classes with a glyph reference and a class number, separated by <T>:

mark attachment class definition begin
<Gl> <T> <markClassNumber>
class definition end

mark attachment class definition example

Mark Glyph Sets (Mark Filtering Sets)

Mark glyph sets, also known as mark filter sets, are somewhat similar to mark attachment classes in that they allow a specific sub-category of marks to be ignored. The main difference is that a glyph can belong to any number of mark glyph sets. This allows greater flexibility when using sets in lookups with the markfiltertype flag as described above. However, like mark attachment classes, only a single set may be specified per lookup.

To define mark filter sets, use a glyph reference followed by a set number. Begin set numbering at 0. Font tools should automatically renumber and report if you forget, but most important is that any references to mark glyph sets in GSUB or GPOS lookups must reference the renumbered sets. Syntax:

markfilter set definition begin
<Gl> <T> <markSetNumber>
set definition end
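Unlike a class definition, where each glyph maps to one number, a mark filter set definition is naturally read into one glyph set per set number, so that the same glyph may appear in several sets. A Python sketch, with the hypothetical name parse_mark_sets and illustrative glyph names in any call:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_mark_sets(lines: Iterable[str]) -> Dict[int, Set[str]]:
    """Collect '<Gl> <T> markSetNumber' lines into {set number: glyphs}."""
    sets: Dict[int, Set[str]] = defaultdict(set)
    for line in lines:
        glyph, number = line.rstrip("\r\n").split("\t")
        sets[int(number)].add(glyph)
    return dict(sets)
```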

markfilter set definition example

Other Source Files

Other table source files can be represented as Font Chef tables. These files begin with a first line indicating Font Chef format and the table tag as follows:

Font Chef Table <tag>

cmap Table

Following the tag entry on the first line are cmap subtable definitions. A subtable definition begins with a declaration of the form:

cmap subtable N

Next are a few lines indicating the platform, encoding, and language IDs and the format:

platformID <T> N
encodingID <T> N
format <T> N
language <T> N

Following the header information are 2 <T>-separated entries to designate the code-to-glyph relationship:

0xXXXX <T> <glyphName>

An optional 3rd column appears in dump files to indicate the glyph index. This column is ignored when loading:

#N

At the end of the subtable is the token:

end subtable
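Putting the pieces of a cmap subtable together, a reader might look like the Python sketch below. It is only an illustration of the layout described above: the function parse_cmap_subtable is hypothetical, treating the keywords as not case sensitive is an assumption, and only glyph names (not other glyph reference forms) are handled. The optional third column is ignored, as it is when loading.

```python
from typing import Dict, Iterable, Tuple

HEADER_KEYS = ("platformid", "encodingid", "format", "language")


def parse_cmap_subtable(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Read one subtable and return (header values, code-to-glyph mapping)."""
    header: Dict[str, int] = {}
    mapping: Dict[int, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.lower().startswith("cmap subtable"):
            continue
        if line.lower() == "end subtable":
            break  # anything after the end token belongs elsewhere
        fields = line.split("\t")
        key = fields[0].strip().lower()
        if key.startswith("0x"):
            mapping[int(key, 16)] = fields[1].strip()  # third column ignored
        elif key in HEADER_KEYS:
            header[key] = int(fields[1])
    return header, mapping
```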

cmap example

