LibreOffice/Collabora Online Typography

Line break interoperability and state-of-the-art ISO OpenDocument/web typography in open source office suites

Table of Contents

1 Summary

2 Example

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

3.2 Fix line count & page count interoperability for plain text (2023-10-17)

Visual comparison

3.3 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

3.4 Fix handling text portions, cursor positions/selection and shrinking algorithm

Multiple text portions

Cursor positions

Freezing at underflow

Result of the second analysis of the shrinking algorithm

4 Exclude words from hyphenation

5 Don’t hyphenate across a column, page or spread

5.1 Adding hyphenation-keep

Developments

User interface

5.2 Support column, page and spread types, DOCX import

User interface

Don’t hyphenate across a page

Don’t hyphenate across a spread

Don’t hyphenate across a column

Hyphenate across a column, except in the last one

Developments (Writer core, DOCX filter and help content)

5.3 Support hyphenation-keep in linked frames, in tables and last full line of paragraphs, DOCX export

Improved interoperability in Writer’s DOCX export

DOCX export of hyphenation enabled in “Text body” style

Last full line of paragraphs

User interface

Tables

Linked frames on the same pages

Linked frames not on the same spread

Linked frames on the same spread

Developments (Writer core, DOCX filter and help content)

Manual tests

Fixed DOCX export

Disable hyphenation of last full line of paragraphs

Tables

Linked frames

Linked frames only on following right pages

1 Summary

LibreOffice, the open source office suite, is the reference implementation of the ISO OpenDocument (ODF) format. As part of LibreOffice Technology, Collabora Online is key for the digital sovereignty of open source online document editing. Metrically equivalent fonts can no longer guarantee MS Word interoperability because of the undocumented changes in the MS Word line break algorithm after the ODF and OOXML standardization. The planned project solves this fundamental problem. It also adds the interoperability and CSS 4 web typography features and fixes described in the following sections to the LibreOffice line break algorithm and layout.

All the developments are libre and open source. They will be integrated into the main development branch of LibreOffice and released with its next stable release.

2 Example

Layout differences between MS Word (top) and LibreOffice (bottom): since MS Word 2013, the line break algorithm for justified paragraphs can shrink the spaces between words. Only disabling justification results in the same layout, which moves the word “Antarctica” to the second line in MS Word, too. The bottom paragraphs contain hundreds of adjacent spaces, which shrink only in MS Word.

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

With 0.1 pt resolution, up to 2% shrinking was measured for plain justified lines (no direct character formatting, no hyphenation). The following picture shows how MSO implements the shrinking by using the available spaces. However, the maximal shrinking does not depend on the number of spaces, at least in a line with enough spaces.
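The following minimal sketch models this first measurement. It assumes that a justified line may exceed the available width by at most 2% of that width, and that the excess is distributed evenly among the spaces of the line. The function names are hypothetical and do not come from LibreOffice. The second analysis in section 3.4 later replaced this model with shrinking of the spaces only.

```python
def fits_with_line_shrinking(natural_width, available_width, max_line_shrink=0.02):
    """First-analysis model (hypothetical): a justified line fits if its
    natural width exceeds the available width by at most max_line_shrink
    of that width. Widths are in points."""
    return natural_width <= available_width * (1 + max_line_shrink)


def shrink_per_space(natural_width, available_width, spaces):
    """Distribute the excess width (pt) evenly among the spaces of the line."""
    excess = max(0.0, natural_width - available_width)
    return excess / spaces if spaces else 0.0


# A 350 pt line in a 347.53 pt wide paragraph exceeds it by about 0.7%.
print(fits_with_line_shrinking(350.0, 347.53))
print(round(shrink_per_space(350.0, 347.53, 50), 4))
```

In this model the limit depends only on the line width, which matches the observation that the maximal shrinking is independent of the number of spaces.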

(Black text: MS Word; red text: LibreOffice Writer. The black text contains extra character spacing after the text “3 Au” so that the position after the last space is the same.)

3.2 Fix line count & page count interoperability for plain text (2023-10-17)

Commit 7d08767b890e723cd502b1c61d250924f695eb98 “tdf#130088 tdf#119908 smart justify: fix DOCX line count + compat opt.” fixes the line count and page count differences. It is also the initial fix for the new justification.

Visual comparison

A simple lorem.docx document (attached to tdf#119908) was created in MSO 2016 to test and show the improved line break. The improvement solved the line/page count interoperability of plain text in the DOCX import of LibreOffice Writer:

(MSO, Writer before the fix, Writer after the fix. Blue marks: correct line break positions in MSO and the improved Writer; red marks: bad line breaks in Writer before the fix.)

3.3 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

Shrinking was added by commit 17eaebee279772b6062ae3448012133897fc71bb “tdf#119908 sw smart justify: fix justification by shrinking” and commit c1803de8a093739d189be54b2d9bd5634e9e79ee “tdf#119908 sw smart justify: add unit test”. These commits fix the temporarily exceeding lines by justifying them, i.e. by shrinking them to the paragraph width.
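Justification with shrinking can be thought of as distributing the signed difference between the available width and the natural width among the spaces. A positive value widens the spaces of an underfull line. A negative value shrinks the spaces of a temporarily exceeding line. The sketch below is a simplified, hypothetical model (one space width, no kerning, a single text portion). It is not the Writer implementation.

```python
def justified_space_width(word_widths, space_width, available_width):
    """Return the width (pt) of each space after justifying a line.

    word_widths: widths of the words in the line (pt)
    space_width: natural width of one space (pt)
    available_width: width of the paragraph line (pt)
    """
    spaces = len(word_widths) - 1
    if spaces <= 0:
        return space_width  # nothing to distribute
    natural_width = sum(word_widths) + spaces * space_width
    # Signed adjustment: negative when the line exceeds the available width
    delta = (available_width - natural_width) / spaces
    return space_width + delta


words = [40.0, 30.0, 50.0]  # 120 pt of text, two spaces
print(justified_space_width(words, 3.34, 130.0))  # widened spaces
print(justified_space_width(words, 3.34, 126.0))  # shrunk spaces
```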

The following composite picture shows the previous (red) and the shrunk (black) lines in Writer. With this commit, the result is the same as in MS Word.

3.4 Fix handling text portions, cursor positions/selection and shrinking algorithm

The following composite screenshots of MS Word (red) and Writer (black) show the fixed handling of multiple text portions (middle) and the fixed shrinking algorithm (right). Both fixes resulted in the same line breaks as in MS Word (test document: lorempage.docx of Bug 158333; generated PDFs: Word, Writer).

The related commits in the LibreOffice code base:

Commit

Description

53de98b29548ded88e0a44c80256fc5e340d551e

tdf#158333 sw smart justify: fix multiple text portions

Multiple text portions broke the DOCX interoperability of justified paragraphs, e.g. when some part of a line contained direct character formatting.

20cbe88ce5610fd8ee302e5780a4c0821ddb3db4

tdf#119908 tdf#158419 sw smart justify: fix cursor position

The text cursor didn't follow the new word positions yet, because of an unsigned cast of the negative shrinking value.

7059a1858ddb044c5f3f0c8e0386d3e1d9dd2b5f

tdf#119908 tdf#158436 sw smart justify: fix freezing with NBSP

Stop shrinking during underflow, because it resulted in an endless layout loop, e.g. when a very short word was followed by a no-break space.

The problem was reported by Miklós Vajna (Collabora Productivity).

36bfc86e27fa03ee16f87819549ab126c5a68cac

tdf#119908 tdf#158776 sw smart justify: shrink only spaces

For interoperability, shrink only the spaces (up to 20%), not the lines (up to 2%).

8b393bba91111bd4f8988457f3a78b0306462bf2

tdf#159102 sw smart justify: fix automatic hyphenation

As previously happened with soft hyphens, automatic hyphenation could result in too much shrinking, because the calculation counted an extra, non-existent space in the line. Also, try to shrink the line only if it will likely contain a space.

(Note: commit 93ab1bc6be7b46226874810ec4fc3c61d5d0fc7c restored the unit test testOfz64109, which the second commit had removed by accident. Reported by Miklós Vajna.)

Multiple text portions

Text portions (spans or runs) come from the direct character formatting of the paragraphs. They can also simply result from editing the text, even without any formatting difference.

The first implementation took only the last text portion of the line into account. This didn't cause a problem when the line contained a single text portion. Otherwise, it resulted in different line breaks (less or no shrinking). The first screenshot shows that the test file contained multiple bad line breaks caused by the multiple text portions in the lines. The first commit fixed these three bad line breaks.
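The following hypothetical model represents a line as a list of text portions, each a string with its own space width. Counting the shrinkable spaces of the last portion only, as the first implementation did, underestimates the available shrinking whenever the line contains more than one portion. The `Portion` class, the function name and the default 20% ratio (taken from the later analysis in this section) are illustrative assumptions, not LibreOffice code.

```python
from dataclasses import dataclass


@dataclass
class Portion:
    """Hypothetical text portion: a run of text with one character format."""
    text: str
    space_width: float  # width of a space in this portion's font (pt)


def shrinkable_width(portions, ratio=0.2, last_only=False):
    """Total width (pt) that the spaces of a line may lose by shrinking.

    last_only=True mimics the first, buggy implementation.
    """
    considered = portions[-1:] if last_only else portions
    return sum(p.text.count(" ") * p.space_width * ratio for p in considered)


line = [Portion("Lorem ipsum dolor ", 3.34), Portion("sit amet", 3.0)]
print(shrinkable_width(line, last_only=True))  # only 1 space of the last portion
print(shrinkable_width(line))                  # all 4 spaces of the line
```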

Cursor positions

Selected text and the text cursor were in the wrong position. They followed the exceeding (i.e. not shrunk) line instead of the visible, shrunk line. The second commit adjusted cursor movement and text selection to the shrunk lines.
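The following minimal sketch illustrates the idea with a simplified line of words separated by spaces. The x position of each word must be computed from the shrunk space width of the visible line, using a signed (possibly negative) adjustment, not from the natural space width. The second function shows the kind of error described in the commit. When a negative adjustment is stored in an unsigned 32-bit integer, a small shrink turns into a huge positive value. Both functions are hypothetical illustrations, not Writer code.

```python
def word_start_positions(word_widths, space_width, space_adjustment):
    """x positions (pt) of the words in a line.

    space_adjustment is signed: negative values shrink each space.
    """
    positions = []
    x = 0.0
    for width in word_widths:
        positions.append(x)
        x += width + space_width + space_adjustment
    return positions


def as_unsigned32(value):
    """Reinterpret an integer as a C++ 32-bit unsigned value (modulo 2**32)."""
    return value & 0xFFFFFFFF


words = [40.0, 30.0, 50.0]
print(word_start_positions(words, 3.34, -0.34))  # about [0.0, 43.0, 76.0]
print(as_unsigned32(-34))  # the "shrink" becomes a huge positive shift
```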

Freezing at underflow

Extending the text layout code with shrinking resulted in an infinite loop, e.g. when a paragraph ending with a no-break space triggered the recalculation of the line with underflow. Disabling shrinking for underflow solved the problem.

Result of the second analysis of the shrinking algorithm

Removing spaces, instead of adding or replacing them, allowed a more precise comparison of MSO and LibreOffice. The following data, measured with the previous test file, show up to 20% shrinking of the spaces instead of up to 2% shrinking of the lines.

Measured values (paragraph width: 347.53 pt; max spaces in line: 104; space width: 3.34 pt):

spaces in line | allowed extra character spacing in the line, MS Word (pt) | allowed extra character spacing in the line, Writer (pt) | extra spacing by shrinking, i.e. difference (pt) | shrinking of the line | shrinking of the spaces
6 | 5.98 | 1.95 | 4.03 | 1.2% | 20.1%
5 | 8.08 | 4.65 | 3.43 | 1.0% | 20.5%
4 | 10.18 | 7.45 | 2.73 | 0.8% | 20.4%
3 | 12.23 | 10.25 | 1.98 | 0.6% | 19.8%
2 | 14.33 | 13.05 | 1.28 | 0.4% | 19.2%
1 | 16.43 | 15.75 | 0.68 | 0.2% | 20.3%

Low precision (approximated recalculation of multiple text portions?):

9 | 11.15 | 4.15 | 7 | 2.0% | 23.3%
12 | 9.63 | 1.35 | 8.28 | 2.4% | 20.6%
21 | 16.83 | 5.85 | 10.98 | 3.2% | 15.6%

The last commit changed the algorithm accordingly, to shrink only the spaces, by up to 20%. This resulted in the same line breaks for the previous test file and also for the originally reported test document of tdf#119908.
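The percentages in the measurement table can be recomputed from the measured values. The recomputation shows why “shrink the spaces up to 20%” describes the data better than a fixed line shrinking. The sketch below uses only the numbers of the table. The `max_shrinking` function, including its underflow guard (no shrinking when the line already fits, as described above), is an assumption about how the rule is applied, and all names are hypothetical.

```python
PARAGRAPH_WIDTH = 347.53  # pt
SPACE_WIDTH = 3.34        # pt

# (spaces in line, allowed extra spacing in MS Word, in Writer) from the table
MEASUREMENTS = [(6, 5.98, 1.95), (5, 8.08, 4.65), (4, 10.18, 7.45),
                (3, 12.23, 10.25), (2, 14.33, 13.05), (1, 16.43, 15.75)]


def shrink_ratios(spaces, word_pt, writer_pt):
    """Return (line shrinking %, space shrinking %) for one measurement."""
    extra = word_pt - writer_pt  # extra spacing gained by shrinking
    return (100 * extra / PARAGRAPH_WIDTH,
            100 * extra / (spaces * SPACE_WIDTH))


def max_shrinking(spaces, natural_width, available_width, ratio=0.20):
    """Maximal shrinking (pt) under the 'shrink only spaces' rule.

    No shrinking for underflow, i.e. when the line already fits.
    """
    if natural_width <= available_width:
        return 0.0
    return ratio * spaces * SPACE_WIDTH


for row in MEASUREMENTS:
    line, space = shrink_ratios(*row)
    print(f"{row[0]} spaces: line {line:.1f}%, spaces {space:.1f}%")
```

The line shrinking falls from 1.2% to 0.2% as the number of spaces decreases, while the shrinking of the spaces stays close to 20%. The small deviations, e.g. 20.4% instead of 20.3% for a single space, come from rounding the measured values to 0.01 pt.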

4 Exclude words from hyphenation

Hyphenation was only a paragraph-level feature in LibreOffice, while the OpenDocument standard also allows disabling hyphenation in character formatting. The following commits in the LibreOffice code base solved the problem by allowing words to be excluded from hyphenation:

Commit

Description

73bd04a71e741788a2f2f3b26cc46ddb6a361372

tdf#106733 xmloff: keep fo:hyphenate in character formatting

In the case of character formatting, map fo:hyphenate to the unused CharNoHyphenation character property to keep it during ODF import/export instead of losing it completely. This is the first step to disable hyphenation for single words or text spans in paragraphs with automatic hyphenation. Note: using fo:hyphenate as a character property is part of the ODF standard.

Note: the old workaround to disable hyphenation, changing the language of the text to None, had serious drawbacks: it lost spell checking and language-dependent text layout (supported by both the OpenType and Graphite font engines in LibreOffice).

b5e275f47a54bd7fee39dad516a433fde5be872d

tdf#106733 sw: implement CharNoHyphenation

Implement CharNoHyphenation character property to disable automatic hyphenation of words in paragraphs with enabled hyphenation.

Also fix the mapping of fo:hyphenate to CharNoHyphenation, using the automatic inversion of their boolean values defined by xmloff's XML_TYPE_NBOOL, as suggested by Michael Stahl.

9193e61d3e7b850b3715c848c09434e24855340b

tdf#106733 sw: fix bad downcast in SwTextNode::GetLang()

Fix bad cast of SvxNoHyphenItem to SvxLanguageItem reported by <https://ci.libreoffice.org/job/lo_ubsan/3049/>.

03c5a31a0f374a90fbc821718c14dc5f8a385adf

tdf#106733 sw cui: add CharNoHyphenation checkbox

Add a new checkbox "Exclude from hyphenation" to the Position tab of the Character formatting dialog window (UX design by Heiko Tietze).

With this, it's possible to disable hyphenation with direct character formatting (e.g. combined with Find All), or with character styles that have "Exclude from hyphenation" set. This feature conforms to the OpenDocument standard. Unlike the previous locale=None workaround, it keeps spell checking and locale-dependent text layout.

(Note: commit 1b83ebf42c535528b73baac2407b347f19070d07 temporarily disabled the unit test of tdf#159102, because hyphenation was missing on some test builds. Reported by Noel Grandin and Miklós Vajna.)

The following screenshot shows the new option in the Character formatting dialog window.
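The inverted mapping between the ODF attribute and the character property can be sketched with Python's standard xml.etree.ElementTree module. The sketch writes a character style that excludes its text from hyphenation. The ODF standard allows fo:hyphenate in style:text-properties, and its value is the negation of CharNoHyphenation, which is the inversion that XML_TYPE_NBOOL performs. The helper functions and the style name "NoHyphenation" are hypothetical. This is not LibreOffice's xmloff code.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
ET.register_namespace("style", STYLE_NS)
ET.register_namespace("fo", FO_NS)


def export_fo_hyphenate(char_no_hyphenation):
    """CharNoHyphenation=True is written as fo:hyphenate="false"."""
    return "false" if char_no_hyphenation else "true"


def import_fo_hyphenate(value):
    """fo:hyphenate="false" is read back as CharNoHyphenation=True."""
    return value == "false"


def character_style(name, char_no_hyphenation):
    """Build an ODF character (text family) style element."""
    style = ET.Element(f"{{{STYLE_NS}}}style",
                       {f"{{{STYLE_NS}}}name": name,
                        f"{{{STYLE_NS}}}family": "text"})
    ET.SubElement(style, f"{{{STYLE_NS}}}text-properties",
                  {f"{{{FO_NS}}}hyphenate": export_fo_hyphenate(char_no_hyphenation)})
    return style


odf_xml = ET.tostring(character_style("NoHyphenation", True), encoding="unicode")
print(odf_xml)
```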

5 Don’t hyphenate across a column, page or spread

This development adds new typographical features, standardized by OpenDocument, XSL and CSS 4, to LibreOffice Writer. The goal is MSO interoperability and also conformance with typographical rules and web standards.

Since MSO 2013, not only justified text but also left-aligned and otherwise aligned text has lost its interoperability, depending on hyphenation. The new default page layout algorithms of MSO truncate the last hyphenated word (before MSO 2013) or line (MSO 2013 and newer), following English typographical traditions and rules.

The OpenDocument standard also contains the same feature, fo:hyphenation-keep. However, it wasn’t implemented before, and LibreOffice didn’t even keep this setting during the import/export of an ODF document.

For more control over hyphenation and better conformance with standard typographical rules, LibreOffice Writer also implements the “hyphenation across” values of XSL and CSS 4 that OpenDocument doesn’t cover yet (loext:hyphenation-keep-type).

5.1 Adding hyphenation-keep

As a new development, LibreOffice now keeps the value of fo:hyphenation-keep. Its page layout also moves the last hyphenated line of the page (or column) to the next page (or column), similar to MSO 2016 and newer.

Developments

Commit

Description

9574a62add8e4901405e12117e75c86c2d2c2f21

tdf#132599 cui offapi sw xmloff: implement hyphenate-keep

Both parts of a hyphenated word shall lie within a single page with the ODF paragraph setting fo:hyphenation-keep="page".

The implementation follows the default page layout of MSO 2016 and newer by shifting the bottom hyphenated line to the next page (and to the next column).

Note: this is an MSO DOCX interoperability feature, also used in DTP software, XSL and CSS.

* Add checkbox/combobox to Text Flow in paragraph dialog
* Store property in paragraph model (com::sun::star::style::ParagraphProperties::ParaHyphenationKeep)
* Add ODF import/export
* Add ODF unit tests

New constants of com::sun::star::text::ParagraphHyphenationKeepType, containing ODF AUTO and PAGE (borrowed from XSL), and for the planned extension ParaHyphenationKeepType of ParagraphProperties:

– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)
– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last, the equivalent of hyphenation-keep, defined in https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

Note: like MSO, the implementation truncates only a single hyphenated line. Pages can still end in hyphenated lines (i.e. in the case of consecutive hyphenated lines), but less often than before.

Clean up the hyphenation dialog by collecting the "Don't hyphenate" options at the end of the hyphenation settings and negating them (similar to MSO and DTP). Also add the new option "Hyphenate across column and page":

[x] Hyphenate words in CAPS
[x] Hyphenate last word
[x] Hyphenate across column and page

Note: ODF fo:hyphenation-keep has only the values "auto" and "page", while XSL also defines "column". For interoperability with MSO and DTP, fo:hyphenation-keep="page" is interpreted as XSL "column", which avoids hyphenation at the end of each column, not only at the end of the page.
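The page-level behaviour described in the commit message can be modelled in a few lines of Python. The sketch assumes fixed-height lines, a fixed number of lines per page, and that a line ending in “-” is hyphenated. Like the implementation, it shifts only the last hyphenated line of a full page, so a page can still end in a hyphenated line when consecutive lines are hyphenated. The function is hypothetical, not Writer's layout code.

```python
def paginate(lines, lines_per_page, hyphenation_keep=True):
    """Split lines into pages, shifting a trailing hyphenated line forward.

    Only a single line is shifted, so consecutive hyphenated lines can
    still leave a hyphen at the bottom of a page.
    """
    pages = []
    start = 0
    while start < len(lines):
        end = min(start + lines_per_page, len(lines))
        text_continues = end < len(lines)  # the page is full
        if (hyphenation_keep and text_continues
                and end - start > 1 and lines[end - 1].endswith("-")):
            end -= 1  # move the hyphenated line to the next page
        pages.append(lines[start:end])
        start = end
    return pages


text = ["Lorem ipsum", "dolor sit", "except that it has at-",
        "tached text", "and more", "lines"]
print(paginate(text, 3))
print(paginate(text, 3, hyphenation_keep=False))
```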

User interface

The following screenshot shows the new “Hyphenate across column and page” option on the “Text Flow” pane of the Paragraph formatting dialog window. The last hyphenated line, “except that it has at-”, was shifted to the next page. The clean-up of the “Hyphenate words in CAPS/last word” options is also visible.

5.2 Support column, page and spread types, DOCX import

The recent OpenDocument standard has not fully adopted the values of the XSL attribute hyphenation-keep: the value “column” is missing (https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep). Also, according to e.g. New Hart Rules for English (OUP, 2005, “Line Endings”, p. 40, cited by R. Green in tdf#132599), avoiding hyphenation in the last line of a spread, i.e. a visible page pair, is a typographical requirement. That is why CSS 4 defines “spread” in addition to “page” and “column” to control hyphenation.
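These values can be summarized as a decision function that follows the CSS Text 4 definition of hyphenate-limit-last. Each value avoids hyphenation in the last line before a break of its own kind and of every larger kind, and “always” also covers the last full line of the paragraph. In the sketch below, the boolean parameters and the helper that treats odd page numbers as the right-hand pages closing a spread (page 1 being a right-hand page in a left-to-right book) are assumptions for illustration. They are not LibreOffice code.

```python
def avoid_hyphenation(keep_type, column_end=False, page_end=False,
                      spread_end=False, paragraph_end=False):
    """Return True if the line before this break must not be hyphenated.

    keep_type: "auto", "column", "page", "spread" or "always".
    A spread end is also a page end, and a page end is also a column end.
    """
    if keep_type == "auto":
        return False
    if keep_type == "spread":
        return spread_end
    if keep_type == "page":
        return page_end or spread_end
    if keep_type == "column":
        return column_end or page_end or spread_end
    if keep_type == "always":  # also the last full line of the paragraph
        return column_end or page_end or spread_end or paragraph_end
    raise ValueError(f"unknown hyphenation keep type: {keep_type!r}")


def ends_spread(page_number):
    """Assumption: odd (right-hand) pages close a spread in a left-to-right book."""
    return page_number % 2 == 1


# Bottom of page 2 (left-hand page): a page break, but not a spread break
print(avoid_hyphenation("page", column_end=True, page_end=True,
                        spread_end=ends_spread(2)))    # True
print(avoid_hyphenation("spread", column_end=True, page_end=True,
                        spread_end=ends_spread(2)))    # False
```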

User interface

The new user interface of LibreOffice Writer 24.8, with Hyphenation across “Column”, “Page” and “Spread” on the Text Flow pane of the Paragraph settings:

The following screenshots show the layout of the test documents in LibreOffice Writer 24.8.

Don’t hyphenate across a page

 

LEFT: hyphenation across everywhere (hyphenation-keep="auto"). RIGHT: no hyphenation across, resulting in a shifted hyphenated line (hyphenation-keep="page", which means the default loext:hyphenation-keep-type="column" for interoperability reasons).
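The import rule behind this example can be sketched as a small Python function. fo:hyphenation-keep="page" without a more specific LibreOffice value means “column”. The sketch assumes that the extension attribute is loext:hyphenation-keep-type with the values listed in section 5.1, and that an explicit value of it takes precedence. Attribute handling is simplified to a dictionary, and the function is hypothetical, not the real xmloff import.

```python
KEEP_TYPES = {"auto", "column", "page", "spread", "always"}


def effective_keep_type(attributes):
    """Resolve the hyphenation keep type of a paragraph style.

    attributes: dict of qualified ODF attribute names to values (a
    hypothetical, simplified representation of the paragraph properties).
    """
    if attributes.get("fo:hyphenation-keep", "auto") != "page":
        return "auto"
    # Assumption: an explicit LibreOffice extension value wins,
    # otherwise "page" is interpreted as XSL "column" for interoperability
    keep_type = attributes.get("loext:hyphenation-keep-type", "column")
    if keep_type not in KEEP_TYPES:
        raise ValueError(f"unknown keep type: {keep_type!r}")
    return keep_type


print(effective_keep_type({"fo:hyphenation-keep": "page"}))  # column
print(effective_keep_type({"fo:hyphenation-keep": "page",
                           "loext:hyphenation-keep-type": "spread"}))  # spread
print(effective_keep_type({}))  # auto
```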

Don’t hyphenate across a spread