LibreOffice/Collabora Online Typography

Line break interoperability and state-of-the-art ISO OpenDocument/web typography in open source office suites

Table of Contents

1 Summary

2 Example

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

3.2 Fix line count & page count interoperability for plain text (2023-10-17)

Visual comparison

3.3 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

3.4 Fix handling text portions, cursor positions/selection and shrinking algorithm

Multiple text portions

Cursor positions

Freezing at underflow

Result of the second analysis of the shrinking algorithm

4 Exclude words from hyphenation

5 Don’t hyphenate across a column, page or spread

5.1 Adding hyphenation-keep

Developments

User interface

5.2 Support column, page and spread types, DOCX import

User interface

Don’t hyphenate across a page

Don’t hyphenate across a spread

Don’t hyphenate across a column

Hyphenate across a column, except in the last one

Developments (Writer core, DOCX filter and help content)

5.3 Support hyphenation-keep in linked frames, in tables and last full line of paragraphs, DOCX export

Improved interoperability in Writer’s DOCX export

DOCX export of hyphenation enabled in “Text body” style

Last full line of paragraphs

User interface

Tables

Linked frames on the same pages

Linked frames not on the same spread

Linked frames on the same spread

Developments (Writer core, DOCX filter and help content)

Manual tests

Fixed DOCX export

Disable hyphenation of last full line of paragraphs

Tables

Linked frames

Linked frames only on following right pages

6 No Break context menu and visualization

6.1 Developments

6.2 Manual testing

7 DOCX interoperability fixes

7.1 Support of maximum consecutive hyphenated lines

7.2 Fix overshrank lines in smart justify

7.3 Default hyphenation zone

7.4 Analysis of the test document of Bug 149421 (hyphenation zone)

7.5 Developments

7.6 Manual testing

Maximum consecutive hyphenated lines

Fix overshrank lines in smart justify

Export zero hyphenation zone of new documents

Import default OOXML hyphenation zone

1 Summary

LibreOffice, the open source office suite, is the reference implementation of the ISO OpenDocument (ODF) format. As part of LibreOffice Technology, Collabora Online is key for the digital sovereignty of open source online document editing. Metrically equivalent fonts can no longer guarantee MS Word interoperability, because the MS Word line break algorithm changed without documentation after the ODF and OOXML standardization. The planned project solves this fundamental problem, and also adds the interoperability and CSS 4 web typography features and fixes described in the following sections to the LibreOffice line break algorithm and layout.

All the developments are libre and open source, and will be integrated with the main development branch of LibreOffice, and released with its next stable release.

2 Example

Layout difference between MS Word (top) and LibreOffice (bottom): since MS Word 2013, line breaking in justified paragraphs supports shrinking of the spaces between words. Only disabling justification results in the same layout, moving the word “Antarctica” to the second line in MS Word, too. The bottom paragraphs contain hundreds of adjacent spaces, which shrink only in MS Word.

 
 

3 Fixing line/page count interoperability for plain text

3.1 Result of the first analysis of the unknown algorithm

With 0.1 pt resolution, up to 2% shrinking was measured for plain justified lines (no direct character formatting, no hyphenation). The following picture shows how MSO implements the shrinking by using the available spaces (the maximal shrinking, however, is independent of the number of spaces, at least in a line with enough spaces):

(Black text: MS Word, red text: LibreOffice Writer. The black text contains extra character spacing after the text “3 Au” to give the same position after the last space.)

3.2 Fix line count & page count interoperability for plain text (2023-10-17)

Commit 7d08767b890e723cd502b1c61d250924f695eb98 “tdf#130088 tdf#119908 smart justify: fix DOCX line count + compat opt.” is the fix for the line count/page count differences and the initial fix for the new justification.

Visual comparison

A simple lorem.docx document (attached to tdf#119908) was created in MSO 2016 to test and show the improved line breaking, which solved the line/page count interoperability of the DOCX import of LibreOffice Writer for plain text:
 

(MSO, Writer before the fix, Writer after the fix. Blue marks: correct line break positions in MSO and in the improved Writer, red marks: bad line breaks in Writer before the fix.)

3.3 Fix paragraph layout interoperability: shrink exceeding paragraph lines (2023-11-16)

Shrinking was added by commit 17eaebee279772b6062ae3448012133897fc71bb “tdf#119908 sw smart justify: fix justification by shrinking” and commit c1803de8a093739d189be54b2d9bd5634e9e79ee “tdf#119908 sw smart justify: add unit test”, fixing the temporarily exceeding lines by justifying them.

The following composite picture shows the previous (red) and the shrunk (black) lines in Writer. With this commit, the result is the same as in MS Word.

3.4 Fix handling text portions, cursor positions/selection and shrinking algorithm

The following composite screenshots of MS Word (red) and Writer (black) show the fix for handling multiple text portions (middle) and the fix of the shrinking algorithm (right), which resulted in the same line breaks as in MS Word (test document: lorempage.docx of Bug 158333, generated PDFs: Word, Writer).

The related commits in LibreOffice code base:

Commit

Description

53de98b29548ded88e0a44c80256fc5e340d551e

tdf#158333 sw smart justify: fix multiple text portions

Multiple text portions in a line, e.g. when some part of the line contains direct character formatting, broke DOCX interoperability of justified paragraphs.

20cbe88ce5610fd8ee302e5780a4c0821ddb3db4

tdf#119908 tdf#158419 sw smart justify: fix cursor position

Text cursor didn't follow the new word positions yet, because of unsigned casting of the negative shrinking value.

7059a1858ddb044c5f3f0c8e0386d3e1d9dd2b5f

tdf#119908 tdf#158436 sw smart justify: fix freezing with NBSP

Stop shrinking during underflow, because it resulted in an endless layout loop, e.g. when a very short word was followed by a no-break space.

The problem was reported by Miklós Vajna (Collabora Productivity).

36bfc86e27fa03ee16f87819549ab126c5a68cac

tdf#119908 tdf#158776 sw smart justify: shrink only spaces

For interoperability, only shrink spaces up to 20%, not the lines up to 2%.

8b393bba91111bd4f8988457f3a78b0306462bf2

tdf#159102 sw smart justify: fix automatic hyphenation

As before with soft hyphens, automatic hyphenation could result in too much shrinking, because the calculation counted an extra, non-existing space in the line. Also, try to shrink the line only if a space will likely be available in it.

(Note: commit 93ab1bc6be7b46226874810ec4fc3c61d5d0fc7c restored the unit test testOfz64109, which the second commit had removed by accident. Reported by Miklós Vajna.)

Multiple text portions

Text portions (spans or runs) result from direct character formatting within a paragraph, or simply from editing the text, even without any formatting difference.

The first implementation calculated only with the last text portion of the line. This caused no problem if the line consisted of a single text portion, but otherwise it resulted in different line breaks (less or missing shrinking). The first screenshot shows that the test file contained three bad line breaks caused by multiple text portions in the lines. These were fixed by the first commit.

Cursor positions

Selected text and the text cursor were in the wrong position: over the exceeding, i.e. not yet shrunk line, instead of the visible shrunk line. The second commit adjusted cursor movement and text selection to the shrunk lines.
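The commit description above attributes the cursor problem to an unsigned cast of the negative shrinking value. The following Python sketch only illustrates this class of bug; the 32-bit width, the function names and the numbers are assumptions for illustration, not taken from the Writer code.

```python
import ctypes


def cursor_x_buggy(word_start: int, shrink: int) -> int:
    # The negative shrink value is reinterpreted as an unsigned 32-bit
    # integer first, so a small negative offset becomes a huge positive one.
    return word_start + ctypes.c_uint32(shrink).value


def cursor_x_fixed(word_start: int, shrink: int) -> int:
    # Keep the value signed: a negative shrink moves the cursor to the left.
    return word_start + shrink


print(cursor_x_buggy(100, -3))
print(cursor_x_fixed(100, -3))
```

With a signed value, the cursor follows the word that moved left; with the unsigned reinterpretation, the position ends up far outside the line.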

Freezing at underflow

Extending the text layout code with shrinking resulted in an infinite loop, e.g. when a paragraph ending with a no-break space initiated the recalculation of the line with underflow. This was solved by disabling shrinking for underflow.

Result of the second analysis of the shrinking algorithm

Removing spaces, instead of adding or replacing them, allowed a more precise comparison of MSO and LibreOffice. The following data, measured with the previous test file, show up to 20% shrinking of the spaces instead of up to 2% shrinking of the lines.

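| Paragraph width | Max spaces in line | Space width |
|---|---|---|
| 347.53 pt | 104 | 3.34 pt |

| Spaces | Allowed extra character spacing in the line, MS Word (pt) | Allowed extra character spacing in the line, Writer (pt) | Extra spacing by shrinking: difference (pt) | Shrinking of the line | Shrinking of the spaces |
|---|---|---|---|---|---|
| 6 | 5.98 | 1.95 | 4.03 | 1.2% | 20.1% |
| 5 | 8.08 | 4.65 | 3.43 | 1.0% | 20.5% |
| 4 | 10.18 | 7.45 | 2.73 | 0.8% | 20.4% |
| 3 | 12.23 | 10.25 | 1.98 | 0.6% | 19.8% |
| 2 | 14.33 | 13.05 | 1.28 | 0.4% | 19.2% |
| 1 | 16.43 | 15.75 | 0.68 | 0.2% | 20.3% |
| Low precision (approximated recalculation of multiple text portions?) | | | | | |
| 9 | 11.15 | 4.15 | 7 | 2.0% | 23.3% |
| 12 | 9.63 | 1.35 | 8.28 | 2.4% | 20.6% |
| 21 | 16.83 | 5.85 | 10.98 | 3.2% | 15.6% |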
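The two percentage columns can be recomputed from the measured values: the difference divided by the paragraph width gives the shrinking of the line, and the difference divided by the total width of the spaces gives the shrinking of the spaces. The following Python sketch does this for the six high-precision rows; the constant and function names are chosen here for illustration, and small deviations in the last digit are expected because the inputs are already rounded.

```python
PARAGRAPH_WIDTH_PT = 347.53
SPACE_WIDTH_PT = 3.34

# (spaces in line, allowed extra spacing in MS Word, in Writer), widths in pt
ROWS = [
    (6, 5.98, 1.95),
    (5, 8.08, 4.65),
    (4, 10.18, 7.45),
    (3, 12.23, 10.25),
    (2, 14.33, 13.05),
    (1, 16.43, 15.75),
]


def shrinking(spaces, word_pt, writer_pt):
    """Return (shrinking of the line, shrinking of the spaces) as ratios."""
    difference = word_pt - writer_pt
    return (difference / PARAGRAPH_WIDTH_PT,
            difference / (spaces * SPACE_WIDTH_PT))


for row in ROWS:
    line_ratio, space_ratio = shrinking(*row)
    print(f"{row[0]} spaces: line {line_ratio:.1%}, spaces {space_ratio:.1%}")
```

The space ratio stays close to 20% in every row, while the line ratio varies with the number of spaces, which is what suggests a per-space limit rather than a per-line limit.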

The commit “shrink only spaces” changed the algorithm accordingly, resulting in the same line breaks not only for the previous test file, but also for the originally reported test document of tdf#119908.

4 Exclude words from hyphenation

Hyphenation was only a paragraph-level feature in LibreOffice, while the OpenDocument standard also allows disabling hyphenation in character formatting. The following commits in the LibreOffice code base solved this problem, making it possible to exclude words from hyphenation.
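As a minimal sketch of the measured rule, a candidate line can be treated as fitting if its overflow is not larger than 20% of the total width of its spaces. The function name, the parameters and the fixed 20% limit are assumptions derived from the measurements in this section, not the Writer implementation.

```python
MAX_SPACE_SHRINK = 0.20  # approximate limit measured in the analysis


def fits_with_shrinking(text_width_pt, available_pt, spaces, space_width_pt):
    """Return True if the line fits, possibly after shrinking its spaces."""
    overflow = text_width_pt - available_pt
    if overflow <= 0:
        return True  # the line fits without any shrinking
    # Only the spaces may shrink, each by at most MAX_SPACE_SHRINK.
    return overflow <= MAX_SPACE_SHRINK * spaces * space_width_pt


# A line 2.47 pt too wide for a 347.53 pt paragraph, with 3.34 pt spaces:
print(fits_with_shrinking(350.0, 347.53, 6, 3.34))  # six spaces to shrink
print(fits_with_shrinking(350.0, 347.53, 1, 3.34))  # a single space
```

Under the earlier assumption of up to 2% shrinking of the whole line (about 6.95 pt here), both lines would have been accepted; with the per-space limit, the line with a single space is broken earlier.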

Commit

Description

73bd04a71e741788a2f2f3b26cc46ddb6a361372

tdf#106733 xmloff: keep fo:hyphenate in character formatting

In the case of character formatting, map fo:hyphenate to the unused CharNoHyphenation character property to keep it during ODF import/export instead of losing it completely. This is the first step to disable hyphenation for single words or text spans in paragraphs with automatic hyphenation. Note: using fo:hyphenate as character property is part of the ODF standard.

Note: the old workaround to disable hyphenation, changing the language of the text to None, had serious drawbacks: losing spell checking and losing language-dependent text layout (supported by both OpenType and Graphite font engines in LibreOffice).

b5e275f47a54bd7fee39dad516a433fde5be872d

tdf#106733 sw: implement CharNoHyphenation

Implement CharNoHyphenation character property to disable automatic hyphenation of words in paragraphs with enabled hyphenation.

Fix also fo:hyphenate mapping to CharNoHyphenation using automatic inversion of their boolean values defined by xmloff's XML_TYPE_NBOOL, as suggested by Michael Stahl.

9193e61d3e7b850b3715c848c09434e24855340b

tdf#106733 sw: fix bad downcast in SwTextNode::GetLang()

Fix bad cast of SvxNoHyphenItem to SvxLanguageItem reported by <https://ci.libreoffice.org/job/lo_ubsan/3049/>.

03c5a31a0f374a90fbc821718c14dc5f8a385adf

tdf#106733 sw cui: add CharNoHyphenation checkbox

Added to the Position tab of the Character formatting dialog window as a new checkbox "Exclude from hyphenation" (UX design by Heiko Tietze).

With this, it's possible to disable hyphenation with direct character formatting (e.g. combined with Find All), or by using character styles and setting Exclude from hyphenation in them. This feature conforms to the OpenDocument standard, and unlike the previous locale=None workaround, it keeps spell checking and locale-dependent text layout.

(Note: commit 1b83ebf42c535528b73baac2407b347f19070d07 disabled the unit test of tdf#159102 temporarily, because of lack of hyphenation on some test builds. Reported by Noel Grandin and Miklós Vajna.)
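The ODF attribute fo:hyphenate and the CharNoHyphenation property have opposite meanings, which is why the commit above relies on an inverted boolean mapping (XML_TYPE_NBOOL in xmloff). The following Python sketch illustrates only the inversion; the function names are hypothetical and do not correspond to functions in the LibreOffice code base.

```python
def import_fo_hyphenate(value: str) -> bool:
    """Map an ODF fo:hyphenate value to a CharNoHyphenation-like flag."""
    if value not in ("true", "false"):
        raise ValueError(f"not an ODF boolean: {value!r}")
    # fo:hyphenate="false" means: exclude this text from hyphenation.
    return value == "false"


def export_char_no_hyphenation(no_hyphenation: bool) -> str:
    """Map the flag back to the fo:hyphenate attribute value."""
    return "false" if no_hyphenation else "true"


print(import_fo_hyphenate("false"))
print(export_char_no_hyphenation(True))
```

Applying both functions in sequence returns the original attribute value, so the setting survives an import/export round trip.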

The following screenshot shows the new option in the Character formatting dialog window:

5 Don’t hyphenate across a column, page or spread

This development adds new typographical features standardized by OpenDocument, XSL and CSS 4 to LibreOffice Writer, to guarantee MSO interoperability and, beyond that, to follow typographical rules and web standards.

Not only justified text, but also left-aligned etc. text has lost its interoperability since MSO 2013, depending on hyphenation. The default page layout algorithms of MSO move the last hyphenated word (before MSO 2013) or line (MSO 2013 and newer) of a page to the next page, following English typographical traditions and rules.

The OpenDocument standard contains the same feature, fo:hyphenation-keep, but LibreOffice had not implemented it, and had not even kept this setting during import/export of ODF documents.

For more control over hyphenation, and for better conformance with standard typographical rules, LibreOffice Writer also implements the “hyphenation across” values of XSL and CSS 4 which are not covered by OpenDocument yet (loext:hyphenation-keep-type).

5.1 Adding hyphenation-keep

As a new development, LibreOffice not only keeps the value of fo:hyphenation-keep, but the page layout moves the last hyphenated line of the page (or column) to the next page (or column), similar to MSO 2016 and newer.
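For orientation, the following Python sketch reads fo:hyphenation-keep from a paragraph style with the standard library. The style fragment is a hand-written example, not taken from a test document, and treating a missing attribute as "auto" is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

# Hand-written example of an ODF paragraph style (assumption of this sketch).
FRAGMENT = """
<style:style
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    style:name="P1" style:family="paragraph">
  <style:paragraph-properties fo:hyphenation-keep="page"/>
</style:style>
"""


def hyphenation_keep(style_xml: str) -> str:
    """Return the fo:hyphenation-keep value of a paragraph style."""
    root = ET.fromstring(style_xml)
    props = root.find("style:paragraph-properties", NS)
    if props is None:
        return "auto"  # assumed default when nothing is specified
    return props.get(f"{{{NS['fo']}}}hyphenation-keep", "auto")


print(hyphenation_keep(FRAGMENT))
```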

Developments

Commit

Description

9574a62add8e4901405e12117e75c86c2d2c2f21

tdf#132599 cui offapi sw xmloff: implement hyphenate-keep

Both parts of a hyphenated word shall lie within a single

page with ODF paragraph setting fo:hyphenation-keep="page".

The implementation follows the default page layout of

MSO 2016 and newer by shifting the bottom hyphenated line

to the next page (and to the next column).

Note: this is a MSO DOCX interoperability feature, used

also in DTP software, XSL and CSS.

 

* Add checkbox/combobox to Text Flow in paragraph dialog

* Store property in paragraph model (com::sun::star::style::ParagraphProperties::ParaHyphenationKeep)

* Add ODF import/export

* Add ODF unit tests

 

New constants of com::sun::star::text::ParagraphHyphenationKeepType,

containing ODF AUTO and PAGE (borrowed from XSL), and for the

planned extension ParaHyphenationKeepType of ParagraphProperties:

 

– COLUMN (standard XSL value, defined in https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)

 

– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last,

  equivalent of hyphenation-keep, defined in

  https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

 

Note: the implementation truncates only a single hyphenated

line, like MSO does: the pages can end in hyphenated

lines (i.e. in the case of consecutive hyphenated lines),

but less often, than before.

 

Clean-up hyphenation dialog by collecting "Don't hyphenate"

options at the end of the hyphenation settings, and negating them

(similar to MSO and DTP), adding also the new option

"Hyphenate across column and page":

 

[x] Hyphenate words in CAPS

[x] Hyphenate last word

[x] Hyphenate across column and page

 

Note: ODF fo:hyphenation-keep has got only "auto" and

"page" attributes, while XSL defines also "column".

Because of the interoperability with MSO and DTP,

fo:hyphenation-keep="page" is interpreted as

XSL "column", avoiding hyphenation at the end

of column, not only at the end of page.
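The note about moving only a single hyphenated line can be illustrated with a toy paginator. In this Python sketch, lines are plain strings, a line counts as hyphenated if it ends with "-", and every page holds a fixed number of lines; all of these are simplifying assumptions, and the function is not a model of the Writer layout code.

```python
def paginate(lines, lines_per_page, keep=True):
    """Split lines into pages; with keep, move one trailing hyphenated line."""
    pages, start = [], 0
    while start < len(lines):
        end = min(start + lines_per_page, len(lines))
        # Move at most one line, and only if the text continues on a next
        # page and the current page still keeps at least one line.
        if (keep and end < len(lines) and end - start > 1
                and lines[end - 1].endswith("-")):
            end -= 1
        pages.append(lines[start:end])
        start = end
    return pages


print(paginate(["a", "b-", "c", "d"], 2))
print(paginate(["a-", "b-", "c"], 2))
```

In the second call, the first page still ends in a hyphenated line, because only one line is moved even if the preceding line is hyphenated, too.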

User interface

The following screenshot shows the new “Hyphenate across column and page” option on the “Text Flow” pane of the Paragraph formatting dialog window: the last hyphenated line “except that it has at-” was shifted to the next page. The clean-up of the “Hyphenate words in CAPS/last word” options is also visible.

5.2 Support column, page and spread types, DOCX import

The current OpenDocument standard has not fully adopted the values of the XSL attribute hyphenation-keep: the value “column” is missing (https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep). Also, according to e.g. New Hart’s Rules (OUP, 2005, “Line Endings”, p. 40, cited by R. Green in tdf#132599), it is a typographical requirement to avoid hyphenation in the last line of a spread, i.e. a visible page pair. That is why CSS 4 defines “spread”, not only “page” and “column”, to control hyphenation.

User interface

The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page” and “Spread” on the Text Flow pane of the Paragraph settings:

 

The following screenshots show the layout of the test documents in LibreOffice Writer 24.8.

Don’t hyphenate across a page

 

LEFT: Hyphenation across everywhere (hyphenation-keep="auto"). RIGHT: No hyphenation across, resulting in a shifted hyphenated line (hyphenation-keep="page", which implies the default loext:hyphenation-keep-type="column" for interoperability reasons).

Don’t hyphenate across a spread

 

LEFT: Hyphenation across column and page, but not across spread (hyphenation-keep="page", loext:hyphenation-keep-type="spread"). The hyphenated line is shifted on the first (right-hand) page. RIGHT: same settings, but a page break inserted at the start of the document results in no shifting, because the bottom hyphenated line is on the second (left-hand) page.

Don’t hyphenate across a column

 

LEFT: No hyphenation across (hyphenation-keep="page", loext:hyphenation-keep-type="column"). Shifted hyphenated line in the first column of the multi-column page. RIGHT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page").

Hyphenate across a column, except in the last one

 

LEFT: Hyphenation across a column. No shifted line in the first column (hyphenation-keep="page", loext:hyphenation-keep-type="page"). RIGHT: same settings, but the last hyphenated line is shifted in the last column, because that line is the last line of the page, too.

Developments (Writer core, DOCX filter and help content)

Details of the core and DOCX filter developments, which also extend the LibreOffice help with the new paragraph settings:

Commit

Description

6e8819f29b6051a0e551d77512830539913ec277

tdf#132599 cui offapi sw xmloff: add hyphenation-keep-type

 

Support XSL attribute "column" and CSS 4 attribute "spread",

stored in loext:hyphenation-keep-type, to give better control

over hyphenation-keep. E.g. spread: both parts of a hyphenated

word shall lie within a single spread, i.e. when the next page

is not visible at the same time (e.g. the next page is not a

right page of a book).

 

– css::style::ParaHyphenationKeep is a boolean property now,

  importing hyphenation-keep = "page" as true.

 

– type of ParaHyphenationKeep, including the new non-ODF types

  is stored in the new ParagraphProperties::ParaHyphenationKeepType.

 

– default value of ParaHyphenationKeepType is COLUMN for

  interoperability.

 

– Add checkboxes to Text Flow -> Hyphenation Across in

  paragraph dialog:

 

  * Column (previously: Hyphenate across column and page)

  * Page

  * Spread

 

  – enabling/disabling them follows XSL/CSS 4/loext, i.e.

    possible combinations:

 

  * No Hyphenation across

    (hyphenation-keep = "page" and loext:hyphenation-keep-type = "column")

 

  * Hyphenation across [x] Column

    (hyphenation-keep = "page" and loext:hyphenation-keep-type = "page")

 

  * Hyphenation across [x] Column [x] Page

    (hyphenation-keep = "page" and loext:hyphenation-keep-type = "spread")

 

  * Hyphenation across [x] Column [x] Page [x] Spread

    (hyphenation-keep = "auto")

 

– Add ODF import/export

 

– Update DOCX import

 

– Add ODF unit tests

 

Note: recent implementation depends on widow settings: disabling widow

handling allows hyphenation across columns and pages not only in table

cells.

 

Note: RTF import-only, but not used bPageEnd has been renamed to bKeep.

Depending on the RTF test results, likely it will need to disable

the layout change, e.g. GetKeepType()=ParagraphHyphenationKeepType::AUTO,

if PageEnd uses obsolete hyphenation rule, i.e. shifting only the

hyphenated word to the next page, not the full line.

 

More information:

 

– COLUMN (standard XSL value, defined in

  https://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#hyphenation-keep)

 

– SPREAD and ALWAYS (CSS 4 values of hyphenate-limit-last,

  equivalent of hyphenation-keep, defined in

  https://www.w3.org/TR/css-text-4/#hyphenate-line-limits).

 

c8ee0e8f581b8a6e41b1a6b8aa4d40b442c1d463

tdf160518 DOCX: import hyphenation-keep to fix layout

 

To fix layout interoperability, import DOCX compatSettings

allowHyphenationAtTrackBottom and useWord2013TrackBottomHyphenation

as hyphenation-keep setting "COLUMN", shifting last hyphenated

lines of pages and columns, like MSO does.

58350a811a8001f72b13f6ca3def5f32ea904e72

tdf#132599 add "Hyphenation across" options

Document new options of LO 24.8 to control hyphenation

in last line of a column, page or spread.
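The combinations listed in the commit message of 6e8819f29b6051a0e551d77512830539913ec277 can be summarized as a small lookup. The following Python sketch maps the two attribute values to the state of the three “Hyphenation across” checkboxes; the function name and the dictionary layout are chosen here for illustration.

```python
def hyphenation_across(keep: str, keep_type: str = "column") -> dict:
    """Return the state of the Column/Page/Spread checkboxes."""
    if keep == "auto":
        # No restriction: hyphenation is allowed across everything.
        return {"Column": True, "Page": True, "Spread": True}
    if keep != "page":
        raise ValueError(f"unsupported hyphenation-keep: {keep!r}")
    # keep_type names the smallest unit that must not be crossed.
    allowed = {
        "column": (False, False),
        "page": (True, False),
        "spread": (True, True),
    }
    if keep_type not in allowed:
        raise ValueError(f"unsupported hyphenation-keep-type: {keep_type!r}")
    column, page = allowed[keep_type]
    return {"Column": column, "Page": page, "Spread": False}


print(hyphenation_across("page"))
print(hyphenation_across("page", "spread"))
```

The default keep-type "column" reflects the interoperability default named in the commit message, so a plain hyphenation-keep="page" disables all three checkboxes.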

5.3 Support hyphenation-keep in linked frames, in tables and last full line of paragraphs, DOCX export

The code behind “Hyphenation across” has been generalized for all possible page changes, including linked frames, and also for columns in tables and linked frames.

Improved interoperability in Writer’s DOCX export

The DOCX export broke the layout of documents created in Writer, resulting, for example, in more pages in MS Word in the case of hyphenated paragraphs. This problem was fixed by adding the missing allowHyphenationAtTrackBottom DOCX compatibility setting.

The following composite screenshots show the DOCX export in MSO (red text), which was 3 pages long before the fix (top row). After the fix, the result is 2 pages long in MSO (bottom row), as in Writer (black text). Test document:

 

DOCX export of hyphenation enabled in “Text body” style

Hyphenation was lost if it was enabled only in “Text body” instead of the default paragraph style. Now Writer exports hyphenation in this case, too, which is more common for documents created in Writer.

Last full line of paragraphs

CSS 4 “always” was implemented as Hyphenation across → Last full line of paragraph. If it is disabled, the hyphenated word of the last full line of the paragraph moves to the last line (if there is enough space for it). This results in longer last lines, and removes the hyphenation from the bottom right-hand corner of the paragraph.

 

LEFT: missing recognition of hyphenation-keep-type="always" in the last paragraph. RIGHT: correct layout: the hyphenated word of the last full line of the paragraph was shifted to the last line of the paragraph.

User interface

The new user interface of LibreOffice Writer 24.8 with Hyphenation across “Column”, “Page”, “Spread” and the newest “Last full line of paragraph” on the Text Flow pane of the Paragraph settings. The test document and its screenshot show that the hyphenated line was shifted to page 2, according to the Hyphenation across → Page setting:

 

Tables

Now “Hyphenation across” works in tables, too, removing the dependency of the previous implementation on the widow settings:

 

 

LEFT: missing shifting between the split table cell, with hyphenation-keep-type="spread". RIGHT: correct layout.

Linked frames on the same pages

Linked frames are like columns on the same pages, now with correct layout:

 

LEFT: bad shifting between the linked frames on a single page, with hyphenation-keep-type="page". RIGHT: correct layout.

Linked frames not on the same spread

With hyphenation-keep-type="spread", blank left pages weren’t handled correctly, nor were linked frames anchored only on different right pages.

 

LEFT: missing shifting between linked frames on right pages, with hyphenation-keep-type="spread". RIGHT: correct layout.

Linked frames on the same spread

Spread is still recognized with linked frames on left and right pages:

The middle frame on the second (left) page ends in a hyphenated line, according to hyphenation-keep-type="spread".
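The spread cases shown in this section can be condensed into one check on the page numbers of the hyphenated line and of its continuation. The following Python sketch assumes that page 1 is a right-hand page, so that a spread is a pair of an even (left) page and the following odd (right) page; the function names are illustrative only.

```python
def same_spread(line_page: int, next_page: int) -> bool:
    """True if both parts of the hyphenated word are visible together."""
    if next_page == line_page:
        return True  # e.g. columns or linked frames on the same page
    # Assumption: page 1 is a right-hand page, so even pages are left-hand
    # pages and a spread is the page pair (even, even + 1).
    return line_page % 2 == 0 and next_page == line_page + 1


def must_shift(line_page: int, next_page: int) -> bool:
    """With keep-type "spread", avoid only breaks between two spreads."""
    return not same_spread(line_page, next_page)


print(must_shift(1, 2))  # right page, continuation after turning the page
print(must_shift(2, 3))  # left page, continuation on the facing right page
print(must_shift(1, 3))  # linked frames on right pages only
```

Checking only that the continuation is on a right-hand page would accept the last case, which is why the page holding the hyphenated line has to be checked as well.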
 
 

Developments (Writer core, DOCX filter and help content)

Details of the core and DOCX filter developments, which also extend the LibreOffice help with the new paragraph settings:

Commit

Description

c8a99cb8dce54de506ba66d1cc0818b9b5f7858b

tdf#132599 sw schema xmloff: add hyphenation-keep-type='always'

Add new hyphenation option to limit hyphenation of the last full line of the hyphenated paragraph. Move also loext:hyphenation-keep-type to paragraph-properties, following the associated hyphenation-keep. Note: value "always" is defined by CSS 4 hyphenate-limit-last,

see https://www.w3.org/TR/css-text-4/#hyphenate-line-limits.

d4304cd0a4fedd0117fea3625dff1fca2945a0e6

tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads

Linked text frames are hyphenated as columns on the same page,

i.e. do not shift the hyphenated line, if hyphenation-keep-type="page" or "spread". For "spread", check also that the hyphenated line is on the previous left page, because checking only right page wasn't enough for linked text frames and blank left pages.

a4970f4eeb94b8c405c5e3ec094d47061253efac

tdf#132599 sw: fix hyphenation-keep for tables and no widow

Now hyphenation-keep works without widow settings, too, e.g. in tables (where despite the existing widow settings, widow handling is always disabled).

9668c9b8fe1d4afba335ab1f9d3309ad91bd56da

tdf#132599 sw: fix test of "fix hyphenation-keep for tables and no widow"

The problem was reported by Miklós Vajna.

016d61f529f9d9ec2520fb7a808da41cf17d7295

tdf#132599 sw: fix unit tests for hyphenation-keep with frames

Fix en_US language of the test documents to be consistent with the hyphenator condition in the related unit tests of commit d4304cd0a4fedd0117fea3625dff1fca2945a0e6 "tdf132599 sw: fix hyphenation-keep for linked frames, also for spreads".

The problem was reported by René Engelhard.

b538729c90af470c33aeb3002750321ac8ac88be

tdf#160518 sw: fix DOCX import/export of hyphenation-keep

– export hyphenation-keep="page" setting of native ODF documents, if hyphenation is enabled in the default paragraph or in the text body style with this setting. It's lossless for hyphenation-keep-type="column", while the other values are converted to hyphenation-keep-type="column", which is the default layout of MSO 2013 and later.

– fix LO roundtrip of DOCX documents which were created in MSO originally: while the roundtrip kept useWord2013TrackBottomHyphenation and allowHyphenationAtTrackBottom, the exported redundant suppressAutoHyphen = "false" settings of the paragraph resulted in broken layout in Writer, because the repeated import overwrote every paragraph with a bad hyphenation setting (hyphenation-keep = "auto" instead of hyphenation-keep = "page").

– export also "Hyphenate CAPS" and "Hyphenation zone" settings, if hyphenation is enabled in text body style with these settings, and not in the default paragraph style. Setting hyphenation only in "Text Body" is more common in documents created in LibreOffice.

0d5b1a072e025a692cee803310d2ceff0296b083

help: tdf#132599 add "Hyphenation across" -> Last full line of paragraph

Document new option of LO 24.8 to control hyphenation in last full line of a paragraph. Fix also the changed IDs of the other "Hyphenation across" options.

Manual tests

The patches contain several unit tests. The following manual tests cover only the most important bug fixes:

Fixed DOCX export

1. Open tdf160518_auto_in_default_paragraph_style.fodt (attached to Bug 160518, as “2-page flat ODF” document). The document is 2 pages long.

2. Save it in the format “Word 2010–365 Document (.docx)”.

3. Open the result in MS Word: the document is 2 pages long, as in Writer. (The old export was 3 pages long.)

Disable hyphenation of last full line of paragraphs

1. Open tdf132599_always.fodt (attached to Bug 132599, as “flat ODF test document for "Last full line of paragraph"”). The last full line of the last paragraph is not hyphenated. The previously hyphenated word (“celestial”) is shifted to the last line. (This feature wasn’t supported before.)

2. Open the paragraph settings of the last paragraph, and enable “Last full line of paragraph” in Text Flow → Hyphenation across. The word “celestial” is hyphenated.

Tables

1. Open tdf132599_page_in_table.fodt (attached to Bug 132599 as “test document: In tables, do not hyphenate across spread (3-page document)”). The document is 3 pages long. (Previously it was 2 pages long, because Hyphenation across was not supported in tables.)

2. Open the paragraph settings of the paragraph, and enable “Spread” in Text Flow → Hyphenation across. The document is 2 pages long, because the hyphenated line “except that it has an at-” is allowed at the bottom of page 1, which is the right-hand page of its spread.

Linked frames

1. Open tdf132599_frames_on_same_page_hyphenation.fodt (attached to Bug 132599 as “test document: Hyphenation across column in linked frames”). The bottom line of the left frame is the hyphenated line “space, ex-”. (Previously this line was shifted to the next frame.)

2. Open the paragraph settings of the paragraph, and disable “Column” in Text Flow → Hyphenation across. The hyphenated line “space, ex-” is shifted to the next frame.

Linked frames only on following right pages

1. Open tdf132599_frames_on_right_pages_no_hyphenation.fodt (attached to Bug 132599 as “test document: In linked frames on right pages, do not hyphenate across spread”). The second frame on page 3 starts with the shifted hyphenated line “space, ex-”, according to the disabled Hyphenation across → Spread setting, because the first frame is on page 1 (a different spread). (This was broken before, because Writer didn’t check whether the previous text content of the right page is on a left page, i.e. on the same spread.)

2. Open the paragraph settings of the paragraph, and enable “Spread” in Text Flow → Hyphenation across. The hyphenated line “space, ex-” moves back to the bottom of the first frame.

6 No Break context menu and visualization

Hyphenated words got a new context menu item “No Break” to disable their hyphenation using the new “Exclude from hyphenation” character formatting. The context menu item remains available for the words with disabled hyphenation to enable their hyphenation again.

The other usability problem was the incomplete user interface of the new character formatting “Exclude from hyphenation”: it was impossible or very hard to notice the words which were excluded from hyphenation. Now these words get a light gray dotted underline when Show Formatting Marks mode is enabled.

 

New “No Break” context menu of hyphenated words, and light gray dotted underline visualization of words with disabled hyphenation. (Note: no visualization for the previous workaround, the word with language setting “None” in the second paragraph.)

6.1 Developments

A new dispatcher call, .uno:NoBreak, was added for the context menus. The menu item “No Break” is visible only if there is a hyphenated word or a word with No Break formatting under the text cursor (with or without selecting the word). The light gray underline is conditional: it is not visible in the PDF export, nor when Show Formatting Marks is disabled.

Commit

Description

2f0c7d5691acd4010443856788a54b0abc03098b

tdf#161563 tdf#161565 sw: add No Break to word context menu & visualize

Add No Break option to context menu of words hyphenated automatically, giving as easy access to fix paragraph layout, as context menu of misspelled words – like DTP software do. Also add this option to context menu of words with enabled "No Break" to disable it.

To avoid unwanted paragraph layout during further text editing or formatting, visualize words excluded from hyphenation with a light gray dotted underline, when Formatting Marks is enabled.

Follow-up to commit b5e275f47a54bd7fee39dad516a433fde5be872d

"tdf#106733 sw: implement CharNoHyphenation" and

commit 73bd04a71e741788a2f2f3b26cc46ddb6a361372

"tdf#106733 xmloff: keep fo:hyphenate in character formatting".

41916d9fb045654fa19b4eac90a3099550a890f7

tdf#161563 sw: show "No Break" context menu only on a whole word

It's possible to set CharNoHyphenation on shorter character sequences, than a word, but the result is not correct (use soft hyphens for alternative hyphenation within words), so limit "No Break" menu item only for selected words. (Not completely, because only Point() is checked for word boundary yet, not also Mark().) If no selection, cursor position must be within the hyphenated word (where "No Break" applied for the whole word automatically).

This fixes also the assert in SwTextFrame::IsInHyphenatedWord(),

when multiple nodes were selected.

b0b691aa32719aa0d41bc0f72480cc455bc414ec

tdf#161563 sw: fix invisible light gray underline for No Break

Light gray underline visualization depended on IsShowHiddenChar() instead of the correct IsViewMetaChar() (Show Formatting Marks).

6.2 Manual testing

  1. Open the test documents tdf106733.fodt or tdf106733_LinuxLibertineDisplayG.fodt (attached to Bug 106733). The words with “Exclude from hyphenation” enabled have a light gray underline.

  2. Click on the Show Formatting Marks (paragraph mark) icon to disable and enable the light gray underline.

  3. Open the context menu of the hyphenated word in the first paragraph. Choose the first item, “No Break”, to disable its hyphenation. The word is no longer hyphenated and gets a light gray underline.

  4. Open the context menu of the word with the light gray underline, and choose No Break again. The word is hyphenated again, and the light gray underline is gone.

7 DOCX interoperability fixes

7.1 Support of maximum consecutive hyphenated lines

The value “Maximum consecutive hyphenated lines” wasn’t imported from DOCX files, and the associated ParaHyphenationMaxHyphens wasn’t exported to the OOXML document setting consecutiveHyphenLimit, losing layout interoperability.

Note: OpenDocument interoperability is possible here, contrary to the false information on page 61 of Eckert et al.: Document Interoperability – Open Document Format and Office Open XML, Fraunhofer Verlag, 2009.

7.2 Fix overshrank lines in smart justify

As a regression, smart justify (i.e. space shrinking) resulted in overshrunk lines, i.e. lines with removed spaces and overlapping words, when the line was hyphenated only in the first call of SwTextGuess::Guess(). (The first call calculates the available spaces, the second call makes the final line break.):

 

After the fix, skipping hyphenation completely:

The problem was solved by re-instantiating the SwTextGuess object for the optional second call.

Note: Caolán McNamara (Collabora Productivity) also made a SwTextGuess fix, related to a compiler-based code analysis, which was an alternative solution for the reported test document.

7.3 Default hyphenation zone

The default hyphenation zone is not zero in OOXML, but ¼ inch, according to the standard (see w:hyphenationZone, ECMA-376 – Office Open XML 1st Edition). Because the DOCX export of Writer didn’t contain its default zero hyphenation zone, MSO imported the document with a non-zero hyphenation zone, potentially changing the text layout. For example, with normal 11 pt font size, the hyphenations “al-legory” or “fi-nal” are disabled with a ¼ inch hyphenation zone.

As a continuation of the implementation of the hyphenation zone in Bug 149421, the default hyphenation zone of ¼ inch was added to the DOCX import. Also, the zero hyphenation zone is now always exported, i.e. in the case of documents created in Writer, solving the text layout difference.

Note: it seems MSO doesn’t follow its own standard, because it uses unknown default values in some languages, e.g. larger than ¼ inch, depending on the language of the operating system. For example, the LibreOffice_tracked-changes_bug.docx of Bug 161628 got 425 twips (~0.75 cm) from Office 365 instead of the standardized 360 twips (¼ inch) on a Hungarian operating system. The Microsoft “Open specification” mentions the difference, but without the details: https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oe376/660d0f16-dffb-48ea-a25d-7210fb2f2a7a.
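The twip values quoted above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming only the usual definition of a twip (1/1440 inch) and 2.54 cm per inch; the function names are illustrative and not part of any LibreOffice or MSO API.

```python
TWIPS_PER_INCH = 1440  # 1 twip = 1/20 point, 72 points per inch
CM_PER_INCH = 2.54


def twips_to_inch(twips):
    """Convert a length in twips to inches."""
    return twips / TWIPS_PER_INCH


def twips_to_cm(twips):
    """Convert a length in twips to centimetres."""
    return twips_to_inch(twips) * CM_PER_INCH


# The standardized default (360) and the value observed in the test document (425).
for twips in (360, 425):
    print(f"{twips} twips = {twips_to_inch(twips):.3f} inch = {twips_to_cm(twips):.3f} cm")
```

So 360 twips is exactly ¼ inch, while 425 twips is slightly less than 0.3 inch.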

7.4 Analysis of the test document of Bug 149421 (hyphenation zone)

The test document hyphenation_zone.docx (Bug 149421) shows a different line break (közvet=lenül/közvetle=nül), although the hyphenated word “közvetlenül” has the same possible hyphenation points in Writer and MSO (köz=vet=le=nül):

 

Composite image: red – MSO, black – Writer

Disabling the hyphenation zone (in MSO, by setting 0.01 cm) didn’t modify the hyphenation, so the difference is not related to the hyphenation zone. There is no justification or smart justify here (the test document was created in MSO 2010, so it doesn’t use smart justify for justified lines). Choosing sub hyphenation seems to be a bug in MSO, and it needs more investigation.

7.5 Developments

Commit

Description

64365dfa67d5a1d8fbc710238a4ea9c492645de4

tdf#161643 sw DOCX import/export of maximum consecutive hyphenated lines

Fix line break interoperability by importing w:consecutiveHyphenLimit to ParaHyphenationMaxHyphens, and exporting ParaHyphenationMaxHyphens to w:consecutiveHyphenLimit in OOXML import/export filters.

ca540209a8c20a2734f180d4706d5153bdf64523

tdf#160170 sw: fix overshrunk justified lines at hyphenation

Smart justify uses 2 SwTextGuess::Guess() calls to break a line, but using the same SwTextGuess object resulted overshrunk lines, if the first call resulted hyphenation, because of the bad state of the object for the second call. If we need a second call, now instantiate a new object for it. Regression from commit 36bfc86e27fa03ee16f87819549ab126c5a68cac "tdf#119908 tdf#158776 sw smart justify: shrink only spaces".

Note: the reported test document was already fixed by commit f050103c3324d878b310f37429ea3580a8230905 "stale hyphenation data after skipping blanks".

83733601124f611938c365426485d0001e1fe454

tdf#160170 sw: test for fix overshrank lines with hyphenation

Follow-up to commit ca540209a8c20a2734f180d4706d5153bdf64523

"tdf#160170 sw: fix overshrunk justified lines at hyphenation".

8d8bc48b5efacde6f99d78a557cd052ce9e0ed07

tdf#161628 DOCX import: set default hyphenation zone (1/4 inch)

Default value of hyphenationZone is 360 twips (0.25"). Apply this value, if hyphenationZone is not defined, according to the OOXML standard.

Follow-up to commit 5a079652c1b1f968a851f47995b0a65b84d2d192 "tdf#149421 DOCX: import/export hyphenation zone".

89a80d637e2831d49cdf48921f961b04fd03cffc

tdf#161628 sw DOCX: export zero hyphenation zone, if it's not defined

To keep the layout of the document, export zero hyphenation zone instead of nothing, otherwise it would be 360 twips after importing the document with the default hyphenation zone.

7.6 Manual testing

Maximum consecutive hyphenated lines

  1. Open 2007351228.docx (test document of Bug 76163). Check its paragraph setting Maximum consecutive hyphenated lines on the Text Flow page of the Format → Paragraph → Paragraph… dialog window. The value of the setting is 1 (not zero).

  2. Save the document in a different place in DOCX format, and reload it. The value is still 1.

Fix overshrank lines in smart justify

  1. Open the test document tdf160170.fodt (Bug 160170), and check the first line: it contains spaces.

Export zero hyphenation zone of new documents

Create a new document with hyphenation, and with zero hyphenation zone (default in Writer):

  1. Put the cursor in a paragraph in Default Paragraph Style.

  2. Choose Format → Paragraph… → Text Flow, and enable Hyphenation. The default hyphenation zone is zero.

  3. Export the document in DOCX 2010–365 format.

  4. Reload it, and check the hyphenation zone: it is still zero.

  5. Load the exported document in MSO: search for the hyphenation settings in the search bar, and check the hyphenation zone: it is zero (it was 360, 425 etc. before fixing the export).

Import default OOXML hyphenation zone

  1. Open the test document LibreOffice_tracked-changes_bug.docx (Bug 161628), which doesn’t contain a w:hyphenationZone definition.

  2. Check the value of the hyphenation zone in Format → Paragraph… → Text Flow: it’s ¼ inch (0.63 cm), as in MSO, not zero.
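The two DOCX settings discussed in this chapter, w:hyphenationZone and w:consecutiveHyphenLimit, can also be checked outside the user interface by reading word/settings.xml from the DOCX package. The following is a minimal sketch in Python using only the standard library. It assumes both elements are direct children of w:settings, and the helper name read_hyphenation_settings is hypothetical. The 360 twips fallback for a missing w:hyphenationZone follows the default described in section 7.3.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DEFAULT_ZONE_TWIPS = 360  # 1/4 inch, applied when w:hyphenationZone is missing


def read_hyphenation_settings(docx):
    """Return (hyphenation zone in twips, consecutive hyphen limit or None).

    `docx` is a path or a binary file-like object of a DOCX package.
    Hypothetical helper for manual checks, not part of LibreOffice.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/settings.xml"))

    def int_value(tag):
        # Look up <w:tag w:val="..."/> directly under <w:settings>.
        element = root.find(f"{{{W_NS}}}{tag}")
        if element is None:
            return None
        value = element.get(f"{{{W_NS}}}val")
        return int(value) if value is not None else None

    zone = int_value("hyphenationZone")
    if zone is None:
        zone = DEFAULT_ZONE_TWIPS  # setting missing: use the standard default
    return zone, int_value("consecutiveHyphenLimit")
```

For a document exported by a fixed Writer with its default zero hyphenation zone, the first returned value is expected to be 0. For a file without w:hyphenationZone, such as the test document of Bug 161628, the sketch reports the 360 twips default.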

 

László Németh

2024-07-02

 

This project was funded through the NGI0 Entrust Fund, a fund established by NLnet Foundation with financial support from the European Commission's Next Generation Internet programme. More information: https://nlnet.nl/project/LO-Typography/