Our hyphenators are not based on a hyphenated dictionary data base.
New hyphenator languages: Kazakh (Latin/Cyrillic), Khmer, Northern Kurdish, Swahili, Xhosa, Zulu, Hebrew, Irish/Gaelic (see Windows Unicode Demo).
Recently updated hyphenator languages: Galician, Finnish, Norwegian, Danish, Icelandic, German, Swiss German, Frisian/Frysk, Afrikaans, Turkish, Swedish, Russian, Romanian, Portuguese.
Updated hyphenators are built on larger learning corpora and are validated on larger
shadow corpora.
77 language modules
Dutch
(Update July 2009, ε < 0.0044 ‰)
supports the generally accepted spelling (the
Netherlands), progressive spelling (Belgium), and the 1996 and October 2005 spelling
reforms; these four
principles have been integrated in one hyphenator. Supports the Belgian, Surinamese, and Dutch idiom. The
hyphenator recognizes compound boundaries and covers the Dutch idiom extensively.
visit download page |
view
a Dutch example (PDF) | view
a newspaper mistake
English
(Update January 2008, ε < 0.0098 ‰)
supports phonetical hyphenation according to the world's most trusted
dictionaries: Webster's Third New International Dictionary, Webster’s
New Twentieth Century Unabridged Dictionary
(2nd edition), and
Longman’s Dictionary of Contemporary English; based on an unabridged
learning corpus, coming close in size to Webster's Unabridged
Dictionaries; a common hyphenator is
available for the British, Canadian, and American idiom. The hyphenator
solves the irregularity of the alternation of English strong and weak
syllables. The new double layer model enables the user to disregard
certain secondary divisions (adapt~able
instead of adapt~a~ble). Hyphenation agrees
with The Oxford Colour Spelling Dictionary (1995).
Compared to the other dictionaries, this last dictionary has fewer
syllables.
The English module has separate entries for British, American, Canadian, Australian, New Zealand and South African
English.
visit
download page | view
an English example (PDF)
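The double layer model described in the English entry can be pictured as break points that each carry a level. The following is only a minimal sketch under that assumption: the function `render`, the level numbering (1 = primary, 2 = secondary), and the offset table are hypothetical illustrations, not the hyphenator's actual interface or data.

```python
# Hypothetical sketch: each break point is (offset, level), where the
# offset is the character index before which a division may occur and
# the level is 1 for a primary division, 2 for a secondary one.
def render(word, breaks, max_level=2, mark="~"):
    """Insert `mark` at every break whose level is <= max_level."""
    allowed = {pos for pos, level in breaks if level <= max_level}
    out = []
    for i, ch in enumerate(word):
        if i in allowed:
            out.append(mark)
        out.append(ch)
    return "".join(out)


# Assumed break table for "adaptable": adapt | a | ble
ADAPTABLE = [(5, 1), (6, 2)]

print(render("adaptable", ADAPTABLE))               # all divisions
print(render("adaptable", ADAPTABLE, max_level=1))  # primary divisions only
```

Restricting `max_level` to 1 drops the secondary division, which is the adapt~able versus adapt~a~ble choice mentioned in the entry.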
German old (1980) and new (1996, 2006)
(Update July 2009, ε < 0.0045 ‰)
supports every characteristic German hyphenation according to the most recent Duden
Rechtschreibung August 2006. For German reformed two hyphenation styles have been implemented, one in agreement
with the Duden(s) 1996-2004, and one using strict eingedeutschte syllables.
The German hyphenator recognizes compound boundaries independent of the spelling reform.
The new feature for “der Verwendung von Großbuchstaben SS für ß” correctly hyphenates both “Schreibungsweisen”.
A special effort has been made to support medical and other scientific domains.
The German hyphenators have been compared to over two million German, Swiss German & Austrian German words as an independent estimate of accuracy.
visit
download page | view
a German example (PDF) | view
a newspaper mistake
Swiss German old (1980) and new (1996, 2006)
(Update July, 2009)
responds
accurately to the typical Swiss German deviations and local idiom
(including the ß
to ss transcription: Stra~ße becomes Stras~se, not Stra~sse).
visit
download page | view
a Swiss German example (PDF) | view
a newspaper mistake
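The ß to ss transcription in the Swiss German entry implies that a break point directly before ß moves between the two s characters. The sketch below shows only that one rule on an already hyphenated string; the function name `swiss_transcribe` is hypothetical, and the real hyphenator is certainly not a plain string replacement.

```python
# Hypothetical sketch of the single rule stated in the entry:
# a break directly before "ß" ends up between the two "s" characters.
def swiss_transcribe(hyphenated, mark="~"):
    """Replace ß by ss in a word whose break points are marked with `mark`."""
    # First handle a break immediately before ß, then any remaining ß.
    moved = hyphenated.replace(mark + "ß", "s" + mark + "s")
    return moved.replace("ß", "ss")


print(swiss_transcribe("Stra~ße"))
```

Handling the marked case first is what prevents the unwanted Stra~sse form.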
Austrian German old (1980) and new (1996, 2006)
(Update July, 2009)
responds
accurately to the typical Austrian German peculiarities and local idiom.
visit
download page | view
a German example (PDF) | view
a newspaper mistake
French
(two versions, Update March 2008)
accepts etymological syllabification according to Grevisse’s “le bon
usage.” A second version accepts phonetical
hyphenation rules recommended by the leading French linguist Nina
Catach in Paris.
Both versions use the new double layer technique to enable or disable
hyphenation of muettes.
Covers French idiom nearly completely.
visit
download page | view
a French example (PDF) |
view
a French example (not hyphenating muettes)(PDF)
Spanish
(Update January 2006, ε < 0.0035 ‰)
supports the official hyphenation rules as published
by large dictionary publishers; completely covers the Spanish and Latin
American
idiom.
visit
download page | view
a Spanish example (PDF)
Italian
(Update June 2006, ε < 0.0008 ‰)
supports phonetical hyphenation, in Italian: "la sillabazione: basata prevalentemente sul criterio
di tenere uniti i gruppi consonantici attestati, anche una sola volta, come iniziale di parola".
In addition the new hyphenator handles hiatuses accurately, elisions (al-l’I.ta-lia),
conjugations, declensions,
and words that came from English and other foreign languages (beat-nik and not be-at-nik).
visit
download page | view
an Italian example (PDF)
Iberian and Brazilian Portuguese (previous and acordo ortográfico)
(Update November 2009, ε < 0.007 ‰)
based on the vowel as the syllabic unit,
but falling diphthongs and final diphthongs are kept together. Both
Iberian and Brazilian idioms are
supported by the Portuguese hyphenator engine. This engine supports doubling of the hyphen
(repetir o hífen na linha seguinte).
visit
download page | view
a Portuguese example (PDF)
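Doubling of the hyphen means that when a line break coincides with a hyphen that is already part of the word, the hyphen is repeated at the start of the next line. A minimal sketch, assuming the break position is already known; the function `break_word` and the chosen break positions are hypothetical and not part of the Portuguese engine.

```python
# Hypothetical sketch of hyphen doubling at a line break.
def break_word(word, pos):
    """Split `word` before index `pos` and return (line_end, next_line_start).

    If the split falls right after an existing hyphen, that hyphen is
    repeated at the start of the next line. Otherwise an ordinary
    hyphen is added at the end of the first line.
    """
    head, tail = word[:pos], word[pos:]
    if head.endswith("-"):
        return head, "-" + tail
    return head + "-", tail


print(break_word("vice-almirante", 5))  # break at the existing hyphen
print(break_word("almirante", 2))       # ordinary break inside a word
```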
Czech
(Update November 2007)
supports the reformed spelling. As in every Slavic
language, a number of
additive vowels and consonants exist, which have a large impact on
hyphenation.
Syllables that solely consist of consonants are supported (ji-tr-nice).
visit
download page
Slovak
(Update November 2007)
supports the standard Slovak orthography. As in every Slavic
language, a number of
additive vowels and consonants exist, which have a large impact on
hyphenation.
Syllables that solely consist of consonants are supported (ji-tr-nice).
visit
download page
Swedish
(Update September 2008)
follows the mekaniska principen, but
compound words are divided into their morphological roots. The
abundance of compounds and newly created forms makes this a challenge
worth accepting.
You can switch between c-k and ck- hyphenation, and switch within-word
vowel-vowel hyphenation on or off.
The library version 6.2.1 also supports morphological hyphenation as specified
in the Svenska Akademiens ordlista över svenska språket.
visit
download page | view
a Swedish example (PDF)
Finnish
(Update February 2010)
is tuned to the peculiarities of the Finnish
language, which shares attributes with all Finno-Ugric languages. Finnish has a
rich
structure, including a large number of falling and rising diphthongs.
Hyphenation follows the phonetic
base of the syllable and is complete despite the language's
extensive
inflection. You may find its resemblance to neighboring
Estonian remarkable.
visit
download page | view
a Finnish example (PDF)
Catalan
(Update August 2007)
supports the mixed French and Spanish origins
of the Catalan language. A peculiarity of Catalan, needing special
care, is the l
geminada (l·l).
visit
download page | view
a Catalan example (PDF)
Danish
(Update January 2010)
accepts the hyphenation rules of the Dansk
Sprognævns Retskrivningsordbog. Compounds and newly created forms are
supported; please note that it even hyphenates Norwegian according to
consonant rules.
Norwegian/Nynorsk
(Update January 2010, ε < 0.0047 ‰)
accepts consonant rules
or the morphological rules of the Nordisk institutt of the
University of
Bergen. The principles of pattern recognition are put into practice
on Nynorsk as well; one hyphenator engine serves mixed language applications.
visit
download page | view
a Norwegian example (PDF)
Icelandic
(Update January 2010, ε < 0.021 ‰)
accepts morphological rules which separate
the attached article and nominative, dative, accusative, and genitive
cases and is
capable of dividing a pileup of compounds.
visit
download page |
view an Icelandic
example (PDF)
Estonian
(Update October 2008)
behaves like the Finnish hyphenator and is capable
of correctly hyphenating Estonian compounds and diphthongs. However,
there are
more diphthongs in Estonian than in Finnish,
which increases
complexity. Taking widows into account, we hyphenate
las~te~aia~laps and not las~te~ai~a~laps.
visit
download page |
view an Estonian
example (PDF)
New Greek
(Update January 2005)
is tuned in to the Greek script, the Elot codepage or Unicode. It
hyphenates more than between alpha and omega — not just the beginning
and the end (Classical Greek),
but a new era in progress (Modern Greek). Present-day Greek has evolved
and is flourishing with
diacritics.
visit
download page
Polish
(Update February 2008)
hyphenation of the Polish language is hindered by an
immense number of consonants, quite often unpronounceable for
non-Polish
speakers. However, the hyphenator has been fully adapted to these
difficult
syllables.
visit
download page
Latvian
(Update December 2007)
is tuned in to the properties of
Baltic languages. Words are richly declined. Latvian uses additional
consonants
and vowels, which are recognized by the hyphenator.
visit
download page
Azerbaijanian
(Update November 2009)
is the language of Azerbaijan, one of the Transcaucasian
republics that became independent of the former USSR.
Azerbaijanian is related to Turkish.
The Azerbaijani now use a Latin script. There is no standard byte code page for this script.
visit
download page
Turkish
(Update November 2008, ε < 0.007 ‰)
Present-day Turkish is spoken in SW Asia, but in
earlier times the Turkish region reached into the north of China. In
Chinese history, the
name Tu-kiu was mentioned 600 years ago. Turkish is characterized by a
lot of
additive particles that change the meaning of a word. A word can take
numerous forms and different parallel hyphenations.
visit
download page | view
a Turkish example (PDF)
Lithuanian
(Update December 2007)
is one of the Baltic languages which is richly
declined. The (semi-)diphthongs, palatals, and affricates have
been taken into consideration for hyphenation.
visit
download page
Afrikaans
(Update November 2009)
the Afrikaans language evolved from
17th-century Dutch and is an official language of South Africa. Its
hyphenation has
much in common with the Dutch language. Afrikanization of spelling has
given the Afrikaans
language its own identity. The Afrikaans
hyphenator takes all Afrikaans peculiarities into consideration,
including diaeresis
hyphenation.
visit
download page |
view an Afrikaans
example (PDF) |
die taal en die passende tegnologie (PDF) |
http://www.litnet.co.za/taaldebat/talo.asp
Russian
(Update July 2007)
accepts Cyrillic characters, which do not in themselves complicate hyphenation. The difficulty lies in the nature of
the Russian language: an abundance of prefixes and suffixes, modifying
different moods in fine gradation.
The hyphenator has learned from a corpus of over a
million Russian words.
visit
download page
Basque
(Update April 2008)
the Basque language is one of Europe’s most exotic minority languages,
probably
unrelated to any other language in the world. The Basque hyphenator is
tuned in to all those peculiarities of real-life language.
visit
download page |
view a Basque/Euskara
example (PDF)
Hungarian
(Update April 2006)
the Hungarian language has lost many of its Uralic characteristics and
many words have been
borrowed from the Turkish and European languages. The language is
flavoured with compounds
and special hyphenations (briddzsel -> bridzs-dzsel).
visit
download page | view
a Hungarian example (PDF)
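The special hyphenation in the Hungarian entry (briddzsel -> bridzs-dzsel) restores a doubled multigraph that is written in shortened form. The sketch below illustrates that restoration for the first match only. The table `SHORTENED` is a small, assumed sample and the function `split_long_consonant` is hypothetical; a real hyphenator must also decide whether such a letter sequence is a long consonant at all, which this sketch does not attempt.

```python
# Hypothetical sketch: shortened doubled multigraphs and the single
# multigraph they stand for. Longer patterns are listed first so that
# "ddzs" is tried before any shorter pattern. The list is a sample only.
SHORTENED = [("ddzs", "dzs"), ("ccs", "cs"), ("ggy", "gy"), ("ssz", "sz")]


def split_long_consonant(word, mark="-"):
    """Break `word` at the first shortened multigraph, restoring both halves."""
    for short, single in SHORTENED:
        i = word.find(short)
        if i != -1:
            rest = word[i + len(short):]
            return word[:i] + single + mark + single + rest
    return word  # no shortened multigraph found


print(split_long_consonant("briddzsel"))
```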
Bahasa Indonesia
(Update June 2005)
Bahasa Indonesia (Standard Indonesian) is an
Austronesian language full of prefixes, suffixes, and infixes (affixes in general),
with large classes of sound changes. Hyphenation is
inextricably tied to meaning: it is affected even when the boundaries are masked by
sound changes (mengarang from meng + karang).
visit
download page | view
a Bahasa Indonesian example (PDF)
Bahasa Melayu
(Update June 2005)
What holds for Bahasa Indonesia applies as well to Bahasa Melayu.
visit
download page
Bulgarian
(Update July 2007)
is spoken by 90 % of the population of Bulgaria, 7 million
people. The modern Bulgarian alphabet is the same as the Russian
alphabet.
visit
download page
Serbian
(Update November 2007)
Serbian or srpski jezik is written in the Cyrillic alphabet.
Serbian is closely related to Croatian; however, Serbian writes
Dž, Lj, and Nj with the single symbols
Џ, Љ, and Њ.
Like words in any Slavic language, Serbian words can have many prefixes
to be hyphenated.
visit
download page
Galician
(Update July 2010)
The Galician language is now spoken in Spanish Galicia, situated
north of Portugal. It is a Romance language related to
Portuguese. The orthography differs slightly from Spanish.
visit
download page
Rhaeto-Romance
(February 2002)
is the collective name for three Romance dialects spoken in
northeastern Italy and southeastern Switzerland.
visit
download page
Romanian
(Update August 2007)
is the national language of Romania. It is a Romance language
written in the Latin script. One third of all Romanian words are of
French origin. S|s-comma below (ș) and T|t-comma below (ț) are supported.
visit
download page
Croatian
(Update November 2007)
or hrvatski jezik is written in the Latin alphabet. Croatian is
closely related to Serbian. Croatian includes a few digraphs which
sound like a single consonant (Dž, Lj, Nj).
Like words in any Slavic language, Croatian words can have many prefixes
to be hyphenated.
visit
download page
Bosnian
(Update November 2007)
or Bosanski Jezik has existed
since Bosnia & Herzegovina became independent. Bosnian has
developed its own identity; it is
written in the Latin alphabet and closely related to Croatian.
visit
download page
Slovene
(Update November 2007)
or Slovenski jezik is written in the Latin alphabet. Slovene
includes a few digraphs (Dž, Lj, Nj).
Slovene has many prefixes and inflections. Some syllables consist of
consonants only:
hm-kniti, kr-tina, tr-den.
visit
download page | view
a Slovene example (PDF)
Thai
(Update December 2004)
The Thai people build sentences in a different way. Therefore, the Thai
module
is not a hyphenator in the traditional sense but a word
segmentation tool that takes
context into consideration.
visit download page |
more on Thai |
view a Thai example (PDF)
Macedonian
(Update July 2007)
is the principal language of the new nation of Macedonia. It is closely
related to Bulgarian and written in the Cyrillic alphabet.
visit
download page
Maltese
(Update January 2006)
is one of the official languages of the islands of Malta. It is
a Semitic language written in the Latin alphabet,
including <ċ> <ħ> <ġ> and <ż>.
The variety of root words has a great impact on hyphenation.
visit
download page | view
a Maltese example (PDF)
Sámi
(Update April 2006)
The hyphenation agrees with the North Sámi language as spoken in Finnmark
county in the north of Norway.
visit
download page | view
a North Saami example (PDF)
Hebrew
(December 2006)
is written in Hebrew consonants only and therefore hyphenation is partially uncertain. Within this uncertainty the hyphenator accepts graphical hyphenations.
visit
download page
Irish/Gaelic
(December 2006)
is a Celtic language mainly spoken in Ireland.
visit
download page
Zulu
(September 2008)
is a Bantu language mainly spoken in the Republic of South Africa.
Zulu is one of the 11 official South African languages. It is very different from Afrikaans
and the other Indo-European languages, and so is its hyphenation: be~nga~ka~la~li, ma~fu~ngwa~se.
visit
download page
Xhosa
(September 2008)
is a Bantu language mainly spoken in the Transkei, Ciskei and Eastern Cape regions of the Republic of South Africa.
Xhosa is one of the 11 official South African languages. It is very different from Afrikaans and the other Indo-European languages,
and so is its hyphenation: i~si~Mpo~ndo, ye~Bha~nki.
visit
download page
Swahili
(March 2009)
is a Bantu language mainly spoken in East Africa.
Swahili is a principal language of Tanzania, Zanzibar, Uganda and many neighbouring countries.
Hyphenation examples are: ne~nda, ni~ra~ku~pi~ga, ni~li~m~pi~ga, u~nga~ma.
visit
download page
Kurdish (Northern)
(July 2009)
belongs to the Iranian group of languages. Kurdish is spoken in Turkey, Iraq, Iran, Armenia, Georgia and Azerbaijan.
The Latin script is used for the Northern variety of Kurdish.
visit
download page
Khmer (Cambodia)
(November 2009)

belongs to the Austroasiatic languages. Khmer has its own script known as Aksar Khmer.
In Khmer no spaces are inserted between words. Yet words have to be segmented, even unknown words.
visit
download page
Kazakh (Latin/Cyrillic)
(Update May 2010)

belongs to the Turkic family of languages. Kazakh is written in the Cyrillic, Arabic or Latin script.
An official transition to the Latin script could happen in a 10 to 12 year period.
The Kazakh hyphenator (Unicode) supports both the Latin and Cyrillic script.
visit
download page
Under development
Faroese and Esperanto.