All tongues became different after the Babylonian Ziggurats were hit by God's fury.
Nowadays an even greater variety of languages exist.

Hyphenators — language specifications

Languages: go to language

Our hyphenators are not based on a hyphenated dictionary database, instead they rely on language models.
New hyphenator languages: Marathi (India), Bengali (India), Hindi (India), Malayalam (India), Tamil (India).

Recent upgraded hyphenator languages: Italian, Dutch, Flemish, Surinam Dutch, Afrikaans, Slovak, Czech, Galician, English (5x), French, Danish, Frisian (Frysk), Norwegian (2x), Nynorsk (2x), Swedish (3x), Icelandic, German, Swiss German, Austrian German, Portuguese (Iberian & Brazilian), Finnish, Spanish, Estonian, Thai, Canadian French .
Upgraded hyphenators are built on larger learning corpora and are validated against an even larger shadow corpora.

86 language modules

Dutch, Flemish, Surinam Dutch (Upgrade October 2024, ε < 0.00012 ‰) (Hyphenates coronavirus idiom, and medical terms)
supports the most recent accepted Dutch orthography, and recognizes earlier spelling variants such as the progressive spelling (Belgium), and the 1996 & October 2005 spelling reforms — four principles have been integrated in one hyphenator for the Belgium, Surinam and Dutch idiom. The hyphenator recognizes compound boundaries (Frank~rijk~lei not Frank~rij~klei, or boer~de~rij~wa~ter~ijs~je not boer~de~rij~wa~te~r~ijs~je, farmhouse ice lolly) and covers the Dutch idiom in the most extensive way. The hyphenator recognizes special hyphenations in which characters are replaced, removed or added due to hyphenation (aveetjeave-tje, NOT avé-tje). There are over 1400 cases registered in our corpora.
visit download page | view a Dutch example (PDF) | view a newspaper mistake | view a recent newspaper mistake

English (Upgrade July 2024, ε < 0.00015 ‰) (Hyphenates coronavirus idiom)
supports phonetical hyphenation according to the world's most trusted dictionaries: Webster's Third New International Dictionary, Webster’s New Twentieth Century Unabridged Dictionary (2nd edition), and Longman’s Dictionary of Contemporary English. Based on an unabridged learning corpus, the hyphenator's base is close to Webster's Unabridged Dictionaries. The hyphenator covers the British, Canadian, and American idiom and solves the extreme irregularity of the alternation of English strong and weak syllables. The new double layer model enables the user to disregard certain secondary divisions (adapt~able instead of adapt~a~ble). Hyphenation agrees with Oxford Colour Spelling Dictionary (1995). Compared to the other dictionaries, this last dictionary lists fewer hyphenation locations then there are syllables in a word (amazone instead of am~a~zon).
The English module has separate entries for British, American, Canadian, Australian, New Zealand and South African English, and supplies you with superior hyphenation and hyphenation density.
Be careful, some algorithms erroneously divide words as "new-spaper", "whites-pace", etc.
visit download page | view an English example (PDF) | view a newspaper mistake | view a book-printing mistake

German old (1980) and new (1996, 2024 plus) (Upgrade October 2024 with a new language description model, ε < 0.0014 ‰) (Hyphenates coronavirus idiom)
supports every characteristic German hyphenation according to the most recent Duden Rechtschreibung August 2006-2017 and Wahrig 2009. For German reformed two hyphenation styles have been implemented, one in agreement with the Duden(s) 1996-2017, and the other prefers the Duden 2017-2024 hyphenations, if meaningful strict eingedeutschte syllables are used. The German hyphenator recognizes compound boundaries independent of the spelling reform. The new feature “der Verwendung von Großbuchstaben ESZETT (ß / ẞ)” correctly hyphenates words written in both “Schreibungsweisen”. A special effort has been made to support medical and other scientific domains. The German hyphenators have been compared to over two million German, Swiss German & Austrian German words as an independent estimate of accuracy. The "schleswig-holsteinischen Bildungsministerin" writes "berufsfel-dorientierend" in her letter, we hyphenate as it has to be: "be-rufs-feld-ori-en-tie-rend".
visit download page | view a German example (PDF) | view a newspaper mistake

Swiss German old (1980) and new (1996, 2024 plus) (Upgrade October 2024) (Hyphenates coronavirus idiom)
responds accurately to the typical Swiss German deviations and local idiom (including the ß to ss transcription, Stra~ße comes Stras~se (not Stra~sse)) and with the addition of Swiss German & Swiss French geographical names.
visit download page | view a Swiss German example (PDF) | view a newspaper mistake

Austrian German old (1980) and new (1996, 2024 plus) (Upgrade October 2024) (Hyphenates coronavirus idiom)
responds accurately to the typical Austrian German peculiarities and local idiom.
visit download page | view a German example (PDF) | view a newspaper mistake

French (two versions, Upgrade May 2024)
accepts etymological syllabification according to Grevisse’s “le bon usage.” A second version accepts phonetica hyphenation rules recommended by the leading French linguist Nina Catach in Paris. Both versions use the new multilayer technique to enable or disable hyphenation of muettes. Covers French idiom nearly completely and has been tested on a full-size French corpus, including extended geographical data of the world (Kamt~chat~ka, Chev~tchen~ki~vie, etc.).
visit download page | view a French example (PDF) | view a French example (not hyphenating muettes)(PDF)

Spanish Peninsular, Argentine, Mexican & Latin American (Upgrade March 2020, ε < 0.0008 ‰)
supports the official hyphenation rules as published in the Ortografía de la lengua española (2010), and large dictionary publishers; completely covers the Spanish and Latin American idiom.
Be carefull with foreign words in Spanish! Do not hyphenate as "Burn the Wit-|ch" (El País, 23, 4-6-2016). It is not a syllable!
visit download page | view a Spanish example (PDF)

Italian (Upgrade October 2024, ε < 0.0008 ‰)
supports phonetical hyphenation, in Italian: "la sillabazione: basata prevalentemente sul criterio di tenere uniti i gruppi consonantici attestati, anche una sola volta, come iniziale di parola". In addition the new hyphenator handles hiatuses accurately, elisions (al-l’I.ta-lia), conjugations, declensions, and words that came from English and other foreign languages (beat-nik and not be-at-nik).
visit download page | view an Italian example (PDF)

Iberian and Brazilian Portuguese (previous and acordo ortográfico) (Upgrade April 2022, ε < 0.006 ‰)
based on the vowel as the syllabic unit, but falling diphthongs and final diphthongs are kept together. Both Iberian and Brazilian idioms are supported by the Portuguese hyphenator engine. This engine supports doubling of the hyphen (repetir o hífen na linha sequinte) and is based on the entire 2022 corpus.
visit download page | view a Portuguese example (PDF)

Czech (Upgrade July 2024)
supports the reformed spelling. As is the case in every Slavic language, a number of additive vowels and consonants exists, which have a large impact on hyphenation. Syllables that solely consist of consonants are supported (ji-tr-nice).
visit download page

Slovak (Upgrade August 2024)
supports the standard Slovak orthography. As is the case in every Slavic language, a number of additive vowels and consonants exists, which have a large impact on hyphenation. Syllables that solely consist of consonants are supported (ji-tr-nice, trvr-dý).
visit download page

Swedish (Upgrade February 2024 (3x), ε < 0.0009 ‰) (Hyphenates coronavirus idiom)
accepts the mekaniska principen, but compounded words are divided into their morphological roots. An overwhelming occurrence of compounds, and newly created forms, makes it a challenge worth accepting. You can switch between c-k or ck- hyphenation, and between within-word vowel-vowel hyphenation.
A third hyphenator model also supports morphological hyphenation as specified in the Svenska Akademiens ordlista över svenska språket (SAOL), however, Mediespråksgruppen (tt.se) tar avstånd från SAOL:s avstavningssystem, yet it has been introduced into the schooling system many years ago. The fact that Mediespråksgruppen rejected SAOL hyphenation might be related to technical hindrances to build a hyphenator just like that. Still it is possible given the use of a proper language model. The latest Swedish hyphenator (February 2024) agrees with SAOL 13/14 morfologiska avstavningar over a full Swedish language idiom.
All Swedish hyphenators have been optimized using a 2.1 million word hyphenation corpus of the Swedish language.
visit download page | view a Swedish konsonantregel example (PDF) | view a Swedish SAOL12 example (PDF)

Finnish (Upgrade June 2021, ε < 0.0005 ‰) (Hyphenates coronavirus idiom)
is tuned in to the peculiarities of the Finnish language and shares attributes with all Finno-Ugric languages. It has a rich structure, including a large number of falling and rising diphthongs. The phonetical base of the syllable is accepted, here, fully hyphenated despite it’s overwhelming inflection structure and a high proportion of compounds (e.g., kort~te~li~ajo (street racing), ko~ru~om~mel (embroidery), tul~kin~ta~ero, do not hyphenate orphans, e.g., tul~kin~ta~e~ro). The hyphenator supports special hyphenation types, e.g., vaa'an > vaa- // an. You may find its resemblance to the neighboring Estonian remarkable. The June 2021 hyphenator got a new extended language model, and learned the hyphenation principles embedded in more than five million words. The latest Finnish hyphenator was optimized over 5,368,692 words resulting in 27,826,228 hyphens inserted.
visit download page | view a Finnish example (PDF)

Catalan (nova ortografia) (Upgrade May 2017)
supports the mixed French and Spanish origins of the Catalan language. A peculiarity of Catalan, needing special care, is the l geminada (l·l). This version supports the spelling reform of October 2016.
visit download page | view a Catalan example (PDF)

Danish (Upgrade April 2024, ε < 0.0001 ‰)
accepts the hyphenation rules of the Dansk Sprognævns Retskrivningsordbog, med ændrede ord- og staveformer (2012). Nearly any compound and newly created neologism is correctly hyphenated; please note that it even hyphenates Norwegian according to consonant rules.
The latest hyphenator also includes hyphenation of COVID and corona-related idiom. The hyphenator is trained on the full size 2024 Danish hyphenation corpus. The language model has been updated.
visit download page | view a Danish example (PDF)

Norwegian/Nynorsk (Upgrade February 2024, ε < 0.0017 ‰)
accepts consonant rules or the morphological rules of the Språkrådet and the Nordisk institutt of the University of Bergen. The principles of pattern recognition are put into practice on Nynorsk as well; one hyphenator engine for mixed language applications using modern Norwegian idiom. The latest hyphenator also includes hyphenation of COVID and corona-related idiom.
visit download page | view a Norwegian example (PDF) | view a recent newspaper mistake

Icelandic (Upgrade November 2023, ε < 0.0018 ‰)
accepts morphological rules which separate the attached article and nominative, dative, accusative, and genitive cases and is capable of dividing a pileup of compounds, even neologisms, e.g.,
ól-ymp-íu-gull, not not ólymp-íug-ull (Olympic Gold);
að~al~aust~ur~dælu~kerf~is~ins, not að~a~l~aust~ur~dælu~kerf~is~ins (main-eastern-pump-system);
flugu~fót~anna, not flug~u~fót~anna (fly's-foot);
kassa~gít~ar, not kassagít~ar (a song's title);
hinu~meg~in not, don't know hinumegin (on the other side);
ses~am~ol~íu not, sesa~mol~íu (sesam-oil)

visit download page | view an Icelandic example (PDF)

Estonian (Upgrade January 2020)
behaves like the Finnish hyphenator and is capable of correctly hyphenating Estonian compounds and diphthongs. However, there are more diphthongs in the Estonian language than in the Finnish language which increases complexity. Taken orphans in to account we hyphenate as las~te~aia~laps and not as las~te~ai~a~laps, or as aa~to~mi~ener~gia~agen~tuu~ri and not as aa~to~mi~e~ner~gi~a~a~gen~tuu~ri.
visit download page | view an Estonian example (PDF)

New Greek (Upgrade February 2016, ε < 0.0015 ‰)
is tuned in to the Greek script, the Elot codepage or Unicode (Greek and Greek Extended). It hyphenates more than between alpha and omega — not just the beginning and the end (Classical Greek), but a new era in progress (Modern Greek). Present-day Greek has evolved and is flourishing with diacritics and highly significant for hyphenation.
visit download page

Latin (February 2016)
is an extinct language, but still spoken in the Roman Catholic church. Together with Greek, Latin had an enormous influence on nearly any European language. The Latin hyphenator divides Latin prefixes, suffixes, and enclitics.
visit download page

Polish (Upgrade February 2016)
hyphenation of the Polish language is hindered by an immense number of consonants, quite often unpronounceable for non-Polish speakers. However, the hyphenator's model has been fully adapted to these difficult syllables and pronunciation of Polish words is the guiding priciple to hyphenate. The hyphenator supports doubling of the hyphen.
visit download page

Latvian (Upgrade February 2016)
is tuned in to the properties of Baltic languages. Words are richly declined. Latvian uses additional consonants and vowels, which are recognized by the hyphenator.
visit download page

Azerbaijanian (Upgrade February 2016)
is one of the new Transcaucasian republics that are now independent from the former USSR. Azerbaijanian is related to Turkish. The Azerbaijani now use a Latin script. There is no standard Byte CodePage script.
visit download page

Turkish (Upgrade February 2016, ε < 0.007 ‰)
Present-day Turkish is spoken in SW Asia, but in earlier times the Turkish region reached into the north of China. In Chinese history, the name Tu-kiu was mentioned 600 years ago. Turkish is characterized by a lot of additive particles that change the meaning of a word. A word can take numerous forms and different parallel hyphenations.
visit download page | view a Turkish example (PDF)

Lithuanian (Upgrade February 2016)
is one of the Baltic languages which is richly declined. The (semi-)diphthongs, palatals, and affricates have been taken into consideration for hyphenation.
visit download page

Afrikaans (Upgrade August 2024, ε < 0.0040 ‰)
the Afrikaans language evolved from 17th-century Dutch and is an official language of South Africa. Its hyphenation has much in common with the Dutch language. Afrikanization of spelling has given the Afrikaans language its own identity. The Afrikaans hyphenator takes all Afrikaans peculiarities into consideration, including diaeresis hyphenation.
visit download page | view an Afrikaans example (PDF) | die taal en die passende tegnologie (PDF)

Russian (Upgrade July 2014)
accepts Cyrillic characters, but does not complicate hyphenation. It is the nature of the Russian language: an abundance of prefixes and suffixes, modifying different moods in a fine gradation. The hyphenator has learned from a corpus of over a million Russian words.
visit download page

Basque (Upgrade February 2016)
the Basque language is one of Europe’s most exotic minority languages, probably unrelated to any other language in the world. The Basque hyphenator is tuned in to all those peculiarities of real-life language.
visit download page | view a Basque/Euskara example (PDF)

Hungarian (Upgrade April 2006)
the Hungarian language has lost many of its Uralic characteristics and many words have been borrowed from the Turkish and European languages. The language is flavoured with compounds and special hyphenations (briddzsel -> bridzs-dzsel).
visit download page | view a Hungarian example (PDF)

Bahasa Indonesia (Upgrade July 2014)
the Bahasa Indonesia (Standard Indonesian) is an Austronesian language full of prefixes, suffixes, infixes, in general terms affixes including large classes of sound changes. Hyphenation is inextricably tied to meaning, even when the boundaries are masked by sound changes (mengarang from meng + karang) hyphenation is affected.
visit download page | view a Bahasa Indonesian example (PDF)

Bahasa Melayu (Upgrade July 2014)
What holds for Bahasa Indonesia applies as well to Bahasa Melayu.
visit download page

Byelorussian (Upgrade July 2007)
is is the language of the new nation of Belarus. It was proclaimed the country's sole official language, but Russian remains
dominant. Byelorussian is written in the Cyrillic alphabet.
visit download page

Bulgarian (Upgrade February 2016)
is spoken by 90 % of the population of Bulgaria, 7 million people. Modern Bulgarian alphabet is the same as the Russian alphabet, however, the pronunciation of the hard and soft sign differs.
visit download page

Serbian (Upgrade July 2014)
Serbian or srpski jezik is written in the Cyrillic alphabet. Serbian is closely related to Croatian, however, Serbian characters are written with single symbols Џ, Љ, and Њ. (Dž, Lj, Nj ). Like words in any Slavic language Serbian words can have many prefixes to be hyphenated.
visit download page

Galician (Upgrade July 2024)
The Galician language is now spoken in Spanish Galicia, situated north of Portugal. It is a Romance language related to Portuguese.  The orthography differs slightly from Spanish.
visit download page

Rhaeto-Romance (Februari 2002)
is the collective for three Romance dialects spoken in the northeastern Italy and southeastern Switzerland.
visit download page

Greenlandic (April 2002)
is an Eskimo language spoken in Greenland. Greenlandic is written in the Latin alphabet. Words can be very long and one word can be a complete sentence.
visit download page

Ukrainian (Upgrade July 2007)
is the national language of Ukraine. It is  spoken by a population of 35 million people. Ukrainian has many Polish loan words,
but the influences of Russian can be found in the east of Ukraine too.
visit download page

Romanian (Upgrade June 2009)
is the national language of Romania. It is a Romance language written in the latin script. One third of all Romanian words are of French origin. S|s-comma below (ș) and T|t-comma below (ț) are supported
visit download page

Croatian (Upgrade July 2014)
or hrvatski jezik is written in the Latin alphabet. Croatian is closely related to Serbian. Croatian includes a few digraphs which sound like a single consonant (Dž, Lj, Nj ).
Like words in any Slavic language Croatian words can have many prefixes to be hyphenated.
visit download page

Bosnian (Upgrade July 2014)
or Bosanski Jezik exists since Bosnia & Herzegovina became independent. Bosnian has developed its own identity, written in Latin and closely related to Croatian.
visit download page

Frisian (Frysk) (Upgrade March 2024)
or Frysk is spoken in Friesland the northernmost province of the Netherlands.  Frisian is closer related to English than Dutch. Hyphenation is in agreement with the "offisjele stavering fan de Fryske taal 2014" published by the of the Fryske Akademy, however, some cases have to be corrected: not jaz.zfilm but jazz~film, not ka.tûl.tsje but kat~ûl.tsje (as kat.ûle), not be.tsjin.ning.sor.gaan but be.tsjin.nings~or.gaan.
visit download page

Tagalog/Pilipino (Upgrade March 2016)
is the national language of the Philippines. Three centuries of Spanish rule left a strong imprint on the vocabulary. The pre-, in- and suffixes to modify word meaning make hyphenation irregular.
visit download page

Slovene (Upgrade May 2015)
or Slovenski jezik is written in the Latin alphabet. Slovene includes a few digraphs (Dž, Lj, Nj). Slovene has many prefixes and inflections. Some syllables divide consonants only: hm-kniti, kr-tina, tr-den.
visit download page | view a Slovene example (PDF)

Thai (Upgrade October 2019)
The Thai people build sentences in a different way. Therefore, the Thai module is not a hyphenator in the traditional sense, but it is a word segmentation tool, that takes context into consideration.
visit download page | more on Thai | view a Thai example (PDF)

Tamil (தமிழ) (March 2021)
The Tamil language is spoken in the southern part of India and in parts of Sri Lanka. As the other languages of India, vowels are placed above consonants, except for initial vowels. The script lets characters fuse together.
visit download page

Malayalam (മലയാളം) (April 2021)
The Malayalam language is spoken in the southern part of India. As the other languages of India, vowels are placed above consonants, except for initial vowels. The script lets characters fuse together.
visit download page

Hindi (हिन्दी), (Update May 2021)
The Hindi language is spoken as the most important language of India. As the other languages of India, vowels are placed above consonants, except for initial vowels. The script lets characters fuse together. Of course such a combination will not be broken.
visit download page

Bengali (বাংলা) (May 2021)
is an Indo-Aryan language of the Bengali-Assamese group of languages. Bengali is spoken in the Bengal regio of India. and is written in the Bengali script (Unicode 0x0980 - 0x09FF). Dependent vowel are placed above, below, before, or after a consonant, but in written (script) always AFTER the consonant. It is easy to write them in reverse order, a serious error. It is forbidden to hyphenate before an dependent vowel.
visit download page

Marathi (मराठी) (May 2021)
The Marathi language is spoken in the Mahatashtra state of India. It is an Indo-Aryan language written in the Devanagari script, but grammar and word forms differ from Hindi.
visit download page

Macedonian (Upgrade July 2007)
is the principal language of the new nation of Macedonia, it is closely related to Bulgarian and written in the Cyrillic alphabet.
visit download page

Maltese (Upgrade February 2016)
is one of the official languages of the islands of Malta, it is a Semitic language written in the Latin alphabet, including <ċ> <ħ> <ġ> and <ż>, the variety of root words has a great impact on hyphenation.
visit download page | view a Maltese example (PDF)

Sámi (Upgrade July 2014)
The hyphenation agrees with the Nord Sámi language as spoken in Finnmark county in the north of Norway.
visit download page | view a North Saami example (PDF)

Hebrew (Upgrade February 2016)
is written in Hebrew consonants only and therefore hyphenation is partially uncertain. Within this uncertainty the hyphenator accepts graphical hyphenations.
visit download page

Irish/Gaelic (Upgrade February 2016)
is a Celtic language mainly spoken in Ireland.
visit download page

Zulu (Upgrade February 2016)
is a Bantu language mainly spoken in the Republic of South Africa.
Zulu is one of the 11 South African languages and is very different from Afrikaans and the other Indo-European language and so is hyphenation: be~nga~ka~la~li, ma~fu~ngwa~se.
visit download page

Xhosa (Upgrade February 2016)
is a Bantu language mainly spoken in the Transkei, Ciskei and Eastern Cape regions of the Republic of South Africa.
Xhosa is one of the 11 official South African languages. It is very different from Afrikaans and the other Indo-European language, and so is hyphenation: i~si~Mpo~ndo, ye~Bha~nki.
visit download page

Swahili (Upgrade February 2016)
is a Bantu language mainly spoken in East Africa.
Swahili is principal language of Tanzania, Zanzibar, Uganga and many neighbouring countries. Hyphenation examples are: ne~nda, ni~ra~ku~pi~ga, ni~li~m~pi~ga, u~nga~ma.
visit download page

Kurdish (Northern) (July 2009)
belongs to the Iranian group of languages. Kurdish is spoken in Turkey, Iraq, Iran, Armenia, Georgia and Azerbaijan. The latin script is used for the Northern variety of Kurdish.
visit download page

Khmer (Cambodia) (February 2016)
belongs to the Austroasiatic languages. Khmer has its own script known as Aksar Khmer. In Khmer no spaces are inserted between words. Yet words have to be segmented, even unknown words.
visit download page

Kazakh (Latin/Cyrillic) (Upgrade February 2016)
belongs to the Turkic family languages. Kazakh is written in the Cyrillic, Arabic or Latin script. An official transition to the Latin script could happen in a 10 to 12 year period. The Kazakh hyphenator (Unicode) supports both the Latin and Cyrillic script.
visit download page

Welsh (Cymraeg) (August 2015)
is a Celtic language which is hyphenated according to morphological principles. In addition a lot of sound changes (mutations) replicates these principles. The Welsh hyphenator improves the density of hyphenation at the end of the line.
visit download page

Under development
Faroese, and Esperanto.