Languages and sizes of dictionaries:
New languages: Kazakh (Cyrillic/Latin), Khmer (Cambodia), Telugu (India), Punjabi (India), Sinhala (Sri Lanka), Tamil (India), Gujarati (India), Bengali (Bangladesh), Malayalam (India), Kurdish (Northern), Nepalese, Marathi (India), Hindi (India), Arabic, Azerbaijanian
Recently updated spell checker languages: Basque, Dutch, Italian, German, Austrian German, Swiss German, Estonian, Portuguese (acordo ortográfico), French/Canadian French, Finnish, Norwegian, Danish, Icelandic, English, Bahasa Indonesia, Bahasa Melayu, Spanish, Frisian/Frysk, Afrikaans, Romanian, Slovak.
The Arabic and Hungarian lexicons have become the largest ever built without any artificial tricks, at around 5 million words each.
94 languages (varieties)
English
(lexicon size between 315,000 and 325,000, selection
April 2010)
The American English (1), British English (2), Canadian English (3), South-African English (4),
Australian/New-Zealand English (5) versions include a set of collocations and
automatic respelling functions between American English,
Canadian English, and British English orthographical varieties, e.g.,
(to UK) counseling -> counselling or
(to US) counselling -> counseling;
(UK & US) Mao Tse-tung -> Mao Zedong
(see the Style Guides of the New York Times and the Economist).
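The respelling examples above can be pictured as a table lookup. The following is only a minimal sketch under stated assumptions: the table names, the `respell` function and the `target` parameter are hypothetical and do not describe how the product works internally; only the three word pairs come from the text above.

```python
import re

# Hypothetical respelling tables; only these three pairs are quoted in the text.
TO_UK = {"counseling": "counselling"}
TO_US = {"counselling": "counseling"}
SHARED = {"Mao Tse-tung": "Mao Zedong"}  # applied for both UK and US


def respell(text, target):
    """Rewrite known variants in `text` for the target variety ("UK" or "US")."""
    table = dict(SHARED)
    table.update(TO_UK if target == "UK" else TO_US)
    for old, new in table.items():
        # Word boundaries keep the rule from firing inside longer words.
        text = re.sub(r"\b" + re.escape(old) + r"\b", new, text)
    return text


print(respell("Grief counseling helped.", "UK"))
print(respell("Grief counselling helped.", "US"))
```

A real respelling system would also need to handle capitalisation and inflected forms, which this sketch leaves out.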
Be careful with expressions such as "Thank God its Friday!". Without the apostrophe (it's) it looks a bit strange.
Lexicons agree with the leading unabridged dictionaries.
The supplied vocabulary includes extensive medical, chemical, social and geographical terminology.
Finally, the vocabulary covers the many orthographical variants in which compounds are formed.
visit download page
French/Canadian French
(lexicon size over 570,000, selection February
2010)
Includes the most extensive geographical lexicon. Two lexicons are
available, one according to the spelling of Le Larousse (2008) & Le Nouveau Petit Robert
(2003) and one according to the most recent Rectifications de
l’orthographe of the Conseil supérieur de la langue française, first
published on 6 December 1990 (see also http://www.orthographe-recommandee.info), which have become more and more widely accepted.
The new French spelling is not imposed, but it is officially recommended.
The changes are moderate and affect about two thousand words. Examples:
Continue ...
German
(lexicon size 1,154,000, selection February 2010)
German spelling was reformed again in 2006.
Previous versions are kept available for a while,
but the regular German spelling is distributed in three versions, “alt, neu, dpa (2007)”,
including automatic respelling from old to new spelling forms (e.g.,
Prozeß → Prozess) and spelling of “feste grammatische und lexikale
Wendungen”. If you prefer “die alte Rechtschreibung” and wish to purify
your texts, a full re-spelling system from new to old will surprise you
(e.g., Prozess → Prozeß). A version for the
Nachrichtenagenturen (dpa), as proposed by the German-speaking news
agencies (http://www.die-nachrichtenagenturen.de), is also available.
The neue Rechtschreibung spelling agrees with Duden 24 (2006)
and the IDS Sprachreport, July 2006. The German lexicon is based on over 260,000 expanded headwords (konjugierte Stichwörter), and
includes all German toponyms (Ortsnamen),
over 13,000 autocorrections (Umschreibungen) and an extensive medical lexicon.
Moreover, spell checking is strict: errors such as Oberklasse-Wagen, Oberstufe-Schüler and Klasse-Bücher are not accepted.
The correct forms are Oberklassenwagen, Oberstufenschüler and Klassenbücher.
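One way to picture the strict compound check is to try joining the parts of a hyphenated word with a linking element and to look the result up in the lexicon. This is a hedged sketch only: `LINKING_ELEMENTS`, `suggest_closed_compound` and the three-word toy lexicon are assumptions made for illustration, not the checker's actual method or data.

```python
# Illustrative list of German linking elements (Fugenelemente); an assumption.
LINKING_ELEMENTS = ("", "n", "s", "en", "es", "er", "e")


def suggest_closed_compound(token, lexicon):
    """Return a closed compound from `lexicon` for a hyphenated token, or None."""
    if "-" not in token:
        return None
    head, _, tail = token.partition("-")
    for link in LINKING_ELEMENTS:
        # The second part loses its capital once the compound is closed.
        candidate = head + link + tail.lower()
        if candidate in lexicon:
            return candidate
    return None


# Toy lexicon holding only the three correct forms quoted above.
lexicon = {"Oberklassenwagen", "Oberstufenschüler", "Klassenbücher"}
for wrong in ("Oberklasse-Wagen", "Oberstufe-Schüler", "Klasse-Bücher"):
    print(wrong, "->", suggest_closed_compound(wrong, lexicon))
```

The sketch only handles a single hyphen and relies entirely on the lexicon to decide which linking element is right.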
visit download page
Swiss German
(lexicon size 1,176,000, Swiss additions to German)
The Swiss German lexicon includes all Swiss toponyms (Schweizerische Ortsnamen).
There are three versions, “alt, neu, dpa/SDA (2007/8)”; see German.
visit download page
Austrian German
(lexicon size 1,168,000, Austrian additions to German)
The Austrian lexicon includes all Austrian toponyms (Österreichische Ortsnamen).
There are three versions, “alt, neu, dpa (2007)”; see German.
visit download page
Spanish
(lexicon size 891,000, selection August 2009)
The spelling is according to “Gran Diccionario de la lengua española”
and “Diccionario Real Academia Española”, 2001.
Includes respelling of a set of common errors, e.g. Adam y Eva → Adán y Eva, Edinburgo → Edimburgo.
visit download page
Italian
(lexicon size 945,000, selection April 2010)
The spelling is according to Lo Zingarelli 2006.
Includes pronominal forms and an extensive geographical lexicon (comuni e luoghi italiani).
visit download page
Swedish
(lexicon size over 1.9 million words, selection February 2010)
Includes geographical and proper names;
orthography according to Svenska Akademiens ordlista över svenska
språket.
visit download page
Portuguese
(lexicon size 1.5 million words, selection February 2010)
Iberian and Brazilian Portuguese are very different in terms of use of verb tenses and idiom.
Often Brazilian Portuguese is unacceptable for Iberian Portuguese publications, and
the reverse is a source of misunderstanding too.
Independently of orthography, the dictionaries need to be different.
Therefore Iberian and Brazilian versions have been compiled, each according to both the previous orthography and the acordo ortográfico.
These versions include respelling either between Iberian Portuguese and Brazilian Portuguese or
between the previous orthography and the acordo ortográfico.
The President of Portugal, Aníbal Cavaco Silva, has promulgated the orthographic agreement (acordo ortográfico) for the Portuguese
language, ratified by the country's Parliament in May, sources at the Presidency told Agência Efe
today. ....,
The new Orthographic Agreement of the Portuguese Language has been in force in Brazil since the 1st of the month (2009).
Examples: equipolente versus eqüipolente or boleia versus boléia or ação versus acção.
visit download page
Dutch
(Nederlands, lexicon size 700,000,
selection April 2010)
The spelling follows the governmental rules (Groene Boekje, Workgroup Spelling, 2005, Taalunie, update Taalunie errata 27-05-2008) and
agrees with
Van Dale Groot Woordenboek van de Nederlandse Taal (XIV ed.).
The lexicon's vocabulary covers national and worldwide geographical names as well as medical, administrative,
social and many other specialist terms.
A set of collocations and respelling from old to
new orthography is included.
visit download page
Flemish
(Vlaams, lexicon size 705,000,
selection April 2010)
The spelling follows the governmental rules (Groene Boekje, Workgroup Spelling, 2005, Taalunie, update Taalunie errata 27-05-2008) and
agrees with
Van Dale Groot Woordenboek van de Nederlandse Taal (XIV ed.).
The lexicon's vocabulary covers national and worldwide geographical names as well as medical, administrative,
social and many other specialist terms.
A set of collocations and respelling from old to
new orthography is included.
visit download page
Surinam Dutch
(Surinaams-Nederlands, lexicon size 700,000,
selection April 2010)
The Republic of Surinam joined the Dutch Taalunie in January 2005
to align its language
with the Dutch language. The peculiarities of Surinam
Dutch call for a separate lexicon.
The spelling agrees with the governmental rules (Groene Boekje, Workgroup Spelling, 2005, Taalunie).
The lexicon's vocabulary covers national and worldwide geographical names as well as medical, administrative,
social and many other specialist terms.
A set of collocations and respelling from old to
new orthography is included.
visit download page
Catalan
(lexicon size 700,000, selection August 2009)
The spelling agrees with Diccionari ortogràfic i de pronúncia,
Enciclopèdia Catalana.
visit download page
Danish
(lexicon size 810,000, selection January 2010)
The spelling agrees with contemporary Danish spelling according to
Dansk Retskrivningsordbogen, 1996.
visit download page
Norwegian, Nynorsk
(lexicon size Bokmål 1,015,000 Nynorsk 480,000,
selection February 2010)
The spelling agrees with contemporary Norwegian spelling
according to Tanums Store Rettskrivningsordbok.
visit download page
Sámi
(lexicon size 1.6 million, selection February 2008)
The spelling agrees with the North Sámi language as spoken in Finnmark
county in the north of Norway. Sámi is a highly inflected language and words can have numerous word forms.
This feature makes the North Sámi lexicon very large.
visit download page
Finnish
(lexicon size over 4.2 million words,
selection February 2010)
The spelling agrees with contemporary Finnish spelling according to
Uusi
Suomi-Englanti Suur-Sanakirja, 1984.
visit download page
Afrikaans
(lexicon size 290,000, selection November 2009)
The lexicon agrees with the spelling rules of the Suid-Afrikaanse
Taalkommissie, 2002.
visit download page
Latin
(lexicon size 450,000,
selection August, 2007)
The Latin lexicon has been compiled from
classical, medieval, clerical, vulgate, and scientific texts. Names
from the classical period and from the clerical (and Biblical) world
have been included in the lexicon. Like dictionary publishers, we do not use ligatures:
oeconomicae, Aegiptum, etc.
visit download page
Basque
(lexicon size 3.05 million,
selection July 2010)
The Basque language is highly inflected, and so is the
Basque lexicon. Financial, scientific and geographical terms and proper names are included in the
lexicon:
Euskadi, Euskadik, Euskadiko, Euskadikoa, Euskadin, Euskadira,
Euskadiren, Euskadirentzat, Euskaditik, Euskadiz, amortizazio-prezio..., banku-txartel..., efektu-biomarkatzaile..., epitelio-zelula..., etc.
visit download page
Russian (Россия)
(lexicon size 1,000,000,
selection January 2008)
The Russian language goes back to Old Church Slavic,
but a literary tradition less tied to the church and Old Church Slavic
exists too.
The last extensive spelling reform occurred in 1917.
visit download page
Estonian
(lexicon size 1,500,000, selection February 2010)
The Estonian language belongs to the Finno-Ugric family of
languages. It is closely related to Finnish and, as in Finnish,
the equivalents of prepositions are attached to the end of the word.
visit download page
Icelandic
(lexicon size 747,000,
selection January 2010)
The Icelandic language is a North Germanic (Scandinavian) language,
since 1935 the official language of Iceland.
Icelandic is characterized by extensive vowel gradation across masculine, feminine and neuter forms.
The historical morphological characteristics
have been preserved.
visit download page
Lithuanian
(lexicon size 862,000, selection June 2009)
The Lithuanian language, like Latvian, belongs to the Baltic
family of languages. Lithuanian uses the Latin alphabet with
diacritics, including <ė>, <į>, <ų>. Lithuanian is
highly inflected.
visit download page
Latvian
(lexicon size 700,000,
selection June 2009)
The Latvian language is one of the Baltic languages (see Lithuanian).
The orthography is based on the Latin alphabet with diacritic marks,
including <ņ>, <ķ>, <ģ>, <ļ>.
visit download page
Polish
(lexicon size 1.9 million,
selection February 2008)
The Polish language is a West Slavic language spoken by approximately
42 million speakers.
It is written in the Latin alphabet with diacritic marks and special
characters: ł, Ł, ż, Ż.
visit download page
Frisian
(lexicon size 290,000,
selection July 2009)
The Frisian language is spoken by approximately 300,000 speakers in the
Dutch province of Friesland.
It has been standardized thanks to the efforts of the Fryske Akademy.
It is distinct from East and North Frisian dialects in Northern
Germany.
visit download page
Galician
(lexicon size 245,000,
selection August 2007)
The Galician language is now spoken in Spanish Galicia, situated north
of Portugal. It is a Romance language related to Portuguese. Spelling
according to “Dicionário da língua galega, Sotelo Blanco”.
visit download page
Hungarian
(lexicon size over 5 million words,
selection December 2009)
The Hungarian language belongs to the Uralic family of languages. It is the official
language of Hungary. It is only distantly related to the other Finno-Ugric
languages, such as Finnish and Estonian.
The orthography includes characters with the double acute accent (Hungarumlaut): <ő>,
<ű>.
visit download page
Czech
(lexicon size 1,690,000,
selection December 2009)
The Czech language is a West Slavic language. The orthography is based
on the
Latin alphabet, including diacritics: <č>, <ď>, <ě>,
<ů>, <ž>.
visit download page
Upper Sorbian
(lexicon size 770,000,
selection January 2009)
The Upper-Sorbian language is a West Slavic language. The orthography is based
on the Latin alphabet.
Upper and Lower Sorbian are spoken in the south-eastern part of the former German
Democratic Republic.
Spelling agrees with Hornjoserbskeje rěčneje komisje hač do junija 2005.
visit download page
Maltese
(lexicon size 845,000,
selection January 2006)
The Maltese language is a Semitic language written in the Latin
alphabet,
including <ċ> <ħ> <ġ> and <ż>,
orthography according to Joseph Aquilina (1987/1990).
The speller includes checks for proper use of assimilations of the
article.
visit download page
Modern Greek (Ελληνικά)
(lexicon size 785,000, selection September 2009)
The Greek characters α, β, γ, .... to ω have been used for millennia.
We do not know how Ancient Greek was pronounced, but modern Greek
certainly is different.
It now uses only a limited number of accents and diaereses.
visit download page
Occitan
(lexicon size 250,000,
selection June 2007)
Occitan, also known as langue d'oc, is the original language spoken by the
troubadours and Cathars in the South of France. The reconstruction of
the language is based on the work of Loís Alibèrt (2000).
visit download page
Esperanto
(lexicon size 300,000,
selection August 2003)
Esperanto is an artificial language, introduced by Dr. Lazaro Ludoviko
Zamenhof. The language is based on several Indo-European languages.
Typical for Esperanto are the characters <ĉ>, <ĝ>,
<ĥ>, <ĵ>, <ŝ> and <ŭ>.
visit download page
Turkish
(lexicon size 1,680,000,
selection November 2008)
The Turkish language is written in the Latin alphabet, but a few
characters were added, such as the dotless ı, which is a different letter
from the dotted i. Therefore the letter i is not the lower case
of the capital letter I, which is a major problem for many systems.
A geographical and medical lexicon is included.
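The casing problem described above can be shown in a few lines. Python's built-in `str.lower()` and `str.upper()` are not locale-aware, so a Turkish-aware wrapper has to map the two i-pairs by hand before delegating. The function names `turkish_lower` and `turkish_upper` are hypothetical, and the sketch assumes precomposed characters (İ as U+0130, ı as U+0131).

```python
def turkish_lower(s):
    """Lower-case with Turkish rules: I -> ı (dotless), İ -> i (dotted)."""
    return s.replace("I", "ı").replace("İ", "i").lower()


def turkish_upper(s):
    """Upper-case with Turkish rules: i -> İ (dotted), ı -> I (dotless)."""
    return s.replace("i", "İ").replace("ı", "I").upper()


# The default mapping pairs I with i, which is wrong for Turkish text.
print("IŞIK".lower())         # default, not Turkish-aware
print(turkish_lower("IŞIK"))  # Turkish-aware
print(turkish_upper("istanbul"))
```

A spell checker that lower-cases words before lookup would need this kind of mapping, otherwise capitalised Turkish words can be matched against the wrong lexicon entries.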
visit download page
Romanian
(lexicon size 1,000,000,
selection June 2009)
The Romanian language belongs to the Romance languages. It includes a few
additional characters such as the a-breve <ă>, i-circumflex
<î>, the s-cedilla <ş>, the t-cedilla <ţ>,
the s-comma below <ș>, the t-comma below <ț>.
visit download page
Bulgarian
(lexicon size 840,000,
selection February 2008)
The Bulgarian language is written in the Cyrillic alphabet.
visit download page
Faroese
(lexicon size 175,000,
selection February 2005)
The Faroese language is spoken by 50,000 inhabitants of the Faroe
Islands.
Like Icelandic, it descends from Old Norse.
visit download page
Bahasa Indonesia
(lexicon size 76,000,
selection May 2010)
The Bahasa Indonesian language is the standard language written and
spoken in the Republic of Indonesia. Many Austronesian languages are
spoken in the Indonesian Archipelago, but
Bahasa Indonesia is the lingua franca.
visit download page
Slovene
(lexicon size 425,000,
selection October 2007)
The Slovene language is spoken in the Republic of Slovenia, situated
between Austria, Hungary,
Croatia, and Italy. It is a South Slavic language written in the Latin
alphabet, including a
few Slavic characters such as <č>, <š>, <ž> and the
digraphs Lj and Nj.
Slovene is highly inflected and nearly every noun has an adjective form
too.
visit download page
Croatian
(lexicon size 547,000,
selection October 2009)
The Croatian language, formerly named Serbo-Croatian, is closely related
to Serbian.
The Croatian language is written in the Latin alphabet, including a few
typical Slavic
characters such as <č>, <ć>, <š>, <ž>, and
digraphs Lj and Nj.
visit download page
Bosnian
(lexicon size 565,000,
selection August 2009)
The Bosnian language, formerly named Serbo-Croatian, is closely related
to Serbian and Croatian.
visit download page
Serbian Cyrillic
(lexicon size 570,000,
selection August 2009)
The Serbian language is written in the Cyrillic alphabet, including
typical Serbian characters Dž, Lj, Nj (Џ, Љ, Њ).
visit download page
Byelorussian
(lexicon size 1.6 million,
selection January 2008)
The Byelorussian language is written in the Cyrillic alphabet, like the
Russian language,
but the language was heavily influenced by Polish for centuries.
Today, in the Byelorussian Republic,
Byelorussian plays a lesser role
compared to the Russian language.
visit download page
Slovak
(lexicon size 1 million words,
selection August 2009)
The Slovak language is closely related to Czech, but a few characters
differ.
visit download page
Ukrainian
(lexicon size 1.15 million words,
selection November 2008)
The Ukrainian language is written in the Cyrillic alphabet,
but for centuries the language was heavily influenced by Polish.
visit download page
Swahili
(lexicon size 75,000,
selection February 2005)
The Swahili language is spoken along the East Coast of Africa. It is
the lingua franca of many coastal nations. The standardized language is
called Kiswahili Sanifu.
It shares the word kamusi (dictionary) with Bahasa Melayu, where the word is kamus.
Swahili is written in the Latin alphabet.
visit download page
Bahasa Melayu
(lexicon size 62,000,
selection September 2009)
Bahasa Melayu is the standard language of Malaysia.
It has a common root with Bahasa Indonesia. However, Bahasa Melayu was
heavily influenced by the English language while
Bahasa Indonesia was influenced by Dutch during the colonial age.
visit download page
Irish (Gaelic)
(lexicon size 325,000,
selection August 2007)
The Gaelic language is a Celtic language spoken in Western Ireland.
A class of words is lenited, pronounced with palatalization.
A slightly different variety is spoken in the Highlands of Scotland.
visit download page
Welsh
(lexicon size 365,000,
selection August 2007)
The Welsh language is the Celtic language of Wales, spoken by about
500,000 people (mainly bilingual in English).
visit download page
Greenlandic
(lexicon size 85,000,
selection February 2008)
Greenlandic is an Eastern Inuit language spoken by 50,000 Greenlanders.
The Greenlandic language attaches particle after particle to a word,
so that a whole sentence can consist of a single word. The Latin alphabet is used, whereas
the Canadian Inuit make
use of their own script.
visit download page
Macedonian
(lexicon size 320,000,
selection January 2008)
The Macedonian language is written in the Cyrillic alphabet.
visit download page
Albanian
(lexicon size 310,000,
selection February 2006)
The Albanian language is written in the Latin alphabet. The Albanians
call their language shqip and their country Shqipëria.
Maori
(lexicon selection March 2004)
The Maori language is spoken in New Zealand and is written in the Latin
alphabet. A macron is placed above the vowels to differentiate between
long and short vowels.
visit download page
Xhosa
(lexicon size 165,000, selection September 2005)
The Xhosa language is spoken in the Republic of South Africa and is
written in the Latin
alphabet.
visit download page
Zulu
(lexicon size 330,000, selection September 2008)
The Zulu language is spoken in the Republic of South Africa and is
written in the Latin
alphabet.
visit download page
Arabic (العربية)
(lexicon size ca. 5 million, selection October 2009)
The Arabic language has its own script, and the orthography is mainly based on consonantal roots.
These roots unfold into millions of words.
visit download page
Azerbaijanian
(lexicon size 132,000, selection May 2010)
Azerbaijanian is written in the Latin alphabet. It has much in common with Turkish.
visit download page
Hebrew (עִבְרִית)
(lexicon size ca. 5.5 million, selection March 2008)
The Hebrew language is written in Hebrew characters, mainly consonants.
The orthography is based on roots of 3 radicals, which unfold into millions of words.
visit download page
Persian/Farsi (فارسی)
(lexicon size 450,000, selection October 2009)
The Persian language is written in the Arabic script but, as it is an Indo-European language,
vowels are important.
visit download page
Urdu (اردو)
(lexicon size 131,000, selection October 2009)
The Urdu language is closely related to Hindi, but written in the Arabic script. Urdu and Hindi are
Indo-European languages.
visit download page
Breton
(lexicon size 210,000, selection July 2007)
The Breton language is spoken in French Bretagne. It is a Celtic language related to the extinct Cornish language of the UK.
visit download page
Thai (ภาษาไทย)
(lexicon size 80,000, selection March 2008)
The Thai language is the official language of Thailand. Thai has its own script,
a syllabic script in which most vowels are written above the consonants.
Thai is a tone language and the tone marks are always written on top.
The words of a sentence are written without spaces, and therefore a sentence has to be segmented (hyphenated) prior to spell checking.
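To illustrate why segmentation has to come first, the sketch below splits an unspaced string with greedy longest-match lookup against a word list. This is a common textbook technique chosen here for illustration; the text does not say which segmentation method the product uses. The `segment` function and the two-word toy lexicon (taken from the heading ภาษาไทย) are assumptions.

```python
def segment(text, lexicon):
    """Split unspaced text into tokens by greedy longest match against `lexicon`."""
    max_len = max(map(len, lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                break
        else:
            # No lexicon word starts here: emit one character as an unknown token.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


lexicon = {"ภาษา", "ไทย"}  # toy lexicon: "language", "Thai"
tokens = segment("ภาษาไทย", lexicon)
print(tokens)
# Tokens missing from the lexicon are the spell-checking candidates.
print([t for t in tokens if t not in lexicon])
```

Greedy matching can pick the wrong split when a long word hides two shorter ones, which is one reason spell checking of such scripts depends so strongly on the quality of the segmentation step.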
visit download page
Hindi (हिन्दी)
(lexicon size 150,000, selection December 2009)
The Hindi language is spoken in northern and central India. Written Hindi is relatively standardized over the whole Hindi language area. It is an Indo-Aryan language. Although related to Urdu, Hindi does not favour the use of Persian and Arabic loanwords. Hindi is written in the Devanagari script, which includes many complex characters, consisting of vowels, consonants, vowel signs (matras), numerals, and diacritical marks.
visit download page
Marathi (मराठी)
(lexicon size 153,000, selection December 2009)
The Marathi language is spoken in the Maharashtra state of India. It is an Indo-Aryan language written
in the Devanagari script.
visit download page
Nepalese (नेपाली)
(lexicon size 125,000, selection December 2009)
The Nepalese language (Nepali) is spoken in the Himalayan state of Nepal between India and China. Nepalese
is written in the Devanagari script.
visit download page
Kurdish (Northern)
(lexicon size 90,000, selection July 2009)
Kurdish belongs to the Iranian group of languages. It is spoken in Turkey, Iraq, Iran, Armenia, Georgia and Azerbaijan.
The Latin script is used for the Northern variety of Kurdish.
visit download page
Malayalam (മലയാളം)
(lexicon size 410,000, selection December 2009)
The Malayalam language is spoken in Kerala, a state in the south of India. It is a Dravidian language written
in the Malayalam script, a descendant of the Brahmi script.
visit download page
Bengali (বাংলা)
(lexicon size 126,000, selection November 2009)
The Bengali language is spoken in Bangladesh. It is an Indo-Aryan language written
in the Bengali script, a descendant of the Brahmi script.
visit download page
Gujarati (ગુજરાતી)
(lexicon size 185,000, selection October 2009)
The Gujarati language is spoken in the Indian state of Gujarat. It is an Indo-Aryan language written
in the Gujarati script, a descendant of the Brahmi script.
visit download page
Tamil (தமிழ்)
(lexicon size 105,000, selection December 2009)
The Tamil language is spoken in southern India (Tamil Nadu) and Sri Lanka. It is a Dravidian language written
in the Tamil script, a descendant of the Brahmi script.
Tamil has many Indo-Aryan loanwords. Tamil in Sri Lanka incorporates loanwords from the
Dutch, Portuguese, and English languages.
visit download page
Sinhala (සිංහල)
(lexicon size 208,000, selection November 2009)
The Sinhala language is spoken in Sri Lanka. It belongs to the Indo-Aryan branch of the Indo-European languages and is written
in the Sinhala script, a descendant of the Indian Brahmi script.
There is some affinity to neighbouring languages. Sinhala has features that may be traced to Dravidian
influences.
visit download page
Punjabi (ਪੰਜਾਬੀ)
(lexicon size 37,500, selection October 2009)
The Punjabi language is spoken in the Punjab state of India. It belongs to the Indo-Aryan branch of the Indo-European languages and is written
in the Gurmukhi script, a descendant of the Indian Brahmi script.
visit download page
Telugu (తెలుగు)
(lexicon size 115,000, selection December 2009)
The Telugu language is spoken in Andhra Pradesh, one of the largest states of India. It is a Dravidian language written
in the Telugu script, a descendant of the Indian Brahmi script.
visit download page
Khmer (ភាសាខ្មែរ)
(lexicon size 30,000, selection November 2009)
The Khmer language is spoken in Cambodia.
It is the second most widely spoken Austroasiatic language. As in Thai, Khmer sentences are written
without spaces. Therefore, spell checking strongly depends on segmentation (see Hyphenator languages).
visit download page
Kazakh (Cyrillic/Latin)
(lexicon size 900,000, selection May 2010)
The Kazakh language is spoken east of the Caspian Sea. It is a Turkic language
related to Azerbaijani and Turkish.
Kazakh is mainly written in the Cyrillic alphabet in Kazakhstan, but a transition to the
Latin script was already proposed by the President of Kazakhstan in 2006.
For this reason both Cyrillic and Latin lexicons have been compiled.
visit download page