Conlang/Intermediate/Writing


(All phonemic and phonetic transcriptions mentioned here are in X-SAMPA)

Scripts, or writing systems, are one of the first things many people notice when they come in contact with another language. Many people, at some point in their life, have tried making some sort of "code", with a one-to-one correspondence between their symbols and their native language's letters. Read on to learn how to make a realistic script for a language.

Types of Script

There are five main types of scripts. This page lists them in order from most to least evolved. Each system has its own advantages and disadvantages. For example, a logographic writing system would be useful if a language had many different dialects, while an alphabet would be useful if you want a higher literacy rate among the speakers of your conlang.

Alphabets are the type you are probably most familiar with. They consist of a series of symbols, usually between 10 and 45, where each symbol represents a consonant or vowel in the language. Some may have some duplicate letters (e.g., English C, K, and Q), but these usually have some historical meaning, discussed later on. Others will have symbols for consonant clusters, such as Greek "Ξ" /ks/, Cyrillic "Ц" /ts/, or even English "X" /ks, gz/.
Abjads, also known as Consonantal Alphabets, mark consonants but not vowels. Arabic and Hebrew are examples. While these do have a means of showing vowels, via diacritics, they usually are not shown, except in children's books, books for foreigners, or religious texts. Some, such as Phoenician, had no system for marking vowels at all! They usually contain between 15 and 35 symbols.
Abugidas contain consonants paired with vowels that can be changed with diacritics. Hindi and many other South Asian languages are examples. Each symbol has an "inherent vowel", commonly /a/, so you have each individual symbol meaning /pa/, /ba/, /ta/, /da/, etc. The vowels can be changed or muted with diacritics. They usually contain between 30 and 60 symbols, not counting diacritics.
Syllabaries contain a different symbol for every syllable in the language; examples include Amharic, Yi, and Cherokee. In other words, there is a separate symbol for /ka/, /ke/, /ki/, /ko/, etc., for each combination of consonant and vowel. The symbols standing for the same consonant may be related (as in Ojibwe, where symbols are rotated), or may bear no resemblance to each other, as in Yi or Cherokee. Syllabaries generally range in size from about 40 to 400 symbols; Yi, the largest standardized one, has over 800.
Logographic systems contain symbols that represent single morphemes, the smallest meaningful units in a language (e.g. "hand", "-er", "un-" in English). Many languages contain some logographic symbols, such as "1", "@", and "&", but there are some languages, such as Chinese, Japanese, ancient Egyptian, and Mayan, that use(d) logographs to represent much or all of the language.

Ideographic systems use one symbol for one thought or idea. Since languages aren't made up directly of thoughts and ideas, and writing systems represent language, true ideographic systems do not exist. They will be covered for the purposes of "alien" languages, though. Chinese is often described as "ideographic" — this is either just a loose use of the term to mean "logographic", or it is a misunderstanding — Chinese characters do not represent ideas directly. Japanese kanji come closer to the "ideographic" ideal, with one symbol representing any of several synonyms in context.

Alphabets

Alphabets are the dominant writing system in use in Europe and areas formerly colonized by Europe. In principle, they use one symbol per sound, including vowel sounds.

All too often, however, alphabets, especially the Latin alphabet, are more complex. Each symbol often represents more than one sound, sometimes distinguished in the language, sometimes not. For example, English <c> represents /k/ and /s/, <x> represents /ks/ and /gz/ (as in exist), and <a> represents /a/, /A/, /{/, /O/, or /@/. The main reason for this is that the Latin alphabet was made for Latin and simply adapted for later languages. English itself originally used a runic script, as did the Scandinavian languages. Latin written in the Latin script is almost entirely phonemic.

Many alphabets also have combinations of letters representing a completely different sound. These are called digraphs. In English, the main consonant digraphs are <sh> /S/, <ch> /tS/, and <ng> /N/, plus many others. Diphthongs, or vowel-combinations, also occur often. English <i> represents /aj/ as well as /i/ or /I/. Some languages make use of digraphs for diphthongs, such as <ei> for /ej/.
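As a rough illustration of how digraphs complicate the "one symbol per sound" idea, here is a minimal Python sketch that splits a word into graphemes, treating the three digraphs named above as single units. The function name and the tiny digraph inventory are assumptions made for this example; a real orthography needs a much longer list, and a naive rule like this one will wrongly merge letters that only happen to be adjacent.

```python
# Digraphs named in the text; a real orthography would list many more.
DIGRAPHS = {"sh", "ch", "ng"}

def split_graphemes(word):
    """Split a lowercase word into graphemes, preferring digraphs."""
    graphemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            # Two letters that together stand for one sound.
            graphemes.append(pair)
            i += 2
        else:
            graphemes.append(word[i])
            i += 1
    return graphemes

print(split_graphemes("shopping"))
print(split_graphemes("church"))
```

When designing a con-script, a table like DIGRAPHS is a handy way to keep track of which letter combinations you have given a special value.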

Note that situations like the above happen very often. As a language changes, its orthography (spelling system) may adapt, but it does not always keep pace with the spoken language. It is rare for every letter in a language to represent exactly one sound. Likewise, a realistic con-script should not have a direct one-to-one correspondence with English letters.

Diacritics are another method alphabets may use. New symbols can be difficult to create and hard to remember, so many languages instead add "diacritics", small additional strokes, to existing letters to mark new sounds. The main diacritics used in Latin-script languages are the acute (á, é, etc., usually marking stress), the grave (à, è, etc.), the circumflex (â, ê), the diaeresis (ä, ë), the tilde (ã, ñ, usually marking nasalization on vowels), the cedilla (ç), and many others. "Ligatures", or combinations of letters, may also be brought in, such as "æ". The symbol "&" is actually a heavily modified ligature of <E> and <T>, making "et", the Latin word for "and".
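The "base letter plus small stroke" idea is mirrored in how Unicode encodes these letters, which can be useful if you plan to type your conlang with Latin diacritics. The sketch below uses Python's standard unicodedata module to attach a combining mark to a base letter; the helper name add_mark is an assumption made for this example.

```python
import unicodedata

def add_mark(base, mark_name):
    """Attach a combining diacritic to a base letter and compose the result."""
    mark = unicodedata.lookup(mark_name)
    # NFC merges base + mark into one precomposed character when one exists.
    return unicodedata.normalize("NFC", base + mark)

examples = [
    ("a", "COMBINING ACUTE ACCENT"),
    ("e", "COMBINING GRAVE ACCENT"),
    ("a", "COMBINING CIRCUMFLEX ACCENT"),
    ("e", "COMBINING DIAERESIS"),
    ("n", "COMBINING TILDE"),
    ("c", "COMBINING CEDILLA"),
]

for base, mark_name in examples:
    print(base, "+", mark_name, "->", add_mark(base, mark_name))
```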

Some languages, though, do make new letters, which are often based on previous systems. Examples include the "þ" (thorn, /T/), which was derived from a rune (ᚦ) and "ð" (eth, /D/), developed in the Middle Ages, which are used in Icelandic, Old English, and several related languages.

Some letters may not make a sound themselves, but do affect other letters. For example, the Cyrillic <Ь> (soft sign) palatalizes the consonant before it. Some letters can perform both a function and a sound, such as the Russian <Я> ("ya"), which both palatalizes the previous consonant and is pronounced /a/.

There are a few scripts termed featural which tend to be similar to alphabets but show phonemic features rather than phonemes themselves. The best example is undoubtedly Korean hangul, though featural elements also occur in a few other scripts, such as Japanese hiragana and katakana, which are mostly syllabaries but take diacritics to indicate voicing.

Examples of Natlang Alphabets

Latin (includes English, German, French, Spanish, Romanian, Polish, Czech, Swahili, Finnish, Vietnamese, Nahuatl and many others)
Cyrillic (includes Russian, Ukrainian, Bulgarian, Macedonian, Kazakh, and several others)
Greek
Georgian
Armenian
Mongolian

Abjads

Abjads are the main writing system of the Middle East. Basically, they mark consonants in an alphabetic fashion, but vowels are not typically marked.

The Arabic and Hebrew abjads are entirely composed of consonants. Vowels can be marked, but generally aren't. Arabic has six vowels. Three of them, the 'short vowels', are only marked with diacritics. The other three, the 'long vowels', share symbols with some of the consonants: long /a/ shares 'alif /?/, long /i/ shares yaa /j/, and long /u/ shares waaw /w/. Whether these letters are acting as consonants or as vowels is easier to see when the vowel marks are written.

Arabic has an additional diacritic called the "sukuun", which indicates that the letter it is placed over is not followed by a vowel. If the sukuun is placed over 'alif, yaa, or waaw, they function as consonants. If there is no sukuun, but instead a vowel marker, they become long vowels. They can also form diphthongs if they have a sukuun, but the previous consonant has a vowel. So, the following form is pronounced /kaj/:

Ø a

Y K

(where Ø is the sukuun, and remember that Arabic is read from right to left).

Since vowels aren't usually marked, pronunciation has to be inferred by context. The sequence "KY" (using English order) can be pronounced:

/ kaja kajI kajV kIja kIjI kIjV kVja kVjI kVjV kaj /
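To get a feel for how quickly the ambiguity grows, here is a small Python sketch that generates every reading in which each consonant of an unvowelled skeleton is followed by one of three short vowels. The vowel symbols are the ones used in the list above, and the function name is an assumption made for this example. It produces the nine fully vowelled readings; the tenth reading in the list, /kaj/, relies on the sukuun and is not generated here.

```python
from itertools import product

# Short-vowel symbols as they appear in the list above.
SHORT_VOWELS = ["a", "I", "V"]

def vocalizations(consonants):
    """All readings in which every consonant is followed by a short vowel."""
    readings = []
    for vowels in product(SHORT_VOWELS, repeat=len(consonants)):
        readings.append("".join(c + v for c, v in zip(consonants, vowels)))
    return readings

readings = vocalizations(["k", "j"])
print(len(readings), readings)
```

A skeleton of n consonants has 3 to the power n readings under this simple model, which is why readers of an abjad lean so heavily on context.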

Examples of Natlang Abjads

Arabic (including Arabic, Persian, and Pashto)
Urdu
Hebrew (including Hebrew and Yiddish)
Syriac (used by modern Syriac/Aramaic)
Phoenician and Demotic, both extinct.

Abugidas

Abugidas are the main writing system of India, other South Asian countries, and South-East Asia. They are basically a group of syllables with a common inherent vowel, which has to be changed with diacritics.

All syllables start out with an inherent vowel, usually /a/, and a different consonant: ka, ca, ṭa, ta, pa, etc. These each have their own symbol. So, say you want to write the syllable <ki>. The <i> diacritic is added to the <ka> symbol, and it replaces the inherent /a/. Note that in the Indian and South-East Asian scripts, diacritics can come before or after a letter instead of only above or below.

Hindi has diacritics for all of its vowels and diphthongs — a, e, i, o, u, r, ai, and au. But what if a word or syllable begins with a vowel? There are two ways you could potentially handle this. In most of the abugidas of Asia, there is a set of "independent vowels", which are full letters just like any of the other consonants, and their purpose is only for syllable-initial vowels.

The other way is not done in any of the South Asian abugidas, but still doable. Just make a "vowel carrier", a letter that makes no sound, but can carry vowel diacritics, like the Hebrew aleph.

Many of these abugidas also contain "conjunct consonants", which are optional for your own script. These are basically ligatures, where the forms of two letters are combined into one "space", and together they can take a vowel. For example, when "ta" and "ka" are combined, they become "tka", which can take diacritics like any other consonant.
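The pieces described in this section (inherent vowel, vowel diacritic, independent vowel, and conjunct) can be seen directly in how Devanagari is encoded in Unicode. The following Python sketch builds a few syllables from named code points; the helper dev and the variable names are assumptions made for this example. Note that the vowel sign for <i> is stored after the consonant even though it is drawn before it.

```python
import unicodedata

def dev(name):
    """Look up a Devanagari character by the tail of its Unicode name."""
    return unicodedata.lookup("DEVANAGARI " + name)

KA = dev("LETTER KA")        # consonant with inherent /a/
TA = dev("LETTER TA")
SIGN_I = dev("VOWEL SIGN I") # dependent vowel (diacritic)
LETTER_I = dev("LETTER I")   # independent vowel for syllable-initial /i/
VIRAMA = dev("SIGN VIRAMA")  # mutes the inherent vowel

ka = KA                      # "ka": nothing added
ki = KA + SIGN_I             # "ki": diacritic replaces the inherent /a/
k = KA + VIRAMA              # "k": inherent vowel muted
tka = TA + VIRAMA + KA       # "tka": conjunct of ta and ka
tki = tka + SIGN_I           # the conjunct takes a vowel sign like any consonant

for label, text in [("ka", ka), ("ki", ki), ("k", k), ("i", LETTER_I),
                    ("tka", tka), ("tki", tki)]:
    print(label, text)
```

Whether the conjunct is drawn as a single fused shape depends on the font; the encoding only records that the first consonant has lost its vowel.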

Examples of Natlang Abugidas

Devanagari, used by Hindi, Marathi, Nepali, and many other languages
Malayalam, used by Malayalam, Konkani, and sometimes Kodagu.
Bengali, used by Bengali and Assamese.
Gurmukhi, used by Punjabi.
Gujarati, used by Gujarati and Kachchi.
Oriya, used by Oriya and several other languages.
Tamil, used by the Tamil language, and sometimes for liturgical Sanskrit.
Sinhala, used by Sinhalese.
Kannada, used by Kannada and Tulu.
Telugu, used by Telugu.
Tibetan, used by Tibetan and Dzongkha
Thai, used by Thai, Southern Thai, and several other languages.
Khmer, used by Khmer and sometimes Pali.
Lao, used by Lao and some other languages in Laos.
Burmese, used by Burmese, Mon, and the Karen languages.

Syllabaries

Syllabaries are in use around the world. They contain separate symbols for every CV (consonant + vowel) combination in the language, although there are several ways of handling final consonants.

Here is an example of a typical syllabary, romanized. This is the Cree system:

ê pê tê kê cê mê nê sê šê yê wê rê lê
i pi ti ki ci mi ni si ši yi wi ri li
î pî tî kî cî mî nî sî šî yî wî rî lî
o po to ko co mo no so šo yo wo ro lo
ô pô tô kô cô mô nô sô šô yô wô rô lô
a pa ta ka ca ma na sa ša ya wa ra la
â pâ tâ kâ câ mâ nâ sâ šâ yâ wâ râ lâ

-h -p -t -k -c -m -n -s -š -y -w -r -l

Note that there are also symbols for the vowels alone. To make final consonants (of the CVC form), the last row of symbols, called the "Finals", are added. So, "tak" would be written with two symbols: <ta> and <-k>. Many syllabaries use this method, and it is basically limited to languages with a small number of legal final consonants.
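The "CV symbol plus final" method can be sketched in a few lines of Python. The code below works on romanized text only, uses the vowel set from the chart above, and writes finals with a leading hyphen as in the chart; the function name and the test words other than "tak" are made up for this example.

```python
# Vowels from the romanized chart above (precomposed characters).
VOWELS = set("êiîoôaâ")

def spell_with_finals(word):
    """Split a romanized word into V symbols, CV symbols, and finals."""
    symbols = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            symbols.append(ch)          # vowel standing alone
            i += 1
        elif nxt in VOWELS:
            symbols.append(ch + nxt)    # ordinary CV symbol
            i += 2
        else:
            symbols.append("-" + ch)    # consonant with no vowel: a final
            i += 1
    return symbols

print(spell_with_finals("tak"))
print(spell_with_finals("askam"))
```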

Another system for finals, used by the Mayan syllabary system, is the "Principle of Synharmony". This states that a final consonant is written with a CV symbol that repeats the previous vowel. For example, a word like "kalo" is written with two symbols, as usual: ka-lo. But, if you want the syllable "kal", the previous vowel is repeated: ka-la. Since the vowel was repeated, it is ignored on the second symbol. This applies for any vowel. But what if you want to write the word "kala"? That requires three symbols, adding in a simple vowel: ka-la-a. The second "a" is negated, but restored by the next symbol.
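The reading rule described above can be expressed as a short Python sketch. It follows only the simplified rule given in this section (a CV symbol that echoes the previous vowel loses its vowel) and is not a model of real Maya spelling; the function name is an assumption, and each symbol is assumed to be a romanized string ending in a single vowel letter.

```python
def read_synharmonic(symbols):
    """Read a list of romanized CV or V symbols using the echo-vowel rule."""
    reading = ""
    prev_vowel = None
    for sym in symbols:
        onset, vowel = sym[:-1], sym[-1]
        if onset and vowel == prev_vowel:
            # Echoed vowel is silent: the symbol marks a final consonant.
            reading += onset
        else:
            # Different vowel, or a bare vowel symbol: read it in full.
            reading += sym
        prev_vowel = vowel
    return reading

print(read_synharmonic(["ka", "lo"]))
print(read_synharmonic(["ka", "la"]))
print(read_synharmonic(["ka", "la", "a"]))
```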

A third system, used by Hittite, consisted of two sets of symbols, one set in CV form, and another set in VC form. This way, when a syllable of the form C₁VC₂ was needed, one would write C₁V-VC₂, for instance, <ka><al> for "kal". Additionally, when the vowel in question needed to be made long, like "kāl", the pattern became C₁V-V-VC₂: <ka><a><al>. For "kala" one would simply write <ka><la> as the /l/ would move onto the next syllable.
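The CV plus VC pattern is regular enough to write as a small function. This Python sketch spells one syllable at a time following the patterns given above; the function name and its parameters are assumptions made for this example, and sign names are plain romanizations.

```python
def spell_cv_vc(onset, vowel, coda="", long=False):
    """Spell one syllable with a CV sign, an optional V sign, and an optional VC sign."""
    signs = [onset + vowel]         # C1V
    if long:
        signs.append(vowel)         # extra V sign marks a long vowel
    if coda:
        signs.append(vowel + coda)  # VC2 repeats the vowel before the coda
    return signs

print(spell_cv_vc("k", "a", "l"))                      # "kal"
print(spell_cv_vc("k", "a", "l", long=True))           # "kāl"
print(spell_cv_vc("k", "a") + spell_cv_vc("l", "a"))   # "kala"
```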

Sometimes, though not always, syllables with a relation, such as the same consonant, look similar. For example, the syllables "ka", "ke", "ki", etc., may look similar. In systems such as in Ojibwe, the vowel signs are changed by simply rotating the symbol around.

Examples of Natlang Syllabaries

Cree (rotational), used by Ojibwe, Cree, Inuktitut, and many other Native American languages; collectively may be known as "Aboriginal Syllabics".
Cherokee
Ethiopian, used by Amharic, Ge'ez, Tigrinya, and several other languages
Katakana and Hiragana, used by Japanese
Yi
Cypriot, Linear B, and Celtiberian, all extinct

And many more systems.

Semisyllabaries

There is a set of scripts from pre-Roman Spain that sit in an intermediate category between true alphabets and syllabaries. In those scripts, the plosives combine with the following vowels in a single syllabogram — for instance, there is a single symbol for PA, another for PI, another for PE, etc. — but all other consonants and vowels also have single symbols. In such a system, for example, the phrase "barenarkenti" would be spelled with the nine symbols <ba> <r> <e> <n> <a> <r> <ke> <n> <ti>. In fact, it's likely that such systems evolved out of so-called redundant alphabets, where the stops had separate signs depending on the vowel that followed them, much like the ancient Etruscan alphabet, where the /k/ sound was written <K> before <A>, <C> before <E> or <I>, and <Q> before <U> (the Etruscan language lacked an /o/ sound).
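The spelling rule for such a semisyllabary can be sketched in Python as follows. The plosive and vowel sets are assumptions chosen to fit the example word, and the function name is made up; in particular, the sketch writes a plosive that is not followed by a vowel as a bare letter, which is a convenience of this example and not a claim about the historical scripts.

```python
PLOSIVES = set("bdgptk")   # assumed inventory for this sketch
VOWELS = set("aeiou")

def spell_semisyllabic(word):
    """Plosive + vowel becomes one sign; everything else is one sign per sound."""
    signs = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in PLOSIVES and nxt in VOWELS:
            signs.append(ch + nxt)   # syllabogram such as <ba> or <ke>
            i += 2
        else:
            signs.append(ch)         # alphabetic sign
            i += 1
    return signs

signs = spell_semisyllabic("barenarkenti")
print(len(signs), signs)
```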

Logographic systems

Logographic systems are/were used to write Chinese, Japanese, ancient Egyptian, Mayan, Sumerian, and many other languages. Ancient civilizations created logographic systems first, and these later evolved into other systems.

In a pure logographic system (if there were such a thing in nature), each symbol represents a morpheme, or the smallest meaningful part of language. For example, "hand", "red", "place", "-er", "-ist", "un-" and "anti-" are all morphemes. In a logographic system, each of these would get its own symbol. As you can probably imagine, there are thousands of morphemes in most languages, and hence thousands of symbols exist in any logographic system.

These symbols are not all random. In general, symbols of words that represent visible ideas or things are often little drawings. (Remember though that writing constantly develops, so after a few thousand years, there wouldn't be too much resemblance left!) For example, words like "sun", "moon", "man", "child", "water", "fire", "left", "right", "dog", "horse", "tree", "flower", etc. are likely to be drawn as pictures.

Not all words can be drawn this way. How are you going to draw "love", or "-er", or "anti-"? Different languages had different approaches. In some cases a word with a similar pronunciation is substituted, for example, writing "ant" to mean "anti-". If you do this often enough, you're on your way to a syllabary, an abugida, or an abjad. In some cases however, another picture is drawn to "explain" the new word: for example, you might draw a spear next to "ant" to mean "anti-" — spear means you're opposed, right? In ancient Egyptian, the spear would be set aside as a separate symbol, while in Chinese the spear would be incorporated into the "ant" symbol to make a new symbol for "anti-". (This is why scripts like Egyptian are sometimes called logophonetic, a name also applied to languages using a mix of logographs and some other system.) The choice here is really up to you.

In fact, it is estimated that 90%+ of Chinese characters are formed using the above method: taking an existing symbol with a similar pronunciation, and incorporating another symbol to explain the meaning. Over time, however, these characters' shapes have fossilized, and many sound and meaning changes have occurred, so some characters are made up of symbols that no longer represent their meanings or sounds very well.

The sheer size of such a system guarantees it won't be pure; bits of it, or whole sections of it, may be sliding toward some other sort of system, and different parts of it may be sliding in different directions. There will be morphemes represented by multiple symbols that don't have independent meanings, and symbols that represent multiple words.

Languages using Logographic systems

Chinese
Japanese Kanji, Korean Hanja, and Vietnamese Chu-Nom, all borrowings of Chinese logographs
Khitan, an extinct Altaic language of ancient northeastern Asia
Jurchen, the precursor to the Manchu language
Tangut, an ancient language of northwestern China
Naxi, a Burmese-Lolo language of Yunnan, China

Languages using Logophonetic systems

Japanese
Korean (mixed Hangul and Hanja form)
Ancient Egyptian
Luwian (in ancient Asia Minor)
Sumerian
Akkadian
Mayan
Zapotec
Mixtec
Aztec

Now, if you have decided what kind of writing system you want, there are three more steps left to take: Writing Direction, Script History, and Mediums of Writing. Then you can get on to the actual script-making.

Writing direction

All scripts have a certain writing direction. Latin writes from left-to-right. Hebrew and Arabic write from right-to-left. Mongolian is top-to-bottom. These are the main types of directions accounted for in real Earth scripts:

Left-to-Right: English, Cyrillic, Greek, and many other systems.
Right-to-Left: Hebrew, Arabic, Urdu, Assyrian, and several other systems.
Top-to-Bottom: Mongolian, and sometimes in Chinese
Boustrophedon: Ancient Greek

"Boustrophedon" is where on one line a text is read left-to-right, on the next right-to-left, on the next left-to-right again, and so forth.

THIS IS AN EXAMPLE
ИI HƧI˩GИƎ FO
BOUSTROPHEDON
.Ǝ˩YTƧ

Note that in Ancient Greek boustrophedon writing, the letters themselves would sometimes be mirrored when the line ran in the other direction. So, for example, the Greek letter <F> ('digamma') would appear as <F> when read left-to-right, but its prongs would face left when read right-to-left.
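If you want to preview your own text in boustrophedon, the line-reversal part is easy to sketch in Python. The function name is an assumption made for this example, and the sketch only reverses the order of characters on every second line; it does not mirror the individual letter shapes.

```python
def boustrophedon(lines):
    """Keep even-numbered lines as they are and reverse the odd-numbered ones."""
    return [line if i % 2 == 0 else line[::-1]
            for i, line in enumerate(lines)]

text = ["THIS IS AN EXAMPLE", "OF ENGLISH IN", "BOUSTROPHEDON", "STYLE."]
for row in boustrophedon(text):
    print(row)
```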

If you would like to stick with one of the above methods, go ahead. Most scripts use them. A few more complex systems are described below:

Some scripts can be written in many directions, that is, left-to-right, right-to-left, and top-to-bottom are all acceptable. Chinese and Ancient Egyptian do/did this. In Ancient Egyptian, all of the hieroglyphs with a face would face toward the direction in which to start reading. So, if you were reading right-to-left, all of the hieroglyphs that had a face (or a discernible front side) would 'face' right. In Chinese, it must be determined by context.

Another system is the "Rotational" system, a very rare one. As far as we've heard, only one script ever used it, rongorongo (the Easter Island script). It is similar to boustrophedon, except that the letters also rotate 180 degrees after each line! If you rotate the writing surface around after each line, they look normal.

Some other ideas that, so far as we've heard, have not yet been used:

Spiraling in or out
Writing in clockwise or counterclockwise circles, rotating each character as you write; essentially left to right or right to left but with each sentence on a separate ring outside the last.
Vertexual; each part of the sentence is on its own line, which points towards a vertex that the entire sentence converges at.

Script history

Scripts are not static. They change constantly, just as language does. Compare Ancient Egyptian hieroglyphs with modern Latin script. Believe it or not, modern Latin script ultimately derives from Proto-Sinaitic, a small set of symbols based on Egyptian hieroglyphs. These symbols were later adopted into Phoenician, where they were drastically simplified and each assigned a single sound. Through heavy contact with the Phoenicians, the Greeks soon adopted their alphabet, but turned some of the consonant letters, which represented sounds that don't exist in Greek, into the Greek vowel signs. At some point, though we're not entirely sure why, Greek started being written in boustrophedon instead of right-to-left, and then ultimately settled on left-to-right. When this switch occurred, all of the letters were 'flipped' over horizontally, giving the modern-day Greek alphabet. Then the Etruscans and Latins in the Italian peninsula adopted the Greek alphabet with minor modifications, the Latin one ultimately becoming the script now used throughout most of Europe. Cyrillic script also came from Greek, created for Slavic peoples converting to Orthodox Christianity, and after about 1,000 years of change, the modern Cyrillic alphabet came to be.

A Brief History of the Letter "A"
Egyptian hieroglyph (ox head) → Proto-Semitic ox head → Phoenician aleph → Greek alpha → Etruscan A → Roman A

In Proto-Sinaitic and Phoenician the letter stood for the glottal stop /?/, but Greek lacked this sound, so it became /a/.

"A" originally started out as 'aleph, the ox. Later the body disappeared, leaving only the head. Then the head was simplified to a simple line, leaving the most distinctive parts (the horns) intact. Eventually the horns pierced the line, and the letter flipped on its side in Phoenician. In Greek, it was flipped again, and after several lines were trimmed down, it became the modern Latin/Greek/Cyrillic "A".

Note the change in complexity. Early writing is almost guaranteed to be some sort of picture-writing like hieroglyphics, where each symbol represents exactly what it looks like. But, as time goes on and writing spreads, for many users this becomes pointless complexity. The symbols start to simplify and sometimes turn into alphabets. When a literate people starts to trade with an illiterate one, their writing system has a very good opportunity to spread. Sometimes one script wins over another. For example, in ancient India, there were two main scripts used: Brahmi and Kharosthi. Kharosthi fell out of use, and now all of the major scripts of India and parts of South-East Asia are descended from Brahmi.

Sometimes the early pictographic systems never simplify that much. In Chinese writing, the characters did change, but many did not simplify as significantly as the Proto-Sinaitic ones. The modern Chinese logographic system is still very complex, even after the major simplification reform instituted by the Chinese government in the 1950s.

One thing to note is the diversity of scripts in Europe compared to South-East Asia. Almost all of Europe uses Latin script or Cyrillic script, but in South-East Asia every country has at least one of its own scripts. This was probably because of the Roman Empire. The Romans unified most of Europe and had their script well established before they fell, and their script stayed. Such an empire did not exist in South-East Asia, leading to the vast diversity of scripts.

So, basically, the ancient hieroglyphic systems simplify when they are no longer only for priestly use, and therefore can be abbreviated. For the common person, carefully elaborating each symbol is just a major waste of time. This is still true today, to a lesser extent, and it is where cursive came from. Even the Ancient Egyptian hieroglyphs had heavily reduced forms, known as "Demotic", and this was used by the common person. You can see Demotic, for example, in the middle section of the Rosetta Stone.


Medium of Writing

The surface upon which a script is written actually does influence the script a great deal. When carved into stone, such as in the Fuþark or other Runic scripts, the letters tend to be very angular and composed of straight lines. Curves are harder to make in stone than on paper. Also, despite being carved in stone on the Rosetta Stone, the Egyptian Demotic script is much more 'fluid' than the hieroglyphs, since it was generally written on papyrus.

The ancient Sumerians wrote on clay, using a reed stylus to impress symbols on it. This gives the script a very distinctive look. This type of writing is known as "Cuneiform".

Some modern languages, such as Burmese, retain their unique scripts from the past. The rounded shape of Burmese stems from the fact that it was originally written on palm leaves, which would tear if there were too many straight lines.

In the mixed-case systems of Latin, Greek, and Cyrillic scripts (meaning there is both an "upper-case" (majuscule) and a "lower-case" (minuscule)), the lower-case forms came from cursive versions of the capital letters.

Once you have thought about your script and chosen a script type, writing direction, and medium, you are ready to create your own script.
