When you write a blog on translations of songs between languages, spelling and transliteration issues will definitely come back to bite you. Let me comment on the spelling/transliteration I use for each language I've worked with.
- Languages with standard orthographies in the Latin alphabet: Italian, English, French, German, Spanish, Portuguese, Galician, Swedish, Indonesian, Xhosa, Zulu, Haitian Creole, Latin, Venetian, Friulian, Occitan, Middle English, Swahili, Finnish, Albanian, Czech, Slovak, Croatian, Vietnamese, Turkish, Irish, Romanian, Danish, Hungarian, Lingála; for these, there isn't really much of an issue; just a couple of special mentions: for Vietnamese, I decided to link syllables of single words with dashes, whereas the normal orthography is to separate all syllables with spaces (so I write "cô-bé" for "girl" while it's normally written "cô bé"), and for Irish I use both the standard and my own creation, which was conceived to be easier for me to crack, but I don't really remember exactly how it was supposed to work; oh, and in Indonesian I try to mark all /e/ as é to distinguish them from schwas, which the standard orthography doesn't do; and I try to mark tones and the e/ɛ and o/ɔ distinctions in Lingála, which isn't always done;
- Languages spelled with the Latin alphabet for which I either know of no standard, or of multiple competing standards: Romagnolo, Neapolitan, Sicilian, Mende; let's look at these one by one:
- For Romagnolo, I'm transcribing a specific variety, and I devised my own orthography; there is a standard orthography, but I don't know how it works; judging by a dictionary I consult, I'm really not too far off, the main difference being that the dictionary transcribes a dialect where /e/ and /e:/~/ej/ are distinct, so it uses ē for the long vowel / diphthong and é for the short one, whereas I, not having the short vowel, just use é for the diphthong;
- For Sicilian, I know of two ways to spell autochthonous sounds, but I decided to devise my own conventions;
- For Neapolitan, there are three conventions regarding the spelling of the schwa sound: Neapolitan proper spells schwas as what they were before vowel reduction; Abruzzese dialects (as well as the San Benedetto del Tronto dialect, and perhaps others as well) use e for the schwa; Facebook illiterate dialect speakers don't write the schwa at all; as for me, I usually stick to whatever convention the dialect I'm working with uses, perhaps marking schwas with some diacritic; I guess it's typically using overdotted non-reduced vowels in Neapolitan proper, and ë or ė for other "nap-code" dialects;
- For Mende, there are distinct conventions as regards: vowel length, either using macrons or doubling vowels; and e/ɛ and o/ɔ, where these symbols are used sometimes, the distinction is sometimes ignored, and sometimes you find ẹ/e (or e/ẹ?) and o/ọ (or vice versa), and then there is o̱ which I assumed was I-don't-recall-which of these sounds; I use e/ɛ, o/ɔ, and macrons;
- Languages with a standard orthography that uses a non-Latin alphabet: Russian, Ukrainian, Mandarin, Cantonese, Japanese, Modern and Ancient Greek, Korean, Arabic, Hindi, Urdu, Persian, Bulgarian, Hebrew; for the native orthography, no issue; for the transliteration, as with any transliteration, I strive for back-transliteratability as well as phonetic transparency; let's discuss each language:
- My romanization of Russian is a modification of some "standard" systems; here's how I transliterate each Cyrillic letter:
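| Cyrillic | Аа | Бб | Вв | Гг | Дд | Ее | Ёё | Жж | Зз | Ии | Йй | Кк | Лл | Мм | Нн | Оо | Пп | Рр | Сс | Тт |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Latin | Aa | Bb | Vv | Gg | Dd | Ĵe ĵe | Ĵo ĵo | Žž | Zz | Ii | Jj | Kk | Ll | Mm | Nn | Oo | Pp | Rr | Ss | Tt |

| Cyrillic | Уу | Фф | Хх | Цц | Чч | Шш | Щщ | Ъъ | Ыы | Ьь | Ээ | Юю | Яя |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Latin | Uu | Ff | Kh kh | Ts ts | Ćć (except when it's pronounced š, in which case it's Čč) | Šš | Śś | ` | Yy | ' | Ee | Ĵu ĵu | Ĵa ĵa |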
So there is one case where I privileged phonetic transparency over back-transliteratability: the choice not to use c for the ts sound; note that the ĵ has a circumflex, which was placed there specifically to have back-transliteratability; I originally used carons instead, but ě́ and similar didn't render well in the captions for this video (or was it the title?), so I switched; NOTE: I was checking the transliteration explanations when I went «Oh, so I use ` for the tvĵórdyj znak?», so I may have transliterated that differently or not at all in some places;
- I honestly haven't put much thought into how I transliterate Ukrainian;
- For Mandarin, the native orthography is well-standardized, and the transliteration I use is Pinyin; there may be some oscillation on whether I mark "neutered" tones as neuter or as what the character's root tone is, for example 个 is gè, but is often pronounced ge; I suspect I normally mark that because Google does, and I fix Google's Pinyin when I transliterate stuff here rather than transliterating from scratch; spaces are mostly placed according to dictionary words, save for what I do with aspect particles guò le zhe, which I usually stick only to verbs; I use dashes for doubled verbs and adjectives, I think, and perhaps for some other 4-char phrases too;
- For Cantonese, the orthography is well-established, and the transliteration is Jyutping with tone numbers and dashes for syllable division within words; when I worked on my first Cantonese song, I was transliterating with Wiktionary which uses Yale, then I discovered CantoDict, a much more complete (at least at the time) dictionary using Jyutping; the switch may have left hybrids in my old file, so some transliterations may still be hybrids;
- For Japanese, the orthography is… fuzzy; I probably tend to over-kanji-ize, since I use furigana on every kanji, though this trend may have reduced in recent times; the romanization is Hepburn in any case, modulo using ou and ei as distinct from ō and ē which are reserved for おお and ええ and not applied to おう and えい; I wonder if I've ever attempted to distinguish ou and ei pronounced ō and ē from those pronounced ou and ei;
- For Modern and Ancient Greek, the orthography is well-established (except for some Modern words like μαντήλι/μαντίλι, though I believe the rule is now no eta in loanwords? In any case, in the specific example, I used the eta because my lyrics used it); as for transliteration, again, a compromise between back-transliterability and phonetic transparency; I'm too lazy to make another table, so here's the Greek alphabet as transliterated for Modern Greek: a-b-g-d-e-z-i̱-th-i-k-l-m-n-x-o-p-r-s-t-y-f-kh-ps-ō, but then you have αι ει οι υι = ä ë ö ÿ and αυ ευ = ay̆ ey̆ or af̆/av̆ ef̆/ev̆, and that should be all; as for Ancient, a-b-g-d-e-z-ē-th-i-k-l-m-n-x-o-p-r-s-t-y-ph-kh-ps-ō;
- For Korean, I use the Wiktionary's Revised transliterations; I don't really take cases of non-phonetic Hangul into account, but I haven't really thought much about this anyway;
- For Arabic, the standard transliteration is just outright stupid; I mean, are you seriously having us distinguish LEFT AND RIGHT QUOTES? You gotta be kidding me, right? Since ayn sounds kind of like a nonstandard pronunciation of r I've heard in Italy, I decided that ř would be ayn (I mean, řayn), so the apostrophe is definitely 'ālif; for a few more remarks, I'll quote this post:
The transliteration scheme is my own. In particular, I differ from the "standard" transliteration in the following respects: I use ŕ for the left-quote transliterating ŕayn (ع), because using left-quote and right-quote for two different sounds sounds like complete madness to me, and because the sound of ŕayn sounds like a particular kind of "r moscia" (nonstandard pronunciation of the Italian phoneme /r/); I prefer using th and dh to ṯ and ḏ for the dental fricatives (ث and ذ), because it is quicker to type and easier to read; I use gh for ghayn (غ), for similar reasons to th and dh; I guess I did not have to think about what to do with kha (خ), but either kh or x are fine: x is IPA, kh is just as easy to type and read; standard probably has ḵ, which I doff for the same reasons as ṯ and ḏ; should ta-ha, dal-ha or kaf-ha ever occur, to avoid confusion with tha, dhal and kha, I'd use t.h, d.h and k.h respectively; I doffed š for šin in favor of sh, with the same solution to sin-ha combinations, for the same reasons.
And then I promptly forget about everything said above in my other Arabic post Problems, where the romanization is compatible with the following table:
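| Letter | Transliteration |
|---|---|
| ا | '/ā |
| ب | b |
| ت | t |
| ث | ṯ |
| ج | j |
| ح | ḥ |
| خ | x |
| د | d |
| ذ | ḏ |
| ر | r |
| ز | z |
| س | s |
| ش | š |
| ص | ṣ |
| ض | ḍ |
| ط | ṭ |
| ظ | ẓ |
| ع | ` |
| غ | ğ |
| ف | f |
| ق | q |
| ک | k |
| ل | l |
| م | m |
| ن | n |
| ه | h |
| و | w/ū |
| ی | y/ī |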
- For Hindi, I use the Devanagari standard orthography as a reference for back-transliteratability; now, Hindi has this perfect system which could be fully phonetic… and then it decides to not use it as such; more specifically, it mutes a bunch of schwas, and virtually regularly turns "aha" into /ɛh(ɛ)/, like in the word पहले, which looks like "pahale" but is actually "peh(e)le", and "ahu" and "uha" into /ɔhɔ/, as in बहुत "bahut", pronounced "bohot"; for back-transliteratability, I use ä for /e/s that are spelled a, å and ů for a and u pronounced o (so "båhůt", "můhåbbat"), and ' to indicate mute schwas, so the word from before is "päh'le"; then it happens that some normally-muted schwas resurface in singing, in which case I use a literal schwa, ǝ; for example, एक looks like eka, but is actually pronounced ek', except in "ekǝ din" found in this video; I use ai and au for ऐ and औ, as is standard; then there is the matter of anusvāra and candrabindu; normally, these are nasalizing diacritics: put one on इ i, and it becomes nasalized; in that case, I use ṅ for the anusvāra and ṃ for the candrabindu; for example, the verb "to be" (honā) conjugates हूँ, है, है, हैं, हो, हैं, a.k.a. hūṃ, hai, hai, haiṅ, ho, haiṅ; then there's words like चांद or संबंध, which look like cāṅd' and saṅbaṅdh', but where the anusvāras are pronounced as nasals that assimilate to the next consonant; I use ń in those cases, so cāńd' and sańbańdh'; I don't know if candrabindus can do that, but if they can and I ever find one that does, I'll definitely use ḿ; I believe that's all; note that this was conceived over a long time, so there may be leftover errors that I missed when correcting posts – especially in the one I haven't corrected yet [as of the last edit to this post, I wonder if by 1/7/21 I had done that…]; this also means that the same word in Hindi or Urdu spelling would be transliterated differently;
- Speaking of, let's deal with the ABSOLUTE MESS that the Urdu script is; let's make a table of letters, transliterations, and letter names:
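| Letter | Transliteration | Name |
|---|---|---|
| ا | ', â | 'alif |
| ب | b | bê |
| پ | p | pê |
| ت | t | tê |
| ٹ | ṭ | ṭê |
| ث | s̱ | s̱ê |
| ج | j | jîm |
| چ | c | cê |
| ح | h | baṛî hê / huttî hê |
| خ | x | xê |
| د | d | dâl |
| ڈ | ḍ | ḍâl |
| ذ | ẕ | ẕâl |
| ر | r | rê |
| ڑ | ṛ | ṛê |
| ز | z | zê |
| ژ | ž | žê |
| س | s | sîn |
| ش | š | šîn |
| ص | ṡ | ṡẃâd |
| ض | ẑ | ẑẃâd |
| ط | ṫ | ṫôê |
| ظ | ż | żôê |
| ع | ` | `ě̱n |
| غ | ğ | ğě̱n |
| ف | f | fê |
| ق | q | qâf |
| ک | k | kâf |
| گ | g | gâf |
| ل | l | lâm |
| م | m | mîm |
| ن | n | nûn |
| ں | ṅ, ṇ | nûn ğunnaĥ |
| و | w, û, ô, ô̱, ŭ | wâŵo |
| ه | ḥ, ĥ | gôl ḥê / cḫôṭî ḥê |
| ھ | ẖ | dô cašmî ḥê |
| ی | y, î, ě, ě̱, ẹ̌, ĭ? | cḫôṭî yê |
| ے | ê, ê̱ | baṛî yê |
| إ/أ | '' | 'alif hamzaĥ |
| ئ | ŷ | yê hamzaĥ |
| ؤ | ŵ | wâŵo hamzaĥ |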
If you're going «What the heck?!», that's exactly what I thought; a few remarks:
- First of all, ẃ is probably ONLY in the names ṡẃâd and ẑẃâd, because there is a w in the pronunciation, but it is not written; and I thought the consonants were the strong point of Urdu…;
- Next up, yes, s̱ê and sîn and ṡẃâd, as well as zê and ẕâl and ẑẃâd and żôê, as well as tê and ṫôê, are pronounced exactly the same; they are purely etymological distinctions for Arabic terms;
- ḫ just indicates the aspiration of the previous consonant;
- As for baṛî hê and cḫôṭî ḥê, they are pronounced the same, except when the latter is silent, in which case I write ĥ;
- Nûn ğunnaĥ was apparently created to nasalize vowels, hence my ṅ, the same as for the anusvār' in Hindi; they even created ڻ for the retroflex nasal ṇ, it would seem; and then I find दर्पण darpaṇ' spelled with a nûn ğunnaĥ; like WTF? So I guess they ditched the specific character, and used nûn ğunnaĥ for ṇ too, hence my double transliteration; also, I think medial and initial forms of nûn ğunnaĥ and of plain nûn coincide?
- For `ě̱n, I always make it a silent `; some argue it represents vowels in some places, but I bet it's only ever found in (ultimately) Arabic loans, where it was an actual /ʕ/, hence my `;
- And speaking of vowels, they are a complete mess; so, 'alif is used either as etymological, in which case I use ', or to represent a long a (the आ ā of Hindi), in which case I use â; I haven't developed a convention for 'alif madda (whatever that is spelled like), i.e. آ, but I guess it would have to be 'â;
- Wâŵo is either w, its root sound, or used to represent the sounds ओ o and औ au and उ u and ऊ ū of Hindi, which I render respectively as ô, ô̱ (note the underline), ŭ, and û; because not writing short vowels was bad enough, now we have some of them written and some not, and I think there are also some unwritten ऊ ū, and am pretty sure some ओ o and औ au aren't written either;
- Yê is the biggest mess of all; well, *"the yê's are"; so, yê is of course used for y, and for long ई ī, which I render y and î respectively; all good there; then you have ए e and ऐ ai in Hindustani, right? What do you do for those? Apparently they created baṛî yê ad hoc… and then restricted it to word end; yes, you heard that right; they create a glyph for two sounds, and then restrict it to the end of words, and use, guess what, yê (I mean cḫôṭî yê) for those same sounds in other positions; like, seriously? Anyway, ě = ए e spelled with cḫôṭî yê, ê = ए e spelled with baṛî yê, ě̱ = ऐ ai spelled with cḫôṭî yê, ê̱ = ऐ ai spelled with baṛî yê, and then since I'm crazy I decided I would specify when something is an ezafe (you know that Persian grammar construct which Urdu borrowed? Yeah, that one), and went with ẹ̌; that is AFAIK pronounced as Hindi ए e, never ऐ ai, and that is a blessing because I couldn't render such a difference, that would be too many diacritics; finally, as with w, I suppose y could also happen to write short इ i at times, which is why I have "ĭ?";
- Then we have the hamza carriers, which are their consonant representations with a circumflex, except for 'alif hamzaĥ where apostrophe-circumflex '̂ would look terrible, so I just gave up some degree of back-transliteratability – or did I? You judge if '' can be mistaken for double 'alif; actually, I think that's disallowed, and the madda ˜ was invented in Arabic just to avoid that combo; btw, WTH is up with the keyboard layout that doesn't have any of these except for ŷ, which it has on layer 1? I don't think a lone hamzaĥ is allowed in Urdu, so that is not a problem luckily;
- I may have to come back to this because apparently ezafe can also be written with heh (see Persian below), but I haven't seen that yet in Urdu.
- For Persian, I don't exactly remember what conventions I developed; looking at this video, it seems I used ë for /e/ "written as heh" (including ezafes, transliterated -e when unwritten), separated plural marker hâ with a dash, used ř for ayn, č for če, ǧ for qaf, ' for alif (except for when I used ʔ instead, as in ʔAgë), š for shin, â for "long a", x for kh, ẃ for the mute w in bexẃâd which is pronounced bexâd, ṣ for ṣad (as opposed to s for sin), presumably underdots for all unmentioned emphatics, and ḥ for ḥe, and marked all written vowels with macrons (except of course for â);
That was my original comment; I have since decided to actually develop a scheme to apply to posts and translations from after 1/7/21; this will transliterate the letters as follows:
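| Letter | Transliteration |
|---|---|
| ا | '/â |
| ب | b |
| پ | p |
| ت | t |
| ٹ | s̱ |
| ج | j |
| چ | č |
| ح | h/ë |
| خ | x |
| د | d |
| ذ | ẕ |
| ر | r |
| ز | z |
| ژ | ž |
| س | s |
| ش | š |
| ص | ṣ |
| ض | ẑ |
| ط | ṭ |
| ظ | ẓ |
| ع | ` |
| غ | ğ |
| ف | f |
| ق | q |
| ک | k |
| گ | g |
| ل | l |
| م | m |
| ن | n |
| و | w/û/ô |
| ھ | ẖ |
| ی | y/î/ê |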
When I use ë, it means "vowel e spelled with he"; note that ezafes, progressive(?) mî's, and possessives are separated with dashes, so mî-konî, beh-et, etc;
- For Bulgarian, I didn't put much thought into it, I probably followed Wiktionary; look at this post and infer what you can;
- For Hebrew, I have two systems: one fully back-transliteratable, and one less cluttered and more readable; let's see a table of the transliterations of letters in the first system, make a couple remarks, and then see what changes in the second system:
| Letter | Transliteration |
|---|---|
| א | ' |
| ב | v/b |
| ג | g |
| ד | d |
| ה | ḥ |
| ו | v/ų/ǫ |
| ז | z |
| ח | ħ |
| ט | ṭ |
| י | j/į/ę |
| כ ך | kh/k |
| ל | l |
| מ ם | m |
| נ ן | n |
| ס | ṣ |
| ע | ` |
| פ ף | f/p |
| צ ץ | tz |
| ק | q |
| ר | r |
| ש | s/sh |
| ת | t |

If dageshes were used more often, most pairs of transliterations would be no dagesh / dagesh, the exception being s/sh, which depends on the also-omitted top dot; yod and vav are often used to mark vowels, hence the multiple transliterations, the ogonek showing that the vowel in question is marked; I also separate articles from nouns, and conjunctions and prepositions from whatever follows them, with dashes, so you see things like l-i for "to me", shel-i for "of me" (=my, mine), and ḥa-kavód for "the respect"; the second system, the transcription, was made a bit carelessly at times, but it should drop the ogoneks (ǫ ų į ę), the ḥ (meaning ḥeḥ is effectively not transliterated in the transcription), and the ' and ` (so 'alef and `ayn are also untransliterated); I will make sure those are all the differences;
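The relation between the two Hebrew systems is mechanical enough to sketch in code. The snippet below is only an illustration that takes the description literally (drop the ogoneks, the ḥ, and the ' and ` marks); the function name `to_readable` and the constants are hypothetical, and it does not try to reproduce any of the careless spots in the actual posts.

```python
import unicodedata

COMBINING_OGONEK = "\u0328"
# Symbols the readable system omits entirely: ḥ, Ḥ, ' ('alef) and ` (`ayn).
DROPPED = {"\u1e25", "\u1e24", "'", "`"}

def to_readable(strict: str) -> str:
    """Derive the less cluttered romanization from the back-transliteratable one."""
    # Compose first so that ḥ is a single code point we can filter out,
    # while ṭ and ṣ (which also carry an underdot) are left alone.
    composed = unicodedata.normalize("NFC", strict)
    kept = "".join(ch for ch in composed if ch not in DROPPED)
    # Decompose to strip the ogonek but keep any stress accent on the vowel.
    decomposed = unicodedata.normalize("NFD", kept)
    no_ogonek = decomposed.replace(COMBINING_OGONEK, "")
    return unicodedata.normalize("NFC", no_ogonek)

print(to_readable("ḥa-kavód"))  # a-kavód
```

Going the other way is not possible from the readable form alone, which is exactly why the first system keeps those extra marks.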
- Languages without a standard orthography or a standard script: Min Nan, Hakka, Teochew; I've covered all of these in this post.
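For anyone who wants to apply the Russian table mechanically, here is a minimal Python sketch. The names `RU_TO_LATIN` and `transliterate_ru` are made up for illustration. It is a plain letter-by-letter lookup, so it cannot know when ч is pronounced š (it always emits ć rather than č), and it does not add stress marks.

```python
# Lowercase mapping copied from the Russian table in this post.
RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "ĵe", "ё": "ĵo",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ć", "ш": "š", "щ": "ś", "ъ": "`",
    "ы": "y", "ь": "'", "э": "e", "ю": "ĵu", "я": "ĵa",
}

def transliterate_ru(text: str) -> str:
    """Transliterate Cyrillic letters one by one; leave everything else as is."""
    out = []
    for ch in text:
        latin = RU_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # spaces, punctuation, Latin letters
        elif ch.isupper():
            # Capitalize only the first Latin letter: Ё -> Ĵo, Х -> Kh.
            out.append(latin[0].upper() + latin[1:])
        else:
            out.append(latin)
    return "".join(out)

print(transliterate_ru("твёрдый знак"))  # tvĵordyj znak
```

Because е and э map to ĵe and e, and ц maps to ts rather than c, the output stays unambiguous everywhere except for the ts sequence, which matches the one trade-off described in the Russian remarks.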