Single Transliteration Scheme for all CM Languages - Part 2

Languages used in Carnatic Music & Literature
vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

kutty,
The SW has been developed by me in VB. Please refer to http://www.rasikas.org/forums/viewtopic.php?pid=49602#p49602 (#173)
Last edited by vgvindan on 08 Jun 2007, 08:30, edited 1 time in total.

kutty
Posts: 149
Joined: 21 May 2005, 08:23

Post by kutty »

Mahakavi

Kutty:
You have to bear with me for transforming your scheme into this one.
OK, that really took some labor!
Very good effort, Mahakavi. You are welcome to transform mine into the standard one. I find mine more convenient for expressing the sounds we use in colloquial Thamizh and Sanskrit, but it need not be adhered to, in the interest of the majority. I do realise the transformation to Thamizh would be a bit laborious. If you wish, I will hereafter render the song in Thamizh script in addition to English, to reduce your labour, if the members/mods do not object.

mahakavi

Post by mahakavi »

kutty:
Please do so, if you don't mind. I really had to go line by line, keeping two or more URLs open and switching between them. While arunk's scheme is a boon for doing the transformation, it is laborious since long songs are difficult to do line by line. Perhaps you have better software. I will post the meanings after you do the Thamizh versions of the other two songs; with the Thamizh script alongside, it will be easier to relate the words to the meaning. Incidentally, one of our forum members (a Keralite) is very much interested in getting the meanings of these songs, and that was the primary reason I requested the lyrics.

arunk:
Is there a way to copy/paste the transliterated Roman script of a whole song into your scheme and get the transformation at one stroke?

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

mahakavi,

actually i was trying this yesterday - although i ran into some anomalies which i couldn't figure out.

What i did (and what you could do) is as follows:
1. copy and paste into the transl. editor
2. click on the "fix" button (the hammer and tools button).
3. It brings up a dialog where you can select one or more "transformations" to apply. You could ask e.g. to convert aa to A, ee to I etc, you could also convert "th" to t etc.
4. Click ok and this should fix most of the text to conform to the scheme and thus could make your job easier.
5. Make any other corrections
6. Click on translate button.
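
The "fix" transformations in steps 2-4 can be sketched roughly as below. This is only an illustration, not the editor's actual code: the rule table `FIX_RULES` and the function `apply_fixes` are hypothetical names, and only the three rules mentioned above (aa to A, ee to I, th to t) are included.

```python
# Hypothetical rule table: only the three conversions named in the post.
# The real editor lets you pick which transformations to apply.
FIX_RULES = [
    ("aa", "A"),   # long a
    ("ee", "I"),   # long i
    ("th", "t"),   # dental t written as "th" in informal spelling
]

def apply_fixes(text, rules=FIX_RULES):
    """Apply each selected (old, new) replacement in order."""
    for old, new in rules:
        text = text.replace(old, new)
    return text

if __name__ == "__main__":
    print(apply_fixes("raamaa nee thaaye"))
```

Passing a shorter `rules` list corresponds to ticking only some of the transformations in the dialog. The remaining manual corrections (step 5) are still needed, since blind replacement cannot know where "th" was really meant as an aspirate.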

However, the anomaly I ran into was that after the transformation, when I translated, the text was quite off. For example, "muruga" came up as "m" (mei) + u (uyir)!!! I suspect the copy and paste from the forum page into the transl. editor brought along some hidden formatting, which confused the translator so that instead of seeing "muruga" as one word it perhaps saw it as several words - "m", "u", "r", "u" etc. This is just my theory. If this is indeed the cause, one way to avoid it would be to copy and paste into notepad (i.e. an editor without formatting stuff) and then copy and paste from there into the transl. editor as step 1 above.
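
If the hidden-formatting theory is right, the notepad detour could in principle be replaced by stripping invisible characters from the pasted text. A minimal sketch, assuming the culprits are Unicode format/control characters and non-breaking spaces (that is a guess, and `strip_hidden_formatting` is a made-up name):

```python
import unicodedata

def strip_hidden_formatting(text):
    """Drop invisible format/control characters from pasted Roman text."""
    cleaned = []
    for ch in text:
        if ch in "\n\t":
            cleaned.append(ch)            # keep line breaks and tabs
        elif ch == "\u00a0":
            cleaned.append(" ")           # non-breaking space -> plain space
        elif unicodedata.category(ch) in ("Cf", "Cc"):
            continue                      # zero-width space, BOM, etc.
        else:
            cleaned.append(ch)
    return "".join(cleaned)

if __name__ == "__main__":
    pasted = "m\u200bu\u200br\u200bu\u200bga"   # zero-width spaces between letters
    print(strip_hidden_formatting(pasted))
```

This is meant for the Roman input only. In Indic script text, some of these format characters (zero-width joiner and non-joiner) are meaningful and should not be removed blindly.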

Arun
Last edited by arunk on 14 Jun 2007, 19:39, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

note that ee => I doesn't really work - must be a bug.

Also I see usage of e' for E. I can add a rule for it, but this is not really part of any other convention and so I don't want to.

Perhaps a search and replace feature may be useful here.

Arun
Last edited by arunk on 14 Jun 2007, 19:46, edited 1 time in total.

rveeraraghavan
Posts: 29
Joined: 02 Feb 2010, 23:43

Post by rveeraraghavan »

mahakavi, what you posted above (at 00:27) looks like a whole lot of gibberish to me! Does it require something to be installed on the machine? I am running Linux and Firefox.

mahakavi

Post by mahakavi »

rveeraraghavan:
I wish I could tell you in one line what to do.
Perhaps arunk or thanjavur might help. It has something to do with going under "View" and setting the encoding to "unicode" or "western" or something like that. Wait for one of them to respond. I can see clear Thamizh script on my computer.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

rveeraraghavan - you need to enable Indic script support. Pl. check the following link for details:

http://en.wikipedia.org/wiki/Wikipedia: ... ic_scripts

Arun
Last edited by arunk on 14 Jun 2007, 20:03, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

mahakavi - i will be updating the editor in the next few minutes. After that it will include a search/replace (so you can do e' to E, o' to O in the above), another rule to translate some capital letters which don't have representations (e.g. V => v, G => g), and also a bug fix for ee => I.

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

done. The search/replace button is the last button to the left on the second row of buttons

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

FYI: A few minutes ago I posted an update to the Carnatic Music Transliterator: http://arunk.freepgs.com/cmtranslit

This update has the following:

1. A search/replace feature (the A->B button on the second row - the rightmost button)
2. Fixed a bug in the ee => I translation (accessible via the wrench/hammer button)
3. Added a rule for converting certain capital letters which have representation only in lower case forms (again, accessible via the wrench/hammer button)

Arun

kutty
Posts: 149
Joined: 21 May 2005, 08:23

Post by kutty »

Mahakavi:

I will send you the Thamizh versions of all three tomorrow morning. I use a very good Thamizh transliteration SW named "Azhagi" (which of course does not follow the international standard); you can try it from www.azhagi.com. It is really a nice one for transliterating from English to Thamizh. If you have not tried it, please do so.

palpaandi
Posts: 1
Joined: 21 Jun 2007, 15:50

Post by palpaandi »

hi!!!

as you guys are discussing the tamil tools for posting,
i would like to contribute my findings so that you ppl will find them useful.

i recently came across a blog that directed me to this site: http://quillpad.in/tamil
it will comfortably help in creating long tamil blogs in no time if you can speak tamil,
so you can write in tamil without knowing the tamil script :)
isn't the funda amusing :) :) :)

see through it and have fun

mahakavi

Post by mahakavi »

That is neat, palpaandi!

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

indeed!

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

However, for our purposes (i.e. as in this thread), i think it is better if the representation is easily translatable to other languages as well: one representation that conveys the phonetics of the words, and a way to transcribe it into all the CM languages.

Although that is perhaps a utopian view, as I find most uses being limited to getting it transcribed to only one language :)

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

Hi folks,

A couple of you had asked me about this and I am glad it turned out to be reasonably easy to implement.

I have updated the cm transliteration editor at http://arunk.freepgs.com/cmtranslit with a new feature that allows you to paste already created tamil/kannada/telugu/sanskrit script. Once you hit the Translate button, this will convert it into the transliteration scheme (results under English tab) as well as to other languages.

You can use this as a convenient "quick start" or "starting point" to get something into the unified scheme - particularly if you already have text in an Indic language or you find it convenient to create it elsewhere. Once you do that you can copy the results under the English tab, modify them as necessary and do further tweaking.

A few points to note:

1. The feature is experimental and so there may be bugs
2. The feature is not 100% reliable. From Tamizh script it is not possible to unambiguously figure out ka vs ga, pa vs ba etc. So always check the English results and modify them as necessary. The same thing applies to anuswara. While Kannada and Telugu always use anuswara in some contexts (and hence there is no ambiguity), Sanskrit doesn't. So conversion of anuswaras using this feature may end up being less than satisfactory.
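
To illustrate the second point: a reverse converter can only flag the Tamizh letters that have more than one possible reading and leave the choice to a human. A small sketch under stated assumptions: the table `TAMIL_CANDIDATES` and the function `ambiguous_letters` are hypothetical, ka/ga and pa/ba come from the note above, and the Ta/Da and ta/da rows are added here as assumed further examples of the "etc."

```python
# Hypothetical table of Tamil consonants with more than one Roman reading.
TAMIL_CANDIDATES = {
    "\u0b95": ["k", "g"],   # க : ka vs ga (from the note above)
    "\u0baa": ["p", "b"],   # ப : pa vs ba (from the note above)
    "\u0b9f": ["T", "D"],   # ட : assumed further example
    "\u0ba4": ["t", "d"],   # த : assumed further example
}

def ambiguous_letters(text):
    """Return (index, letter, candidates) for each letter needing a manual check."""
    return [
        (i, ch, TAMIL_CANDIDATES[ch])
        for i, ch in enumerate(text)
        if ch in TAMIL_CANDIDATES
    ]

if __name__ == "__main__":
    # முருக : the final க could be read as "ka" or "ga"
    for index, letter, options in ambiguous_letters("\u0bae\u0bc1\u0bb0\u0bc1\u0b95"):
        print(index, letter, " / ".join(options))
```

A converter would typically emit the first candidate and rely on the user to correct it in the English tab, which is why checking the results is recommended.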

Pl. let me know of bugs and any improvements.

Thanks
Arun

Suji Ram
Posts: 1529
Joined: 09 Feb 2006, 00:04

Post by Suji Ram »

That's cool,
Now I can say I can read thamizh in English.
Thanks.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

thanks suji. if only the notation "editor" was as easy as this one :)

Actually that one has been "sort of close to initial delivery" for a while. The trouble is it requires a lot of documentation and that is not always fun to do :)

Arun

Suji Ram
Posts: 1529
Joined: 09 Feb 2006, 00:04

Post by Suji Ram »

arunk wrote:thanks suji. if only the notation "editor" was as easy as this one :)

Actually that one has been "sort of close to initial delivery" for a while. The trouble is it requires a lot of documentation and that is not always fun to do :)

Arun
I was going to enquire about the notation editor..looks like I reminded you indirectly... :P

vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

Those who know Telugu script may please comment whether the implementation of 'O' in Unicode is correct or not -

kO,khO,gO, ghO - కో ఖో గో ఘో
cO, chO, jO, jhO - చో ఛో జో ఝో
TO, ThO, DO, DhO - టో ఠో డో ఢో
tO, thO, dO, dhO - తో థో దో ధో
pO, phO, bO, bhO - పో ఫో బో భో
mO, yO, rO, lO, vO, LO - మో యో రో లో వో ళో
sO, SO, shO, hO - సో శో షో హో

It may be seen that some letters have been implemented differently. Can someone knowing Telugu language comment whether the highlighted letters are correct?
Last edited by vgvindan on 31 Aug 2007, 22:40, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

vgv,

this would be font dependent. So what I see may not be the same as what you see - since depending on what fonts are installed, which browser (and what the browser selects for telugu - which can be automatic or explicit) we would see different things.

If you are observing a problem, change to a different font (downloading it if necessary) and see if you see things differently. That will let you know if the font you chose has a problem.
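
One way to separate "what is encoded" from "what the font draws" is to look at the code points themselves. The sketch below builds a few of the syllables from the list using the Telugu long-O vowel sign (U+0C4B) and prints their code points; the helper names `with_long_o` and `code_points` are made up for illustration.

```python
TELUGU_SIGN_OO = "\u0c4b"   # Telugu vowel sign for long O

def with_long_o(consonant):
    """Attach the long-O vowel sign to a Telugu consonant."""
    return consonant + TELUGU_SIGN_OO

def code_points(text):
    """Show the underlying code points, independent of any font."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

if __name__ == "__main__":
    samples = [("kO", "\u0c15"), ("mO", "\u0c2e"), ("hO", "\u0c39")]
    for roman, consonant in samples:
        syllable = with_long_o(consonant)
        print(roman, syllable, code_points(syllable))
```

If two syllables show the same vowel-sign code point but look different on screen, the difference comes from how the font shapes that combination, not from the encoding.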

Arun

vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

Can someone please explain how 'SArnga' (in SArnga dhara) will be written in Telugu and Kannada?
In Sanskrit it is written as शार्ङ्ग -
In view of the virama after 'r' - it is not possible to codify Anusvara (sunna) in place of G-n. Can it be written as శార్ఙ్గ - ಶಾರ್ಙ್ಗ - ie without Sunna?
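
For reference, the Telugu and Kannada spellings above can be derived mechanically from the Devanagari one, because the Unicode blocks for these scripts are laid out in parallel for the letters involved here. A rough sketch; `shift_block` is a hypothetical helper, and the offset trick is not a general transliterator, since not every code point has a counterpart in every block.

```python
DEVANAGARI_BASE = 0x0900
DEVANAGARI_END = 0x097F
# Start of the target Unicode blocks.
BLOCK_BASE = {"telugu": 0x0C00, "kannada": 0x0C80}

def shift_block(text, target):
    """Move each Devanagari code point to the same offset in the target block."""
    base = BLOCK_BASE[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_BASE <= cp <= DEVANAGARI_END:
            out.append(chr(cp - DEVANAGARI_BASE + base))
        else:
            out.append(ch)   # leave anything else untouched
    return "".join(out)

if __name__ == "__main__":
    # SArnga: Sa, A-sign, ra, virama, nga, virama, ga
    sarnga = "\u0936\u093e\u0930\u094d\u0919\u094d\u0917"
    print(shift_block(sarnga, "telugu"))
    print(shift_block(sarnga, "kannada"))
```

This only shows that the sequence without sunna can be encoded; whether it is the accepted spelling is the question being asked.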
Last edited by vgvindan on 11 Jan 2008, 00:41, edited 1 time in total.

shishya
Posts: 262
Joined: 08 Jan 2007, 20:02

Post by shishya »

this is how it is written శార్ఙ
Last edited by shishya on 11 Jan 2008, 00:55, edited 1 time in total.

vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

shishya, you have given 'sArnga' without 'g'
శార్ఙ్గము

శాగ్ఙ్గి
This is how it is given in the Telugu Dictionary. Can you please check again?
http://dsal.uchicago.edu/cgi-bin/romadi ... able=brown

ramakriya
Posts: 1876
Joined: 04 Feb 2010, 02:05

Post by ramakriya »

vgvindan wrote:Can someone please explain how 'SArnga' (in SArnga dhara) will be written in Telugu and Kannada?
In Sanskrit it is written as शार्ङ्ग -
In view of the virama after 'r' - it is not possible to codify Anusvara (sunna) in place of G-n. Can it be written as శార్ఙ్గ - ಶಾರ್ಙ್ಗ - ie without Sunna?
vgvindan,

It looks right in Kannada

-Ramakriya

vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

ramakriya,
Thanks

shishya
Posts: 262
Joined: 08 Jan 2007, 20:02

Post by shishya »

my mistake. Without the ga, it would be sArnya. The dictionary way is right.

vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

shishya,
Thanks

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

FYI: I have updated the transliteration scheme and transliterator to version 1.1. You can find information about it at http://arunk.freepgs.com/wordpress/cm-t ... lease-v11/ . The big change is support for the "grantha Sa" character i.e. the one for Siva Sakthi (the one you find in Slokam books).

Also I "redressed" my arunk.freepgs.com website a couple of weeks ago: http://arunk.freepgs.com . It now follows a "blog" like format and so you can leave feedback (or ask questions) on various topics on the website itself.

Thanks
Arun

Mahalakshmi
Posts: 145
Joined: 20 Feb 2008, 17:28

Post by Mahalakshmi »

Dear arunk

Is it possible to transliterate malayalam script into english?

Mahalakshmi

vgvindan
Posts: 1430
Joined: 13 Aug 2006, 10:51

Post by vgvindan »

mahalakshmi,
Arun's website is comprehensive enough for such transliterations. However, if you need any further help, please contact me at vgvindan@gmail.com with sample Malayalam Unicode script.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

Sorry - my transliterator http://arunk.freepgs.com/wordpress/cm-t ... literator/ does not yet have Malayalam to English.

I should be able to add it easily, but the problem is that Malayalam support itself is shaky - so results may not be entirely accurate.

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

A new release (v1.2) of the CM Transliterator and the Unified CM transliteration Scheme is now available at http://arunk.freepgs.com. This release has the following features/enhancements:
* Open/Save feature - now you can save your transliteration input to a file on your computer and open it later in the transliterator. Previously you had to do this manually by copying the contents to the clipboard and pasting them into notepad or a text editor.
* More options for Sanskrit anuswara control (and some bug fixes in the previous anuswara control)
* Font control - you can control which font the transliterator should use for various languages
* Sanskrit avagraha symbol support.

Please let me know if you have any questions or run into problems.

Arun

ramakriya
Posts: 1876
Joined: 04 Feb 2010, 02:05

Post by ramakriya »

Arun,

I had not seen this post, but noticed the save button when I was doing something today :) Very helpful!

Thanks

-Ramakriya

rshankar
Posts: 13754
Joined: 02 Feb 2010, 22:26

Post by rshankar »

Here is a very nice site that does transliteration very well. I like it particularly because it has an intuitive use of the candrabindu among other things.

http://www.quillpad.com/hindi/

Replace it with http://www.quillpad.com/tamil/ and you will get the tamizh transliterator page and so on....

rajesh_rs
Posts: 184
Joined: 01 Dec 2007, 11:18

Post by rajesh_rs »

This is such an awesome idea. Kudos to Arun!

ragam-talam
Posts: 1896
Joined: 28 Sep 2006, 02:15

Post by ragam-talam »

Have a question for Arun:
Malayalam has a sound similar to 'Ta', but the T is pronounced with the tip of the tongue touching the front teeth. So, for example, the word 'enTe' (meaning 'my' in Malayalam) uses this sound for T.
How does one represent this in the transliteration scheme?

Also, how does one represent the 'chandra-kalai'? e.g. the ending sound in 'pandu' (for ball)

[ The quillpad site mentioned above provides alternatives, and you can choose the correct one. So for e.g if you enter 'n', the editor proposes ன் / ந் / ண் etc - and you choose the preferred alphabet in the target language. ]
Last edited by ragam-talam on 27 Aug 2010, 01:40, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

r-t,

I have not worked on this in general for a while. As per last status (from a while ago) Malayalam support is incomplete - at that time I remember encountering that Malayalam Unicode support had problems (as in buggy) which was the main problem. I do not know if it has been sorted out.

BTW, the "Ta" with the tip of the tongue is pretty much how the first R of kaRRa in Tamizh should be pronounced (it is, say, nearer to the tip), but in the transliteration scheme it is "preferable" to write kaTRa since that is "more obvious" (i.e. from a more universal perspective). But kaTRa, if spelt strictly using Tamizh phonetics, would take a harder Ta (tongue rolled inside a bit) - the one that would be inappropriate for enTe in Malayalam, as well as for kaRRa. People may remember srkris saying there was a difference between kaRRa and kaTRa phonetically. This is it.

Long story short, at this point it would have to be Ta. But given incomplete Malayalam support, that isn't going to be a lot of help! Sorry!

Arun

ragam-talam
Posts: 1896
Joined: 28 Sep 2006, 02:15

Post by ragam-talam »

Thanks, Arun. Hope the improvement will happen some day.
What about chandrakalai?

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

Post by arunk »

No support yet (I think; it's been a while, I will check). Since I couldn't get basic support working, I did not concentrate on Malayalam specifics.

Arun

vgovindan
Posts: 1865
Joined: 07 Nov 2010, 20:01

Post by vgovindan »

ஶிவன் - The grantha letter ஶ representing श seems to have been codified as part of the Tamil Unicode set at hex 0BB6. The keyboard mapping for the letter is not known.
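
The code point can be checked with Python's standard `unicodedata` module, and since the keyboard mapping is not known, the letter can at least be produced from its hex value. A small sketch (the helper name `describe` is made up):

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

if __name__ == "__main__":
    letter = chr(0x0BB6)          # build the letter from its hex value
    print(letter, *describe(letter))
    # The word quoted above, one code point per line
    for ch in "\u0bb6\u0bbf\u0bb5\u0ba9\u0bcd":
        print(*describe(ch))
```

Whether the letter then displays correctly still depends on the installed Tamil font including a glyph for it.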

srkris
Site Admin
Posts: 3497
Joined: 02 Feb 2010, 03:34

Post by srkris »

Thanks Sri Vgovindan. That letter is like the ழ, which is playing hide and seek these days.

vgovindan
Posts: 1865
Joined: 07 Nov 2010, 20:01

Post by vgovindan »

Please refer to Post #83 -
viewtopic.php?f=21&t=1999&hilit=anuswar ... =75#p35793

The question of anuswara was discussed in this thread earlier. However, a doubt has been raised by Shri Sivaramakrishnan, a forum member, that in case of Malayalam the anuswara is to be invariably used at the end of a word - like चरणं and not चरणम्. I am of the opinion that virama म् should be used at the end of a sentence or words standing alone; in the middle of the sentence - as the ending of a word (चरणं) - and in certain compound words with prefix like 'saM', anuswara should be used. (saMyukta, haMsa etc - which is not a syllable)
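
The rule proposed above (virama म् at the end of a sentence or for a word standing alone, anuswara for a word-final m in the middle of a sentence) can be sketched as follows. This is only an illustration of that opinion in Devanagari, not an established convention; `apply_final_m_rule` is a hypothetical name, and the compound-word case (saM-, haMsa) is not handled.

```python
M_VIRAMA = "\u092e\u094d"   # ma + virama
ANUSVARA = "\u0902"         # Devanagari anuswara sign

def apply_final_m_rule(sentence):
    """Use anuswara for word-final m inside a sentence, keep ma+virama on the last word."""
    words = sentence.split()
    out = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        if not is_last and word.endswith(M_VIRAMA):
            word = word[: -len(M_VIRAMA)] + ANUSVARA
        out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    # "caraNam SaraNam": first word gets anuswara, last keeps ma+virama
    print(apply_final_m_rule("\u091a\u0930\u0923\u092e\u094d \u0936\u0930\u0923\u092e\u094d"))
```

A word standing alone is its own last word, so it keeps the virama form, which matches the proposal.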

I request views of members - not only with reference to Malayalam, but also Kannada and Telugu. This is required in regard to implementation in my blog posts.

vgovindan
Posts: 1865
Joined: 07 Nov 2010, 20:01

Post by vgovindan »

अङ्क अङ्ख अङ्ग अङ्घ
അങ്ക അങ്ഖ അങ്ഗ അങ്ഘ
?? അംക അംഖ അംഗ അംഘ

अञ्च अञ्छ अञ्ज अञ्झ
അഞ്ച അഞ്ഛ അഞ്ജ അഞ്ഝ
?? അംച അംഛ അംജ അംഝ

अण्ट अण्ठ अण्ड अण्ढ अण्ण
അണ്ട അണ്ഠ അണ്ഡ അണ്ഢ അണ്ണ
?? അംട അംഠ അംഡ അംഢ അണ്ണ

अन्त अन्थ अन्द अन्ध अन्न
അന്ത അന്ഥ അന്ദ അന്ധ അന്ന
?? അംത അംഥ അംദ അംധ അന്ന

अम्प अम्फ अम्ब अम्भ अम्म
അമ്പ അമ്ഫ അമ്ബ അമ്ഭ അമ്മ
?? അംപ അംഫ അംബ അംഭ അമ്മ

Please confirm whether the (??) (sunna-anusvara) marked line transliterations are correct.
However, अण्ण, अन्न and अम्म will be transliterated without anusvara (sunna)
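
The (??) lines follow a mechanical rule: a nasal plus virama is replaced by anusvara when the next letter is a stop of the same varga, while the doubled nasals are left alone. A sketch of that rule for Malayalam is below; `nasals_to_anusvara` and the `VARGAS` table are illustrative names, and the code only reproduces the table, it does not settle whether the result is acceptable Malayalam.

```python
VIRAMA = "\u0d4d"
ANUSVARA = "\u0d02"
# Each class nasal mapped to the four stops of its own varga.
VARGAS = {
    "\u0d19": "\u0d15\u0d16\u0d17\u0d18",   # nga : ka kha ga gha
    "\u0d1e": "\u0d1a\u0d1b\u0d1c\u0d1d",   # nya : ca cha ja jha
    "\u0d23": "\u0d1f\u0d20\u0d21\u0d22",   # Na  : Ta Tha Da Dha
    "\u0d28": "\u0d24\u0d25\u0d26\u0d27",   # na  : ta tha da dha
    "\u0d2e": "\u0d2a\u0d2b\u0d2c\u0d2d",   # ma  : pa pha ba bha
}

def nasals_to_anusvara(text):
    """Replace nasal+virama with anusvara before a stop of the same varga."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if (ch in VARGAS and i + 2 < len(text)
                and text[i + 1] == VIRAMA and text[i + 2] in VARGAS[ch]):
            out.append(ANUSVARA)
            i += 2            # skip nasal and virama; the stop is copied next
        else:
            out.append(ch)
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(nasals_to_anusvara("\u0d05\u0d19\u0d4d\u0d15"))   # a + nga + virama + ka
    print(nasals_to_anusvara("\u0d05\u0d2e\u0d4d\u0d2e"))   # amma stays unchanged
```

Because the nasal is not in its own stop list, doubled nasals such as those in the last column of the table pass through without anusvara.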

Govindaswamy
Posts: 120
Joined: 21 Feb 2010, 06:55

Post by Govindaswamy »

In my opinion the sunna (anuswara) is used exclusively in Telugu and Kannada. In these two Dravidian languages the sunna replaces all the nasal consonants (மெல்லின மெய்யெழுத்துக்கள் ங், ஞ், ண், ந், ம்) which, in Tamil, precede the corresponding consonants (க, ச, ட, த, ப). The nasal consonants have become redundant in Telugu and Kannada.
Though I can read Malayalam, I have only partial knowledge of it. But I feel that Malayalam is similar to Tamil in its use of these nasal consonants ങ്, ഞ്, ണ്, ന്, മ്. However, these nasal consonants are joined with the corresponding consonants which follow them. About 50 years ago people used to join a couple of letters while writing in Tamil.
In Indian Language Converter, which is one of the transliteration programs I use, the words which you gave get typed as അണ്ണ, അന്ന, അമ്മ.

I checked your blogspot at random. The Malayalam transliteration is fine. A sample is given here: മന്നിമ്പുമയ്യ. No anuswara/sunna is used. Among the Dravidian languages only Telugu and Kannada have borrowed the anuswara from Sanskrit and in the process lost the use of nasal consonants. In these two languages the sunna takes different nasal sounds depending upon the consonant which follows it.

vgovindan
Posts: 1865
Joined: 07 Nov 2010, 20:01

Post by vgovindan »

Govindaswamy Sir,
This has been projected by a forum member by name Sivaramakrishnan. He says that in places like चरणं, sunna (anuswara) should be used and not as चरणम्. That is why I asked those who know Malayalam to confirm whether use of Sunna is acceptable in Malayalam. (In Tamil, as you know, there is no sunna principle. Malayalam has it.)

When I started my blog, I had placed my transliteration scheme of Telugu, Kannada and Malayalam for suggestions. It was then (2007) suggested that - as you said - only Telugu and Kannada use sunna in all places and not G, J, N, n and m. However, no one suggested about Malayalam then. All my blogposts follow this method only.

That is why I needed approval from people knowledgeable in Malayalam.
Thanks for the input. I shall await more response.
