Single Transliteration Scheme for all CM Languages - Part 2

Languages used in Carnatic Music & Literature
vasya10
Posts: 101
Joined: 26 Mar 2005, 22:32
Location: USA

#101

Post by vasya10 » 06 Feb 2007, 05:28

Arun,

One useful feature, and maybe you have already thought about it, would be to export the transliterated data as a PDF.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#102

Post by arunk » 06 Feb 2007, 06:37

vasya,

this is doable right now. All you need is a PDF print driver, which lets you save what you would normally send to a printer as a PDF file (e.g. google for pdf995). With that installed, from the Printable View you just choose the Print option in your browser and, instead of sending to your printer, choose the PDF printer.

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#103

Post by arunk » 07 Feb 2007, 01:25

I have tested looking up a sanskrit word database to decide where to use anuswara, and it works. However, there is a significant problem: the input text (as in sAhitya) can have many words (each a potential match in the dictionary) combined into single words in english. Note also that when words are combined, they morph as per the rules of the language.

So unless language rules are applied (which is very difficult), it is impossible to reliably figure out which words in the input correspond to words in the dictionary (i.e. those that require anuswara in sanskrit).

For example, if sangIta comes as such, I can match against saMgIta (with some smart logic). I can even match sangItam (add m if the word ends with a and try for a match), but what if the word is karnAtakasangItam in one word (or something else)? "sangIta" can occur anywhere in an input word. Now a solution could be to match it anywhere in an input word, but I see an entry for aMsa - and does that mean amsa anywhere should match? I am thinking not.
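
The trade-off described above can be sketched in a few lines of Python. This is only an illustration, not the editor's actual code: the two-entry dictionary, the nasal table and the function names are all assumptions made for the example.

```python
# Hypothetical dictionary entries, stored with anuswara written as M.
DICTIONARY = ["saMgIta", "aMsa"]

# Nasal a user would likely type in place of M, keyed by the consonant
# that follows it (an assumed table; anything else falls back to "m").
NASAL_BEFORE = {"k": "n", "g": "n", "c": "n", "j": "n",
                "t": "n", "d": "n", "T": "N", "D": "N",
                "p": "m", "b": "m"}


def phonetic_form(entry):
    """Rewrite each M in a dictionary entry as the nasal a user would type."""
    out = []
    for i, ch in enumerate(entry):
        if ch == "M":
            nxt = entry[i + 1] if i + 1 < len(entry) else ""
            out.append(NASAL_BEFORE.get(nxt, "m"))
        else:
            out.append(ch)
    return "".join(out)


# Maps the phonetic spelling to the dictionary spelling.
LOOKUP = {phonetic_form(e): e for e in DICTIONARY}


def whole_word_match(word):
    """Match the word as typed, or with a final m dropped (sangItam -> sangIta)."""
    candidates = [word]
    if word.endswith("am"):
        candidates.append(word[:-1])
    for candidate in candidates:
        if candidate in LOOKUP:
            return LOOKUP[candidate]
    return None


def substring_match(word):
    """Return every dictionary entry whose phonetic form occurs anywhere in word."""
    return [entry for key, entry in LOOKUP.items() if key in word]
```

Whole-word matching finds sangIta and sangItam but misses karnAtakasangItam, while substring matching finds it at the cost of also firing on any input that merely contains amsa.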

So while the dictionary would help, it may not help that much. Of course, I can introduce a feature where the user highlights some text and explicitly asks for a match in the database - but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(:(. I guess that is going to be our achilles heel.

We are so close to our solution, yet there seems to be an insurmountable barrier :(.

Any suggestions?

Thanks
Arun
Last edited by arunk on 07 Feb 2007, 01:33, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#104

Post by arunk » 07 Feb 2007, 02:43

arunk wrote:but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(:(.
Maybe this isn't a big deal. If the input represents a sanskrit krithi, then it is not an unfair expectation for the user to be aware of where anuswara figures.

But if the krithi is non-sanskrit, and the user entering the krithi doesn't know sanskrit rules - how would it be if certain words (in a language other than sanskrit) that happen to be sanskrit-based get rendered in sanskrit with no anuswara?

For example, if a word like sangItamu (as entered) is in a telugu krithi, but as rendered in sanskrit doesn't appear with anuswara - is that too bad?

Arun

vasya10
Posts: 101
Joined: 26 Mar 2005, 22:32
Location: USA

#105

Post by vasya10 » 07 Feb 2007, 02:59

(Maybe I'm simplifying things a bit, because I didn't understand all the discussions)

For anusvara logic, isn't it enough just to follow pANini's rule "anusvArasya yayi parasavarNah"? Or is the issue beyond that?

vasya10
Posts: 101
Joined: 26 Mar 2005, 22:32
Location: USA

#106

Post by vasya10 » 07 Feb 2007, 03:02

Just want to clarify what I meant -- if you just encode the 14 sutras of pANini into the database, you should be able to derive anusvara logic.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#107

Post by arunk » 07 Feb 2007, 03:18

vasya,

yes but that is easier said than done :). It isnt worth it for the scale of our use.

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#108

Post by arunk » 07 Feb 2007, 21:57

Please let me know if this is ok. drs/ramakriya/jayaram - I am going to bother you in particular :). Feedback from others is also very welcome.

After racking my brains over this more, I have an alternative proposal which may be the best given our constraints.

For kannada and telugu, there are contexts in which certain combinations ALWAYS use anuswara, i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that to make the input easier to read, for the first two cases the scheme allows just n instead of #n, ~n, i.e. pankaja, panca is ok. Also, currently, you could simply use M instead of #n/~n/n/m in all these cases. But as I have noted many times, except in the last case where M represents m, this is not recommended as it is not as phonetic, and it can also lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script-specific artifacts wherever possible, and this is definitely one place where they can be avoided for these 2 languages.
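
As a rough illustration of those "always anuswara" contexts, here is a minimal Python sketch. It assumes the renderer marks an implied anuswara internally by rewriting the nasal as M; that internal convention and the function name are assumptions for the example, not a description of the editor.

```python
import re

# One alternative per context listed above: #n[kg], ~n[cj], n[td], N[TD], m[pb].
# Plain n is also accepted before k/g and c/j, as the scheme allows.
IMPLIED_ANUSWARA = re.compile(
    r"(?:#n|n)(?=[kg])"    # velar nasal before k/g
    r"|(?:~n|n)(?=[cj])"   # palatal nasal before c/j
    r"|n(?=[td])"          # dental nasal before t/d
    r"|N(?=[TD])"          # retroflex nasal before T/D
    r"|m(?=[pb])"          # labial nasal before p/b
)


def mark_implied_anuswara(word):
    """Replace every nasal in an 'always anuswara' context with M."""
    return IMPLIED_ANUSWARA.sub("M", word)


for w in ["pankaja", "pa~nca", "Sa#nkara", "pANDava", "amba", "ramya"]:
    print(w, "->", mark_implied_anuswara(w))
```

A word such as ramya passes through unchanged because m before y is not one of the listed contexts.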

However, note that for kannada and telugu, there are contexts where certain combinations do NOT ALWAYS use anuswara. Examples are mya, mSa etc. We decided here that the user would need to explicitly specify the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying the "m" sound (and not #n/~n/N - right?)

I am thinking sanskrit should also follow the same rule, but obviously in more contexts because of the wider use of anuswara in the language. IN THE MIDDLE of a word (for end of words, see below), whenever anuswara is required, it needs to be explicitly specified - else no anuswara would be rendered. Of course, as per the current scheme, this would mean saMgIta, saMtOsha etc., which again is not phonetic and can mislead pronunciation for some people.

I think malayalam can follow the same rule (but the contexts where anuswara figures would be the fewest of the 4 languages).

A more phonetic explicit anuswara specifier for use inside words
But what if we adopt a different, more phonetically fair specifier for anuswara in places where it represents the #n, ~n, n and N sounds? For example, one that uses n/N but with a prefix. I propose the back-tick character ` - so you have sa`ngIta, sa`ntOsha. The advantage here is that the explicit anuswara specification is still phonetically quite fair - sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M, as in ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.

(note: we could choose a character other than the backtick - the only constraint being that it should not be so "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix - san`gIta as opposed to sa`ngIta - which may be a better representation of the internal structure of the word?)
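
To make the three explicit specifiers concrete, here is a small Python sketch of how an input word might be split into plain text and anuswara markers. The token format and function name are made up for illustration, and it assumes that a capital M never appears in the scheme for anything other than anuswara.

```python
import re

# The three explicit anuswara specifiers proposed above.
SPECIFIER = re.compile(r"(`n|`N|M)")


def tokenize(word):
    """Split a word into ("text", ...) and ("anuswara", ...) tokens."""
    tokens = []
    for piece in SPECIFIER.split(word):
        if not piece:
            continue  # re.split yields empty strings at the edges
        kind = "anuswara" if SPECIFIER.fullmatch(piece) else "text"
        tokens.append((kind, piece))
    return tokens


print(tokenize("sa`ngIta"))   # backtick form for an n-class nasal
print(tokenize("ahaMkAra"))   # M for the m sound
```

If the backtick were used as a suffix instead (san`gIta), only the pattern would need to change to match n` and N`.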

anuswara at end of words for sanskrit
This is tricky in sanskrit as it depends on end of sentence etc. I can detect many cases in logic and apply them, but I don't think in a reliable way - which means a user who cares needs to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras at end of words (regardless of m/M)
(b) never use anuswaras at end of words (regardless of m/M)
(c) use anuswaras only when M is specified explicitly at end of words. This can allow a meticulous user to get the rendition to use anuswaras (at word-endings) in the middle of sentences, and not at the end of a sentence - but it's up to the user.

Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesnt. I think this is ok, the editor is not involved in "teaching how to write sanskrit" :) Besides we were ok with that rule for "my" combinations in kannada and telugu. I dont know why I forgot that :)

Rules for specifying Anuswara
So based on this, here are some concise rules I can think of:
(a) tamizh krithis: no need to specify anuswara ever, as it doesn't make sense for the language. When this gets transliterated to kannada/telugu, anuswara would be used in the middle of words for #n[kg], ~n[cj], n[td], N[TD], m[pb], and also when m is at the end of a word. When a tamizh krithi gets transliterated to sanskrit/malayalam, sanskrit-based words may not appear ideally, as they won't have anuswara. This may be ok since, while the word is sanskrit-based, one could argue it is still in the context of a tamizh krithi and thus non-sanskrit, and sanskrit rules for anuswara may not apply. Of course, a person who does care about the sanskrit rendition can introduce explicit anuswara specifiers even in a tamizh krithi (e.g. sa`ngIta).
(b) kannada, telugu krithis:
(i) Should not explicitly specify anuswara in contexts where it represents ~n, #n, n, N, m (i.e. use panca/pa~nca, Sankara/Sa#nkara, pANDava, amba).
(ii) Should not explicitly specify anuswara at the end of words, as that position always implies anuswara. Use "m" instead.
(iii) Must specify it in contexts which do not automatically imply anuswara - e.g. raMya.
So basically, specify anuswara only when it is not automatically implied. Note again that this means that when the krithi gets transliterated to sanskrit/malayalam, sanskrit words may not appear ideally. Depending on the user's preference, explicit anuswara may then be specified for (i) and (ii), but as `n, `N where it represents #n, ~n, n, N, and as M ONLY when it represents the m sound.
(c) sanskrit krithis: Must specify anuswaras, but only where they occur. Again, specify `n, `N where it represents #n, ~n, n, N, and M ONLY when it represents the m sound. When a sanskrit krithi gets transliterated to kannada/telugu, it *may* force anuswaras in places where they normally are not there? But I am not sure.
(d) malayalam krithis: Must specify anuswaras, but only where they occur. I think anuswara would figure, and hence need to be specified, only in cases where it represents the "m" sound (like raMya)? If so, the editor may ignore the anuswara specifier in places where it represents the #n, ~n, n and N sounds (and use the actual characters) - so sa#ngIta/sangIta/sa`ngIta would all be rendered as sa#ngIta.

Thanks
Arun
Last edited by arunk on 07 Feb 2007, 22:24, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#109

Post by arunk » 08 Feb 2007, 02:24

so nobody gives a hoot :)??

Your silence will be conveniently interpreted as rousing approval ;)!

I will implement these and may be when you see it in action, you may be forthcoming in your approval/disapproval!

Arun

Suji Ram
Posts: 1529
Joined: 09 Feb 2006, 00:04
x 1

#110

Post by Suji Ram » 08 Feb 2007, 02:26

arunk wrote:vasya,

this is doable now itself. All you need to do is get a pdf print driver which allows you to save what you would normally send to a printer as a PDF file (e.g google for pdf995). With this then from the Printable View, you just choose Print options on your browser, and instead of sending to your printer, choose the pdf printer.

Arun
Arun
I downloaded the free version and tried. But all I can get is a pdf file without my work. ??
The way I am doing it is -right click on printable view,print target, and choose pdf995 and hit Ok. It asks for file name to save as pdf. A screen appears asking me to upgrade or continue with sponsor page..... The outcome is a pdf file of the sponsor page.
Help Please

ramakriya
Posts: 1833
Joined: 04 Feb 2010, 02:05

#111

Post by ramakriya » 08 Feb 2007, 03:15

arunk

have been tied up all day .. Hope to completely read your post and send my feedback by the end of the day..

-Ramakriya

ramakriya
Posts: 1833
Joined: 04 Feb 2010, 02:05

#112

Post by ramakriya » 08 Feb 2007, 03:18

Suji Ram wrote:Arun
I downloaded the free version and tried. But all I can get is a pdf file without my work. ??
The way I am doing it is -right click on printable view,print target, and choose pdf995 and hit Ok. It asks for file name to save as pdf. A screen appears asking me to upgrade or continue with sponsor page..... The outcome is a pdf file of the sponsor page.
Help Please
Try using primopdf or pdfcreator; I have had better results with these two. The former has some problems when converting word documents with certain formatting. But should not be a problem for normal use. I have not seen any issues with pdfcreator.

www.primopdf.com

http://sourceforge.net/projects/pdfcreator/

-Ramakriya

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#113

Post by arunk » 08 Feb 2007, 03:35

pdfcreator works fine too (although I think the free version puts something in the footer). It's got a slick interface.

pdf995 is what I use. It's not the greatest interface, and it does bring up the browser to show an innocuous ad for themselves - it is NOT adware. It's a small price to pay for something free which doesn't put stuff in the footer. (But if there are other, better free tools which don't put stuff in the footer, I say ditch this one.)

suji - i dont know why you got that. I have used it many times and have not seen the problem you are seeing. Perhaps you let it open the (sponsor-ad) page and THEN clicked ok on the dialog where it asks for file?

Arun
Last edited by arunk on 08 Feb 2007, 03:35, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#114

Post by arunk » 08 Feb 2007, 03:36

thanks ramakriya.

Suji Ram
Posts: 1529
Joined: 09 Feb 2006, 00:04
x 1

#115

Post by Suji Ram » 08 Feb 2007, 04:40

arunk wrote:suji - i dont know why you got that. I have used it many times and have not seen the problem you are seeing. Perhaps you let it open the (sponsor-ad) page and THEN clicked ok on the dialog where it asks for file?

Arun
Thanks,

got it now ... was doing something dumb. :)

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#116

Post by arunk » 08 Feb 2007, 23:19

ramakriya,

did you get a chance to look at it? If not, I can post an update which has changes adhering to above. I am ready to post it.

BTW, come to think of it, it is not a major change to the scheme. In essence, it involves only two things:

1. instead of always M for anuswara in EVERY context, use `n or `N for anuswara when the underlying sound is not ma. So you use `n when it represents #n, ~n and n, and `N when it represents the N sound.

2. Try to avoid specifying M unless absolutely needed. This is not a new rule.

Thanks
Arun
Last edited by arunk on 08 Feb 2007, 23:20, edited 1 time in total.

ramakriya
Posts: 1833
Joined: 04 Feb 2010, 02:05

#117

Post by ramakriya » 08 Feb 2007, 23:32

Finally, some comments -
arunk wrote:For kannada and telugu, there are contexts which certain combinations ALWAYS use anuswara i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that for making the input easier to read, for the first two cases, the scheme allows you just n instead of #n, ~n, i.e. pankaja, panca is ok. Also, currently, you would simply use M instead of #n/~n/n/m in all these cases.
Correct

arunk wrote:But as I have noted many times, except in the last case where M represents m, this is not recommended as it is not as phonetic, and it can also lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script-specific artifacts wherever possible, and this is definitely one place where they can be avoided for these 2 languages.
That is fine too.
arunk wrote:However, note that for kannada and telugu, there are contexts where certain combinations do NOT ALWAYS use anuswara. Examples are mya, mSa etc. We decided here that the user would need to explicitly specify the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying the "m" sound (and not #n/~n/N - right?)
In these cases, it is not an anusvAra ; It is the vyanjana 'm' that appears in words like ramya, tAmra, Amla etc.

An anuswara is a representation of an anunAsika (5th letter of each varga: #n, ~n, N, n, m), occurring before a letter which is a non-anunAsika vargIya vyanjana (the k, c, T, t, p vargas, leaving out the last letter).

When the letter following an anunAsika is another anunAsika, (like in amnAya, vA#nmaya, amma, haNNu, kenne) or one of the following three avargIya vyanjanas (y r l - as in ramya, tAmra, Amla) then the anunAsika is used as it is in the samyuktAkshara.

(This info may be a repetition of what DRS may have said earlier).

When an anunAsika (normally m) is followed by v, S, Sh, s, h, L, it will be represented by anusvAra.
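
The three cases just listed can be written down as a small decision function. The following Python sketch is only a restatement of those rules as given in this post; the function name and return labels are invented for the example, and it deliberately ignores exceptions such as the saMyukta-type words raised later in the thread.

```python
# Consonant groups in the scheme's spelling, as used in the rules above.
NASAL_PREFIXES = ("#n", "~n", "N", "n", "m")   # anunAsikas
VARGA_STOPS = set("kgcjTDtdpb")                # first letters of k c T t p varga stops
SEMIVOWELS = set("yrl")                        # y r l
OTHER_AVARGIYA = set("vSshL")                  # v, S/Sh, s, h, L


def nasal_rendering(following):
    """Say how a nasal is written, given the text that follows it."""
    if following.startswith(NASAL_PREFIXES):
        return "conjunct"        # e.g. amma, haNNu: nasal kept as it is
    first = following[:1]
    if first in VARGA_STOPS:
        return "anuswara"        # e.g. the n in sangIta
    if first in SEMIVOWELS:
        return "conjunct"        # e.g. ramya, tAmra, Amla
    if first in OTHER_AVARGIYA:
        return "anuswara"        # e.g. before S, s, h
    return "unspecified"         # vowel or end of word: not covered by these rules


print(nasal_rendering("gIta"))   # after the n of sangIta
print(nasal_rendering("ya"))     # after the m of ramya
```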
arunk wrote:I am thinking sanskrit should also follow the same rule, but obviously in more contexts because of the wider use of anuswara in the language. IN THE MIDDLE of a word (for end of words, see below), whenever anuswara is required, it needs to be explicitly specified - else no anuswara would be rendered. Of course, as per the current scheme, this would mean saMgIta, saMtOsha etc., which again is not phonetic and can mislead pronunciation for some people.

i think malayalam can follow same rule (but contexts where anuswara figures would be the least of the 4 languages).
samskrita and malayALam experts should pitch in. All these discussions have made my head dizzy and now I am doubting myself when to use the bindu in samskrita :D


A more phonetic explicit anuswara specifier for use inside words
arunk wrote:But what if we adopt a different more phonetically fair specifier for anuswara in places it represents #n, ~n, n and N sound? For example, one that uses n/N but with a prefix. I propose the back-tick character ` - so you have sa`ngIta, sa`ntOsha. The advantage here is the explicit anuswara specification is still phonetically quite fair - sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M as ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.

(note: we could choose a character other than the backtick - the only constraint being that it should not be so "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix - san`gIta as opposed to sa`ngIta - which may be a better representation of the internal structure of the word?)
I agree that sa`ngIta is better representation than saMgIta even though I have got used to the baraha's standard saMgIta :)
arunk wrote:anuswara at end of words for sanskrit
This is tricky in sanskrit as it depends on end of sentence etc. I can detect many cases in logic and apply but i dont think in a reliable way - which means a user that cares need to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras end of words (regardless of m/M)
(b) never use anuswaras at end of words (regardless of m/M)
(c) use anuswaras only when M is specified explicitly at end of words. This can allow a meticulous user to get the rendition to use anuswaras (at word-endings) in middle of sentences, and not at end of sentence - but its up to the user.
Time to dust off any samskrita grammar books I have, or find one to borrow :/
arunk wrote:Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesnt. I think this is ok, the editor is not involved in "teaching how to write sanskrit" :) Besides we were ok with that rule for "my" combinations in kannada and telugu. I dont know why I forgot that :)
There you go ..
arunk wrote:(b) kannada,telugu krithis:

(iii) Must specify in contexts which do not automatically imply anuswara - e.g. raMya.
This, again, is not an anusvAra, but vyanjana - So the correct representation is ramya; and hey - that is your current implementation too :)

All this talk about anusvAras reminds me of something funny that happened at the kids' kannada class here. One of the beginner kids told his mother that he could write amma (mother). The mother was surprised, because in the class the teacher had only covered the vowels and had not yet taught any of the vyanjanas, let alone samyuktAksharas. When asked, the kid wrote ಅಂಅ, to the surprise of both the teacher and the mother :) which sounds exactly like ಅಮ್ಮ :cool:

-Ramakriya
Last edited by ramakriya on 08 Feb 2007, 23:43, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#118

Post by arunk » 08 Feb 2007, 23:38

So no occurrences of M in kannada EVER when preceding the ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but I think I must have confused it with sanskrit rules.

This does make it easier - no need to specify anuswara in the script for kannada and telugu, since the places it figures are places where there is no ambiguity (it always figures in those contexts).

Arun
Last edited by arunk on 08 Feb 2007, 23:39, edited 1 time in total.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01
Location: Maidstone, UK

#119

Post by drshrikaanth » 08 Feb 2007, 23:43

arunk wrote:So no occurences of M in kannada EVER when preceding ya,sa,Sa, varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have been confused it with sanskrit rules.
Your memory serves you right. There are exceptions here, as I had mentioned earlier. Sometimes anuswAra does occur before y, r & l, e.g. saMyukta, saMyama, saMrakShaNe, saMlApa, saMyOjane etc.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#120

Post by arunk » 08 Feb 2007, 23:45

Yes drs. I was about to post a link to your post from long ago :)

Anyway here it goes: http://www.rasikas.org/viewtopic.php?pid=27669#p27669 (post #115)

Arun
Last edited by arunk on 08 Feb 2007, 23:45, edited 1 time in total.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01
Location: Maidstone, UK

#121

Post by drshrikaanth » 08 Feb 2007, 23:45

Arun
I suggest you cut and paste these bits of info/rules on MSword/Notepad as and when they come up. Then you dont have to rely on memory or others will not have to repeat what they said earlier. Well-meaning comment. Not having a go at you at all

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#122

Post by arunk » 08 Feb 2007, 23:46

yep thanks. I should have done this before :) but was lazy and I thought i could use the search facility on the forum. But separate notes is better indeed

Arun

ramakriya
Posts: 1833
Joined: 04 Feb 2010, 02:05

#123

Post by ramakriya » 08 Feb 2007, 23:50

arunk wrote:So no occurences of M in kannada EVER when preceding ya,sa,Sa, varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have been confused it with sanskrit rules.

Arun
Not so fast :( I made an error in making a blanket statement - for example, there are words like samyukta, samyOga etc. which are written with anusvAra. This may be influenced by how these words are written in samskR.ta as well. Let me check with a samskR.ta expert (who is also a kannaDa expert) I know of. Better still, if I can make him a member of this forum, he can contribute to the thread :)

-Ramakriya
Last edited by ramakriya on 08 Feb 2007, 23:57, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#124

Post by arunk » 09 Feb 2007, 01:09

ramakriya,

it shouldn't matter. For all cases where usage of anuswara is not unambiguously implied, an explicit specifier needs to be given - this applies to all languages.

I will upload my new version soon

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41
x 2

#125

Post by arunk » 09 Feb 2007, 02:21

Hi folks,

I have uploaded another update. This includes the following enhancements

Enhanced Anuswara support
1. The scheme now accepts `n and `N as alternate explicit specifiers for anuswara, in addition to the already existing M. These should be used instead of M in contexts where anuswara represents a non-ma sound (i.e. `n when it represents #n/~n/n, and `N when it represents N).
2. Explicit anuswara specifiers should be used only when necessary, depending on the language. This means for tamizh usually never, for kannada/telugu only in cases like saMyukta and similar, and for sanskrit only for words that do use anuswara.
3. For sanskrit, there are 4 choices that control the use of anuswara at the end of words (that end with "m"). The default is that anuswara is used for words in the middle of a sentence but not at the end. Note that the editor tries to figure this out automatically. From my limited testing, it seems to do a fair job. But if it misses an anuswara, you can use M to specify it explicitly. The other choices are: no anuswara (whether or not M is used at end of words), always use anuswara (for all words ending in m/M), and anuswara only for words ending in M. So if you have rAgaM tALam sa`ngItam (a hypothetical and not exactly correct example), then
(a) default would treat it like rAgaM tALaM sa`ngItam
(b) "No anuswara at word endings" would treat it like rAgam tALam sa`ngItam
(c) "Always use anuswara at word endings" would treat it like rAgaM tALaM sa`ngItaM
(d) "Use anuswara only for words ending in M" would treat it like rAgaM tALam sa`ngItam
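
The four choices can be mimicked with a short Python sketch that reproduces the example above. It is not the editor's code: the mode names are invented, the whole input is treated as a single sentence (the real editor detects sentence ends itself), and in the default mode the last word is simply left as typed, which is an assumption.

```python
def render_word_endings(sentence, mode="default"):
    """Rewrite the final m/M of each word according to the chosen mode."""
    words = sentence.split()
    out = []
    for i, word in enumerate(words):
        if word[-1] not in "mM":
            out.append(word)             # word does not end in m/M: untouched
            continue
        stem, typed = word[:-1], word[-1]
        is_last = i == len(words) - 1
        if mode == "default":            # (a) anuswara mid-sentence only
            ending = typed if is_last else "M"
        elif mode == "never":            # (b) no anuswara at word endings
            ending = "m"
        elif mode == "always":           # (c) always anuswara at word endings
            ending = "M"
        elif mode == "explicit":         # (d) only where M was typed
            ending = typed
        else:
            raise ValueError(f"unknown mode: {mode}")
        out.append(stem + ending)
    return " ".join(out)


for m in ("default", "never", "always", "explicit"):
    print(m, "->", render_word_endings("rAgaM tALam sa`ngItam", m))
```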

Fix text to convert to scheme button (the new button which has a spanner/hammer).
This allows you to tell the editor to make some conversions so that the input text conforms to the scheme, along with various other changes (e.g. removing unnecessary anuswara specifiers etc.)

My intention is for people to be able to copy/paste text in other "informal" schemes and be able to easily "fix" it to conform to the unified scheme (e.g. vaataapi gaNapatim => vAtApi gaNapatim, and ashaindhaadum mayiloNDRu => asaindAdum mayilonDRu). Please let me know if you find this useful.
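
One small piece of such a "fix" step, turning doubled vowels from informal spellings into the scheme's capital long vowels, could look like the Python sketch below. It covers only the vaataapi case; the vowel mapping (aa, ii, uu, ee, oo to A, I, U, E, O) is an assumption about the scheme, and the consonant clean-ups seen in the second example (sh to s, dh to d) are not attempted.

```python
import re

# Doubled lowercase vowels as commonly typed in informal schemes.
DOUBLED_VOWEL = re.compile(r"aa|ii|uu|ee|oo")


def fix_long_vowels(text):
    """Rewrite doubled vowels as a single capital letter (aa -> A, etc.)."""
    return DOUBLED_VOWEL.sub(lambda m: m.group(0)[0].upper(), text)


print(fix_long_vowels("vaataapi gaNapatim"))
```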

For people who havent seen this before:
The link to the unified transliteration scheme editor is http://arunk.freepgs.com/cmtranslit
The link to the scheme is http://arunk.freepgs.com/cmtranslit/cmt ... cheme.html

Any feedback is most welcome.

Thanks
Arun
Last edited by arunk on 09 Feb 2007, 02:35, edited 1 time in total.
