Single Transliteration Scheme for all CM Languages - Part 2

Languages used in Carnatic Music & Literature
drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

#76

Post by drshrikaanth » 03 Feb 2007, 01:07

See post 127 and the posts around it. The same logic holds here as well. There may have been another discussion about this too; you may be able to find it.

http://rasikas.org/forums/viewtopic.php?pid=27698#p27698

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#77

Post by arunk » 03 Feb 2007, 02:01

It's possible we discussed this, but the case of "M" at the end is what I remembered, and it is implemented (i.e. without anuswara for Sanskrit).

I did the anuswara for panca etc. based on the book I was talking about. But I also vaguely remember seeing other sources, like http://carnatica.net/lyrics/ooth9.pdf, where anuswara is used at the end (!) but not in nca/cha etc. (2nd krithi). There doesn't seem to be any consistency, at least that's what I thought.

Once I put in the Sanskrit logic, I asked people several times to point out any errors so that I could fix the logic after putting it up. I didn't hear a peep. Perhaps they assumed I wasn't listening, or was incapable of listening :)

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#78

Post by arunk » 03 Feb 2007, 02:26

Never mind - I think it is easy to make it an option. The default would be no anuswara in the middle or at the end, but people can change it if they want. The second setting (with anuswara) would handle most of Hindi, except for Urdu-influenced words.

Arun

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

#79

Post by drshrikaanth » 03 Feb 2007, 02:27

arunk wrote: "But I also vaguely remember seeing other sources like http://carnatica.net/lyrics/ooth9.pdf, where anuswara is used at the end (!) but not in nca/cha etc. (2nd krithi). There doesn't seem to be any consistency, at least that's what I thought."
You still have doubts about "m" occurring at the end! :rolleyes:
arunk wrote: "Once I put in the Sanskrit logic, I asked people several times to point out any errors so that I could fix the logic after putting it up. I didn't hear a peep."
Wish I had all the time in the world (and no job) to answer your queries :)

It's the same logic in the middle as well. "Nearly" always show the vyanjanas (consonants) explicitly, even when the conjunct has a nasal consonant as its first half. "Nearly" because there are some exceptions, like samyukta, where "sam" is a prefix to an otherwise independent word (yukta in this case). This means samga will not be saMga. saMsarga, saMyukta, saMtOSha, saMgIta, saMgAna etc., yes, but not saMga, saMkaTa, etc. Note here that tOSha, yukta, gIta and gAna are independent words with a saM prefix, but ga and kaTa are not.

Likewise, words with the "kAra" suffix, like ahaMkAra and jhaMkAra, will feature only the bindu, not the consonant itself. I am not sure whether there are exceptions to this. I think other "suffixes" like cAra will behave similarly. Basically, if the nasal and the following consonant belong to one unit that forms an integral part of the word, use the consonant. If a prefix or suffix has been added, use the anuswAra/bindu.
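A minimal sketch of the prefix/suffix rule above, assuming the word arrives already split into its parts (e.g. ["sam", "yukta"]) and assuming a simplified vowel set for the romanized scheme. The function name `join_with_anuswara` is hypothetical. Finding the split automatically is the hard part, and this sketch does not attempt it.

```python
# Assumed vowel letters of the romanized scheme (simplified; ai/au start with "a").
VOWELS = set("aAiIuUeEoO")


def join_with_anuswara(parts: list[str]) -> str:
    """Join a word that is already split into prefix/base/suffix parts.

    Where a part ends in "m" and the next part begins with a consonant,
    the nasal is written as M (anuswara/bindu). Nasals inside a single
    part are left alone, so they stay explicit consonants.
    """
    out = []
    for i, part in enumerate(parts):
        nxt = parts[i + 1] if i + 1 < len(parts) else ""
        if part.endswith("m") and nxt and nxt[0] not in VOWELS:
            part = part[:-1] + "M"
        out.append(part)
    return "".join(out)


print(join_with_anuswara(["sam", "yukta"]))   # saMyukta
print(join_with_anuswara(["aham", "kAra"]))   # ahaMkAra
print(join_with_anuswara(["sam", "arpaNa"]))  # samarpaNa (a vowel follows)
print(join_with_anuswara(["kamalAmbikA"]))    # kamalAmbikA (one unit)
```

The rule only fires at a part boundary, which matches the "one unit vs. prefix/suffix" distinction. A single unsplit word is always returned unchanged.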

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#80

Post by arunk » 03 Feb 2007, 02:27

ramakriya - did the export to dokuwiki feature help?

Thanks
Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#81

Post by arunk » 03 Feb 2007, 02:45

drshrikaanth wrote: You still have doubts about "m" occurring at the end! :rolleyes:
I guess I do now :). Doesn't that PDF file use anuswara at the end (e.g. santatam aham)? My point is that, whatever the correct rules are, there seem to be variations in practice (owing, I am guessing, to Hindi's popularity) (?)

Please also check my other post in the languages thread, where I respond to the rules you mentioned.

Arun

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

#82

Post by drshrikaanth » 03 Feb 2007, 02:50

Arun
We have discussed the use of m/M at the end at length. We also discussed the reasons for the variations: not necessarily Hindi's popularity, but the influence of spelling in one's mother tongue. If you still have doubts, I am not responsible for them. I don't have doubts in this matter, at least.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#83

Post by arunk » 03 Feb 2007, 02:51

DRS said the following in another thread about when anuswara appears in Sanskrit in the middle of words:
It's the same logic in the middle as well. "Nearly" always show the vyanjanas (consonants) explicitly, even when the conjunct has a nasal consonant as its first half. "Nearly" because there are some exceptions, like samyukta, where "sam" is a prefix to an otherwise independent word (yukta in this case). This means samga will not be saMga. saMsarga, saMyukta, saMtOSha, saMgIta, saMgAna etc., yes, but not saMga, saMkaTa, etc. Note here that tOSha, yukta, gIta and gAna are independent words with a saM prefix, but ga and kaTa are not.

Likewise, words with the "kAra" suffix, like ahaMkAra and jhaMkAra, will feature only the bindu, not the consonant itself. I am not sure whether there are exceptions to this. I think other "suffixes" like cAra will behave similarly. Basically, if the nasal and the following consonant belong to one unit that forms an integral part of the word, use the consonant. If a prefix or suffix has been added, use the anuswAra/bindu.
Unless I am mistaken, things just got a bit more complicated.

What this tells me is that, for my logic, it would be best to generate the anuswara for Sanskrit in the middle of a word only when it is explicitly specified as M, and to let people specify it judiciously (i.e. it would be too difficult for the logic to know what is one unit vs. a prefix/suffix etc.).

But for languages like Kannada and Telugu, the anuswara always appears before k(h)a, g(h)a, c(h)a, j(h)a (and others), right? So M in the middle of a word should be used judiciously even when entering text for other languages: it should be used ONLY where Sanskrit has an anuswara, otherwise the Sanskrit rendition would be screwed up. This is certainly a big wrench, since a person entering Telugu or Kannada (or, even worse, Tamil) may have no idea about these rules in Sanskrit.

This also means that for such words the "phonetically better variant in English" would be wrong and cannot be used (i.e. never sangIta, always saMgIta).
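The consequence described above can be shown with a small sketch. It assumes a simplified set of nasal letters (m, n, N) and stop letters for the romanized scheme, and a hypothetical function `render_nasals`. For Sanskrit, only an M the user typed becomes an anuswara. For Kannada/Telugu, a nasal before a stop is also written as M.

```python
# Simplified, assumed letter classes for the romanized scheme.
NASALS = set("mnN")          # m, n and retroflex N (as in caraNa)
STOPS = set("kgcjTDtdpb")    # first letters of k(h)a ... b(h)a


def render_nasals(word: str, target: str) -> str:
    """Decide where M (anuswara) appears for a given target script."""
    if target not in ("kannada", "telugu"):
        # e.g. "sanskrit": only an explicitly typed M is an anuswara
        return word
    chars = list(word)
    for i in range(len(chars) - 1):
        if chars[i] in NASALS and chars[i + 1] in STOPS:
            chars[i] = "M"   # nasal before a stop -> anuswara
    return "".join(chars)


print(render_nasals("sangIta", "kannada"))   # saMgIta
print(render_nasals("sangIta", "sanskrit"))  # sangIta (no M typed, so none shown)
print(render_nasals("saMgIta", "sanskrit"))  # saMgIta
```

The second line shows the problem. A Kannada user who types "sangIta" sees the right Kannada output, but the Sanskrit rendition silently loses the anuswara. The only fix is to type saMgIta.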

This lets me ask a question I have had ever since I was exposed to it: what is the purpose of the anuswara? It seems to represent some other sound for which a character already exists in the script. Why not use that character itself?

(or maybe I should settle for a "less than perfect" Sanskrit rendition, i.e. always use anuswara or never use anuswara)

Arun
Last edited by arunk on 03 Feb 2007, 03:05, edited 1 time in total.

ramakriya
Posts: 1833
Joined: 04 Feb 2010, 02:05

#84

Post by ramakriya » 03 Feb 2007, 02:51

arun - I have not experimented with the export feature yet.

I found a problem with the variables, or I may not have understood how to use them :(

1. If I type caraNam, then the kannaDa transliteration should show it as caraNa, right? But that is not happening. It shows up as caraNam, with a bindu at the end.

2. Sometimes the keyword is not recognized as a variable at all, even though the spelling is correct.

-Ramakriya

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#85

Post by arunk » 03 Feb 2007, 02:54

variables are experimental.

#1. It shows up as caraNam because I seem to have (incorrectly) defined it that way. I need some help getting this right (for all of them). I know drs gave the Kannada equivalents; I need to go and incorporate them.

#2: Even when you click on a word and hit the "$" button? If so, can you give an example? If it happens with "convert all" (i.e. the 3 arrows pointing to the $ button), then it is on purpose: I didn't want to mistakenly convert words in the sAhitya portion, so I am extra careful and only look for certain patterns.

Arun
Last edited by arunk on 03 Feb 2007, 03:00, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#86

Post by arunk » 03 Feb 2007, 02:59

Did I say you were wrong, or that I was somehow right, so as to put doubts in your mind?

Jeez!

Arun

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

#87

Post by drshrikaanth » 03 Feb 2007, 03:20

Did I say you did that to me! Jeez!;)

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#88

Post by arunk » 03 Feb 2007, 03:28

Unless my assumptions/conclusions are wrong yet again, I am thinking of doing the following:
1. Change the label "Sanskrit" to "Devanagari" where it appears in the editor. This is mainly to indicate that the generated script may not be considered proper Sanskrit, as not all the written rules are followed.
2. Have 2 anuswara options for Devanagari:
(i) Always generate (so more like Hindi)
(ii) Never generate (closer to Sanskrit, but not that close: words like sangIta would be all messed up).

I don't know if this salvages the situation enough. Also, I don't know if option 2(ii) is that useful, as it would be a mixed bag (neither Hindi-like nor Sanskrit-like).

Suggestions?


Arun
Last edited by arunk on 03 Feb 2007, 03:30, edited 1 time in total.

jayaram
Posts: 1306
Joined: 30 Jun 2006, 03:08

#89

Post by jayaram » 03 Feb 2007, 03:34

Arun - the bindu at the end is what I know from my Sanskrit classes in school and college. Usage of M seems to be a variation, sometimes for aesthetics. If you read through kriti No. 2 (vAnchasi yadi) in the PDF file, you will find both m and M used for the word kuSalam/kuSalaM. To make it simple for yourself, I would suggest you go with the bindu version.

Btw, the way they have written rAgaM and tAlaM is jarring, at least to my eyes!

Also, 'ambika' (as in kamalAmbika) is not written with a bindu; the half-consonant is used. At least, that is how I have read and written it all these years.

jayaram
Posts: 1306
Joined: 30 Jun 2006, 03:08

#90

Post by jayaram » 03 Feb 2007, 03:37

Also note that 'vAnchasi' is written without a bindu, with the half-consonant instead.

jayaram
Posts: 1306
Joined: 30 Jun 2006, 03:08

#91

Post by jayaram » 03 Feb 2007, 03:46

And DRS is correct in saying that one's mother tongue has an influence on how these are written in Sanskrit. Coming from a Kerala background, I was taught to use the half-consonants instead of the bindu in most cases (within words). Malayalam follows similar rules.

The Namboodiris of Kerala are reputed to have the 'most authentic' knowledge of Sanskrit, so obviously I had assumed we were taught the most accurate version! :)

(finally, perhaps we should move this language discussion to where it belongs - arun's thread!
Let OP-ji rest in peace!)
Last edited by jayaram on 03 Feb 2007, 03:54, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#92

Post by arunk » 03 Feb 2007, 05:10

I found this link, which talks about anuswaras in the context of sandhi rules:
http://www.sanskrit-sanscrito.com.ar/en ... rules.html. It explains when "m" at the end of a word becomes an anuswara and when it does not: basically, it becomes one when the next word begins with a consonant.

This seems to be followed here: http://sanskrit.safire.com/pdf/DURGA700.pdf, where "m" at the end of a word is sometimes rendered as a consonant and sometimes as an anuswara. You see it as a consonant at the end of a "line"/"sentence" (there is no word to follow, and hence no consonant to follow), i.e. before a | or ||: for example, in the title itself, the first line on the right side, and several other places. You see the bindu used within a "line"/"sentence". Cases of bindu inside words are much, much rarer (but there is one on page 6, "saMhati..."(?), and one on page 12, saMyugE (?)), which is of course what drs said.

Of course, I don't know how official/authentic these are, but at least I wanted to see some reasoning behind the "mixture of bindu and no-bindu cases", and I see it now.

Now, the rule for the end of a word within a sentence, followed by a consonant, is something that can be programmed.
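A minimal sketch of that end-of-word rule, assuming words are separated by spaces, dandas (| and ||) appear as separate tokens, and a simplified vowel set. The function name and the sample line are illustrative, not taken from the PDF.

```python
VOWELS = set("aAiIuUeEoO")   # assumed vowel letters (simplified)
DANDAS = {"|", "||"}         # assumed to be separate tokens


def apply_final_m_rule(line: str) -> str:
    """Turn word-final m into M (anuswara) only when the next word in the
    same line starts with a consonant. Before a vowel, a danda, or the end
    of the line, the m is kept as a consonant."""
    words = line.split()
    out = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if word.endswith("m") and nxt and nxt not in DANDAS and nxt[0] not in VOWELS:
            word = word[:-1] + "M"
        out.append(word)
    return " ".join(out)


print(apply_final_m_rule("santatam aham bhajE |"))  # santatam ahaM bhajE |
print(apply_final_m_rule("kuSalam |"))              # kuSalam |
```

Only the next word's first letter is inspected, which is why this part is easy to program. A mid-word bindu cannot be decided this way.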

The trouble is that whether a bindu occurs in the middle of a word depends on how the word is interpreted (prefix, suffix, etc.), and that is not possible to program without an elaborate setup involving dictionary lookups and such.

So I think we are still down to one of these:

(a) use it like Telugu, Kannada and Hindi (i.e. always use it);
(b) never use it;
(c) use it only at the end of a word (i.e. following the end-of-word rule above), but never in the middle.

Of course, none of them is fully correct for Sanskrit, but I am guessing/hoping that:

(a) would be OK for people to read (as they may apply their native-language rules).
(c) looks closer to Sanskrit and ma....y be passable, although it will definitely mess up the words drs mentioned.
If (c) is done at all, (b) is useless.

Can people please chime in and advise me on whether (a) is OK, and whether I should even bother with (c)?

Thanks
Arun
Last edited by arunk on 03 Feb 2007, 05:11, edited 1 time in total.

jayaram
Posts: 1306
Joined: 30 Jun 2006, 03:08

#93

Post by jayaram » 03 Feb 2007, 15:25

Arun - I get the feeling if you go with option (a) for Devanagari, we may do the same for Malayalam! And it does look a bit weird if this option is used in Malayalam, at least for old-timers like myself.

My own take on this:
1. It's OK to use the bindu across the board for word endings. As I said earlier, the M ending is for aesthetics; I don't believe there's a rigid rule for this.
2. Use the half-consonant within a word, following the appropriate rules. This is tough to implement, I agree, but it can at least be done for certain frequently occurring words. Perhaps you could look through Dikshitar kritis for words such as 'ambika': http://www.rogepost.com/n/4405894335

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#94

Post by arunk » 03 Feb 2007, 20:08

Yes jayaram, it would be less than ideal for Malayalam, and that is not good either.

I will try the more difficult approach. For Sanskrit (and Malayalam too?), as drs indicated, the cases that DON'T employ a bindu in the middle of a word outnumber those that do. So I could build up a database of known words that do employ a bindu and use smart matching: by default no bindu, except for these known words. This will handle amba etc. correctly by default. It will also handle sangIta and santOsha (assuming they are in the database).

On top of that, it may be possible to add a feature to the editor (not the scheme) that forces the use of the bindu in Sanskrit/Malayalam for a specific word. Combining this with the database of known words, we may be able to get things right. However, unless the database of known words is good (i.e. it covers almost all the common cases that occur in kritis), it would be a pain for users to have to spoon-feed the editor.

I will look into this.

Thanks
Arun
Last edited by arunk on 03 Feb 2007, 20:10, edited 1 time in total.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

#95

Post by drshrikaanth » 03 Feb 2007, 20:22

arunk wrote:For sanskrit (and malayalam too?), as drs indicated, the # of cases which DONT employ bindu in the middle of the word outnumber the cases where it does. So I could build up a database of known words that do employ bindu and use smart matching.
Forget about doing this, Arun, as the list of words will stretch to several thousand! I just checked. The way out would be to link up with a pre-existing online dictionary and match against its spelling.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#96

Post by arunk » 03 Feb 2007, 21:54

I was afraid of that. It may be possible to interface with a dictionary (or build our own, which would be easier to interface with). More work, of course :) but not Herculean.

Arun

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#97

Post by arunk » 03 Feb 2007, 22:18

I did multiple searches on the Cologne Sanskrit dictionary for occurrences of aM, eM, iM, uM, oM (I think their transliteration scheme uses M only in the right places; please confirm). The search is case-insensitive, so it matches stuff we don't need, and some filtering was needed afterwards.

I saved the (massive) results on my local disk and did some (programmatic) filtering. Assuming I did it right, there are 3076 words in that dictionary that use M (in those contexts). The cumulative size of all these words is about 34K bytes. That's actually not bad, so loading the list into memory with the editor is not ruled out.
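The post-filtering step can be sketched as a case-sensitive re-check of the case-insensitive results. This assumes the saved results contain one Harvard-Kyoto headword per line. In HK, uppercase M is only the anuswara, so a vowel followed by uppercase M is what we want. The pattern and function name are illustrative, and vocalic R is ignored for simplicity.

```python
import re

# HK e and o have no separate long forms; ai/au end in i/u, so they are covered.
ANUSWARA_AFTER_VOWEL = re.compile(r"[aAiIuUeo]M")


def filter_anuswara_words(lines: list[str]) -> tuple[list[str], int]:
    """Keep headwords with a real (uppercase) M after a vowel and
    return them with their total size in bytes."""
    words = [line.strip() for line in lines]
    kept = [w for w in words if ANUSWARA_AFTER_VOWEL.search(w)]
    total_bytes = sum(len(w.encode("utf-8")) for w in kept)
    return kept, total_bytes


print(filter_anuswara_words(["saMgIta\n", "samAsa\n", "Ananda\n", "aMza\n"]))
# (['saMgIta', 'aMza'], 11)
```

Summing the byte lengths is the same calculation behind the "about 34K" figure, assuming the words are stored without separators.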

Of course, the Cologne Sanskrit dictionary uses a different scheme, so some more "translation" into our scheme is needed (which can increase the number of characters). This is no big deal.

Drs - please let me know if it is OK to send you the results, so you can check whether the list of matched words makes sense (i.e. whether I got a good representative list).

Arun
Last edited by arunk on 03 Feb 2007, 22:19, edited 1 time in total.

drshrikaanth
Posts: 4066
Joined: 26 Mar 2005, 17:01

#98

Post by drshrikaanth » 04 Feb 2007, 00:24

arunk wrote:i did multiple searches on the cologne-sanskrit dictionary for occurence of aM, eM, iM, uM, oM (i think their transl. scheme use M only in right places - pl. confirm). The search is case-insensitive so it matches stuff we dont need. So some filtering was needed afterwards.
I searched on Cologne too, but used a different combination. Combinations like aM and eM will also bring up what we don't need since, as you rightly pointed out, the search is case-insensitive. Instead, use these combinations: Mk, Mkh, Mg, Mgh etc. You can't go wrong here :) You will only have problems with the labials (p, ph, b, bh, m), plus some overlap with (y, r, l). Otherwise we are fine.
arunk wrote: I saved the (massive) results on my local disk. Did some (programmatic) filtering and assuming I did it right, there are 3076 words in that dictionary which use M (in those contexts).
There will easily be more than 10,000 words; closer to 20K, I estimate.
arunk wrote: Of course the scheme that cologne-sanskrit dictionary uses is different and so some more "translation" is needed to our scheme (which can increase the # of chars). This is no big deal.
The transliteration scheme used there is the H-K (Harvard-Kyoto) convention. In an earlier post, I think in this thread itself, I gave a step-by-step procedure for converting H-K to our scheme. Check that.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#99

Post by arunk » 04 Feb 2007, 00:39

drshrikaanth wrote: I searched on Cologne too but used a different combination. Your combinations like aM, eM will come up with what we don't need as well, as you have rightly pointed out that it is case-insensitive. But use these combinations, Mk, Mkh, Mg, Mgh etc. You can't go wrong here :)
Filtering out the non-M entries was no big deal. There are several utilities on Unix-like systems (e.g. my Mac) that make this very easy.
drshrikaanth wrote: There will easily be more than 10,000 words. More towards 20K, I estimate.
Then I guess I did something wrong in my steps. The total number of words matched (i.e. case-insensitively) was 51618, so it did match a lot. It still doesn't add up: either the dictionary does not include most of them, or my search criteria were wrong (it is quite difficult to screw up the filter step, which is a very simple command), or I didn't save all the results.

Arun
Last edited by arunk on 04 Feb 2007, 00:40, edited 1 time in total.

arunk
Posts: 3424
Joined: 07 Feb 2010, 21:41

#100

Post by arunk » 04 Feb 2007, 05:42

After exchanging some emails with drs, we solved the "mystery" of why my searches weren't getting all the words. Anyway, the entire list is about 7400 words, which I think is still manageable (but I need to confirm).

Arun
Last edited by arunk on 04 Feb 2007, 21:03, edited 1 time in total.