Single Transliteration Scheme for all CM Languages - Part 2

drshrikaanth · Post by **drshrikaanth** » 03 Feb 2007, 01:07

See post 127 and around it. The same logic holds here as well. There may have been another discussion about this as well. You may be able to find it.

http://rasikas.org/forums/viewtopic.php?pid=27698#p27698

arunk · Post by **arunk** » 03 Feb 2007, 02:01

its possible we discussed this but the case of "M" at end is what I remembered and it is implemented (i.e. without anuswara for sanskrit).

I did the anuswara for panca etc. based on that book i was talking about. But i also vaguely remember seeing other sources like http://carnatica.net/lyrics/ooth9.pdf, where anuswara is used at the end (!) but not in nca/cha etc. (2nd krithi). There doesn't seem to be any consistency - at least that's what I thought.

Once I put sanskrit logic, I had asked people several times to point out any errors so that i can fix the logic after I put it up. I didnt hear a peep. Perhaps they assumed i wasnt listening or incapable of listening

Arun

arunk · Post by **arunk** » 03 Feb 2007, 02:26

never mind - i think it is easy to make it an option. The default would be no anuswara in the middle or at end, but people can change it if they want. The second would handle most of hindi except for the urdu influenced words.

Arun

drshrikaanth · Post by **drshrikaanth** » 03 Feb 2007, 02:27

arunk wrote:But i also vaguely remember seeing other sources like http://carnatica.net/lyrics/ooth9.pdf, where anuswara is used at the end (!) but not in nca/cha etc. (2nd krithi). There doesnt seem any consistency - atleast thats what I thought.

You still have doubts about "m" occurring at the end!:rolleyes:

Once I put sanskrit logic, I had asked people several times to point out any errors so that i can fix the logic after I put it up. I didnt hear a peep.

Wish I had all the time in the world (and no job) to answer your queries.

It's the same logic in the middle as well. "Nearly" always show the vyanjanas explicitly even when the conjunct has a nasal consonant as the 1st half. "Nearly" because there are some exceptions like samyukta, where "sam" is a prefix to an otherwise independent word (yukta in this case). This means samga will not be saMga. saMsarga, saMyukta, saMtOSha, saMgIta, saMgAna etc. yes, but not saMga, saMkaTa, etc. Note here that tOSha, yukta, gIta and gAna are independent words with a saM prefix, but not ga, kaTa.

Likewise words with "kAra" suffix like ahaMkAra, jhaMkAra will feature the bindu only, not the consonant itself. I am not sure if there are exceptions to this. My thinking tells me other "suffixes" like cAra will also behave similarly. Basically, if they are one unit and form an integral part of the word to make sense, use consonant. If added as suffix or prefix, use anuswAra/bindu.
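A minimal sketch of the prefix/suffix rule described in this post, in Python. The word lists contain only stems from the examples given here, the name `takes_bindu` is hypothetical, and the input is assumed to be written with the nasal as M. A real implementation would need a dictionary to know which stems are independent words.

```python
# Stems taken from the saM- examples in the post; a real list would come
# from a dictionary (assumption for illustration only).
INDEPENDENT_STEMS = {"sarga", "yukta", "tOSha", "gIta", "gAna"}
# Suffixes that attach after the nasal; only kAra is named with examples.
BINDU_SUFFIXES = ("kAra",)


def takes_bindu(word: str) -> bool:
    """Return True if the nasal in `word` would be written as a bindu.

    `word` is given with the nasal written as M, e.g. "saMgIta".
    """
    # Case 1: saM prefix attached to an otherwise independent word.
    if word.startswith("saM") and word[3:] in INDEPENDENT_STEMS:
        return True
    # Case 2: a suffix such as kAra attached right after the nasal.
    for suffix in BINDU_SUFFIXES:
        if word.endswith("M" + suffix):
            return True
    # Otherwise the word is one unit and the nasal consonant is shown.
    return False


for w in ["saMgIta", "saMyukta", "saMga", "saMkaTa", "ahaMkAra"]:
    print(w, takes_bindu(w))
```

The sketch only works because the stems are listed by hand, which is exactly the part that is hard to automate.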

arunk · Post by **arunk** » 03 Feb 2007, 02:27

ramakriya - did the export to dokuwiki feature help?

Thanks
Arun

arunk · Post by **arunk** » 03 Feb 2007, 02:45

drshrikaanth wrote:You still have doubts about "m" occurring at the end!:rolleyes:

i guess i do now. Doesn't that pdf file use anuswara at the end (e.g. santatam aham)? My point is that whatever the correct rules are, in practice (i am guessing owing to hindi's popularity) there are variations (?)

Please also check my other post in languages thread in response to rules you mention

Arun

drshrikaanth · Post by **drshrikaanth** » 03 Feb 2007, 02:50

Arun
We have discussed at length the use of m/M at the end. We also discussed the reasons for variations - not necessarily hindi's popularity but the influence of spelling in one's mother tongue. If you still have doubts, I am not responsible for it. I don't have doubts in this matter at least.

arunk · Post by **arunk** » 03 Feb 2007, 02:51

DRS said the following in another thread regarding rules as to when anuswara appears in sanskrit in the middle of words:

It's the same logic in the middle as well. "Nearly" always show the vyanjanas explicitly even when the conjunct has a nasal consonant as the 1st half. "Nearly" because there are some exceptions like samyukta, where "sam" is a prefix to an otherwise independent word (yukta in this case). This means samga will not be saMga. saMsarga, saMyukta, saMtOSha, saMgIta, saMgAna etc. yes, but not saMga, saMkaTa, etc. Note here that tOSha, yukta, gIta and gAna are independent words with a saM prefix, but not ga, kaTa.

Likewise words with "kAra" suffix like ahaMkAra, jhaMkAra will feature the bindu only, not the consonant itself. I am not sure if there are exceptions to this. My thinking tells me other "suffixes" like cAra will also behave similarly. Basically, if they are one unit and form an integral part of the word to make sense, use consonant. If added as suffix or prefix, use anuswAra/bindu.

Unless I am mistaken, things got a bit complicated now.

What this tells me is that for my logic, it would be best if I force anuswara for sanskrit, only in the middle and only if explicitly specified as M and let people specify it judiciously (i.e. it would be too difficult for the logic to know which is one unit vs. suffix etc)..

But for languages like kannada, telugu when preceding k(h)a, g(h)a, c(h)a, j(h)a (and others), the anuswara always figures right? So this would mean that specifying M in the middle for stuff should be used judiciously even when entering for other languages - should be used ONLY if it is an anuswara in sanskrit, otherwise sanskrit rendition would be screwed up. This is certainly a big wrench since a person entering telugu or kannada, and even worse tamil may have no idea about these rules in sanskrit.

This also means that for such words "phonetically better variant in english" would be wrong and cannot be used (i.e. never sangIta, always saMgIta)

This allows me to ask a question which i have had ever since i was exposed to it: What is the purpose behind the anuswara? It seems it represents some other sound for which a character does exist in the script? Why then not use the character itself?

(or may be i should retire to a "less than perfect" sanskrit rendition - i.e. always use anuswara or never use anuswara)

Arun

ramakriya · Post by **ramakriya** » 03 Feb 2007, 02:51

arun - I have not experimented with the export feature yet.

I found one problem with the variables. Or I may not have understood how to use it

1. If I type as caraNam -then the kannaDa transliteration should show it as caraNa. Right? But that is not happening. It does show up as caraNam, with a bindu at the end

2. The key word is not recognized as a variable at all sometimes - even though the spelling is correct.

-Ramakriya

arunk · Post by **arunk** » 03 Feb 2007, 02:54

variables are experimental.

#1. It shows up as caraNam because I seem to have (incorrectly) defined it as such. I need some help in knowing this (for all). I know drs gave the kannada equivalents, i need to go and incorporate them.

#2: Even when you click on a word and hit the "$" button? If so, can you give an example? If it is on "convert all" (i.e. 3 arrows pointing to the $ button), then it is on purpose. I didn't want to mistakenly convert words in the sAhitya portion and thus am extra careful in looking for certain patterns.

Arun

arunk · Post by **arunk** » 03 Feb 2007, 02:59

did i say you are wrong or that i was somehow right so as to try to put doubts in your mind?

Jeez!

Arun

drshrikaanth · Post by **drshrikaanth** » 03 Feb 2007, 03:20

Did I say you did that to me! Jeez!;)

arunk · Post by **arunk** » 03 Feb 2007, 03:28

Unless my assumptions/conclusions are yet again wrong, i am thinking of doing the following
1. Change the word "Sanskrit" to "Devanagari" as it appears on the editor. This is mainly to indicate that the generated script may not be considered proper Sanskrit, as not all the written rules are followed
2. Have 2 anuswara options for devanagari:
(i) Always generate (so more like Hindi)
(ii) Never generate (closer to Sanskrit but not that close => words like sangIta would be all messed up).

I dont know if this salvages the situation enough. Also I dont know if option 2(ii) is that useful as it would be a mixed bag (neither hindi like nor sanskrit like)

Suggestions?

Arun

jayaram · Post by **jayaram** » 03 Feb 2007, 03:34

Arun - the bindu at the end is how I know, based on my Sanskrit classes in school and college. Usage of M seems to be a variation, sometimes for aesthetics. If you read thru No.2 kriti (vAnchasi yadi) in the pdf file, you will find occurrence of both m and M for the word kuSalam/kuSalaM. To make it simple for yourself, I would suggest you go with the bindu version.

Btw, the way they have written rAgaM and tAlaM is jarring, at least to my eyes!

Also, 'ambika' (as in kamalAmbika) is not written with bindu, the half-consonant is used. At least this is the way I have read and written all these years.

jayaram · Post by **jayaram** » 03 Feb 2007, 03:37

Also you will note 'vAnchasi' is written without bindu, but with the half-consonant.

jayaram · Post by **jayaram** » 03 Feb 2007, 03:46

And DRS is correct in saying that one's mother tongue has an influence on how these are written in Sanskrit. Coming from a Kerala background, I was taught to use the half-consonants instead of the bindu in most cases (within words). Malayalam follows similar rules.

The Namboodiris of Kerala are reputed to have the 'most authentic' knowledge of Sanskrit, so obviously I had assumed we were taught the most accurate version!

(finally, perhaps we should move this language discussion to where it belongs - arun's thread!
Let OP-ji rest in peace!)

arunk · Post by **arunk** » 03 Feb 2007, 05:10

I found this link which talks about anuswaras in context of sandhi rules:
http://www.sanskrit-sanscrito.com.ar/en ... rules.html. It talks about when "m" at the end of a word becomes anuswara and when it does not. Basically it becomes anuswara if it is followed by a word that begins with a consonant.

This seems to be followed here: http://sanskrit.safire.com/pdf/DURGA700.pdf, where you have cases where "m" at end is rendered as a consonant, and also cases where you have it as anuswara. You see it as a consonant at the end of a "line"/"sentence" (so no word to follow and hence no consonant to follow), i.e. before a | or ||. For example, the title itself, the first line on the right side, and also several other places. You see the bindu used "within a line/sentence". The cases of bindu inside words are much, much rarer (but they are there on page 6 - "saMhati..."(?), also on page 12 - saMyugE (?)), and that is of course what drs said.

Of course I don't know how official/authentic these are, but at least I wanted to see some reasoning behind the "mixture of bindu and no bindu cases" - and I see it now.

Now the rule for end of word within a sentence and followed by a consonant is something that is possible to program.

The trouble is when bindu occurs in the middle depends on interpretation of words etc. and not possible to program without an elaborate setup with look ups to dictionary and such.
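The end-of-word rule is the programmable part. A rough sketch in Python, under these assumptions: the input is already split into tokens, vowels are the scheme's a/A/i/I/u/U/e/E/o/O, and `|` / `||` mark the end of a line. The function name `apply_final_m_rule` is made up for illustration and is not the editor's actual code.

```python
VOWELS = set("aAiIuUeEoO")


def apply_final_m_rule(tokens):
    """Turn a word-final 'm' into 'M' (anuswara) when the next token is a
    word beginning with a consonant. The 'm' is left alone before a vowel,
    before a danda ('|' or '||'), or at the end of the input."""
    result = []
    for i, token in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # A danda is not alphabetic, so it never counts as a consonant.
        next_is_consonant = bool(nxt) and nxt[0].isalpha() and nxt[0] not in VOWELS
        if token.endswith("m") and next_is_consonant:
            token = token[:-1] + "M"
        result.append(token)
    return result


print(apply_final_m_rule(["santatam", "aham", "sEvE", "|"]))
```

Here `santatam` keeps its m because `aham` starts with a vowel, while `aham` becomes `ahaM` before `sEvE`. Nothing in this sketch touches nasals inside a word.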

So I think we are still down to either

(a) use it like telugu and kannada, and hindi. (i.e. always use it).
(b) or not use it.
(c): use it only at end (i.e. following end of word rule above) but never in the middle.

Of course all of them are not correct for Sanskrit, but I am guessing/hoping that

(a) would be ok for people to read (as they may apply their native language rules).
(c) looks closer to sanskrit and ma....y be passable, although it will definitely mess up the words that drs mentioned.
if (c) is done at all, (b) is useless

Can people pl. chime in and give me advice on whether (a) is ok, and whether i should even bother with (c)?

Thanks
Arun

jayaram · Post by **jayaram** » 03 Feb 2007, 15:25

Arun - I get the feeling if you go with option (a) for Devanagari, we may do the same for Malayalam! And it does look a bit weird if this option is used in Malayalam, at least for old-timers like myself.

My own take on this:
1. ok to use bindu across the board for the endings. as i said earlier, the M ending is for aesthetics, don't believe there's a rigid rule for this.
2. use half-consonant within a word using the appropriate rules - tough to implement, I agree, but at least this can be done for certain often-occurring words, perhaps you could look thru Dikshitar kritis for words such as 'ambika': http://www.rogepost.com/n/4405894335

arunk · Post by **arunk** » 03 Feb 2007, 20:08

yes jayaram it would be less than ideal for malayalam - that is not good either.

I will try the more difficult approach. For sanskrit (and malayalam too?), as drs indicated, the # of cases which DONT employ bindu in the middle of the word outnumber the cases where it does. So I could build up a database of known words that do employ bindu and use smart matching. So by default no bindu except for these known words. This will handle amba etc. correctly by default. It will also handle sangIta, santOsha (assuming they are in database).

On top of that, it may be possible to introduce a feature in the editor (not the scheme) to force use of bindu in sanskrit/malayalam for a specific word. So with a combination of this and the database of known words, we may be able to get things right. Although unless the database of known words is good (so that it takes care of almost all common cases of occurrences in kritis), it would be a pain for the user to have to spoon-feed the editor.
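The "known words plus manual override" idea could look roughly like this in Python. Everything here is an assumption for illustration: the two-entry word list, the names `KNOWN_BINDU_WORDS` and `sanskrit_spelling`, and the simple regex that treats any nasal directly before a consonant as the place where a forced bindu goes.

```python
import re

# Hypothetical sample; the real list would be built from a dictionary.
KNOWN_BINDU_WORDS = {"sangIta": "saMgIta", "santOsha": "saMtOsha"}

# A nasal immediately before a consonant (only the consonant's first letter
# is checked, so aspirates like kh, gh are covered too).
NASAL_BEFORE_CONSONANT = re.compile(r"(?:#n|~n|[nNm])(?=[kgcjTDtdpbyrlvSsh])")


def sanskrit_spelling(word, forced=frozenset()):
    """Default: no bindu. Known words use their stored spelling, and words
    the user has explicitly forced get a bindu for the nasal."""
    if word in KNOWN_BINDU_WORDS:
        return KNOWN_BINDU_WORDS[word]
    if word in forced:
        return NASAL_BEFORE_CONSONANT.sub("M", word)
    return word


print(sanskrit_spelling("amba"))
print(sanskrit_spelling("sangIta"))
print(sanskrit_spelling("ahankAra", forced={"ahankAra"}))
```

With this layout amba stays as it is by default, and the user only has to intervene for words missing from the list.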

I will look into this.

Thanks
Arun

drshrikaanth · Post by **drshrikaanth** » 03 Feb 2007, 20:22

arunk wrote:For sanskrit (and malayalam too?), as drs indicated, the # of cases which DONT employ bindu in the middle of the word outnumber the cases where it does. So I could build up a database of known words that do employ bindu and use smart matching.

Forget about doing this Arun, as the list of words will stretch to several thousands! I just checked. The way out would be to link up with a pre-existing online dictionary and match with that spelling.

arunk · Post by **arunk** » 03 Feb 2007, 21:54

I was afraid of that. It may be possible to interface with a dictionary (or build our own which can be interfaced more easily). Of course more work, but not herculean.

Arun

arunk · Post by **arunk** » 03 Feb 2007, 22:18

i did multiple searches on the cologne-sanskrit dictionary for occurence of aM, eM, iM, uM, oM (i think their transl. scheme use M only in right places - pl. confirm). The search is case-insensitive so it matches stuff we dont need. So some filtering was needed afterwards.

I saved the (massive) results on my local disk. Did some (programmatic) filtering and assuming I did it right, there are 3076 words in that dictionary which use M (in those contexts). The cumulative # of bytes for all these words is about 34K. Not that bad actually that loading it into memory with editor is not fully ruled out.
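For reference, the filtering step amounts to something like the following Python sketch. The sample words are invented and the function name `keep_anuswara_words` is hypothetical; it assumes the saved results are plain words in the dictionary's own scheme, where a capital M marks the anuswara.

```python
def keep_anuswara_words(words):
    """Keep only words containing a capital M, dropping the extra hits
    that a case-insensitive search for aM, eM, ... also returns."""
    return [w for w in words if "M" in w]


# Hypothetical search results: "amba" and "Amra" match a case-insensitive
# search for "aM" but contain no anuswara.
results = ["saMgIta", "amba", "ahaMkAra", "Amra"]
kept = keep_anuswara_words(results)

# Rough memory estimate: one byte per character for plain ASCII words.
total_bytes = sum(len(w.encode("ascii")) for w in kept)
print(len(kept), total_bytes)
```

The byte count is only a lower bound on memory use, since it ignores any per-word overhead of whatever structure holds the list.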

Of course the scheme that cologne-sanskrit dictionary uses is different and so some more "translation" is needed to our scheme (which can increase the # of chars). This is no big deal.

Drs - pl. let me know if it is ok for me to send you the results, to see if the list of matched words makes sense (i.e. whether i got a good representative list).

Arun

drshrikaanth · Post by **drshrikaanth** » 04 Feb 2007, 00:24

arunk wrote:i did multiple searches on the cologne-sanskrit dictionary for occurence of aM, eM, iM, uM, oM (i think their transl. scheme use M only in right places - pl. confirm). The search is case-insensitive so it matches stuff we dont need. So some filtering was needed afterwards.

I searched on Cologne too but used a different combination. Your combinations like aM, eM will come up with what we don't need as well since, as you have rightly pointed out, it is case-insensitive. But use these combinations: Mk, Mkh, Mg, Mgh etc. You can't go wrong here.

It is only in the (p, ph, b, bh, m) pentad that you will have problems. Also some overlap in (y, r, l). Otherwise we are fine.

I saved the (massive) results on my local disk. Did some (programmatic) filtering and assuming I did it right, there are 3076 words in that dictionary which use M (in those contexts).

There will easily be more than 10,000 words. More towards 20K, I estimate.

Of course the scheme that cologne-sanskrit dictionary uses is different and so some more "translation" is needed to our scheme (which can increase the # of chars). This is no big deal.

The transliteration scheme used there is the H-K convention (Harvard-Kyoto). I had earlier given in a post a step-by-step procedure to convert H-K to our scheme. I think in this thread itself. Check that.

arunk · Post by **arunk** » 04 Feb 2007, 00:39

drshrikaanth wrote:I searched on Cologne too but used a different combination. Your combinations like aM, eM will come up with what we don't need as well since, as you have rightly pointed out, it is case-insensitive. But use these combinations: Mk, Mkh, Mg, Mgh etc. You can't go wrong here.

Filtering out non-M was no big deal. There are several utilities on unix like systems (e.g. my mac) that makes this very easy.

There will easily be more than 10,000 words. More towards 20K, I estimate.

I guess then I did something wrong in my steps. The total #of words (i.e. case-insensitive) was 51618. So it did match a lot. Still doesnt add up, either the dictionary does not include most of it, or my search criteria was wrong (it is quite difficult to screw-up the filter step - a very simple command), or i didnt save all the results.

Arun

arunk · Post by **arunk** » 04 Feb 2007, 05:42

after exchanging some emails with drs, we solved a "mystery" as to why my searches werent getting all the words. Anyway the entire list is about 7400, which i think is still manageable (but need to confirm).

Arun

vasya10 · Post by **vasya10** » 06 Feb 2007, 05:28

Arun,

One useful feature, and maybe you have already thought about it, could be to export the transliterated data as a pdf.

arunk · Post by **arunk** » 06 Feb 2007, 06:37

vasya,

this is doable now itself. All you need to do is get a pdf print driver which allows you to save what you would normally send to a printer as a PDF file (e.g google for pdf995). With this then from the Printable View, you just choose Print options on your browser, and instead of sending to your printer, choose the pdf printer.

Arun

arunk · Post by **arunk** » 07 Feb 2007, 01:25

i have tested with looking up a sanskrit word database for using anuswara, and it works. However, there is a significant problem: The input text (as in sAhitya) can have many words (that can be a potential match in the dictionary) combined into single words in english. Note also that when words are combined, they morph as per rules of language.

So unless language rules are applied (which is very difficult), it is impossible to reliably figure out which words in the input do correspond to words in dictionary (i.e. those that require anuswara in sanskrit).

For example, if sangIta comes as such, I can match against saMgIta (with some smart logic). I can even match sangItam (add m if word ends with a and try for a match), but what if the word is karnAtakasangItam in one word (or something else)? "sangIta" can occur anywhere in an input word. Now a solution could be match it anywhere in an input word, but I see an entry for aMsa - and does it mean amsa anywhere should match? . I am thinking not.
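To make the matching problem concrete, here is a small Python sketch. It assumes the dictionary entries have already been converted to our scheme, and both the regex and the name `find_entry` are illustrative guesses rather than the logic actually in the editor.

```python
import re

# A nasal immediately before a consonant; replaced by M to form a lookup key.
NASAL_BEFORE_CONSONANT = re.compile(r"(?:#n|~n|[nNm])(?=[kgcjTDtdpbyrlvSsh])")


def find_entry(word, dictionary):
    """Whole-word lookup: try the word with its nasals written as M, then
    try again with a trailing 'm' removed (sangItam -> saMgIta)."""
    candidate = NASAL_BEFORE_CONSONANT.sub("M", word)
    if candidate in dictionary:
        return candidate
    if candidate.endswith("m") and candidate[:-1] in dictionary:
        return candidate[:-1]
    return None


dictionary = {"saMgIta", "aMsa"}  # hypothetical two-entry database
print(find_entry("sangIta", dictionary))
print(find_entry("sangItam", dictionary))
print(find_entry("karnATakasangItam", dictionary))
```

The third lookup finds nothing, which is the compound-word problem described above. Switching to substring matching would fix that case but would also make a short entry like aMsa match inside unrelated words.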

So while the dictionary would help, it may not help that much. Of course, i can introduce a feature where the user highlights some text and explicitly asks for a match in the database - but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(. I guess that is going to be our achilles heel.

We are so close to our solution, yet there seems to be an insurmountable barrier.

Any suggestions?

Thanks
Arun

arunk · Post by **arunk** » 07 Feb 2007, 02:43

arunk wrote:but that means only a user who knows sanskrit well will be able to provide the correct input that will translate to all languages :(.

May be this isnt a big deal. If the input represents a sanskrit krithi, then it is not an unfair expectation for the user to be aware of where anuswara figures?

But if the krithi is non-sanskrit, and the user entering the krithi doesn't know sanskrit rules - how would it be if certain words (in a language other than sanskrit) that happen to be sanskrit-based get rendered in sanskrit with no anuswara?

For example, if a word like sangItamu (as entered) is in a telugu krithi, but as rendered in sanskrit it doesn't appear with anuswara - is that too bad?

Arun

vasya10 · Post by **vasya10** » 07 Feb 2007, 02:59

(May be im simplifying things a bit, because I didnt understand all the discussions)

For anusvara logic, isnt it enough just to follow the pANinI's rule "anusvArasya yayi parasavarNah" ? Or is the issue beyond that ?

vasya10 · Post by **vasya10** » 07 Feb 2007, 03:02

Just want to clarify what I meant -- if you just encode the 14 sutras of pANini into the database, you should be able to derive anusvara logic.

arunk · Post by **arunk** » 07 Feb 2007, 03:18

vasya,

yes but that is easier said than done. It isn't worth it for the scale of our use.

Arun

arunk · Post by **arunk** » 07 Feb 2007, 21:57

Please let me know if this is ok. Drs/ramakriya/jayaram - in particular i am going to bother you specifically. Feedback from others is also very welcome.

After racking my brains over this more, I have an alternative proposal which may be the best given our constraints.

For kannada and telugu, there are contexts in which certain combinations ALWAYS use anuswara, i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that to make the input easier to read, for the first two cases the scheme allows you to use just n instead of #n, ~n, i.e. pankaja, panca is ok. Also, currently, you could simply use M instead of #n/~n/n/m in all these cases. But as i have noted many times, except in the last case where M represents m, it is not recommended to use this as it is not as phonetic, and it can also lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script-specific artifacts wherever possible, and this is definitely one place where it can be avoided for these 2 languages.
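As a rough illustration, the "always anuswara" contexts for kannada/telugu can be written as one regular expression. This is a Python sketch with a made-up function name, `mark_implied_anuswara`, and it uses M only as an internal marker for where the bindu would be drawn.

```python
import re

# #n[kg], ~n[cj], n[td], N[TD], m[pb]; plain n is also accepted before
# k/g and c/j. Only the first letter of the following consonant is
# checked, so kh, gh, ch, jh, th, dh, Th, Dh, ph, bh are covered as well.
IMPLIED_ANUSWARA = re.compile(
    r"(?:#n|n)(?=[kg])|(?:~n|n)(?=[cj])|n(?=[td])|N(?=[TD])|m(?=[pb])"
)


def mark_implied_anuswara(word):
    """Replace the nasal with M wherever kannada/telugu always use a bindu."""
    return IMPLIED_ANUSWARA.sub("M", word)


for w in ["pankaja", "pa~nca", "Sa#nkara", "pANDava", "amba", "ramya"]:
    print(w, "->", mark_implied_anuswara(w))
```

ramya comes out unchanged, since m before y is not one of the listed contexts.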

However, note that for kannada and telugu, there are contexts where certain combinations do NOT ALWAYS use anuswara. Examples are mya, mSa etc. We decided here that the user would need to explicitly specify the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying the "m" sound (and not #n/~n/N - right?)

I am thinking sanskrit should also follow the same rule, but obviously in more contexts because of the use of anuswara in the language. IN THE MIDDLE of a word (for end of words, see below), whenever anuswara is required, it needs to be explicitly specified - else no anuswara would be rendered. Of course as per the current scheme, this would mean saMgIta, saMtOsha etc., which again is not phonetic and can mislead pronunciation for some people.

i think malayalam can follow same rule (but contexts where anuswara figures would be the least of the 4 languages).

A more phonetic explicit anuswara specifier for use inside words
But what if we adopt a different more phonetically fair specifier for anuswara in places it represents #n, ~n, n and N sound? For example, one that uses n/N but with a prefix. I propose the back-tick character ` - so you have sa`ngIta, sa`ntOsha. The advantage here is the explicit anuswara specification is still phonetically quite fair - sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M as ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.

(note: we could choose a different character than backtick - the only constraint being that it should not be so "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix - san`gIta as opposed to sa`ngIta - this may be a better representation of the internal structure of the word?)
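A quick Python sketch of how the three explicit specifiers (`n, `N and M) could be handled. The function names are invented, and collapsing all three to a single internal marker M is an assumption about how the editor might store them, not a description of what it does.

```python
import re

# The three explicit anuswara specifiers: `n, `N and M.
EXPLICIT_ANUSWARA = re.compile(r"`[nN]|M")


def to_anuswara_marker(word):
    """Collapse every explicit specifier to one internal marker (M)."""
    return EXPLICIT_ANUSWARA.sub("M", word)


def readable(word):
    """Drop the back-tick so the input still reads phonetically."""
    return word.replace("`", "")


for w in ["sa`ngIta", "sa`ntOsha", "ahaMkAra", "sangIta"]:
    print(w, "->", to_anuswara_marker(w), "/", readable(w))
```

Plain sangIta passes through untouched, so only words the user has marked get the anuswara.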

anuswara at end of words for sanskrit
This is tricky in sanskrit as it depends on end of sentence etc. I can detect many cases in logic and apply them, but i don't think in a reliable way - which means a user who cares needs to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras at end of words (regardless of m/M)
(b) never use anuswaras at end of words (regardless of m/M)
(c) use anuswaras only when M is specified explicitly at end of words. This can allow a meticulous user to get the rendition to use anuswaras (at word-endings) in middle of sentences, and not at end of sentence - but its up to the user.
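The three options boil down to a small switch. A sketch in Python, where the option names "always", "never" and "explicit" and the function name `render_word_ending` are placeholders chosen for this example:

```python
def render_word_ending(word, option):
    """Apply one of the three end-of-word options to a single word.

    "always"   -> final m/M is rendered as anuswara (M)
    "never"    -> final m/M is rendered as the consonant (m)
    "explicit" -> the word is left exactly as the user typed it
    """
    if not word or word[-1] not in "mM":
        return word  # nothing to decide for words not ending in m/M
    if option == "always":
        return word[:-1] + "M"
    if option == "never":
        return word[:-1] + "m"
    if option == "explicit":
        return word
    raise ValueError(f"unknown option: {option}")


for opt in ("always", "never", "explicit"):
    print(opt, [render_word_ending(w, opt) for w in ["santatam", "ahaM", "rAma"]])
```

With "explicit", a meticulous user can type ahaM mid-sentence and santatam at the end of a line and get exactly that.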

Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesnt. I think this is ok, the editor is not involved in "teaching how to write sanskrit"

Besides we were ok with that rule for "my" combinations in kannada and telugu. I dont know why I forgot that

Rules for specifying Anuswara
So based on this here are some concise rules i can think of:
(a) tamizh krithis: no need to specify anuswara ever as it doesn't make sense for the language. When this gets transl. to kannada/telugu, anuswara would be used in the middle of words for #n[kg], ~n[cj], n[td], N[TD], m[pb], and also when m is at end of word. When a tamizh krithi gets transl. to sanskrit/malayalam, sanskrit-based words may not appear ideally, as they won't have anuswara. This may be ok as, while the word is sanskrit-based, one could argue it is still in the context of a tamizh krithi and thus non-sanskrit, and sanskrit rules for anuswara may not apply. Of course, a person who does care about sanskrit rendition can introduce explicit anuswara specifiers even in a tamizh krithi (e.g. sa`ngIta).
(b) kannada, telugu krithis:
(i) Should not explicitly specify anuswara in contexts where it represents ~n, #n, n, N, m (i.e. use panca/pa~nca, Sankara/Sa#nkara, pANDava, amba).
(ii) Should not explicitly specify anuswara for end of words as it always implies anuswara. Use "m" instead.
(iii) Must specify in contexts which do not automatically imply anuswara - e.g. raMya.
So basically specify anuswara only when it is not automatically implied. Note again that this means that when the krithi gets transl. to sanskrit/malayalam, sanskrit words may not appear ideally. Depending on the user's preference, explicit anuswara may then be specified for (i) and (ii), but as `n, `N where it represents #n, ~n, n, N, and M ONLY when it represents the m sound.
(c) sanskrit krithis: Must specify anuswaras but only where they occur. Again specify `n, `N where it represents #n, ~n, n, N, and M ONLY when it represents the m sound. When a sanskrit krithi gets translated to kannada/telugu, it *may* force anuswaras in places where they normally are not there? But I am not sure.
(d) malayalam krithis: Must specify anuswaras but only where they occur. I think anuswara would figure and hence need be specified only in cases where it represents the "m" sound (like raMya)? If so, the editor may ignore the anuswara specifier in places where it represents #n, ~n, n and N sounds? (and use the actual characters) - so sa#ngIta/sangIta/sa`ngIta would all be rendered as sa#ngIta.

Thanks
Arun

arunk · Post by **arunk** » 08 Feb 2007, 02:24

so nobody gives a hoot??

Your silence will be conveniently interpreted as rousing approval!

I will implement these and may be when you see it in action, you may be forthcoming in your approval/disapproval!

Arun

Suji Ram · Post by **Suji Ram** » 08 Feb 2007, 02:26

arunk wrote:vasya,

this is doable now itself. All you need to do is get a pdf print driver which allows you to save what you would normally send to a printer as a PDF file (e.g google for pdf995). With this then from the Printable View, you just choose Print options on your browser, and instead of sending to your printer, choose the pdf printer.

Arun

Arun
I downloaded the free version and tried. But all I can get is a pdf file without my work. ??
The way I am doing it is -right click on printable view,print target, and choose pdf995 and hit Ok. It asks for file name to save as pdf. A screen appears asking me to upgrade or continue with sponsor page..... The outcome is a pdf file of the sponsor page.
Help Please

ramakriya · Post by **ramakriya** » 08 Feb 2007, 03:15

arunk

have been tied up all day .. Hope to completely read your post and send my feedback by the end of the day..

-Ramakriya

ramakriya · Post by **ramakriya** » 08 Feb 2007, 03:18

Suji Ram wrote:Arun
I downloaded the free version and tried. But all I can get is a pdf file without my work. ??
The way I am doing it is -right click on printable view,print target, and choose pdf995 and hit Ok. It asks for file name to save as pdf. A screen appears asking me to upgrade or continue with sponsor page..... The outcome is a pdf file of the sponsor page.
Help Please

Try using primopdf or pdfcreator; I have had better results with these two. The former has some problems when converting word documents with certain formatting. But should not be a problem for normal use. I have not seen any issues with pdfcreator.

www.primopdf.com

http://sourceforge.net/projects/pdfcreator/

-Ramakriya

arunk · Post by **arunk** » 08 Feb 2007, 03:35

pdfcreator works fine too (although the free version i think it puts something in the footer). Its got a slick interface.

pdf995 is what I use. It's not the greatest interface, and it does bring up the browser to show an innocuous ad of their own - it is NOT adware. It's a small price to pay for something free which doesn't put stuff in the footer. (but if there are other better free tools which don't put stuff in the footer, i say ditch this one).

suji - i dont know why you got that. I have used it many times and have not seen the problem you are seeing. Perhaps you let it open the (sponsor-ad) page and THEN clicked ok on the dialog where it asks for file?

Arun

arunk · Post by **arunk** » 08 Feb 2007, 03:36

thanks ramakriya.

Suji Ram · Post by **Suji Ram** » 08 Feb 2007, 04:40

arunk wrote:suji - i dont know why you got that. I have used it many times and have not seen the problem you are seeing. Perhaps you let it open the (sponsor-ad) page and THEN clicked ok on the dialog where it asks for file?

Arun

Thanks,

got it now ... was doing something dumb.

arunk · Post by **arunk** » 08 Feb 2007, 23:19

ramakriya,

did you get a chance to look at it? If not, I can post an update which has changes adhering to above. I am ready to post it.

BTW, come to think of it, it is not a major change to the scheme. In essence, it involves only two things:

1. instead of always M for anuswara in EVERY context, use `n or `N for anuswara when the underlying sound is not ma. So you use `n when it represents #n, ~n and n, and `N when it represents the N sound.

2. Try to avoid specifying M unless absolutely needed. This is not a new rule.

Thanks
Arun

ramakriya · Post by **ramakriya** » 08 Feb 2007, 23:32

Finally, some comments -

arunk wrote:For kannada and telugu, there are contexts in which certain combinations ALWAYS use anuswara, i.e. #n[kg], ~n[cj], n[td], N[TD], m[pb]. Note that to make the input easier to read, for the first two cases the scheme allows you to use just n instead of #n, ~n, i.e. pankaja, panca is ok. Also, currently, you could simply use M instead of #n/~n/n/m in all these cases.

Correct

arunk wrote:But as i have noted many times, except in the last case where M represents m, it is not recommended to use this as it is not as phonetic, and it can also lead to misleading pronunciation for people who do not know the language. Besides, one of the aims of the scheme was to avoid script-specific artifacts wherever possible, and this is definitely one place where it can be avoided for these 2 languages.

That is fine too.

arunk wrote:However, note that for kannada and telugu, there are contexts where certain combinations do NOT ALWAYS use anuswara. Examples are mya, mSa etc. We decided here that the user would need to explicitly specify the anuswara (raMya). I think for kannada and telugu, these contexts only have the anuswara implying the "m" sound (and not #n/~n/N - right?)

In these cases, it is not an anusvAra ; It is the vyanjana 'm' that appears in words like ramya, tAmra, Amla etc.

An anuswara is a representation of an anunAsika (5th letter of each varga: #n, ~n, N, n, M), occurring before a letter which is a non-anunAsika vargIya vyanjana (k c T t p vargas, leaving out the last letter).

When the letter following an anunAsika is another anunAsika, (like in amnAya, vA#nmaya, amma, haNNu, kenne) or one of the following three avargIya vyanjanas (y r l - as in ramya, tAmra, Amla) then the anunAsika is used as it is in the samyuktAkshara.

(This info may be a repetition of what DRS may have said earlier).

When an anunAsika (normally m) is followed by v, S, Sh, s, h or L, it is represented by an anusvAra.
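The three cases described above can be put into a small lookup, keyed by the consonant that follows the nasal. This is a sketch of the general rule as stated here only; it assumes the consonant is already given as a scheme token, the names are hypothetical, and it does not handle the exceptions raised elsewhere in this thread (such as words with the sam prefix, e.g. saMyukta).

```python
# Scheme tokens for the five anunAsikas (5th letter of each varga).
NASALS = {"#n", "~n", "N", "n", "m"}

# Non-nasal vargIya vyanjanas: first four letters of the k c T t p vargas.
VARGIYA_STOPS = {
    "k", "kh", "g", "gh",
    "c", "ch", "j", "jh",
    "T", "Th", "D", "Dh",
    "t", "th", "d", "dh",
    "p", "ph", "b", "bh",
}

# Followers before which the nasal is written as it is in the conjunct.
KEPT_AS_CONJUNCT = NASALS | {"y", "r", "l"}

# Remaining avargIya followers before which the nasal becomes anusvAra.
OTHER_ANUSVARA = {"v", "S", "Sh", "s", "h", "L"}


def nasal_rendering(following: str) -> str:
    """Say how a nasal is written before the given consonant token."""
    if following in VARGIYA_STOPS or following in OTHER_ANUSVARA:
        return "anusvara"
    if following in KEPT_AS_CONJUNCT:
        return "conjunct"
    raise ValueError(f"unknown consonant token: {following!r}")


print(nasal_rendering("y"))  # conjunct, as in ramya
print(nasal_rendering("N"))  # conjunct, as in haNNu
print(nasal_rendering("g"))  # anusvara
```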

arunk wrote:I am thinking sanskrit should also follow the same rule but obviously in more contexts because of use of anuswara in the language. IN THE MIDDLE of a word (end of words - see below), whenever anuswara is required, it needs to be explicitly specified - else no anuswara would be rendered. Of course as per current scheme, this would mean saMgIta, saMtOsha etc. which again is not phonetic, and can mislead pronunciation for some people.

i think malayalam can follow same rule (but contexts where anuswara figures would be the least of the 4 languages).

samskrita and malayALam experts should pitch in. All these discussions have made my head dizzy and now I am doubting myself about when to use the bindu in samskrita.

A more phonetic explicit anuswara specifier for use inside words

arunk wrote:But what if we adopt a different more phonetically fair specifier for anuswara in places it represents #n, ~n, n and N sound? For example, one that uses n/N but with a prefix. I propose the back-tick character ` - so you have sa`ngIta, sa`ntOsha. The advantage here is the explicit anuswara specification is still phonetically quite fair - sa`ngIta is much better than saMgIta. I find this a whole lot more desirable than M in such cases. But in contexts where anuswara represents the "m" sound (ahamkAra), we still use M as ahaMkAra. So we have 3 representations for explicit anuswara: `n, `N and M.

(note: we could choose a different character than backtick - only constraint being it should not be so "visible" and intrusive that it becomes an eyesore. We could also use it as a suffix - san`gIta as opposed to sa`ngIta - this may be a better representation of the internal structure of the word?)

I agree that sa`ngIta is better representation than saMgIta even though I have got used to the baraha's standard saMgIta

arunk wrote:anuswara at end of words for sanskrit
This is tricky in sanskrit as it depends on end of sentence etc. I can detect many cases in logic and apply them, but i dont think in a reliable way - which means a user who cares needs to have control. So I am just going to have three options for sanskrit:
(a) always use anuswaras at end of words (regardless of m/M)
(b) never use anuswaras at end of words (regardless of m/M)
(c) use anuswaras only when M is specified explicitly at end of words. This can allow a meticulous user to get the rendition to use anuswaras (at word-endings) in middle of sentences, and not at end of sentence - but its up to the user.

Time to dust any samskrita grammar books I have or find one to borrow :/

arunk wrote:Conclusion:
I think all this basically puts the responsibility on the user to know when sanskrit requires anuswaras and when it doesnt. I think this is ok, the editor is not involved in "teaching how to write sanskrit". Besides we were ok with that rule for "my" combinations in kannada and telugu. I dont know why I forgot that

There you go ..

arunk wrote:(b) kannada,telugu krithis:

(iii) Must specify in contexts which do not automatically imply anuswara - e.g. raMya.

This, again, is not an anusvAra, but vyanjana - So the correct representation is ramya; and hey - that is your current implementation too

All this talk about anusvAras reminds me of something funny that happened at the kids' kannada class here. One of the beginner kids told his mother that he could write amma (mother). The mother was surprised, because in the class the teacher had only covered the vowels and had not yet taught any of the vyanjanas, let alone samyuktAksharas. When asked, the kid wrote ಅಂಅ, to the surprise of both the teacher and the mother, which sounds exactly like ಅಮ್ಮ

-Ramakriya

arunk · Post by **arunk** » 08 Feb 2007, 23:38

So no occurrences of M in kannada EVER when preceding ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have confused it with sanskrit rules.

This does make it easier - no need to specify anuswara in the script for kannada and telugu, since the places it figures are places where there is no ambiguity (it always figures in those contexts).

Arun

drshrikaanth · Post by **drshrikaanth** » 08 Feb 2007, 23:43

arunk wrote:So no occurrences of M in kannada EVER when preceding ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have confused it with sanskrit rules.

Your memory serves you right. There are exceptions here, which I had mentioned earlier. Sometimes anuswAra does occur before y, r & l, e.g. saMyukta, saMyama, saMrakShaNe, saMlApa, saMyOjane etc.

arunk · Post by **arunk** » 08 Feb 2007, 23:45

Yes drs. I was about to post a link to your post from long ago

Anyway here it goes: http://www.rasikas.org/forums/viewtopic.php?pid=27669#p27669 (post #115)

Arun

drshrikaanth · Post by **drshrikaanth** » 08 Feb 2007, 23:45

Arun
I suggest you cut and paste these bits of info/rules on MSword/Notepad as and when they come up. Then you dont have to rely on memory or others will not have to repeat what they said earlier. Well-meaning comment. Not having a go at you at all

arunk · Post by **arunk** » 08 Feb 2007, 23:46

yep thanks. I should have done this before, but was lazy and I thought i could use the search facility on the forum. But separate notes is better indeed

Arun

ramakriya · Post by **ramakriya** » 08 Feb 2007, 23:50

arunk wrote:So no occurrences of M in kannada EVER when preceding ya, sa, Sa varieties? Hmm.. I thought someone mentioned otherwise a while ago, but i think i must have confused it with sanskrit rules.

Arun

Not so fast

I made an error in making a blanket statement. For example, there are words like samyukta, samyOga etc. which are written with anusvAra. This may be influenced by how these words are written in samskR.ta also. Let me check with a samskR.ta expert (who is also a kannaDa expert) I know of. Better still, maybe I can make him a member of this forum and have him contribute to the thread

-Ramakriya

arunk · Post by **arunk** » 09 Feb 2007, 01:09

ramakriya,

it shouldnt matter. For all cases where usage of anuswara is not unambiguously implied, an explicit specifier needs to be given - this applies to all languages.

I will upload my new version soon

Arun

arunk · Post by **arunk** » 09 Feb 2007, 02:21

Hi folks,

I have uploaded another update. This includes the following enhancements

Enhanced Anuswara support
1. Scheme now accepts `n and `N as alternate explicit specifiers for anuswara in addition to the already existing M. These should be used instead of M in contexts where anuswara represents a non-ma sound (i.e. `n when it represents #n/~n/n, and `N when it represents N).
2. Explicit anuswara specifiers should be used only when necessary, depending on the language. This means for tamizh usually never, for kannada/telugu only in cases like saMyukta and similar, and for sanskrit only for words that do use anuswara.
3. For sanskrit, there are 4 choices that control the use of anuswara at the end of words (that end with "m"). The default is that anuswara is used for words in the middle of a sentence but not at the end. Note that the editor tries to figure this out automatically. From my limited testing, it seems to do a fair job. But if it misses an anuswara, you can use M to specify it explicitly. The other choices are: no anuswara (whether or not M is used at end of words), always use anuswara (for all words ending in m/M), and anuswara only for words ending in M. So if you have rAgaM tALam sa`ngItam (a hypothetical and not exactly correct example), then
(a) default would treat it like rAgaM tALaM sa`ngItam
(b) "No anuswara at word endings" would treat it like rAgam tALam sa`ngItam
(c) "Always use anuswara at word endings" would treat it like rAgaM tALaM sa`ngItaM
(d) "Use anuswara only for words ending in M" would treat it like rAgaM tALam sa`ngItam
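The four choices can be sketched as a small function that rewrites word-final m/M, where a final M stands for "render an anuswara" and a final m for "render the consonant". This is an illustration, not the editor's logic: it assumes the whole input is one sentence whose last word is the sentence end (the editor's automatic detection is not reproduced), and the function and mode names are hypothetical.

```python
MODES = ("default", "none", "always", "explicit")


def apply_word_ending_mode(sentence: str, mode: str = "default") -> str:
    """Rewrite word-final m/M according to one of the four sanskrit options.

    Assumption: the input is a single sentence, so only its last word
    counts as "end of sentence" for the default mode.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    words = sentence.split()
    result = []
    for index, word in enumerate(words):
        if word[-1] in ("m", "M"):
            stem = word[:-1]
            is_last = index == len(words) - 1
            if mode == "none":
                word = stem + "m"
            elif mode == "always":
                word = stem + "M"
            elif mode == "default" and not is_last:
                word = stem + "M"
            # "explicit" (and the last word in "default") stays as typed.
        result.append(word)
    return " ".join(result)


text = "rAgaM tALam sa`ngItam"
for mode in MODES:
    print(mode, "->", apply_word_ending_mode(text, mode))
```

Run on the example above, the four modes reproduce the renderings listed in (a) to (d).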

Fix text to convert to scheme button (the new button which has a spanner/hammer).
This allows you to tell the editor to make some conversions so that the input text conforms to the scheme, along with various other changes (e.g. removing unnecessary anuswara specifiers etc.)

My intention is for people to be able to copy/paste text in other "informal" schemes and be able to easily "fix" it to conform to the unified scheme (e.g. vaataapi gaNapatim => vAtApi gaNapatim, and ashaindhaadum mayiloNDRu => asaindAdum mayilonDRu). Please let me know if you find this useful.
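To give a feel for the kind of fix-up meant here, the sketch below handles just one informal habit, doubled vowels for long vowels, which is enough for the vaataapi example. It is an assumption about one possible rule, not the editor's conversion logic; the function name is made up, and the consonant changes seen in the second example (ashaindhaadum) are deliberately left out because they depend on the language.

```python
# Doubled vowels used informally for long vowels, mapped to the scheme's
# capital-letter long vowels. "ee" and "oo" are left alone here because
# informal spellings use them inconsistently.
LONG_VOWELS = {"aa": "A", "ii": "I", "uu": "U"}


def fix_long_vowels(text: str) -> str:
    """Replace doubled a/i/u with the corresponding capital long vowel."""
    for doubled, long_vowel in LONG_VOWELS.items():
        text = text.replace(doubled, long_vowel)
    return text


print(fix_long_vowels("vaataapi gaNapatim"))  # vAtApi gaNapatim
```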

For people who havent seen this before:
The link to the unified transliteration scheme editor is http://arunk.freepgs.com/cmtranslit
The link to the scheme is http://arunk.freepgs.com/cmtranslit/cmt ... cheme.html

Any feedback is most welcome.

Thanks
Arun