Project on Sabdabodha
The Academy of Sanskrit Research, Melkote, now attempts to overcome this situation by using several modules for the morphological analysis of different grammatical expressions in the intermediary language, Sanskrit. These parsers were successfully developed by the Academy in the field of semantics and syntax under a specific project titled ‘Sabdabodha’ (funded by the then Department of Electronics, GOI, New Delhi), and are summarized below. (Please refer to the annexure for the features of each of these parsers.)
Subanta Generator / Analyzer
Tinanta Generator / Analyzer
Kridanta Generator / Analyzer
So far the Academy, which is deeply involved in research on Natural Language Processing (NLP) of Sanskrit in the field of semantics, has rigorously tried to carry out the semantic analysis of simple sentences, taking base-level meaning expressions at the primary level. In other words, the semantic analysis so far done by the Academy has concerned itself with literal meaning and denotation, with a vocabulary limited to the specific roots included in the database. Yet much of everyday language relies for its communicative force on sound, symbolism, metaphor and connotation. The first two are relevant to the creation and interpretation of many novel expressions, and the last two to the social and stylistic aspects of meaning. This is just to show that there is a long way to go in the field of semantics. Making the database open-ended is also considered necessary to keep it from becoming a stagnant pool.
Salient Features of the Parsers Developed by the Academy under the Project Titled ‘Sabdabodha’
India's best opportunity for a leap in machine translation (Anuvada).
Subanta (Noun) Generator
Nouns are the substructure on which the building of language stands. They are used in various cases depending on the message that needs to be conveyed. The first step towards developing a parser was therefore to build this module, an important constituent of the whole process. This parser has:
- A lexicon of about 26000 nouns drawn from the Amarakosa and other lexicons, with about 624000 case-inflected forms.
- The capacity to generate all forms in 7 + 1 vibhakti-s (cases) and 3 vachana-s (numbers), 24 forms in all, or any single selected form.
- The capacity to generate multiple forms also, wherever available.
- Analyzes the forms of any given noun from the Amarakosa.
- Identifies the Anta (Ending), Linga (Gender) and Pratipadika (Base).
- Displays the multiple identifications.
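The generate-and-analyze behaviour listed above can be sketched for a single declension class. This is only an illustrative sketch, not the Academy's implementation: the function names `generate` and `analyze`, the ending table, and the choice of the masculine a-stem `deva` (in IAST transliteration) are all assumptions made for the example.

```python
# Sketch only: one declension class (masculine a-stems), hard-coded for illustration.
CASES = ["nominative", "accusative", "instrumental", "dative",
         "ablative", "genitive", "locative", "vocative"]  # 7 vibhaktis + vocative
NUMBERS = ["singular", "dual", "plural"]                  # 3 vachanas

# Endings attached to the base minus its final "a" (e.g. "deva" -> "dev").
A_STEM_MASCULINE = {
    "nominative":   ("aḥ", "au", "āḥ"),
    "accusative":   ("am", "au", "ān"),
    "instrumental": ("ena", "ābhyām", "aiḥ"),
    "dative":       ("āya", "ābhyām", "ebhyaḥ"),
    "ablative":     ("āt", "ābhyām", "ebhyaḥ"),
    "genitive":     ("asya", "ayoḥ", "ānām"),
    "locative":     ("e", "ayoḥ", "eṣu"),
    "vocative":     ("a", "au", "āḥ"),
}


def generate(base):
    """Return {(case, number): form} for a masculine a-stem base such as 'deva'."""
    if not base.endswith("a"):
        raise ValueError("this sketch only covers bases ending in 'a'")
    root = base[:-1]
    return {(case, number): root + ending
            for case in CASES
            for number, ending in zip(NUMBERS, A_STEM_MASCULINE[case])}


def analyze(form, lexicon):
    """Return every (base, case, number) reading of a surface form."""
    return [(base, case, number)
            for base in lexicon
            for (case, number), candidate in generate(base).items()
            if candidate == form]


paradigm = generate("deva")
print(len(paradigm))                     # 8 cases x 3 numbers = 24 slots
print(paradigm[("locative", "singular")])
print(analyze("devau", ["deva"]))        # one form, several identifications
print(26000 * 24)                        # 624000, the inflected-form count quoted above
```

A form such as `devau` fills the nominative, accusative and vocative dual slots, which is why an analyzer of this kind has to report multiple identifications. The sketch ignores sandhi inside the word (for example the retroflexion of n after r), other stem classes and genders, all of which a real generator would need.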
Tinanta Generator / Analyzer
- Lexicon consisting of 600 important roots collected from Dr. Kannan’s Dhaturupakosa.
- Handles Kevalatinanta (ordinary verbal forms), Nijanta (causative) and Sannanta (desiderative) models.
- Kartari (active), Karmani (passive) and Bhave (impersonal) are the voices handled.
- Handles 10 Lakaras: 6 tenses and 4 moods.
- Generates all 9 forms, for 3 Purusas (persons) and 3 Vacanas (numbers), or any one selected form (inflected forms: 216000).
- Analyzes the forms of any given root from the lexicon.
- Identifies the Gana, Padi, Karma, It, Mode, Voice, Lakara, Purusa and Vacana.
- Displays the multiple identifications.
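The 3 Purusa by 3 Vacana grid and the analysis record described above can be sketched as follows. This is a minimal illustration under stated assumptions: it covers only the present tense (lat) in the active voice for two roots whose stems are hard-coded, and the names `PRESENT_STEMS`, `VerbReading`, `generate_present` and `analyze` are hypothetical, not taken from the Academy's software.

```python
from dataclasses import dataclass

PERSONS = ["third", "second", "first"]    # prathama, madhyama, uttama purusa
NUMBERS = ["singular", "dual", "plural"]  # 3 vacanas

# Present-tense (lat) active endings attached to a stem such as "bhav".
LAT_ACTIVE_ENDINGS = {
    "third":  ("ati", "ataḥ", "anti"),
    "second": ("asi", "athaḥ", "atha"),
    "first":  ("āmi", "āvaḥ", "āmaḥ"),
}

# Hypothetical two-entry lexicon: root -> present stem (IAST transliteration).
PRESENT_STEMS = {"bhū": "bhav", "gam": "gacch"}


@dataclass(frozen=True)
class VerbReading:
    """A cut-down analysis record; the real analyzer reports more fields."""
    root: str
    lakara: str
    voice: str
    person: str
    number: str


def generate_present(root):
    """Return the 9 present-tense active forms as {(person, number): form}."""
    stem = PRESENT_STEMS[root]
    return {(person, number): stem + ending
            for person in PERSONS
            for number, ending in zip(NUMBERS, LAT_ACTIVE_ENDINGS[person])}


def analyze(form):
    """Return every reading of a surface form found in the toy lexicon."""
    return [VerbReading(root, "lat", "kartari", person, number)
            for root in PRESENT_STEMS
            for (person, number), candidate in generate_present(root).items()
            if candidate == form]


forms = generate_present("gam")
print(len(forms))                        # 3 persons x 3 numbers = 9 forms
print(forms[("third", "singular")])      # gacchati
print(analyze("bhavanti"))
```

The record here carries only root, Lakara, voice, Purusa and Vacana; the fields Gana, Padi, Karma, It and Mode named in the list above would be further attributes of the same kind.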
Kridanta Generator / Analyzer
- Generates 11 types of Krdanta forms, and generates them in 7 + 1 cases and 3 numbers or in any selected case and number (with up to 950000 case-inflected forms).
- Krdantas handled are Tavya, Aniyar and Ya in Vidhyartha; Kta and Ktavatu in Bhutartha; Satr and Sanac in Vartamana; Syasatr and Syasanac in Bhavisyat; and Tumun and Ktva in Krdavyaya.
- Analyzes the forms of any given Krdanta form based on the 600 selected roots.
- Gives details of the root, Anta (ending), Linga (gender), Pratipadika (base) and type of Krdanta.
- Gives details of the multiple forms.
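The eleven Krdanta types named above can be held in a small lookup table, which also makes the count easy to check. The table layout and the helper names below are assumptions for illustration only. The one fact added from outside the list is that the Krdavyaya forms are indeclinable, so the case-and-number generation applies to the other nine types.

```python
# Suffix names and group names follow the list above; the layout is hypothetical.
KRDANTA_TYPES = {
    "Vidhyartha": ["Tavya", "Aniyar", "Ya"],
    "Bhutartha":  ["Kta", "Ktavatu"],
    "Vartamana":  ["Satr", "Sanac"],
    "Bhavisyat":  ["Syasatr", "Syasanac"],
    "Krdavyaya":  ["Tumun", "Ktva"],
}

# Avyaya (indeclinable) forms take no case or number endings.
INDECLINABLE_GROUPS = {"Krdavyaya"}


def all_types():
    """Flatten the table into a single list of suffix names."""
    return [suffix for suffixes in KRDANTA_TYPES.values() for suffix in suffixes]


def declinable_types():
    """Suffixes whose derived forms are further inflected for case and number."""
    return [suffix
            for group, suffixes in KRDANTA_TYPES.items()
            if group not in INDECLINABLE_GROUPS
            for suffix in suffixes]


print(len(all_types()))         # 11 types in total
print(len(declinable_types()))  # 9 of them are declined in 8 cases x 3 numbers
```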
Apart from the above, specific modules have also been developed for dealing with the following, but only for a limited number of cases:
- Euphonic combinations
- Basics of Adjectives
- Some important Taddhitas (secondary suffixes) and
- Some important Samasas (compounds)
Note: Even with the above taken into account, not all facets of the Sanskrit language can be deemed to have been dealt with. However, with this expertise, the Academy has been able to make a preliminary study of all the above aspects in more than a couple of Indian languages, and is thus well equipped to apply this expertise to build a ‘Machine Translation’ package from Sanskrit to Kannada and other Indian languages, and vice versa.
The proposed sequence of processes for developing Machine Translation is summarized below:
Sl.No. Sequence of processes
1. ‘Source Language’ material is presented to the machine as input in the form of a sequence.
2. The machine recognizes the grammatical forms of words and understands their syntactic functions.
3. The translator, i.e. the computer, arrives at the syntactic expression contained in the ‘Source Language’ material.
4. It arrives at the Tatparyartha intended by the ‘Source Language’ material in the form of the Sabdabodha.
With this the recognition of the source language by the computer is completed. The further process of translation is as under:
Sl.No. Sequence of processes
5. The Sabdabodha from the ‘Source Language’ material is converted into Vivaksha in the ‘Target Language’.
6. The Vivaksha derived from the Sabdabodha becomes the input for speech production in the ‘Target Language’.
7. The syntactic expression is formulated in the ‘Target Language’.
8. Proper grammatical forms of words are selected and arranged in the order required by the ‘Target Language’ structure.
9. The actual sentence in the ‘Target Language’ is produced and presented.
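The two halves of this sequence, recognition of the source and production in the target, can be sketched as a chain of small functions. Everything in the sketch is an assumption made for illustration: the two-word analysis table stands in for the Subanta and Tinanta analyzers, the role labels are simplified, and English glosses are used as a placeholder target lexicon rather than Kannada.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    surface: str
    base: str
    category: str  # "noun" or "verb"
    role: str      # simplified syntactic function: "agent" or "action"


# Hypothetical stand-in for the morphological analyzers (IAST transliteration).
SOURCE_ANALYSES = {
    "devaḥ":   Analysis("devaḥ", "deva", "noun", "agent"),
    "bhavati": Analysis("bhavati", "bhū", "verb", "action"),
}

# Placeholder target side: English glosses and a fixed word order.
TARGET_LEXICON = {"deva": "god", "bhū": "exists"}
TARGET_ORDER = ["agent", "action"]


def recognize(words):
    """Recognize grammatical forms and syntactic functions of the input words."""
    return [SOURCE_ANALYSES[word] for word in words]


def syntactic_expression(analyses):
    """Arrange the analyses by syntactic function."""
    return {analysis.role: analysis for analysis in analyses}


def sabdabodha(expression):
    """Reduce the syntactic expression to a role -> base-meaning structure."""
    return {role: analysis.base for role, analysis in expression.items()}


def vivaksha(meaning):
    """Re-express the meaning structure with target-language lexical items."""
    return {role: TARGET_LEXICON[base] for role, base in meaning.items()}


def realize(intent):
    """Arrange the target words in the order the target language requires."""
    return " ".join(intent[role] for role in TARGET_ORDER if role in intent)


def translate(sentence):
    analyses = recognize(sentence.split())
    meaning = sabdabodha(syntactic_expression(analyses))
    return realize(vivaksha(meaning))


print(translate("bhavati devaḥ"))  # source word order does not fix target order
```

Because the meaning structure is keyed by role rather than by position, both `devaḥ bhavati` and `bhavati devaḥ` yield the same target sentence. The sketch does not select inflected target forms, which in the sequence above is a stage of its own.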