After a gap of almost two years, I am happy to announce the second official release (version 0.2.0) of Anubadok a free (as in freedom) machine translation system for English to Bengali. Anubadok is written in Perl and it uses Penn Treebank annotation system for natural language processing. To run Anubadok 0.2.0, you need to have Part-of-Speech tagger GPoSTTL installed in your system. The Anubadok system can be accessed online using the interface Anubadok Online run by Ankur.
First official release (ver. 0.1) of Anubadok was an experimental release which mainly served as a proof-of-concept for an open-source English to Bengali machine translation system.
With the release of version 0.2.0, I am glad to upgrade its official tag from “an experimental software” to “a software under development” with clear-and-specific implementation targets. However given the nature of the project, there are no specific time-frames for future releases. Further, given machine translation is considered an open research topic in Computational Linguistic, you should expect to see some surprises 😉 even for well implemented situations. Specially, if you are comparing results of machine translations with human translations.
In English, there are four types of sentences: Declarative, Imperative, Interrogative and Exclamatory. These sentence types further fall into four basic sentence type: Simple, Compound, Complex and Compound-Complex.
The table below gives approximate status of implementation for each sentence type in the current release and inversely it gives the targets for future implementations.
|Compound – Complex||N||N||N||N|
W: Well implemented
M: Moderately implemented
N: Not/Not-well implemented
Anubadok does not yet have any code to handle Complex or Compound-Complex sentences, not even moderately. This is where next push for development is needed.
Few other salient features of this release:
- The execution method of Anubadok system has been re-written. Anubadok itself has been implemented as Perl module. This means one can now access Anubadok in a Perl program directly by including Anubadok libraries (Perl modules) or in any other program by using appropriate Perl module wrapper.
- The notion of “testsuites” has been introduced for Anubadok. For a given English sentence, it compares a machine translated sentence with the expected Bengali sentence. This is quite an useful tool while adding new features or doing some experimentations as it would ensure that already implemented algorithm are not affected.
- Anubadok packaging has been completely reorganized to ensure that it has the basic structure of a standard Perl package. Consequently, Anubadok can be installed following the method of standard Perl module installation.
- Anubadok-0.2.0 comes with an updated dictionary having 15K+ entries in its database. This is almost double the number of entries it had in 0.1 release. Credit for this goes to all the contributors of Ankur English to Bengali dictionary project. Anubadok’s dictionary are now updated regularly using database dumps of Ankur E2B dictionary.
- Anubadok has now moved to its new website hosted by SourceForge.
Latest source codes of Anubadok can be downloaded from the “trunk” branch of its SVN repository.
- Anubadok Online, the online interface to Anubadok system, has been upgraded substantially. It runs directly using SVN version of Anubadok engine. User contributed new entries though this interface are submitted automatically to Ankur E2B dictionary project.
- A brief document is now available for download as a PDF file from its website. It describes the internal working and the algorithm used by Anubadok system by considering specific example sentence.