A modular approach is followed: the input sentence is parsed with context-free grammars (CFGs), and the parser invokes separate modules for semantic checking and English translation. Each module carries out one specific task, driven by grammatical rules and linguistic information.
The most important module in the translation phase is the one that deals with verbs. Because a single Urdu verb can correspond to several English verbs, this module selects the most suitable English verb for the given sentence. It also determines the type of the verb, using the available data set and logical operations that cover all verb kinds.
The same module determines the pronoun, using the leading verb of the Urdu sentence.
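As a minimal sketch of how a leading verb can settle the pronoun, the snippet below resolves the gender-neutral pronoun "wo" from the ending of the main verb. The function name, the auxiliary list and the ending table are illustrative assumptions, not Transtech's actual rules, and the ending table is a deliberate simplification (for example, "-te" can also mark a respectful singular).

```python
# Hypothetical helper: all names and tables here are assumptions for illustration.
AUXILIARIES = {"hai", "hain", "ho", "hun", "tha", "thi", "thay"}

# (verb ending, English pronoun); progressive markers are checked before bare suffixes.
ENDINGS = [
    ("rahi", "she"), ("raha", "he"), ("rahe", "they"),
    ("ti", "she"), ("ta", "he"), ("te", "they"),
]


def resolve_pronoun(tokens):
    """Return 'he', 'she' or 'they' from the leading verb, or None if undecided."""
    for token in reversed([t.lower() for t in tokens]):
        if token in AUXILIARIES:
            continue  # skip trailing auxiliaries such as "hai"
        for ending, pronoun in ENDINGS:
            if token.endswith(ending):
                return pronoun
        return None  # the leading verb has no ending listed in the table
    return None


print(resolve_pronoun("wo bohat achay kapre pehnti hai".split()))
```

Only the last non-auxiliary word is inspected, which matches the verb-final word order of the example sentences in this article.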
Another key module of this phase examines the noun phrase. It also covers other part-of-speech tags such as adjectives, adverbs, pronouns and cardinal numbers; several submodules perform the extraction and the operations these tags require. Appropriate prepositions are set according to the semantic information present in the sentence. Negative, interrogative and imperative sentences are covered as well, each through its own submodules. WH questions are handled within the domain of elementary tenses.
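A rough sketch of how negative sentences and WH questions might be flagged before translation is given below. The marker sets and the function name are assumptions made for illustration; the real system relies on POS tags and semantic rules rather than plain word lookup, and "kya" can also open a yes/no question, which this sketch ignores.

```python
# Hypothetical marker tables: assumptions, not Transtech's knowledge base.
NEGATION_MARKERS = {"nahi", "nahin", "mat"}
WH_WORDS = {
    "kaun": "who",
    "kyun": "why",
    "kahan": "where",
    "kya": "what",
    "kaise": "how",
}


def sentence_features(sentence):
    """Flag negation and return the English WH word for the first WH token, if any."""
    tokens = sentence.lower().split()
    wh = next((WH_WORDS[t] for t in tokens if t in WH_WORDS), None)
    return {"negative": any(t in NEGATION_MARKERS for t in tokens), "wh": wh}


print(sentence_features("Imran waqt par ghar nahi pohanchta hai"))
print(sentence_features("tum kahan ja rahe ho"))
```

Keeping the features in a small dictionary lets later submodules decide independently whether to insert "not" or to front a WH word.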
If the input sentence contains a WH tag in Urdu, the module applies semantic logic to select who, why, where, what or how accordingly.
The comparison clearly shows that Transtech gives considerably more accurate results. It also shows the improvement made in the Google machine translation system during the last two years.
We have developed a translator from Roman Urdu to English that is designed to give the most accurate translation possible. This was challenging because Roman Urdu does not follow any regular grammatical pattern and can be written in different ways.
We therefore followed rule-based translation and developed various grammatical rules to carry out the translation process in a tagger. Furthermore, many Roman Urdu words can be spelt in several ways, since Roman Urdu has no fixed spelling rules. To handle this, we maintain a corpus in a knowledge base that accommodates as many words as possible, and we match each word of the input string against all the similar words in that knowledge base.
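The variant matching described above can be sketched as a lookup from every known spelling to one canonical form. The knowledge base contents, the canonical spellings and the function name are assumptions chosen for illustration; the actual knowledge base is far larger.

```python
# Hypothetical knowledge base: canonical spelling -> other spellings seen in the wild.
KNOWLEDGE_BASE = {
    "bohat": {"bahut", "bohot", "buhat"},
    "nahi": {"nahin", "nhi", "nahe"},
    "achay": {"ache", "achey", "achhe"},
}

# Reverse index so each variant (and the canonical form itself) maps to the canonical form.
VARIANT_INDEX = {
    variant: canonical
    for canonical, variants in KNOWLEDGE_BASE.items()
    for variant in variants | {canonical}
}


def normalize(sentence):
    """Replace each known spelling variant with its canonical form; keep other tokens."""
    return [VARIANT_INDEX.get(tok, tok) for tok in sentence.lower().split()]


print(normalize("Wo bahut ache kapre pehnti hai"))
```

Building the reverse index once keeps each lookup constant-time, which matters when every token of every input sentence is checked.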
Some natural language cases have been left for future work because of limited time and the unavailability of a large amount of data.
The translation process could also be improved with a machine learning approach that trains the system on the basis of its current performance.
Heliyon. Published online May.
Khawar Islam: kp.
Abstract
Advances in machine and language translation open up new fields and research opportunities, while Natural Language Processing and Computational Linguistics deal with communication between natural languages and their interaction.
Keywords: Computer science, Linguistics.
Introduction
Natural Language Processing is associated with natural languages and machine translation.
Related work
We summarize the research and studies carried out on Urdu translation.
Materials
Data collection is always a challenging part of any research.
Corpus collection
With the help of [17, 18], the target size for the corpus required for translation is around words and over different sentences.
Knowledge base model
We built a knowledge base model to gather and maintain the corpus required for the translation process.
Context-free grammar
A context-free grammar contains a set of rules that determine the syntactic structure of a language.
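To make the idea of a context-free grammar concrete, the following sketch defines a toy grammar over POS tags for verb-final sentences and checks whether a tag sequence can be derived from it. The grammar, the tag names and both functions are illustrative assumptions and are much smaller than the rule set a real translator needs.

```python
# Toy grammar (an assumption for illustration): non-terminals map to lists of productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["PRON"], ["NOUN"], ["ADJ", "NOUN"], ["ADV", "ADJ", "NOUN"]],
    "VP": [["NP", "VERB", "AUX"], ["VERB", "AUX"]],
}


def derive(symbol, tags, start):
    """Yield every end index such that `symbol` derives tags[start:end]."""
    if symbol not in GRAMMAR:  # terminal symbol: a POS tag
        if start < len(tags) and tags[start] == symbol:
            yield start + 1
        return
    for production in GRAMMAR[symbol]:
        ends = [start]
        for part in production:
            # Extend every partial match by one more symbol of the production.
            ends = [nxt for end in ends for nxt in derive(part, tags, end)]
        yield from ends


def accepts(tags):
    """True if the whole tag sequence can be derived from the start symbol S."""
    return any(end == len(tags) for end in derive("S", tags, 0))


# Tags for "Wo bohat achay kapre pehnti hai"
print(accepts(["PRON", "ADV", "ADJ", "NOUN", "VERB", "AUX"]))
```

The recognizer is a plain top-down search with backtracking; it terminates here because the toy grammar has no left-recursive rules.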
Methods
Developing an algorithm that translates Roman Urdu into English effectively is a difficult task.
Step 1: Get a Roman Urdu sentence as input from the user.
Step 2: Split the input sentence into words and determine the POS tag of each word.
Step 4: Find the English words that correspond to the Roman Urdu words.
Step 5: Tokenizing each sentence a.
Methodology
Translation is the process of converting the source language (Roman Urdu) into the target language (English).
Scanner
The scanner is the first phase of Transtech.
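Steps 1, 2 and 4 and the scanner phase can be sketched as a tokenizer followed by a lexicon lookup. The lexicon entries, the tag names, the "UNK" fallback and the function names are assumptions for illustration only; they stand in for the knowledge base described earlier.

```python
import re

# Hypothetical lexicon: Roman Urdu word -> (POS tag, English gloss).
LEXICON = {
    "wo": ("PRON", "he/she"),
    "bohat": ("ADV", "very"),
    "achay": ("ADJ", "good"),
    "kapre": ("NOUN", "clothes"),
    "pehnti": ("VERB", "wear"),
    "hai": ("AUX", "is"),
}


def scan(sentence):
    """Split a Roman Urdu sentence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z]+", sentence.lower())


def tag_and_gloss(tokens):
    """Attach a POS tag and an English gloss to each token; unknown words get 'UNK'."""
    return [(tok,) + LEXICON.get(tok, ("UNK", None)) for tok in tokens]


for row in tag_and_gloss(scan("Wo bohat achay kapre pehnti hai.")):
    print(row)
```

Word-by-word glosses are only raw material; reordering and verb selection still have to happen in the later phases.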
Spell checker and learning agent
The spell checker is embedded in the scanner and checks the spelling of each token against the data available in the knowledge base model.
POS tagger
Parsing is the task of determining the syntax of an input sentence.
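One simple way to check a token against known words is fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library; this is a stand-in technique, since the article does not state which matching method Transtech uses, and the vocabulary, cutoff value and function name are assumptions. The learning agent is not modelled here.

```python
import difflib

# Hypothetical vocabulary standing in for the knowledge base model.
VOCABULARY = ["wo", "bohat", "achay", "kapre", "pehnti", "hai"]


def check_spelling(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the token if known, else the closest known word, else None."""
    token = token.lower()
    if token in vocabulary:
        return token
    # n=1 keeps only the best candidate; cutoff rejects weak matches.
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(check_spelling("pehenti"))
print(check_spelling("xyz"))
```

A fairly high cutoff keeps short words such as "wo" and "hai" from being matched to unrelated tokens; the right value would need tuning on real data.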
Translator
It is the third phase of Transtech, which performs meaningful type checking and semantic analysis.
Which market did you go
Wo bohat achay kapre pehnti hai | She wears nice clothes many | Wear good clothes | She wears very good clothes
Imran waqt par ghar nahi pohanchta hai | He does not come home on time | Imran does not know home at time | Imran do not reach home on time
Ali ajkal bohat pareshan hai | Many consignment Ali today | Eli is a booming trend today | Ali is very upset now-a-days
Areeba khamoshi se apna kaam kar rahi hai | Areeba quietly doing its job | Aurabagh is doing his job quietly | Areeba is doing work silently
Declarations
Author contribution statement
Hafsa Masroor, Maryam Feroz: Contributed reagents, materials, analysis tools or data; Wrote the paper.
Muhammad Saeed: Conceived and designed the experiments.
Kamran Ahsan: Performed the experiments.
Khawar Islam: Analyzed and interpreted the data.
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Competing interest statement
The authors declare no conflict of interest.
Additional information
No additional information is available for this paper.
References
Urdu language processing: a survey. Artificial Intelligence Review.
Ahmed Tafseer, Hautli Annette. Developing a basic lexical resource for Urdu using Hindi WordNet. Proceedings of CLT.
Abbas Qaiser. Semi-semantic part of speech annotation and evaluation.
Visweswariah K. Urdu and Hindi: translation and sharing of linguistic resources. Association for Computational Linguistics; August.
Becker Dara, Riaz Kashif. Association for Computational Linguistics.
Adeeba F. Experiences in building the UrduWordNet. Proceedings of the 9th Workshop on Asian Language Resources.
Roxas R. Proceedings of the 9th Workshop on Asian Language Resources.