Konkani Text Corpus project launched to preserve language online

Mangaluru: To preserve the original language and ensure its presence on digital platforms, the World Konkani Centre, Shaktinagar, is undertaking the Konkani Text Corpus Project.

The project, which is already underway and is likely to be completed within a few months, aims to add 2,000 short sample write-ups of 100 words each in GSB Konkani, along with equivalent translations in English. The corpus will be available online, accessible from anywhere in the world.

B Devadas Pai, coordinator of the project, explained the need to go digital with the Konkani Text Corpus. He said that before Portuguese rule began in Goa, Konkani was the spoken language of those who migrated southwards. Over the past five to six centuries, it has been losing its linguistic documentation. As a result, the GSB Konkani people here have been distancing themselves from the Konkani of Goa. This unique dialect is considered the original form of Konkani, closely related to Prakrit. The World Konkani Centre is now actively working in this direction and has undertaken a project to prepare a large-scale text corpus.

The project involves collecting around 2,000 samples from sources such as Konkani traditions, the day-to-day life of GSB Konkani speakers, wedding rituals, folk songs, and selected parts of literary works. It will also include subtopics such as cultural rituals, funeral rites, naming ceremonies for babies, and lullabies sung while rocking a cradle.

“We have selected an age group above 40 to 60 years for the project, who will write and provide the sample text. The exercise will provide us with Konkani sentences as well as a large vocabulary collection and help to preserve the language,” he said.

To carry out the project, selected individuals from various parts of undivided Dakshina Kannada, including Konkani graduates, postgraduates, and literature enthusiasts, have been trained over several rounds to prepare 100-word articles from various fields.

Each article will have a headline of at least four words, so that it is easy to find while browsing. These writings will be in the Devanagari script registered with the UGC, accompanied by an English translation. Nandagopal Shenoy, the president of the World Konkani Centre, has taken a special interest in this initiative, according to Pai. The corpus, collected from various sources, will assist in studying which words are most frequently used in the language.