IndoWordNet

IndoWordNet is a linked lexical knowledge base of wordnets of 18 scheduled languages of India, viz., Assamese, Bangla, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Meitei, Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu and Urdu.

Background

In the early 1990s, the wordnet for English, called the Princeton WordNet, was created at Princeton University by George Miller and Christiane Fellbaum, who went on to receive the prestigious Antonio Zampolli Prize in 2006. It was followed by EuroWordNet, a collection of wordnets for European languages, created in 1998. Wordnets are now essential resources for natural language processing, information extraction, word sense disambiguation and other computations involving text.

Importance of Indian languages

Indian languages form a very significant part of the world's language landscape. Four major language families are represented in the Indian subcontinent: Indo-European, Dravidian, Tibeto-Burman and Austroasiatic. Several Indian languages rank among the most widely spoken in the world; according to the List of languages by number of native speakers, Hindi-Urdu ranks 5th, Bangla 7th and Marathi 12th. Creating wordnets for Indian languages is therefore a highly important techno-scientific and linguistic project.

Genesis of Indian language wordnets

The project took off in 2000, when the Natural Language Processing group at the Center for Indian Language Technology (CFILT), Department of Computer Science and Engineering, IIT Bombay, began creating the Hindi WordNet. It was made publicly available in 2006 under the GNU license. The Hindi WordNet was created with support from the TDIL (Technology Development for Indian Languages) project of the Ministry of Communications and Information Technology, India, and partially from the Ministry of Human Resource Development, India.
Wordnets of other Indian languages then followed. The large nationwide effort to build them is called the IndoWordNet project. These wordnets are being created by the expansion approach, taking the Hindi WordNet as the source. The Hindi WordNet itself was built from first principles, following the same method as the Princeton WordNet for English, and was the first wordnet for an Indian language.
The Polish WordNet is being mapped to the Princeton WordNet following the strategy used by IndoWordNet.
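The expansion approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not IndoWordNet's actual tooling: the `Synset` class, the `expand` function, the numeric synset and hypernym IDs and the example words are all hypothetical. The point it shows is that a target-language synset is created by supplying local words for an existing Hindi synset, reusing that synset's ID and semantic relations instead of building them from first principles.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Synset:
    synset_id: int              # hypothetical ID, shared by all synsets linked to one concept
    language: str               # e.g. "hin" for Hindi, "mar" for Marathi
    words: Tuple[str, ...]      # ordered from most to least common usage
    hypernym_id: Optional[int]  # semantic relation, inherited during expansion


def expand(source: Synset, language: str, target_words: List[str]) -> Optional[Synset]:
    """Create a target-language synset from a Hindi source synset (sketch).

    The lexicographer supplies only the target-language words; the ID and the
    hypernym relation are copied from the source. An empty word list means the
    target language has no word for the concept, so no synset is created.
    """
    if not target_words:
        return None
    return Synset(source.synset_id, language, tuple(target_words), source.hypernym_id)


# Illustrative data: a Hindi synset for 'family' with a hypothetical hypernym ID.
hindi = Synset(synset_id=101, language="hin", words=("परिवार", "कुटुंब"), hypernym_id=100)
marathi = expand(hindi, "mar", ["कुटुंब", "परिवार"])
print(marathi)
print(expand(hindi, "mar", []))  # None: no Marathi words supplied
```

Because the Marathi synset carries the Hindi synset's ID, it is automatically linked to the Hindi WordNet and, through it, to every other wordnet expanded from the same source.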

Principles of wordnet construction

The wordnets follow the principles of minimality, coverage and replaceability for synsets. Minimality means that a synset should contain at least a 'core' set of lexemes that uniquely identifies the concept it represents, e.g., {family, house} for the concept of 'family'. Coverage means that the synset should include all the words that express the concept in the language; e.g., the word 'menage' must appear in the 'family' synset, albeit towards the end, since its usage is rare. Replaceability means that the words towards the beginning of the synset should be able to replace one another in a reasonable amount of corpus text; e.g., 'house' and 'family' can replace each other in the sentence "she is from a noble house".
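The three principles can be illustrated with a small sketch. The `Synset` class, its `core_size` field and the `substitutions` helper are hypothetical names chosen for this example. Real replaceability judgements are made by lexicographers over corpora; this code only generates the candidate substitutions such a judgement would examine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Synset:
    """Hypothetical synset record: words are ordered from most to least
    frequent, and the first `core_size` words form the minimal core."""
    concept: str
    words: List[str]
    core_size: int

    def core(self) -> List[str]:
        # Minimality: the leading words that by themselves pin down the concept.
        return self.words[: self.core_size]

    def covers(self, word: str) -> bool:
        # Coverage: every word expressing the concept belongs somewhere in the synset.
        return word in self.words


def substitutions(sentence: str, synset: Synset) -> List[str]:
    """Replaceability: swap each core word found in the sentence for every other
    core word, producing variants to be checked for acceptability."""
    tokens = sentence.split()
    core = synset.core()
    variants = []
    for i, tok in enumerate(tokens):
        if tok in core:
            for other in core:
                if other != tok:
                    variants.append(" ".join(tokens[:i] + [other] + tokens[i + 1:]))
    return variants


family = Synset(concept="family", words=["family", "house", "menage"], core_size=2)
print(family.core())             # ['family', 'house']
print(family.covers("menage"))   # True
print(substitutions("she is from a noble house", family))
# ['she is from a noble family']
```

Keeping the words in frequency order lets one list serve both coverage (rare words such as 'menage' at the end) and minimality (the core at the front).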

Statistics of Indian language wordnets

The number of synsets in each language wordnet and the institute creating it are listed below:

Language	Synsets	Institute
Assamese	14958	Gauhati University, Guwahati, Assam
Bengali	36346	Indian Statistical Institute, Kolkata, West Bengal
Bodo	15785	Gauhati University, Guwahati, Assam
Gujarati	35599	Dharamsinh Desai University, Nadiad, Gujarat
Hindi	38607	IIT Bombay, Mumbai, Maharashtra
Kannada	20033	Mysore University, Mysore, Karnataka
Kashmiri	29469	Kashmir University, Srinagar, Jammu and Kashmir
Konkani	32370	Goa University, Taleigao, Goa
Malayalam	30060	Amrita University, Coimbatore, Tamil Nadu
Marathi	29674	IIT Bombay, Mumbai, Maharashtra
Meitei	16351	Manipur University, Imphal, Manipur
Nepali	11713	Assam University, Silchar, Assam
Oriya	35284	Hyderabad Central University, Hyderabad, Andhra Pradesh
Punjabi	32364	Thapar University and Punjabi University, Patiala, Punjab
Sanskrit	23140	IIT Bombay, Mumbai, Maharashtra
Tamil	25431	Tamil University, Thanjavur, Tamil Nadu
Telugu	21925	Dravidian University, Kuppam, Andhra Pradesh
Urdu	34280	Jawaharlal Nehru University, New Delhi
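As a quick arithmetic check on the table, the counts can be totalled and ranked. The snippet simply re-types the synset figures listed above; other releases of the wordnets will have different counts.

```python
# Synset counts re-typed from the table above (language -> number of synsets).
SYNSETS = {
    "Assamese": 14958, "Bengali": 36346, "Bodo": 15785, "Gujarati": 35599,
    "Hindi": 38607, "Kannada": 20033, "Kashmiri": 29469, "Konkani": 32370,
    "Malayalam": 30060, "Marathi": 29674, "Meitei": 16351, "Nepali": 11713,
    "Oriya": 35284, "Punjabi": 32364, "Sanskrit": 23140, "Tamil": 25431,
    "Telugu": 21925, "Urdu": 34280,
}

total = sum(SYNSETS.values())
# Sort languages by synset count, largest first.
ranked = sorted(SYNSETS.items(), key=lambda item: item[1], reverse=True)

print(f"Total synsets across {len(SYNSETS)} wordnets: {total}")
# Total synsets across 18 wordnets: 483389
print("Largest:", ranked[0])    # Largest: ('Hindi', 38607)
print("Smallest:", ranked[-1])  # Smallest: ('Nepali', 11713)
```

The total counts language-specific synsets, not distinct concepts: since the wordnets are expanded from Hindi, many synsets in different languages represent the same concept.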

Summary

IndoWordNet is highly similar to EuroWordNet. However, its pivot language is Hindi, which is in turn linked to the English WordNet. IndoWordNet also captures phenomena typical of Indian languages, such as complex predicates and causative verbs.
IndoWordNet is publicly browsable. The wordnet-building efforts that form the subcomponents of the IndoWordNet project are the North East WordNet project, the Dravidian WordNet project and the Indradhanush project, all of which are funded by the TDIL project.
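The pivot arrangement can be pictured with a toy model. This is a sketch under assumptions: the data layout, the synset ID 101, the `translate` and `english_link` helpers and the English identifier `family.n.01` are all hypothetical. It assumes that linked synsets in every language share the ID of the Hindi synset they were derived from, so words in two Indian languages connect through that shared ID, while the Hindi synset carries the link to English.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: language code -> {shared synset ID -> words}.
# Linked synsets reuse the ID of the Hindi (pivot) synset they came from.
WORDNETS: Dict[str, Dict[int, List[str]]] = {
    "hin": {101: ["परिवार", "कुटुंब"]},
    "mar": {101: ["कुटुंब", "परिवार"]},
    "ben": {101: ["পরিবার"]},
}

# Hypothetical link from a Hindi synset ID to an English WordNet synset name.
HINDI_TO_ENGLISH: Dict[int, str] = {101: "family.n.01"}


def synset_ids(word: str, lang: str) -> List[int]:
    """All shared synset IDs whose word list in `lang` contains `word`."""
    return [sid for sid, words in WORDNETS[lang].items() if word in words]


def translate(word: str, src: str, tgt: str) -> List[str]:
    """Words in `tgt` that share a synset with `word` in `src`, via the pivot ID."""
    results: List[str] = []
    for sid in synset_ids(word, src):
        results.extend(WORDNETS[tgt].get(sid, []))
    return results


def english_link(word: str, src: str) -> Optional[str]:
    """English synset reached through the Hindi pivot synset containing `word`."""
    for sid in synset_ids(word, src):
        if sid in HINDI_TO_ENGLISH:
            return HINDI_TO_ENGLISH[sid]
    return None


print(translate("পরিবার", "ben", "mar"))  # ['कुटुंब', 'परिवार']
print(english_link("कुटुंब", "mar"))      # family.n.01
```

With a single pivot, each new language needs only one set of links (to Hindi) rather than a separate mapping to every other language.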
