If artificial intelligence is the magic genie of the tech world, as Elon Musk put it during his interview with UK PM Rishi Sunak yesterday, then data is the magic lamp that houses the genie. Without the lamp, the genie can’t come out and grant wishes. Similarly, without high-quality data, AI models cannot do their job properly. Even before OpenAI’s ChatGPT set off the AI frenzy, 27-year-old Stanford alum Manu Chopra understood the importance of data annotation and labeling, and the potential of the industry. That’s why he started Karya, a startup that focuses on high-accuracy, non-English data annotation for companies building AI models for Indians who don’t understand English. Here are 10 key points about Manu and Karya.
10 key points on Karya
1. Manu Chopra is a 27-year-old computer engineer from Stanford University.
2. Chopra founded Karya in 2021. It is a data annotation company that sources, labels and annotates data.
3. Karya differentiates itself from other data vendors by offering its contractors – mostly women in rural communities – as much as 20 times the prevailing minimum wage.
4. Quality comes first. Karya promises its clients high-accuracy Indian-language data that enables AI models to learn without picking up biases, misinformation, or low-quality data.
5. “Every year, big tech companies spend billions of dollars collecting training data for their AI” and machine learning models, Manu Chopra told Bloomberg. Against spending on that scale, the higher cost of Karya's data is one that clients can justify and absorb easily.
6. Microsoft has used Karya to source local speech data for its AI products. The Bill & Melinda Gates Foundation is working with Karya to reduce gender biases in data that feeds into large language models, the technology underpinning AI chatbots.
7. Google is also working with Karya and other local partners to gather speech data in 85 Indian districts. Google plans to expand to every district, covering the majority language or dialect spoken in each, and to build a generative AI model for 125 Indian languages.
8. Karya employs 70 workers hired in Agara and neighboring villages to gather text, voice, and image data in India’s vernacular languages, as per Bloomberg.
9. Karya uses a user-friendly application and a work-from-anywhere model so that anyone who owns a smartphone can be a Karya worker! Karya connects digital workers to a variety of dataset demands across four areas.
10. According to the startup, it collects requirements from clients, breaks them down into bite-sized digital tasks, identifies the best-suited workers, collects and validates the data, and synthesizes the results into high-quality AI/ML training datasets. It also claims to source the data ethically.