Bhasha Setu (pan-India language data and AI initiatives)
ARTPARK (AI & Robotics Technology Park), IISc, Bangalore
As part of an ambitious India-wide program, you will help create unique, high-quality open-source speech and text datasets spanning every district to advance the state of the art in speech and language technology. The program also aims to build applications on these technological developments to pioneer Indic language inclusion in the country.
You will be part of the quality program, working closely with leading NLP researchers at IISc and the leadership of ARTPARK, in addition to NLP researchers at the world’s top tech companies.
Responsibilities:
Team Building
- Hire a team and create an inspiring environment with an open communication culture
Planning
- Develop strategies for ensuring data quality at scale
- Plan application-specific model development
- Delegate technical work across team members for smooth execution
- Prioritize across workstreams to achieve business objectives
- Design and develop ETL pipelines. Set up and maintain feature stores, databases, and data catalogues.
Execution
- Data preparation: build scripts to preprocess large amounts of data.
- Model training: build models with ASR toolkits such as Kaldi, ESPnet, and SpeechBrain.
- Research: implement and run experiments for research on ASR/signal representation and assessment.
- MLOps: deploy trained ASR models.