We are seeking a talented and experienced Data Scientist with 3+ years of experience and core expertise in building recommendation engines to join our dynamic team. In this role, you will develop, maintain, and enhance machine learning models that parse both text-based and image-based documents. Using libraries such as PyMuPDF, Tika, PDFMiner, and PyTesseract, you will contribute to the continuous improvement of our document processing capabilities.
Qualifications:
- A bachelor's/master's/Ph.D. degree in Computer Science, Statistics, Mathematics, or a related field.
- 2-4 years of experience in building ML models
- 3-4 years of experience applying analytical skills to extract insights from large datasets
- 4-7 years of experience with programming and query languages such as Python or SQL
- 3 years of experience with data-related libraries such as Pandas, NumPy, or Matplotlib
- 3-5 years of experience with database management systems such as MySQL, PostgreSQL, or MongoDB
- Strong problem-solving skills and the ability to work well in a team environment
- Strong written and verbal communication skills
- Strong understanding of statistics, machine learning, deep learning, and data modeling
- Experience with cloud-based data platforms such as AWS or GCP is a plus
- Experience with big data technologies such as Hadoop and Spark is a plus
Key Responsibilities:
- Data Analysis and Interpretation: Collect, clean, and preprocess data from various sources, ensuring its integrity and reliability. Perform exploratory data analysis to identify patterns, trends, and relationships within the data. Apply statistical techniques and data visualization methods to effectively communicate findings to stakeholders.
- Model Development: Design and develop predictive and prescriptive models using advanced machine learning algorithms and statistical techniques. Evaluate and select appropriate modeling approaches based on the problem at hand. Optimize models for accuracy, performance, and scalability.
- Data Mining and Pattern Recognition: Identify relevant data sources and extract valuable insights by applying data mining techniques. Utilize pattern recognition algorithms to discover hidden patterns, anomalies, and trends within large datasets. Develop strategies for feature selection and dimensionality reduction.
- Machine Learning Implementation: Build and deploy machine learning models into production systems. Collaborate with software engineers and IT teams to integrate models into existing workflows or develop new applications. Ensure the models are scalable, efficient, and maintainable.