Data Scientist - GenAI Focus

Description

About the Role

We are looking for a Data Scientist specialized in Generative AI (GenAI) to help design and implement intelligent, LLM-based systems for real-world applications. You will work closely with backend engineers and product teams to build GenAI pipelines that are explainable, scalable, and grounded in data integrity. This role requires both deep technical knowledge and hands-on implementation skills.

Key Responsibilities

● Design and implement retrieval-augmented generation (RAG) pipelines using LangChain or similar frameworks.

● Fine-tune or orchestrate large language models (LLMs) for internal knowledge, semantic search, summarization, Q&A, or document classification.

● Build, transform, and manage custom embedding datasets, and evaluate relevance, latency, and recall.

● Optimize prompt strategies and conduct A/B testing for response quality and consistency.

● Collaborate with engineering to expose GenAI services via APIs and integrate with production systems.

● Monitor drift, quality, and fairness of model predictions and recommendations.

● Contribute to establishing best practices in GenAI explainability, observability, and safety.

Required Skills

● 6 months to 2+ years of hands-on experience in data science, ML/NLP, or AI engineering roles.

● Strong Python skills, with experience using Hugging Face Transformers, LangChain, or LlamaIndex.

● Solid understanding of RAG architecture, prompt engineering, tokenization, and embeddings.

● Experience with vector databases and similarity-search libraries such as FAISS, Weaviate, Pinecone, or Vespa.

● Familiarity with OpenAI, Claude, Gemini, or custom-hosted LLM APIs.

● Good knowledge of version control (Git), Python virtual environments, and reproducibility tools (e.g., DVC, MLflow).

Nice to Have

● Experience evaluating GenAI output via human feedback loops or automated metrics (BLEU, ROUGE, cosine similarity).

● Exposure to Databricks, Gradio, or similar tools for prototyping AI apps.

● Experience deploying models with FastAPI or Flask, or serving them through API gateways.

● Background in data privacy, hallucination detection, or model alignment.