Description
About the Role 

We are looking for a Data Scientist specializing in Generative AI (GenAI) to help
design and implement intelligent, LLM-based systems for real-world applications. You
will work closely with backend engineers and product teams to build GenAI pipelines
that are explainable, scalable, and grounded in data integrity. This role requires both
deep technical knowledge and hands-on implementation skills.
Key Responsibilities
● Design and implement retrieval-augmented generation (RAG) pipelines using
LangChain or similar frameworks.
● Fine-tune or orchestrate large language models (LLMs) for internal knowledge,
semantic search, summarization, Q&A, or document classification.
● Build, transform, and manage custom embedding datasets, and evaluate
relevance, latency, and recall.
● Optimize prompt strategies and conduct A/B testing for response quality and
consistency.
● Collaborate with engineering to expose GenAI services via APIs and integrate
with production systems.
● Monitor drift, quality, and fairness of model predictions and recommendations.
● Contribute to establishing best practices in GenAI explainability, observability,
and safety.
Required Skills
● Six months to 2+ years of hands-on experience in data science, ML/NLP, or AI
engineering roles.
● Strong Python skills, with experience using Hugging Face Transformers,
LangChain, or LlamaIndex.
● Solid understanding of RAG architecture, prompt engineering, tokenization, and
embeddings.
● Experience with vector databases like FAISS, Weaviate, Pinecone, or Vespa.
● Familiarity with OpenAI, Claude, Gemini, or custom-hosted LLM APIs.
● Good knowledge of version control (Git), Python virtual environments, and
reproducibility tools (e.g., DVC, MLflow).
Nice to Have

● Experience evaluating GenAI output via human feedback loops or automated
metrics (BLEU, ROUGE, cosine similarity).
● Exposure to Databricks, Gradio, or similar for prototyping AI apps.
● Experience deploying models with FastAPI or Flask, or serving them behind API
gateways.
● Background in data privacy, hallucination detection, or model alignment.