Description
We are looking for a Backend Engineer who specializes in Generative AI infrastructure
and system design. In this role, you’ll build the backbone of our LLM-driven
applications: the APIs, orchestration layers, performance optimizations, and secure,
scalable deployments behind our AI-powered features.
You’ll collaborate with AI/ML engineers, product managers, and frontend developers to
bring GenAI to production in real-world, multi-tenant environments.

Core Responsibilities

1. LLM Serving & API Design
● Design and implement RESTful or gRPC APIs to expose GenAI capabilities.
● Integrate with managed inference endpoints (OpenAI, Bedrock, Vertex AI, Hugging
Face) or host models locally.
● Serve fine-tuned/local models via FastAPI, TGI, vLLM, Ray Serve, or Triton
Inference Server.
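To make the first responsibility concrete, here is a minimal sketch of the request/response contract such an API might expose. The dataclasses, the `make_completion_handler` factory, and the echo backend are all illustrative inventions; in production the handler body would sit behind a FastAPI route and the backend would be vLLM, TGI, or a hosted endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CompletionRequest:
    prompt: str
    max_tokens: int = 256
    model: str = "default"

@dataclass
class CompletionResponse:
    text: str
    model: str
    usage: dict = field(default_factory=dict)

def make_completion_handler(infer: Callable[[str, int], str], model_name: str):
    """Wrap a raw inference callable in a typed request/response contract,
    the way a FastAPI route body would."""
    def handle(req: CompletionRequest) -> CompletionResponse:
        if not req.prompt.strip():
            raise ValueError("prompt must be non-empty")
        text = infer(req.prompt, req.max_tokens)
        return CompletionResponse(text=text, model=model_name,
                                  usage={"prompt_chars": len(req.prompt)})
    return handle

# Stubbed backend standing in for a real vLLM/TGI/OpenAI client:
echo_backend = lambda prompt, max_tokens: prompt.upper()[:max_tokens]
handler = make_completion_handler(echo_backend, "stub-echo")
resp = handler(CompletionRequest(prompt="hello world"))
```

Keeping the handler independent of the backend makes it trivial to swap a hosted endpoint for a locally served model later.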

2. System Architecture
● Design backend systems for GenAI workflows: RAG pipelines, agent
orchestration, human-in-the-loop feedback.
● Architect scalable microservices or serverless backends (AWS Lambda,
Fargate, GKE, etc.).
● Ensure clean separation between model logic, business logic, and UI
integration.
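The "clean separation" point above can be sketched with a structural interface between the model layer and the business layer. `ModelClient`, `SummaryService`, and `FakeClient` are hypothetical names used only for illustration.

```python
from typing import Protocol

class ModelClient(Protocol):
    """Model layer: knows how to call an inference backend, nothing else."""
    def complete(self, prompt: str) -> str: ...

class SummaryService:
    """Business layer: prompt construction and policy live here,
    independent of which backend actually serves the model."""
    def __init__(self, client: ModelClient, max_chars: int = 2000):
        self.client = client
        self.max_chars = max_chars

    def summarize(self, document: str) -> str:
        prompt = f"Summarize:\n{document[: self.max_chars]}"
        return self.client.complete(prompt)

class FakeClient:
    """Deterministic stand-in so the service can be unit-tested offline."""
    def complete(self, prompt: str) -> str:
        return "summary:" + str(len(prompt))

svc = SummaryService(FakeClient())
out = svc.summarize("some long document text")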

3. Performance & Scalability
● Optimize model throughput and latency via batching, streaming (SSE/gRPC),
Redis caching, and load balancing.
● Implement fallback strategies based on confidence scores or error modes.
● Monitor and balance GPU/CPU resources for cost and performance.
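The fallback-strategy bullet above could be prototyped as a router that retries on a secondary model when the primary errors out or reports low confidence. The `ModelResult` type, the confidence threshold, and the stub callables are all assumptions for the sketch; real confidence would come from logprobs or a scoring head.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelResult:
    text: str
    confidence: float  # assumed to come from logprobs or a scoring head

def answer_with_fallback(primary: Callable[[str], ModelResult],
                         fallback: Callable[[str], ModelResult],
                         prompt: str,
                         min_confidence: float = 0.7) -> ModelResult:
    """Route to a fallback model when the primary model errors out
    or reports confidence below the threshold."""
    try:
        result = primary(prompt)
        if result.confidence >= min_confidence:
            return result
    except Exception:
        pass  # error mode: fall through to the fallback model
    return fallback(prompt)

# Stub models for demonstration:
primary = lambda p: ModelResult("unsure", 0.3)
fallback = lambda p: ModelResult("confident answer", 0.9)
res = answer_with_fallback(primary, fallback, "What is RAG?")
```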

4. Data & Pipeline Integration
● Connect to vector stores (Pinecone, Weaviate, OpenSearch, Qdrant) for
retrieval workflows.
● Interface with relational and NoSQL databases for user context, metadata, and document logs.
● Integrate with object stores (S3, GCS) to manage files, chunking, embeddings, and pipeline triggers.
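As a small example of the chunking step mentioned above, here is one common approach: fixed-size chunks with overlap, produced before embedding documents into a vector store. The sizes are illustrative defaults, not values prescribed by this role.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into fixed-size, overlapping chunks before
    embedding them into a vector store; overlap preserves context
    across chunk boundaries."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

parts = chunk_text("abcdefghij", size=4, overlap=2)
```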

5. Authentication, Authorization & Rate Limiting
● Implement JWT, OAuth2, API key mechanisms.
● Enable RBAC/ABAC, tenant isolation, and quota control.
● Track usage by user/tenant and apply rate-limiting policies.

Collaboration Responsibilities
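The per-tenant quota and rate-limiting duties in section 5 could be prototyped as a token bucket keyed by tenant id. The class name and the rate/burst parameters are illustrative, not prescribed by this role.

```python
import time
from collections import defaultdict

class TenantRateLimiter:
    """Token-bucket rate limiter keyed by tenant id: each tenant's
    bucket refills at `rate_per_sec` up to a `burst` ceiling."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = defaultdict(lambda: float(burst))
        self.last = defaultdict(time.monotonic)

    def allow(self, tenant: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last[tenant]
        self.last[tenant] = now
        self.tokens[tenant] = min(self.burst,
                                  self.tokens[tenant] + elapsed * self.rate)
        if self.tokens[tenant] >= 1.0:
            self.tokens[tenant] -= 1.0
            return True
        return False

# With no refill and a burst of 2, the third call in a row is denied:
limiter = TenantRateLimiter(rate_per_sec=0.0, burst=2)
results = [limiter.allow("acme") for _ in range(3)]
```

In a multi-instance deployment the bucket state would live in Redis rather than in process memory.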

6. With Data Scientists & ML Engineers
● Wrap models as scalable production APIs.
● Manage inference pipelines, versioning, and deployment.
● Implement observability and tracing for model debugging.

7. With Product & Frontend Teams
● Translate user experiences into backend APIs: chat, search, summarization, and co-pilot UX.
● Support real-time GenAI interactions by managing session state, persistent
memory, and chat history.
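Managing session state and chat history, as described above, often comes down to keeping a bounded window of recent turns per session so prompts fit the model's context window. The class below is a sketch; the turn limit is an illustrative choice, and a production store would be backed by Redis or a database.

```python
from collections import defaultdict, deque

class ChatHistory:
    """Keep the last N turns per session so prompts stay within the
    model's context window; older turns are evicted automatically."""
    def __init__(self, max_turns: int = 20):
        self.max_turns = max_turns
        self._sessions = defaultdict(lambda: deque(maxlen=max_turns))

    def append(self, session_id: str, role: str, content: str) -> None:
        self._sessions[session_id].append({"role": role, "content": content})

    def window(self, session_id: str) -> list[dict]:
        """Return the retained turns, oldest first."""
        return list(self._sessions[session_id])

hist = ChatHistory(max_turns=2)
hist.append("s1", "user", "hi")
hist.append("s1", "assistant", "hello")
hist.append("s1", "user", "how are you?")  # evicts the first turn
```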

Monitoring & Observability
● Set up monitoring via ELK, Prometheus, Grafana.
● Implement tracing and logging with OpenTelemetry or custom middleware.
● Ensure system observability for compliance, debugging, and auditability.
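The "custom middleware" option above can be as simple as a decorator that tags each request with an id and logs latency. This is a stand-in sketch for what OpenTelemetry spans would provide; the logger name and field names are illustrative.

```python
import functools
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.backend")

def traced(fn):
    """Middleware-style decorator: attach a request id and record
    latency, even when the wrapped call raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        request_id = uuid.uuid4().hex[:8]
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("request_id=%s fn=%s latency_ms=%.1f",
                     request_id, fn.__name__, elapsed_ms)
    return wrapper

@traced
def generate(prompt: str) -> str:
    return prompt[::-1]  # stand-in for a model call

out = generate("abc")
```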

Tech Stack

Layer            Technologies
Languages        Python, Node.js, Go, Java
Frameworks       FastAPI, Flask, Express.js, Spring Boot
LLM Serving      vLLM, TGI, Ray Serve, LangChain, Haystack, LlamaIndex
Databases        PostgreSQL, MongoDB, Redis
Vector Stores    Pinecone, Weaviate, OpenSearch, Qdrant
DevOps / Infra   Docker, Kubernetes, Terraform, Helm
CI/CD & Testing  GitHub Actions, Jenkins, Postman, pytest
Cloud Platforms  AWS (Lambda, ECS), GCP (GKE), Azure

Requirements
● 0-2 years of backend development experience, including cloud-native service
design.
● Hands-on experience deploying GenAI models, serving infrastructure, and
LLM APIs.
● Ability to own end-to-end backend architecture for scalable, multi-user, AI-driven
platforms.