Description
We are looking for a Backend Engineer who specializes in Generative AI infrastructure
and system design. In this role, you’ll build the backbone of our LLM-driven
applications: the APIs, orchestration layers, performance optimizations, and secure,
scalable deployments behind our AI-powered features.
You’ll collaborate with AI/ML engineers, product managers, and frontend developers to
bring GenAI to production in real-world, multi-tenant environments.

Core Responsibilities

1. LLM Serving & API Design
● Design and implement RESTful or gRPC APIs to expose GenAI capabilities.
● Integrate with managed inference endpoints (OpenAI, Bedrock, Vertex AI, Hugging
Face) or host models locally.
● Serve fine-tuned/local models via FastAPI, TGI, vLLM, Ray Serve, or Triton
Inference Server.
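To make the first responsibility concrete, here is a minimal sketch of the request/response contract such an API might expose. The dataclasses, the `make_completion_handler` factory, and the echo backend are all illustrative inventions; in production the handler body would sit behind a FastAPI route and the backend would be vLLM, TGI, or a hosted endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CompletionRequest:
    prompt: str
    max_tokens: int = 256
    model: str = "default"

@dataclass
class CompletionResponse:
    text: str
    model: str
    usage: dict = field(default_factory=dict)

def make_completion_handler(infer: Callable[[str, int], str], model_name: str):
    """Wrap a raw inference callable in a typed request/response contract,
    the way a FastAPI route body would."""
    def handle(req: CompletionRequest) -> CompletionResponse:
        if not req.prompt.strip():
            raise ValueError("prompt must be non-empty")
        text = infer(req.prompt, req.max_tokens)
        return CompletionResponse(text=text, model=model_name,
                                  usage={"prompt_chars": len(req.prompt)})
    return handle

# Stubbed backend standing in for a real vLLM/TGI/OpenAI client:
echo_backend = lambda prompt, max_tokens: prompt.upper()[:max_tokens]
handler = make_completion_handler(echo_backend, "stub-echo")
resp = handler(CompletionRequest(prompt="hello world"))
```

Keeping the handler independent of the backend makes it trivial to swap a hosted endpoint for a locally served model later.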

2. System Architecture
● Design backend systems for GenAI workflows: RAG pipelines, agent
orchestration, human-in-the-loop feedback.
● Architect scalable microservices or serverless backends (AWS Lambda,
Fargate, GKE, etc.).
● Ensure clean separation between model logic, business logic, and UI
integration.
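The "clean separation" point above can be sketched with a structural interface between the model layer and the business layer. `ModelClient`, `SummaryService`, and `FakeClient` are hypothetical names used only for illustration.

```python
from typing import Protocol

class ModelClient(Protocol):
    """Model layer: knows how to call an inference backend, nothing else."""
    def complete(self, prompt: str) -> str: ...

class SummaryService:
    """Business layer: prompt construction and policy live here,
    independent of which backend actually serves the model."""
    def __init__(self, client: ModelClient, max_chars: int = 2000):
        self.client = client
        self.max_chars = max_chars

    def summarize(self, document: str) -> str:
        prompt = f"Summarize:\n{document[: self.max_chars]}"
        return self.client.complete(prompt)

class FakeClient:
    """Deterministic stand-in so the service can be unit-tested offline."""
    def complete(self, prompt: str) -> str:
        return "summary:" + str(len(prompt))

svc = SummaryService(FakeClient())
out = svc.summarize("some long document text")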

3. Performance & Scalability
● Optimize model throughput and latency via batching, streaming (SSE/gRPC),
Redis caching, and load balancing.
● Implement fallback strategies based on confidence scores or error modes.
● Monitor and balance GPU/CPU resources for cost and performance.
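The fallback-strategy bullet above could be prototyped as a router that retries on a secondary model when the primary errors out or reports low confidence. The `ModelResult` type, the confidence threshold, and the stub callables are all assumptions for the sketch; real confidence would come from logprobs or a scoring head.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelResult:
    text: str
    confidence: float  # assumed to come from logprobs or a scoring head

def answer_with_fallback(primary: Callable[[str], ModelResult],
                         fallback: Callable[[str], ModelResult],
                         prompt: str,
                         min_confidence: float = 0.7) -> ModelResult:
    """Route to a fallback model when the primary model errors out
    or reports confidence below the threshold."""
    try:
        result = primary(prompt)
        if result.confidence >= min_confidence:
            return result
    except Exception:
        pass  # error mode: fall through to the fallback model
    return fallback(prompt)

# Stub models for demonstration:
primary = lambda p: ModelResult("unsure", 0.3)
fallback = lambda p: ModelResult("confident answer", 0.9)
res = answer_with_fallback(primary, fallback, "What is RAG?")
```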

4. Data & Pipeline Integration
● Connect to vector stores (Pinecone, Weaviate, OpenSearch, Qdrant) for
retrieval workflows.
● Interface with relational and NoSQL databases for user context, metadata, and document logs.
● Integrate with object stores (S3, GCS) to manage files, chunking, embeddings, and pipeline triggers.
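As a small example of the chunking step mentioned above, here is one common approach: fixed-size chunks with overlap, produced before embedding documents into a vector store. The sizes are illustrative defaults, not values prescribed by this role.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into fixed-size, overlapping chunks before
    embedding them into a vector store; overlap preserves context
    across chunk boundaries."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

parts = chunk_text("abcdefghij", size=4, overlap=2)
```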

5. Authentication, Authorization & Rate Limiting
● Implement JWT, OAuth2, API key mechanisms.
● Enable RBAC/ABAC, tenant isolation, and quota control.
● Track usage by user/tenant and apply rate-limiting policies.

Collaboration Responsibilities
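The per-tenant quota and rate-limiting duties in section 5 could be prototyped as a token bucket keyed by tenant id. The class name and the rate/burst parameters are illustrative, not prescribed by this role.

```python
import time
from collections import defaultdict

class TenantRateLimiter:
    """Token-bucket rate limiter keyed by tenant id: each tenant's
    bucket refills at `rate_per_sec` up to a `burst` ceiling."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = defaultdict(lambda: float(burst))
        self.last = defaultdict(time.monotonic)

    def allow(self, tenant: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last[tenant]
        self.last[tenant] = now
        self.tokens[tenant] = min(self.burst,
                                  self.tokens[tenant] + elapsed * self.rate)
        if self.tokens[tenant] >= 1.0:
            self.tokens[tenant] -= 1.0
            return True
        return False

# With no refill and a burst of 2, the third call in a row is denied:
limiter = TenantRateLimiter(rate_per_sec=0.0, burst=2)
results = [limiter.allow("acme") for _ in range(3)]
```

In a multi-instance deployment the bucket state would live in Redis rather than in process memory.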

6. With Data Scientists & ML Engineers
● Wrap models as scalable production APIs.
● Manage inference pipelines, versioning, and deployment.
● Implement observability and tracing for model debugging.

7. With Product & Frontend Teams
● Translate user experiences into backend APIs: chat, search, summarization, and co-pilot UX.
● Support real-time GenAI interactions by managing session state, persistent
memory, and chat history.
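Managing session state and chat history, as described above, often comes down to keeping a bounded window of recent turns per session so prompts fit the model's context window. The class below is a sketch; the turn limit is an illustrative choice, and a production store would be backed by Redis or a database.

```python
from collections import defaultdict, deque

class ChatHistory:
    """Keep the last N turns per session so prompts stay within the
    model's context window; older turns are evicted automatically."""
    def __init__(self, max_turns: int = 20):
        self.max_turns = max_turns
        self._sessions = defaultdict(lambda: deque(maxlen=max_turns))

    def append(self, session_id: str, role: str, content: str) -> None:
        self._sessions[session_id].append({"role": role, "content": content})

    def window(self, session_id: str) -> list[dict]:
        """Return the retained turns, oldest first."""
        return list(self._sessions[session_id])

hist = ChatHistory(max_turns=2)
hist.append("s1", "user", "hi")
hist.append("s1", "assistant", "hello")
hist.append("s1", "user", "how are you?")  # evicts the first turn
```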

Monitoring & Observability
● Set up monitoring via ELK, Prometheus, Grafana.
● Implement tracing and logging with OpenTelemetry or custom middleware.
● Ensure system observability for compliance, debugging, and auditability.
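The "custom middleware" option above can be as simple as a decorator that tags each request with an id and logs latency. This is a stand-in sketch for what OpenTelemetry spans would provide; the logger name and field names are illustrative.

```python
import functools
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.backend")

def traced(fn):
    """Middleware-style decorator: attach a request id and record
    latency, even when the wrapped call raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        request_id = uuid.uuid4().hex[:8]
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("request_id=%s fn=%s latency_ms=%.1f",
                     request_id, fn.__name__, elapsed_ms)
    return wrapper

@traced
def generate(prompt: str) -> str:
    return prompt[::-1]  # stand-in for a model call

out = generate("abc")
```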

Tech Stack

Layer            Technologies
Languages        Python, Node.js, Go, Java
Frameworks       FastAPI, Flask, Express.js, Spring Boot
LLM Serving      vLLM, TGI, Ray Serve, LangChain, Haystack, LlamaIndex
Databases        PostgreSQL, MongoDB, Redis
Vector Stores    Pinecone, Weaviate, OpenSearch, Qdrant
DevOps / Infra   Docker, Kubernetes, Terraform, Helm
CI/CD & Testing  GitHub Actions, Jenkins, Postman, pytest
Cloud Platforms  AWS (Lambda, ECS), GCP (GKE), Azure

Requirements
● 0-2 years of backend development experience, including cloud-native service
design.
● Hands-on experience deploying GenAI models, serving infrastructure, and
LLM APIs.
● Ability to own end-to-end backend architecture for scalable, multi-user, AI-driven
platforms.