Job Description:
● Design, implement, and productionize multimodal RAG systems (text + vision) that ground LLM/VLM outputs in enterprise knowledge.
● Build agentic workflows that can perceive (images, PDFs, screenshots), reason, and act: function/tool calling, JSON‑schema actions, planning/reflection loops, and short/long‑term memory (see the tool-calling sketch after this list).
● Stand up and operate vector/search infrastructure for text and image embeddings (e.g., Postgres/pgvector, Elasticsearch/OpenSearch, Pinecone/Qdrant/FAISS) with hybrid retrieval and re‑rankers (see the hybrid-retrieval sketch after this list).
● Create robust ingestion for large and messy corpora: PDFs (scanned & born‑digital), images, HTML, logs, tables; apply OCR, layout analysis, chunking, and metadata enrichment (see the chunking sketch after this list).
● Implement online/offline evals for multimodal tasks: groundedness/faithfulness, retrieval precision/recall, VQA accuracy, OCR WER, table/diagram extraction quality, and latency/cost; wire them into CI (see the eval sketch after this list).
● Add guardrails and safety filters: policy checks, prompt hardening, schema/output validation, image moderation, PII redaction, and defenses against prompt injection and data exfiltration (see the guardrail sketch after this list).
● Optimize throughput and reliability: batching, caching (request/result/embedding), retries/timeouts/fallbacks, concurrency control, and GPU utilization (see the reliability sketch after this list).
● Run rapid experiments (A/B and canaries) to iterate on retrieval, prompts, re‑rankers, tools, routing, and multimodal prompting strategies.
● Instrument systems for observability (telemetry, tracing, cost/latency dashboards) and maintain SLOs.
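
The sketches below illustrate several of the bullets above; all are minimal and make assumptions that would need adapting to the real stack. First, the tool-calling pattern: the agent action is described by a JSON Schema, and the model's proposed arguments are validated before anything executes. The `search_knowledge_base` tool, its fields, and the argument layout are hypothetical, and the exact tool-definition wire format varies by provider.

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Hypothetical tool contract the model is prompted with; the JSON Schema doubles
# as the validation contract for anything the model asks to execute.
SEARCH_TOOL = {
    "name": "search_knowledge_base",
    "description": "Retrieve passages relevant to the user question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "top_k": {"type": "integer", "minimum": 1, "maximum": 20},
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}

def run_search(query: str, top_k: int) -> dict:
    # Placeholder for the real retrieval call (see the hybrid-retrieval sketch below).
    return {"query": query, "top_k": top_k, "passages": []}

def execute_tool_call(raw_arguments: str) -> dict:
    """Validate the model's proposed arguments against the schema, then dispatch."""
    args = json.loads(raw_arguments)
    try:
        validate(instance=args, schema=SEARCH_TOOL["parameters"])
    except ValidationError as err:
        # Return the error to the model so it can repair its call instead of crashing.
        return {"error": f"invalid tool arguments: {err.message}"}
    return run_search(args["query"], args.get("top_k", 5))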
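
Next, one way the hybrid retrieval could look on Postgres/pgvector, as a sketch: a dense vector query and a keyword full-text query fused with reciprocal rank fusion (RRF). The `chunks` table, its columns, the candidate depth of 50, and the RRF constant 60 are assumptions; a cross-encoder re-ranker would normally rescore the fused candidates.

import psycopg  # pip install "psycopg[binary]"

# Assumed schema: chunks(id text, content text, tsv tsvector, embedding vector(1024)).
HYBRID_SQL = """
WITH dense AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> %(qvec)s::vector) AS rnk
    FROM chunks
    ORDER BY embedding <=> %(qvec)s::vector
    LIMIT 50
),
sparse AS (
    SELECT id, ROW_NUMBER() OVER (
        ORDER BY ts_rank(tsv, websearch_to_tsquery('english', %(qtext)s)) DESC
    ) AS rnk
    FROM chunks
    WHERE tsv @@ websearch_to_tsquery('english', %(qtext)s)
    ORDER BY rnk
    LIMIT 50
)
SELECT c.id, c.content,
       COALESCE(1.0 / (60 + dense.rnk), 0) + COALESCE(1.0 / (60 + sparse.rnk), 0) AS rrf
FROM chunks c
LEFT JOIN dense ON dense.id = c.id
LEFT JOIN sparse ON sparse.id = c.id
WHERE dense.id IS NOT NULL OR sparse.id IS NOT NULL
ORDER BY rrf DESC
LIMIT %(top_k)s;
"""

def hybrid_retrieve(conn: psycopg.Connection, query: str,
                    query_embedding: list[float], top_k: int = 10):
    # pgvector accepts the '[x1,x2,...]' text form, so serialize the embedding explicitly.
    qvec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, {"qvec": qvec, "qtext": query, "top_k": top_k})
        return cur.fetchall()  # a cross-encoder re-ranker would rescore these candidates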
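
For the ingestion bullet, a sketch of the chunking and metadata-enrichment step. Fixed word windows with overlap are deliberately simplistic and the field names are illustrative; a production pipeline would split along layout structure (sections, tables, figures) and count model tokens rather than words.

def chunk_document(text: str, doc_id: str, source: str,
                   chunk_size: int = 300, overlap: int = 50) -> list[dict]:
    """Split a document into overlapping word-window chunks with citation-ready metadata."""
    words = text.split()
    chunks: list[dict] = []
    step = chunk_size - overlap
    for start in range(0, max(len(words), 1), step):
        piece = " ".join(words[start:start + chunk_size])
        if not piece:
            break
        chunks.append({
            "id": f"{doc_id}-{len(chunks)}",  # stable chunk id, usable as a citation target
            "text": piece,
            "metadata": {"source": source, "doc_id": doc_id, "word_offset": start},
        })
    return chunks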
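
For the eval bullet, a sketch of one offline retrieval metric written so it can run as a CI gate (pytest-style): mean recall@k over a small golden set of questions with known-relevant chunk ids. The golden examples, the `retrieve` stub, and the 0.85 floor are placeholders.

# Golden set and threshold are illustrative; `retrieve` stands in for the real retriever.
GOLDEN_SET = [
    {"question": "What is the refund window?", "relevant_ids": {"faq-12", "policy-3"}},
    {"question": "How do I rotate an API key?", "relevant_ids": {"kb-security-7"}},
]

def retrieve(question: str, k: int) -> list[str]:
    # Replace with the production retriever (hybrid search + re-ranker).
    return []

def mean_recall_at_k(k: int = 10) -> float:
    recalls = []
    for example in GOLDEN_SET:
        retrieved = set(retrieve(example["question"], k))
        relevant = example["relevant_ids"]
        recalls.append(len(retrieved & relevant) / len(relevant))
    return sum(recalls) / len(recalls)

def test_retrieval_recall_does_not_regress():
    # CI gate: fail the build if recall@10 drops below the agreed floor.
    assert mean_recall_at_k(10) >= 0.85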
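
For the guardrail bullet, a sketch of two output-side checks: strict schema validation of the model's JSON answer (Pydantic v2) and a crude PII redaction pass before text leaves the system. The field names and the email-only regex are illustrative; real PII detection needs far broader coverage.

import re

from pydantic import BaseModel, Field, ValidationError  # pip install pydantic

class GroundedAnswer(BaseModel):
    answer: str
    citations: list[str] = Field(min_length=1)  # require at least one cited source id
    confidence: float = Field(ge=0.0, le=1.0)

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(text: str) -> str:
    # Mask obvious PII patterns (emails only here) before logging or returning the text.
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def validate_model_output(raw_json: str) -> GroundedAnswer | None:
    """Reject malformed or uncited answers instead of passing them downstream."""
    try:
        parsed = GroundedAnswer.model_validate_json(raw_json)
    except ValidationError:
        return None  # caller can retry, fall back, or escalate
    parsed.answer = redact_pii(parsed.answer)
    return parsed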
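
For the throughput/reliability bullet, a sketch of the wrapper around a model call: a request/result cache, bounded retries with capped exponential backoff, and a graceful fallback to a smaller model. `call_model`, the model names, and the backoff numbers are placeholders.

import hashlib
import time

_CACHE: dict[str, str] = {}  # request/result cache; in production this would be Redis or similar

def call_model(model: str, prompt: str, timeout_s: float) -> str:
    # Stand-in for the real LLM client, which should enforce timeout_s on the request.
    return f"[{model}] response to: {prompt[:40]}"

def cached_generate(prompt: str,
                    primary: str = "primary-model",
                    fallback: str = "smaller-fallback-model",
                    max_retries: int = 3,
                    timeout_s: float = 20.0) -> str:
    key = hashlib.sha256(f"{primary}:{prompt}".encode()).hexdigest()
    if key in _CACHE:  # cache hit: skip the model call entirely
        return _CACHE[key]

    for attempt in range(max_retries):
        try:
            result = call_model(primary, prompt, timeout_s)
            _CACHE[key] = result
            return result
        except Exception:
            time.sleep(min(2 ** attempt, 8))  # exponential backoff, capped at 8s

    # Primary exhausted its retry budget: degrade gracefully to the fallback model.
    return call_model(fallback, prompt, timeout_s)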
Qualifications:
● 2+ years building applied ML/LLM systems in production. Strong Python; TypeScript familiarity is a plus.
● Shipped features using embeddings, retrieval, and re‑ranking; comfortable with structured outputs and function/tool calling.
● Hands‑on experience with vision‑language models (VLMs) and vision pipelines (e.g., OpenAI Vision/GPT‑4o‑class, Gemini‑Vision‑class, Claude‑with‑vision, LLaVA/BLIP/CLIP, LayoutLMv3/Donut).
● Practical understanding of multimodal RAG design trade‑offs (latency, recall, cost, context limits) and evaluation beyond "vibes".
● Proficiency with one or more vector/search stacks: Postgres/pgvector, Elasticsearch/OpenSearch, Pinecone, Qdrant, FAISS.
● Familiarity with orchestration/tooling (LangChain, LangGraph, LlamaIndex) and building chains/agents for text + vision.
● Experience with OCR & document understanding (e.g., Tesseract/PaddleOCR, layout‑parser/DocAI‑style tooling) and PDF/image preprocessing (see the OCR sketch after this list).
● Solid engineering practices: Git/CI, testing, code review, observability, secure deployment (Docker/Kubernetes familiarity helpful).
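
A sketch of the OCR/preprocessing step from the document-understanding bullet above: grayscale and crude binarization with Pillow, then text extraction with Tesseract via pytesseract. The threshold value and the input filename are illustrative, and real pipelines add deskewing and layout analysis before OCR.

import pytesseract  # pip install pytesseract (requires the tesseract binary)
from PIL import Image, ImageOps  # pip install pillow

def ocr_page(path: str) -> str:
    image = Image.open(path)
    gray = ImageOps.grayscale(image)  # drop color noise before thresholding
    binary = gray.point(lambda px: 255 if px > 180 else 0)  # crude fixed-threshold binarization
    return pytesseract.image_to_string(binary, lang="eng")

if __name__ == "__main__":
    print(ocr_page("scanned_page.png"))  # hypothetical input file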