AI Engineer Roadmap 2026: From LLM APIs to Production (Step-by-Step)
A free, step-by-step AI engineering roadmap for 2026. Learn to build applications on foundation models: transformers, prompt engineering, evaluation, RAG, agents, finetuning, and production deployment. Grounded in Chip Huyen's AI Engineering, Stanford CS336, and MIT 6.S191, with hands-on RAG, agent, and evaluation projects to build the portfolio that lands an AI engineer job.
This roadmap was created by data engineering professionals with 67 hands-on tasks covering production-ready skills used by companies like Netflix, Airbnb, and Spotify. Master Python, OpenAI API, Anthropic API and 5 more technologies.
How long does it take? Engineers with Python experience typically complete this roadmap in 5-8 months studying part-time (10-15 hours/week), or about 3-4 months full-time. The 13 sections contain 67 hands-on tasks.
The 13 steps: (0) Prerequisites · (1) Deep Learning and Transformer Foundations · (2) Understanding Foundation Models · (3) Working with LLM APIs · (4) Prompt Engineering · (5) Evaluation · (6) Retrieval-Augmented Generation (RAG) · (7) AI Agents · (8) Finetuning · (9) Dataset Engineering · (10) Inference Optimization · (11) Production Architecture and Observability · (12) Portfolio and Job Search.
Skills You'll Learn
- Prompt engineering
- LLM evaluation
- Retrieval-Augmented Generation (RAG)
- AI agents and tool use
- Finetuning (LoRA/QLoRA)
- Inference optimization
- Guardrails and AI safety
- Production AI architecture
Tools You'll Use
- Python
- OpenAI API
- Anthropic API
- LangChain
- Hugging Face
- Vector databases
- PyTorch
- vLLM
Projects to Build
- Production RAG System with Retrieval Evaluation
Build a retrieval-augmented generation system over a real document set: chunking, embeddings, hybrid search with a reranker, grounded answers with citations, and a retrieval + faithfulness evaluation that proves it works.
- LLM Agent with Tools and Failure-Mode Evaluation
Build an agent that plans, calls real tools (function calling), manages memory, and recovers from failures, then evaluate it on its trajectory and failure modes, not just happy-path demos.
- LLM Evaluation Pipeline with Golden Dataset and LLM-as-a-Judge
Build a reusable evaluation pipeline for LLM applications: a golden dataset, automated scoring with LLM-as-a-judge, and regression testing you can point at any prompt or model change to catch quality drops before users do.
Learning Resources
Step 0: Prerequisites
Step 1: Deep Learning and Transformer Foundations
Step 2: Understanding Foundation Models
Step 3: Working with LLM APIs
Step 4: Prompt Engineering
Step 5: Evaluation
Step 6: Retrieval-Augmented Generation (RAG)
Step 7: AI Agents
Step 8: Finetuning
Step 9: Dataset Engineering
Step 10: Inference Optimization
Step 11: Production Architecture and Observability
Step 12: Portfolio and Job Search
Curriculum Reference
A free preview of the learning material in this roadmap — the full reference for every section is available when you sign in. Click any task to expand it.
Step 0: Prerequisites
Get fluent in Python beyond the basics: functions, classes, type hints, virtual environments, and async/await for concurrent API calls
You do not need to be a Python expert to start, but AI engineering leans on a few patterns more than typical scripting.
What to be comfortable with
- Type hints:
def embed(text: str) -> list[float]:— they make LLM client code and tool schemas readable and self-documenting - Dataclasses / Pydantic: model request and response shapes; Pydantic is the de facto way to validate structured LLM outputs
- Virtual environments:
python -m venv .venvandpip install, oruvfor speed — isolate every project - async/await: LLM calls are network-bound. Running 50 eval examples sequentially is slow;
asyncio.gatherruns them concurrently - Generators / streaming: token streams from LLM APIs arrive incrementally —
for chunk in stream:
Why it matters
Most AI engineering code is glue: call a model, validate its output, retry on failure, log the result. Clean Python with types and async turns a fragile demo into something you can ship and test.
- Async IO in Python: A Complete Walkthrough (Real Python) (documentation)
Build core machine learning literacy: supervised learning, train/validation/test splits, overfitting, and what "a model" is, without needing to train one from scratch
Work confidently with REST APIs and JSON, and learn to manage API keys and secrets with environment variables
Every AI app talks to model providers over HTTP with an API key. Leaking that key is the most common, most expensive beginner mistake.
Rules
- Never hardcode keys in source. Use environment variables:
os.environ["OPENAI_API_KEY"] - Never commit
.env— add it to.gitignore. Use.env.examplewith blank values for documentation - Rotate keys if one is ever exposed in a commit, screenshot, or log
- Set spend limits in the provider dashboard so a runaway loop cannot drain your budget
Reading responses
LLM APIs return JSON. You will parse fields like choices[0].message.content, usage.total_tokens, and stop_reason. Get comfortable inspecting JSON responses before building on top of them.
- An overview of HTTP (MDN Web Docs) (documentation)
Understand what AI engineering is, how it differs from ML engineering and full-stack engineering, and the three layers of the AI stack (application development, model development, infrastructure)
Chip Huyen's framing, which the rest of this roadmap follows: AI engineering is about building applications on top of foundation models that already exist, not training models from scratch.
The shift
| Traditional ML Engineering | AI Engineering |
|---|---|
| Start from data, train a model | Start from a pre-trained foundation model |
| Feature engineering, tabular data | Prompt engineering, context construction |
| Model training is the core work | Adaptation and evaluation are the core work |
| Weeks to a first model | Minutes to a first working prototype |
The three layers of the AI stack
- Application development — prompts, context, evaluation, the product. Where most AI engineers work.
- Model development — training, finetuning, dataset engineering, inference optimization.
- Infrastructure — serving, compute, monitoring.
Why this matters for your career
Because the model is pre-built, the differentiators are no longer 'can you train a model' but 'can you evaluate, ground, and ship one reliably.' That is why this roadmap spends entire sections on evaluation, RAG, agents, and guardrails.
Frequently Asked Questions
What does an AI engineer actually do?
An AI engineer builds applications on top of foundation models (LLMs and multimodal models) rather than training models from scratch. The day-to-day is prompt engineering, retrieval-augmented generation (RAG), building and evaluating agents, designing evaluation pipelines, adding guardrails, and shipping reliable, observable AI features to production. Chip Huyen frames it as adapting foundation models to real-world problems.
What is the difference between an AI engineer and a machine learning engineer?
ML engineering builds applications on traditional models, with more tabular data, feature engineering, and model training. AI engineering builds on top of pre-trained foundation models, with more prompt engineering, context construction, retrieval, and parameter-efficient finetuning. Most AI engineering work starts from a model that already exists and focuses on adapting and evaluating it for a specific use case.
Do I need a PhD or deep math to become an AI engineer?
No. You need solid Python, comfort with APIs, and a working understanding of how transformers and foundation models behave. This roadmap teaches the deep learning foundations you need (attention, tokenization, sampling) without requiring you to train a model from scratch. The biggest skills hiring managers look for in 2026 are RAG, agents, and evaluation.
What should I learn first for AI engineering?
Start with strong Python and the ability to call LLM APIs (OpenAI, Anthropic), then learn prompt engineering and evaluation. Evaluation is the single most underrated skill: nearly every senior AI engineer job description asks for experience designing eval pipelines, golden datasets, and LLM-as-a-judge workflows before any finetuning.
Which projects should an AI engineer build for a portfolio?
Build three end-to-end projects: a production RAG system with retrieval evaluation, an LLM agent with tool use and a failure-mode evaluation, and a reusable LLM evaluation pipeline with a golden dataset and LLM-as-a-judge. These map directly to what AI engineer job postings ask for and show you can ship reliable, evaluated AI systems, not just demos.