AWS Solutions Architect Interview Questions 2026

Solutions Architect interviews test your ability to design systems that are reliable, scalable, and cost-effective — and to defend your decisions under questioning. Here's what you need to know.

What Interviewers Actually Test

Three things set great Solutions Architect answers apart:

1. Tradeoff reasoning — Never just name a service. Explain why that service and what you gave up to choose it.

2. Failure mode thinking — Every design question has a "what if X fails?" follow-up. Address it proactively.

3. Cost awareness — Senior architects think about money. Mention cost implications of your choices.

High Availability & Fault Tolerance

1. What is the difference between High Availability and Fault Tolerance?

High Availability (HA): The system minimises downtime. It may briefly be unavailable during failover but recovers quickly. Target: 99.9% to 99.99% uptime.

Fault Tolerance: The system continues operating without interruption even when components fail, so users see no downtime at all. This requires redundant components that are already running, which makes it more expensive.

Examples:

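Multi-AZ RDS: HA. The primary fails and the standby is promoted, typically within 60-120 seconds (brief downtime)
Active-active multi-region: Fault tolerant. One region fails and the other regions, already serving traffic, absorb its share as soon as health checks and routing react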

When to use which: Most applications need HA. True fault tolerance is reserved for payment processing, life-critical systems, and applications where even 60 seconds of downtime is unacceptable.
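
To make the uptime targets above concrete, the sketch below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year, and the function name is illustrative rather than part of any AWS tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

At 99.99% the budget is under an hour per year, so a single 60-second failover already uses roughly 2% of it. That is the kind of number worth quoting when asked whether HA is "good enough".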

2. Design a highly available three-tier web application on AWS

Presentation tier (web/CDN):

CloudFront in front of everything — caches static assets globally, absorbs DDoS
ALB across two AZs minimum, targets in multiple AZs
EC2 Auto Scaling Group or ECS/EKS for the web tier

Application tier:

Internal ALB + Auto Scaling Group
Multi-AZ deployment (minimum 2 AZs, ideally 3)
EC2 instances or containers in private subnets

Data tier:

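Aurora MySQL/PostgreSQL with at least one replica in a second AZ: automatic failover
ElastiCache (Redis) for session storage and caching, with Multi-AZ automatic failover enabled
S3 for static files: designed for 11 nines (99.999999999%) of durability, with objects stored across at least 3 AZs in the Standard storage class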

Key architectural decisions to mention:

Private subnets for app and data tier — only ALB is in public subnets
NAT Gateway per AZ — single NAT Gateway is a single point of failure
Read replicas for Aurora to offload read traffic
Separate security groups per tier with minimal access
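
One way to back the multi-AZ choices with numbers is the standard series/parallel availability model. The sketch below assumes failures are independent and uses a made-up per-AZ availability of 99% for each tier; neither the figure nor the function names come from AWS.

```python
from math import prod


def parallel(availability: float, copies: int) -> float:
    """Availability of N redundant copies where any one copy suffices."""
    return 1 - (1 - availability) ** copies


def series(*availabilities: float) -> float:
    """Availability of a chain where every component must be up."""
    return prod(availabilities)


# Hypothetical figure: each tier is 99% available within a single AZ.
single_az_tier = 0.99

# Three tiers in one AZ: every tier must be up at the same time.
one_az = series(single_az_tier, single_az_tier, single_az_tier)

# Three tiers, each replicated across three AZs.
three_az = series(*(parallel(single_az_tier, 3) for _ in range(3)))

print(f"3 tiers, 1 AZ:  {one_az:.6f}")
print(f"3 tiers, 3 AZs: {three_az:.6f}")
```

Under these assumptions the single-AZ stack lands near 97%, while the three-AZ stack is above 99.999%. Real failures are rarely fully independent (shared deployments, shared dependencies), so treat the result as an upper bound rather than a promise.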

3. What is the difference between RTO and RPO?

RTO (Recovery Time Objective): How long can you be down? The maximum acceptable time to restore service after a failure.

RPO (Recovery Point Objective): How much data can you lose? The maximum acceptable window of data loss, measured as the time between the last recoverable copy of the data and the failure.

Examples:

RTO = 4 hours, RPO = 24 hours: A daily backup restored to a new server is acceptable
RTO = 5 minutes, RPO = 0: Need real-time replication and automatic failover

AWS disaster recovery strategies by RTO and RPO:

Strategy	RTO	RPO	Cost
Backup & Restore	Hours	Hours	Lowest
Pilot Light	10-30 min	Minutes	Low
Warm Standby	2-10 min	Seconds	Medium
Multi-Site Active-Active	Seconds	Near zero	Highest
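
The table can be read as a lookup: pick the cheapest strategy that still meets both targets. The sketch below does that with illustrative numeric bounds chosen to match the table's rough ranges (for example, "Hours" is taken as 4 hours of recovery time with daily backups). These bounds and all names are assumptions for the example, not AWS guarantees.

```python
from dataclasses import dataclass
from typing import Optional

HOUR, MINUTE = 3600, 60


@dataclass(frozen=True)
class Strategy:
    name: str
    worst_rto_s: int  # assumed upper bound on recovery time, in seconds
    worst_rpo_s: int  # assumed upper bound on data loss, in seconds


# Ordered cheapest first. Bounds are illustrative readings of the table;
# the active-active RPO is treated as 0 for this sketch.
STRATEGIES = [
    Strategy("Backup & Restore", 4 * HOUR, 24 * HOUR),
    Strategy("Pilot Light", 30 * MINUTE, 15 * MINUTE),
    Strategy("Warm Standby", 10 * MINUTE, 60),
    Strategy("Multi-Site Active-Active", 60, 0),
]


def cheapest_strategy(rto_s: int, rpo_s: int) -> Optional[Strategy]:
    """Return the cheapest strategy whose assumed bounds meet both targets."""
    for strategy in STRATEGIES:
        if strategy.worst_rto_s <= rto_s and strategy.worst_rpo_s <= rpo_s:
            return strategy
    return None  # no listed strategy satisfies the targets


print(cheapest_strategy(4 * HOUR, 24 * HOUR).name)
print(cheapest_strategy(5 * MINUTE, 0).name)
```

With these bounds, the two examples above resolve to Backup & Restore and Multi-Site Active-Active respectively. In an interview, the point is the direction of the tradeoff: every step down in RTO or RPO moves you one row further into cost.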

Storage & Database Design

4. When would you use each AWS storage service?

Service	Use Case	Key Property
S3	Objects, backups, static assets, data lake	Unlimited scale, 11 nines durability
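EBS	Block storage for EC2: boot volumes, databases	Low latency, attached to one instance (Multi-Attach only on io1/io2 volumes)
EFS	Shared filesystem for multiple EC2 instances	NFS, scales automatically
S3 Glacier	Long-term archival	Very cheap; retrieval ranges from milliseconds (Instant Retrieval) to hours (Flexible Retrieval, Deep Archive)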
FSx for Lustre	HPC, ML training	Very high throughput

Common architecture question: "Your application needs shared storage accessible by 50 EC2 instances simultaneously." → EFS, not EBS (EBS attaches to one instance) or S3 (latency too high for frequent small file access).

5. When would you use DynamoDB vs RDS?

DynamoDB:

Single-digit millisecond latency at any scale
Serverless; on-demand mode scales with no capacity planning (provisioned mode needs auto scaling configured)
Limited query patterns — must design around access patterns upfront
No joins
Use for: user sessions, leaderboards, IoT telemetry, shopping carts, real-time gaming

RDS (Aurora/PostgreSQL/MySQL):

Rich query language, complex joins, transactions
Familiar SQL
Scale by instance size + read replicas
Use for: financial transactions, CRM data, anything needing complex queries or reporting

The trick question: "Can DynamoDB replace a relational database?" → Sometimes. If your access patterns are known and simple, DynamoDB is superior. If you need ad-hoc querying or complex relationships, RDS is better. Many modern architectures use both.
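
"Design around access patterns upfront" is easier to explain with a toy model. The sketch below is a plain in-memory stand-in for a table with a partition key and a sort key. It is not the DynamoDB API; the key formats (`USER#42`, `ORDER#<date>`) and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# In-memory stand-in for a table: partition key -> {sort key: item}.
# It only models key-based access rules; it does not call DynamoDB.
Table = Dict[str, Dict[str, dict]]


def put_item(table: Table, pk: str, sk: str, item: dict) -> None:
    table[pk][sk] = item


def query(table: Table, pk: str, sk_prefix: str = "") -> List[dict]:
    """Return items in one partition whose sort key starts with sk_prefix,
    ordered by sort key."""
    partition = table.get(pk, {})
    return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table: Table = defaultdict(dict)

# Access pattern decided upfront: "list one user's orders in date order".
put_item(table, "USER#42", "PROFILE", {"name": "Ada"})
put_item(table, "USER#42", "ORDER#2026-01-03", {"total": 30})
put_item(table, "USER#42", "ORDER#2026-01-01", {"total": 12})
put_item(table, "USER#7", "ORDER#2026-01-02", {"total": 99})

orders = query(table, "USER#42", "ORDER#")
print([order["total"] for order in orders])
```

The planned query is a single-partition lookup. A question nobody planned for, such as "all orders over 50 across every user", has no key to use here; in DynamoDB that means a full scan or a secondary index, whereas in SQL it is one `WHERE` clause.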

Networking & Security

6. A web application needs to be accessible from the internet but the database must not be

Standard VPC design:

Internet Gateway
    ↓
Public Subnet (10.0.1.0/24)
  - ALB
  - NAT Gateway
    ↓
Private Subnet - App (10.0.2.0/24)
  - EC2 / ECS tasks
    ↓
Private Subnet - Data (10.0.3.0/24)
  - RDS
  - ElastiCache
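
The diagram shows one subnet per tier for readability. A highly available version repeats each tier in every AZ. The sketch below uses Python's standard `ipaddress` module to carve such a plan out of an assumed 10.0.0.0/16 VPC; the AZ suffixes and the numbering scheme are arbitrary choices for illustration.

```python
import ipaddress

# Assumptions: a 10.0.0.0/16 VPC, three AZs, and one /24 per tier per AZ.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["a", "b", "c"]  # hypothetical AZ suffixes
TIERS = ["public", "app", "data"]


def plan_subnets() -> dict:
    """Assign consecutive /24 blocks to every (tier, AZ) pair."""
    blocks = VPC_CIDR.subnets(new_prefix=24)  # generator of /24 networks
    next(blocks)  # skip 10.0.0.0/24 so numbering starts at 10.0.1.0/24
    return {(tier, az): next(blocks) for tier in TIERS for az in AZS}


plan = plan_subnets()
for (tier, az), cidr in plan.items():
    print(f"{tier:<6} AZ-{az}: {cidr}")
```

This yields nine non-overlapping subnets, 10.0.1.0/24 through 10.0.9.0/24, with each tier present in all three AZs. A /16 leaves ample room to add tiers or AZs later without renumbering.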

Security Groups:

ALB SG: Allow 80/443 from 0.0.0.0/0
App SG: Allow 8080 from ALB SG only
DB SG: Allow 5432 from App SG only

Key point: Security Groups reference each other by SG ID, not IP. This is more secure and doesn't break when instances scale.
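
The chain of rules above can be checked with a small model. The sketch below is a deliberately simplified simulation (ports and sources only, no protocols or outbound rules) and does not use the AWS API. The group names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

ANYWHERE = "0.0.0.0/0"


@dataclass
class SecurityGroup:
    name: str
    # Each inbound rule is (port, source); source is a group name or ANYWHERE.
    inbound: List[Tuple[int, str]]


alb_sg = SecurityGroup("alb-sg", [(80, ANYWHERE), (443, ANYWHERE)])
app_sg = SecurityGroup("app-sg", [(8080, "alb-sg")])
db_sg = SecurityGroup("db-sg", [(5432, "app-sg")])


def allows(target: SecurityGroup, port: int, source: str) -> bool:
    """True if `target` has an inbound rule for this port from this source.

    `source` is the caller's security group name, or ANYWHERE for traffic
    arriving from the internet.
    """
    return any(
        rule_port == port and rule_source in (source, ANYWHERE)
        for rule_port, rule_source in target.inbound
    )


print(allows(alb_sg, 443, ANYWHERE))   # internet -> ALB
print(allows(app_sg, 8080, "alb-sg"))  # ALB -> app
print(allows(db_sg, 5432, "app-sg"))   # app -> database
print(allows(db_sg, 5432, ANYWHERE))   # internet -> database
print(allows(db_sg, 5432, "alb-sg"))   # ALB -> database
```

Only the first three paths are permitted. The database accepts traffic from the app tier alone, and nothing in the rules mentions an instance IP, so the rules stay valid as instances come and go.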

7. What is AWS WAF and when do you use it?

AWS WAF (Web Application Firewall) filters HTTP/HTTPS traffic based on rules you define — or managed rule sets from AWS and AWS Marketplace.

Use it for:

SQL injection and XSS protection (OWASP rules)
Rate limiting specific paths (login endpoints)
Geo-blocking — block all traffic from specific countries
Bot management
Virtual patching while a code fix is being developed

Integration: Attach to ALB, CloudFront, or API Gateway.

Cost consideration: WAF charges per web ACL, per rule, and per million requests inspected. For high-traffic applications, evaluate whether Cloudflare or a dedicated WAF solution might be more cost-effective.
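
A rough estimate shows where the request charge starts to dominate. The prices below are assumed list prices passed in as defaults, and the model ignores managed rule group fees and add-ons such as Bot Control. Check the current AWS WAF pricing page before quoting any of these numbers.

```python
def waf_monthly_cost(
    rules: int,
    requests_per_month: int,
    web_acls: int = 1,
    price_per_acl: float = 5.00,      # assumed USD per web ACL per month
    price_per_rule: float = 1.00,     # assumed USD per rule per month
    price_per_million: float = 0.60,  # assumed USD per million requests
) -> float:
    """Estimate monthly WAF cost as a fixed part plus a per-request part."""
    fixed = web_acls * price_per_acl + rules * price_per_rule
    variable = requests_per_month / 1_000_000 * price_per_million
    return fixed + variable


for monthly_requests in (10_000_000, 1_000_000_000, 50_000_000_000):
    cost = waf_monthly_cost(rules=10, requests_per_month=monthly_requests)
    print(f"{monthly_requests:>14,} requests -> ${cost:,.2f}/month")
```

With ten rules, the fixed part stays at $15 while the request charge grows linearly. At tens of billions of requests per month, the per-request fee is essentially the whole bill, which is the point at which comparing alternatives makes sense.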

Cost Optimisation Architecture

8. Design a cost-optimised batch processing system on AWS

Requirement: Process 10,000 files nightly. Each file takes 2 minutes to process. All files arrive by midnight.

Naive approach: 24/7 EC2 fleet → pays for idle time 23 hours/day.

Optimised approach:

S3 (file uploads)
→ EventBridge scheduled rule at 00:05
→ Lambda (trigger)
→ SQS queue (10,000 messages)
→ ECS Fargate tasks (auto-scaled by SQS depth)
→ Process files → write results to S3/RDS
→ Tasks terminate when queue empty
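
The "auto-scaled by SQS depth" step boils down to one calculation: how many tasks are needed to drain the current backlog before a deadline. The sketch below shows only that arithmetic. In practice the depth would come from a queue metric such as SQS `ApproximateNumberOfMessagesVisible`; the function name, the deadline, and the task cap are assumptions for the example.

```python
from math import ceil


def desired_task_count(
    queue_depth: int,
    minutes_per_file: float,
    deadline_minutes: float,
    max_tasks: int,
) -> int:
    """Tasks needed to drain the queue before the deadline, capped at max_tasks."""
    if queue_depth <= 0:
        return 0  # scale to zero once the queue is empty
    # Each task works through files one after another until the deadline.
    files_per_task = max(1, int(deadline_minutes // minutes_per_file))
    return min(max_tasks, ceil(queue_depth / files_per_task))


print(desired_task_count(10_000, 2, 20, max_tasks=2_000))
print(desired_task_count(10_000, 2, 20, max_tasks=500))
print(desired_task_count(0, 2, 20, max_tasks=2_000))
```

For 10,000 two-minute files and a 20-minute window, each task handles 10 files, so 1,000 tasks are needed. The cap matters in an interview answer: if account quotas limit you to 500 tasks, the run takes about twice as long, and you should say so.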

Why this is optimal:

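Fargate: Pay only for task execution time. 10,000 files x 2 minutes = 20,000 task-minutes per night, which comes to roughly $3-5 at the smallest task size (0.25 vCPU, 0.5 GB); larger tasks cost proportionally more
SQS: Decouples arrival from processing; messages that repeatedly fail are moved to a dead-letter queue (DLQ)
Auto-scaling: Runs tasks in parallel; at about 1,000 concurrent tasks, the whole batch finishes in roughly 20 minutes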
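
The nightly Fargate figure is worth being able to derive on a whiteboard. The sketch below reproduces it under stated assumptions: approximate us-east-1 Linux/x86 on-demand prices and two example task sizes, none of which come from the requirement itself. It also ignores task startup time. Verify current prices before relying on the result.

```python
FILES = 10_000
MINUTES_PER_FILE = 2

# Assumed prices (USD); check the Fargate pricing page for current values.
PRICE_PER_VCPU_HOUR = 0.04048
PRICE_PER_GB_HOUR = 0.004445


def nightly_cost(vcpu: float, memory_gb: float) -> float:
    """Cost of processing every file once at the given task size."""
    task_hours = FILES * MINUTES_PER_FILE / 60
    hourly_rate = vcpu * PRICE_PER_VCPU_HOUR + memory_gb * PRICE_PER_GB_HOUR
    return task_hours * hourly_rate


for vcpu, memory_gb in ((0.25, 0.5), (1.0, 2.0)):
    cost = nightly_cost(vcpu, memory_gb)
    print(f"{vcpu} vCPU / {memory_gb} GB: ${cost:.2f} per night")
```

Under these assumptions the smallest task size costs a little over $4 per night and a 1 vCPU / 2 GB task about four times that. Total cost depends on task-minutes and task size, not on how many tasks run in parallel, so finishing faster does not cost more.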

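Alternative: AWS Batch on EC2 Spot Instances. Spot capacity is discounted by up to 90% against On-Demand pricing, which can make it considerably cheaper than Fargate for large compute jobs, provided the jobs tolerate interruption.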

Practice Architecture Questions Live

The only way to get good at design questions is to practice designing systems out loud, with someone who pushes back on your choices.
