Learn Python Series (#41) - Asynchronous Python Part 2

Repository
What will I learn?
- You will learn why semaphores matter and how they prevent resource exhaustion;
- the mental model behind rate limiting and respectful API usage;
- how async HTTP differs from synchronous requests and why that matters;
- production patterns for retry logic, timeouts, and error handling;
- when to use connection pooling and session reuse.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Python 3(.11+) distribution;
- The ambition to learn Python programming.
Difficulty
- Intermediate, advanced
Curriculum (of the Learn Python Series):
- Learn Python Series - Intro
- Learn Python Series (#2) - Handling Strings Part 1
- Learn Python Series (#3) - Handling Strings Part 2
- Learn Python Series (#4) - Round-Up #1
- Learn Python Series (#5) - Handling Lists Part 1
- Learn Python Series (#6) - Handling Lists Part 2
- Learn Python Series (#7) - Handling Dictionaries
- Learn Python Series (#8) - Handling Tuples
- Learn Python Series (#9) - Using Import
- Learn Python Series (#10) - Matplotlib Part 1
- Learn Python Series (#11) - NumPy Part 1
- Learn Python Series (#12) - Handling Files
- Learn Python Series (#13) - Mini Project - Developing a Web Crawler Part 1
- Learn Python Series (#14) - Mini Project - Developing a Web Crawler Part 2
- Learn Python Series (#15) - Handling JSON
- Learn Python Series (#16) - Mini Project - Developing a Web Crawler Part 3
- Learn Python Series (#17) - Roundup #2 - Combining and analyzing any-to-any multi-currency historical data
- Learn Python Series (#18) - PyMongo Part 1
- Learn Python Series (#19) - PyMongo Part 2
- Learn Python Series (#20) - PyMongo Part 3
- Learn Python Series (#21) - Handling Dates and Time Part 1
- Learn Python Series (#22) - Handling Dates and Time Part 2
- Learn Python Series (#23) - Handling Regular Expressions Part 1
- Learn Python Series (#24) - Handling Regular Expressions Part 2
- Learn Python Series (#25) - Handling Regular Expressions Part 3
- Learn Python Series (#26) - pipenv & Visual Studio Code
- Learn Python Series (#27) - Handling Strings Part 3 (F-Strings)
- Learn Python Series (#28) - Using Pickle and Shelve
- Learn Python Series (#29) - Handling CSV
- Learn Python Series (#30) - Data Science Part 1 - Pandas
- Learn Python Series (#31) - Data Science Part 2 - Pandas
- Learn Python Series (#32) - Data Science Part 3 - Pandas
- Learn Python Series (#33) - Data Science Part 4 - Pandas
- Learn Python Series (#34) - Working with APIs in 2026: What's Changed
- Learn Python Series (#35) - Working with APIs Part 2: Beyond GET Requests
- Learn Python Series (#36) - Type Hints and Modern Python
- Learn Python Series (#37) - Virtual Environments and Dependency Management
- Learn Python Series (#38) - Testing Your Code Part 1
- Learn Python Series (#39) - Testing Your Code Part 2
- Learn Python Series (#40) - Asynchronous Python Part 1
- Learn Python Series (#41) - Asynchronous Python Part 2 (this post)
In episode #40, we learned async fundamentals: event loops, coroutines, and concurrent task execution. Now we apply those concepts to real-world HTTP requests and production async patterns.
The problem: you need to fetch data from 1000 URLs. Fetching them synchronously, one at a time, takes forever - at even half a second per request, that's over eight minutes. Async seems like the answer - start all 1000 requests concurrently! But that creates new problems.
Nota bene: This episode is about controlled concurrency and production-ready async patterns, not just making things concurrent.
The problem with unlimited concurrency
Imagine starting 1000 HTTP requests simultaneously. What happens?
Your machine opens 1000 TCP connections at once. Your OS has connection limits - you might hit them. The remote server receives 1000 simultaneous requests from your IP - rate limiting kicks in, requests fail, you get banned.
Memory usage spikes - each connection consumes buffers and a file descriptor. Network bandwidth saturates. The event loop itself copes with thousands of tasks just fine; it's the open sockets that exhaust your resources.
Most critically: you're being a bad citizen. Hammering someone's API with 1000 simultaneous requests is abuse, even if unintentional.
The solution: controlled concurrency. Run many tasks concurrently, but limit HOW many at once.
Semaphores: concurrency throttles
A semaphore is a counter that limits concurrent access to a resource. Think of it like a parking lot with limited spaces.
The lot has 5 spaces. Car 1 enters (counter: 4 spaces left). Cars 2, 3, 4, 5 enter (counter: 0). Car 6 arrives - lot is full, must wait. Car 1 leaves (counter: 1). Now car 6 can enter.
In code: you create a semaphore with capacity 5. Tasks acquire the semaphore before proceeding. If 5 tasks already hold it, task 6 waits. When a task finishes, it releases the semaphore, allowing waiting tasks to proceed.
This limits concurrent operations to 5 at any moment, even if you have 1000 total tasks queued.
The mental model: semaphores are waiting rooms. Limited seats. When full, newcomers wait for someone to leave.
Here's a semaphore throttling 1000 tasks down to 10 concurrent:
import asyncio
import aiohttp

sem = asyncio.Semaphore(10)

async def fetch(session, url):
    async with sem:  # blocks if 10 tasks already inside
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    urls = [f"https://api.example.com/item/{i}" for i in range(1000)]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        print(f"Fetched {len(results)} pages")

asyncio.run(main())
All 1000 tasks are created immediately — asyncio.gather schedules them all. But the semaphore ensures only 10 run their HTTP request at any given moment. The other 990 wait their turn at the async with sem line. Clean, simple, effective.
Rate limiting: respecting API constraints
Most APIs have rate limits: "100 requests per minute" or "10 requests per second". Exceed these and you get errors or bans.
Semaphores control instantaneous concurrency (how many right now). Rate limiting controls throughput over time (how many per period).
The simplest rate limiting: add delays. If the API allows 10 requests/second, wait 0.1 seconds between requests. But this wastes time - you're artificially slowing yourself down even when you haven't made a request in a while.
Better: token bucket algorithm. You have a bucket holding tokens. Each request consumes a token. Tokens regenerate at a fixed rate (10/second). If the bucket is empty, wait for a token to regenerate.
This allows bursts - if you haven't made requests recently, the bucket is full, you can fire off 10 immediately. Then it drains and regenerates at the allowed rate.
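Here's a minimal token-bucket sketch; the TokenBucket class and its method names are illustrative, not a standard library API:

import asyncio
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate              # tokens regenerated per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.updated = time.monotonic()
        self.lock = asyncio.Lock()    # serialize waiters so they can't race

    async def acquire(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Regenerate tokens at the fixed rate, capped at capacity
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1  # consume one token for this request
                    return
                # Bucket empty: sleep until roughly one token has regenerated
                await asyncio.sleep((1 - self.tokens) / self.rate)

A task calls await bucket.acquire() before each request: bursts pass through immediately while tokens remain, and sustained traffic settles at the regeneration rate.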
And here's the simpler fixed-delay variant, using asyncio.sleep:
import asyncio
import time

class RateLimiter:
    def __init__(self, calls_per_second):
        self.delay = 1.0 / calls_per_second
        self.last_call = 0.0
        self.lock = asyncio.Lock()  # serializes waiters so concurrent callers can't all slip through at once

    async def wait(self):
        async with self.lock:
            now = time.monotonic()
            wait_time = self.delay - (now - self.last_call)
            if wait_time > 0:
                await asyncio.sleep(wait_time)
            self.last_call = time.monotonic()

limiter = RateLimiter(calls_per_second=10)

async def fetch_with_limit(session, url):
    await limiter.wait()
    async with session.get(url) as resp:
        return await resp.text()
Combine this with semaphores and you control both instantaneous concurrency AND throughput over time. For most use cases, that combination is sufficient.
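For instance, a fetch function can hold the semaphore and wait on the limiter before each request (a sketch reusing the RateLimiter defined above; the numbers are just examples):

sem = asyncio.Semaphore(10)
limiter = RateLimiter(calls_per_second=10)

async def fetch_throttled(session, url):
    async with sem:           # at most 10 requests in flight at once
        await limiter.wait()  # and no more than 10 started per second
        async with session.get(url) as resp:
            return await resp.text()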
Connection pooling and session reuse
Creating HTTP connections is expensive: DNS lookup, TCP handshake, TLS negotiation if HTTPS. For a single request, that's fine. For 1000 requests, it's wasteful.
Session reuse solves this. Create one session object, make all requests through it. The session maintains a connection pool - reuses open connections instead of creating new ones for each request.
In aiohttp:
async with aiohttp.ClientSession() as session:
    # All requests reuse connections from the session's pool
    for url in urls:
        async with session.get(url) as response:
            data = await response.text()
Creating a session per request defeats this - you lose connection reuse. One session for many requests is the pattern.
Error handling patterns
Real networks fail. Servers return errors. Connections time out. Production async code must handle these gracefully.
The patterns:
Timeouts: Every operation should have a timeout. Without one, a stuck request blocks resources indefinitely. Wrap operations with asyncio.wait_for(operation, timeout=10), or use the asyncio.timeout() context manager (Python 3.11+) that we use below.
Retries with backoff: Transient errors (network blip, server overload) often succeed on retry. But don't retry immediately - you'll hammer the same struggling server. Wait, then retry. Wait longer, retry again. Exponential backoff: 1s, 2s, 4s, 8s...
Circuit breakers: If a service fails repeatedly, stop trying. Track failure rate - if it crosses a threshold, "open the circuit" (stop making requests). After a cooldown period, try again ("half-open"). If it succeeds, close the circuit (resume normal operation).
These patterns prevent cascading failures. One slow service doesn't bring down your entire application.
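Timeouts and retries get full code in a moment; the circuit breaker is simple enough to sketch right here. The CircuitBreaker class, its thresholds, and its method names are illustrative, not a standard library, and this sketch counts consecutive failures rather than a true failure rate to keep it short:

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown=30):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True   # closed: normal operation
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True   # half-open: allow a trial request through
        return False      # open: fail fast, don't call the service

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # trial succeeded: close the circuit again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # threshold crossed: open the circuit

The caller checks breaker.allow_request() before each request and reports the outcome via record_success() or record_failure(); the open/half-open/closed behavior falls out of those three methods.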
Let's implement timeouts and retries with exponential backoff:
import asyncio
import aiohttp
import random

async def fetch_with_retry(session, url, max_retries=3, timeout=10):
    for attempt in range(max_retries):
        retry_after = None
        try:
            async with asyncio.timeout(timeout):
                async with session.get(url) as resp:
                    if resp.status == 429:  # Too Many Requests
                        retry_after = int(resp.headers.get("Retry-After", 5))
                    else:
                        resp.raise_for_status()
                        return await resp.json()
        except asyncio.TimeoutError:
            pass  # fall through to the retry logic below
        except aiohttp.ClientError:
            pass
        if retry_after is not None:
            # Sleep outside the timeout block so Retry-After isn't cut short
            await asyncio.sleep(retry_after)
        elif attempt < max_retries - 1:
            delay = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)
    raise Exception(f"Failed after {max_retries} attempts: {url}")
The asyncio.timeout (Python 3.11+) context manager cancels the request if it takes too long, so nothing hangs forever. The retry loop uses exponential backoff: the first retry waits ~1s, the next ~2s, doubling each attempt. The random.uniform call adds jitter so multiple clients don't all retry at the exact same moment, which would just create another traffic spike.
Note how we also handle HTTP 429 (Too Many Requests) separately: the server is telling us to slow down, so we respect its Retry-After header instead of guessing, and we sleep outside the timeout block so a long Retry-After delay isn't cut short by our own timeout.
When to use async HTTP vs sync
Async adds complexity. When is it worth it?
Use async when:
- You're making many concurrent requests (dozens to thousands)
- Requests to different services can overlap (microservices, aggregation)
- You're building a server handling many simultaneous clients
Use sync (requests library) when:
- You're making a few sequential requests
- Code simplicity matters more than speed
- You're working in a larger sync codebase
Don't use async just because it's "modern". Use it when concurrency provides meaningful benefit.
Putting it all together
Here's a production-grade fetcher combining the key pieces from this episode - semaphore, session reuse, retries with backoff, and timeouts. (If the API also enforces a requests-per-second limit, a RateLimiter like the one above can be dropped into the worker.)
import asyncio
import aiohttp

async def fetch_many(urls, concurrency=20, timeout=15, retries=3):
    sem = asyncio.Semaphore(concurrency)
    results = {}

    async def worker(session, url):
        async with sem:
            for attempt in range(retries):
                try:
                    async with asyncio.timeout(timeout):
                        async with session.get(url) as resp:
                            resp.raise_for_status()
                            results[url] = await resp.text()
                            return
                except (asyncio.TimeoutError, aiohttp.ClientError) as e:
                    if attempt == retries - 1:
                        results[url] = f"FAILED: {e}"
                    else:
                        await asyncio.sleep(2 ** attempt)

    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [worker(session, url) for url in urls]
        await asyncio.gather(*tasks)
    return results
TCPConnector(limit=concurrency) caps the connection pool to match our semaphore — no point opening more TCP connections than we'll use simultaneously. The semaphore controls task-level concurrency, the connector controls socket-level concurrency. Both should agree.
Each failed request retries with backoff. Each request has a timeout. All requests share one session (connection pooling). And no more than 20 run concurrently. That's a responsible, production-ready fetcher in about 25 lines.
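Calling it from any coroutine is then a one-liner (the URLs here are hypothetical):

async def demo():
    urls = [f"https://api.example.com/item/{i}" for i in range(100)]
    results = await fetch_many(urls, concurrency=20)
    failed = [u for u, r in results.items() if r.startswith("FAILED")]
    print(f"{len(results) - len(failed)} succeeded, {len(failed)} failed")

asyncio.run(demo())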
Production checklist
For production async HTTP code:
- Limit concurrency with semaphores (protect your resources and theirs)
- Respect rate limits with delays or token buckets (stay within API constraints)
- Reuse sessions for connection pooling (performance)
- Set timeouts on all operations (prevent hangs)
- Implement retries with exponential backoff (handle transient failures)
- Log errors with context (debugging distributed systems is hard)
- Monitor metrics (request rate, error rate, latency percentiles)
These aren't optional extras - they're requirements for reliable async systems.
The important bits
In this episode, we covered production async patterns:
- Why unlimited concurrency causes problems despite being technically possible
- Semaphores as throttles limiting instantaneous concurrent operations
- Rate limiting to respect API constraints over time
- Connection pooling and session reuse for performance
- Error handling patterns: timeouts, retries with backoff, circuit breakers
- When async HTTP provides value vs when sync is simpler and sufficient
- Production checklist for reliable async applications
Async isn't just about speed. It's about efficiently managing many concurrent I/O operations while remaining a responsible system citizen.