Reddit RSS feeds (old.reddit.com/r/*/top/.rss) return HTTP 429 when fetched from server-side code using bot-like headers, even at low request rates. The same URLs work fine from a browser. The issue is that feedparser's default headers and common 'API-style' Accept headers (application/rss+xml, application/xml) combined with Sec-Fetch-Dest: empty / Sec-Fetch-Mode: cors trigger Reddit's bot detection. Scraping multiple subreddits sequentially compounds the problem — after 2-3 successful fetches, remaining subs get 429'd.
Two fixes are needed. First, replace the API-style headers with browser-navigation headers. Change from:
# BAD - looks like an API call
'Accept': 'application/rss+xml, application/xml, text/xml'
'Sec-Fetch-Dest': 'empty'
'Sec-Fetch-Mode': 'cors'
To:
# GOOD - looks like a browser navigation
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
'Sec-Fetch-Dest': 'document'
'Sec-Fetch-Mode': 'navigate'
'Sec-Fetch-Site': 'none'
'Sec-Fetch-User': '?1'
'Upgrade-Insecure-Requests': '1'
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ...'
feedparser accepts a request_headers parameter: feedparser.parse(url, request_headers=headers).
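A minimal sketch of how the browser-style headers might be wired into feedparser. The helper names browser_headers and fetch_feed, and the parse argument (there only so the function can be exercised without network access), are assumptions for illustration. The User-Agent is passed in by the caller because the string above is truncated; supply your own full browser value.

```python
def browser_headers(user_agent):
    """Build headers that resemble a browser navigation, not an API call."""
    return {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
        # Caller supplies the full string; the note above elides it.
        "User-Agent": user_agent,
    }


def fetch_feed(url, user_agent, parse=None):
    """Parse a feed URL, sending browser-like request headers.

    `parse` is a hypothetical injection point; it defaults to feedparser.parse.
    """
    if parse is None:
        import feedparser  # third-party: pip install feedparser
        parse = feedparser.parse
    return parse(url, request_headers=browser_headers(user_agent))
```

When the fetch goes over HTTP, the parsed result usually carries the response code (for example via result.get("status")), which is a convenient place to check whether a 429 still comes back.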
Second, call random.shuffle(sources) before iterating, and time.sleep(2) between consecutive same-domain sources. This spreads the rate-limit pressure so that one bad run doesn't always block the same subs.
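The shuffle-and-sleep step could look roughly like the sketch below. It assumes sources is a list of feed URLs and fetch is whatever callable retrieves one feed; the function name iterate_politely and the sleep argument (injectable so the pause can be observed in tests) are hypothetical. Unlike random.shuffle(sources), it shuffles a copy so the caller's list keeps its order.

```python
import random
import time
from urllib.parse import urlparse


def iterate_politely(sources, fetch, delay=2.0, sleep=time.sleep):
    """Fetch sources in random order, pausing between consecutive same-domain URLs."""
    order = list(sources)  # copy, so the caller's list is not reordered
    random.shuffle(order)

    results = {}
    previous_domain = None
    for url in order:
        domain = urlparse(url).netloc
        # Only pause when the previous request hit the same host.
        if domain == previous_domain:
            sleep(delay)
        results[url] = fetch(url)
        previous_domain = domain
    return results
```

Because the order changes on every run, a subreddit that was rate-limited last time is not always the one left for the end of the queue.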