Build a Reddit + Stack Overflow Monitor That Sends You Opportunities on Discord
I run a developer resource site with 200+ pages — cheat sheets, error fixes, tutorials, comparisons. The problem: how do I find people who actually need this content, right when they’re asking for help?
The answer: a Python script that scans Reddit, Stack Overflow, and Hacker News every 2 hours, matches posts against my content library, and sends opportunities to Discord. No API keys needed. Runs for free on GitHub Actions.
Here’s exactly how I built it.
What we’re building
A monitoring system that:
- Scans 37 subreddits, Stack Overflow, and Hacker News for new posts
- Matches posts against your content using phrase-based keyword matching
- Sends matched opportunities to a Discord channel via webhook
- Runs automatically every 2 hours using GitHub Actions
- Tracks seen posts so you never get duplicate notifications
Tech stack: Python (standard library only), GitHub Actions, Discord webhooks.
Step 1: Set up the Discord webhook
First, you need a place to receive notifications. Discord webhooks are the simplest option — no bot setup, no authentication, just a URL you POST to.
- Open Discord → go to your server → pick a channel (or create one called #opportunities)
- Click the gear icon → Integrations → Webhooks → New Webhook
- Name it something like “Content Monitor”
- Click Copy Webhook URL
That URL is all you need. Test it with curl:
curl -X POST "YOUR_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d '{"content": "Hello from the monitor!"}'
If you see the message in Discord, you’re good. That’s how webhooks work — you POST JSON data to a URL and the receiving service handles it.
Step 2: Understand the data sources
We’re pulling from three sources, all with free public APIs:
Reddit exposes every subreddit as JSON by appending .json to the URL:
https://www.reddit.com/r/learnprogramming/new.json?limit=25
No API key needed. Just set a User-Agent header (Reddit blocks requests without one). Each post has a title, selftext (body), permalink, created_utc, and num_comments.
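The post doesn't show the fetch code at this point, so here is a minimal sketch of how that Reddit call could look with the standard library (the `parse_listing`/`fetch_subreddit` names and the User-Agent string are my own, not from the original script):

```python
import json
import urllib.request

def parse_listing(data):
    """Extract the fields we care about from a Reddit listing payload."""
    posts = []
    for child in data["data"]["children"]:
        p = child["data"]
        posts.append({
            "title": p["title"],
            "body": p.get("selftext", ""),          # empty for link posts
            "url": "https://www.reddit.com" + p["permalink"],
            "created": p["created_utc"],
            "comments": p["num_comments"],
        })
    return posts

def fetch_subreddit(sub, limit=25):
    # Reddit rejects urllib's default User-Agent, so set a descriptive one.
    url = f"https://www.reddit.com/r/{sub}/new.json?limit={limit}"
    req = urllib.request.Request(url, headers={"User-Agent": "content-monitor/1.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_listing(json.load(resp))
```

Keeping the parsing separate from the HTTP call also makes the field extraction easy to test without hitting the network.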
Stack Overflow
The Stack Exchange API is free for basic usage:
https://api.stackexchange.com/2.3/questions?order=desc&sort=creation&tagged=python&site=stackoverflow&filter=withbody&pagesize=20
Returns questions with title, body, tags, answer count, and creation date. We filter for questions with 2 or fewer answers — those are the ones where your answer will actually be seen.
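A sketch of that fetch-and-filter step (function names are mine; one gotcha worth knowing is that the Stack Exchange API gzips every response, which `urllib` does not decompress for you):

```python
import gzip
import json
import urllib.request

SO_URL = ("https://api.stackexchange.com/2.3/questions"
          "?order=desc&sort=creation&tagged={tag}"
          "&site=stackoverflow&filter=withbody&pagesize=20")

def low_answer_questions(items, max_answers=2):
    """Keep questions with few answers — where a new answer is still seen."""
    return [q for q in items if q.get("answer_count", 0) <= max_answers]

def fetch_so_questions(tag):
    req = urllib.request.Request(SO_URL.format(tag=tag),
                                 headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        raw = resp.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes — decompress if needed
        raw = gzip.decompress(raw)
    return low_answer_questions(json.loads(raw).get("items", []))
```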
Hacker News
The HN Firebase API is dead simple:
https://hacker-news.firebaseio.com/v0/askstories.json # Get Ask HN story IDs
https://hacker-news.firebaseio.com/v0/item/12345.json # Get a specific story
We focus on “Ask HN” posts since those are questions where you can provide helpful answers.
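Unlike Reddit, HN requires two round trips — one for the ID list, one per item. A minimal sketch of that pattern (helper names are mine, not from the original script):

```python
import json
import urllib.request

HN_API = "https://hacker-news.firebaseio.com/v0"

def hn_item_url(item_id):
    return f"https://news.ycombinator.com/item?id={item_id}"

def fetch_json(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def fetch_ask_hn(limit=30):
    """Fetch the newest Ask HN stories — one item request per story ID."""
    posts = []
    for item_id in fetch_json(f"{HN_API}/askstories.json")[:limit]:
        item = fetch_json(f"{HN_API}/item/{item_id}.json")
        if item and item.get("type") == "story":
            posts.append({
                "title": item.get("title", ""),
                "body": item.get("text", ""),   # Ask HN body, may be empty
                "url": hn_item_url(item_id),
            })
    return posts
```

Capping the ID list with `limit` keeps the per-item requests bounded, which matters when this runs every 2 hours.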
Step 3: Build the content index
The script needs to know what content you have so it can match posts against it. We read all markdown files from the blog directory and extract titles and tags:
import os, re
KEYWORD_MAP = {}
def load_content_index():
blog_dir = "src/content/blog"
for f in os.listdir(blog_dir):
if not f.endswith(".md"):
continue
slug = f.replace(".md", "")
        with open(os.path.join(blog_dir, f), encoding="utf-8") as fh:
head = fh.read(1500) # Only need the frontmatter
# Extract title
m = re.search(r'^title:\s*["\'](.*?)["\']', head, re.M)
title = m.group(1) if m else slug
# Extract tags
m = re.search(r'^tags:\s*\[(.*?)\]', head, re.M)
tags = [t.strip().strip('"').strip("'")
for t in m.group(1).split(",")] if m else []
KEYWORD_MAP[slug] = {
"title": title,
"tags": tags,
"url": f"https://yoursite.com/blog/{slug}/",
}
This uses regex to parse the YAML frontmatter. We only read the first 1500 characters since the frontmatter is always at the top.
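For reference, those regexes assume frontmatter shaped roughly like this (a hypothetical page — the tag values are illustrative):

```markdown
---
title: "Python Cheat Sheet"
tags: ["python", "cheat-sheet"]
---
```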
Step 4: Build the matching engine
This is the most important part. Bad matching = useless notifications. The key insight: match on phrases, not single words.
Single-word matching (“python”, “docker”, “git”) matches everything and gives you garbage results. Phrase matching (“learning python”, “cannot read property of undefined”, “react vs vue”) gives you high-intent matches.
We build different matching rules based on content type:
# Error fix pages — match on the actual error message
# "cannot read property of undefined" → our error fix page
error_phrases = {
"cannot-read-property-undefined": [
"cannot read property of undefined",
"cannot read properties of undefined",
],
"git-merge-conflict": [
"merge conflict",
"resolve conflict",
],
# ... one entry per error fix page
}
# Comparison pages — match on "X vs Y" patterns
# "react vs vue" or "should I use react or vue" → our comparison
# a and b stand for each tool pair, e.g. a, b = "react", "vue"
comparison_phrases = [
f"{a} vs {b}", f"{b} vs {a}",
f"{a} or {b}", f"{b} or {a}",
f"should i use {a} or {b}",
]
# Tutorial pages — match on learning intent
# "learning python" or "new to docker" → our tutorial
# tech stands for each technology name, e.g. tech = "python"
tutorial_phrases = [
f"what is {tech}", f"new to {tech}",
f"learning {tech}", f"{tech} for beginners",
]
The matching function checks every post against every rule:
def match_post(title, body):
text = (title + " " + body).lower()
matches = []
for slug, display_title, url, phrases, match_type in CONTENT_RULES:
matched = [p for p in phrases if p in text]
if not matched:
continue
score = len(matched)
if match_type == "error":
score += 2 # Error matches are high-intent
matches.append({
"title": display_title,
"url": url,
"score": score,
"matched": matched[:3],
})
matches.sort(key=lambda x: x["score"], reverse=True)
return matches[:3]
Error matches get a bonus score because someone pasting an error message is the highest-intent signal — they have a problem right now and need a fix.
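`match_post` iterates a flat `CONTENT_RULES` list of `(slug, title, url, phrases, match_type)` tuples that the snippets above never assemble. One plausible way to flatten the error-phrase dict against the content index (the `build_rules` name is mine, and I'm assuming the `KEYWORD_MAP` shape from Step 3):

```python
def build_rules(keyword_map, error_phrases):
    """Flatten per-page phrase lists into (slug, title, url, phrases, type) tuples."""
    rules = []
    for slug, phrases in error_phrases.items():
        page = keyword_map.get(slug, {})
        rules.append((
            slug,
            page.get("title", slug),  # fall back to the slug if no frontmatter title
            page.get("url", f"https://yoursite.com/blog/{slug}/"),
            phrases,
            "error",
        ))
    return rules
```

The comparison and tutorial phrase lists would be appended the same way with their own `match_type`, so the scoring bonus in `match_post` can tell them apart.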
Step 5: Deduplication
Without deduplication, you’ll get the same post every time the script runs. We track seen posts using a simple JSON file:
import hashlib, json, time
SEEN_FILE = "scripts/.seen-posts.json"
def load_seen():
try:
with open(SEEN_FILE) as f:
data = json.load(f)
# Prune entries older than 48 hours
cutoff = time.time() - 172800
return {k: v for k, v in data.items() if v > cutoff}
except (FileNotFoundError, json.JSONDecodeError):
return {}
def post_id(url):
return hashlib.md5(url.encode()).hexdigest()[:12]
Each post URL gets hashed to a short ID. We store the ID with a timestamp and auto-prune after 48 hours to keep the file small.
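The snippet above shows loading and hashing but not the write-back; a minimal counterpart could look like this (`save_seen` is my name for it):

```python
import hashlib
import json
import time

SEEN_FILE = "scripts/.seen-posts.json"

def save_seen(seen, new_urls, path=SEEN_FILE):
    """Record newly notified posts, then write the map back to disk."""
    now = time.time()
    for url in new_urls:
        seen[hashlib.md5(url.encode()).hexdigest()[:12]] = now
    with open(path, "w") as f:
        json.dump(seen, f)
```

One caveat when this runs on GitHub Actions: each run starts on a fresh runner, so the seen file only persists between runs if the workflow commits it back to the repo (or caches it) — an extra step not shown in the workflow below.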
Step 6: Send to Discord
Discord webhooks accept up to 10 embeds per request. Embeds are richer than plain text — they support titles, links, colors, and footers:
import json, urllib.request
def send_discord(webhook_url, embeds):
for i in range(0, len(embeds), 10):
batch = embeds[i:i+10]
payload = json.dumps({"embeds": batch}).encode()
req = urllib.request.Request(
webhook_url,
data=payload,
headers={"Content-Type": "application/json"},
)
        urllib.request.urlopen(req, timeout=10).close()
Each embed looks like this:
{
"title": "🔥 r/learnpython — Is this a good way to self-learn Python?",
"url": "https://reddit.com/r/learnpython/...",
"description": "💡 [Python Cheat Sheet](https://yoursite.com/blog/python-cheat-sheet/)\n ↳ matched: *learning python, self-learn python*",
"color": 0x6366F1, # Purple for Reddit
"footer": {"text": "💬 3 replies · 45m ago"},
}
We color-code by source: purple for Reddit, orange for Stack Overflow, and a different shade of orange for Hacker News.
Step 7: Automate with GitHub Actions
The script works locally, but you want it running automatically. GitHub Actions lets you run code on a schedule using cron syntax — for free.
Create .github/workflows/reddit-monitor.yml:
name: Content Monitor
on:
schedule:
- cron: '15 */2 * * *' # Every 2 hours at :15
workflow_dispatch: # Manual trigger button
jobs:
monitor:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- run: python scripts/reddit-monitor.py
env:
DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK }}
The cron expression 15 */2 * * * means “at minute 15 of every 2nd hour” — so 0:15, 2:15, 4:15, etc. We offset by 15 minutes to avoid the rush of jobs that run at :00.
Add your Discord webhook URL as a repository secret:
- Go to your repo → Settings → Secrets and variables → Actions
- Click “New repository secret”
- Name: DISCORD_WEBHOOK, Value: your webhook URL
Step 8: Test it
Run locally first:
DISCORD_WEBHOOK="your-webhook-url" python scripts/reddit-monitor.py
You should see output like:
Loaded 156 matching rules from 229 files
Scanning Reddit...
r/learnprogramming
r/learnpython
...
Scanning Stack Overflow...
Scanning Hacker News...
Total posts fetched: 315
New opportunities: 14
Sent 14 opportunities to Discord
Check your Discord channel — you should see color-coded embeds with matched content links.
Run it again immediately to verify deduplication:
Total posts fetched: 315
New opportunities: 0
Zero duplicates. The seen file is working.
The result
Every 2 hours, I get Discord notifications like:
🔥 r/learnpython — Is this a good way to self-learn Python for finance?
💡 Python Cheat Sheet — The Only Reference You Need
↳ matched: learning python, self-learn python
💬 3 replies · 45m ago
I click through, read the post, and if it’s a good fit, I write a helpful answer that naturally links to my content. No spam, no self-promotion — just answering questions with useful resources.
What I’d improve
- Smarter matching — NLP or embeddings instead of phrase matching would catch more nuanced questions
- Priority scoring — weight by subreddit size, post upvotes, and comment count
- Response templates — pre-generate answer drafts based on the matched content
- Analytics — track which answers drive the most traffic back to the site
But the simple version works surprisingly well. 14 quality matches from 315 posts is a solid hit rate, and each one is a genuine opportunity to help someone.
Full source code
The complete script is about 300 lines of Python with zero dependencies (standard library only). You can find it in the GitHub repo.