Internship Scanner

PythonGoogle Cloud FunctionsTelegram APIPerplexity API

A fully automated, cloud-native pipeline that continuously scans the web for finance, consulting, and aerospace engineering internship opportunities across eight European countries — with a deliberate focus on boutique and lesser-known firms where competition is lower. Built on Google Cloud Functions with Perplexity AI search and a Telegram bot interface, the system has surfaced over 200 opportunities since launch at a total infrastructure cost of $0.82.

200+

internships found

countries covered

$0.82

total cost since March 2026

Project Details

Technologies

Python, Google Cloud Functions, Telegram API, Perplexity API

Status

Active

Institution

Independent Project

GitHub

https://github.com/vvdb21

Project Overview

Manual internship hunting is slow, repetitive, and biased towards well-known firms - because those are the ones that show up in generic searches. This project replaces that process with an intelligent monitoring system that runs continuously in the background, proactively identifying and delivering relevant opportunities without any manual effort.

The bot covers management consulting, strategy consulting, investment banking, asset management, private equity, venture capital, hedge funds, and aerospace engineering roles across the UK, Belgium, Netherlands, France, Germany, Spain, Switzerland, and Italy. It targets penultimate-year undergraduate positions specifically, and deliberately prioritises boutique and SME firms alongside household names — the ones that are often overlooked but are significantly less competitive to apply to.

System Architecture

The Telegram bot interface — users receive alerts and interact with the system entirely from their phones.

The system follows a modular, serverless design built on Google Cloud Functions. Five discrete endpoints handle different parts of the pipeline:

/scan — the core scheduled function, triggered every three days by Cloud Scheduler. It queries the Perplexity AI API to search for new internship opportunities across the target sectors and geographies, processes the results, deduplicates against the existing database, and pushes alerts to all subscribers via Telegram.
/scan_firms — an admin-triggered endpoint used at the start of each internship cycle to seed the database with opportunities from specific named firms. This bulk-seeds well-known large firms upfront, so that subsequent scheduled scans can focus on surfacing boutique and lesser-known opportunities not already in the system.
/reply — handles all inbound Telegram messages. Users reply YES or NO to each alert to save or dismiss an opportunity, type APPLIED to retrieve their full saved list, and LEAVE or JOIN to manage their subscription.
/remind — checks the database daily and sends personalised opening date reminders to relevant subscribers, both one week before and on the day an application window opens.
/broadcast — an admin endpoint for sending a manual message to all subscribers.

All endpoints are secured with environment variables. The entire system runs idle between scheduled triggers, keeping infrastructure costs near zero.

Scanning Engine

The structured prompt sent to Perplexity AI on each scan cycle, requesting a strict JSON response with all required opportunity fields.

Each scan queries Perplexity AI's sonar-pro model with a structured prompt that requests a strict JSON response — no markdown, no preamble. The prompt specifies the target sectors, geographies, firm sizes, degree level, and year group, and instructs the model to source directly from company career pages, LinkedIn, Glassdoor, and Indeed. The system prompt enforces JSON-only output, and the response is parsed and validated before any data is written.

Each opportunity is returned with: firm name, role title, CV and cover letter requirements, application open and close dates, a direct URL to the application page, location, and firm size classification (large / boutique / SME).

Deduplication & Data Storage

The Google Sheets backend, which serves as both the opportunity database and the admin dashboard.

Because the system scans every three days, it frequently encounters listings that were already found in a previous cycle. A deduplication layer checks every incoming result against the full history of recorded entries in the Google Sheets database, matching on firm name and role title. Only genuinely new opportunities trigger a Telegram alert and a new database row.

Google Sheets serves as the persistent backend — storing opportunity IDs, firm and role details, application requirements, dates, URLs, firm size, status, and a per-opportunity record of which subscribers have saved it. This doubles as a lightweight admin dashboard, giving a live view of everything the system has found.

Future Improvements

Intelligent fit filtering: Integrating NLP to score job descriptions against a target profile, allowing the system to rank and filter opportunities by cultural or technical fit before sending alerts.
Custom web dashboard: Replacing the Google Sheets interface with a purpose-built dashboard offering application funnel analytics and deadline visualisation.
Expanded scope: Extending the search parameters to cover graduate schemes and entry-level full-time roles, making the system useful beyond the internship lifecycle.
Improved reliability: Adding structured logging and error handling to provide proactive monitoring and automatic recovery from failed API calls or sync interruptions.