Internship Scanner - Image 1

Internship Scanner

PythonGoogle Cloud FunctionsTelegram APIPerplexity API

An automated pipeline that continuously scans the web for finance and consulting internship opportunities across Europe, with a focus on lesser-known boutique firms where competition is lower. Built on Google Cloud Functions with a Perplexity AI search integration, the system delivers real-time alerts via Telegram, allowing subscribers to track and manage opportunities directly from their phones. Features include automated opening date reminders, two-way subscriber management, and a custom admin interface for targeted firm scanning — all running at near-zero cost.

Project Details

Technologies

Python, Google Cloud Functions, Telegram API, Perplexity API

Category

Coding

Status

Completed

Institution

Independent Project

Key Features

  • Perplexity AI integration for dynamic web scraping and opportunity discovery
  • Hosted on Google Cloud Functions for scalable, serverless execution with minimal maintenance
  • Real-time Telegram alerts for new internship postings, with interactive subscriber management
  • Google Sheets database for tracking and managing internship applications and deadlines
  • Code on my GitHub

Project Overview

The Internship Discovery & Management Bot is a fully automated, cloud-native pipeline designed to solve the inefficiency of manual job hunting in competitive finance and consulting markets. By replacing tedious, repetitive web searches with an intelligent monitoring system, the project ensures that users stay ahead of application windows—specifically targeting high-value opportunities at boutique firms that are often overlooked in mass-market searches. The system functions as a continuous intelligence layer, proactively identifying and filtering relevant roles from across the European market to deliver high-quality, actionable leads directly to the user.

Google Sheets Database

System Architecture

Telegram bot user interface

The architecture follows a modular, serverless design centered on Google Cloud Functions, which execute discrete logic for scanning, notification, and data management. The research engine leverages the Perplexity AI API to perform automated web scraping and data extraction, with results processed through a deduplication layer to ensure only unique opportunities are recorded. This data is synchronized with a Google Sheets backend, which serves as both a database and an administrative dashboard for the pipeline. Inter-service communication is facilitated via Telegram API, providing a bidirectional interface that allows for immediate status updates and interaction. The entire infrastructure is orchestrated by Cloud Scheduler, which triggers periodic, low-cost scans while maintaining minimal overhead by remaining idle between operations.

Deduplication

A significant technical challenge in the project was designing a robust deduplication logic to handle redundant job postings across multiple search iterations. Because the system performs automated scans every three days, it frequently encounters the same internship listings that have not yet been removed from company career portals.

Internship Prompt

I addressed this by implementing a custom hashing algorithm that evaluates a combination of unique identifiers—specifically the firm name and role title—against a persistent history of previously recorded entries in the Google Sheets database. This ensures that the system only alerts the user to fresh, unique opportunities, effectively filtering out thousands of redundant signals and maintaining the integrity of the application pipeline.

Future Improvements

While the current system provides a robust framework for internship discovery, several potential enhancements could further refine its utility and scalability. These are ideas currently under consideration for future development:

  • Intelligent Filtering: Integrating advanced natural language processing to perform "fit" analysis on job descriptions, allowing for more granular filtering based on specific organizational culture or technical skill requirements.
  • Dedicated Dashboard: Transitioning from the current spreadsheet-based interface to a custom web dashboard, which would provide interactive application funnel analytics and advanced performance visualization.
  • Expanded Scope: Scaling the search parameters to track full-time graduate programs and entry-level career opportunities, extending the system's value beyond the student internship lifecycle.
  • Enhanced Reliability: Implementing a more sophisticated logging and error-handling suite to provide proactive monitoring, ensuring immediate recovery from failed API calls or synchronization interruptions.