Job Notifier
E-mail new job listings from companies' career Websites, automatically
About this project

The Job Notifier app tracks new jobs in the career sections of a handful of companies' Websites. When new jobs are found, it notifies the user via E-mail with related data such as the company's name, the job title, and a clickable link to the job post.

I wanted to build this application in Python because the language is so versatile. My only production-worthy Python app was built during my internship in 2019, so I wanted more Python experience.

Moreover, I wanted more exposure to NoSQL databases and event-driven, serverless computing. Amazon Web Services provides both in DynamoDB and Lambda.

Architecture

There are three main parts to this server-database application.

The first part parses companies' Websites to extract job-related data.

The second part checks whether each job already exists in the database. If the job is new, it is added to the database with today's date.

Finally, the third part queries the database for newly found jobs, using today's date as a filter, and E-mails me a list of them.
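The check-then-add step can be sketched as follows. This is a minimal illustration, not the project's actual code: a plain dict stands in for the DynamoDB table, and the names job_key, add_if_new, and date_found are hypothetical. Against a real DynamoDB table, the same effect is typically achieved with a conditional put using attribute_not_exists.

```python
import hashlib
from datetime import date


def job_key(company: str, url: str) -> str:
    """Derive a stable key for a job from its company and link (assumed scheme)."""
    return hashlib.sha256(f"{company}|{url}".encode("utf-8")).hexdigest()


def add_if_new(table: dict, job: dict, today: date) -> bool:
    """Store the job with today's date unless it is already present.

    `table` is an in-memory stand-in for the database.
    Returns True if the job was added, False if it already existed.
    """
    key = job_key(job["company"], job["url"])
    if key in table:
        return False
    table[key] = {**job, "date_found": today.isoformat()}
    return True
```

Storing the date as an ISO string keeps the later "found today" filter a simple equality check.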
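As a rough sketch of this last step, the snippet below filters stored items by date and builds the notification message with Python's standard email library. The item fields (company, title, url, date_found) and the function names are assumptions made for illustration; actually sending the message is left out.

```python
from datetime import date
from email.message import EmailMessage


def jobs_found_on(items, day: date):
    """Keep only the items whose (assumed) date_found field matches `day`."""
    wanted = day.isoformat()
    return [item for item in items if item.get("date_found") == wanted]


def build_email(jobs, sender: str, recipient: str, day: date) -> EmailMessage:
    """Build a plain-text E-mail listing company, title, and link per job."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(jobs)} new job(s) found on {day.isoformat()}"
    msg["From"] = sender
    msg["To"] = recipient
    entries = [f"{job['company']}: {job['title']}\n{job['url']}" for job in jobs]
    msg.set_content("\n\n".join(entries))
    return msg
```

Most mail clients turn a bare URL in a plain-text body into a clickable link, which is why no HTML part is built here.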

Part 1 is wrapped in an AWS Lambda function and executes at 9 AM every morning. Once it finishes gathering jobs, it invokes the other AWS Lambda function to send the E-mail.
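One way the hand-off between the two functions might look is sketched below. The function name job-notifier-send-email, the payload shape, and the gather_jobs placeholder are all hypothetical. The optional lambda_client parameter is added only so the sketch can be exercised without AWS credentials.

```python
import json

# Hypothetical name of the second Lambda function, the one that sends the E-mail.
EMAIL_FUNCTION_NAME = "job-notifier-send-email"


def gather_jobs() -> int:
    """Placeholder for the scraping and database steps; returns the new-job count."""
    return 0


def handler(event, context, lambda_client=None):
    """Entry point of the first function: gather jobs, then trigger the E-mail function."""
    if lambda_client is None:
        import boto3  # imported lazily so the sketch loads without boto3 installed

        lambda_client = boto3.client("lambda")

    new_job_count = gather_jobs()

    # InvocationType="Event" requests an asynchronous invocation, so this
    # function does not wait for the E-mail function to finish.
    lambda_client.invoke(
        FunctionName=EMAIL_FUNCTION_NAME,
        InvocationType="Event",
        Payload=json.dumps({"new_job_count": new_job_count}).encode("utf-8"),
    )
    return {"new_job_count": new_job_count}
```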

The main challenge was the number of Websites the application had to visit and parse for job-related information. In a synchronous app, a visit to the second Website starts only after the first has completed, so the total running time grows with every Website added.

In my academic studies, I took a course in parallel programming in C++, and I used that knowledge to research libraries in the Python space. Behold Asyncio and ThreadPoolExecutor! In short, Asyncio allowed me to write concurrent code using the async/await syntax. Combined with ThreadPoolExecutor, it let me run the blocking Website visits on multiple threads, so several Websites are processed at the same time.
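The combination described above can be sketched like this. It is an illustration under stated assumptions rather than the project's code: scrape_site is a hypothetical stand-in that sleeps instead of loading a page, and max_workers=5 is an arbitrary choice.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_site(url: str) -> dict:
    """Hypothetical blocking scraper; the sleep stands in for a slow page load."""
    time.sleep(0.2)
    return {"url": url, "jobs": []}


async def scrape_all(urls, max_workers: int = 5):
    """Run the blocking scraper for every URL on a thread pool and await all results."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each call is handed to a worker thread; the event loop is free
        # to schedule the remaining visits while earlier ones are waiting.
        futures = [loop.run_in_executor(pool, scrape_site, url) for url in urls]
        return await asyncio.gather(*futures)
```

Calling asyncio.run(scrape_all(urls)) returns the results in the same order as urls. Because the work is mostly waiting on the network, threads help here even though only one thread executes Python bytecode at a time.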

After implementing Asyncio and ThreadPoolExecutor, the running time dropped from 3+ minutes to 1 minute.

Technologies
  • Python
  • Flask
  • Selenium
  • REST API
  • AWS Lambda
  • AWS DynamoDB
  • Asyncio
  • ThreadPoolExecutor