This server-database application has three main parts.
The first part parses company websites to extract job-related data.
The second part checks the database to see whether each job already exists there.
If the job is new, it is added to the database with today's date.
Finally, the third part queries the database for newly found jobs, using today's date as a filter,
and E-mails me a list of the new jobs.
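
The check-then-add step and the "today's date" filter can be sketched as below. This is only an illustration, not the project's actual code: the write-up does not name the database, so SQLite is an assumption, and the table name `jobs`, its columns, and the helper names `init_db`, `add_if_new`, and `jobs_found_on` are all hypothetical.

```python
import sqlite3
from datetime import date


def init_db(conn):
    # Hypothetical schema: the job URL is assumed to identify a job uniquely.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "url TEXT PRIMARY KEY, title TEXT NOT NULL, found_on TEXT NOT NULL)"
    )


def add_if_new(conn, url, title, today=None):
    """Insert the job unless its URL is already stored. Return True if it was new."""
    today = today or date.today().isoformat()
    # INSERT OR IGNORE leaves the existing row untouched when the URL is a duplicate.
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (url, title, found_on) VALUES (?, ?, ?)",
        (url, title, today),
    )
    conn.commit()
    return cur.rowcount == 1


def jobs_found_on(conn, day):
    """Return (title, url) pairs for jobs first seen on the given ISO date."""
    rows = conn.execute(
        "SELECT title, url FROM jobs WHERE found_on = ? ORDER BY title", (day,)
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    add_if_new(conn, "https://example.com/jobs/1", "Backend Developer")
    add_if_new(conn, "https://example.com/jobs/1", "Backend Developer")  # ignored
    print(jobs_found_on(conn, date.today().isoformat()))
```

Letting the primary key reject duplicates keeps the existence check and the insert in one statement, so there is no separate lookup query per job.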
Part 1 is wrapped in an AWS Lambda function and runs at 9 AM every morning. Once it has finished
gathering jobs, it invokes a second AWS Lambda function to send the E-mail.
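
One way the first Lambda could hand off to the second is through boto3's Lambda client, roughly as follows. This is a sketch under assumptions: the function name `send-new-jobs-email`, the payload shape, and the helper `notify_email_lambda` are hypothetical, and the original project may trigger the second function differently.

```python
import json


def notify_email_lambda(new_job_count, client=None, function_name="send-new-jobs-email"):
    """Asynchronously invoke the (hypothetically named) E-mail Lambda."""
    if client is None:
        # Imported lazily so the helper can be exercised without AWS credentials.
        import boto3

        client = boto3.client("lambda")

    # Hypothetical payload: the E-mail function only needs to know it should run.
    payload = json.dumps({"new_job_count": new_job_count}).encode("utf-8")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # fire-and-forget: do not wait for the E-mail to be sent
        Payload=payload,
    )
    return response["StatusCode"]
```

With `InvocationType="Event"` the scraping function does not stay running (and billed) while the E-mail function works; a synchronous `"RequestResponse"` call would be the alternative if the first function needed the result.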
The main challenge was that the application had to visit many websites and parse job-related
information from each one. In a synchronous app, a visit to the second website would start only
after the visit to the first had completed, so the total running time grows with every website
added to the list.
In my academic studies, I took a course on parallel programming in C++, and I used that knowledge to
research libraries in the Python space. Behold Asyncio and ThreadPoolExecutor! In short, Asyncio
allowed me to write concurrent code using the async/await syntax. Combined with ThreadPoolExecutor,
it allowed me to run the blocking website visits on multiple threads, so the time spent waiting on
one website overlaps with the work on the others.
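
The combination described above can be sketched with `loop.run_in_executor`, which runs a blocking function on a pool thread and returns an awaitable. This is a minimal illustration, not the project's code: `fetch_jobs` is a hypothetical stand-in that sleeps instead of downloading and parsing a page, and the URLs and `max_workers` value are made up.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_jobs(url):
    """Hypothetical stand-in for a blocking download-and-parse step."""
    time.sleep(0.2)  # simulates waiting on the network
    return f"jobs from {url}"


async def fetch_all(urls, max_workers=8):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each blocking call runs on a pool thread; the event loop awaits them together.
        tasks = [loop.run_in_executor(pool, fetch_jobs, url) for url in urls]
        # gather() returns results in the same order as the input URLs.
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    urls = [f"https://example.com/careers/{i}" for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(fetch_all(urls))
    print(results)
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Threads help here because the work is dominated by waiting on the network; for CPU-heavy parsing, a process pool would be the usual alternative, since CPython threads do not execute Python bytecode in parallel.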
Implementing Asyncio and ThreadPoolExecutor reduced the running time from 3+ minutes to
1 minute.