I breathe in as a programmer, I breathe out as a designer. Disruptive Thinking!
Jobs Crawler Telegram Bot
I developed a Telegram bot in Node.js that crawls different sites, using a custom JavaScript parser file for each site, and builds a list of detected jobs that match certain criteria. Based on filters, the jobs are sorted into different lists and published to different Telegram channels where the bot has been granted permissions.
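The pipeline described above (per-site parsers feeding one job list, then filters splitting that list per channel) can be sketched as follows. This is a minimal sketch under assumptions, not the bot's actual code: the parser contract, `crawlAll`, `routeJobs`, the example site and URL, and the channel names are all hypothetical, and the example parser reads JSON to stay dependency-free, whereas the real bot filtered HTML with jQuery.

```javascript
// Hypothetical contract for a site parser (one file per site in the real bot):
//   { name, url, parse(body) -> Array<{ title, url, remote }> }
// This imaginary site returns JSON; an HTML site would use selectors instead.
const exampleParser = {
  name: "example-board",
  url: "https://jobs.example.com/api/list",
  parse(body) {
    return JSON.parse(body).items.map((item) => ({
      title: item.title,
      url: item.link,
      remote: Boolean(item.remote),
    }));
  },
};

// Hypothetical filter rules: each predicate decides whether a job goes to a channel.
const channelRules = [
  { channel: "@remote_node_jobs", matches: (job) => job.remote && /javascript|node/i.test(job.title) },
  { channel: "@design_jobs", matches: (job) => /\b(designer|ux|ui)\b/i.test(job.title) },
];

// Core loop: it knows nothing about individual sites and only calls each parser.
// A failing site is logged and skipped so it cannot stop the others.
async function crawlAll(parsers, fetchBody) {
  const jobs = [];
  for (const parser of parsers) {
    try {
      const body = await fetchBody(parser.url);
      jobs.push(...parser.parse(body));
    } catch (err) {
      console.error(`[${parser.name}] crawl failed: ${err.message}`);
    }
  }
  return jobs;
}

// Build one list per channel; a job matching several rules lands in several lists.
function routeJobs(jobs, rules) {
  const lists = new Map(rules.map((rule) => [rule.channel, []]));
  for (const job of jobs) {
    for (const rule of rules) {
      if (rule.matches(job)) lists.get(rule.channel).push(job);
    }
  }
  return lists;
}
```

Because `fetchBody` is passed in, the same core loop works with any crawling strategy, and supporting a new site only means adding one more parser object.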
At this time, multiple public Telegram channels are available:
Node.js is the core of the solution. It runs my JavaScript and gives access to the npm ecosystem of packages.
node-telegram-bot-api is the package used to communicate with the Telegram Bot API and publish to Telegram channels.
jQuery is used to filter long HTML strings conveniently through its selector-based node queries.
Headless Chromium is used to simulate a user session and navigate sites that are protected against direct API calls.
SQLite was chosen as the metadata store because it is fast and reliable, and its format keeps the stored metadata easy to inspect and edit.
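One plausible use of the metadata store (the post does not describe its schema) is remembering which jobs were already posted to each channel, so a re-crawl does not publish them again. The sketch below keeps that idea testable without a database: `MemoryJobStore`, `claimNewJobs` and the `published_jobs` table are hypothetical, and the `Map` only stands in for SQLite, where the same check would be a composite primary key plus `INSERT OR IGNORE`.

```javascript
// In-memory stand-in for a hypothetical SQLite table:
//   CREATE TABLE published_jobs (
//     url TEXT, channel TEXT, published_at INTEGER,
//     PRIMARY KEY (url, channel));
// markPublished mirrors "INSERT OR IGNORE": true only when the row is new.
class MemoryJobStore {
  constructor() {
    this.rows = new Map();
  }

  markPublished(url, channel, publishedAt = Date.now()) {
    const key = JSON.stringify([url, channel]);
    if (this.rows.has(key)) return false;
    this.rows.set(key, { url, channel, publishedAt });
    return true;
  }
}

// Keep only jobs not yet published to `channel`, recording them as it goes
// (so duplicates inside the same batch are dropped too).
function claimNewJobs(jobs, channel, store) {
  return jobs.filter((job) => store.markPublished(job.url, channel));
}
```

This version records a job as soon as it is claimed; a real implementation might prefer to record it only after Telegram confirms delivery.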
Final notes:
It is very easy to create a bot using Telegram and Node.js.
Crawling websites is where the project gets complicated, as different sites implement different anti-bot protections.
Stacking and delaying message delivery to Telegram is crucial, because Telegram blocks message flooding.
Building a core bot architecture that supports a custom parser for each site was a great idea: it keeps site-specific logic separate and makes the bot more scalable.
Two crawling strategies were developed: asynchronous requests made directly to the site, and user-session simulation with Chromium.
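The stacking and delaying of Telegram messages mentioned in these notes can be sketched as a FIFO queue that spaces out sends. This is a minimal sketch, not the bot's actual code: `MessageQueue` and the 3-second default are assumptions (Telegram's Bots FAQ mentions a limit of about 20 messages per minute to the same group, i.e. one message every 3 seconds), and `send` is injected so that, with node-telegram-bot-api, it could be `(chatId, text) => bot.sendMessage(chatId, text)`.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// FIFO queue that stacks outgoing messages and spaces them at least `delayMs` apart.
class MessageQueue {
  constructor(send, delayMs = 3000) {
    this.send = send; // injected sender, e.g. a wrapper around bot.sendMessage
    this.delayMs = delayMs;
    this.items = [];
    this.running = false;
    this.lastSentAt = 0;
  }

  // Stacks a message; the returned promise settles when that message is sent.
  enqueue(chatId, text) {
    return new Promise((resolve, reject) => {
      this.items.push({ chatId, text, resolve, reject });
      if (!this.running) this.drain();
    });
  }

  async drain() {
    this.running = true;
    while (this.items.length > 0) {
      // Wait out whatever remains of the gap since the previous send.
      const wait = this.lastSentAt + this.delayMs - Date.now();
      if (wait > 0) await sleep(wait);
      const item = this.items.shift();
      this.lastSentAt = Date.now();
      try {
        item.resolve(await this.send(item.chatId, item.text));
      } catch (err) {
        // A failed send rejects only its own promise; the queue keeps going.
        item.reject(err);
      }
    }
    this.running = false;
  }
}
```

Here one queue serves every channel, so the gap applies globally; keeping one queue per chat would allow more throughput while still respecting per-chat limits.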
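The two crawling strategies can share one entry point that dispatches on a per-parser setting. In this sketch the `strategy` field, `makeFetcher` and `fetchDirect` are hypothetical; `fetchDirect` uses the global `fetch` available in Node.js 18+, and the Chromium strategy is injected because the post does not name the library that drives the browser. The commented function shows how it might look if that library were Puppeteer, which is an assumption.

```javascript
// Direct strategy: a plain asynchronous HTTP request (global fetch, Node.js 18+).
async function fetchDirect(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Hypothetical browser strategy, assuming Puppeteer drives headless Chromium
// and `browser` was created earlier with puppeteer.launch():
//   async function fetchWithBrowser(url) {
//     const page = await browser.newPage();
//     try {
//       await page.goto(url, { waitUntil: "networkidle2" });
//       return await page.content();
//     } finally {
//       await page.close();
//     }
//   }

// Returns a function that fetches a parser's URL with the strategy the parser
// asks for ("direct" when it does not say). Throws synchronously for unknown names.
function makeFetcher(strategies) {
  return (parser) => {
    const name = parser.strategy ?? "direct";
    if (!Object.hasOwn(strategies, name)) {
      throw new Error(`Unknown strategy "${name}" for ${parser.name}`);
    }
    return strategies[name](parser.url);
  };
}
```

A bot would then build its fetcher once, for example `makeFetcher({ direct: fetchDirect, browser: fetchWithBrowser })`, so only the parsers for protected sites pay the cost of a browser session.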