HCMUS-Scraper

HCMUS-News Scraper: a web scraper that crawls news from my university's websites

Inspired by this GitHub repo

Visit the Results page

Websites that I scrape:

https://www.ctda.hcmus.edu.vn/
https://www.fit.hcmus.edu.vn/vn/
https://hcmus.edu.vn/

Technology:

At first I used Scrapy, but one of the pages I wanted to crawl loads its content dynamically with JavaScript, so I switched to Selenium.
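As a rough illustration of why Selenium helps with JS-loaded content, here is a minimal sketch and not the scraper's actual code. It assumes Chrome and the `selenium` package are installed. The function names `fetch_headlines` and `clean_headlines` and the CSS selector in the usage comment are hypothetical. The key point is the explicit wait: the browser runs the page's JavaScript, and the script only reads the elements once they exist in the DOM.

```python
def clean_headlines(raw_titles):
    """Collapse whitespace and drop blanks and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for title in raw_titles:
        text = " ".join(title.split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def fetch_headlines(url, css_selector, timeout=10):
    """Open url in headless Chrome, wait for JS-rendered elements, return their text."""
    # Imported lazily so clean_headlines() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one matching element is present (or time out).
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        return clean_headlines(el.text for el in elements)
    finally:
        driver.quit()  # always release the browser process


# Usage (the selector is a placeholder, not taken from the real site):
# fetch_headlines("https://www.fit.hcmus.edu.vn/vn/", "a.news-title")
```

A static-only tool such as Scrapy would see the page before the script has filled it in, which is the situation that motivated the switch.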

What I have learned:

Working with JSON, basic GitHub CI/CD, scraping static and dynamic content, and how to work around a website's anti-scraping blocks.
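To make the JSON part concrete, here is a small sketch of one way scraped items could be merged into a results file without duplicates. It is an assumption about the workflow, not the project's actual code: the function name `merge_news`, the file name, and the `url`/`title` keys are all hypothetical.

```python
import json
from pathlib import Path


def merge_news(path, new_items):
    """Append items whose 'url' is not yet stored in the JSON file; return how many were added."""
    path = Path(path)
    # Start from the stored list, or an empty one on the first run.
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    known = {item["url"] for item in existing}
    added = 0
    for item in new_items:
        if item["url"] not in known:
            existing.append(item)
            known.add(item["url"])
            added += 1
    # ensure_ascii=False keeps Vietnamese titles readable in the file.
    path.write_text(
        json.dumps(existing, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return added


# Usage (hypothetical file name and item):
# merge_news("news.json", [{"url": "https://hcmus.edu.vn/...", "title": "..."}])
```

Keying on the URL means a scheduled CI run can scrape the same pages repeatedly and only new articles end up in the file.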
