BeautifulSoup has a pretty nifty feature: it tries to repair bad HTML, for example by adding missing closing tags. So if we put BeautifulSoup in the middle, whatever we get from a site is fixed before Scrapy parses it.
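To see the repair step on its own, here is a minimal sketch that runs a broken fragment through BeautifulSoup and serializes it back. It assumes the beautifulsoup4 package is installed; the helper name repair_html and the sample markup are made up for illustration and are not part of the middleware.

```python
from bs4 import BeautifulSoup


def repair_html(html, parser="html.parser"):
    """Parse possibly broken HTML and return the repaired markup as a string.

    repair_html is a hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, parser)
    return str(soup)


if __name__ == "__main__":
    broken = "<p>Hello <b>world"  # neither tag is closed
    print(repair_html(broken))
```

With the built-in html.parser, the two unclosed tags come back closed. Roughly speaking, the middleware applies this kind of round trip to each downloaded response body before the spider sees it.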
Fortunately, all we have to do is pip install Alecxe's scrapy-beautifulsoup middleware.
pip install scrapy-beautifulsoup
Then we configure Scrapy to use it from settings.py:
DOWNLOADER_MIDDLEWARES = { 'scrapy_beautifulsoup.middleware.BeautifulSoupMiddleware': 400 }
By default BeautifulSoup uses Python's built-in parser, 'html.parser'. We can change it, also in settings.py:
BEAUTIFULSOUP_PARSER = "html5lib" # or BEAUTIFULSOUP_PARSER = "lxml"
html5lib is the better parser IMO, but it has to be installed separately (so does lxml, if you pick that one instead).
pip install html5lib
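If you are unsure which parser to pick, it can help to feed the same broken markup to each one and compare the results, since they repair it differently. The sketch below assumes beautifulsoup4 is installed and skips any parser that is missing; compare_parsers is a hypothetical helper name, not something the middleware provides.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def compare_parsers(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: repaired_markup} for every parser that is installed."""
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(html, name))
        except FeatureNotFound:
            # BeautifulSoup raises this when the requested parser is not installed
            continue
    return results


if __name__ == "__main__":
    for name, markup in compare_parsers("<a></p>").items():
        print(name, "->", markup)
```

html.parser returns only the repaired fragment, while lxml and html5lib wrap it in a full document, so the choice can affect which selectors match in your spider.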