New Python HTML Libraries 2026

html5lib/html5lib-python 1K +1

added 1 year ago

Standards-compliant library for parsing and serializing HTML documents and fragments in Python.

alir3z4/html2text 2K +2

added 1 year ago

Convert HTML to Markdown-formatted text.

gawel/pyquery 2K -2

added 1 year ago

A jQuery-like library for Python.

mozilla/bleach 2K +1

added 1 year ago

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.

buriy/python-readability 2K +4

added 1 year ago

Given an HTML document, extract and clean up the main body text and title.

lxml/lxml 3K +7

added 1 year ago

lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

scrapy/parsel 1K +3

added 1 year ago

Parsel lets you extract data from XML/HTML/JSON documents using XPath or CSS selectors.

psf/requests-html 13K -6

added 1 year ago

This library intends to make parsing HTML as simple and intuitive as possible.
