Ksoup: Kotlin Multiplatform HTML & XML Parser

Ksoup is a Kotlin Multiplatform library for working with real-world HTML and XML. It's a port of the renowned Java library, jsoup, and offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM and CSS selectors.

Supported platforms: Android, iOS, macOS, tvOS, JVM, Linux, Windows, JS, Wasm

🚨 Deprecation Notice

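The following extension libraries are deprecated and will be removed in a future release: ksoup-korlibs, ksoup-network-ktor2, and ksoup-network-korlibs.

Recommendation: use ksoup-kotlinx (kotlinx-io) for I/O and ksoup-network (Ktor 3) for networking.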

Ksoup implements the WHATWG HTML5 specification, parsing HTML to the same DOM as modern browsers do, but with support for Android, JVM, and native platforms.

Features

Ksoup is adept at handling all varieties of HTML found in the wild.
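As a small illustration of that tolerance, the sketch below parses a fragment with unclosed <p> tags and no <html> or <body> wrapper. The helper name paragraphTexts is hypothetical; it relies only on Ksoup.parse, select, and text, assuming they behave like their jsoup counterparts.

```kotlin
import com.fleeksoft.ksoup.Ksoup

// Hypothetical helper: returns the text of every <p> element the parser recovers.
fun paragraphTexts(html: String): List<String> {
    // The HTML5 parsing rules implicitly close an open <p> when a new <p> starts.
    val doc = Ksoup.parse(html = html)
    return doc.select("p").map { it.text() }
}
```

Calling paragraphTexts("<p>One<p>Two") should yield two separate paragraphs even though neither tag is closed.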

Getting started

Library Structure

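Ksoup follows a modular architecture: a required core library, plus optional I/O and network extensions that you add only when you need them.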

Installation

Include the dependencies in your commonMain source set. The latest version is published on Maven Central.

1. Core Library

Start with the core library. This is all you need if you're only parsing HTML/XML from strings.

// Required core library
implementation("com.fleeksoft.ksoup:ksoup:<version>")

2. I/O Extensions (Optional)

Add one of these extensions only if you need to parse HTML/XML from files or other sources.

Choose one of the following I/O libraries:

  1. kotlinx-io (Recommended)

    // Optional: Add this if you need file parsing capabilities
    // Provides Ksoup.parseFile, Ksoup.parseSource & Other InputStream APIs
    implementation("com.fleeksoft.ksoup:ksoup-kotlinx:<version>")
    
  2. okio

    // Optional: Add this if you need file parsing capabilities
    // Provides Ksoup.parseFile, Ksoup.parseSource & Other InputStream APIs
    implementation("com.fleeksoft.ksoup:ksoup-okio:<version>")
    
  3. korlibs-io (DEPRECATED: Use kotlinx-io instead)

    // Deprecated: Not recommended for new projects
    // Provides Ksoup.parseFile, Ksoup.parseStream & Other InputStream APIs
    implementation("com.fleeksoft.ksoup:ksoup-korlibs:<version>")
    

3. Network Extensions (Optional)

Add one of these extensions only if you need to fetch and parse HTML/XML directly from URLs.

Choose one of the following network libraries:

  1. Ktor 3 (Recommended)

    // Optional: Add this if you need to fetch HTML/XML from URLs
    // Provides Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network:<version>")
    
  2. Ktor 2 (DEPRECATED: Use Ktor 3 instead)

    // Deprecated: Not recommended for new projects
    // Provides Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network-ktor2:<version>")
    
  3. korlibs-io Network (DEPRECATED: Use Ktor 3 instead)

    // Deprecated: Not recommended for new projects
    // Provides Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network-korlibs:<version>")
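
Putting the pieces together, a module-level build.gradle.kts might look roughly like the sketch below. It assumes a recent Kotlin Gradle plugin that supports the commonMain.dependencies shorthand, omits target declarations, and picks the two recommended extensions; drop either optional line if you do not need it.

```kotlin
// build.gradle.kts (module level) - a sketch, not a complete build file
kotlin {
    sourceSets {
        commonMain.dependencies {
            // Required core library
            implementation("com.fleeksoft.ksoup:ksoup:<version>")
            // Optional: file/source parsing via kotlinx-io
            implementation("com.fleeksoft.ksoup:ksoup-kotlinx:<version>")
            // Optional: fetching from URLs via Ktor 3
            implementation("com.fleeksoft.ksoup:ksoup-network:<version>")
        }
    }
}
```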
    

Ksoup supports Charsets

Parsing HTML from a String with Ksoup

For API documentation, refer to the jsoup documentation. Most of its APIs work in Ksoup without any changes.

val html = "<html><head><title>One</title></head><body>Two</body></html>"
val doc: Document = Ksoup.parse(html = html)

println("title => ${doc.title()}") // One
println("bodyText => ${doc.body().text()}") // Two

This snippet demonstrates how to use Ksoup.parse for parsing an HTML string and extracting the title and body text.
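
Beyond the title and body text, the same parsed document can be queried with CSS selectors. The sketch below collects the text and href of every link in an HTML string; extractLinks is a hypothetical helper name, and the calls are assumed to match jsoup's select, text, and attr.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.nodes.Document

// Hypothetical helper: returns (link text, raw href value) for each <a href="..."> element.
fun extractLinks(html: String): List<Pair<String, String>> {
    val doc: Document = Ksoup.parse(html = html)
    // "a[href]" matches only anchors that actually carry an href attribute.
    return doc.select("a[href]").map { link -> link.text() to link.attr("href") }
}
```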

Fetching and Parsing HTML from a URL using Ksoup

// Ksoup.parseGetRequest requires the com.fleeksoft.ksoup:ksoup-network library.
val doc: Document = Ksoup.parseGetRequest(url = "https://en.wikipedia.org/") // suspend function
// or, from non-suspending code, use the blocking variant instead:
// val doc: Document = Ksoup.parseGetRequestBlocking(url = "https://en.wikipedia.org/")

println("title: ${doc.title()}")
val headlines: Elements = doc.select("#mp-itn b a")

headlines.forEach { headline: Element ->
    val headlineTitle = headline.attr("title")
    val headlineLink = headline.absUrl("href")

    println("$headlineTitle => $headlineLink")
}
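
The absUrl call above resolves relative links against the document's base URI, which a fetched document gets from its URL. To try the same resolution offline, a base URI can be supplied when parsing a string. This is a minimal sketch that assumes Ksoup.parse accepts a baseUri argument as jsoup's parse does; absoluteLinks is a hypothetical helper name.

```kotlin
import com.fleeksoft.ksoup.Ksoup

// Hypothetical helper: resolves every href in the HTML against the given base URI.
fun absoluteLinks(html: String, baseUri: String): List<String> {
    // Assumption: baseUri is accepted as a named parameter, mirroring jsoup.
    val doc = Ksoup.parse(html = html, baseUri = baseUri)
    return doc.select("a[href]").map { it.absUrl("href") }
}
```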

Parsing XML

val doc: Document = Ksoup.parse(xml, parser = Parser.xmlParser())
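
Once parsed with the XML parser, the document can be queried with the same selector API as HTML. The sketch below is illustrative only: bookTitles and the book/name element names are made up for the example.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.parser.Parser

// Hypothetical helper: returns the text of each <name> that is a direct child of a <book>.
fun bookTitles(xml: String): List<String> {
    // Parser.xmlParser() skips the HTML-specific tree fix-ups (no implied <html>/<body>).
    val doc = Ksoup.parse(xml, parser = Parser.xmlParser())
    return doc.select("book > name").map { it.text() }
}
```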

Parsing Metadata from Website

// Ksoup.parseGetRequest requires the com.fleeksoft.ksoup:ksoup-network library.
val doc: Document = Ksoup.parseGetRequest(url = "https://en.wikipedia.org/") // suspend function
val metadata: Metadata = Ksoup.parseMetaData(element = doc) // suspend function
// or, if you already have the page as a string (HTML stands for your own HTML string):
// val metadata: Metadata = Ksoup.parseMetaData(html = HTML)

println("title: ${metadata.title}")
println("description: ${metadata.description}")
println("ogTitle: ${metadata.ogTitle}")
println("ogDescription: ${metadata.ogDescription}")
println("twitterTitle: ${metadata.twitterTitle}")
println("twitterDescription: ${metadata.twitterDescription}")
// Check com.fleeksoft.ksoup.model.MetaData for more fields

In these examples, Ksoup.parseGetRequest fetches and parses HTML content from Wikipedia. The first example extracts and prints news headlines with their corresponding links; the second prints the page's metadata fields.
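
Fields such as ogTitle typically come from <meta> tags in the page head. The sketch below reads one such tag by hand with a CSS selector, to show the kind of lookup involved. It is not Ksoup's parseMetaData implementation, and openGraphTitle is a hypothetical helper name.

```kotlin
import com.fleeksoft.ksoup.Ksoup

// Hypothetical helper: returns the content of <meta property="og:title">, or null if absent.
fun openGraphTitle(html: String): String? {
    val doc = Ksoup.parse(html = html)
    // selectFirst is assumed to return null when nothing matches, as in jsoup.
    return doc.selectFirst("meta[property=og:title]")?.attr("content")
}
```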

Ksoup Public functions

Ksoup I/O Public functions

Ksoup Network Public functions

For further documentation, please check here: Jsoup

Ksoup vs. Jsoup Benchmarks: Parsing & Selecting 448KB HTML File test.tx

[Figure: Ksoup vs Jsoup]

Open source

Ksoup is an open source project, a Kotlin Multiplatform port of jsoup, distributed under the MIT License. The source code of Ksoup is available on GitHub.

Development and Support

For questions about usage and general inquiries, please refer to GitHub Discussions.

If you wish to contribute, please read the Contributing Guidelines.

To report an issue, visit our GitHub issues. Please check for duplicates before submitting a new issue.

License

Ksoup is open source software licensed under the MIT License.

This project is a Kotlin Multiplatform port of Jsoup, created by Jonathan Hedley.
Portions of this library are derived from jsoup and retain their original MIT License,
© 2009–2025 Jonathan Hedley.
