public class Cleaner extends Object
The HTML cleaner parses the input as HTML and then runs it through a whitelist, so the output HTML can only contain HTML that is allowed by the whitelist.
It is assumed that the input HTML is a body fragment; the clean methods only pull from the source's body, and the canned whitelists only allow tags that belong in the body.
Rather than interacting directly with a Cleaner object, generally see the clean methods in Jsoup.
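The copy-what-is-trusted idea described above can be sketched without jsoup at all. The block below is a minimal toy model, not jsoup's implementation: `ToyNode`, `ToyCleaner`, and `ToyCleanerDemo` are hypothetical names, elements carry only a tag name, and the whitelist is just a set of allowed tags. It assumes that a disallowed element is dropped while its allowed descendants are re-attached to the nearest surviving ancestor, and that a document counts as valid when nothing had to be discarded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an HTML element; not a jsoup class. */
final class ToyNode {
    final String tag;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String tag, ToyNode... kids) {
        this.tag = tag;
        children.addAll(Arrays.asList(kids));
    }

    /** Renders the subtree as nested tags, e.g. "<p><b></b></p>". */
    String render() {
        StringBuilder sb = new StringBuilder("<" + tag + ">");
        for (ToyNode child : children) {
            sb.append(child.render());
        }
        return sb.append("</").append(tag).append(">").toString();
    }
}

/** Hypothetical cleaner: copies allowed tags into a new tree and counts what it drops. */
final class ToyCleaner {
    private final Set<String> allowedTags;

    ToyCleaner(Set<String> allowedTags) {
        this.allowedTags = allowedTags;
    }

    /** Copies allowed descendants of source under dest; returns the number of discarded elements. */
    int copySafeNodes(ToyNode source, ToyNode dest) {
        int discarded = 0;
        for (ToyNode child : source.children) {
            if (allowedTags.contains(child.tag)) {
                ToyNode safeCopy = new ToyNode(child.tag);
                dest.children.add(safeCopy);
                discarded += copySafeNodes(child, safeCopy);
            } else {
                // Drop the element itself, but keep looking for allowed descendants.
                discarded += 1 + copySafeNodes(child, dest);
            }
        }
        return discarded;
    }

    /** Builds a fresh body; the dirty tree is never modified. */
    ToyNode clean(ToyNode dirtyBody) {
        ToyNode cleanBody = new ToyNode("body");
        copySafeNodes(dirtyBody, cleanBody);
        return cleanBody;
    }

    /** Valid means a cleaning pass would discard nothing. */
    boolean isValid(ToyNode dirtyBody) {
        return copySafeNodes(dirtyBody, new ToyNode("body")) == 0;
    }
}

final class ToyCleanerDemo {
    public static void main(String[] args) {
        ToyNode dirty = new ToyNode("body",
                new ToyNode("p", new ToyNode("b")),
                new ToyNode("script"),
                new ToyNode("div", new ToyNode("i")));
        ToyCleaner cleaner = new ToyCleaner(new HashSet<>(Arrays.asList("p", "b", "i")));
        System.out.println(cleaner.clean(dirty).render()); // <body><p><b></b></p><i></i></body>
        System.out.println(cleaner.isValid(dirty));        // false
    }
}
```

Building a new tree instead of deleting nodes from the input means anything the traversal does not explicitly recognize never reaches the output, which is the property a whitelist approach relies on.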
Modifier and Type | Class and Description |
---|---|
private class | Cleaner.CleaningVisitor: Iterates the input and copies trusted nodes (tags, attributes, text) into the destination. |
private static class | Cleaner.ElementMeta |
Constructor and Description |
---|
Cleaner(Whitelist whitelist): Create a new cleaner that sanitizes documents using the supplied whitelist. |
Modifier and Type | Method and Description |
---|---|
Document | clean(Document dirtyDocument): Creates a new, clean document from the original dirty document, containing only elements allowed by the whitelist. |
private int | copySafeNodes(Element source, Element dest) |
private Cleaner.ElementMeta | createSafeElement(Element sourceEl) |
boolean | isValid(Document dirtyDocument): Determines if the input document body is valid against the whitelist. |
boolean | isValidBodyHtml(String bodyHtml) |
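public Document clean(Document dirtyDocument)
Creates a new, clean document from the original dirty document, containing only elements allowed by the whitelist. Only elements from the dirty document's body are used.
dirtyDocument - Untrusted base document to clean.
public boolean isValid(Document dirtyDocument)
Determines if the input document body is valid against the whitelist. It is considered valid if all the tags and attributes in the input HTML are allowed by the whitelist, and there is no content in the head.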
This method can be used as a validator for user input. An invalid document will still be cleaned successfully using the clean(Document) method. If using as a validator, it is recommended to still clean the document to ensure enforced attributes are set correctly, and that the output is tidied.
dirtyDocument - document to test
public boolean isValidBodyHtml(String bodyHtml)
private int copySafeNodes(Element source, Element dest)
private Cleaner.ElementMeta createSafeElement(Element sourceEl)
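The advice under isValid(Document), to clean a document even after it validates, is easier to see with enforced attributes in play. The following is a minimal sketch under stated assumptions, not jsoup code: `ToyTagRule` and `ToyTagRuleDemo` are hypothetical, the rule covers the attributes of a single tag only, and "enforced" is assumed to mean the attribute is always written to the output regardless of the input.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical attribute-level model of one whitelisted tag; not part of jsoup. */
final class ToyTagRule {
    private final Set<String> allowedAttributes;
    private final Map<String, String> enforcedAttributes;

    ToyTagRule(Set<String> allowedAttributes, Map<String, String> enforcedAttributes) {
        this.allowedAttributes = allowedAttributes;
        this.enforcedAttributes = enforcedAttributes;
    }

    /** Valid means no input attribute would have to be removed. */
    boolean isValid(Map<String, String> dirtyAttributes) {
        return allowedAttributes.containsAll(dirtyAttributes.keySet());
    }

    /** Copies allowed attributes into a new map, then applies the enforced ones on top. */
    Map<String, String> clean(Map<String, String> dirtyAttributes) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> attr : dirtyAttributes.entrySet()) {
            if (allowedAttributes.contains(attr.getKey())) {
                safe.put(attr.getKey(), attr.getValue());
            }
        }
        safe.putAll(enforcedAttributes);
        return safe;
    }
}

final class ToyTagRuleDemo {
    public static void main(String[] args) {
        // Assumed rule for a link tag: href is allowed, rel="nofollow" is enforced.
        ToyTagRule linkRule = new ToyTagRule(
                Collections.singleton("href"),
                Collections.singletonMap("rel", "nofollow"));

        Map<String, String> userLink = new LinkedHashMap<>();
        userLink.put("href", "https://example.com/");

        System.out.println(linkRule.isValid(userLink)); // true
        System.out.println(linkRule.clean(userLink));   // {href=https://example.com/, rel=nofollow}
    }
}
```

In this model the input passes validation, yet the cleaned output still differs from it because the enforced attribute is added. That is why validation alone does not replace a cleaning pass.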
WebARTS Library Licensed Under the GNU - General Public License. Other Libraries licensed under their respective Open Source Licenses