HTML Entity Encoding: Preventing XSS and Displaying Special Characters
HTML entities encode special characters so they display correctly in browsers and do not break HTML structure. Learn which characters must be escaped, how entity encoding prevents XSS attacks, and how to encode and decode in JavaScript.
What Are HTML Entities?
An HTML entity is a text representation of a character using an ampersand (&), a name or number, and a semicolon (;). It allows you to display characters that have special meaning in HTML without the browser interpreting them as markup.
For example, the less-than sign < has special meaning in HTML (it starts a tag). To display it as the literal character <, write `&lt;`.
Named and Numeric Entities
HTML entities come in two forms:
Named entities use readable names defined by the HTML specification:
| Character | Named entity | Description |
|---|---|---|
| < | `&lt;` | Less-than sign |
| > | `&gt;` | Greater-than sign |
| & | `&amp;` | Ampersand |
| " | `&quot;` | Double quotation mark |
| ' | `&apos;` | Apostrophe (HTML5) |
| (U+00A0) | `&nbsp;` | Non-breaking space |
| © | `&copy;` | Copyright sign |
| ® | `&reg;` | Registered trademark |
| ™ | `&trade;` | Trademark sign |
| € | `&euro;` | Euro sign |
| £ | `&pound;` | Pound sign |
Numeric entities use decimal or hexadecimal code points and can represent any Unicode character:

```html
&#60;     <!-- decimal: < -->
&#x3C;    <!-- hexadecimal: < -->
&#9829;   <!-- decimal: ♥ -->
&#x2665;  <!-- hexadecimal: ♥ -->
```
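Because a numeric reference is just the character's Unicode code point, it can be generated mechanically. The sketch below derives both forms with `String.prototype.codePointAt`; the helper names `toDecimalEntity`, `toHexEntity`, and `encodeNonAscii` are hypothetical, not from any library. Iterating with `for...of` keeps characters outside the Basic Multilingual Plane (such as emoji) intact as single code points rather than splitting them into surrogate halves.

```javascript
// Hypothetical helpers: numeric character references built from code points.
function toDecimalEntity(ch) {
  return `&#${ch.codePointAt(0)};`;
}

function toHexEntity(ch) {
  return `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`;
}

// Replace every non-ASCII character with a hexadecimal reference.
// for...of iterates by code point, so an emoji (two UTF-16 code units) yields one reference.
// This does NOT escape <, >, &, or quotes, so it is not an XSS defence on its own.
function encodeNonAscii(str) {
  let out = "";
  for (const ch of str) {
    out += ch.codePointAt(0) > 0x7f ? toHexEntity(ch) : ch;
  }
  return out;
}

console.log(toDecimalEntity("<"));     // &#60;
console.log(toHexEntity("<"));         // &#x3C;
console.log(toDecimalEntity("♥"));     // &#9829;
console.log(toHexEntity("♥"));         // &#x2665;
console.log(encodeNonAscii("I ♥ 😀")); // I &#x2665; &#x1F600;
```

Emitting uppercase hex digits is only a style choice; `&#x3c;` and `&#x3C;` are equivalent.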
The Critical Five: XSS Prevention
The five characters that must always be escaped when inserting user-supplied content into HTML:
| Character | Entity | Why it must be escaped |
|---|---|---|
| < | `&lt;` | Starts an HTML tag |
| > | `&gt;` | Ends an HTML tag |
| & | `&amp;` | Starts an entity reference |
| " | `&quot;` | Closes an attribute value (double-quoted) |
| ' | `&#39;` (or `&apos;`) | Closes an attribute value (single-quoted) |
Failing to escape these characters allows Cross-Site Scripting (XSS) attacks — an attacker injects malicious HTML or JavaScript by including it in user-controlled content.
```html
<!-- UNSAFE: user input "><script>alert(1)</script> inserted without escaping -->
<div class="user-name">"><script>alert(1)</script></div>

<!-- SAFE: the same input after entity encoding; the browser displays it as literal text -->
<div class="user-name">&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;</div>
```
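The SAFE version above comes from escaping at the point of insertion. When HTML strings are assembled by hand, a tagged template literal can make that the default: the literal template text is treated as trusted markup and every interpolated value is escaped. This is a minimal sketch; the tag name `html` and its `escapeHtml` helper are illustrative, not a library API, and it only covers text content and quoted attribute values (not URLs, inline scripts, or styles).

```javascript
// Escape the five critical characters; & is replaced first.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical tagged template: literal parts are trusted markup written by
// the developer; every ${...} value is treated as untrusted and escaped.
function html(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += escapeHtml(values[i]) + strings[i + 1];
  }
  return out;
}

const userName = '"><script>alert(1)</script>';
const markup = html`<div class="user-name">${userName}</div>`;
console.log(markup);
// <div class="user-name">&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;</div>
```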
HTML Entity Encoding in JavaScript
Escaping for HTML output
```javascript
function escapeHtml(str) {
  return str
    .replace(/&/g, "&amp;") // must run first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

escapeHtml('<script>alert("xss")</script>');
// "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"
```
Note: always escape & first. If < were replaced first, the later & replacement would turn the resulting `&lt;` into `&amp;lt;`, double-encoding it.
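The double-encoding problem is easy to reproduce. The sketch below (both function names are hypothetical) shows it alongside a common alternative: a single regex pass with a lookup table. Each matched character is replaced exactly once, so no replacement ever sees another replacement's output and the ordering question disappears.

```javascript
// Wrong: & is replaced last, so the "&" inside "&lt;" gets encoded again.
function escapeHtmlWrongOrder(str) {
  return str
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/&/g, "&amp;");
}

// Single pass: each matched character is looked up once, so order is irrelevant.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtmlSinglePass(str) {
  return str.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtmlWrongOrder("a < b")); // a &amp;lt; b
console.log(escapeHtmlSinglePass("a < b")); // a &lt; b
```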
Using the DOM (browser only)
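Alternatively, let the browser's HTML serializer do the escaping:

```javascript
function escapeHtml(str) {
  const div = document.createElement("div");
  div.textContent = str; // stored as a plain text node; nothing is parsed as HTML
  return div.innerHTML;  // serializing the text node escapes &, <, > and U+00A0
}
```

The result is safe to use as text content, but it is not a drop-in replacement for the function above: quotes are not escaped when a text node is serialized, so do not use this output inside attribute values.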
Unescaping
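```javascript
function unescapeHtml(str) {
  // DOMParser builds a separate document; scripts in it are not executed
  const doc = new DOMParser().parseFromString(str, "text/html");
  return doc.documentElement.textContent;
}
```

Any tags in the input are discarded and only their text is returned, so this suits decoding entities in plain text rather than converting markup.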
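DOMParser exists only in browsers. Outside the browser, a small decoder can cover numeric references plus a handful of named ones. The sketch below (`decodeEntities` and its `NAMED` table are hypothetical) knows only the semicolon-terminated named entities from the first table, whereas the HTML specification defines more than 2,000, so for arbitrary input a maintained decoding library is the safer choice. Decoding in a single regex pass means `&amp;lt;` correctly becomes `&lt;`, not `<`.

```javascript
// Hypothetical decoder: numeric references plus the named entities from the table above.
const NAMED = {
  lt: "<", gt: ">", amp: "&", quot: '"', apos: "'", nbsp: "\u00A0",
  copy: "©", reg: "®", trade: "™", euro: "€", pound: "£",
};

function decodeEntities(str) {
  // A single regex pass decodes each reference once, so "&amp;lt;" becomes "&lt;".
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave NUL, surrogate and out-of-range code points as written (a browser
      // would substitute U+FFFD); String.fromCodePoint throws above 0x10FFFF.
      if (codePoint === 0 || codePoint > 0x10ffff ||
          (codePoint >= 0xd800 && codePoint <= 0xdfff)) {
        return match;
      }
      return String.fromCodePoint(codePoint);
    }
    // Unknown names are left as-is rather than guessed.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

console.log(decodeEntities("a &lt; b &amp;&amp; c"));  // a < b && c
console.log(decodeEntities("&#9829; &#x2665; &copy;")); // ♥ ♥ ©
console.log(decodeEntities("&amp;lt;"));                // &lt;
```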
Context Matters: Attribute vs Text Content
The same character may need different escaping depending on where it appears:
- Text content → escape `<`, `>`, `&`
- Double-quoted attribute → escape `<`, `>`, `&`, `"`
- Single-quoted attribute → escape `<`, `>`, `&`, `'`
- Unquoted attribute (avoid) → many more characters need escaping
Always use quoted attributes and always escape the quote character used.
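The list above maps directly onto a per-context character set. This sketch (`escapeForContext` and its context names are hypothetical) rejects unquoted attributes outright rather than attempting to escape for them. Escaping all five characters everywhere, as `escapeHtml` does, is simpler and equally safe in every quoted context; the per-context version mainly shows which characters each context depends on.

```javascript
// Characters to escape in each quoted context, following the list above.
const CONTEXT_PATTERNS = {
  text: /[&<>]/g,
  doubleQuotedAttr: /[&<>"]/g,
  singleQuotedAttr: /[&<>']/g,
};

const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// Hypothetical helper; unquoted attributes are deliberately unsupported.
function escapeForContext(str, context) {
  if (!Object.prototype.hasOwnProperty.call(CONTEXT_PATTERNS, context)) {
    throw new Error(`Unsupported HTML context: ${context}`);
  }
  return str.replace(CONTEXT_PATTERNS[context], (ch) => ENTITIES[ch]);
}

const value = `Tom & "Jerry's"`;
console.log(escapeForContext(value, "text"));
// Tom &amp; "Jerry's"
console.log(`<input value="${escapeForContext(value, "doubleQuotedAttr")}">`);
// <input value="Tom &amp; &quot;Jerry's&quot;">
```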
JavaScript Context (Different Rules)
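If you are injecting data into a `<script>` block rather than into HTML, HTML entity encoding is the wrong tool. The content of a script element is raw text: entities there are not decoded, so `&lt;` would reach the JavaScript engine as the literal characters `&lt;`. Serialize the data as JSON instead (for example with JSON.stringify) and read it back with JSON.parse: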
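```html
<!-- SAFE: data lives in a non-executable JSON block; < and > are escaped as \u003c and \u003e -->
<script id="data" type="application/json">
  { "name": "Alice \u003cScript\u003e Engineer" }
</script>

<!-- The data block must come first: this script runs as soon as it is parsed -->
<script>
  const userData = JSON.parse(document.getElementById("data").textContent);
</script>
```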
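Note: `\u003c` and `\u003e` are JSON Unicode escapes for `<` and `>`. JSON.stringify does not produce them by itself, so replace `<` and `>` after stringifying. JSON.parse turns the escapes back into the original characters, while the HTML parser never sees a literal `</script>` that could end the element early.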
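On the server side, the JSON for such a block can be produced with JSON.stringify followed by the `\u003c` escaping, since JSON.stringify leaves `<` and `>` untouched. A minimal sketch, assuming a hypothetical function name `jsonForScriptTag`; it also escapes `&` as `\u0026`, a common extra precaution rather than something the pattern above requires.

```javascript
// Hypothetical helper: JSON for embedding inside <script type="application/json">.
// JSON.stringify leaves <, > and & as-is; replacing them with \u escapes keeps the
// JSON equivalent for JSON.parse while making "</script>" impossible in the output.
function jsonForScriptTag(data) {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

const payload = { name: "Alice </script><script>alert(1)</script>" };
const json = jsonForScriptTag(payload);
console.log(json);
// {"name":"Alice \u003c/script\u003e\u003cscript\u003ealert(1)\u003c/script\u003e"}
console.log(JSON.parse(json).name === payload.name); // true
```

Because `<`, `>`, and `&` can only occur inside JSON strings, swapping them for `\u` escapes never changes what JSON.parse returns.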
Frameworks Handle This for You
Modern frameworks (React, Vue, Angular, Svelte) automatically escape values interpolated into templates. React's JSX, for example, renders strings as text nodes by default. The escape hatches (dangerouslySetInnerHTML in React, v-html in Vue, {@html} in Svelte) should be used with extreme caution and only with content that has been sanitised by a trusted library (like DOMPurify).