DevGizmo

Text Analysis: Word Count, Character Count, and Readability Metrics Explained

Learn how word counters and text analysis tools work under the hood. Understand the difference between word, character, sentence, and paragraph counts — and what metrics like reading time are based on.

Tags: word-count, text-analysis, readability, javascript

What Makes Counting Words Non-Trivial?

Counting the words in a string seems trivial — just split on spaces. But real text has edge cases: multiple consecutive spaces, tabs, newlines, punctuation attached to words, hyphenated compounds, and contractions. Different tools give different counts for the same text because they handle these cases differently.

Word Count

The most common approach is to split on whitespace and filter empty segments:

function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

This handles runs of spaces, tabs, and newlines, but it counts every whitespace-separated token as a word: well-known is one word, and so is a standalone dash. Whether hyphenated compounds should count as one word depends on your use case; editors often count them as one, while some academic tools split them.

An alternative uses a word-boundary pattern:

function countWords(text) {
  const matches = text.match(/\b\w+('\w+)?\b/g);
  return matches ? matches.length : 0;
}

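This treats contractions like don't as one word and ignores standalone punctuation. It also splits well-known into two words, and because \w matches only ASCII letters, digits, and the underscore, it matches only the caf of café, splits naïve into two words, and breaks a contraction typed with a curly apostrophe (don’t) into two.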
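Both patterns above depend on whitespace or on \w, so neither copes well with non-ASCII text. A minimal sketch of a locale-aware alternative, assuming a runtime that provides Intl.Segmenter (modern browsers and Node.js 16 or later), lets the Unicode word-boundary rules decide what a word is; countWordsIntl is a hypothetical helper name.

```javascript
// Count words with the runtime's Unicode word segmentation (UAX #29 rules).
// countWordsIntl is a hypothetical helper name, not a built-in API.
function countWordsIntl(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    // Spaces and punctuation come back as segments with isWordLike === false.
    if (isWordLike) count++;
  }
  return count;
}

console.log(countWordsIntl("The café is naïve about well-known facts.")); // 8
```

Unlike the \w pattern, this keeps café, naïve, and don’t (with a curly apostrophe) whole, while still splitting well-known at the hyphen. For languages written without spaces, such as Thai or Japanese, results may depend on the dictionary data bundled with the runtime.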

Character Count

There are two variants: with spaces and without.

const withSpaces = text.length;
const withoutSpaces = text.replace(/\s/g, "").length;

For Unicode text, JavaScript's .length returns the number of UTF-16 code units, not the number of visible characters. Characters outside the Basic Multilingual Plane, including most emoji, are encoded as surrogate pairs and count as 2:

"😀".length; // 2 (not 1)
[..."😀"].length; // 1 — spread operator counts Unicode code points

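The spread operator or Array.from() gives a code-point count, which is closer to what users see than .length. It is still not exact: some visible characters are built from several code points, such as an emoji with a skin-tone modifier or a letter followed by a combining accent.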
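To count what a reader perceives as single characters, the Unicode unit is the extended grapheme cluster. A minimal sketch, assuming Intl.Segmenter is available; countGraphemes is a hypothetical helper name.

```javascript
// Count user-perceived characters (extended grapheme clusters).
// countGraphemes is a hypothetical helper, not a built-in.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

// Thumbs-up (U+1F44D) followed by a skin-tone modifier (U+1F3FD).
const thumbs = "\u{1F44D}\u{1F3FD}";
console.log(thumbs.length);          // 4 UTF-16 code units
console.log([...thumbs].length);     // 2 code points
console.log(countGraphemes(thumbs)); // 1 visible character
```

The same applies to "e\u0301" (e plus a combining acute accent): two code points, one grapheme.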

Sentence Count

Sentences are harder to count because of abbreviations (Dr., e.g., vs.), decimal numbers (3.50), and ellipses. A simple heuristic:

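function countSentences(text) {
  // A sentence is a run of non-terminators followed by . ! or ?, or a
  // trailing fragment with no terminal punctuation.
  const matches = text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [];
  // Ignore fragments that contain only whitespace (e.g. trailing spaces).
  return matches.filter((s) => s.trim().length > 0).length;
}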

This treats every terminal mark as a boundary, so "Dr. Smith paid 3.50." counts as three sentences, but it is accurate enough for most prose.
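A regex splits on every period it sees. A minimal sketch, assuming Intl.Segmenter is available, delegates to the Unicode sentence-boundary rules (UAX #29) instead; countSentencesIntl is a hypothetical helper name.

```javascript
// Count sentences using Unicode sentence-boundary rules via Intl.Segmenter.
// countSentencesIntl is a hypothetical helper name.
function countSentencesIntl(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    // Blank lines can come back as whitespace-only segments; skip them.
    if (segment.trim().length > 0) count++;
  }
  return count;
}

console.log(countSentencesIntl("It costs 3.50 today. One! Two?")); // 3
```

These rules keep decimals such as 3.50 intact and do not break before a lowercase continuation (as after e.g.), but an abbreviation followed by a capitalised word, like Dr. Smith, may still be split, so this narrows the problem rather than solving it.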

Paragraph Count

Paragraphs are separated by one or more blank lines:

function countParagraphs(text) {
  return text.split(/\n\s*\n/).filter((p) => p.trim().length > 0).length;
}

Reading Time

Most reading time estimates use an average of 200–250 words per minute for typical adult readers. Technical content with code is often closer to 150 wpm.

function readingTimeMinutes(text, wpm = 225) {
  const words = countWords(text);
  return Math.ceil(words / wpm);
}

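Medium, dev.to, and other blogging platforms use roughly this formula, each with its own words-per-minute figure (some also add time for images). At the default of 225 wpm, a 1,000-word article works out to 1,000 / 225 ≈ 4.4 minutes, which Math.ceil rounds up to 5; across the 200–250 wpm range, the estimate is 4–5 minutes.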
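For technical posts, the prose and code rates can be blended. A minimal sketch, assuming the caller has already separated prose from code (for example, by extracting code blocks from Markdown); readingTimeBlended is a hypothetical name, and the default rates are the 225 and 150 wpm figures quoted above, not a standard.

```javascript
// Whitespace-based word count, as in the first countWords example.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Hypothetical helper: prose and code are read at different speeds,
// so compute minutes for each part separately and round up the total.
function readingTimeBlended(prose, code, proseWpm = 225, codeWpm = 150) {
  const minutes = countWords(prose) / proseWpm + countWords(code) / codeWpm;
  return Math.ceil(minutes);
}
```

With 225 words of prose and 450 words of code, this gives 225 / 225 + 450 / 150 = 4 minutes, whereas a single 225 wpm rate over all 675 words would give 3.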

Unique Word Count

Useful for vocabulary analysis or identifying repetitive writing:

function uniqueWordCount(text) {
  const words = text.toLowerCase().match(/\b\w+('\w+)?\b/g) ?? [];
  return new Set(words).size;
}

This is case-insensitive — "The" and "the" count as the same word.
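To see which words make a draft feel repetitive, the same pattern can feed a frequency table. A minimal sketch using a Map; topWords and its limit parameter are hypothetical names, and it inherits the ASCII-only behaviour of \w.

```javascript
// Hypothetical helper: return the `limit` most frequent words with their counts.
function topWords(text, limit = 5) {
  const words = text.toLowerCase().match(/\b\w+('\w+)?\b/g) ?? [];
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  // Sort by descending count; ties keep first-seen order (Array sort is stable).
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

console.log(topWords("The cat saw the dog. The dog ran.", 2));
// returns [["the", 3], ["dog", 2]]
```

In real prose, function words such as the and and tend to top the list, so a practical tool would usually filter a stop-word list before ranking.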

Practical Application: SEO and Writing Tools

Word count matters in SEO content strategy: longer, more comprehensive articles are widely believed to perform better for informational queries, although length alone does not guarantee a higher ranking. The common guidance for a "pillar" article is 1,500–2,000+ words.

Character limits are relevant for:

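  • Twitter/X: 280 characters per tweet (X weights some content differently; a link, for example, always counts as 23 characters)
  • Meta descriptions: about 150–160 characters (Google truncates longer snippets by pixel width, so the exact cutoff varies)
  • H1 and title tags: ideally under 60 characters (search results truncate long page titles)
  • SMS: 160 characters in a single GSM-7 message; longer messages are split into 153-character segments, and one character outside the GSM-7 set (an emoji, for instance) drops the limits to 70 and 67. Each segment is billed separately.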
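A common chore with limits like these is trimming text to fit without cutting a word, or an emoji, in half. A minimal sketch, assuming the limit is counted in code points and the ellipsis counts toward it; truncateAtWord is a hypothetical helper. Search engines measure pixel width and X applies its own weighted count, so treat the result as an approximation.

```javascript
// Hypothetical helper: shorten text to at most `limit` code points,
// ending at a word boundary and appending an ellipsis.
function truncateAtWord(text, limit, ellipsis = "…") {
  const chars = [...text]; // code points, so surrogate pairs stay intact
  if (chars.length <= limit) return text;
  // Reserve room for the ellipsis itself.
  const budget = Math.max(0, limit - [...ellipsis].length);
  let cut = chars.slice(0, budget).join("");
  // If the next character is not whitespace we are mid-word: back up to the last space.
  if (!/\s/.test(chars[budget])) {
    const lastSpace = cut.lastIndexOf(" ");
    if (lastSpace > 0) cut = cut.slice(0, lastSpace);
  }
  return cut.trimEnd() + ellipsis;
}

console.log(truncateAtWord("hello world foo", 12)); // "hello world…"
console.log(truncateAtWord("hello world foo", 10)); // "hello…"
```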

Newline Handling

Different operating systems use different newline conventions:

  • Unix/Linux/macOS: \n (LF)
  • Windows: \r\n (CRLF)
  • Old Mac OS: \r (CR)

When counting lines, split on all three conventions (equivalent to normalising them to \n first):

const lineCount = text.split(/\r\n|\r|\n/).length;

A trailing newline adds one empty entry, so "a\nb\n" reports 3 lines. Newlines also affect character counts: should \r\n count as 1 character or 2? Many tools treat it as a single newline, but raw .length in JavaScript counts it as 2.
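A minimal sketch that normalises line endings before counting, so \r\n counts as one character and a final newline does not add a phantom line. normaliseNewlines and countLines are hypothetical names, and treating the last newline as a line terminator is an assumption; some editors display an extra empty line instead.

```javascript
// Hypothetical helper: convert CRLF (Windows) and lone CR (old Mac OS) to LF.
function normaliseNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

function countLines(text) {
  if (text === "") return 0;
  const lines = normaliseNewlines(text).split("\n");
  // A trailing newline ends the last line rather than starting a new one.
  if (lines[lines.length - 1] === "") lines.pop();
  return lines.length;
}

const sample = "one\r\ntwo\r\n";
console.log(sample.length);                    // 10
console.log(normaliseNewlines(sample).length); // 8
console.log(countLines(sample));               // 2
```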

Try it yourself

Put these concepts into practice with the free online tool on DevGizmo.