Word Break II — string segmentation using a dictionary
Given a string `s` and a dictionary, return all ways to segment `s` into dictionary words. DFS/backtracking with memoization on the suffix: for each position, try every dictionary word that matches the prefix and recurse on the remainder; cache results by start index. Time worst-case exponential; memoization makes practical inputs tractable.
Word Break II asks for all valid segmentations — the count is worst-case exponential, but with memoization on the suffix it's tractable for realistic inputs.
Approach — DFS + memo
For each starting index, try every dictionary word that matches s from that index, recurse on the remainder, and prepend the word to each returned segmentation.
```javascript
function wordBreak(s, wordDict) {
  const dict = new Set(wordDict);
  const memo = new Map(); // start index -> all segmentations of s[start..]

  function dfs(start) {
    // Base case: one empty segmentation marks a successful break at the end.
    if (start === s.length) return [""];
    if (memo.has(start)) return memo.get(start);

    const results = [];
    for (let end = start + 1; end <= s.length; end++) {
      const word = s.slice(start, end);
      if (dict.has(word)) {
        for (const rest of dfs(end)) {
          results.push(rest ? `${word} ${rest}` : word);
        }
      }
    }

    memo.set(start, results);
    return results;
  }

  return dfs(0);
}
```

Why memoize on the suffix
The set of segmentations of s[start..] depends only on start. So caching by start index turns repeated work (different prefixes leading to the same suffix) into a single computation per suffix.
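As a quick check of that sharing, here is the classic example run through the function above: both "cat sand" and "cats and" consume the first seven characters, so the segmentations of the suffix `dog` (start index 7) are computed once and served from the memo the second time.

```javascript
// Two different prefixes ("cat sand" and "cats and") both land on index 7,
// so dfs(7), the suffix "dog", is computed once and reused via the memo.
console.log(wordBreak("catsanddog", ["cat", "cats", "and", "sand", "dog"]));
// → [ 'cat sand dog', 'cats and dog' ]
```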
Pre-check with Word Break I
If some positions can never be extended to a complete segmentation, an upfront DP pass that marks which suffixes can still be fully broken lets you prune those dead branches:
```javascript
const canBreak = new Array(s.length + 1).fill(false);
canBreak[s.length] = true;
for (let i = s.length - 1; i >= 0; i--) {
  for (let j = i + 1; j <= s.length; j++) {
    if (canBreak[j] && dict.has(s.slice(i, j))) { canBreak[i] = true; break; }
  }
}
// then DFS, but skip recursion when canBreak[end] is false
```

This avoids the pathological case ("aaaaaaaab" with dict ["a", "aa", "aaa", ...] but no "b").
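Wiring that into the DFS is a one-line guard. Here is a sketch of the modified inner loop, assuming the `dfs`, `dict`, `results`, and `canBreak` names from the snippets above:

```javascript
// Inside dfs(start): skip any end position whose suffix can never be
// completed, so dead branches are pruned before slicing or recursing.
for (let end = start + 1; end <= s.length; end++) {
  if (!canBreak[end]) continue; // dead suffix: no need to explore it
  const word = s.slice(start, end);
  if (dict.has(word)) {
    for (const rest of dfs(end)) {
      results.push(rest ? `${word} ${rest}` : word);
    }
  }
}
```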
Complexity
- Worst case: exponential, because the number of segmentations can itself be exponential (see the counting sketch after this list).
- With memo + pre-check: the search itself does O(n²) substring checks (each start index is expanded once and scans every end position); the remaining time is proportional to the size of the output, so practical inputs run fast.
- Space: the memo stores the segmentation lists for every reachable suffix (output-sized in the worst case), plus O(n) for the recursion stack and the `canBreak` table.
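To make the exponential-output point concrete, a small standalone helper (a hypothetical `countBreaks`, not part of the solution above) can count segmentations with a linear-space DP instead of building them; on an all-`a` input with dict `["a", "aa", "aaa"]` the count follows a tribonacci-style recurrence and explodes.

```javascript
// count[i] = number of ways to segment s[i..]; same recurrence as the DFS,
// but it only counts, so it runs in O(n * maxWordLen) substring checks.
function countBreaks(s, wordDict) {
  const dict = new Set(wordDict);
  const maxLen = Math.max(...wordDict.map(w => w.length));
  const count = new Array(s.length + 1).fill(0);
  count[s.length] = 1;
  for (let i = s.length - 1; i >= 0; i--) {
    for (let j = i + 1; j <= Math.min(s.length, i + maxLen); j++) {
      if (dict.has(s.slice(i, j))) count[i] += count[j];
    }
  }
  return count[0];
}

console.log(countBreaks("a".repeat(30), ["a", "aa", "aaa"])); // 53798080
```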
Edge cases
- Empty string → `[""]` (or `[]`, depending on the spec).
- No segmentation possible → `[]`.
- Dictionary words longer than `s` never match and are simply skipped.
Interview framing
"DFS with memoization keyed on the start index. At each position, try every dictionary word that matches the prefix, recurse on the remainder, and prepend. Memoize the segmentations of each suffix — that's where the work is shared. A Word Break I pre-pass marks which positions can reach the end, pruning dead branches. Worst case is exponential because the answer itself can be exponential, but the memo + pre-check makes typical inputs fast."
Follow-up questions
- Why memoize on the suffix (start index) rather than the prefix?
- How does the Word Break I pre-pass help?
- What's the worst-case complexity and why?
Common mistakes
- Forgetting to memoize → TLE.
- Joining words wrong (extra/missing space at the boundary).
- Not pruning dead-end branches.
Performance considerations
- Memo + pre-check is the difference between TLE and AC on pathological inputs. The output itself can be exponential; return a generator if memory matters (see the sketch below).
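One way to do that, as a sketch: make the DFS a generator and yield segmentations lazily, so the caller can stop early without ever holding the full list. Note that this plain version drops the suffix memo, trading time for memory.

```javascript
// Lazy variant: yields one segmentation at a time instead of materializing
// the whole (possibly exponential) result list. Without the memo it may
// re-explore suffixes, so it trades time for memory.
function* wordBreakLazy(s, wordDict) {
  const dict = new Set(wordDict);

  function* walk(start) {
    if (start === s.length) { yield ""; return; }
    for (let end = start + 1; end <= s.length; end++) {
      const word = s.slice(start, end);
      if (!dict.has(word)) continue;
      for (const rest of walk(end)) {
        yield rest ? `${word} ${rest}` : word;
      }
    }
  }

  yield* walk(0);
}

// The caller consumes only as much as it needs.
for (const seg of wordBreakLazy("catsanddog", ["cat", "cats", "and", "sand", "dog"])) {
  console.log(seg); // first segmentation only
  break;
}
```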
Edge cases
- Empty string.
- No valid segmentation.
- Dictionary with duplicates (use a Set).
- Exponential output size; the caller may want a generator.
Real-world examples
- Compound-word segmentation, search-query tokenization, agglutinative-language word splitting.