Simple software things that are actually very complicated

Official Construct Team Post
Ashley
  • 12 May, 2022

I've previously blogged about the unexpected complications of minor features, which covers how making a seemingly small software change can turn out to be much more difficult than anticipated. There's also a related phenomenon in the software world: complex and sophisticated pieces of software that are so robust and easy to use that they create an impression of being simple. A text input field is a good example. It looks simple - just a box you type text into, right? It's a basic thing everyone who uses computers is familiar with. Yet it includes a huge amount of hidden complexity.

This sometimes comes up with our HTML5 game engine Construct. Most content renders into a canvas, including text. It's possible to use HTML content on top of that, but it's often useful to have things render into the canvas instead, so you can do things like apply WebGL effects to them, draw other content on top, and so on. So people sometimes ask us: can't you just take <some feature> from HTML, and have it render in a canvas? Even in just a simple form?

It's usually not remotely simple - and we already have a good example of this: wrapping text.

Wrapping text in a canvas

When you write text in HTML, the browser handles wrapping text for you. When the text reaches the right edge of its box, it instead moves down a line and then carries on. How hard can it be?

Very hard.

The canvas 2D context can draw text. However, it can only draw a single line of text. If you want text to wrap across multiple lines, you have to implement your own custom word wrap logic in JavaScript. At least there's also a method to measure text (measureText), which is the main thing you need to calculate the text layout. So you can write code that measures bits of text to check whether they fit on a line, and when the text no longer fits you can move down a line and carry on.
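As a rough illustration of that measure-and-wrap loop (a minimal sketch, not Construct's actual code), the function below wraps greedily on single spaces. The names wrapText, measure and fixedWidth are made up for this example. The measure parameter is any function returning the width of a string; in a browser it could be (s) => ctx.measureText(s).width, but here a fixed-width stand-in is used so the sketch runs without a canvas.

```javascript
// Greedy word wrap sketch. It ignores almost every complication a real
// engine has to deal with: it only splits on single spaces and "\n".
function wrapText(text, maxWidth, measure) {
  const lines = [];
  // Each hard line break starts a new paragraph.
  for (const paragraph of text.split("\n")) {
    let line = "";
    for (const word of paragraph.split(" ")) {
      const candidate = line === "" ? word : line + " " + word;
      // Measure the whole candidate line instead of summing word widths.
      if (line !== "" && measure(candidate) > maxWidth) {
        lines.push(line);
        line = word; // an over-long word still gets a line to itself
      } else {
        line = candidate;
      }
    }
    lines.push(line);
  }
  return lines;
}

// Hypothetical stand-in for canvas measurement: every character is 10 units wide.
const fixedWidth = (s) => s.length * 10;

console.log(wrapText("the quick brown fox", 100, fixedWidth));
// → ["the quick", "brown fox"]
```

Passing the measuring function in as a parameter is only a convenience for the example: it keeps the wrapping logic separate from the canvas so it can be tried out anywhere.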

That's the fundamental principle behind our custom canvas text word wrap engine. The original code was written about 10 years ago, and we're still making tweaks to it. We learned the hard way that it seems simple but has endless hidden complexity. Here are a few of the less obvious issues we've run into.

  • Word break is complicated:
    • Not all languages have space-separated words. You probably also need a per-character word wrap mode.
    • What is a character? It's not what you get if you index a JavaScript string. It's also not even what you get if you spread a string to an array. It's really an extended grapheme cluster, in Unicode speak.
    • How do you get graphemes? You can use Intl.Segmenter, but at the time of writing it's not supported in Firefox. There's a third-party library that you can use though.
    • Splitting strings to graphemes is actually quite slow, so you can't do it too often during word wrap, or you'll tank performance.
    • Words can be too long to fit on a single line. Words can be split with hyphenation. But there are certain places it's allowed to split a word, and other places it's not allowed. And those rules depend on the language.
    • A word that doesn't fit on a line can also just be moved in its entirety to the next line. But you don't want to do that if it's the first word of the first line.
    • If a single word cannot fit on an entire line, you have to either hyphenate it (extremely complicated), force a mid-character wrap (which can change the meaning of the text!), put in an ellipsis or just truncate it (both of which hide content).
    • If you want a "typewriter text" effect where text is written out over time, you cannot word break as you go, otherwise words jump down lines as they get longer and no longer fit on the line. You have to handle that specially.
    • Text can go right-to-left as well. The default direction depends on the language.
    • Text can be written in a vertical writing direction as well, which is normal in some languages. Then line wrapping means moving horizontally, not vertically.
  • Handling space characters, also used to separate words for word wrap, is tricky:
    • You can wrap a line on spaces. But what is a space? You'll have to look up a list of characters with the right Unicode categorization.
    • But wait, do you want to break on a non-breaking space? Nope! There are also at least two different non-breaking space characters in Unicode.
    • Are consecutive spaces the same as one space? It depends, and probably depends on whether there's a mix of breaking and non-breaking spaces.
    • Do you draw spaces? Presumably not... But you do if you have non-breaking spaces with a background color set.
    • Do you wrap the line if the spaces at the end don't fit? Yes, but then you don't start the line with spaces.
    • Do spaces left at the end of the line matter? No - unless you right-align the text, in which case you want to trim them off so they don't affect the alignment. Trimming the text then also changes the measured width of the line. But if you have spaces at the start of the line for left-aligned text, that's probably an intentional indent.
    • Line breaks are \n. Unless someone imported some old data from Windows and used \r\n.
  • Measuring text is hard:
    • If you measure string A, and string B, is the width of A + B the sum of the widths? No - there are things like kerning that change the width depending on character combinations.
    • Some fragments of text can be right-to-left, or in a vertical writing direction, in which case you may want to measure them differently.
    • Text can change size on the same line, in which case the line height may be different, and you'd better have started drawing text at the right position before you got to larger text that made the line taller.
    • You can measure the width of text with canvas, but it doesn't tell you the height. You can figure that out from the point size, if you know about some history of the assumed DPI of displays, based on decisions made decades ago.
    • Text can have fractional sizes, which means in some edge cases you get floating point precision errors around tests like checking if you're inside/outside the box.
    • Measuring the vertical height of wrapped text usually involves some spacing between lines, but not the spacing after the last line.
  • Vertically aligning text is much more complicated than it seems:
    • Which baseline are you drawing text from? Top, alphabetic, something else? It affects where you position text.
    • If you align text at the top of a box, where do you position it to guarantee the top of the text does not go outside the box? It turns out there's no good answer: "zalgo text" can extend vertically an indefinite amount. Some languages position accents higher than others. Hopefully you can make a good guess.
    • Similarly if you align text at the bottom of a box, is there enough space for all the descending parts of the text, like the bottom of "y"? And all other possible symbols in other languages?
    • If you center align text vertically, where is the vertical mid-point of a line of text? Is it between the bottom descender and the top ascender? Or is it above the alphabetic baseline? There are different ways you can calculate that, which affects where the text will end up.
    • Fonts can tell you how far they need for the descender. But some fonts specify the wrong value. So you can't always rely on it.
    • The canvas API can now give you some of these vertical measurements (via TextMetrics), except they're not yet supported in Firefox... and even Chrome and Safari still don't agree on where things are measured from (e.g. this Chrome issue).
    • If text changes size - or font - on the same line, you also need to take into account the possibly varying ascender and descender heights on the line.
  • We also support text formatting, using a simple form of BBCode to do things like have bold text, change the color, and so on. But:
    • Word wrap must not break a word just because formatting changed.
    • Changing formatting half-way through text will cause each fragment to be rendered separately, which also changes how the text is rendered. It breaks kerning, ligatures and separates otherwise joined-up text in languages like Arabic.
    • Different styles like outline, fill, strikethrough and underline all need to be drawn in the right order to get the correct visual layering. With certain style combinations, fill-then-stroke is not enough.
  • Construct also has support for SpriteFonts, where each character is drawn from an image. This uses the same word wrap engine but has a few extra quirks:
    • By "character", I meant "grapheme".
    • We provide an extra character spacing property, which can be negative, which affects measurements at the end of a line.
    • The concept of an alphabetic baseline doesn't really exist, and it effectively draws from a top baseline, which can vary along a line if the scale changes.

There are probably some more edge cases I forgot about. That's just a few. Still think wrapping text is simple?
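To make a few of the whitespace points above concrete, here is a minimal sketch of the kind of helpers involved. All three function names are hypothetical, and the rules are simplifying assumptions: "space" is taken to mean the Unicode space separator category (Zs) plus tab, and only NO-BREAK SPACE (U+00A0) and NARROW NO-BREAK SPACE (U+202F) are treated as non-breaking.

```javascript
// Assumption: these are the only two non-breaking spaces handled here.
const NON_BREAKING_SPACES = new Set(["\u00A0", "\u202F"]);

// Converts Windows (\r\n) and lone \r line endings to \n.
function normalizeLineBreaks(text) {
  return text.replace(/\r\n?/g, "\n");
}

// True if a line may be wrapped at this character.
function isBreakingSpace(ch) {
  if (NON_BREAKING_SPACES.has(ch)) return false;
  return ch === "\t" || /^\p{Zs}$/u.test(ch);
}

// Removes trailing breaking spaces, e.g. so they don't skew right-aligned text.
// Non-breaking spaces are kept, since they were presumably put there on purpose.
function trimLineEnd(line) {
  let end = line.length;
  while (end > 0 && isBreakingSpace(line[end - 1])) end--;
  return line.slice(0, end);
}

console.log(JSON.stringify(normalizeLineBreaks("one\r\ntwo"))); // "one\ntwo"
console.log(isBreakingSpace(" "), isBreakingSpace("\u00A0"));   // true false
console.log(JSON.stringify(trimLineEnd("hello  ")));            // "hello"
```

Even this leaves open the harder questions from the list, such as what to do with runs that mix breaking and non-breaking spaces.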

We learned all that the hard way, by writing our own word wrap engine for canvas and having it battle-tested in the real world in production software by lots of users. In the end our implementation covers most, but not all, of the above - some problems are just so difficult to solve it doesn't seem feasible, especially when it comes to bidirectional/vertical layout and changing formatting with joined text characters. Even with the might of a full browser engine, Chrome only fairly recently solved some of those issues with a completely rewritten layout engine named LayoutNG.
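Two of the measurement points from the list are small enough to sketch directly. The function names below are made up for the example. The conversion relies on the long-standing convention that a point is 1/72 of an inch and a CSS pixel is 1/96 of an inch; the height calculation assumes you already know each line's height and simply adds spacing between lines, but not after the last one.

```javascript
// 1pt = 1/72 inch and 1px = 1/96 inch, so px = pt * 96 / 72.
function pointsToPixels(pt) {
  return (pt * 96) / 72;
}

// Total height of wrapped text. lineHeights holds one height per line,
// since lines can differ if the text size changes part-way through.
function wrappedTextHeight(lineHeights, lineSpacing) {
  if (lineHeights.length === 0) return 0;
  const total = lineHeights.reduce((sum, h) => sum + h, 0);
  // Spacing goes between lines only: n lines have n - 1 gaps.
  return total + lineSpacing * (lineHeights.length - 1);
}

console.log(pointsToPixels(12));                   // 16
console.log(wrappedTextHeight([16, 16, 16], 4));   // 56
```

With fractional sizes these values stop being round numbers, which is where the floating point edge cases mentioned in the list start to bite.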

We're somewhat reluctant to go through all this again for anything else that seems simple - like a text input field. Why not make one of those too?

Text input fields

In this case we have not implemented a text input field in the canvas. For Construct we still create an input HTML element and place it on top of the canvas. I think this is wise, as it is another area where there is a lot of hidden complexity. If we did our own implementation I know we'd need to cover:

  • Many of the above issues with text rendering, such as handling bidirectional text, various Unicode control characters...
  • Placeholder text
  • Tooltips
  • Disabled and read-only modes
  • Different input types and formats such as password (which has various security implications), number, URL, telephone number...
  • Support for bringing up the virtual keyboard on mobile, with a keyboard style to match the input type.
  • If handling multi-line textareas, then throw in all of the above word wrap issues too.
  • Many users expect a spell-check feature on inputs now. So you need a spell check engine too if you want to cover that. For all languages.
  • There is a blinking caret to indicate where text input will go. That animates and is controllable by mouse, touch and keyboard input.
  • Text can scroll if you move the caret along text that is longer than the text field, even with single-line text fields. Multi-line textareas have vertical scrolling with scrollbars too. This also implies clipping to the text field area.
  • Clicking and dragging text, or using certain keyboard shortcuts, will select text. Selections have different styling, and note that changing styling half-way through text can affect how it renders.
  • Many OSs now support bringing up an emoji picker for text fields, such as the Windows + . shortcut.
  • Some languages require an Input Method Editor (IME), allowing for typing a wider range of characters than the hardware directly provides.
  • Support for undo and redo.
  • Cut, copy and paste of selected text allows integration with the system clipboard. (In browsers, accessing the clipboard from script can require a permission prompt.)
  • Pressing tab usually cycles through available form controls. The text field should have a tab index.
  • Usually there is also a different appearance when focused, such as displaying an outline.
  • Accessibility tools may have features like being able to read out text in text fields. Will your custom implementation support that? Things like tab index and focus highlight are also accessibility features.
  • Some input fields allow manually specifying the writing direction.
  • Text inputs can usually be right-clicked (or tap-and-hold) to access a context menu providing access to many of these features. It can have submenus too. Therefore supporting this means implementing a full context menu control as well, which opens up another can of worms. I could write another list like this just on that.
  • People like to customise the appearance, including text, border, background, selection styles, and caret.
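By comparison, placing a real HTML input on top of the canvas mostly comes down to coordinate mapping. The sketch below is an assumption about how that could be done, not Construct's implementation: canvasRectToCss is a hypothetical helper, canvasSize stands for the canvas backing size (canvas.width and canvas.height), and cssRect stands for its on-page box, such as the result of getBoundingClientRect().

```javascript
// Maps a rectangle in canvas pixel coordinates to CSS pixel coordinates on the page,
// so an absolutely positioned <input> can be laid over the same area.
function canvasRectToCss(rect, canvasSize, cssRect) {
  // The canvas may be displayed at a different size than its backing store.
  const scaleX = cssRect.width / canvasSize.width;
  const scaleY = cssRect.height / canvasSize.height;
  return {
    left: cssRect.left + rect.x * scaleX,
    top: cssRect.top + rect.y * scaleY,
    width: rect.width * scaleX,
    height: rect.height * scaleY,
  };
}

// An 800x600 canvas displayed at 400x300, offset 10px from the left and 20px from the top.
const box = canvasRectToCss(
  { x: 100, y: 50, width: 200, height: 40 },
  { width: 800, height: 600 },
  { left: 10, top: 20, width: 400, height: 300 }
);
console.log(box); // { left: 60, top: 45, width: 100, height: 20 }
```

Everything else in the list above then comes for free from the browser's own input element.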

Don't do it

In short, it's not worth the trouble to make a canvas equivalent. Using HTML for a text input may have its limitations on how it integrates with a canvas, but replacing it is an enormous engineering challenge. Some might say "but I only need a few of those things". Even that is a challenge, with lots of gotchas along the way, as we discovered with wrapping text. And if you have lots of users using your software in different ways, they'll all want a different small subset. So to cover everyone, you'll probably need a near-enough complete implementation in the end anyway.

This comes up in other cases too. For example, one suggestion for Construct involved a "simple" version of a flexbox layout engine. I am sure similar lists exist for how to design even a straightforward-sounding layout engine. Layout is really hard, and browsers do it well. It's far better to use the existing browser layout engine if at all possible.

If the browser - or in other environments, the operating system - can do something for you, chances are you want to let it handle that. Otherwise you'll learn the hard way just how much work goes into it. We did it for word wrap as displaying text is so important, but even then, it's not a complete implementation and has its limitations. At least we allow using custom HTML elements so you have a way to just use the browser's layout engine for text, with full support for every single gnarly edge case in text layout. Apparently there's a Canvas formatted text spec proposal which looks like it would also let us use the browser layout engine for wrapped text in a canvas. That would be great if all browsers supported it, but none have shipped it yet.

In conclusion, it's easy to look at mature, comprehensive pieces of software engineering and think they're simple. That is because the pinnacle of success in software engineering is to do a great deal of complicated work and have it work so smoothly that people think it's simple. Things like text input fields achieve this level in part because they are so fundamental to computing, and get so finely crafted over decades of intensive use that they come close to perfection. Familiarity probably also adds to the illusion, as things like text are so ubiquitous in computing that everyone feels like they know what it involves. But there are endless hidden depths to the implementation of these things. And as ever they change over time too. So don't do it: don't think you can implement that. Let the browser or OS do its job. Far more work has gone into them than you probably appreciate.


  • I wrote my own word wrapping for my game and I agree it was a huge challenge to get something that works even most of the time.

  • I thought most comments about this were about taking a "snapshot" of the rendered element/layer and being able to paste it to canvas in the middle of the other sprites.

  • Is this the right TLDR?

    If you do text, just use HTML elements and beautify it with CSS. Not worth the headache.