DEV Community

Alexandru Trandafir

When the white space became a beast

This article was originally posted at HeavyDots Blog.

The legend of the white space

Every now and then, on the World Wide Web, someone meets the "white space", or as we call it, "the beast".

When we meet the beast, it drives us crazy, we get scared, we can’t understand what our eyes see, we cannot explain what’s going on, we wonder if something affected our perception. A supernatural phenomenon is in front of our eyes and overwhelms us.

We also ran into it twice! Once on a website and once even inside an Excel sheet! The places where this creature shows up are unbelievable, and how it gets there is an even bigger mystery!

The brave who went after it

Some give up, run and hide from it. Or just accept it without questioning its existence. But others, the brave ones, start on a journey through the darkness and don’t give up until they know the truth.

This is what someone who has been there and survived confessed (with shaking hands):

To the people in the future like myself that had to debug this from a high level all the way down to the character codes, I salute you.

crclayton - Sep 1 '16 at 21:22 - stackoverflow.com

We've also been there, and we returned with the truth, ready to share it with you.

What the creature looks like and where it comes from

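The strange space is actually the non-breaking space, which HTML coders know well as the &nbsp; entity (Wikipedia). In this case, though, it is not encoded as an HTML entity: it sits in the text as the raw character U+00A0, which looks exactly like an ordinary space.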

So basically most of us knew the creature, but we didn’t know it could exist in another shape and dimension.

Wikipedia explains the different representations of the beast:

[Image: Wikipedia excerpt on the different representations of the beast]
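
To make those shapes concrete, here is a minimal sketch of a few ways the same creature can be written down in PHP. It assumes PHP 7 or later (for the "\u{...}" escape) and UTF-8 text; the variable names are made up for illustration.

```php
<?php
// Four spellings of the same character, the non-breaking space (U+00A0).
$from_bytes  = chr(194) . chr(160);   // decimal byte values
$from_hex    = "\xC2\xA0";            // the same two bytes in hex
$from_escape = "\u{00A0}";            // Unicode code point escape (PHP 7+)
$from_entity = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($from_bytes === $from_hex);     // bool(true)
var_dump($from_bytes === $from_escape);  // bool(true)
var_dump($from_bytes === $from_entity);  // bool(true)

// In UTF-8 the beast is two bytes long, and it is NOT an ordinary space.
echo bin2hex($from_bytes), PHP_EOL;      // c2a0
var_dump($from_bytes === ' ');           // bool(false)
```

The last line is the whole legend in one comparison: the two look identical on screen, yet a plain string comparison tells them apart.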

And also how the creature is born and mutates:

[Image: Wikipedia excerpt on how the creature is born and mutates]

The tools and techniques to find it and get rid of it

For the even braver who want to go after it and make it disappear, we’ve got some tools and techniques to help locate it and still come back alive from the journey.

Web tool that looks for it in the code of a webpage:

Here’s a tool with a web interface where you can enter a URL and it searches the contents of that page in order to find the beast:

http://tools.heavydots.com/nbsp-space-char-detect/
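
The general idea behind such a detector can be sketched in a few lines of PHP. This is not the code behind the tool above, only a rough illustration under some assumptions: the input is UTF-8, and the function name find_beast and the sample markup are made up for the example.

```php
<?php
// Hypothetical helper: report the line number and byte offset of every
// raw non-breaking space (UTF-8 bytes 0xC2 0xA0) found in a text.
function find_beast(string $text): array
{
    $beast = "\xC2\xA0";
    $hits  = [];

    foreach (explode("\n", $text) as $index => $line) {
        $offset = 0;
        // strpos() works on bytes; in valid UTF-8 the pair C2 A0 can only
        // start at a character boundary, so a byte search is safe here.
        while (($pos = strpos($line, $beast, $offset)) !== false) {
            $hits[] = ['line' => $index + 1, 'byte' => $pos];
            $offset = $pos + strlen($beast);
        }
    }

    return $hits;
}

// Sample markup: the beast hides on lines 1 and 3.
$html = "<p>Hello\xC2\xA0world</p>\n<p>plain space</p>\n<p>\xC2\xA0x</p>";

foreach (find_beast($html) as $hit) {
    echo "line {$hit['line']}, byte {$hit['byte']}", PHP_EOL;
}
```

To hunt on a live page you would first have to fetch its source into a string yourself; the sketch deliberately leaves that part out.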

Manual cleanup technique:

But if you find yourself in the dark code dimension and cannot use the web, and you have to get very close to the beast in order to kill it, here is a way to exorcise your demonized code:

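Only for those who are skilled with PHP spells: run a search for chr(194).chr(160) and replace it with an ordinary space. Those two bytes, 0xC2 0xA0, are how the non-breaking space (U+00A0) is encoded in UTF-8, so the spell assumes your text is UTF-8. This will drive the demon out and restore the clean white space soul.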

Take this scroll with you, it contains the spell you will need when you face the beast:

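// Define the white beast: the UTF-8 bytes of the non-breaking space (U+00A0)
$white_beast = chr(194) . chr(160);

// Count how many of them are living in your text
// ($string is assumed to already hold the UTF-8 text you want to clean)
$count = substr_count($string, $white_beast);
echo $count, PHP_EOL;

// Replace each one with a normal space
$string = str_replace($white_beast, ' ', $string);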
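
One case the scroll does not cover is when the beast clings to the edges of a value, such as a date copied out of a spreadsheet. PHP's trim() leaves it alone by default, because U+00A0 is not in its default character list. Here is a minimal sketch of one way to handle that, assuming UTF-8 input; the helper name trim_beast and the sample value are made up for illustration.

```php
<?php
// Hypothetical helper: strip ordinary whitespace AND non-breaking spaces
// from both ends of a UTF-8 string.
function trim_beast(string $value): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \x{00A0} matches the whole two-byte character.
    $clean = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $value);

    // preg_replace() returns null on error (for example invalid UTF-8);
    // in that case hand the input back untouched.
    return $clean ?? $value;
}

$cell = "\u{00A0}2016-09-01\u{00A0} ";

var_dump(trim($cell) === '2016-09-01');        // bool(false): trim() keeps the beast
var_dump(trim_beast($cell) === '2016-09-01');  // bool(true)
```

A regular expression is used here instead of passing "\xC2\xA0" to trim() as a character list, because trim() works byte by byte and could chew the last byte off an innocent character such as "à" (0xC3 0xA0).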

THE END of the story

So dear reader of this legend, if you never faced the beast, beware!

But if you have also had to deal with it, please share with us your story in the comments section!

I'll try to post new stuff here, and I also invite you to drop by the HeavyDots Blog.

Top comments (4)

Anton Frattaroli

Lol. Excel has gotten a coworker of mine with this too. He emailed the offending code to me and I couldn't find an issue... because the email program converted it to a normal space during copy/paste.

Alexandru Trandafir

Yeah, that's the scary thing about it: it can disappear! :D I found it in Excel, in a date field that would not get converted when importing the Excel file into a custom PHP app. The date's format looked right, but on both the left and the right side it had this strange char. Who knows when it got there!

edA‑qa mort‑ora‑y

What problem is the non-breaking-space creating?

Surely the myriad of other Unicode spacing characters would also create similar issues?

tbodt

There are dozens and dozens of Unicode characters that show up as blank space. You might think that \s in a regex would find all of them, since it matches characters with the "separator, space" Unicode property. But not all blank characters have the "separator, space" property, including (with the characters between parentheses):
U+3164 HANGUL FILLER (ㅤ)
U+1D173 MUSICAL SYMBOL BEGIN BEAM (𝅳) (there are 7 other similar musical symbols)
U+200D ZERO WIDTH JOINER (‍)
U+180E MONGOLIAN VOWEL SEPARATOR (᠎) (only shows up as blank in some fonts)
There's even one character,   (U+1680 OGHAM SPACE MARK) that has the "separator, space" property and doesn't display as whitespace. Hilariously enough, you can use this character as whitespace in JavaScript.