In JavaScript, calling length on a String object returns the length you expect.
But getting the length of an emoji is more troublesome. Let me show you what I found.
"ihavecoke".length // 9
"😀".length // 2
As you can see, calling length on 'ihavecoke' returns 9, which is true and makes sense.
For the emoji, though, you get a length of 2. What? A single emoji character has a string length of 2?
Then "👨‍👨‍👧‍👦".length is even stranger: it returns 11. Why? Doesn't an emoji always return 2?
So how do you count an emoji as length 1? You can use the lodash method toArray. It's simple and useful.
_.toArray("😀").length // 1
_.toArray("👨‍👨‍👧‍👦").length // 1
_.toArray("ihavecoke 🤩").length // 11
So now each emoji counts as length 1, lol.
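If you would rather not pull in lodash, a native alternative may work too. This is only a minimal sketch, assuming a runtime that ships Intl.Segmenter (recent browsers and Node.js 16+); countGraphemes is a hypothetical helper name, not part of lodash or of this post's original code.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
// Assumption: Intl.Segmenter is available in the runtime.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// Emoji are written as escapes so the file encoding cannot mangle them.
const family = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F466}"; // man, man, girl, boy
const starStruck = "\u{1F929}";

console.log(family.length);                             // 11
console.log(countGraphemes(family));                    // 1
console.log(countGraphemes("ihavecoke " + starStruck)); // 11
```

The results match the lodash calls above for these inputs; for other strings the two approaches may not always agree, so test with your own data.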
Hope this helps :)
Top comments (6)
I would use runes2, but it doesn't hide the reality that anything beyond UTF-8 is complex.
If you just want to match Unicode symbols, a better and non-cryptic idea is to use XRegExp. For other symbols, it is XRegExp('\\p{So}').
As to why this happens: the web is typically in UTF-8 (read: 99% of all internet users use UTF-8), which is a way to encode characters. Basically, encoding is assigning a number to a letter. You could encode "A" as 1, "B" as 2, etc. Back in the day a thing called ASCII (en.wikipedia.org/wiki/ASCII) was created, but it only used 7 bits to encode characters, which means a maximum of 128 characters. That's fine for English, but what if you need more? So text got encoded in a bunch of different ways, lots and lots of them, and some were incompatible with ASCII. Eventually the web settled on one way to turn letters into numbers, called UTF-8. UTF-8 is interesting because it uses a variable number of bits to represent a character (up to 32 currently). This keeps it compatible with old ASCII while allowing for a huge number of characters and languages. The scheme looks at the first 8 bits, and if they are in a certain range it looks at the next 8 bits, and so on until it can make a character.
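To see the variable width of UTF-8 for yourself, here is a small sketch. It assumes a runtime with the standard TextEncoder global (modern browsers and Node.js); utf8ByteLength is a made-up helper name for illustration.

```javascript
// Hypothetical helper: how many bytes a string occupies when encoded as UTF-8.
// TextEncoder always encodes to UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("A"));         // 1 byte  (plain ASCII)
console.log(utf8ByteLength("\u00E9"));    // 2 bytes (e with acute accent)
console.log(utf8ByteLength("\u20AC"));    // 3 bytes (euro sign)
console.log(utf8ByteLength("\u{1F929}")); // 4 bytes (star-struck emoji)
```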
Now for a wrinkle. Although web pages and code are all in UTF-8, internally JavaScript stores strings as UTF-16. UTF-16 is like UTF-8, but instead of a minimum size of 8 bits it uses 16 or 32 bits to represent a character. So when you ask JavaScript how long a string is, it breaks the string into 16-bit chunks and tells you how many chunks there are. BUT some characters (and emoji) are encoded as two 16-bit chunks, so JavaScript tells you the length is 2.
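The two 16-bit chunks are called a surrogate pair. As a rough sketch of how one code point above U+FFFF splits into two chunks (toSurrogatePair is a hypothetical helper written for this example, not a built-in):

```javascript
// Hypothetical helper: splits a code point above U+FFFF into its two
// UTF-16 code units (high surrogate, low surrogate).
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const starStruck = "\u{1F929}"; // star-struck emoji

// The 16-bit chunks JavaScript actually stores for this string.
const storedUnits = [starStruck.charCodeAt(0), starStruck.charCodeAt(1)];
const expectedUnits = toSurrogatePair(0x1f929);

console.log(starStruck.length);         // 2 (counts 16-bit chunks)
console.log([...starStruck].length);    // 1 (spreading iterates by code point)
console.log(storedUnits.map((u) => u.toString(16)));   // [ 'd83e', 'dd29' ]
console.log(expectedUnits.map((u) => u.toString(16))); // [ 'd83e', 'dd29' ]
```

Spreading the string counts code points rather than chunks, which fixes simple emoji but not joined ones.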
So that's part 1. Part 2 is emoji. Emoji are interesting: what you see on screen is not necessarily the full truth. Emoji can be joined together. For instance, the pride flag 🏳️‍🌈 is ACTUALLY a white flag 🏳 and a rainbow 🌈 mashed together with an invisible character that says "hey, mash these two together". On systems that don't know about the pride flag you just get 🏳 🌈. What does that tell us about length? 🏳 is 2 and 🌈 is 2, and 🏳️‍🌈 is 6: the extra 2 come from the invisible "mash these two together" character plus an invisible variation selector after the white flag, each of length 1. So what is it about 👨‍👨‍👧‍👦 that returns 11? It's a super mashup emoji: 👨 and 👨 and 👧 and 👦, all put together with the "mash these two together" character, which makes a huge variety of family emoji possible because we are just combining them. Why 11? Each of the four people is length 2 (8 total) and each of the three "mash these two together" characters is length 1, so 8 + 3 = 11. (That's 176 bits in total for just that one emoji, compared to just 16 for the letter A in UTF-16!)
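You can take these joined emoji apart to check where the lengths come from. A minimal sketch, with describeCodePoints as a hypothetical helper name; the "mash these two together" character is U+200D (zero width joiner), and the pride flag also carries U+FE0F (a variation selector):

```javascript
// Hypothetical helper: lists each code point in a string together with
// the number of UTF-16 chunks (length) it contributes.
function describeCodePoints(text) {
  return [...text].map((ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    units: ch.length,
  }));
}

// Written as escapes so the invisible characters are visible in the source.
const family = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F466}"; // man, man, girl, boy
const prideFlag = "\u{1F3F3}\uFE0F\u200D\u{1F308}";                      // white flag + rainbow

console.log(family.length);    // 11
console.log(describeCodePoints(family).map((p) => p.units).join("+"));    // 2+1+2+1+2+1+2
console.log(prideFlag.length); // 6
console.log(describeCodePoints(prideFlag).map((p) => p.units).join("+")); // 2+1+1+2
```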
Does not work
๐
HAHAHAH, Never thought about this.
Indeed a great question. While digging, I found an answer on SO: stackoverflow.com/a/46085147