Eric / NYC / Milky Way

Emojis to Bits via Unicode

June 2020

Officially, Emojis have meanings. 😐 means neutral face.

But for me, the most important feature is their ambiguity. Something is being said, but no one is quite sure.

🤖😐🧜

I wanted to randomly display some emojis on a website. Because, if they’re ambiguous, they may as well be random.

In the process of setting this up, it seemed that a few worthwhile concepts arose, so here we go.

Unicode

Informally, Unicode is a shared language among computers.

Basically all operating systems support Unicode, so if we’re making a website, we can use Unicode references and be fairly confident that they’ll appear as we want nearly everywhere.

Unicode references look like this: U+1CC1. This renders to ᳁, which, if you’re curious, represents the SUNDANESE PUNCTUATION BINDU PANGLONG.

And it turns out that emojis are part of the Unicode standard - so we don’t have to fumble with images if we want to use emojis on a webpage.

An emoji Unicode reference looks like U+1F643, and renders to 🙃.

We can put these emoji Unicode references inside of HTML Entities to render on our websites.
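
Before getting to HTML, here is a small sketch of the same idea in Ruby, assuming a UTF-8 source file and terminal. The variable names are only illustrative. It shows two ways to go from the code point U+1F643 to the actual character.

```ruby
# Code point from the post: U+1F643 (upside-down face), written as a hex literal.
code_point = 0x1F643

# Integer#chr with an explicit encoding builds a one-character string.
from_integer = code_point.chr(Encoding::UTF_8)

# Ruby string literals also accept a code point directly via \u{...}.
from_escape = "\u{1F643}"

puts from_integer                 # prints the emoji itself
puts from_integer == from_escape  # true
```

Both routes produce the same one-character string, so which one to use is mostly a matter of taste.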

HTML Entities

If you’re writing HTML, you might notice that if you try to use < as regular text, your browser might freak out a bit. That’s because < looks like you’re opening an HTML tag.

So < is considered a reserved character in HTML. This makes some sense: not every character we type is meant to be rendered directly as text.

You might also notice something similar if you try to type multiple spaces in HTML. Spaces get special treatment too: after the first one, the browser collapses the rest. This is because we don’t want all the whitespace in our HTML rendered literally. We have to explicitly tell our document to render extra spaces using an HTML Entity.

So how do we tell HTML when we want to use a reserved character? We prefix the character with & and suffix it with ;. Great, so something like &<;? Not so fast. Instead of using the actual character, we use what’s called a character entity.

Character entities can be either numbers or names.

Names look something like mdash (em dash), lt (less than), or nbsp (non-breaking space), and HTML will render &mdash; as —, &lt; as <, and &nbsp; as   (you probably couldn’t see this last one).

Numbers can be written as either hexadecimal or decimal. Both require an additional preceding #, and hex also requires an x.

The decimal number 60 turned into a character entity, &#60;, renders as <. The equivalent of 60 in hexadecimal is 3C, and leading zeros are allowed, so it can also be written 003C. So &#x003C; should render as <, also.

So there are 3 ways to represent the same character < : &lt; (named) or &#60; (number / decimal), or &#x003C; (number / hex).
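
To make the two numeric forms concrete, here is a minimal Ruby sketch. The helper name numeric_entities is hypothetical (it is not from any library). It takes a character and builds its decimal and hex entities from the code point.

```ruby
# Hypothetical helper (not from the post): build both numeric entity forms.
def numeric_entities(char)
  code_point = char.ord # "<".ord is 60
  {
    decimal: "&##{code_point};",                  # "&#" + decimal digits + ";"
    hex: "&#x#{code_point.to_s(16).upcase};"      # "&#x" + hex digits + ";"
  }
end

entities = numeric_entities("<")
puts entities[:decimal] # &#60;
puts entities[:hex]     # &#x3C;
```

The hex form comes out as &#x3C; rather than &#x003C; because Ruby does not pad with leading zeros. Both spellings refer to the same character.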

Why does this matter?

Unicode code points are written in hexadecimal! We can render them using the HTML Entity Numbers written in Hex form.

Let’s try this out. The Unicode code point U+1F643 (the U+ just means Unicode) can become the HTML Entity &#x1F643; which renders as 🙃. (Remember, hex needs a preceding x.)
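
That step is mechanical enough to sketch in Ruby. The helper name hex_entity is made up for illustration, and it assumes the input always starts with U+ followed by valid hex digits.

```ruby
# Hypothetical helper: turn "U+1F643" notation into a hex HTML entity.
# Assumes the input is well formed; no validation is done here.
def hex_entity(unicode_ref)
  hex_digits = unicode_ref.delete_prefix("U+") # "U+1F643" -> "1F643"
  "&#x#{hex_digits};"                          # the trailing ";" closes the entity
end

puts hex_entity("U+1F643") # &#x1F643;
puts hex_entity("U+1CC1")  # &#x1CC1;
```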

Hex Detour

Remember how we said there were multiple ways to refer to the same character? Well, &#x1F643; is the hexadecimal character entity, but what about the decimal version?

Hexadecimal is base 16, so converting 1F643 to decimal is just 1*16^4 + 15*16^3 + 6*16^2 + 4*16^1 + 3*16^0.

How did the F become 15? Since hex is base 16, we need 16 digits instead of the usual 10, so we start using letters after we get to 9: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F.

The previous calculation becomes: 65536 + 61440 + 1536 + 64 + 3 = 128579

The decimal version would be &#128579; which also renders to 🙃.
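
The arithmetic above can be double-checked with a few lines of Ruby. This is only a sketch: the by_hand variable name is illustrative, and String#to_i(16) is Ruby's built-in way of doing the same conversion.

```ruby
hex_digits = "1F643"

# Walk the digits from least to most significant,
# weighting each one by the matching power of 16.
by_hand = hex_digits.chars.reverse.each_with_index.sum do |digit, place|
  digit.to_i(16) * (16**place)
end

puts by_hand             # 128579
puts hex_digits.to_i(16) # 128579, using the built-in conversion
puts "&##{by_hand};"     # &#128579;
```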

Deeper detour

Since we see how to move between hex (base 16) and decimal (base 10), why don’t we convert our 🙃 (&#x1F643;) into binary (base 2)?

Instead of laying out our binary digit places and chipping away at a solution (i.e. … 1*2^9 + 1*2^8 + 1*2^7 + 1*2^6 + 1*2^5 …), we can do a little trick using some high school math.

16^4 = (2^4)^4

Then we can use the exponent power rule: (x^a)^b = x^(a*b)

16^4 = 2^16

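This means we can pretty easily convert our hex code into binary: every power of 16 is a power of 2 whose exponent is a multiple of 4, so each hex digit lines up with its own group of four binary digits.

Hex: 1F643 (from &#x1F643;) is 1*16^4 + 15*16^3 + 6*16^2 + 4*16^1 + 3*16^0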

Intermediate step: 1*2^16 + 15*2^12 + 6*2^8 + 4*2^4 + 3*2^0

Now we just have to convert our coefficients to binary, writing each one as four bits: 1 is 0001, 15 is 1111, 6 is 0110, 4 is 0100, and 3 is 0011.

Binary: 0001 1111 0110 0100 0011

So, in binary, 🙃 is 00011111011001000011. We always hear that computers are just made up of 1s and 0s — well, we’ve made it.
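
Here is the digit-by-digit trick as a short Ruby sketch, with illustrative variable names. Each hex digit becomes a group of four bits, and the groups are simply placed side by side.

```ruby
hex_digits = "1F643"

# Each hex digit maps to exactly four bits, because 16 = 2^4.
nibbles = hex_digits.chars.map do |digit|
  digit.to_i(16).to_s(2).rjust(4, "0") # pad to four bits, e.g. "110" -> "0110"
end

puts nibbles.join(" ")           # 0001 1111 0110 0100 0011
puts hex_digits.to_i(16).to_s(2) # 11111011001000011
```

The second line of output is the same number converted in one go. Ruby drops the three leading zeros, which do not change the value.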

Return to Emojis

Apologies for those detours if you even took them, but let’s return to the important topic of random emojis.

There are a lot of emojis, so it’s easiest to represent them by range. According to this source, most emojis live in these ranges:

  • U+1F601 - U+1F64F
  • U+1F680 - U+1F6C0

As we can see, emoji code points are written in hex, which is awkward to loop through (and we’ll be looping in a second). So it’s easiest to convert them from hex into decimal, just as we did above.

In decimal, the ranges become:

  • 128513 - 128591
  • 128640 - 128704
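
One way to check these conversions is with Ruby’s hex integer literals (the 0x prefix). This is a small sketch with an illustrative variable name. Ruby stores the same integer either way, so the decimal form is mostly about readability.

```ruby
# Ranges copied from the post, written as Ruby hex literals.
hex_ranges = [[0x1F601, 0x1F64F], [0x1F680, 0x1F6C0]]

hex_ranges.each do |first, last|
  # Integers print in decimal by default.
  puts "#{first} - #{last}"
end
# 128513 - 128591
# 128640 - 128704
```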

Here’s our random emoji code in Ruby:

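  # Returns a random emoji as a decimal HTML character entity, e.g. "&#128574;".
  def random_emoji
    e_list = []
    # Inclusive decimal ranges for U+1F601..U+1F64F and U+1F680..U+1F6C0.
    emoji_ranges = [[128513, 128591], [128640, 128704]]
    emoji_ranges.each do |val|
      (val[0]..val[1]).each do |v|
        # "&#" + decimal code point + ";" makes a decimal character entity.
        emoji = "&##{v};"
        e_list << emoji
      end
    end
    # Array#sample picks one element at random.
    e_list.sample
  end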

For each range, (val[0]..val[1]), we loop through and construct the emoji.

Since we’re using decimal values instead of hex, we need to use numbered HTML character entities without a preceding x. Just a quick refresher:

HTML Character entity:

  • Decimal: &#60;
  • Hex: &#x003C; (hex has the extra x)
  • Name: &lt;

This is how we arrive at emoji = "&##{v};". The #{v} interpolates the actual decimal value into the string, the preceding &# indicates it’s a decimal HTML character entity, and the trailing ; closes it.

$ random_emoji
> &#128574;

$ random_emoji
> &#128643;

Conclusion

That is enough for now.