Emojis and the Power of Unicode


One of the big announcements that Apple made this year was a new list of emojis that will be arriving on iOS 9.1 late this year. Developer versions already have access to the new emoji set. If you aren’t a millennial, you may be wondering what all the fuss is about with emojis and why adding a few pictures is creating so much buzz. Here’s why emojis are so popular and how they came to be in the first place. And for that, we have to go back in time.


Emojis have actually been around a long time, since at least 2000, but their use was mostly limited to Japan. Why there? It has to do with how computers encode information. For decades, Western computers encoded text using a format called ASCII, the American Standard Code for Information Interchange. It was a way to translate human-readable symbols into numbers that computers could understand. Your font determined how a letter or a number looked on screen, but the actual representation of the symbol to the computer was its ASCII code.

Yet there was one major flaw. ASCII really only had enough codes to handle Latin-alphabet languages. If your language used a Latin-based alphabet you were fine, but if it used any other writing system, you had to learn a language that did, like English, to start using computers, or use a special computer with a different way of encoding information.

ASCII stored each character in a seven-bit number, which allowed for 128 characters. But computers of that era handled eight bits at a time. In the original specification, the extra bit simply went unused, until a bunch of people got the bright idea that by using it, they could add another 128 printable characters.
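To see what that seven-bit scheme looks like in practice, here is a small Python sketch. `ord()` returns a character's numeric code, and formatting it in binary shows it fits in seven bits:

```python
# Sketch: how ASCII maps characters to seven-bit numbers.
# ord() gives the numeric code; :07b formats it as 7 binary digits.
for ch in ("A", "a", "0"):
    code = ord(ch)
    print(f"{ch!r} -> {code} -> {code:07b}")

# Every standard ASCII code fits in 7 bits, i.e. the range 0-127:
assert all(code < 128 for code in range(128))
```

Running this prints, for example, `'A' -> 65 -> 1000001`, showing the eighth bit of the byte going unused, exactly the bit those later variants repurposed.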

The problem was that a bunch of incompatible ASCII variants all came out at once, and there was no standardization until the so-called ANSI scheme was developed. It fixed the original ASCII set in place and then defined code pages, sheets of characters that could be loaded to fill in the remaining half of the set. So if you had a Hebrew version of DOS, you could make sure the Hebrew code page loaded when you started your computer and type in your own language. Of course, even 256 characters weren't nearly enough for many Asian languages.

Computers could only load one code page at a time. Opening a document that expected one code page on a computer using a different one resulted in gibberish. Before the Internet, this wasn't such a bad thing. But as soon as we could send strings of text all over the world, it wasn't going to work.
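You can still reproduce that gibberish today. In this Python sketch, cp1255 (Hebrew) and cp1252 (Western European) stand in for the old DOS code pages: the same bytes read perfectly under one and turn to nonsense under the other.

```python
# Sketch: the same bytes decoded under two different code pages.
# cp1255 (Hebrew) and cp1252 (Western European) are modern
# stand-ins for the one-page-at-a-time DOS era.
hebrew_word = "שלום"                    # "shalom"
raw = hebrew_word.encode("cp1255")     # one byte per letter

print(raw.decode("cp1255"))  # right code page: שלום
print(raw.decode("cp1252"))  # wrong code page: gibberish like ùìåí
```

Both decodes succeed without error; the computer has no way to know which page the bytes were written under, which is exactly why mismatches went unnoticed until a human read the result.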


While all this ASCII and ANSI stuff was going on, another group of programmers was trying to solve the problem once and for all. They wanted to use a form of encoding that could handle many more characters, enough to handle all the writing systems in the world. That effort was called Unicode, and practically all computers now use it as the method for encoding symbols.

Unicode's most common encoding uses 1–4 bytes in memory for each character, as compared to ANSI's single byte. The number of bytes necessary to represent a character roughly tracks where it sits in Unicode's range. It breaks down (extremely roughly) like this:

1 byte — Good old ASCII, kept this way for backward compatibility
2 bytes — Accented Latin letters and most other alphabets
3 bytes — Asian alphabets
4 bytes — Everything else
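The rough tiers above are easy to verify in Python, since `str.encode("utf-8")` returns the actual bytes for a character:

```python
# Sketch: UTF-8 byte counts for characters from each rough tier.
samples = {
    "A": "ASCII letter",
    "é": "accented Latin letter",
    "ш": "Cyrillic letter",
    "漢": "CJK ideograph",
    "😀": "emoji",
}
for ch, label in samples.items():
    print(f"{ch} ({label}): {len(ch.encode('utf-8'))} byte(s)")
```

This prints 1, 2, 2, 3, and 4 bytes respectively, matching the breakdown: plain ASCII stays one byte, and emoji land in the four-byte tier.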

Using some fancy programming, including a layered design reminiscent of ANSI's code pages, Unicode can encode a little over 1.1 million different characters. Today the vast majority of text online, by some counts at least 85% of web pages, is encoded in some form of Unicode, usually UTF-8. It is the standard encoding for exchanging information online.
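That "little over 1.1 million" figure comes straight from Unicode's defined range of code points, 0 through 0x10FFFF, which Python can confirm directly:

```python
# Sketch: Unicode's code point range is 0 through 0x10FFFF.
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT + 1)     # 1114112 possible code points
print(hex(ord(chr(MAX_CODE_POINT))))  # the last code point round-trips
```

Not all 1,114,112 slots are assigned; the Consortium fills them in gradually with each release, which is what leaves room for additions like emojis.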


While all this was happening over in the West, Asia was developing its own forms of encoding information. Japan in particular took to online communication like a fish to water, but there's a problem with using writing to communicate: most of our communication is nonverbal. Even with the rich character sets of Chinese and Japanese, it is difficult to convey tone of voice. To compensate, Japan developed the first emoji (Japanese for "picture character") set for mobile phone users in the late 1990s, and they boomed in popularity. But because Asia used its own encoding systems, emojis remained an Asian phenomenon.

Google and other tech companies in the West saw the potential in emojis and decided to petition the Unicode Consortium, the organization that controls Unicode, to include emojis in that four-byte "everything else" section above. In 2010, the first emojis were added to Unicode, and more have been added with each major release, the latest being June of this year.
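Under the hood, an emoji really is just another Unicode character in that "everything else" range, as this Python sketch shows:

```python
# Sketch: an emoji is an ordinary Unicode character with a
# code point above 0xFFFF, so it takes four bytes in UTF-8.
grin = "😀"                       # GRINNING FACE
print(hex(ord(grin)))            # 0x1f600
print(grin.encode("utf-8"))      # b'\xf0\x9f\x98\x80' -- four bytes
```

From the encoding's point of view there is nothing special about 😀 versus any letter; it's your font that decides to draw it as a little yellow face.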

Now emojis are ubiquitous in smartphone communications and social media all over the world. Even if you've never used a smartphone, many social media and chat programs will automatically translate emoticons, like the old :), into an equivalent emoji.

Getting a whole new set of icons is like adding another set of words to a language. Some of the new icons include a face with a thermometer and prayer beads. One special feature of this update is the ability to change the skin tone of human emojis to five different shades.
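Those skin tones are a neat trick of Unicode's design: rather than five copies of every human emoji, the standard defines five separate modifier characters (U+1F3FB through U+1F3FF) that the renderer merges with the emoji before them. A Python sketch:

```python
# Sketch: skin tones are standalone modifier characters
# (U+1F3FB..U+1F3FF) placed AFTER a human emoji; the font
# renders base + modifier as one tinted glyph.
base = "\N{THUMBS UP SIGN}"                         # 👍  U+1F44D
modifiers = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]  # 5 shades
for mod in modifiers:
    print(base + mod)   # each pair displays as a single tinted 👍
```

Each `base + mod` string is really two code points; whether you see one tinted thumb or a thumb followed by a color swatch depends on how up to date your font is.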

Whether you think they help or hinder communication, emojis are here to stay as long as Unicode is around, and it’s not very likely that a replacement will come around anytime soon!

This is a guest contribution by Nick, who is the SEO Manager with Adficient and has been working in the SEM field for over 8 years. A graduate of Arizona State University, Nick regularly posts to the Adficient site along with other online marketing sites.

[image source: flickr.com/photos/taylorherringpr]

