I'm currently participating in the #100DaysOfCode challenge and documenting my journey on Twitter. So far, I've been reviewing the holy trifecta of web development: HTML, CSS, and JavaScript. On Day 4, I shared that one of the things I reviewed was the importance of including <meta charset="utf-8"> in an HTML file.
I got a response asking me to explain why. As I was typing my answer, I found that I had too much to say to fit into one tweet, and that it would be easier to write up a blog post.
What is <meta charset="utf-8">?
Let's break down the line <meta charset="utf-8"> to derive its meaning:
- <meta> is an HTML tag that holds metadata about a web page: information for browsers and search engines about the page's content that is not itself displayed on the page.
- charset is an HTML attribute that declares the character encoding the browser should use when displaying the page's content.
- utf-8 is a specific character encoding.
In other words, <meta charset="utf-8"> tells the browser to use the utf-8 character encoding when translating the raw bytes of the file into the human-readable text it displays (and vice versa).
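To make the "bytes to text and back" idea concrete, here is a minimal Python sketch. It is not what a browser runs internally; it only illustrates that an encoding is the rule for turning characters into bytes and bytes back into characters. The variable names are made up for this example.

```python
text = "مرحبا"  # an Arabic greeting, five letters long

# Text -> bytes: this is what actually gets saved to disk or sent over the network.
encoded = text.encode("utf-8")
print(len(text), len(encoded))  # 5 10 (each of these letters takes two bytes in UTF-8)

# Bytes -> text: the reader has to apply the same encoding to get the letters back.
decoded = encoded.decode("utf-8")
print(decoded == text)  # True
```

The charset declaration plays the role of the "utf-8" argument here: it tells the reader which rule to apply to the bytes.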
Why 'utf-8'?
Today, more than 90% of all websites use UTF-8. Before UTF-8 became the standard, ASCII was used. Unfortunately, ASCII defines only 128 characters, covering English letters, digits, and basic punctuation, so text in languages whose alphabets fall outside that set wouldn't be displayed properly on your screen.
For example, say I wanted to display some Arabic text that says "Hello World!" on a screen using the following snippet of code with the charset set equal to ascii:
html
<!DOCTYPE html>
<html>
<head>
<meta charset="ascii"> <!-- char encoding is set equal to ASCII -->
</head>
<body>
<h1>!مرحبا بالعالم</h1>
</body>
</html>
Now, if you go to your browser, you'll see that the text is displayed as gibberish 🥴:
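The gibberish can be reproduced outside the browser. The sketch below rests on one assumption: that the browser follows the WHATWG Encoding Standard, which treats the label ascii as windows-1252 (called cp1252 in Python). The variable names are only for illustration.

```python
heading = "مرحبا"  # first word of the Arabic heading

# The file on disk holds UTF-8 bytes, no matter what the <meta> tag claims.
file_bytes = heading.encode("utf-8")

# Assumption: charset="ascii" makes the browser decode as windows-1252.
# errors="replace" substitutes U+FFFD for the few byte values cp1252 leaves undefined.
gibberish = file_bytes.decode("cp1252", errors="replace")
print(gibberish)  # ten unrelated Latin letters and symbols instead of five Arabic letters

# Reading the very same bytes as UTF-8 gives the original text back.
print(file_bytes.decode("utf-8") == heading)  # True
```

Nothing in the file is damaged; the bytes are simply being read with the wrong rule.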
However, if we change the charset to utf-8, the code is as follows:
html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"> <!-- char encoding is set equal to UTF-8 -->
</head>
<body>
<h1>!مرحبا بالعالم</h1>
</body>
</html>
The text is now displayed properly 🥳:
UTF-8 was created to address ASCII's shortcomings, and it can encode the characters of almost every written language in the world. Because of this and its backward compatibility with ASCII, almost all browsers support UTF-8.
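Both claims, ASCII's limitation and UTF-8's backward compatibility, are easy to check with a short Python sketch. The helper can_encode is a made-up name for this example, not a library function.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


english = "Hello World!"
arabic = "مرحبا بالعالم"

# ASCII's shortcoming: it has no codes for Arabic letters.
print(can_encode(english, "ascii"))  # True
print(can_encode(arabic, "ascii"))   # False
print(can_encode(arabic, "utf-8"))   # True

# Backward compatibility: plain English text is byte-for-byte identical in both.
print(english.encode("ascii") == english.encode("utf-8"))  # True

# UTF-8 spends extra bytes only on characters outside ASCII.
for char in ["A", "م", "한", "🥳"]:
    print(char, len(char.encode("utf-8")))  # 1, 2, 3 and 4 bytes respectively
```

Because every ASCII file is already a valid UTF-8 file, switching an English-only page to UTF-8 changes nothing about its bytes.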
What if I forget to include <meta charset="utf-8"> in my HTML file?
Don't panic, but don't count on HTML5 to rescue you either. 🦸
The HTML5 standard requires documents to be encoded as UTF-8, but including <!DOCTYPE html> at the top of your HTML file (which declares that it's an HTML5 file) does not declare a character encoding on its own.
If no encoding is specified, whether in the <meta> tag or in the HTTP Content-Type header sent by the server, the browser has to guess. It may well land on UTF-8, but depending on its settings it can also fall back to an older encoding such as windows-1252. Because UTF-8 is not guaranteed, it's better to just include a character encoding specification using the <meta> tag in your HTML file.
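If you'd like to check your own pages for a missing declaration, something along these lines could work. It's a rough sketch built on Python's standard html.parser module; CharsetFinder and declared_charset are hypothetical names, and it only looks for the <meta charset="..."> form, not the older http-equiv variant or the HTTP header.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the first <meta charset="..."> value seen in a document."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased.
        if tag == "meta" and self.charset is None:
            for name, value in attrs:
                if name == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the declared charset, or None if the page has no <meta charset>."""
    finder = CharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset


page = '<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body></body></html>'
print(declared_charset(page))  # utf-8
print(declared_charset("<!DOCTYPE html><html><head></head><body></body></html>"))  # None
```

A result of None is the cue to add the tag.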
There you have it. 🎉 Feel free to leave any comments or thoughts below. If you want to follow my #100DaysOfCode journey, follow me on Twitter at @maggiecodes_. Happy coding!
Top comments (9)
One thing I've always been told was important about this was to make it the very first tag in the <head> section to prevent browsers needing to stop and reparse the html if they guessed the encoding wrongly.
Good point!
The meta charset element is only about the characters that you can enter in the HTML file. If you have <meta charset="ascii">, it means you should only enter ASCII characters in your HTML file. You can still display Arabic text or any other language using HTML entities, although this is cumbersome. For example, with <meta charset="ascii">, you can use this instead for your Arabic text.
<h1>!&#1605;&#1585;&#1581;&#1576;&#1575; &#1576;&#1575;&#1604;&#1593;&#1575;&#1604;&#1605;</h1>
While I don't recommend this since it's not readable, just note that some older editors might not support UTF-8 files.
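For anyone curious how such entities could be produced without typing them by hand, here is a small Python sketch. The function name to_ascii_html is made up for this example; the "xmlcharrefreplace" error handler it relies on is part of Python's standard codecs. Note that it does not escape characters like < or &, so it is not a general HTML escaper.

```python
import html


def to_ascii_html(text):
    """Replace every non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" turns e.g. "م" into "&#1605;" instead of raising an error.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


escaped = to_ascii_html("مرحبا")
print(escaped)  # &#1605;&#1585;&#1581;&#1576;&#1575;

# A browser (or html.unescape) turns the references back into the letters.
print(html.unescape(escaped) == "مرحبا")  # True
```

The output contains only ASCII characters, so it survives even when the file is read as ascii.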
Very helpful, thanks
Nicely explained, thanks for taking your time. Understood now 🙂
While searching for information about "," I found your account and appreciated the explanations, which were far clearer than those I found on Wikipedia.
Does it matter whether UTF-8 is written in uppercase or lowercase?
So, if I'm understanding correctly... this is more of a compatibility thing?
Yep. Not sure if this answers your question but utf-8 solves one of the limitations of ASCII of not being able to encode non-English characters (like those in Arabic and Korean for example). This way languages that don't use English characters can still be understood by computers and also display the text properly to users.