A recent Reddit post about Unicode characters in Go identifiers made me curious enough to dive into the Go spec and look things up properly.
According to the spec, the syntax for valid identifiers is
identifier = letter { letter | unicode_digit } .
with
letter = unicode_letter | "_" .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit = /* a Unicode code point classified as "Number, decimal digit" */ .
The "Letter" category consists of the Unicode categories Lu (uppercase letters), Ll (lowercase letters), Lt (titlecase letters), Lm (modifier letters), and Lo (other letters), while "Number, decimal digit" refers to the Unicode category Nd.
So an identifier has to start with a letter or an underscore ("_"), and may otherwise contain only letters, decimal digits, and underscores, where "letter" and "digit" mean whatever Unicode classifies as such.
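The rule above can be sketched in a few lines of Go. This is only an illustration, not how the compiler does it: the helper name isIdentifier is made up, it relies on unicode.IsLetter and unicode.IsDigit from the standard library (which cover the L and Nd categories respectively), and it does not exclude keywords such as func.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentifier reports whether s matches the identifier grammar quoted
// above: a letter (Unicode letter or "_") followed by any number of
// letters and Unicode decimal digits.
// Hypothetical helper; it does not reject keywords.
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		switch {
		case r == '_' || unicode.IsLetter(r):
			// letters and underscores are allowed everywhere
		case i > 0 && unicode.IsDigit(r):
			// decimal digits are allowed anywhere but the first position
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"abc_123", "_tmp", "Σ", "42", "x🌞"} {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```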
The set of letters includes not only the usual A-Z and a-z, but also letters from other scripts, like Greek letters (e.g. Σ) or CJK characters (e.g. 㭪). The same holds for digits: not only 0-9, but also digits from other scripts are allowed, e.g. ୩, ٣, etc.
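To check which category a particular character belongs to, the standard unicode package exposes a range table per category. Below is a minimal sketch; the helper name category is an assumption made for illustration, and it only looks at the six categories that matter for identifiers. The two non-Latin digits are written as escapes to avoid bidirectional rendering surprises in the source.

```go
package main

import (
	"fmt"
	"unicode"
)

// category returns the name of the identifier-relevant Unicode category
// that contains r, or "" if r is in none of them.
// Hypothetical helper, not part of the standard library.
func category(r rune) string {
	tables := []struct {
		name  string
		table *unicode.RangeTable
	}{
		{"Lu", unicode.Lu}, {"Ll", unicode.Ll}, {"Lt", unicode.Lt},
		{"Lm", unicode.Lm}, {"Lo", unicode.Lo}, {"Nd", unicode.Nd},
	}
	for _, t := range tables {
		if unicode.Is(t.table, r) {
			return t.name
		}
	}
	return ""
}

func main() {
	// '\u0663' is ARABIC-INDIC DIGIT THREE, '\u0B69' is ORIYA DIGIT THREE.
	for _, r := range []rune{'Z', 'a', 'Σ', '㭪', '9', '\u0663', '\u0B69', '😀'} {
		fmt.Printf("%U %q\n", r, category(r))
	}
}
```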
Valid identifiers:
- abc_123
- _myidentifier
- Σ (U+03A3 GREEK CAPITAL LETTER SIGMA)
- 㭪 (some CJK character from the Lo category)
- x٣३߃૩୩3 (x + decimal digits 3 from various scripts)
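As a quick sanity check, a few of these names can be used in an actual program. The snippet below is just a sketch: the function and variable roles are invented for illustration, and the mixed-script digit example is left out to keep the source easy to read.

```go
package main

import "fmt"

// Σ sums its arguments. The name is a single Greek capital letter
// (category Lu).
func Σ(xs ...int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	abc_123 := 1
	_myidentifier := 2
	㭪 := 3 // CJK character from the Lo category
	fmt.Println(Σ(abc_123, _myidentifier, 㭪)) // prints 6
}
```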
Invalid identifiers:
- 42 (does not start with a letter)
- 😀 (not a letter, but So / Symbol, other)
- ⽔ (not a letter, but So / Symbol, other)
- x🌞 (starts with a letter, but contains non-letter/digit characters)
Although Go accepts identifiers that contain characters other than A-Z, a-z, 0-9, and _, it's generally not advisable to use them, for reasons of readability and accessibility, and to avoid rendering issues.