Unicode General Categories
Every Unicode character is assigned a general category — a high-level classification like Letter, Number, or Symbol.
Categories are used by regular expressions (\p{L}, \p{N}) and text processing algorithms.
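As a rough illustration of how a program can read these categories, the sketch below uses Python's standard `unicodedata` module. Python's built-in `re` module does not support `\p{L}`-style classes, so the helper `keep_major_classes` (a hypothetical name, not part of any library) filters on the first letter of the category code instead. Results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def keep_major_classes(text: str, classes: str = "LN") -> str:
    """Keep characters whose category code starts with one of `classes`.

    "LN" roughly mimics the character class [\\p{L}\\p{N}].
    """
    return "".join(ch for ch in text if unicodedata.category(ch)[0] in classes)

print(unicodedata.category("a"))         # Ll
print(unicodedata.category("7"))         # Nd
print(keep_major_classes("Straße 42!"))  # Straße42
```

The two-letter code returned by `unicodedata.category` is the same code shown for each entry on this page; its first letter is the major class.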
Letter
Ll
Lowercase Letter
Letters that are lowercase, such as a, b, c.
2,283 chars
Lm
Modifier Letter
Non-combining, letter-like characters that modify an adjacent letter, such as ʰ (U+02B0).
410 chars
Lo
Other Letter
Letters that are not uppercase, lowercase, titlecase, or modifier letters.
141,062 chars
Lt
Titlecase Letter
Digraph letters whose first part is uppercase and the rest lowercase, such as ǅ (U+01C5); used when only the first letter of a word is capitalized.
31 chars
Lu
Uppercase Letter
Letters that are uppercase, such as A, B, C.
1,886 chars
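The five letter subcategories can be checked with a short sketch like the one below. The sample characters are illustrative choices, not taken from this page, and are written as escapes so that the code points are unambiguous.

```python
import unicodedata

# Sample character -> expected letter subcategory (illustrative picks).
samples = {
    "a": "Ll",       # lowercase
    "A": "Lu",       # uppercase
    "\u01c5": "Lt",  # ǅ, a titlecase digraph
    "\u02b0": "Lm",  # ʰ, modifier letter small h
    "\u4e2d": "Lo",  # 中, a letter with no case distinction
}

for ch, expected in samples.items():
    actual = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {actual} (expected {expected})")

# A titlecase letter sits between an uppercase and a lowercase form.
dz = "\u01c5"
print(dz.upper() == "\u01c4", dz.lower() == "\u01c6")  # True True
```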
Mark
Number
Punctuation
Pc
Connector Punctuation
Punctuation marks that connect words, such as the underscore.
10 chars
Pd
Dash Punctuation
Punctuation marks that separate words or clauses, such as hyphens and dashes.
27 chars
Pe
Close Punctuation
Closing punctuation marks, such as the closing parenthesis ) and closing bracket ].
77 chars
Pf
Final Punctuation
Closing quotation marks, such as ” and ».
10 chars
Pi
Initial Punctuation
Opening quotation marks, such as “ and «.
12 chars
Po
Other Punctuation
Punctuation marks that are not connectors, dashes, brackets, or quotes.
641 chars
Ps
Open Punctuation
Opening punctuation marks, such as the opening parenthesis ( and opening bracket [.
79 chars
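One common use of the punctuation subcategories is selective stripping. The following is a minimal sketch, assuming that connector punctuation (Pc, such as the underscore) should survive so that identifiers stay intact; `strip_punctuation` is a hypothetical helper name.

```python
import unicodedata

def strip_punctuation(text: str, keep: frozenset = frozenset({"Pc"})) -> str:
    """Drop every P* character except those whose subcategory is in `keep`."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat.startswith("P") and cat not in keep:
            continue  # punctuation that is not explicitly kept
        out.append(ch)
    return "".join(out)

print(strip_punctuation("snake_case (v2) - done!"))
print(strip_punctuation("\u00abquote\u00bb"))  # «quote» loses its Pi/Pf marks
```

Whitespace is not punctuation, so the spaces around a removed dash remain; collapsing them would be a separate step.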
Symbol
Sc
Currency Symbol
Currency symbols such as $, £, €, and ¥.
64 chars
Sk
Modifier Symbol
Spacing modifier symbols that are neither letters nor combining marks, such as ^ and `.
125 chars
Sm
Math Symbol
Mathematical symbols such as +, =, <, and >.
960 chars
So
Other Symbol
Symbols that are not math, currency, or modifier symbols.
7,468 chars
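The symbol subcategories allow narrow filters, for instance picking out currency signs without matching math operators. The sketch below assumes Python 3.9 or later (for the `list[str]` annotation); `currency_symbols` is a hypothetical helper name.

```python
import unicodedata

def currency_symbols(text: str) -> list[str]:
    """Return the characters in `text` whose category is Sc."""
    return [ch for ch in text if unicodedata.category(ch) == "Sc"]

# "+" is Sm, so it is not reported as a currency symbol.
print(currency_symbols("Price: $5 or \u20ac4.50 (+ tax)"))  # ['$', '€']

print(unicodedata.category("^"))       # Sk
print(unicodedata.category("\u00a9"))  # So (copyright sign)
```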
Separator
Other
Cc
Control
Control characters such as carriage return, tab, and null.
65 chars
Cf
Format
Non-visible formatting characters such as the zero-width joiner.
170 chars
Co
Private Use
Code points reserved for private use by applications.
137,468 chars
Cs
Surrogate
High and low surrogate code points used in UTF-16 encoding.
2,048 chars
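The per-category counts can be reproduced by tallying every code point, as in the sketch below. The Control, Private Use, and Surrogate totals are fixed by the layout of the code space, while Format and most other categories change between Unicode versions, so numbers printed by a given interpreter may differ from those listed here.

```python
import sys
import unicodedata
from collections import Counter

# Tally the general category of every code point, U+0000 through U+10FFFF.
counts = Counter(
    unicodedata.category(chr(cp)) for cp in range(sys.maxunicode + 1)
)

for code in ("Cc", "Cf", "Co", "Cs"):
    print(code, f"{counts[code]:,}")
```

The tally also contains Cn (unassigned), which covers every code point that has no other category.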