Each Unicode character is assigned a general category. The complete list of categories follows.
Code | Description |
[Cc] | Other, Control |
[Cf] | Other, Format |
[Cn] | Other, Not Assigned (no characters in UnicodeData.txt have this property) |
[Co] | Other, Private Use |
[Cs] | Other, Surrogate |
[LC] | Letter, Cased (any of Lu, Ll, or Lt) |
[Ll] | Letter, Lowercase |
[Lm] | Letter, Modifier |
[Lo] | Letter, Other |
[Lt] | Letter, Titlecase |
[Lu] | Letter, Uppercase |
[Mc] | Mark, Spacing Combining |
[Me] | Mark, Enclosing |
[Mn] | Mark, Nonspacing |
[Nd] | Number, Decimal Digit |
[Nl] | Number, Letter |
[No] | Number, Other |
[Pc] | Punctuation, Connector |
[Pd] | Punctuation, Dash |
[Pe] | Punctuation, Close |
[Pf] | Punctuation, Final quote (may behave like Ps or Pe depending on usage) |
[Pi] | Punctuation, Initial quote (may behave like Ps or Pe depending on usage) |
[Po] | Punctuation, Other |
[Ps] | Punctuation, Open |
[Sc] | Symbol, Currency |
[Sk] | Symbol, Modifier |
[Sm] | Symbol, Math |
[So] | Symbol, Other |
[Zl] | Separator, Line |
[Zp] | Separator, Paragraph |
[Zs] | Separator, Space |
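
To see how these codes apply to real characters, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `category_of` and `major_class_of` and the `MAJOR_CLASSES` mapping are illustrative assumptions, not part of any API; the mapping simply reads the first letter of each code in the table.

```python
import unicodedata

# Major classes, keyed by the first letter of each two-letter code in the table.
# (Hypothetical helper mapping, introduced here for illustration only.)
MAJOR_CLASSES = {
    "C": "Other",
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
}


def category_of(ch: str) -> str:
    """Return the two-letter general category code of a single character."""
    return unicodedata.category(ch)


def major_class_of(ch: str) -> str:
    """Return the major class name derived from the first letter of the code."""
    return MAJOR_CLASSES[category_of(ch)[0]]


# Print the code and major class for a few common ASCII characters.
for ch in ["A", "a", "5", " ", "_", "(", "$", "+", "\n"]:
    print(repr(ch), category_of(ch), major_class_of(ch))
```

`unicodedata.category` reports the category recorded in the Unicode data bundled with the running Python version, so results for recently added characters may differ between versions. It returns a specific code such as `Lu` or `Ll`, never the grouping code `LC`.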