Module:Unicode data/doc
This Lua module is used on approximately 953,000 pages. To avoid major disruption and server load, any changes should be tested in the module's /sandbox or /testcases subpages, or in your own module sandbox. The tested changes can then be added to this page in a single edit. Consider discussing changes on the talk page before implementing them.
Usage
This module provides functions that access information on Unicode code points. The information is retrieved from data modules generated from the Unicode Character Database, or derived by rules given in the Unicode Specification. It and its submodules were copied from English Wiktionary and then modified; see there for more information.
Parameters and functions
code point
The code point is to be entered as a hexadecimal value. For example, U+00A9 © COPYRIGHT SIGN can be entered in any of these hexadecimal forms:
- |A9
- |0xA9
- |0x00A9
- |0x00a9
{{#invoke:Unicode data|lookup|name|0x00A9}}
→ COPYRIGHT SIGN
Incorrect or unintended results:
- 169 (decimal):
{{#invoke:Unicode data|lookup|name|169}}
→ LATIN SMALL LETTER U WITH TILDE
Unintended: U+00A9 was expected, but 169 is read as hexadecimal 0169 (that is, decimal 361).
- U+00A9:
{{#invoke:Unicode data|lookup|name|U+00A9}}
Not accepted: do not use the "U+" prefix.
- غ:
{{#invoke:Unicode data|lookup|name|غ}}
Not accepted: a character cannot be entered in place of a code point.
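The input rules above can be illustrated with a small sketch. It is written in Python rather than the module's Lua, the names `parse_code_point` and `character_name` are hypothetical, and the exact grammar (an optional `0x` prefix followed by hexadecimal digits, with surrounding whitespace tolerated) is an assumption based on the examples in this section, not a description of the module's actual parser.

```python
import re
import unicodedata

# Assumed grammar: optional "0x" prefix followed by hexadecimal digits only.
_HEX_CODE_POINT = re.compile(r"(?:0x)?([0-9A-Fa-f]+)")


def parse_code_point(text):
    """Hypothetical helper: read a parameter as a hexadecimal code point."""
    match = _HEX_CODE_POINT.fullmatch(text.strip())
    if match is None:
        # Covers "U+00A9" and literal characters such as "غ".
        raise ValueError(f"not a hexadecimal code point: {text!r}")
    value = int(match.group(1), 16)
    if value > 0x10FFFF:
        raise ValueError(f"beyond the Unicode range: {text!r}")
    return value


def character_name(text):
    """Name of the character addressed by a hexadecimal parameter."""
    return unicodedata.name(chr(parse_code_point(text)))
```

With this sketch, `character_name("0x00A9")` gives COPYRIGHT SIGN, while `character_name("169")` gives LATIN SMALL LETTER U WITH TILDE, because the digits are always interpreted in base 16.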
"lookup" and "is" functions
lookup, is
- Template-invokable functions that give access to the functions whose names start with lookup and is. For most of the functions, add the code point in hexadecimal as the next parameter. For is|Latin, is|rtl, and is|valid_pagename, add a character string instead. HTML character references in the text are decoded by the module into code points.
- For example, Template:Tnull → true.
- Internally, in modules, these functions are named using an underscore: lookup|name|code point corresponds to lookup_name.
- For &#xA9; (©): Template:Tnull → COPYRIGHT SIGN
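The two mechanisms described above (decoding HTML character references into code points, and mapping `lookup|name` onto an underscore-named function) might look roughly like the following. This is a Python sketch, not the module's Lua code; `decode_text`, `FUNCTIONS`, and `invoke` are hypothetical names, and Python's bundled Unicode database stands in for the module's data.

```python
import html
import unicodedata


def decode_text(text):
    """Decode HTML character references, then list the code points."""
    return [ord(char) for char in html.unescape(text)]


# Hypothetical registry using the underscore naming described above.
FUNCTIONS = {
    "lookup_name": lambda code_point: unicodedata.name(chr(code_point), ""),
}


def invoke(kind, what, argument):
    """Map e.g. ("lookup", "name", "A9") onto lookup_name(0xA9)."""
    function = FUNCTIONS[f"{kind}_{what}"]
    # int(..., 16) accepts both "A9" and "0x00A9".
    return function(int(argument, 16))
```

For instance, `decode_text("&#xA9;")` yields the single code point 0xA9, and `invoke("lookup", "name", "A9")` returns COPYRIGHT SIGN.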
Functions overview
- Code points: enter a hexadecimal value, for example |0x0061 or |61; not a decimal value, a "U+" prefix, or a literal character.
Topic | Function | Parameter type (string = by character(s); code point = by 0xHex value) | Example | Returns | Character |
---|---|---|---|---|---|
Unicode character name | lookup\|name | code point | | | |
Scripts | lookup\|script | code point | Template:Tnull | Yiii | |
Blocks | lookup\|block | code point | Template:Tnull | Yi Syllables | |
Planes | lookup\|plane | code point | | | |
General Category | lookup\|category | code point | | | |
Controls | is\|control | code point | | | |
Latin script | is\|Latin | string | | | |
WP:Article title (WP:NCTR) | is\|valid_pagename | string | Template:Ubli | | |
Bidirectionality, right-to-left scripts | is\|rtl | string | Template:Ubli | | |
Combining character | is\|combining | code point | | | |
Character assignation | is\|assigned | code point | | | |
Printable | is\|printable | code point | | | |
Whitespace character § Unicode | is\|whitespace | code point | | | |
Hangul | Hangul | [application unknown] | | | |
Alias names | aliases | [application unknown] | | | |
Combining class | | [application unknown] | | | |
Age | | [application unknown] | | | |
get_best_script | get_best_script | [application unknown] | | | |
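To make the block and plane rows more concrete, here is a hedged Python sketch of how such lookups can work in principle: a block is found by searching a sorted list of ranges in the shape of Blocks.txt, and a plane follows arithmetically from the code point. The three-entry `BLOCKS` excerpt, the function names, and the `No_Block` fallback (the Unicode default for the Block property) are illustrative assumptions; the module's own data layout and return values may differ.

```python
from bisect import bisect_right

# Tiny excerpt in the shape of Blocks.txt: (first, last, block name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0xA000, 0xA48F, "Yi Syllables"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def lookup_block(code_point):
    """Binary-search the sorted ranges for the block containing code_point."""
    index = bisect_right(_STARTS, code_point) - 1
    if index >= 0:
        first, last, name = BLOCKS[index]
        if first <= code_point <= last:
            return name
    return "No_Block"  # not covered by this excerpt


def lookup_plane(code_point):
    """Plane index: each plane spans 0x10000 code points."""
    return code_point >> 16
```

Here `lookup_block(0xA000)` returns Yi Syllables, matching the Blocks row in the table, and `lookup_plane(0x1F600)` returns 1.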
Data modules
The data used by functions in this module is found in submodules. Some are generated by AWK scripts shown at User:Kephir/Unicode on English Wiktionary, others by Lua scripts on the /make subpages of the submodules.
- Module:Unicode data/age: the 'Age' of a character, that is, the Unicode version in which it was introduced.
- Module:Unicode data/aliases: the formal name aliases for characters (from NameAliases.txt)
- Module:Unicode data/blocks: the list of Unicode blocks (from Blocks.txt)
- Module:Unicode data/category: data mapping characters to their General Category (from DerivedGeneralCategory.txt)
- Module:Unicode data/combining: data mapping characters to their Combining Classes (from DerivedCombiningClass.txt)
- Module:Unicode data/control: data for identifying characters that belong to the General Categories of Separator and Other (from DerivedGeneralCategory.txt)
- Module:Unicode data/derived core properties:
- Module:Unicode data/Hangul: data used to generate the names of Hangul syllables (from Jamo.txt)
- Module:Unicode data/names/* (000hh, .., 0E0hh; eg ../names/000): names
- Module:Unicode data/scripts: data mapping characters to their Unicode script properties (from Scripts.txt).
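The Hangul entry in the list above refers to names that are derived by rule rather than stored one by one. As an illustration, the following Python sketch follows the Hangul syllable name generation described in the Unicode Standard, using the jamo short names from Jamo.txt. It is not the module's Lua code, and the function name `hangul_syllable_name` is hypothetical.

```python
S_BASE = 0xAC00                      # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT          # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT          # 11172 syllables in total

# Jamo short names (leading consonants, vowels, trailing consonants).
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(code_point):
    """Derive the character name of a precomposed Hangul syllable."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index, remainder = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(remainder, T_COUNT)
    return ("HANGUL SYLLABLE "
            + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index])
```

For example, `hangul_syllable_name(0xD4DB)` returns HANGUL SYLLABLE PWILH.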
The name data modules (Module:Unicode data/names/xxx) were compiled from UnicodeData.txt. Each one contains, at maximum, code points U+xxx000 to U+xxxFFF.
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
00x | U+0000–U+0FFF | U+1000–U+1FFF | U+2000–U+2FFF | U+3000–U+3FFF | U+4000–U+4FFF | | | | | | U+A000–U+AFFF | | | U+D000–U+DFFF | | U+F000–U+FFFF |
01x | U+10000–U+10FFF | U+11000–U+11FFF | U+12000–U+12FFF | U+13000–U+13FFF | U+14000–U+14FFF | | U+16000–U+16FFF | | U+18000–U+18FFF | | U+1A000–U+1AFFF | U+1B000–U+1BFFF | U+1C000–U+1CFFF | U+1D000–U+1DFFF | U+1E000–U+1EFFF | U+1F000–U+1FFFF |
0Ex | U+E0000–U+E0FFF | | | | | | | | | | | | | | | |
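Since each names submodule covers U+xxx000 to U+xxxFFF, the submodule for a given code point can be found by dropping its lowest three hexadecimal digits. The following Python sketch shows that arithmetic; the function name `names_module_title` is hypothetical, and whether a module actually exists for a given prefix depends on the table above.

```python
def names_module_title(code_point):
    """Title of the names data submodule that would hold code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    # Dropping 12 bits removes the last three hexadecimal digits.
    prefix = code_point >> 12
    return "Module:Unicode data/names/%03X" % prefix
```

For example, U+00A9 maps to Module:Unicode data/names/000 and U+1F600 maps to Module:Unicode data/names/01F.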
Copyright
The Unicode database is released by Unicode Inc. under the following terms:
Copyright © 1991-2018 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in https://www.unicode.org/copyright.html.
Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the "Data Files") or Unicode software and any associated documentation (the "Software") to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that either (a) this copyright and permission notice appear with all copies of the Data Files or Software, or (b) this copyright and permission notice appear in associated Documentation.
THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.
Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder.
Known issues
- Reading data such as Module:Unicode data/aliases is neither provided nor documented.
- Failing test: lookup_category for U+FFFF (<noncharacter-FFFF>); expected: Cn.
{{#invoke:Unicode data|lookup|category|0xFFFF}}
→ [Nil]
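As a cross-check of the expected value, Python's standard `unicodedata` module reports the General Category Cn for U+FFFF. This sketch uses Python's bundled Unicode database, not this module's data, and the function name `lookup_category` merely mirrors the one named in the test; it says nothing about how the Lua function is implemented.

```python
import unicodedata


def lookup_category(code_point):
    """General Category according to Python's bundled Unicode database."""
    return unicodedata.category(chr(code_point))
```

Under this reference, noncharacters and unassigned code points fall back to Cn instead of yielding no value.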
See also
- Named entities: for example, U+22C1 ⋁ N-ARY LOGICAL OR:
{{#invoke:LoadData|Numcr2namecr|0x22C1}}
→ ⋁, ⋁, ⋁
This is a documentation subpage for Module:Unicode data. It may contain usage information, categories and other content that is not part of the original module page.