CLDR 47 Release Note
No. | Date | Rel. Note | Data | Charts | Spec | Delta | GitHub Tag | Delta DTD | CLDR JSON |
---|---|---|---|---|---|---|---|---|---|
47 | 2025-04- | v47 | | Charts47 | LDML47 | Δ47 | release-47-alpha2 | ΔDtd47 | 47.0.0-ALPHA2 |
This is an alpha version of CLDR v47.
Overview
Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
CLDR 47 focused on MessageFormat 2.0 and on tooling to expand support for Digitally Disadvantaged Languages (DDL). It was a closed cycle: locale data changes were limited to bug fixes and the addition of new locales, mostly regional variants.
Changes
The most significant changes in this release are:
- New locales:
  - Core data for Coptic (cop) and Haitian Creole (ht)
  - Locale data for 11 English locales and Cantonese (Macau) (yue_Hant_MO)
- Updated time zone data to tzdata 2025a
- RBNF (Rule Based Number Formatting): Number spellout data improvements for multiple languages
- Assorted transforms improvements
- Updated and revised population data
- Incorporates all changes from CLDR v46.1.
  - CLDR v46.1 was a special release that many users of CLDR (including ICU) have not updated to, so the changes listed here are relative to CLDR v46.0. v46.1 included the following:
    - MessageFormat 2.0 (Final Candidate)
    - More explicit well-formedness and validity constraints for unit of measurement identifiers
    - Addition of derived emoji annotations that were missing: emoji with skin tones facing right
    - Fixes to make the ja, ko, yue, zh `datetimeSkeletons` useful for generating the standard patterns
    - Improved date/time test data
For more details, see below.
Locale Coverage Status
Count | Level | Usage | Examples |
---|---|---|---|
97 | Modern | Suitable for full UI internationalization | čeština, Ελληνικά, Беларуская, ᏣᎳᎩ, Ქართული, Հայերեն, עברית, اردو, አማርኛ, नेपाली, অসমীয়া, বাংলা, ਪੰਜਾਬੀ, ગુજરાતી, ଓଡ଼ିଆ, தமிழ், తెలుగు, ಕನ್ನಡ, മലയാളം, සිංහල, ไทย, ລາວ, မြန်မာ, ខ្មែរ, 한국어, 中文, 日本語, … |
16 | Moderate | Suitable for “document content” internationalization, e.g., in a spreadsheet | Akan, Balóchi [Látin], brezhoneg, Cebuano, føroyskt, IsiXhosa, Māori, sardu, veneto, Wolof, татар, тоҷикӣ, कांगड़ी, … |
55 | Basic | Suitable for locale selection, e.g., choice of language on a mobile phone | Basa Sunda, emakhuwa, Esperanto, eʋegbe, Frysk, Malti, босански (ћирилица), କୁୱି (ଅଡ଼ିଆ), కువి (తెలుగు), ᱥᱟᱱᱛᱟᱲᱤ, ᓀᐦᐃᓇᐍᐏᐣ, ꆈꌠꉙ, … |
* Note: With each release, the number of items needed for Modern and Moderate increases, so locales without active contributors may drop down in coverage level.
For a full listing, see Coverage Levels
Specification Changes
NOTE: The specification changes will be completed by the specification beta. Only a sample is listed here, and the Modifications section is not yet complete.
The following are the most significant changes to the specification (LDML).
- Don’t produce “Unknown City Time” for the `VVV` and `VVVV` patterns; use the localized offset format instead. CLDR-18237
- TBD (including MessageFormat, Part 9)
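The first item changes a fallback rule: when no usable city is available, the pattern falls back to a localized offset rather than producing “Unknown City Time”. A minimal sketch of that fallback for `VVVV`, assuming English data (a `{0} Time` region format, a `GMT{0}` GMT format with `+HH:mm;-HH:mm` hours, and `GMT` for a zero offset). The function names and the exemplar-city dictionary are hypothetical, and real formatters (such as ICU’s) also handle single-zone countries, short offset formats, and other cases not shown here.

```python
from typing import Mapping

UNKNOWN_ZONE = "Etc/Unknown"


def localized_gmt(offset_minutes: int) -> str:
    """Long localized GMT format, using assumed English data."""
    if offset_minutes == 0:
        return "GMT"  # assumed gmtZeroFormat
    sign = "+" if offset_minutes > 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"GMT{sign}{hours:02d}:{minutes:02d}"  # "GMT{0}" with +HH:mm / -HH:mm


def format_vvvv(zone_id: str, offset_minutes: int, exemplar_cities: Mapping[str, str]) -> str:
    """Hypothetical generic location format (VVVV) with the CLDR 47 fallback."""
    city = exemplar_cities.get(zone_id)
    if zone_id == UNKNOWN_ZONE or city is None:
        # Fall back to the localized offset instead of "Unknown City Time".
        return localized_gmt(offset_minutes)
    return f"{city} Time"  # assumed English regionFormat "{0} Time"


cities = {"America/Los_Angeles": "Los Angeles", "Etc/Unknown": "Unknown City"}
print(format_vvvv("America/Los_Angeles", -480, cities))  # Los Angeles Time
print(format_vvvv("Etc/Unknown", -480, cities))          # GMT-08:00
```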
There are many more changes that are important to implementations, such as changes to certain identifier syntax and various algorithms. See the Modifications section of the specification for details.
Data Changes
TBD: Flesh out overview items
- Updated language matching for Afrikaans to English (en) from Dutch (nl) CLDR-18198
- Ordered scripts in `<languageData>` in descending order of usage per locale CLDR-18155
- Fixed certain invalid codes CLDR-18129
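Because scripts in `<languageData>` are now listed in descending order of usage, a consumer can treat the first listed script as the most widely used one. A minimal sketch, assuming the `<language type="…" scripts="…">` attribute layout of `supplementalData.xml` and skipping entries that carry an `alt` attribute; the function name is hypothetical and the sample XML is illustrative rather than copied from the release data.

```python
import xml.etree.ElementTree as ET


def primary_scripts(supplemental_xml: str) -> dict[str, str]:
    """Map each language to the first (most used) script in <languageData>."""
    root = ET.fromstring(supplemental_xml)
    language_data = root.find("languageData")
    if language_data is None:
        return {}
    result: dict[str, str] = {}
    for lang in language_data.findall("language"):
        # Skip alt entries and entries without a scripts attribute.
        if "alt" in lang.attrib or not lang.get("scripts"):
            continue
        result[lang.get("type")] = lang.get("scripts").split()[0]
    return result


sample = """<supplementalData><languageData>
  <language type="sr" scripts="Cyrl Latn"/>
  <language type="sr" scripts="Latn" alt="secondary"/>
</languageData></supplementalData>"""
print(primary_scripts(sample))  # {'sr': 'Cyrl'}
```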
DTD Changes
Most of the DTD changes were made in v46.1. One additional change was to order currency values in TBD (TBD: get ticket number).
For a full listing, see Delta DTDs.
Supplemental Data Changes
- Ordered scripts in descending order of usage per locale CLDR-18155
- Updated language matching for Afrikaans to English (en) from Dutch (nl) CLDR-18198
- Fixed invalid codes CLDR-18129
- TBD
For a full listing, see BCP47 Delta and Supplemental Delta
Locale Changes
- Cleanups for the currency pattern variants `alt="alphaNextToNumber"` and `alt="noCurrency"`: These were introduced in CLDR 42 (per CLDR-14336) to provide a cleaner way of adjusting currency patterns when an alphabetic currency symbol is used, or when a currency-style pattern is desired without a currency symbol (for example, in a table). Gaps in the data coverage showed up because the translators weren’t shown the right values. Fixes were made in CLDR-17879.
- As noted below in Migration, number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`) should all have a `numberSystem` attribute, and such elements without a `numberSystem` attribute will be deprecated in CLDR 48. To prepare for this, in CLDR 47 all such elements were either removed (if redundant) or corrected by adding a `numberSystem` attribute. (CLDR-17760)
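The two `alt` variants can be read as a small selection rule. A minimal sketch, assuming the caller already has a locale’s standard currency pattern and its `alphaNextToNumber` and `noCurrency` variants in a dictionary, and assuming the `alphaNextToNumber` variant applies when the symbol character that touches the number is a letter. The helper name and the English-like sample patterns are hypothetical; the LDML specification remains the authority on when each variant is used.

```python
import unicodedata


def choose_currency_pattern(patterns: dict[str, str], symbol: str, show_symbol: bool = True) -> str:
    """Pick a currency pattern variant (hypothetical helper)."""
    if not show_symbol:
        # Currency-style pattern without a symbol, e.g. for table columns.
        return patterns.get("noCurrency", patterns["standard"])
    standard = patterns["standard"]
    positive = standard.split(";")[0]  # only inspect the positive subpattern
    symbol_pos = positive.find("¤")
    if symbol_pos < 0 or not symbol:
        return standard
    digit_pos = min(i for i, ch in enumerate(positive) if ch in "#0")
    # The symbol character that would sit next to the number.
    touching = symbol[-1] if symbol_pos < digit_pos else symbol[0]
    if unicodedata.category(touching).startswith("L") and "alphaNextToNumber" in patterns:
        return patterns["alphaNextToNumber"]
    return standard


en_like = {"standard": "¤#,##0.00", "alphaNextToNumber": "¤ #,##0.00", "noCurrency": "#,##0.00"}
print(choose_currency_pattern(en_like, "$"))    # ¤#,##0.00
print(choose_currency_pattern(en_like, "USD"))  # ¤ #,##0.00
```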
For a full listing, see Delta Data
Collation Data Changes
- Two old `zh` collation variants are removed: `big5han` and `gb2312`. They are no longer typically used, and they cover only a fraction of the CJK ideographs. (CLDR-16062)
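Code that requested these variants through a BCP 47 `-u-co-` keyword (for example `zh-u-co-big5han`) can strip the removed values and fall back to the default `zh` collation. A simplified sketch: the function name is hypothetical, it only handles the `co` key inside the `u` extension, and it does not validate the rest of the tag.

```python
REMOVED_ZH_COLLATIONS = {"big5han", "gb2312"}


def drop_removed_collation(tag: str) -> str:
    """Remove a -u-co-big5han / -u-co-gb2312 keyword from a BCP 47 tag."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "u" not in lowered:
        return tag
    start = lowered.index("u")
    # The u extension runs until the next singleton subtag (e.g. "x") or the end.
    end = next((i for i in range(start + 1, len(subtags)) if len(subtags[i]) == 1), len(subtags))
    ext = subtags[start + 1:end]
    kept = []
    i = 0
    while i < len(ext):
        if ext[i].lower() == "co" and i + 1 < len(ext) and ext[i + 1].lower() in REMOVED_ZH_COLLATIONS:
            i += 2  # skip the key and its removed value
            continue
        kept.append(ext[i])
        i += 1
    middle = ["u", *kept] if kept else []  # drop an empty u extension
    return "-".join(subtags[:start] + middle + subtags[end:])


print(drop_removed_collation("zh-u-co-big5han"))  # zh
```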
Number Spellout Data Changes
- Number spellout rules are added for Gujarati. (CLDR-18111)
- Number spellout rules are improved for several other languages:
  - Bulgarian: Improve usage of ‘и’ (“and”). (CLDR-17818)
  - Catalan: Add plural ordinal rules for both masculine and feminine, and other fixes. (CLDR-15972)
  - Dutch: Add the alternate spellout-cardinal-stressed rule for specific Dutch scenarios. (CLDR-17187)
  - Hindi: Add the spellout-ordinal-masculine-oblique rule. (CLDR-15278)
  - Indonesian: Add a missing semicolon whose absence caused all ordinals to be prefixed with “pertama 2:”. (CLDR-17730)
  - Lithuanian: Add all of the grammatical cases, genders and grammatical numbers for cardinals and ordinals (no pronominal forms, and only the positive degree). (CLDR-18110)
  - Russian: Fix grammatical case names in rules, and other issues. (CLDR-17386)
  - Ukrainian: Add digits-ordinal rules. (CLDR-16096)
Segmentation Data Changes
- The word break tailorings for `fi` and `sv` are removed to align with recent changes to the root collation and recent changes to ICU behavior. (CLDR-18272)
Transform Data Changes
- A new `Hant-Latn` transform is added, and `Hans-Latn` is added as an alias for the existing `Hani-Latn` transform. When the Unihan data `kMandarin` field has two values, the first is preferred for a `CN`/`Hans` context and is used by the `Hani-Latn`/`Hans-Latn` transform; the second is preferred for a `TW`/`Hant` context and is now used by the new `Hant-Latn` transform. (CLDR-18080)
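The reading choice described above can be sketched as a small lookup: with two `kMandarin` values, take the first for `Hans`/CN and the second for `Hant`/TW; a single value serves both. A minimal sketch, assuming tab-separated `U+XXXX<TAB>kMandarin<TAB>values` lines as in the Unihan readings file; the helper names are hypothetical, and the two-value test case uses placeholder readings rather than real Unihan data.

```python
def parse_kmandarin(line: str) -> tuple[str, list[str]]:
    """Parse one 'U+XXXX<TAB>kMandarin<TAB>value[ value]' line."""
    code_point, field, value = line.rstrip("\n").split("\t")
    if field != "kMandarin":
        raise ValueError(f"not a kMandarin line: {field}")
    return chr(int(code_point[2:], 16)), value.split()


def pick_reading(readings: list[str], context: str) -> str:
    """Choose a reading for a 'Hans' (CN) or 'Hant' (TW) context."""
    if len(readings) == 1:
        return readings[0]  # one value: appropriate for both contexts
    # Two values: first for Hans/CN (Hani-Latn, Hans-Latn), second for Hant/TW (Hant-Latn).
    return readings[1] if context == "Hant" else readings[0]


char, readings = parse_kmandarin("U+4E00\tkMandarin\tyī")
print(char, pick_reading(readings, "Hant"))  # 一 yī
```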
JSON Data Changes
- CLDR-11874 New package with subdivision data (Note: see known issues).
- CLDR-17176 Timezone data now has a `_type: "zone"` attribute indicating which objects are leaf timezones (`America/Argentina/Catamarca` is a timezone, `America/Argentina` is not).
- CLDR-18133 Currency data preserves the priority order (highest to lowest) of preferred currencies. (The previous behavior was caused by an error in the DTD.)
- CLDR-18277 Fixed the incorrectly generated package name for transforms.
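With the new marker, a consumer no longer has to guess which nested objects are zones. A minimal sketch, assuming the marker appears as a `"_type": "zone"` member inside each zone object of the nested zone tree; the function name and the trimmed sample are illustrative.

```python
import json
from typing import Any, Iterator


def leaf_zones(tree: dict[str, Any], prefix: tuple[str, ...] = ()) -> Iterator[str]:
    """Yield IANA-style IDs for objects marked "_type": "zone"."""
    for key, value in tree.items():
        if not isinstance(value, dict):
            continue  # skip strings such as "exemplarCity" and "_type" itself
        path = prefix + (key,)
        if value.get("_type") == "zone":
            yield "/".join(path)
        # Keep descending: grouping objects such as "Argentina" have no marker,
        # and nested name objects such as "long" simply yield nothing.
        yield from leaf_zones(value, path)


sample = json.loads("""
{"America": {"Argentina": {"Catamarca": {"_type": "zone", "exemplarCity": "Catamarca"}}},
 "Europe": {"London": {"_type": "zone", "long": {"daylight": "British Summer Time"}}}}
""")
print(sorted(leaf_zones(sample)))  # ['America/Argentina/Catamarca', 'Europe/London']
```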
File Changes
In v47.0, but not in v46.0:
- common/
  - main/
    - cop.xml, cop_EG.xml, en_CZ.xml, en_ES.xml, en_FR.xml, en_GS.xml, en_HU.xml, en_IT.xml, en_NO.xml, en_PL.xml, en_PT.xml, en_RO.xml, en_SK.xml, ht.xml, ht_HT.xml, yue_Hant_MO.xml
  - rbnf/
    - gu.xml
  - testData/
    - messageFormat/
      - tests/
        - bidi.json, fallback.json
        - functions/
          - currency.json, math.json
        - u-options.json
    - transforms/
      - und-Latn-t-und-hans.txt, und-Latn-t-und-hant.txt, Hant-Latin.xml
- keyboards/
  - abnf/
    - transform-from-required.abnf, transform-to-required.abnf

In v46.0, but not in v47.0:
- common/
  - segments/
    - fi.xml, sv.xml
Tooling Changes
There were various SurveyTool improvements targeting expansion of DDL support and error detection, such as the following:
- Added a CLA check CLDR-17612
- Improved validity checks for codes CLDR-18129
- Improved ability to detect invalid URLs in the site and spec CLDR-16526
Keyboard Changes
Note: for the v48 timeframe, additional processes are being developed for broad intake of keyboards.
- CLDR-16836 Added an EBNF for the keyboard transform format to the spec, along with ABNF data files. These provide a rigorous definition of the allowed keyboard transform format and enable programmatic validation of it.
Migration
- Removal of number data without `numberSystem` attributes
  - Number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`) should all have a `numberSystem` attribute. In CLDR v48, such elements without a `numberSystem` attribute will be deprecated, and the corresponding entries in root will be removed; these were only intended as a long-ago migration aid. See the relevant sections of the LDML specification: Number Symbols and Number Formats.
- V48 advance warnings
  - Any locales that are missing Core data by the end of the CLDR 48 cycle will be removed. CLDR-16004
  - Starting in CLDR 48, the default week numbering will change to ISO instead of being based on the calendar week. CLDR-18275
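To prepare for the v48 deprecation of number elements without a `numberSystem` attribute, locale data maintained outside CLDR can be checked for such elements. A minimal sketch using Python’s standard `xml.etree.ElementTree`, assuming the elements sit directly under `<numbers>` as in LDML locale files; the function name and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements that should carry a numberSystem attribute (per the migration note).
NUMBER_ELEMENTS = {"symbols", "currencyFormats", "decimalFormats", "percentFormats", "scientificFormats"}


def missing_number_system(ldml_xml: str) -> list[str]:
    """Return tags of <numbers> children that lack a numberSystem attribute."""
    root = ET.fromstring(ldml_xml)
    numbers = root.find("numbers")
    if numbers is None:
        return []
    return [
        child.tag
        for child in numbers
        if child.tag in NUMBER_ELEMENTS and "numberSystem" not in child.attrib
    ]


sample = """<ldml><numbers>
  <symbols numberSystem="latn"><decimal>.</decimal></symbols>
  <decimalFormats><decimalFormatLength/></decimalFormats>
</numbers></ldml>"""
print(missing_number_system(sample))  # ['decimalFormats']
```

For a file on disk, `ET.parse(path).getroot()` yields the same root element that `ET.fromstring` returns here.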
Known Issues
- CLDR-18219: The `common/subdivisions` data files contained additional values that should not be present. These will be removed in the future, but note that they may be present in the new JSON data:
  - Non-subdivisions such as `AW`: Use the region code `AW` instead for translation.
  - Overlong subdivisions such as `fi01`: Use the region code `AX` instead for translation.
Acknowledgments
Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing.
The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.
For web pages with different views of CLDR data, see charts.