
CLDR 46 Release Note

No. | Date | Rel. Note | Data | Charts | Spec | Delta | GitHub Tag | Delta DTD | CLDR JSON
46 | 2024-10-XX | v46 | CLDR46 | Charts46 | LDML46 | Δ46 | release-46 | ΔDtd46 | 46.0.0

This is an alpha version of CLDR v46.

It only covers the data, which is available at release-46-alpha3. An update targeted at September 25 will include specification changes and fix other TBDs. Feedback is welcome via tickets. (The CLDR site is undergoing a migration to Markdown, so the UI for navigation is temporary.)

Overview

Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

The most significant changes in this release were:

Locale Coverage Status

Current Levels

Count | Level | Usage | Examples
97 | Modern | Suitable for full UI internationalization | čeština, Ελληνικά‎, Беларуская‎, ‎ᏣᎳᎩ‎, Ქართული‎, ‎Հայերեն‎, ‎עברית‎, ‎اردو‎, አማርኛ‎, ‎नेपाली‎, অসমীয়া‎, ‎বাংলা‎, ‎ਪੰਜਾਬੀ‎, ‎ગુજરાતી‎, ‎ଓଡ଼ିଆ‎, தமிழ்‎, ‎తెలుగు‎, ‎ಕನ್ನಡ‎, ‎മലയാളം‎, ‎සිංහල‎, ‎ไทย‎, ‎ລາວ‎, မြန်မာ‎, ‎ខ្មែរ‎, ‎한국어‎, 中文, 日本語‎, … ‎
16 | Moderate | Suitable for “document content” internationalization, e.g., in spreadsheets | Akan, Balóchi [Látin], brezhoneg, Cebuano, føroyskt, IsiXhosa, Māori, sardu, veneto, Wolof, татар, тоҷикӣ, कांगड़ी‎, …
55 | Basic | Suitable for locale selection, e.g., choice of language on a mobile phone | Basa Sunda, emakhuwa, Esperanto, eʋegbe, Frysk, Malti, босански (ћирилица), କୁୱି (ଅଡ଼ିଆ), కువి (తెలుగు), ᱥᱟᱱᱛᱟᱲᱤ, ᓀᐦᐃᓇᐍᐏᐣ‬, ꆈꌠꉙ‎, …

Changes

± | New Level | Locales
📈 | Modern | Nigerian Pidgin, Tigrinya
📈 | Moderate | Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
📈 | Basic | Ewe, Ga, Kinyarwanda, Konkani (Latin), Northern Sotho, Oromo, Sichuan Yi, Southern Sotho, Tswana
📉 | Basic* | Chuvash, Anii

* Note: With each release, the number of items needed for Modern and Moderate increases, so locales without active contributors may drop to a lower coverage level.

For a full listing, see Coverage Levels.

Specification Changes

TBD: Add the specification changes by Sept 25

Data Changes

DTD Changes

  1. Added alt='official' to represent cases where an official value differs from the customary value. Currently added for a small number of language names, decimal separators, and grouping separators.
  2. Added new numbering systems from Unicode 16.0.

For a full listing, see Delta DTDs.
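
As a rough illustration of how a consumer might choose between a customary value and its alt='official' variant, here is a minimal Python sketch. The XML fragment is invented for illustration and is not copied from CLDR data; the helper name pick_value and the prefer_official flag are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only; not taken from any CLDR locale file.
FRAGMENT = """
<symbols numberSystem="latn">
  <decimal>,</decimal>
  <decimal alt="official">.</decimal>
</symbols>
"""


def pick_value(parent, tag, prefer_official=False):
    """Return the text of `tag`, optionally preferring the alt='official' variant."""
    customary = None
    official = None
    for element in parent.findall(tag):
        alt = element.get("alt")
        if alt is None:
            customary = element.text
        elif alt == "official":
            official = element.text
    # Fall back to the customary value when no official variant exists.
    if prefer_official and official is not None:
        return official
    return customary


symbols = ET.fromstring(FRAGMENT)
print(pick_value(symbols, "decimal"))
print(pick_value(symbols, "decimal", prefer_official=True))
```

In this sketch the customary value stays the default, so existing consumers would see no change unless they opt in to the official variant.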

Supplemental Data Changes

  1. Currency
    1. Added the new currency code ZWG. Because it was added late in the cycle, many locales will support only the code (no symbol or name).
  2. Dates & Times
    1. Added a new calendar type, iso8601. This is not the same as the ISO 8601 standard format, which is designed just for data interchange: it is all ASCII, doesn’t have all the options for fields (like “Sunday”, “BC”, or “AM”), and does not contain spaces. The CLDR iso8601 calendar uses patterns in the order: era, year, month, day, day-of-week, hour, minute, second, day-period, timezone
    2. Changed the metazone for Kazakhstan to reflect removal of Asia/Almaty, thus dropping the distinction among different regions in Kazakhstan.
    3. Added support for deprecated timezone codes by remapping: CST6CDT → America/Chicago, EST → America/Panama, EST5EDT → America/New_York, MST7MDT → America/Denver, PST8PDT → America/Los_Angeles.
  3. Units
    1. Added units: portion-per-1e9 (aka per-billion), night (for hotel stays), light-speed (as an internal prefix for light-second, light-minute, etc.)
    2. Changed the wind speed preference for some locales to meter-per-second. More preference changes are planned for the next release.
  4. Minimization for likelySubtags removes many additional redundant mappings.
    • For example, the mapping acy_Grek → acy_Grek_CY is unnecessary, because the mapping acy → acy_Latn_CY is sufficient. For the reason why, see the algorithm in Likely Subtags.
    • The ordering in the file is more consistent; first the main mappings, then the mapping from region and/or script to likely language, then the data contributed by SIL.
    • The regions have been cleaned up: there are no entries with ZZ, and 001 is limited to artificial languages such as Interlingua. The only other macroregion code is in und_419 → es_Latn_419 (Spanish‧Latin‧Latin America).
  5. Language matching
    • Dropped the fallback mapping desired="uk" → supported="ru" (so that Ukrainian (uk) doesn’t fall back to Russian (ru)).
      • Note: A fallback language is used when the user’s primary language is unavailable, and either the user doesn’t have any secondary languages in their settings (as on Android or iOS) or those secondary languages are also not available. As a result of this change, when the primary and secondary languages are not available, the fallback language would be the system default instead of Russian.
    • Added the mapping desired="scn" → supported="it" (Sicilian → Italian).
    • Changed the deprecated code Goan Konkani (gom) to Konkani (kok).
  6. Transforms
    1. Major update to Han → Latn, reflecting new data in Unicode 16.0
    2. Fixes for Arabic numbers, and a Farsi vowel
  7. Other Unicode 16.0 changes
    1. Additional numbering systems
    2. Additional scripts and script identifiers
    3. ScriptMeta has been expanded for Unicode 16.0
  8. Other updates
    1. The subdivision identifiers have been updated to the latest available from ISO
      • The removed identifiers have been deprecated
      • Missing names have been added (from Wikidata)
    2. The language subtags, script subtags, and variant subtags have been updated to the latest from IANA
      • Some codes have been deprecated
    3. Parent and defaultContent mappings have been added for Kara-Kalpak (kaa) and Konkani (kok); defaultContent mappings have been added for Kazakh (kk), Ladin (lld), Latgalian (ltg), Mócheno (mhn), and Chinese (Latin, China) (zh_Latn_CN).
    4. Territory Info (GDP, population, languages) has been updated from World Bank and other sources.
    5. LanguageGroup info has been updated from Wikidata
    6. Plural rules have been added for some new locales
    7. Week data
      • The first day of the week has been changed for AE
      • Hour preferences (12 vs. 24) have been added for English as used in Hong Kong, Malaysia, and Israel (en_HK, en_MY, en_IL)
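
The remapping of deprecated timezone codes listed under Dates & Times can be pictured as a simple alias table. The following Python sketch uses only the five mappings named in this note; the function name canonical_timezone is hypothetical and does not correspond to any CLDR or ICU API.

```python
# Alias table built from the mappings listed in this release note.
DEPRECATED_TZ_ALIASES = {
    "CST6CDT": "America/Chicago",
    "EST": "America/Panama",
    "EST5EDT": "America/New_York",
    "MST7MDT": "America/Denver",
    "PST8PDT": "America/Los_Angeles",
}


def canonical_timezone(tz_id: str) -> str:
    """Map a deprecated timezone code to its replacement; pass other IDs through."""
    return DEPRECATED_TZ_ALIASES.get(tz_id, tz_id)


print(canonical_timezone("PST8PDT"))
print(canonical_timezone("Europe/Paris"))
```

Identifiers that are not in the table are returned unchanged, so the lookup is safe to apply to any incoming timezone ID.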

For a full listing, see BCP47 Delta and Supplemental Delta.
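
To see why the minimization of likelySubtags can drop a mapping such as acy_Grek → acy_Grek_CY, consider the following simplified Python sketch. It is an assumption-laden toy: the table holds only the one surviving mapping, the lookup simply tries more specific keys before less specific ones, and steps of the real algorithm (canonicalization, und handling, the exact lookup order) are omitted. See Likely Subtags in the specification for the actual definition.

```python
# Toy table: only the mapping that remains after minimization.
LIKELY = {"acy": ("acy", "Latn", "CY")}


def add_likely_subtags(language, script="", region=""):
    """Fill in missing script/region from the first matching table entry."""
    # Candidate keys from most to least specific (simplified ordering).
    candidates = [
        (language, script, region),
        (language, script, ""),
        (language, "", region),
        (language, "", ""),
    ]
    for lang, scr, reg in candidates:
        key = "_".join(part for part in (lang, scr, reg) if part)
        match = LIKELY.get(key)
        if match:
            m_lang, m_script, m_region = match
            # Subtags supplied by the caller are kept; only empty ones are filled.
            return (language or m_lang, script or m_script, region or m_region)
    return None


print(add_likely_subtags("acy"))
print(add_likely_subtags("acy", "Grek"))
```

Because the caller's Grek script is retained and only the missing region is filled from the acy entry, the result is acy_Grek_CY without a dedicated acy_Grek mapping.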

Locale Changes

  1. Major changes to emoji search keywords and short names (see below)
  2. Major changes to Chinese collation, reflecting new data in Unicode 16.0
  3. Added iso8601 patterns to root. These will use localized months, days of the week, day periods, and timezones. In this first version, the separators are not localized, and will use “-” within numeric dates, “:” within times, and “ ” or “, ” between major elements. Full localization will await the next submission phase for CLDR.
  4. Other changes
    1. Various locales also had smaller improvements agreed to by translators.
    2. Additional test files have been added.

For a full listing, see Delta Data
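
The field order stated for the iso8601 calendar (era, year, month, day, day-of-week, hour, minute, second, day-period, timezone) can be sketched as a ranking over field names. This Python fragment only illustrates the ordering; the field names and the helper order_fields are hypothetical, and it does not reproduce the actual CLDR patterns or separators.

```python
# Field order as stated in this release note for the iso8601 calendar.
FIELD_ORDER = [
    "era", "year", "month", "day", "day-of-week",
    "hour", "minute", "second", "day-period", "timezone",
]
RANK = {name: index for index, name in enumerate(FIELD_ORDER)}


def order_fields(fields):
    """Sort the requested field names into the iso8601 calendar order."""
    # Unknown field names raise KeyError rather than being silently dropped.
    return sorted(fields, key=lambda name: RANK[name])


print(order_fields(["hour", "day-of-week", "day", "year", "month"]))
```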

Emoji Search Keywords

The usage model for emoji search keywords is that

In this release, WhatsApp emoji search keyword data has been incorporated. In the process, the maximum number of search keywords per emoji has been increased, and the keywords have been simplified in most locales by breaking up multi-word keywords. For example, white flag (🏳️) formerly had the 3 keyword phrases [white waving flag | white flag | waving flag]; these have been replaced by the 3 simpler single keywords [white | waving | flag]. The simpler version typically works as well or better in practice.
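
The effect of breaking up multi-word keywords can be sketched in a few lines of Python. This is only an illustration of the idea using the white flag example; it is not the process CLDR actually used, and the function name simplify_keywords is hypothetical.

```python
def simplify_keywords(phrases):
    """Split keyword phrases into unique single words, keeping first-seen order."""
    seen = set()
    result = []
    for phrase in phrases:
        for word in phrase.split():
            if word not in seen:
                seen.add(word)
                result.append(word)
    return result


old_keywords = ["white waving flag", "white flag", "waving flag"]
print(" | ".join(simplify_keywords(old_keywords)))  # prints "white | waving | flag"
```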

Collation Data Changes

There are two significant changes to the CLDR root collation (CLDR default sort order).

Realigned With DUCET

The DUCET is the Unicode Collation Algorithm default sort order. The CLDR root collation is a tailoring of the DUCET. These sort orders have differed in the relative order of groups of characters including extenders, currency symbols, and non-decimal-digit numeric characters.

Starting with CLDR 46 and Unicode 16.0, the order of these groups is the same. In both sort orders, non-decimal-digit numeric characters now sort after decimal digits, and the CLDR root collation no longer tailors any currency symbols (making some of them sort like letter sequences, as in the DUCET).

These changes eliminate sort order differences among almost all regular characters between the CLDR root collation and the DUCET. See the CLDR root collation documentation for details.
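
For readers unfamiliar with the distinction between decimal digits and other numeric characters, the following Python sketch classifies a few characters by Unicode General Category. Using the categories Nd, Nl, and No is an approximation chosen for illustration; the collation data defines its own groupings, which may differ in edge cases.

```python
import unicodedata


def numeric_group(ch):
    """Roughly classify a character as a decimal digit, other numeric, or neither."""
    category = unicodedata.category(ch)
    if category == "Nd":
        return "decimal digit"
    if category in ("Nl", "No"):
        return "other numeric"
    return "not numeric"


for ch in ["7", "٣", "½", "Ⅷ", "A"]:
    print(ch, numeric_group(ch))
```

Characters such as ½ and Ⅷ fall into the second group, which is the kind of character that now sorts after the decimal digits in both sort orders.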

Improved Han Radical-Stroke Order

CLDR includes data for sorting Han (CJK) characters in radical-stroke order. It used to distinguish traditional and simplified forms of radicals on a higher level than sorting by the number of residual strokes. Starting with CLDR 46, the CLDR radical-stroke order matches that of the Unicode Radical-Stroke Index (large PDF). Its sorting algorithm is defined in UAX #38. Traditional vs. simplified forms of radicals are distinguished on a lower level than the number of residual strokes. This also has an effect on alphabetic indexes for radical-stroke sort orders, where only the traditional forms of radicals are now available as index characters.
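
The change in levels can be illustrated with two sort keys over a handful of characters. This Python sketch is a simplification: the radical and residual-stroke values are illustrative and assumed rather than read from the Unihan data, and the further tie-breakers defined in UAX #38 are omitted.

```python
# (character, radical number, simplified-radical form?, residual strokes)
# Values are assumed for illustration, not loaded from Unihan.
SAMPLE = [
    ("語", 149, False, 7),
    ("计", 149, True, 2),
    ("语", 149, True, 7),
    ("計", 149, False, 2),
]


def old_key(entry):
    """Before CLDR 46: radical form is compared before residual strokes."""
    _, radical, simplified, residual = entry
    return (radical, simplified, residual)


def new_key(entry):
    """From CLDR 46: residual strokes are compared before radical form."""
    _, radical, simplified, residual = entry
    return (radical, residual, simplified)


old_order = [entry[0] for entry in sorted(SAMPLE, key=old_key)]
new_order = [entry[0] for entry in sorted(SAMPLE, key=new_key)]
print(old_order)
print(new_order)
```

Under the old key the traditional-radical characters are grouped ahead of the simplified-radical ones; under the new key characters with the same residual stroke count sit next to each other, with the traditional form first.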

JSON Data Changes

  1. Separate modern packages were dropped [CLDR-16465]
  2. Adding transliteration rules [CLDR-16720] (In progress)

Markdown

The CLDR site is in the process of being moved to Markdown source (GFM), which will regularize the formatting and make the site easier to maintain and extend than with Google Sites. The URLs will remain the same. This process should be completed before release.

File Changes

Most files added in this release were for new locales. There were the following new test files:

TBD*

Tooling Changes

TBD

Migration

  1. Databases that use collation keys are sensitive to any changes in collation, and will need reindexing. This can happen with any CLDR release (especially those for a new version of Unicode), but more characters are affected in this release: see above.

TBD

Known Issues

  1. CLDR-17095. The region-based firstDay value (see weekData) is currently used for several different purposes. In the future, some of these functions will be separated out:
    • The day that should be shown as the first day of the week in a calendar view.
    • The first day of the week (day 1) for weekday numbering.
    • The first day of the week for week-of-year calendar calculations.
  2. CLDR-17505. Blocking items are obsolete: the spec needs to be corrected to use @ORDERED

Acknowledgments

Many people have made significant contributions to CLDR and LDML; see the Acknowledgments page for a full listing. We’d also like to acknowledge the work done by interns this release: TBD

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.