String Length

01/05/2024

       


1.Meaning of the "String Length"

String.Length and similar members count correctly only characters in Unicode Plane 0 (the Basic Multilingual Plane, BMP), because they count UTF-16 code units rather than characters.

Almost all string-counting functions in .NET rely on String.Length.

To count characters in Unicode Plane 1 and beyond, or to count the apparent (visible) number of characters, count the Runes or use the features of the StringInfo class.
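As a minimal sketch of the three ways of counting mentioned above, the helper below wraps each one in a method. The class name `TextCounter` and its method names are hypothetical and chosen only for illustration; `String.Length`, `String.EnumerateRunes`, and `StringInfo.LengthInTextElements` are the .NET members being compared. The sketch assumes .NET 5 or later, where StringInfo treats an emoji sequence as a single text element.

```csharp
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; the names are not part of .NET.
public static class TextCounter
{
    // Number of UTF-16 code units, which is what String.Length reports.
    public static int CountCodeUnits(string text) => text.Length;

    // Number of Unicode scalar values (Runes).
    public static int CountRunes(string text) => text.EnumerateRunes().Count();

    // Number of text elements (the apparent characters) according to StringInfo.
    public static int CountTextElements(string text) =>
        new StringInfo(text).LengthInTextElements;
}
```

For example, `TextCounter.CountCodeUnits("🗿")` returns 2, while `CountRunes` and `CountTextElements` both return 1 for the same string.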

Characters used in everyday life, including those used in Japan and China, are included in Unicode Plane 0 (BMP). Everyday text normally does not use characters from Plane 1 or later, or combining characters.

Many emojis, such as 🗿, are in Unicode Plane 1, and I feel that more and more people are using them these days. To count an emoji of this kind as one character, you must not use String.Length.
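To see why String.Length reports 2 for such an emoji, the following sketch inspects the first Rune of a string and reports its code point, its plane, and how many UTF-16 code units it occupies. The class `SurrogateDemo` and its method `Describe` are hypothetical names for this illustration; `Rune.GetRuneAt`, `Rune.Plane`, `Rune.Utf16SequenceLength`, and `Rune.IsBmp` are members of `System.Text.Rune`.

```csharp
using System.Text;

// Hypothetical helper for illustration; the names are not part of .NET.
public static class SurrogateDemo
{
    // Describes how the first character of the string is stored in UTF-16.
    // Assumes the string is not empty and starts with a well-formed character.
    public static string Describe(string text)
    {
        Rune rune = Rune.GetRuneAt(text, 0);
        return $"U+{rune.Value:X} Plane:{rune.Plane} " +
               $"Utf16Length:{rune.Utf16SequenceLength} IsBmp:{rune.IsBmp}";
    }
}
```

`SurrogateDemo.Describe("🗿")` returns `U+1F5FF Plane:1 Utf16Length:2 IsBmp:False`, whereas `SurrogateDemo.Describe("A")` returns `U+41 Plane:0 Utf16Length:1 IsBmp:True`. A character outside Plane 0 is stored as a surrogate pair, so it contributes two to String.Length.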

In Japan and China, in rare cases, proper nouns such as place names and personal names may use characters from Unicode Plane 1 and beyond.

In addition, many Japanese ideographs (kanji) have variants with slightly different shapes, such as the position of a dot, and these differences can be expressed using combining characters. However, this usage is hardly widespread as of 2024.
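The sketch below shows how such a variant affects the three counts. It assumes the variant is written as a base kanji followed by a variation selector (an Ideographic Variation Sequence), which is my reading of "combining characters" here rather than something the text states. The class `VariantDemo`, its members, and the choice of U+845B (葛) with U+E0100 are illustrative assumptions.

```csharp
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; the names are not part of .NET.
public static class VariantDemo
{
    // U+845B followed by the variation selector U+E0100, picked only as an example.
    public const string KanjiWithSelector = "\u845B\U000E0100";

    // Returns String.Length, the Rune count, and the text element count.
    public static (int Length, int Runes, int TextElements) Count(string text) =>
        (text.Length,
         text.EnumerateRunes().Count(),
         new StringInfo(text).LengthInTextElements);
}
```

On .NET 5 or later, `VariantDemo.Count(VariantDemo.KanjiWithSelector)` returns `(3, 2, 1)`: the kanji takes one code unit, the selector is outside Plane 0 and takes two, and StringInfo attaches the selector to the kanji as a single text element.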

Ancient scripts, such as ancient Egyptian hieroglyphs, and characters used for special purposes may also be included in Unicode Plane 1 and beyond.

Here are the characters used in the image above so that you can test them.

A1&あ각★☃ΩⒽ山竜你𩸽𓀉𫛣🗿🗻🗼

あ゙あ゚あ̰👩‍👨‍👦‍👧🏴‍☠⚔️

 

 

2.Sample Program

I wrote a sample program, a Console App (.NET 8), that counts characters in three ways.

Depending on your environment, the characters may not be displayed properly, because the font you use may not contain them or the software you use may not be fully compatible with the Unicode specification.

C#

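// These usings are needed only when implicit usings are disabled in the project.
using System.Collections.Generic;
using System.Linq;

namespace CsConsole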
{
    internal class Program
    {
        static void Main(string[] args)
        {
            PrintLength("A");
            PrintLength("あ");
            PrintLength("☃");
            PrintLength("山");
            PrintLength("𩸽");
            PrintLength("𓀉");
            PrintLength("🗿");
            PrintLength("あ゙");
            PrintLength("👩‍👨‍👦‍👧");
            PrintLength("⚔️");
        }

        private static void PrintLength(string text)
        {
            int len1 = text.Length; //String.Length
            int len2 = text.EnumerateRunes().Count(); //Count of Runes
            int len3 = EnumerateTextElements(text).Count(); //Count of TextElements
            System.Diagnostics.Debug.WriteLine($"{text} Length:{len1} Runes:{len2} Text Elements:{len3}");
        }

        private static IEnumerable<string> EnumerateTextElements(string text)
        {
            var enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(text);
            while (enumerator.MoveNext())
            {
                yield return (string)enumerator.Current;
            }
        }
    }
}

Where Debug.WriteLine outputs to

 

VB

Module Program
    Sub Main(args As String())
        PrintLength("A")
        PrintLength("あ")
        PrintLength("☃")
        PrintLength("山")
        PrintLength("𩸽")
        PrintLength("𓀉")
        PrintLength("🗿")
        PrintLength("あ゙")
        PrintLength("👩‍👨‍👦‍👧")
        PrintLength("⚔️")
    End Sub

    Private Sub PrintLength(text As String)

        Dim len1 As Integer = text.Length 'String.Length
        Dim len2 As Integer = text.EnumerateRunes().Count() 'Number of Runes
        Dim len3 As Integer = EnumerateTextElements(text).Count() 'Number of TextElements
        Debug.WriteLine($"{text} Length:{len1} Runes:{len2} Text Elements:{len3}")

    End Sub

    Private Iterator Function EnumerateTextElements(text As String) As IEnumerable(Of String)
        Dim enumerator = Globalization.StringInfo.GetTextElementEnumerator(text)
        While enumerator.MoveNext()
            Yield CStr(enumerator.Current)
        End While
    End Function
End Module


 

 

When you run the program, you will see the following result in the Output window.

A Length:1 Runes:1 Text Elements:1
あ Length:1 Runes:1 Text Elements:1
☃ Length:1 Runes:1 Text Elements:1
山 Length:1 Runes:1 Text Elements:1
𩸽 Length:2 Runes:1 Text Elements:1
𓀉 Length:2 Runes:1 Text Elements:1
🗿 Length:2 Runes:1 Text Elements:1
あ゙ Length:2 Runes:2 Text Elements:1
👩‍👨‍👦‍👧 Length:11 Runes:7 Text Elements:1
⚔️ Length:2 Runes:2 Text Elements:1

 

Reference

Example: count char, Rune, and text element instances

https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding-introduction#example-count-char-rune-and-text-element-instances

 

The world of Unicode ~Emoji Combinations~ (this article is in Japanese)

https://qiita.com/noritsune/items/46134cb7a50236540be5

 

 

