Handling Strings in Golang
December 04, 2024
Background
While studying Golang, I ran into some surprises when working with strings. They behave differently from strings in JavaScript.
In particular, the for ... range loop in Golang did not work the way I expected.
So I will walk through some examples in this post.
string in Go
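In Go, a string is a read-only sequence of bytes. There is no char type, and indexing a string gives you a byte. Go source code is UTF-8, so string literals hold UTF-8 encoded text.
On the other hand, there is the rune type. A rune is an alias for int32 and holds a single Unicode code point.
When a code point is encoded in UTF-8, it takes from 1 to 4 bytes.
In my first language, Korean, each Hangul syllable is encoded as 3 bytes.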
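To make these sizes concrete, here is a minimal sketch that compares the byte length of a string with its rune count. The helper name byteAndRuneCount and the sample strings are my own choices for illustration; len and utf8.RuneCountInString come from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount returns the number of bytes and the number of
// code points (runes) in s. The name is hypothetical, for illustration.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// The samples take 1, 2, 3 and 4 bytes respectively, one rune each.
	for _, s := range []string{"a", "ß", "이", "😀"} {
		b, r := byteAndRuneCount(s)
		fmt.Printf("%q: %d byte(s), %d rune(s)\n", s, b, r)
	}
}
```

Note that len on a string always counts bytes, never characters.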
For ... Range
You usually use the for ... range syntax to iterate over all elements of an array or slice.
Of course, it can be applied to a string as well.
k_str := "이 세상에 온 것을 환영해"
for _, c := range k_str {
	fmt.Printf("%c(%d) ", c, c)
}
// result: 이(51060) (32) 세(49464) 상(49345) 에(50640) (32) 온(50728) (32) 것(44163) 을(51012) (32) 환(54872) 영(50689) 해(54644)
Note that range decodes the string as UTF-8, so c in each iteration is a rune (a Unicode code point), not a byte.
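A quick way to check this is to print the type and code point of each loop value. This is only a small sketch, and codePoints is a hypothetical helper name. Since rune is an alias for int32, the %T verb reports int32.

```go
package main

import "fmt"

// codePoints ranges over s and records the Go type and the Unicode
// code point of every value the loop yields.
// (Hypothetical helper, for illustration only.)
func codePoints(s string) []string {
	var out []string
	for _, c := range s {
		out = append(out, fmt.Sprintf("%T %U", c, c))
	}
	return out
}

func main() {
	for _, line := range codePoints("이a") {
		fmt.Println(line)
	}
}
```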
Next, I will print the index instead of the code point.
k_str := "이 세상에 온 것을 환영해"
for i, c := range k_str {
	fmt.Printf("%c(%d) ", c, i)
}
// result: 이(0) (3) 세(4) 상(7) 에(10) (13) 온(14) (17) 것(18) 을(21) (24) 환(25) 영(28) 해(31)
You can see that the index is not the position of the character counted in characters. It is the byte offset where the character starts.
This comes from UTF-8 encoding.
Each Korean syllable occupies 3 bytes and a space occupies 1 byte, so the indices jump by 3 or by 1.
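To see where these offsets come from, it can help to dump the raw bytes. The sketch below relies on the "% x" verb of fmt, which prints each byte of a string in hex with spaces in between. hexBytes is a hypothetical helper name.

```go
package main

import "fmt"

// hexBytes returns the bytes of s in hex, separated by spaces.
// (Hypothetical helper, for illustration only.)
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	// 이 takes three bytes (ec 9d b4) and the space takes one (20),
	// so 세 starts at byte offset 4.
	fmt.Println(hexBytes("이 세"))
}
```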
Indexing a string
fmt.Printf("%c", k_str[0])
// result: ì
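So if you just index into a string like above, you get a single byte, not a character. Here k_str[0] is 0xEC, the first of the three bytes of 이, and %c prints that byte as ì (U+00EC).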
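If you only need the first character, one option (not from the original examples) is utf8.DecodeRuneInString from the standard library, which decodes one rune and reports how many bytes it used. The helper firstChar below is a hypothetical name, and this is a sketch rather than the only way to do it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstChar returns the first character of s as a string, together
// with its width in bytes. For an empty string it returns "" and 0.
// (Hypothetical helper, for illustration only.)
func firstChar(s string) (string, int) {
	_, size := utf8.DecodeRuneInString(s)
	return s[:size], size
}

func main() {
	c, size := firstChar("이 세상에 온 것을 환영해")
	fmt.Printf("%s (%d bytes)\n", c, size) // 이 (3 bytes)
}
```

Slicing with s[:size] is safe here because size is always a valid byte boundary reported by the decoder.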
Solution
k_str_in_rune := []rune(k_str)
for i, c := range k_str_in_rune {
	fmt.Printf("%c(%d) ", c, i)
}
// result: 이(0) (1) 세(2) 상(3) 에(4) (5) 온(6) (7) 것(8) 을(9) (10) 환(11) 영(12) 해(13)
You can convert a string into a slice of runes.
In this case, the indices match the position of each character in the string.
fmt.Printf("%c", k_str_in_rune[0])
// result: 이
Indexing a slice of runes gives the expected character.
It also makes it easier to compare two strings character by character.
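As a rough sketch of such a comparison, the function below finds the first character position where two strings differ. The name firstDiff and its return convention are assumptions made for this example.

```go
package main

import "fmt"

// firstDiff returns the character index (not the byte offset) of the
// first position where a and b differ, or -1 if they are equal.
// If one string is a prefix of the other, it returns the length of
// the shorter one. (Hypothetical helper, for illustration only.)
func firstDiff(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	n := len(ra)
	if len(rb) < n {
		n = len(rb)
	}
	for i := 0; i < n; i++ {
		if ra[i] != rb[i] {
			return i
		}
	}
	if len(ra) != len(rb) {
		return n
	}
	return -1
}

func main() {
	fmt.Println(firstDiff("환영해", "환영합니다")) // 2
}
```

Converting to []rune copies the string, so for very long strings ranging over both strings in step may be preferable.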