Class

TextEncoding


Description

Used to specify the text encoding of a String.

Properties

Name          Type     Read-Only  Shared
Base          Integer  Yes
Code          Integer  Yes
Format        Integer  Yes
InternetName  String   Yes
Variant       Integer  Yes

Methods

Name              Parameters                     Returns  Shared
Chr               codePoint As Integer           String
Equals            otherEncoding As TextEncoding  Boolean
IsValidData       text As String                 Boolean
Operator_Compare  rhs As TextEncoding            Integer

Property descriptions


TextEncoding.Base

Base As Integer

The type of encoding. The entry for TextConverter contains the possible values of Base.

This property is read-only.

This example displays the Base, Format, and Variant values of the encoding of the text in TextArea1.

Var t As TextEncoding
t = Encoding(TextArea1.Text)
If t <> Nil Then
  Label1.Text = "Base=" + t.Base.ToString
  Label2.Text = "Format=" + t.Format.ToString
  Label3.Text = "Variant=" + t.Variant.ToString
End If

TextEncoding.Code

Code As Integer

The Mac OS TextEncoding value, useful for Declares.

This property is read-only.

You can also use it to compare two encodings: if the Code properties of two TextEncoding objects are equal, they represent the same text encoding (including base, variant, and format).

This code gets the value of Code for the text in TextArea1.

Var t As TextEncoding
t = Encoding(TextArea1.Text)
If t <> Nil Then
  Label4.Text = t.Code.ToString
End If

TextEncoding.Format

Format As Integer

A format for the Base text encoding.

This property is read-only.

Unicode encodings use Format to distinguish the different forms of Unicode, such as UTF-8 and UTF-16.

This example displays the Base, Format, and Variant values of the encoding of the text in TextArea1.

Var t As TextEncoding
t = Encoding(TextArea1.Text)
If t <> Nil Then
  Label1.Text = "Base=" + t.Base.ToString
  Label2.Text = "Format=" + t.Format.ToString
  Label3.Text = "Variant=" + t.Variant.ToString
End If
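As a sketch of how Format separates two Unicode encodings, the following compares Encodings.UTF8 and Encodings.UTF16. It assumes the two report different Format values (and therefore different Code values); System.DebugLog, which is not covered on this page, is used only to inspect the numbers.

```xojo
' Two Unicode encodings from the Encodings module.
Var utf8 As TextEncoding = Encodings.UTF8
Var utf16 As TextEncoding = Encodings.UTF16

' Log Base, Format, and Variant for each so they can be inspected while debugging.
System.DebugLog("UTF8: Base=" + utf8.Base.ToString + " Format=" + utf8.Format.ToString + " Variant=" + utf8.Variant.ToString)
System.DebugLog("UTF16: Base=" + utf16.Base.ToString + " Format=" + utf16.Format.ToString + " Variant=" + utf16.Variant.ToString)

' Assumption: the two encodings differ in Format, so their Code values differ too.
Var formatsDiffer As Boolean = (utf8.Format <> utf16.Format)
Var codesDiffer As Boolean = (utf8.Code <> utf16.Code)
```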

TextEncoding.InternetName

InternetName As String

The encoding's Internet name, such as UTF-8.

This property is read-only.

This example displays the Internet name of the encoding of the text in TextArea1.

Var t As TextEncoding
t = Encoding(TextArea1.Text)
If t <> Nil Then
  Var s As String
  s = t.InternetName
  MessageBox(s)
End If
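InternetName is often paired with the reverse lookup: when a server reports a charset name, you can turn it back into a TextEncoding. This sketch assumes the global GetInternetTextEncoding function, which is not described on this page, and that it returns Nil for a name it does not recognize; the charset value is a hypothetical header value.

```xojo
' Hypothetical charset name, as it might appear in an HTTP Content-Type header.
Var charsetName As String = "UTF-8"

' Look up the TextEncoding that matches the Internet name.
Var enc As TextEncoding = GetInternetTextEncoding(charsetName)

' Guard against an unrecognized name before using the result.
Var isUTF8 As Boolean
If enc Is Nil Then
  isUTF8 = False
Else
  isUTF8 = enc.Equals(Encodings.UTF8)
End If
```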

TextEncoding.Variant

Variant As Integer

A variant of the Base text encoding.

This property is read-only.

The entry for TextConverter contains the possible values of Variant.

This example displays the Base, Format, and Variant values of the encoding of the text in TextArea1.

Var t As TextEncoding
t = Encoding(TextArea1.Text)
If t <> Nil Then
  Label1.Text = "Base=" + t.Base.ToString
  Label2.Text = "Format=" + t.Format.ToString
  Label3.Text = "Variant=" + t.Variant.ToString
End If

Method descriptions


TextEncoding.Chr

Chr(codePoint As Integer) As String

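Returns the character that codePoint represents in this encoding. For Unicode encodings such as UTF8, codePoint is the Unicode code point; for other encodings, it is the character's value in that encoding's own character set.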
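For a Unicode encoding, Chr takes a Unicode code point, which makes it a convenient way to build characters that are hard to type. This sketch assumes the String Length and Asc methods (not covered on this page) to check that the result is one character whose code point round-trips.

```xojo
' &h20AC is the Unicode code point of the euro sign.
Var euro As String = Encodings.UTF8.Chr(&h20AC)

' The result is a single character.
Var charCount As Integer = euro.Length

' Asc goes the other way, returning the code point of the first character.
Var codePoint As Integer = euro.Asc
```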


TextEncoding.Equals

Equals(otherEncoding As TextEncoding) As Boolean

Returns True if this encoding is the same as otherEncoding, and False otherwise.

This example compares the encodings of the contents of two TextAreas.

Var t1 As TextEncoding
Var t2 As TextEncoding
t1 = Encoding(TextArea1.Text)
t2 = Encoding(TextArea3.Text)
Label1.Text = t1.InternetName
Label2.Text = t2.InternetName
If t1.Equals(t2) Then
  TextField1.Text = "Equals"
Else
  TextField1.Text = "Not equals"
End If
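The example above assumes both strings have a known encoding. Encoding returns Nil for a string whose encoding is unknown, and calling Equals on Nil fails, so a more defensive version checks first. This sketch uses ChrB (not covered on this page), assumed to build a string with a Nil encoding from a byte value.

```xojo
Var a As String = "hi"                   ' string literal: UTF-8
Var b As String = ChrB(104) + ChrB(105)  ' same bytes, but no encoding

Var e1 As TextEncoding = Encoding(a)
Var e2 As TextEncoding = Encoding(b)

' Only call Equals when both encodings are known.
Var same As Boolean
If e1 Is Nil Or e2 Is Nil Then
  same = (e1 Is Nil And e2 Is Nil)
Else
  same = e1.Equals(e2)
End If
```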

TextEncoding.IsValidData

IsValidData(text As String) As Boolean

Returns True if the bytes of text form valid data in this encoding.

If Encodings.UTF8.IsValidData(myText) Then
  ' do whatever you want here...
End If
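A common use of IsValidData is deciding how to label bytes from an outside source. The sketch below assumes, for illustration only, that any data that is not valid UTF-8 is Windows Latin-1; ChrB (not covered on this page) is used to build the raw bytes of "café" as Latin-1 stores them.

```xojo
' "café" as Windows Latin-1 bytes; ChrB builds a string with a Nil encoding.
Var raw As String = ChrB(&h63) + ChrB(&h61) + ChrB(&h66) + ChrB(&hE9)

Var decoded As String
If Encodings.UTF8.IsValidData(raw) Then
  decoded = raw.DefineEncoding(Encodings.UTF8)
Else
  ' &hE9 on its own is not a complete UTF-8 sequence, so this branch runs.
  ' Assumption: anything that is not UTF-8 is Windows Latin-1.
  decoded = raw.DefineEncoding(Encodings.WindowsLatin1)
End If
```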

TextEncoding.Operator_Compare

Operator_Compare(rhs As TextEncoding) As Integer

Lets you compare two encodings with the = and <> operators. Only equality is implemented; relational comparisons (such as < and >) are undefined. Internally, it calls the Equals method.

This example compares the encodings of the contents of two TextAreas using Operator_Compare.

Var t1 As TextEncoding
Var t2 As TextEncoding
t1 = Encoding(TextArea1.Text)
t2 = Encoding(TextArea3.Text)
Label1.Text = t1.InternetName
Label2.Text = t2.InternetName
If t1 = t2 Then
  TextField1.Text = "Equals"
Else
  TextField1.Text = "Not equals"
End If
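Because Operator_Compare is defined, <> works as well as =. This minimal sketch relies on string literals being UTF-8.

```xojo
' String literals in a project are UTF-8.
Var s As String = "Résumé"
Var enc As TextEncoding = Encoding(s)

' Both comparisons go through Operator_Compare.
Var isUTF8 As Boolean = (enc = Encodings.UTF8)
Var notLatin1 As Boolean = (enc <> Encodings.WindowsLatin1)
```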

Notes

When a computer stores text, it encodes each character as a numeric value and stores the byte (or bytes) associated with that number. When it needs to display or print that character, it consults the encoding scheme to determine which character the number represents.

Many early computers used an encoding scheme called ASCII (American Standard Code for Information Interchange). ASCII defines 128 values, covering upper- and lowercase letters, digits, the common keyboard symbols, and some "invisible" control codes that were heavily used in early computers.

As computers became more sophisticated and spread to non-English-speaking countries, the limitations of ASCII became apparent. It had no codes for accented characters and could not handle ideographic languages such as Japanese or Chinese, which require thousands of characters.

As a result, extended encodings built on ASCII were developed. They generally agree on codes 0-127 but not beyond. For example, in the US, macOS and Windows traditionally used different encodings (MacRoman and Windows Latin-1, respectively) for codes 128-255. Many other encoding schemes were also developed for languages that use non-ASCII characters.
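The disagreement above the ASCII range can be seen by asking two legacy encodings for the same value. This sketch converts each result to UTF-8 and reads its Unicode code point with the String Asc method (not covered on this page); the expected results are U+2022 (bullet) for MacRoman and U+00A5 (yen sign) for Windows Latin-1.

```xojo
' The same value, 165 (&hA5), interpreted by two legacy 8-bit encodings.
Var macChar As String = Encodings.MacRoman.Chr(165)       ' bullet
Var winChar As String = Encodings.WindowsLatin1.Chr(165)  ' yen sign

' Convert both to UTF-8 and read back their Unicode code points.
Var macCodePoint As Integer = macChar.ConvertEncoding(Encodings.UTF8).Asc
Var winCodePoint As Integer = winChar.ConvertEncoding(Encodings.UTF8).Asc
```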

The most general solution to the problem is an encoding called Unicode. It is designed to handle every character in every language. It also enables you to represent a mixture of languages within one text stream. However, not all strings that you may encounter use Unicode.

When you encounter a string, you need to know its encoding to interpret the bytes that make up its content. Every string carries both its bytes and its encoding; the encoding is Nil if it is not known. Several Unicode formats are supported, including UTF-8 and UTF-16. All string literals in your project are compiled as UTF-8, a Unicode encoding that uses one byte for each ASCII character and two to four bytes for each non-ASCII character.
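To see the variable length of UTF-8, the sketch below builds one character of each length from its code point and checks the byte count with the String Bytes method (not covered on this page).

```xojo
' One character of each UTF-8 length, built from Unicode code points.
Var letterA As String = Encodings.UTF8.Chr(&h41)    ' A (ASCII)
Var eAcute As String = Encodings.UTF8.Chr(&hE9)     ' é
Var euro As String = Encodings.UTF8.Chr(&h20AC)     ' euro sign
Var emoji As String = Encodings.UTF8.Chr(&h1F600)   ' grinning face emoji

' Bytes reports storage size, not character count.
Var sizes() As Integer = Array(letterA.Bytes, eAcute.Bytes, euro.Bytes, emoji.Bytes)
```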

If you work only with strings that are created and managed within your own application, you probably don't need to deal with encodings directly, because everything uses UTF-8. However, if you receive strings from an outside source, such as the Internet, an external database (that is, not SQLite), or a text file, you need to specify which encoding they use. A string created from a MemoryBlock has a Nil encoding.
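Here is a minimal sketch of that situation: bytes arrive in a MemoryBlock, the resulting string has no encoding, and DefineEncoding supplies one. It assumes MemoryBlock's Byte and StringValue members, which are not described on this page, and that the bytes are already known to be UTF-8.

```xojo
' Two raw bytes, &hC3 &hA9, which spell "é" in UTF-8.
Var mb As New MemoryBlock(2)
mb.Byte(0) = &hC3
mb.Byte(1) = &hA9

' StringValue copies the bytes into a String; without an encoding argument it has a Nil encoding.
Var raw As String = mb.StringValue(0, mb.Size)

' DefineEncoding labels the existing bytes as UTF-8 without changing them.
Var s As String = raw.DefineEncoding(Encodings.UTF8)
```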

You can assign an encoding to a string in several ways. For example, if you are reading the string using the TextInputStream class, you use the Encoding property. The Encodings module gives you access to all known encodings. Here is an example that reads a text file that uses the UTF8 encoding:

Var f As FolderItem
Var t As TextInputStream
f = FolderItem.ShowOpenFileDialog("text") ' "text" must be defined as a File Type in the project
If f <> Nil Then
  t = TextInputStream.Open(f)
  t.Encoding = Encodings.UTF8 ' specify encoding of input stream
  TextArea1.Text = t.ReadAll
  t.Close
End If

Also, the Read, ReadLine, and ReadAll methods take an optional parameter that lets you specify the encoding.
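The following round trip sketches the optional encoding parameter. It writes a hypothetical scratch file to SpecialFolder.Temporary with TextOutputStream (neither is covered on this page), then reads it back by passing the encoding to ReadAll instead of setting the Encoding property.

```xojo
' Hypothetical scratch file in the system's temporary folder.
Var f As FolderItem = SpecialFolder.Temporary.Child("encoding-demo.txt")

' Write "café" as Windows Latin-1, one byte per character.
Var out As TextOutputStream = TextOutputStream.Create(f)
out.Encoding = Encodings.WindowsLatin1
out.Write("caf" + Encodings.UTF8.Chr(&hE9))
out.Close

' Read it back, passing the encoding directly to ReadAll.
Var inp As TextInputStream = TextInputStream.Open(f)
Var contents As String = inp.ReadAll(Encodings.WindowsLatin1)
inp.Close

' Record the file size, then clean up the scratch file.
Var fileSize As Int64 = f.Length
f.Remove
```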

If you need to output a string in a specific encoding, you can use the ConvertEncoding function to do so. For example, this code converts the text in a DesktopTextField to the WindowsANSI encoding:

Var s As String
s = TextField1.Text.ConvertEncoding(Encodings.WindowsANSI)
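Unlike DefineEncoding, which only relabels bytes, ConvertEncoding rewrites them. This sketch shows the byte count (via the String Bytes method, not covered on this page) changing when "café" moves from UTF-8 to Windows Latin-1 and back; the é is built with Chr so the source file's own normalization does not matter.

```xojo
' Build "café" explicitly; string literals and UTF8.Chr results are UTF-8.
Var original As String = "caf" + Encodings.UTF8.Chr(&hE9)   ' 5 bytes

' ConvertEncoding rewrites the bytes in the target encoding.
Var latin1 As String = original.ConvertEncoding(Encodings.WindowsLatin1)   ' 4 bytes

' Converting back to UTF-8 restores the original bytes.
Var roundTrip As String = latin1.ConvertEncoding(Encodings.UTF8)
```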

You will find text encoding helpful if you develop:

  • Internet applications, such as web browsers or e-mail applications

  • Applications that transfer text across different platforms

  • Applications based in Unicode

The Encoding function makes it easy to obtain the TextEncoding of any string. Use the Encodings module to obtain a specified text encoding. Some of the most useful are UTF8, UTF16, UTF32, ASCII, MacRoman, MacJapanese, and WindowsLatin1. Use the Autocomplete feature of the Code Editor to view the complete list.


ASCII codes

The following table presents the ASCII character codes. It presents the Decimal, Hex, and Octal values for ASCII codes (0 to 127).

Decimal  Hex  Octal  Character  Decimal  Hex  Octal  Character
0        0    0      NUL        32       20   40     SP
1        1    1      SOH        33       21   41     !
2        2    2      STX        34       22   42     "
3        3    3      ETX        35       23   43     #
4        4    4      EOT        36       24   44     $
5        5    5      ENQ        37       25   45     %
6        6    6      ACK        38       26   46     &
7        7    7      BEL        39       27   47     '
8        8    10     BS         40       28   50     (
9        9    11     HT         41       29   51     )
10       A    12     LF         42       2A   52     *
11       B    13     VT         43       2B   53     +
12       C    14     FF         44       2C   54     ,
13       D    15     CR         45       2D   55     -
14       E    16     SO         46       2E   56     .
15       F    17     SI         47       2F   57     /
16       10   20     DLE        48       30   60     0
17       11   21     DC1        49       31   61     1
18       12   22     DC2        50       32   62     2
19       13   23     DC3        51       33   63     3
20       14   24     DC4        52       34   64     4
21       15   25     NAK        53       35   65     5
22       16   26     SYN        54       36   66     6
23       17   27     ETB        55       37   67     7
24       18   30     CAN        56       38   70     8
25       19   31     EM         57       39   71     9
26       1A   32     SUB        58       3A   72     :
27       1B   33     ESC        59       3B   73     ;
28       1C   34     FS         60       3C   74     <
29       1D   35     GS         61       3D   75     =
30       1E   36     RS         62       3E   76     >
31       1F   37     US         63       3F   77     ?
64       40   100    @          96       60   140    `
65       41   101    A          97       61   141    a
66       42   102    B          98       62   142    b
67       43   103    C          99       63   143    c
68       44   104    D          100      64   144    d
69       45   105    E          101      65   145    e
70       46   106    F          102      66   146    f
71       47   107    G          103      67   147    g
72       48   110    H          104      68   150    h
73       49   111    I          105      69   151    i
74       4A   112    J          106      6A   152    j
75       4B   113    K          107      6B   153    k
76       4C   114    L          108      6C   154    l
77       4D   115    M          109      6D   155    m
78       4E   116    N          110      6E   156    n
79       4F   117    O          111      6F   157    o
80       50   120    P          112      70   160    p
81       51   121    Q          113      71   161    q
82       52   122    R          114      72   162    r
83       53   123    S          115      73   163    s
84       54   124    T          116      74   164    t
85       55   125    U          117      75   165    u
86       56   126    V          118      76   166    v
87       57   127    W          119      77   167    w
88       58   130    X          120      78   170    x
89       59   131    Y          121      79   171    y
90       5A   132    Z          122      7A   172    z
91       5B   133    [          123      7B   173    {
92       5C   134    \          124      7C   174    |
93       5D   135    ]          125      7D   175    }
94       5E   136    ^          126      7E   176    ~
95       5F   137    _          127      7F   177    DEL
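The printable part of the table can be regenerated in code. This sketch assumes the global Hex and Oct functions and the array Add method and LastIndex, none of which are described on this page, and builds one tab-separated row per code from 32 (space) to 126 (~).

```xojo
' One row per printable ASCII code: decimal, hex, octal, character.
Var rows() As String
For n As Integer = 32 To 126
  Var ch As String = Encodings.ASCII.Chr(n)
  rows.Add(n.ToString + Chr(9) + Hex(n) + Chr(9) + Oct(n) + Chr(9) + ch)
Next
```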

Sample code

The following example obtains the TextEncoding of the string passed to the Encoding function.

Var t As TextEncoding
t = Encoding(TextArea1.Text)
If t <> Nil Then
  Label1.Text = "Base=" + t.Base.ToString
  Label2.Text = "Format=" + t.Format.ToString
  Label3.Text = "Variant=" + t.Variant.ToString
End If

The following statement uses DefineEncoding with the Encodings module to mark the text of a DesktopTextField as UTF-8.

TextField2.Text = TextField1.Text.DefineEncoding(Encodings.UTF8)

The following example uses the Chr method to obtain the character corresponding to the code point of 165 for the MacRoman encoding, the bullet character (•):

Var s As String
s = Encodings.MacRoman.Chr(165)

Compatibility

All project types on all supported operating systems.