Pitfalls of CharCast

CharCast is a template function defined in StringConv.h:

/**
 * Casts one fixed-width char type into another.
 *
 * @param Ch The character to convert.
 * @return The converted character.
 */
template <typename To, typename From>
FORCEINLINE To CharCast(From Ch)
{
    To Result;
    FPlatformString::Convert(&Result, 1, &Ch, 1, (To)UNICODE_BOGUS_CHAR_CODEPOINT);
    return Result;
}

It simply forwards the call to FPlatformString::Convert.

Note: the UNICODE_BOGUS_CHAR_CODEPOINT macro is defined as '?'.

FPlatformString::Convert has two overloads that matter here:

/**
 * Converts the [Src, Src+SrcSize) string range from SourceChar to DestChar and writes it to the [Dest, Dest+DestSize) range.
 * The Src range should contain a null terminator if a null terminator is required in the output.
 * If the Dest range is not big enough to hold the converted output, NULL is returned. In this case, nothing should be assumed about the contents of Dest.
 *
 * @param Dest The start of the destination buffer.
 * @param DestSize The size of the destination buffer.
 * @param Src The start of the string to convert.
 * @param SrcSize The number of Src elements to convert.
 * @param BogusChar The char to use when the conversion process encounters a character it cannot convert.
 * @return A pointer to one past the last-written element.
 */
template <typename SourceEncoding, typename DestEncoding>
static FORCEINLINE typename TEnableIf<
    // This overload should be called when SourceEncoding and DestEncoding are 'compatible', i.e. they're the same type or equivalent (e.g. like UCS2CHAR and WIDECHAR are on Windows).
    TAreEncodingsCompatible<SourceEncoding, DestEncoding>::Value,
    DestEncoding*
>::Type Convert(DestEncoding* Dest, int32 DestSize, const SourceEncoding* Src, int32 SrcSize, DestEncoding BogusChar = (DestEncoding)'?')
{
    if (DestSize < SrcSize)
        return nullptr;

    return (DestEncoding*)Memcpy(Dest, Src, SrcSize * sizeof(SourceEncoding)) + SrcSize;
}


template <typename SourceEncoding, typename DestEncoding>
static typename TEnableIf<
    // This overload should be called when the types are not compatible but the source is fixed-width, e.g. ANSICHAR->WIDECHAR.
    !TAreEncodingsCompatible<SourceEncoding, DestEncoding>::Value && TIsFixedWidthEncoding<SourceEncoding>::Value,
    DestEncoding*
>::Type Convert(DestEncoding* Dest, int32 DestSize, const SourceEncoding* Src, int32 SrcSize, DestEncoding BogusChar = (DestEncoding)'?')
{
    const int32 Size = DestSize <= SrcSize ? DestSize : SrcSize;
    bool bInvalidChars = false;
    for (int I = 0; I < Size; ++I)
    {
        SourceEncoding SrcCh = Src[I];
        Dest[I] = (DestEncoding)SrcCh;
        bInvalidChars |= !CanConvertChar<DestEncoding>(SrcCh);
    }

    if (bInvalidChars)
    {
        for (int I = 0; I < Size; ++I)
        {
            if (!CanConvertChar<DestEncoding>(Src[I]))
            {
                Dest[I] = BogusChar;
            }
        }

        LogBogusChars<DestEncoding>(Src, Size);
    }

    return DestSize < SrcSize ? nullptr : Dest + Size;
}

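The key one is the second overload. It calls CanConvertChar to test whether each character can be converted, and any character that cannot is replaced with BogusChar, which defaults to '?'. This is why data in a different encoding sometimes shows up as a run of '?' characters after being converted to an FString. CanConvertChar is defined as: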

/**
 * Tests whether a particular character can be converted to the destination encoding.
 *
 * @param Ch The character to test.
 * @return True if Ch can be encoded as a DestEncoding.
 */
template <typename DestEncoding, typename SourceEncoding>
static bool CanConvertChar(SourceEncoding Ch)
{
    return IsValidChar(Ch) && (SourceEncoding)(DestEncoding)Ch == Ch && IsValidChar((DestEncoding)Ch);
}
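
The round-trip comparison at the heart of CanConvertChar can be tried outside the engine. The following is only a minimal sketch in standard C++: CanRoundTrip and CheckRoundTripExamples are hypothetical names, char and char16_t stand in for the engine's narrow and wide character types, and the IsValidChar checks are left out, so it is not equivalent to the engine function.

```cpp
#include <cassert>

// Hypothetical stand-in for CanConvertChar: it keeps only the round-trip
// comparison and deliberately omits the engine's IsValidChar checks.
template <typename DestEncoding, typename SourceEncoding>
bool CanRoundTrip(SourceEncoding Ch)
{
    // Narrow to the destination type, convert back, and compare with the input.
    return static_cast<SourceEncoding>(static_cast<DestEncoding>(Ch)) == Ch;
}

// Small self-check: 'A' fits in a char, U+4E2D does not.
inline void CheckRoundTripExamples()
{
    assert(CanRoundTrip<char>(u'A'));
    assert(!CanRoundTrip<char>(u'\u4E2D'));
}
```

A character such as U+4E2D loses its high byte when narrowed to char, so converting it back gives a different value and the check fails. That failure is the condition under which Convert writes BogusChar.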

So when something like LoadFileToString reads a file whose encoding is not supported, the string it produces differs from the original file contents.
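
To see why the original data cannot be recovered afterwards, here is a simplified sketch of the replacement step in standard C++. NarrowLossy is a hypothetical helper, not an engine function, and it assumes that only 7-bit ASCII code units count as convertible to the narrow type.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplification of the fixed-width Convert overload.
// Assumption: a UTF-16 code unit is convertible only if it is 7-bit ASCII;
// every other code unit is replaced with BogusChar.
inline std::string NarrowLossy(const std::u16string& Src, char BogusChar = '?')
{
    std::string Dest;
    Dest.reserve(Src.size());
    for (char16_t Ch : Src)
    {
        const bool bFits = Ch <= 0x7F;
        Dest.push_back(bFits ? static_cast<char>(Ch) : BogusChar);
    }
    return Dest;
}

// Small self-check: two distinct inputs collapse to the same output.
inline void CheckNarrowLossyExamples()
{
    assert(NarrowLossy(u"UE\u4E2D\u6587") == "UE??");
    assert(NarrowLossy(u"UE\u6587\u4E2D") == "UE??");
}
```

Because different inputs map to the same "UE??" result, the replacement is one-way: once a character has become '?', nothing in the output says which character it used to be.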