Function Selector and Argument Encoding

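In the Ethereum ecosystem, the ABI (Application Binary Interface) is the standard way to interact with contracts, both from outside the blockchain and from one contract to another. Data is encoded according to its type, as described in this specification.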

Function Selector

How it works

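The first 4 bytes of the Keccak-256 hash of a function's signature specify which function to call. For example, bytes4(keccak256("balanceOf(address)")) == 0x70a08231, so 0x70a08231 is the function selector of balanceOf(address). Ethereum uses the original Keccak-256 padding, so the hash differs from the standardized SHA3-256.

The signature is the function name followed by the parenthesized list of parameter types, separated by single commas with no spaces.
The aliases uint and int must be expanded to uint256 and int256 before hashing. For example, the selector of ownerOf(uint256) is bytes4(keccak256("ownerOf(uint256)")) == 0x6352211e.
A struct parameter is written as the list of its member types enclosed in parentheses, i.e. as a tuple. For example, a struct whose members are a uint256 and an address appears in the signature as (uint256,address).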
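Python's standard library does not provide Keccak-256. `hashlib.sha3_256` implements the standardized SHA3-256, which pads differently and therefore returns different digests. The sketch below is a compact, unoptimized pure-Python Keccak-256 written only for illustration; in practice an audited library would be used. `keccak256` and `selector` are hypothetical helper names. The sketch turns a canonical signature string into its 4-byte selector.

```python
# Minimal pure-Python Keccak-256 (original Keccak padding, as used by Ethereum).
# Illustrative sketch only: slow, not constant-time, not audited.
MASK64 = (1 << 64) - 1


def _rotl(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def _rc_bit(t: int) -> int:
    """Output bit of the LFSR (x^8 + x^6 + x^5 + x^4 + 1) defining the round constants."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


# 24 round constants: bit (2**j - 1) of RC[i] is _rc_bit(j + 7*i).
RC = [sum(_rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7)) for i in range(24)]

# Rotation offsets ROT[x][y] for the rho step (ROT[0][0] stays 0).
ROT = [[0] * 5 for _ in range(5)]
_x, _y = 1, 0
for _t in range(24):
    ROT[_x][_y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def _keccak_f(A):
    """Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes A[x][y]."""
    for rc in RC:
        # theta: mix each column's parity into its neighbours
        C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
        D = [C[(x - 1) % 5] ^ _rotl(C[(x + 1) % 5], 1) for x in range(5)]
        A = [[A[x][y] ^ D[x] for y in range(5)] for x in range(5)]
        # rho + pi: rotate each lane and move it to a new position
        B = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                B[y][(2 * x + 3 * y) % 5] = _rotl(A[x][y], ROT[x][y])
        # chi: non-linear row mixing
        A = [[B[x][y] ^ (~B[(x + 1) % 5][y] & B[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]
        # iota: break symmetry with the round constant
        A[0][0] ^= rc
    return A


def keccak256(data: bytes) -> bytes:
    rate = 136  # bytes absorbed per block for a 256-bit output
    msg = bytearray(data) + b"\x01"  # Keccak domain byte (SHA3-256 would use 0x06)
    msg += b"\x00" * (-len(msg) % rate)
    msg[-1] |= 0x80
    A = [[0] * 5 for _ in range(5)]
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):
            lane = int.from_bytes(msg[off + 8 * i:off + 8 * i + 8], "little")
            A[i % 5][i // 5] ^= lane  # lane index i = x + 5*y
        A = _keccak_f(A)
    return b"".join(A[i][0].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> bytes:
    """First 4 bytes of keccak256 of the canonical signature (no spaces, uint -> uint256)."""
    return keccak256(signature.encode("ascii"))[:4]


print(selector("balanceOf(address)").hex())  # 70a08231
print(selector("ownerOf(uint256)").hex())    # 6352211e
```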

Argument Encoding

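The encoded arguments start at the fifth byte. The same encoding is used elsewhere as well. Return values and event arguments, for example, are encoded the same way, just without the 4-byte function selector.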
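As a sketch of this layout, the code below builds the calldata for `balanceOf(address)` by hand. It starts with the 4-byte selector 0x70a08231 quoted above. The address follows as a single 32-byte word, left-padded with zeros because an address is treated like uint160. `encode_balance_of` and `split_calldata` are hypothetical helper names, and the address is an arbitrary example value.

```python
SELECTOR_BALANCE_OF = bytes.fromhex("70a08231")  # bytes4(keccak256("balanceOf(address)"))


def encode_balance_of(owner_hex: str) -> bytes:
    """Selector followed by the address left-padded to one 32-byte word."""
    addr = bytes.fromhex(owner_hex.removeprefix("0x"))  # Python 3.9+
    if len(addr) != 20:
        raise ValueError("an address is 20 bytes")
    return SELECTOR_BALANCE_OF + addr.rjust(32, b"\x00")


def split_calldata(data: bytes):
    """Bytes 0-3 select the function; the encoded arguments start at byte 4 (the 5th byte)."""
    args = data[4:]
    if len(args) % 32:
        raise ValueError("argument area must be a whole number of 32-byte words")
    return data[:4], [args[i:i + 32] for i in range(0, len(args), 32)]


calldata = encode_balance_of("0x" + "00" * 19 + "aa")  # example address 0x00...00aa
sel, words = split_calldata(calldata)
print(sel.hex(), words[0].hex())
```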

Types

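The following elementary types exist:

uint<M>: unsigned integer of M bits, 0 < M <= 256, M % 8 == 0. Examples: uint32, uint8, uint256.
int<M>: two's complement signed integer of M bits, 0 < M <= 256, M % 8 == 0.
address: equivalent to uint160, except for the assumed interpretation and language typing. For computing the function selector, address is used.
uint, int: synonyms for uint256 and int256 respectively. For computing the function selector, uint256 and int256 must be used.
bool: equivalent to uint8 restricted to the values 0 and 1. For computing the function selector, bool is used.
fixed<M>x<N>: signed fixed-point decimal number of M bits, 8 <= M <= 256, M % 8 == 0, and 0 < N <= 80, which denotes the value v as v / (10 ** N). In other words, the type stores an integer v in M bits and interprets it as a decimal number with N fractional digits.
ufixed<M>x<N>: unsigned variant of fixed<M>x<N>.
fixed, ufixed: synonyms for fixed128x18 and ufixed128x18 respectively. For computing the function selector, fixed128x18 and ufixed128x18 must be used.
bytes<M>: binary type of M bytes, 0 < M <= 32.
function: an address (20 bytes) followed by a function selector (4 bytes). It is encoded identically to bytes24.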
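A minimal Python sketch shows how each static elementary type fills exactly one 32-byte word. Integers, addresses and bools are left-padded, with sign extension for negative int<M>. bytes<M> and function are right-padded. The function names are hypothetical. fixed<M>x<N> is left out; the specification encodes it like the integer X * 10**N.

```python
def encode_uint(value: int, bits: int = 256) -> bytes:
    """uint<M>: big-endian, left-padded with zero bytes to 32 bytes."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in uint{bits}")
    return value.to_bytes(32, "big")


def encode_int(value: int, bits: int = 256) -> bytes:
    """int<M>: two's complement, sign-extended (0xff bytes for negatives) to 32 bytes."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in int{bits}")
    return value.to_bytes(32, "big", signed=True)


def encode_address(addr: bytes) -> bytes:
    """address: encoded like uint160, i.e. left-padded."""
    if len(addr) != 20:
        raise ValueError("an address is 20 bytes")
    return addr.rjust(32, b"\x00")


def encode_bool(flag: bool) -> bytes:
    """bool: like uint8 restricted to 0 and 1."""
    return encode_uint(int(bool(flag)), 8)


def encode_bytes_m(data: bytes) -> bytes:
    """bytes<M>: right-padded with zero bytes to 32 bytes."""
    if not 0 < len(data) <= 32:
        raise ValueError("bytes<M> requires 0 < M <= 32")
    return data.ljust(32, b"\x00")


def encode_function(addr: bytes, sel: bytes) -> bytes:
    """function: 20-byte address followed by 4-byte selector, encoded like bytes24."""
    if len(addr) != 20 or len(sel) != 4:
        raise ValueError("expected a 20-byte address and a 4-byte selector")
    return encode_bytes_m(addr + sel)


print(encode_int(-1).hex())          # "ff" repeated 32 times
print(encode_bytes_m(b"one").hex())  # 6f6e65 followed by 29 zero bytes
```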

The following fixed-size array type exists:

<type>[M]: a fixed-length array of M elements, M >= 0, of the given type.

Note

Although this ABI specification can express fixed-length arrays with zero elements, they are not supported by the compiler.
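A fixed-size array of a static element type is itself static. Its elements are written back to back, with no length word and no offset, because M is part of the type. An array whose element type is dynamic, such as string[2], is dynamic instead. A minimal sketch for uint256[M] follows; the helper name is hypothetical.

```python
def encode_uint256_fixed_array(values, m: int) -> bytes:
    """uint256[M]: the M elements back to back, with no length word and no offset."""
    if len(values) != m:
        raise ValueError(f"uint256[{m}] needs exactly {m} elements, got {len(values)}")
    return b"".join(v.to_bytes(32, "big") for v in values)


enc = encode_uint256_fixed_array([1, 2, 3], 3)
print(len(enc))  # 96 bytes = 3 words
```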

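The following non-fixed-size types exist:

bytes: dynamic-sized byte sequence.
string: dynamic-sized Unicode string, assumed to be UTF-8 encoded.
<type>[]: a variable-length array of elements of the given type.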
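For these dynamic types the head of the enclosing argument list holds only an offset. The sketch below shows what goes into the tail. Its helper names are hypothetical, and it covers only bytes, string and uint256[]. Each value starts with a length word: a byte count for bytes and string, an element count for T[]. For bytes and string the data is then right-padded to a multiple of 32 bytes.

```python
def _pad32(data: bytes) -> bytes:
    """Right-pad with zero bytes to a multiple of 32."""
    return data + b"\x00" * (-len(data) % 32)


def encode_bytes(data: bytes) -> bytes:
    """bytes: length word, then the data right-padded to a multiple of 32 bytes."""
    return len(data).to_bytes(32, "big") + _pad32(data)


def encode_string(text: str) -> bytes:
    """string: UTF-8 encode, then encode like bytes (the length counts bytes, not characters)."""
    return encode_bytes(text.encode("utf-8"))


def encode_uint256_array(values) -> bytes:
    """uint256[]: element count, then the elements in place (uint256 is static)."""
    return len(values).to_bytes(32, "big") + b"".join(v.to_bytes(32, "big") for v in values)


print(encode_string("three").hex())
```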

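Types can be combined into a tuple by enclosing them in parentheses, separated by commas:

(T1,T2,...,Tn): a tuple consisting of the types T1, ..., Tn, n >= 0.

It is possible to form tuples of tuples, arrays of tuples, and so on. It is also possible to form zero-tuples (where n == 0).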
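One way to picture nesting is a small in-memory type description. The sketch below produces the canonical type string used in signatures. It also decides whether a type is dynamic: bytes, string and T[] always are, while T[M] and tuples are dynamic only if they contain a dynamic type. The classes `TupleType` and `ArrayType` are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class TupleType:
    members: Tuple["AbiType", ...] = ()  # n >= 0 members; () is the zero-tuple


@dataclass(frozen=True)
class ArrayType:
    elem: "AbiType"
    length: Optional[int] = None  # None means T[]; an int M means T[M]


AbiType = Union[str, TupleType, ArrayType]  # str: elementary name such as "uint256"


def canonical(t: AbiType) -> str:
    """Type string as it appears in a signature: comma-separated, no spaces."""
    if isinstance(t, TupleType):
        return "(" + ",".join(canonical(m) for m in t.members) + ")"
    if isinstance(t, ArrayType):
        return canonical(t.elem) + ("[]" if t.length is None else f"[{t.length}]")
    return t


def is_dynamic(t: AbiType) -> bool:
    """bytes, string and T[] are dynamic; T[M] and tuples are dynamic iff a part is."""
    if isinstance(t, TupleType):
        return any(is_dynamic(m) for m in t.members)
    if isinstance(t, ArrayType):
        return t.length is None or is_dynamic(t.elem)
    return t in ("bytes", "string")


point = TupleType(("uint256", "uint256"))
named = TupleType(("string", point))
print(canonical(ArrayType(named, 2)), is_dynamic(point), is_dynamic(named))
# (string,(uint256,uint256))[2] False True
```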

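Mapping Solidity to ABI types

Solidity supports all the types presented above with the same names, with the exception of tuples. On the other hand, some Solidity types are not supported by the ABI. The following table shows, in the left column, Solidity types that are not part of the ABI and, in the right column, the ABI types that represent them.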

Solidity	ABI
address payable	`address`
contract	`address`
enum	`uint8`
user defined value types	its underlying value type
struct	`tuple`
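The table can be applied mechanically when building a signature for selector hashing. In the sketch below, the declarations (`IERC20`, `Status`, `Price`, `Order`) are hypothetical and exist only to illustrate the table; they do not come from any real contract. `abi_type` and `signature` are likewise hypothetical helper names. The uint/int and fixed/ufixed aliases are expanded as described in the type list.

```python
# Hypothetical declarations, for illustration only:
#   interface IERC20 {...}; enum Status {...}; type Price is uint128;
#   struct Order { address maker; Price price; uint256[] ids; }
CONTRACTS = {"IERC20"}
ENUMS = {"Status"}
VALUE_TYPES = {"Price": "uint128"}  # user-defined value type -> underlying type
STRUCTS = {"Order": ["address", "Price", "uint256[]"]}
ALIASES = {"uint": "uint256", "int": "int256",
           "fixed": "fixed128x18", "ufixed": "ufixed128x18"}


def abi_type(sol: str) -> str:
    """Map a Solidity type name to its ABI type, following the table above."""
    if sol.endswith("]"):  # T[] or T[M]: map the element type, keep the suffix
        i = sol.rindex("[")
        return abi_type(sol[:i]) + sol[i:]
    if sol == "address payable" or sol in CONTRACTS:
        return "address"
    if sol in ENUMS:
        return "uint8"
    if sol in VALUE_TYPES:
        return abi_type(VALUE_TYPES[sol])
    if sol in STRUCTS:  # struct -> tuple of its member types
        return "(" + ",".join(abi_type(m) for m in STRUCTS[sol]) + ")"
    return ALIASES.get(sol, sol)


def signature(name: str, params: list) -> str:
    """Canonical signature string that is fed to keccak256 for the selector."""
    return name + "(" + ",".join(abi_type(p) for p in params) + ")"


print(signature("fill", ["Order[]", "IERC20", "Status", "address payable", "uint"]))
# fill((address,uint128,uint256[])[],address,uint8,address,uint256)
```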

Function Selector and Argument Encoding

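Dynamic types are not stored in place. These are T[], bytes, string, and any struct (tuple) or fixed-size array that contains a dynamic member. The head holds an offset to the value, and the data goes in the tail. For bytes, string and T[], the tail data starts with a length word.
- First, lay out the arguments in order (the head). A static value is stored directly; for a dynamic value, its offset is stored in that position.
- Offsets are counted in bytes from the start of the head. The first dynamic value's data starts right after the head. When every argument occupies one 32-byte slot, its offset is therefore 0x20 * number, where number is the count of function arguments. Each following offset is offset_of_prev plus the size of the previous value's encoded data. For a dynamic array of 32-byte static elements, that size is 0x20 + 0x20 * number: the first 0x20 is the length word, and number is the element count of the previous array.
- After the head comes the tail: each dynamic value in order, stored as its length followed by its data.
- (Note: a struct is encoded like the argument list of a separate function. Inside the struct, the offset of its first dynamic member is 0x20 * num, where num is the number of struct members, again assuming each member takes one slot. These offsets are measured from the start of the struct's own encoding.)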
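The rules above can be sketched as a small recursive encoder. This is a minimal illustration, not a complete ABI library. It supports only uint256, string and dynamic arrays of these, so every value takes exactly one head slot. The names `encode_value` and `encode_args` are hypothetical. Each offset is the head size plus the bytes already written to the tail, which is the general form of "previous offset + size of previous data". A dynamic array is encoded as its length followed by its elements, laid out like an argument list, so inner offsets are measured from the slot right after the length.

```python
def _word(n: int) -> bytes:
    return n.to_bytes(32, "big")


def _is_dynamic(t: str) -> bool:
    return t == "string" or t.endswith("[]")


def encode_value(t: str, v) -> bytes:
    """Encoding of a single value (for dynamic types: what goes into the tail)."""
    if t == "uint256":
        return _word(v)
    if t == "string":
        data = v.encode("utf-8")
        return _word(len(data)) + data + b"\x00" * (-len(data) % 32)
    if t.endswith("[]"):  # length, then the elements encoded like an argument list
        return _word(len(v)) + encode_args([t[:-2]] * len(v), v)
    raise ValueError("unsupported type " + t)


def encode_args(types, values) -> bytes:
    """Head/tail layout: static values in place, dynamic ones as an offset into the tail."""
    head_size = 32 * len(types)  # assumption: every supported type takes one head slot
    head, tail = b"", b""
    for t, v in zip(types, values):
        enc = encode_value(t, v)
        if _is_dynamic(t):
            head += _word(head_size + len(tail))  # offset from the start of this head
            tail += enc
        else:
            head += enc
    return head + tail


out = encode_args(["uint256[][]", "string[]"], [[[1, 2], [3]], ["one", "two", "three"]])
for i in range(0, len(out), 32):
    print(i // 32, out[i:i + 32].hex())
```

Called with the test7 arguments from the example that follows, and assuming the parameter types uint256[][] and string[], it produces the same 20 words as the final encoding listed there.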

Example: test7([[1, 2], [3]], ["one", "two", "three"]), with the parameter types assumed here to be uint256[][] and string[]

As before, decompose from the inside out. First come the two dynamic arrays [1, 2] and [3] inside [[1, 2], [3]]:
0 - a                                                                  // offset of [1, 2]
1 - b                                                                  // offset of [3]
2 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
3 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
4 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
5 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
6 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
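Offsets are measured from row 0 of this block. a points to the start of [1, 2] (row 2), so a = 0x20 * 2 = 0x40.
b points to the start of [3] (row 5), so b = 0x20 * 5 = 0xa0.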
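Every offset in this example follows the same arithmetic: the head size (0x20 per head slot) plus the sizes of the tails that come before it. A minimal sketch follows; `tail_offsets` is a hypothetical helper name. The tail sizes passed in are word counts read off the tables in this example. [1, 2] takes 3 words and [3] takes 2. Each string takes 2 words. The whole [[1, 2], [3]] takes 8 words.

```python
def tail_offsets(head_slots: int, tail_words: list) -> list:
    """Byte offsets of consecutive tails, measured from the start of the head."""
    offsets, pos = [], 0x20 * head_slots
    for words in tail_words:
        offsets.append(pos)
        pos += 0x20 * words  # skip over this tail to reach the next one
    return offsets


print([hex(o) for o in tail_offsets(2, [3, 2])])     # a, b -> ['0x40', '0xa0']
print([hex(o) for o in tail_offsets(3, [2, 2, 2])])  # d, e, f -> ['0x60', '0xa0', '0xe0']
print([hex(o) for o in tail_offsets(2, [8, 10])])    # c, g -> ['0x40', '0x140']
```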

Then comes the encoding of the dynamic array [[1, 2], [3]] itself:
0 - c                                                                  // offset of [[1, 2], [3]]
1 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [[1, 2], [3]]
2 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [1, 2]
3 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset of [3]
4 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
5 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
6 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
7 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
8 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
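c points to the start of [[1, 2], [3]], i.e. its count. In the final encoding the head holds two slots, one per function argument, so the count lands at row 2 and c = 0x20 * 2 = 0x40. The offsets in rows 2 and 3 are still measured from the slot right after the count, which is why they keep the values a = 0x40 and b = 0xa0.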

Next comes the encoding of each string in the dynamic array ["one", "two", "three"]:
0 - d                                                                  // offset for "one"
1 - e                                                                  // offset for "two"
2 - f                                                                  // offset for "three"
3 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
4 - 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
5 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
6 - 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
7 - 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
8 - 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
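d points to the start of "one" (row 3), so d = 0x20 * 3 = 0x60.
e points to the start of "two" (row 5), so e = 0x20 * 5 = 0xa0.
f points to the start of "three" (row 7), so f = 0x20 * 7 = 0xe0.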

Then comes the encoding of the dynamic array ["one", "two", "three"] itself:
0 - g                                                                  // offset of ["one", "two", "three"]
1 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for ["one", "two", "three"]
2 - 0x0000000000000000000000000000000000000000000000000000000000000060 // offset for "one"
3 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset for "two"
4 - 0x00000000000000000000000000000000000000000000000000000000000000e0 // offset for "three"
5 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
6 - 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
7 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
8 - 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
9 - 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
10- 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
g is not computed yet, because it depends on the encoding of all the function arguments together.

With [[1, 2], [3]] and ["one", "two", "three"] analyzed, the last step is to encode all the arguments as a whole:
0 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [[1, 2], [3]]
1 - g                                                                  // offset of ["one", "two", "three"]
2 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [[1, 2], [3]]
3 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [1, 2]
4 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset of [3]
5 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
6 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
7 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
8 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
9 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
10- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for ["one", "two", "three"]
11- 0x0000000000000000000000000000000000000000000000000000000000000060 // offset for "one"
12- 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset for "two"
13- 0x00000000000000000000000000000000000000000000000000000000000000e0 // offset for "three"
14- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
15- 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
16- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
17- 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
18- 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
19- 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
g points to the start of the string array (its count, row 10), so g = 0x20 * 10 = 0x140.

So the complete calldata (selector followed by the encoded arguments) is:
0xcc80bc65                                                             // function selector
0 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [[1, 2], [3]]
1 - 0x0000000000000000000000000000000000000000000000000000000000000140 // offset of ["one", "two", "three"]
2 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [[1, 2], [3]]
3 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [1, 2]
4 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset of [3]
5 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
6 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
7 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
8 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
9 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
10- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for ["one", "two", "three"]
11- 0x0000000000000000000000000000000000000000000000000000000000000060 // offset for "one"
12- 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset for "two"
13- 0x00000000000000000000000000000000000000000000000000000000000000e0 // offset for "three"
14- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
15- 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
16- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
17- 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
18- 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
19- 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
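To check the result, the encoding can be decoded again by following the offsets. The sketch below rebuilds the 20 argument words listed above, leaving out the selector, and decodes them as (uint256[][], string[]). The parameter types are an assumption, as in the example. The helper names are hypothetical, and only uint256, string and dynamic arrays are handled.

```python
def word(n: int) -> bytes:
    return n.to_bytes(32, "big")


def text_word(s: str) -> bytes:
    return s.encode("utf-8").ljust(32, b"\x00")


# The 20 argument words listed above (selector omitted).
ARGS = b"".join([
    word(0x40), word(0x140),
    word(2), word(0x40), word(0xa0), word(2), word(1), word(2), word(1), word(3),
    word(3), word(0x60), word(0xa0), word(0xe0),
    word(3), text_word("one"), word(3), text_word("two"), word(5), text_word("three"),
])


def read_uint(buf: bytes, pos: int) -> int:
    return int.from_bytes(buf[pos:pos + 32], "big")


def is_dynamic(t: str) -> bool:
    return t == "string" or t.endswith("[]")


def decode(t: str, buf: bytes, pos: int):
    """Decode a value of type t whose encoding starts at byte pos of buf."""
    if t == "uint256":
        return read_uint(buf, pos)
    if t == "string":
        n = read_uint(buf, pos)
        return buf[pos + 32:pos + 32 + n].decode("utf-8")
    if t.endswith("[]"):  # count, then elements laid out like an argument list
        count = read_uint(buf, pos)
        return decode_tuple([t[:-2]] * count, buf, pos + 32)
    raise ValueError("unsupported type " + t)


def decode_tuple(types, buf: bytes, base: int):
    """Static values sit in the head; dynamic ones are reached via offsets relative to base."""
    out = []
    for i, t in enumerate(types):
        slot = base + 32 * i
        pos = base + read_uint(buf, slot) if is_dynamic(t) else slot
        out.append(decode(t, buf, pos))
    return out


print(decode_tuple(["uint256[][]", "string[]"], ARGS, 0))
# [[[1, 2], [3]], ['one', 'two', 'three']]
```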
