Function Selector and Argument Encoding
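In the Ethereum ecosystem, the ABI (Application Binary Interface) is the standard way to interact with contracts, both from outside the blockchain and for contract-to-contract calls. Data is encoded according to its type, following the rules laid out in the Solidity ABI specification.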
Function Selector
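How it works
The first 4 bytes of the Keccak-256 hash of a function's signature identify the function to call. For example, bytes4(keccak256('balanceOf(address)')) == 0x70a08231, so 0x70a08231 is the Function Selector of balanceOf(address). Note that this is the original Keccak-256, which pads its input differently from the finalized NIST SHA3-256 and therefore produces different digests.
- The signature is the function name followed by the parenthesised list of parameter types, separated by single commas with no spaces.
- uint must be written as uint256 before hashing. For example, the Function Selector of ownerOf(uint256) is bytes4(keccak256('ownerOf(uint256)')) == 0x6352211e.
- A struct parameter is flattened into the types of its members, with those types wrapped in () as a tuple. For example, a struct with a uint256 member and an address member appears in the signature as (uint256,address).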
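The selector rule above can be sketched in Python. Python's hashlib.sha3_256 implements the finalized SHA-3 rather than the original Keccak-256, so this sketch carries its own small Keccak-256, written after the Keccak team's public compact reference code. The names keccak256 and selector are illustrative helpers, not part of any library.

```python
def _rol64(a, n):
    """Rotate a 64-bit integer left by n bits."""
    return ((a >> (64 - (n % 64))) + (a << (n % 64))) % (1 << 64)


def _keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    r = 1
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ ((~row[(x + 1) % 5]) & row[(x + 2) % 5])
        # iota: add the round constant, generated by an 8-bit LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256: rate 136 bytes, padding suffix 0x01 (SHA3-256 uses 0x06)."""
    rate = 136
    state = bytearray(200)
    offset = 0
    block = 0
    while offset < len(data):
        block = min(len(data) - offset, rate)
        for i in range(block):
            state[i] ^= data[offset + i]
        offset += block
        if block == rate:
            state = _keccak_f1600(state)
            block = 0
    state[block] ^= 0x01
    state[rate - 1] ^= 0x80
    state = _keccak_f1600(state)
    return bytes(state[:32])


def selector(signature: str) -> str:
    """First 4 bytes of keccak256(signature), as a 0x-prefixed hex string."""
    return "0x" + keccak256(signature.encode("ascii"))[:4].hex()


print(selector("balanceOf(address)"))
print(selector("ownerOf(uint256)"))
```

The signature string has to be in canonical form already (uint256 rather than uint, no spaces); the sketch hashes whatever it is given.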
Argument Encoding
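The encoded arguments start at the fifth byte. The same encoding is used elsewhere: return values and event arguments are encoded the same way, just without the 4 bytes that select the function.
Type encoding
The following elementary types exist: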
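- uint<M>: unsigned integer of M bits, 0 < M <= 256, M % 8 == 0. For example: uint32, uint8, uint256.
- int<M>: two's complement signed integer of M bits, 0 < M <= 256, M % 8 == 0.
- address: equivalent to uint160, except for its assumed interpretation and language typing. address is used when computing the Function Selector.
- uint, int: synonyms for uint256 and int256 respectively. uint256 and int256 are used when computing the Function Selector.
- bool: equivalent to uint8 restricted to the values 0 and 1. bool is used when computing the Function Selector.
- fixed<M>x<N>: signed fixed-point decimal number of M bits, 8 <= M <= 256, M % 8 == 0, and 0 < N <= 80. A stored value v denotes v / (10 ** N). (In other words, an M-bit binary integer that represents a decimal number with N fractional digits. Translator's note.)
- ufixed<M>x<N>: unsigned variant of fixed<M>x<N>.
- fixed, ufixed: synonyms for fixed128x18 and ufixed128x18 respectively. fixed128x18 and ufixed128x18 are used when computing the Function Selector.
- bytes<M>: binary type of M bytes, 0 < M <= 32.
- function: an address (20 bytes) followed by a Function Selector (4 bytes). Encoded identically to bytes24.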
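As a rough illustration of how these elementary types end up in calldata, the sketch below pads each value to one 32-byte word. The padding rules come from the ABI specification rather than from the list above: integers, bool and address are left-padded (signed values are sign-extended), while bytes<M> is right-padded. The function names are hypothetical helpers.

```python
WORD = 32  # every static value occupies one 32-byte word


def enc_uint(value: int, bits: int = 256) -> bytes:
    """uint<M>: big-endian, left-padded with zero bytes."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in uint{bits}")
    return value.to_bytes(WORD, "big")


def enc_int(value: int, bits: int = 256) -> bytes:
    """int<M>: two's complement, sign-extended to 256 bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in int{bits}")
    return (value % (1 << 256)).to_bytes(WORD, "big")


def enc_bool(value: bool) -> bytes:
    """bool: encoded as the uint8 value 0 or 1."""
    return enc_uint(1 if value else 0, 8)


def enc_address(addr: str) -> bytes:
    """address: 20 bytes, encoded like uint160 (left-padded)."""
    raw = bytes.fromhex(addr[2:] if addr.startswith("0x") else addr)
    if len(raw) != 20:
        raise ValueError("an address must be exactly 20 bytes")
    return raw.rjust(WORD, b"\x00")


def enc_bytes_m(data: bytes) -> bytes:
    """bytes<M>: the M bytes first, right-padded with zero bytes."""
    if not 0 < len(data) <= 32:
        raise ValueError("bytes<M> requires 0 < M <= 32")
    return data.ljust(WORD, b"\x00")


print(enc_uint(1).hex())
print(enc_int(-1).hex())
print(enc_bytes_m(b"abc").hex())
```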
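The following fixed-size array type exists:
- <type>[M]: a fixed-length array of M elements, M >= 0, of the given type.
Note: although this ABI specification can express fixed-length arrays with zero elements, the compiler does not support them.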
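The following non-fixed-size types exist:
- bytes: dynamically sized byte sequence.
- string: dynamically sized Unicode string, assumed to be UTF-8 encoded.
- <type>[]: a variable-length array of elements of the given type.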
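A minimal sketch of how the body of a dynamic value is laid out, assuming the usual ABI rules: a length word first, then the data. bytes and string are right-padded to a multiple of 32 bytes, and an array of uint256 stores one word per element. The helper names are illustrative only, and the offset that points at each body is left out here.

```python
def word(n: int) -> bytes:
    """One 32-byte big-endian word."""
    return n.to_bytes(32, "big")


def enc_dynamic_bytes(data: bytes) -> bytes:
    """bytes: length in bytes, then the data right-padded to a 32-byte multiple."""
    padded_len = (len(data) + 31) // 32 * 32
    return word(len(data)) + data.ljust(padded_len, b"\x00")


def enc_string(text: str) -> bytes:
    """string: UTF-8 encode, then treat exactly like bytes."""
    return enc_dynamic_bytes(text.encode("utf-8"))


def enc_uint_array(values) -> bytes:
    """uint256[]: element count, then one word per element."""
    return word(len(values)) + b"".join(word(v) for v in values)


body = enc_string("one")
for i in range(0, len(body), 32):
    print(body[i:i + 32].hex())
```

The length word of a string counts bytes, not characters, so a non-ASCII string can have a length larger than its character count.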
Types can be combined into a tuple by enclosing them in parentheses, separated by commas:
- (T1,T2,...,Tn): a tuple consisting of the types T1, ..., Tn, n >= 0.
Tuples of tuples, arrays of tuples and so on are possible. Zero-tuples (where n = 0) can also be formed.
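Mapping Solidity to ABI types
Solidity supports all the types presented above under the same names, with the exception of tuples. On the other hand, some Solidity types are not supported by the ABI. The table below lists those Solidity types in the left column and the ABI types that represent them in the right column.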
Solidity | ABI |
---|---|
address payable | address |
contract | address |
enum | uint8 |
user defined value types | its underlying value type |
struct | tuple |
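The mapping in the table can be sketched as a small normaliser that turns Solidity parameter types into the canonical types used in a signature string. In a signature, a tuple is spelled as its parenthesised member types. The user_types dictionary and its ("contract" | "enum" | "value" | "struct", detail) entries are a made-up format for this sketch, not something a compiler emits.

```python
ALIASES = {
    "uint": "uint256",
    "int": "int256",
    "fixed": "fixed128x18",
    "ufixed": "ufixed128x18",
    "address payable": "address",
}


def abi_type(sol_type, user_types=None):
    """Map one Solidity type name to its canonical ABI type."""
    user_types = user_types or {}
    sol_type = sol_type.strip()
    # Peel off one trailing array suffix such as [] or [3], then recurse.
    if sol_type.endswith("]"):
        base, _, suffix = sol_type.rpartition("[")
        return abi_type(base, user_types) + "[" + suffix
    if sol_type in ALIASES:
        return ALIASES[sol_type]
    if sol_type in user_types:
        kind, detail = user_types[sol_type]
        if kind == "contract":
            return "address"
        if kind == "enum":
            return "uint8"
        if kind == "value":   # user defined value type: use the underlying type
            return abi_type(detail, user_types)
        if kind == "struct":  # struct: tuple of the member types
            return "(" + ",".join(abi_type(m, user_types) for m in detail) + ")"
        raise ValueError(f"unknown kind {kind!r}")
    return sol_type


def signature(name, param_types, user_types=None):
    """Canonical signature: name plus comma-separated types, no spaces."""
    return name + "(" + ",".join(abi_type(t, user_types) for t in param_types) + ")"


types = {
    "Token": ("contract", None),
    "Color": ("enum", None),
    "Price": ("value", "uint128"),
    "Point": ("struct", ["int", "address payable"]),
}
print(signature("f", ["uint", "Token", "Color", "Price", "Point[]"], types))
```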
Function Selector and Argument Encoding
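- Dynamic data, such as dynamic arrays, bytes, string, and structs that contain a dynamic member, is encoded as an offset, a length, and the data. A struct made up only of static members is encoded in place, and a struct has no length word.
- Store the parameters in order: for a static type, store its data directly; for a dynamic type, reserve a slot for its offset. Offsets are measured from the start of the argument block, that is, the first byte after the selector.
- Fill in the offsets in order: for the first dynamic value, offset = 0x20 * number, where number is the number of function parameters (assuming every parameter takes exactly one 32-byte slot). For each following dynamic value, offset = offset_of_prev + 0x20 + 0x20 * number, where the first 0x20 is the length word of the previous dynamic value and number is the count of 32-byte words its data occupies. For an array of single-word elements that count equals the element count; for nested dynamic data it also includes the inner offsets, lengths, and data.
- Once the offsets are in place, visit each dynamic value in order and store its length followed by its data.
- (PS: for a struct, treat its members as the parameters of a new function, so the first dynamic member has offset = 0x20 * num, where num is the number of members in the struct.)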
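The offset bookkeeping described above can be sketched as a small head/tail assembler. It assumes each argument has already been encoded on its own and is passed in as an (is_dynamic, encoded_bytes) pair; that input format and the name encode_head_tail are inventions for this sketch. The example call f(uint256, uint256[], bytes) is hypothetical.

```python
def word(n: int) -> bytes:
    return n.to_bytes(32, "big")


def encode_head_tail(parts):
    """parts: list of (is_dynamic, encoded_bytes), one entry per argument."""
    # A dynamic argument takes one offset slot in the head; a static one
    # takes as many bytes as its own encoding.
    head_size = sum(32 if dynamic else len(enc) for dynamic, enc in parts)
    head, tail = b"", b""
    for dynamic, enc in parts:
        if dynamic:
            # Offset = size of the head plus everything already in the tail.
            head += word(head_size + len(tail))
            tail += enc
        else:
            head += enc
    return head + tail


# Hypothetical call f(uint256, uint256[], bytes) with (7, [1, 2], b"hi")
array_body = word(2) + word(1) + word(2)      # length, then two elements
bytes_body = word(2) + b"hi".ljust(32, b"\x00")  # length, then padded data
encoded = encode_head_tail([
    (False, word(7)),
    (True, array_body),
    (True, bytes_body),
])
for i in range(0, len(encoded), 32):
    print(i // 32, encoded[i:i + 32].hex())
```

Here the array's offset is 0x20 * 3 = 0x60 and the next offset is 0x60 + 0x20 + 0x20 * 2 = 0xc0, which matches the formula in the list.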
test7([[1, 2], [3]], ["one", "two", "three"])
Applying the rules above from the inside out, start with the two inner dynamic arrays [1, 2] and [3] of the dynamic array [[1, 2], [3]]:
0 - a // offset of [1, 2]
1 - b // offset of [3]
2 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
3 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
4 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
5 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
6 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
a points to the start of [1, 2] (its count word), so a = 0x20 * 2 = 0x40
b points to the start of [3], so b = 0x20 * 5 = 0xa0
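The two offsets can be reproduced with a short calculation. This is only a sketch for arrays whose elements are single 32-byte words; inner_offsets is a hypothetical helper name.

```python
def inner_offsets(arrays):
    """Offsets of each inner uint256[] relative to the start of the offset slots."""
    offset = 32 * len(arrays)          # skip one offset slot per inner array
    result = []
    for arr in arrays:
        result.append(offset)
        offset += 32 + 32 * len(arr)   # count word plus one word per element
    return result


a, b = inner_offsets([[1, 2], [3]])
print(hex(a), hex(b))
```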
Next comes the encoding of the dynamic array [[1, 2], [3]] itself:
0 - c // offset of [[1, 2], [3]]
1 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [[1, 2], [3]]
2 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [1, 2]
3 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset of [3]
4 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
5 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
6 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
7 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
8 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
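c points to the start of [[1, 2], [3]]. The function has two parameters, each taking one slot, so c = 0x20 * 2 = 0x40
Next, encode each string in the dynamic array ["one", "two", "three"]: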
0 - d // offset for "one"
1 - e // offset for "two"
2 - f // offset for "three"
3 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
4 - 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
5 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
6 - 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
7 - 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
8 - 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
d points to the start of "one", so d = 0x20 * 3 = 0x60
e points to the start of "two", so e = 0x20 * 5 = 0xa0
f points to the start of "three", so f = 0x20 * 7 = 0xe0
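The same arithmetic works for strings, with one difference: each string occupies its length word plus its UTF-8 bytes rounded up to a multiple of 32. A sketch, with string_offsets as a hypothetical helper name:

```python
def string_offsets(strings):
    """Offsets of each string relative to the start of the offset slots."""
    offset = 32 * len(strings)         # skip one offset slot per string
    result = []
    for s in strings:
        result.append(offset)
        data_len = len(s.encode("utf-8"))
        padded = (data_len + 31) // 32 * 32
        offset += 32 + padded          # length word plus padded data
    return result


d, e, f = string_offsets(["one", "two", "three"])
print(hex(d), hex(e), hex(f))
```

All three strings here fit in one word, which is why the offsets advance by 0x40 each time; a string longer than 32 bytes would push the following offsets further out.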
Next comes the encoding of the dynamic array ["one", "two", "three"] itself:
0 - g // offset of ["one", "two", "three"]
1 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for ["one", "two", "three"]
2 - 0x0000000000000000000000000000000000000000000000000000000000000060 // offset for "one"
3 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset for "two"
4 - 0x00000000000000000000000000000000000000000000000000000000000000e0 // offset for "three"
5 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
6 - 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
7 - 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
8 - 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
9 - 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
10- 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
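g is left unresolved for now, because it depends on the encoding of the function arguments as a whole.
With [[1, 2], [3]] and ["one", "two", "three"] both analysed, the last step is to encode them together as the complete argument list: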
0 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [[1, 2], [3]]
1 - g // offset of ["one", "two", "three"]
2 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [[1, 2], [3]]
3 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [1, 2]
4 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset of [3]
5 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
6 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
7 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
8 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
9 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
10- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for ["one", "two", "three"]
11- 0x0000000000000000000000000000000000000000000000000000000000000060 // offset for "one"
12- 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset for "two"
13- 0x00000000000000000000000000000000000000000000000000000000000000e0 // offset for "three"
14- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
15- 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
16- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
17- 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
18- 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
19- 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
g points to the start of the string array (its count word at index 10), so g = 0x20 * 10 = 0x140
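Another way to arrive at g is to add the size of the head to the size of everything that [[1, 2], [3]] occupies. A small sketch, assuming single-word elements; nested_uint_array_words is a hypothetical helper name.

```python
def nested_uint_array_words(arrays):
    """Words used by a uint256[][]: count, one offset per inner array, inner bodies."""
    return 1 + len(arrays) + sum(1 + len(inner) for inner in arrays)


head_words = 2                                        # one offset slot per argument
first_arg_words = nested_uint_array_words([[1, 2], [3]])  # 1 + 2 + 3 + 2 words
g = 32 * (head_words + first_arg_words)
print(hex(g))
```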
The complete selector plus encoding is therefore:
0xcc80bc65 // function selector
0 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [[1, 2], [3]]
1 - 0x0000000000000000000000000000000000000000000000000000000000000140 // offset of ["one", "two", "three"]
2 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [[1, 2], [3]]
3 - 0x0000000000000000000000000000000000000000000000000000000000000040 // offset of [1, 2]
4 - 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset of [3]
5 - 0x0000000000000000000000000000000000000000000000000000000000000002 // count for [1, 2]
6 - 0x0000000000000000000000000000000000000000000000000000000000000001 // encoding of 1
7 - 0x0000000000000000000000000000000000000000000000000000000000000002 // encoding of 2
8 - 0x0000000000000000000000000000000000000000000000000000000000000001 // count for [3]
9 - 0x0000000000000000000000000000000000000000000000000000000000000003 // encoding of 3
10- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for ["one", "two", "three"]
11- 0x0000000000000000000000000000000000000000000000000000000000000060 // offset for "one"
12- 0x00000000000000000000000000000000000000000000000000000000000000a0 // offset for "two"
13- 0x00000000000000000000000000000000000000000000000000000000000000e0 // offset for "three"
14- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "one"
15- 0x6f6e650000000000000000000000000000000000000000000000000000000000 // encoding of "one"
16- 0x0000000000000000000000000000000000000000000000000000000000000003 // count for "two"
17- 0x74776f0000000000000000000000000000000000000000000000000000000000 // encoding of "two"
18- 0x0000000000000000000000000000000000000000000000000000000000000005 // count for "three"
19- 0x7468726565000000000000000000000000000000000000000000000000000000 // encoding of "three"
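To cross-check the listing, the sketch below rebuilds the 20 argument words from the raw values. It assumes the parameters of test7 are (uint256[][], string[]), which the article does not state explicitly. The selector is copied from the listing rather than recomputed, and all function names are illustrative.

```python
def word(n: int) -> bytes:
    return n.to_bytes(32, "big")


def enc_string(s: str) -> bytes:
    """Length word, then UTF-8 data right-padded to a 32-byte multiple."""
    raw = s.encode("utf-8")
    return word(len(raw)) + raw.ljust((len(raw) + 31) // 32 * 32, b"\x00")


def enc_uint_array(values) -> bytes:
    """Count word, then one word per element."""
    return word(len(values)) + b"".join(word(v) for v in values)


def enc_dynamic_items(items) -> bytes:
    """Head/tail layout for items that are all dynamic: offsets, then bodies."""
    offset = 32 * len(items)
    head = b""
    for item in items:
        head += word(offset)
        offset += len(item)
    return head + b"".join(items)


def enc_array_of_dynamic(items) -> bytes:
    """A dynamic array of dynamic elements: count word, then head/tail layout."""
    return word(len(items)) + enc_dynamic_items(items)


def encode_test7_args(numbers, strings) -> bytes:
    arg0 = enc_array_of_dynamic([enc_uint_array(a) for a in numbers])
    arg1 = enc_array_of_dynamic([enc_string(s) for s in strings])
    return enc_dynamic_items([arg0, arg1])


SELECTOR = "cc80bc65"  # copied from the listing above, not recomputed here
args = encode_test7_args([[1, 2], [3]], ["one", "two", "three"])
calldata = "0x" + SELECTOR + args.hex()
for i in range(0, len(args), 32):
    print(f"{i // 32:2d} - 0x{args[i:i + 32].hex()}")
```

The printed words can be compared line by line with the listing; word 1 comes out as 0x140 because the first argument occupies 8 words after the 2-word head.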
This is an original article, licensed under the CC BY-NC-SA 4.0 agreement. When reproducing it in full, please credit the source: xiaocai