2020-12-22 15:38:39 +08:00
|
|
|
# Toy Implementation
|
|
|
|
Currently the rough implementation is done, works remain are the code for
|
|
|
|
checking a real python script, getting the type variables, some implementation
|
|
|
|
details etc.
|
|
|
|
|
|
|
|
These features are considered in the proposal, but would not be implemented here
|
|
|
|
for simplicity reasons:
|
|
|
|
* Referencing Python Variables.
|
2020-12-22 16:56:40 +08:00
|
|
|
* Most of the types, only the following types are implemented:
|
|
|
|
* `int32`
|
|
|
|
* `int64`
|
2020-12-22 16:59:36 +08:00
|
|
|
* `float`
|
2020-12-22 16:56:40 +08:00
|
|
|
* `bool`
|
|
|
|
* `list[T]`
|
|
|
|
* `tuple[T1,...]`
|
|
|
|
* `virtual[T]`
|
2020-12-22 15:38:39 +08:00
|
|
|
* Storing large constants as `uint32`, `int64` or `uint64`.
|
|
|
|
* AugAssign, `a += b` etc.
|
|
|
|
* `with`, `try except`, etc.
|
2020-12-22 16:56:40 +08:00
|
|
|
* const indexing with tuple.
|
|
|
|
* method override check modulo variable renaming.
|
2020-12-23 11:22:17 +08:00
|
|
|
* more complicated type guard
|
2020-12-22 15:38:39 +08:00
|
|
|
|
2020-12-23 16:53:01 +08:00
|
|
|
## Running Example
|
|
|
|
```bash
|
|
|
|
python main.py example/a.py
|
|
|
|
```
|
|
|
|
|
2020-12-23 15:39:48 +08:00
|
|
|
## Files
|
|
|
|
All files named `test_xxx` are used for inspecting the result of algorithms, and
|
|
|
|
can be ignored for now.
|
|
|
|
|
|
|
|
Here is the list of files and their purpose:
|
|
|
|
* `helper.py`: mainly for the error definition.
|
|
|
|
* `inference.py`: type-check for function invocation.
|
|
|
|
* `inheritance.py`: perform method and field inheritance.
|
|
|
|
* `main.py`: main script for checking an entire python script.
|
|
|
|
* `parse_expr.py`: type-check for expressions in python AST.
|
|
|
|
* `parse_stmt.py`: type-check for statements in python AST.
|
|
|
|
* `primitives.py`: definition of primitives, operations, and built-in functions.
|
|
|
|
* `top_level.py`: gather class, function and type variable definitions from
|
|
|
|
python AST.
|
|
|
|
* `type_def.py`: python class for various types.
|
2020-12-22 15:38:39 +08:00
|
|
|
|
2020-12-23 16:48:20 +08:00
|
|
|
## Variable Scope
|
|
|
|
There is no shadowing in Python, so we decided that variables with the same name
|
|
|
|
in a function must have the same type. For example, the following is not
|
|
|
|
allowed:
|
|
|
|
```python
|
|
|
|
if foo():
|
|
|
|
a = 1
|
|
|
|
else:
|
|
|
|
a = None
|
|
|
|
```
|
|
|
|
|
|
|
|
Also, as variables has to be well typed, they must be initialized before using
|
|
|
|
them. If a variable could be not initialized in some code path, then it is not
|
|
|
|
readable. The following is also not allowed:
|
|
|
|
```python
|
|
|
|
if foo():
|
|
|
|
a = 1
|
|
|
|
a = a + 1
|
|
|
|
```
|
|
|
|
## Generics
|
|
|
|
Generics are supported via type variables.
|
|
|
|
|
|
|
|
* Generic type variable:
|
|
|
|
```python
|
|
|
|
A = TypeVar('A')
|
|
|
|
```
|
|
|
|
* Bounded type variable (`A` can either be `T1` or `T2` or ...):
|
|
|
|
```python
|
|
|
|
A = TypeVar('A', T1, T2, ...)
|
|
|
|
```
|
|
|
|
> Note:
|
|
|
|
>
|
|
|
|
> 1. In normal python, the *bound* of a type variable is actually about class
|
|
|
|
> inheritance. However, our type variable would be invariant and would not
|
|
|
|
> deal with subtyping.
|
|
|
|
> 2. Type variables cannot contain any type variable in their bound. For
|
|
|
|
> example, `B = TypeVar('B', A, T3)` is not allowed.
|
|
|
|
> 3. I did not really check the difference between the variable name and the
|
|
|
|
> name parameter of `TypeVar`, so idk what would happen if they are
|
|
|
|
> different. Please don't do that right now, would be fixed in later more
|
|
|
|
> serious implementations.
|
|
|
|
|
|
|
|
For generic type variables, you can't really do much with them, other than
|
|
|
|
passing them around in parameters, dealing with their list, etc.
|
|
|
|
|
|
|
|
For bounded type variables, if an operation is supported by all the possible
|
|
|
|
values of the variable, we can use that directly:
|
|
|
|
```python
|
|
|
|
A = TypeVar('A', int32, int64)
|
|
|
|
|
|
|
|
def add(a: A, b: A) -> A:
|
|
|
|
return a + b
|
|
|
|
```
|
|
|
|
|
|
|
|
If an operation is supported by some possible values, we can use type guard:
|
|
|
|
```python
|
|
|
|
A = TypeVar('A', int32, list[int32])
|
|
|
|
def add2(a: int32, b: A) -> a:
|
|
|
|
if type(b) == int32:
|
|
|
|
# b is int32 here
|
|
|
|
return a + b
|
|
|
|
else:
|
|
|
|
# b is list[int32] here
|
|
|
|
for x in b:
|
|
|
|
a = a + b
|
|
|
|
return a
|
|
|
|
```
|
|
|
|
|
|
|
|
Note that we only support very simple kinds of type guards in this toy
|
|
|
|
implementation. More specifically, the type guard has to meet the following
|
|
|
|
conditions:
|
|
|
|
1. The if statement must be of the form `type(*) == **` or `type(*) != **`.
|
|
|
|
For example, `if type(b) == int32 or type(b) == list[int32]` is not allowed.
|
|
|
|
2. The type of `*` must be a type variable. For example, `list[X]` is not
|
|
|
|
allowed.
|
|
|
|
|
|
|
|
### Substitution
|
2020-12-23 15:39:48 +08:00
|
|
|
The crucial constraint and assumption in our system is that, every
|
|
|
|
(sub-)expressions must have their types fully determined, and cannot depend on
|
|
|
|
statements/expressions after them. Hence, in a function call, every arguments
|
|
|
|
are well typed. We only have to determine the substitution of type variables
|
|
|
|
present in the function type signature that makes the type agree.
|
|
|
|
|
|
|
|
There is a tiny difference between unification and our implementation. In our
|
|
|
|
implementation, the substitution would only be applied to the type signature of
|
|
|
|
the target function call but not the variables present in the function call.
|
|
|
|
This way we don't have to make the type variables in the callee fresh before
|
|
|
|
doing unification.
|
|
|
|
|
|
|
|
Consider the following example:
|
|
|
|
|
2020-12-23 16:05:45 +08:00
|
|
|
```python
|
2020-12-23 15:39:48 +08:00
|
|
|
X = TypeVar('X')
|
|
|
|
|
|
|
|
def head(a: list[X]) -> X:
|
|
|
|
return a[0]
|
|
|
|
|
|
|
|
head([1, 2, 3])
|
|
|
|
```
|
|
|
|
|
|
|
|
In this example, the expression `[1, 2, 3]` has type `list[int32]`, so the
|
|
|
|
algorithm tries to fit `(list[int32])` into `(list[X])`, giving a substitution
|
|
|
|
`X -> int32`.
|
|
|
|
|
|
|
|
Substitution can also substitute variables into another variable. Consider the
|
|
|
|
following example:
|
|
|
|
|
2020-12-23 16:05:45 +08:00
|
|
|
```python
|
2020-12-23 15:39:48 +08:00
|
|
|
X = TypeVar('X')
|
|
|
|
Y = TypeVar('Y', int32, int64)
|
|
|
|
|
|
|
|
def head(a: list[X]) -> X:
|
|
|
|
return a[0]
|
|
|
|
|
|
|
|
def sum_of_heads(a: list[Y], b: list[Y]) -> Y:
|
|
|
|
return head(a) + head(b)
|
|
|
|
```
|
|
|
|
|
|
|
|
In this example, `a` has type `list[Y]`, so the algorithm would give a
|
|
|
|
substitution `X -> Y` for the call `head(a)`, and similarly for `b`.
|
2020-12-23 16:48:20 +08:00
|
|
|
As `Y` can only range over `int32` and `int64`, in the two instances of `Y`,
|
2020-12-23 15:39:48 +08:00
|
|
|
the return statement would have type
|
|
|
|
* `int32 + int32 : int32 : Y` under `Y -> int32`, and
|
|
|
|
* `int64 + int64 : int64 : Y` under `Y -> int64`.
|
|
|
|
So the function is well typed.
|
|
|
|
|
|
|
|
Note that variables are fresh in every invocation. Consider the following
|
|
|
|
example:
|
2020-12-23 16:05:45 +08:00
|
|
|
```python
|
2020-12-23 15:39:48 +08:00
|
|
|
I = TypeVar('I', int32, list[int32])
|
|
|
|
|
|
|
|
def add(a: int32, b: I) -> int32:
|
|
|
|
if type(b) == int32:
|
|
|
|
return a + b
|
|
|
|
else:
|
2020-12-23 16:48:20 +08:00
|
|
|
# b must be list[int32] in this branch.
|
2020-12-23 15:39:48 +08:00
|
|
|
for x in b:
|
|
|
|
a = add(a, x)
|
|
|
|
return a
|
|
|
|
|
|
|
|
add(1, [1, 2, 3])
|
|
|
|
```
|
2020-12-23 16:05:45 +08:00
|
|
|
This one should type check. `I -> list[int32]` only affects 1 call,
|
2020-12-23 15:39:48 +08:00
|
|
|
and the recursion inside could substitute `I -> int32`.
|
2020-12-22 16:59:36 +08:00
|
|
|
|
2020-12-23 16:48:20 +08:00
|
|
|
Example of a failure case:
|
|
|
|
```python
|
|
|
|
A = TypeVar('A')
|
|
|
|
B = TypeVar('B')
|
|
|
|
|
|
|
|
def foo(a: A, b: A):
|
|
|
|
pass
|
|
|
|
|
|
|
|
def bar(a: A, b: B):
|
|
|
|
foo(a, b)
|
|
|
|
```
|
|
|
|
|
|
|
|
This would fail. From the first argument, we have `A -> A`, and from the second
|
|
|
|
argument we need `A -> B`. In general, we may have `A != B`, so there is no
|
|
|
|
substitution that meets the requirement and the type check failed.
|
2020-12-22 16:59:36 +08:00
|
|
|
|
2020-12-23 16:53:01 +08:00
|
|
|
## Operator Overloading
|
|
|
|
Most operations are actually implemented via operator overloading.
|
|
|
|
We currently support:
|
|
|
|
* Normal:
|
|
|
|
* `__init__`
|
|
|
|
* Comparison:
|
|
|
|
* `__lt__`
|
|
|
|
* `__le__`
|
|
|
|
* `__gt__`
|
|
|
|
* `__ge__`
|
|
|
|
* `__eq__`
|
|
|
|
* `__ne__`
|
|
|
|
* Arithmetic:
|
|
|
|
* `__add__`
|
|
|
|
* `__sub__`
|
|
|
|
* `__mul__`
|
|
|
|
* `__matmul__`
|
|
|
|
* `__truediv__`
|
|
|
|
* `__floordiv__`
|
|
|
|
* `__mod__`
|
|
|
|
* `__pow__`
|
|
|
|
* `__lshift__`
|
|
|
|
* `__rshift__`
|
|
|
|
* `__and__`
|
|
|
|
* `__or__`
|
|
|
|
* `__xor__`
|
|
|
|
* `__neg__`
|
2020-12-22 16:59:36 +08:00
|
|
|
|