# Toy Implementation
A rough implementation is currently done. The remaining work is the code for
checking a real Python script, getting the type variables, some implementation
details, etc.

These features are considered in the proposal, but are not implemented here for
the sake of simplicity:
* Referencing Python Variables.
* Most of the types. Only the following types are implemented:
  * `int32`
  * `int64`
  * `float`
  * `bool`
  * `list[T]`
  * `tuple[T1, ...]`
  * `virtual[T]`
* Storing large constants as `uint32`, `int64` or `uint64`.
* Augmented assignment (`AugAssign`), e.g. `a += b`.
* `with`, `try`/`except`, etc.
* Constant indexing with tuples.
* Method override checks modulo type variable renaming.
* More complicated type guards.

## Running Example
```bash
python main.py example/a.py
```
## Files
All files named `test_xxx` are used for inspecting the results of the
algorithms, and can be ignored for now.

Here is the list of files and their purposes:
* `helper.py`: mainly the error definitions.
* `inference.py`: type checking for function invocations.
* `inheritance.py`: performs method and field inheritance.
* `main.py`: main script for checking an entire Python script.
* `parse_expr.py`: type checking for expressions in the Python AST.
* `parse_stmt.py`: type checking for statements in the Python AST.
* `primitives.py`: definitions of primitives, operations, and built-in functions.
* `top_level.py`: gathers class, function and type variable definitions from
  the Python AST.
* `type_def.py`: Python classes for the various types.
## Variable Scope
Python has no block-level scoping, so a name cannot be shadowed inside a
function. We therefore decided that variables with the same name in a function
must have the same type. For example, the following is not allowed:
```python
if foo():
    a = 1
else:
    a = None
```
Also, as variables have to be well typed, they must be initialized before they
are used. If a variable may be uninitialized on some code path, then it is not
readable. The following is also not allowed:
```python
if foo():
    a = 1
a = a + 1
```
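
The initialization rule can be read as a small "definitely assigned" analysis
over the AST. The following is only a sketch under simplifying assumptions, not
the code in `parse_stmt.py`: it handles only plain assignments, expression
statements and `if`/`else`, and the names `assigned_after` and `check_reads`
are hypothetical.

```python
import ast

def check_reads(expr, defined):
    """Raise if `expr` reads a name that is not definitely assigned."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id not in defined:
                raise NameError(f"{node.id} may be uninitialized")

def assigned_after(stmts, defined):
    """Return the set of names definitely assigned after running `stmts`."""
    defined = set(defined)
    for stmt in stmts:
        if isinstance(stmt, ast.Assign):
            check_reads(stmt.value, defined)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    defined.add(target.id)
        elif isinstance(stmt, ast.If):
            check_reads(stmt.test, defined)
            then_defs = assigned_after(stmt.body, defined)
            else_defs = assigned_after(stmt.orelse, defined)
            # A name is readable afterwards only if both paths assign it.
            defined = then_defs & else_defs
        elif isinstance(stmt, ast.Expr):
            check_reads(stmt.value, defined)
    return defined

rejected = "if foo():\n    a = 1\na = a + 1\n"
accepted = "if foo():\n    a = 1\nelse:\n    a = 2\na = a + 1\n"
```

Under these assumptions, `assigned_after(ast.parse(accepted).body, {"foo"})`
returns `{"foo", "a"}`, while the same call on `rejected` raises `NameError`
because `a` is assigned on only one path.
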
## Generics
Generics are supported via type variables.
* Generic type variable:
  ```python
  A = TypeVar('A')
  ```
* Bounded type variable (`A` can be `T1`, `T2`, ...):
  ```python
  A = TypeVar('A', T1, T2, ...)
  ```
> Note:
>
> 1. In normal Python, the *bound* of a type variable is about class
>    inheritance. Our type variables, however, are invariant and do not
>    deal with subtyping.
> 2. Type variables cannot contain any type variable in their bound. For
>    example, `B = TypeVar('B', A, T3)` is not allowed.
> 3. The implementation does not check whether the variable name matches the
>    name parameter of `TypeVar`, so the behavior when they differ is
>    unknown. Please keep them identical for now; this would be fixed in a
>    later, more serious implementation.

We require the function to be well typed under every possible substitution of
the type variables.

For generic type variables, you can't really do much with them other than
passing them around as parameters, working with lists of them, etc.

For bounded type variables, if an operation is supported by all the possible
values of the variable, we can use it directly:
```python
A = TypeVar('A', int32, int64)
def add(a: A, b: A) -> A:
    return a + b
```
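
One way to read "well typed under every possible substitution" is as a loop
over the allowed values of the type variable. The sketch below is illustrative
only: types are plain strings rather than the classes in `type_def.py`, and
`BINOP_RESULT` and `check_add` are hypothetical names.

```python
# Hypothetical table: (method, receiver type, argument type) -> result type.
BINOP_RESULT = {
    ("__add__", "int32", "int32"): "int32",
    ("__add__", "int64", "int64"): "int64",
}

def check_add(bound):
    """Check `def add(a: A, b: A) -> A: return a + b` for every value of A."""
    for ty in bound:
        result = BINOP_RESULT.get(("__add__", ty, ty))
        # The body must type check, and the result must match the declared
        # return type A, which is `ty` under the substitution A -> ty.
        if result != ty:
            return False
    return True
```

With this table, `check_add(("int32", "int64"))` is `True`, while
`check_add(("int32", "list[int32]"))` is `False` because the table has no
`__add__` entry for `list[int32]`.
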
If an operation is supported by only some of the possible values, we can use a
type guard:
```python
A = TypeVar('A', int32, list[int32])
def add2(a: int32, b: A) -> int32:
    if type(b) == int32:
        # b is int32 here
        return a + b
    else:
        # b is list[int32] here
        for x in b:
            a = a + x
        return a
```
Note that we only support very simple kinds of type guards in this toy
implementation. More specifically, the type guard has to meet the following
conditions:
1. The if statement must be of the form `type(*) == **` or `type(*) != **`.
For example, `if type(b) == int32 or type(b) == list[int32]` is not allowed.
2. The type of `*` must be a type variable. For example, `list[X]` is not
allowed.
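
The two conditions above could be recognized with a small check on the `if`
test expression. This is a sketch, not the actual implementation: the names
`parse_type_guard` and `guard_of` are hypothetical, `typevar_vars` is assumed
to be the set of variable names whose declared type is a type variable, and
for simplicity the guarded expression is restricted to a plain variable name.

```python
import ast

def parse_type_guard(test, typevar_vars):
    """Return (name, is_eq, type_node) for a supported guard, else None."""
    # Condition 1: exactly one comparison, using == or !=.
    if not isinstance(test, ast.Compare) or len(test.ops) != 1:
        return None
    op = test.ops[0]
    if not isinstance(op, (ast.Eq, ast.NotEq)):
        return None
    # The left-hand side must be a call of the form type(<name>).
    call = test.left
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name) and call.func.id == "type"
            and len(call.args) == 1 and not call.keywords
            and isinstance(call.args[0], ast.Name)):
        return None
    # Condition 2: the guarded variable must have a type variable as its type.
    name = call.args[0].id
    if name not in typevar_vars:
        return None
    return name, isinstance(op, ast.Eq), test.comparators[0]

def guard_of(source, typevar_vars):
    """Parse the source of an `if` condition and classify it."""
    return parse_type_guard(ast.parse(source, mode="eval").body, typevar_vars)
```

For example, `guard_of("type(b) == int32", {"b"})` yields a guard on `b`,
whereas `guard_of("type(b) == int32 or type(b) == list[int32]", {"b"})`
returns `None` because the test is a boolean `or` rather than a single
comparison.
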
### Substitution
> This scheme is actually meant to simplify type inference: we think that
> standard type inference, which solves equations, is a bit hard to reason
> about, and that it is hard to add subtyping to it.

The crucial constraint and assumption in our system is that every
(sub-)expression must have its type fully determined, and that type cannot
depend on statements/expressions after it. Hence, in a function call, every
argument is well typed. We only have to determine the substitution of the type
variables present in the function type signature that makes the types agree.

There is a tiny difference between unification and our implementation. In our
implementation, the substitution is applied only to the type signature of the
called function, not to the type variables that appear in the arguments at the
call site. This way we don't have to make the type variables in the callee
fresh before doing unification.

Consider the following example:
```python
X = TypeVar('X')
def head(a: list[X]) -> X:
    return a[0]
head([1, 2, 3])
```
In this example, the expression `[1, 2, 3]` has type `list[int32]`, so the
algorithm tries to fit `(list[int32])` into `(list[X])`, giving a substitution
`X -> int32`.
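
The fitting step can be sketched as a one-sided match in which only the type
variables of the callee's signature are bound and the argument types are
treated as fixed. This is a minimal sketch, not the code in `inference.py`:
`TVar`, `ListType`, `match` and `find_subst` are hypothetical names, primitive
types are plain strings, and the handling of bounds is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:
    name: str
    bound: tuple = ()  # empty tuple means an unconstrained type variable

@dataclass(frozen=True)
class ListType:
    elem: object

def match(param, arg, subst):
    """Fit `arg` into `param`, extending `subst`; return True on success."""
    if isinstance(param, TVar):
        if param in subst:
            # Already bound by an earlier argument: the types must agree.
            return subst[param] == arg
        if param.bound:
            # Assumption: every type `arg` can stand for must be allowed.
            allowed = arg.bound if isinstance(arg, TVar) else (arg,)
            if not allowed or not set(allowed) <= set(param.bound):
                return False
        subst[param] = arg
        return True
    if isinstance(param, ListType):
        return isinstance(arg, ListType) and match(param.elem, arg.elem, subst)
    return param == arg

def find_subst(params, args):
    """Return a substitution for the callee's type variables, or None."""
    if len(params) != len(args):
        return None
    subst = {}  # a new dictionary per call keeps every invocation independent
    for param, arg in zip(params, args):
        if not match(param, arg, subst):
            return None
    return subst

X = TVar("X")
# The call head([1, 2, 3]): fit (list[int32]) into (list[X]).
head_subst = find_subst([ListType(X)], [ListType("int32")])
```

Here `head_subst` is `{X: "int32"}`. Because the substitution is never applied
to the argument types, a type variable appearing in an argument is just another
fixed type from the matcher's point of view.
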
A substitution can also map a type variable to another type variable. Consider
the following example:
```python
X = TypeVar('X')
Y = TypeVar('Y', int32, int64)
def head(a: list[X]) -> X:
    return a[0]
def sum_of_heads(a: list[Y], b: list[Y]) -> Y:
    return head(a) + head(b)
```
In this example, `a` has type `list[Y]`, so the algorithm gives the
substitution `X -> Y` for the call `head(a)`, and similarly for `head(b)`.

As `Y` can only range over `int32` and `int64`, in the two instances of `Y`
the return statement has type
* `int32 + int32 : int32 : Y` under `Y -> int32`, and
* `int64 + int64 : int64 : Y` under `Y -> int64`.

So the function is well typed.

Note that type variables are fresh in every invocation. Consider the following
example:
```python
I = TypeVar('I', int32, list[int32])
def add(a: int32, b: I) -> int32:
    if type(b) == int32:
        return a + b
    else:
        # b must be list[int32] in this branch.
        for x in b:
            a = add(a, x)
        return a
add(1, [1, 2, 3])
```
This one should type check: `I -> list[int32]` only affects that one call,
and the recursive call inside can substitute `I -> int32`.

Example of a failure case:
```python
A = TypeVar('A')
B = TypeVar('B')
def foo(a: A, b: A):
    pass
def bar(a: A, b: B):
    foo(a, b)
```
This would fail. From the first argument, we have `A -> A`, and from the second
argument we need `A -> B`. In general, we may have `A != B`, so there is no
substitution that meets both requirements and the type check fails.

## Operator Overloading
Most operations are actually implemented via operator overloading.
We currently support:
* Normal:
  * `__init__`
* Comparison:
  * `__lt__`
  * `__le__`
  * `__gt__`
  * `__ge__`
  * `__eq__`
  * `__ne__`
* Arithmetic:
  * `__add__`
  * `__sub__`
  * `__mul__`
  * `__matmul__`
  * `__truediv__`
  * `__floordiv__`
  * `__mod__`
  * `__pow__`
  * `__lshift__`
  * `__rshift__`
  * `__and__`
  * `__or__`
  * `__xor__`
  * `__neg__`
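
As an illustration of how an operator in the Python AST could be resolved to
one of the methods listed above, the sketch below maps the standard `ast`
operator node classes to method names. The tables and the `method_for` helper
are hypothetical and are not taken from `primitives.py`.

```python
import ast

BINOP_METHODS = {
    ast.Add: "__add__", ast.Sub: "__sub__", ast.Mult: "__mul__",
    ast.MatMult: "__matmul__", ast.Div: "__truediv__",
    ast.FloorDiv: "__floordiv__", ast.Mod: "__mod__", ast.Pow: "__pow__",
    ast.LShift: "__lshift__", ast.RShift: "__rshift__",
    ast.BitAnd: "__and__", ast.BitOr: "__or__", ast.BitXor: "__xor__",
}
COMPARE_METHODS = {
    ast.Lt: "__lt__", ast.LtE: "__le__", ast.Gt: "__gt__",
    ast.GtE: "__ge__", ast.Eq: "__eq__", ast.NotEq: "__ne__",
}
UNARY_METHODS = {ast.USub: "__neg__"}

def method_for(node):
    """Return the method name an operator expression maps to, or None."""
    if isinstance(node, ast.BinOp):
        return BINOP_METHODS.get(type(node.op))
    if isinstance(node, ast.UnaryOp):
        return UNARY_METHODS.get(type(node.op))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        return COMPARE_METHODS.get(type(node.ops[0]))
    return None

def method_of_source(source):
    """Convenience wrapper: parse one expression and look up its method."""
    return method_for(ast.parse(source, mode="eval").body)
```

For instance, `method_of_source("a @ b")` gives `"__matmul__"` and
`method_of_source("-a")` gives `"__neg__"`; an expression such as `a and b`
has no corresponding method in the list and gives `None`.
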
## Builtin Functions
* `len(list[X]) -> int32`
* `range(int32) -> list[int32]`