HM style type inference #3

Closed
opened 2021-07-21 11:36:02 +08:00 by pca006132 · 3 comments

I would document the design, current implementation progress, and limitations of the current implementation here.

Type Inference

Types

  • Type variables. Variants below can be considered as metadata attached to the type variable.
    • Unconstrained type variable.
    • Sequence type variable, contains a map from index to type, so it can be unified with either list or tuple type.
    • Record type variable, contains a map from attribute to type, so it can be unified with either object or virtual object type.
  • Concrete types.
    • Object type.
    • Virtual object type.
    • List type.
    • Tuple type.
    • (Polymorphic) Function type.
    • Call type, contains a list of function call arguments and return type. This can be later unified with a function type.

One can limit the possible valuation of type variables.

Unification

(This is only a high-level overview of it)

  • 2 type variables: Change and verify/unify metadata. Error if metadata does not match or variants are not compatible (e.g. unifying a sequence with a call)
  • Type variable and a concrete type: Verify/unify the metadata on the type variable and point the type variable to the concrete type. (e.g. unify the fields of a record type variable with the fields of an object)
  • 2 concrete types: Recursively unify their fields. Error if they are not the same type (e.g. unifying a list with a tuple)

Note that when we unify a call with a function, we would instantiate the type variables in the function to fresh type variables first.

Some example typing rules:

  • expr1 = expr2: unify the type of expr1 and expr2.
  • foo.bar: Create a fresh tyandpe variable x, unify the type of foo to record type variable {bar: x}, and assign type x to foo.bar.

Control Flow

(for nac3embedded)

  1. CPython loads the modules and registers those with @kernel/@portable decorator.
  2. nac3embedded get the module information (module object and module path) and put them in a work queue for parallel loading/parsing.
  3. When all modules are parsed, class and function declarations are analyzed and put stored in a map from CPython object ID to type. (for query later)
  4. The function nodes are put in a work queue for parallel type inference.
    • nac3core would only analyze the body of the function. nac3embedded should provide the parameter/return types and a list of bounded type variables.
    • When an unknown identifier is encountered by nac3core, it would query nac3embedded by sending it the AST node/string. nac3embedded should respond by querying the CPython ID oandf the identifier and return the type from the aforementioned mapping. (this is the symbol resolver)
    • At the end of the type inference (for each function), nac3core would revisit the function AST to check that:
      1. Each type variable is either resolved to a concrete type or a bounded type variable (those that occurred in the function/class declaration).
      2. There is no use-before-def for variables
      3. the function would not exit without returning (if it would return something).
  5. nac3core would provide a list of function invocations within a function (possibly with bounded type variables) after the above analysis. It can use this to figure out what functions to generate later.

Implementation Progress

  • Unification
  • Typing rules
  • Checks after function inference.
  • nac3embedded related code.

Limitations

The current implementation lacks certain features:

  • Better subtyping support for virtual. We require an explicit cast for some more complicated cases.
  • Type guard.
  • Pseudocomment related features (unrolling, suppress warning for unused fields).
  • Lists would not have methods/fields, objects would not be callable. (this is a limitation due to the metadata, this could be relaxed if needed)
  • Error messages would be the bare minimum, would require modification later.
  • Type variable range does not support things like List[int], List[bool], instead we require something like List[A] where the range of A is int, bool. This should not be a problem as we can implement this by normalizating user input, but error messages may be complicated.

Some features would be implemented on the nac3embedded side, probably, so I can't say for now.

I would document the design, current implementation progress, and limitations of the current implementation here. ## Type Inference ### Types - Type variables. Variants below can be considered as metadata attached to the type variable. - Unconstrained type variable. - Sequence type variable, contains a map from index to type, so it can be unified with either list or tuple type. - Record type variable, contains a map from attribute to type, so it can be unified with either object or virtual object type. - Concrete types. - Object type. - Virtual object type. - List type. - Tuple type. - (Polymorphic) Function type. - Call type, contains a list of function call arguments and return type. This can be later unified with a function type. One can limit the possible valuation of type variables. ### Unification (This is only a high-level overview of it) - 2 type variables: Change and verify/unify metadata. Error if metadata does not match or variants are not compatible (e.g. unifying a sequence with a call) - Type variable and a concrete type: Verify/unify the metadata on the type variable and point the type variable to the concrete type. (e.g. unify the fields of a record type variable with the fields of an object) - 2 concrete types: Recursively unify their fields. Error if they are not the same type (e.g. unifying a list with a tuple) Note that when we unify a call with a function, we would instantiate the type variables in the function to fresh type variables first. Some example typing rules: - `expr1 = expr2`: unify the type of `expr1` and `expr2`. - `foo.bar`: Create a fresh tyandpe variable `x`, unify the type of `foo` to record type variable `{bar: x}`, and assign type `x` to `foo.bar`. ### Control Flow (for nac3embedded) 1. CPython loads the modules and registers those with `@kernel`/`@portable` decorator. 2. nac3embedded get the module information (module object and module path) and put them in a work queue for parallel loading/parsing. 3. When all modules are parsed, class and function declarations are analyzed and put stored in a map from CPython object ID to type. (for query later) 4. The function nodes are put in a work queue for parallel type inference. - nac3core would only analyze the body of the function. nac3embedded should provide the parameter/return types and a list of bounded type variables. - When an unknown identifier is encountered by nac3core, it would query nac3embedded by sending it the AST node/string. nac3embedded should respond by querying the CPython ID oandf the identifier and return the type from the aforementioned mapping. (this is the symbol resolver) - At the end of the type inference (for each function), nac3core would revisit the function AST to check that: 1. Each type variable is either resolved to a concrete type or a bounded type variable (those that occurred in the function/class declaration). 2. There is no use-before-def for variables 3. the function would not exit without returning (if it would return something). 5. nac3core would provide a list of function invocations within a function (possibly with bounded type variables) after the above analysis. It can use this to figure out what functions to generate later. ## Implementation Progress - [x] Unification - [x] Typing rules - [x] Checks after function inference. - [ ] nac3embedded related code. ## Limitations The current implementation lacks certain features: - Better subtyping support for `virtual`. We require an explicit cast for some more complicated cases. - Type guard. - Pseudocomment related features (unrolling, suppress warning for unused fields). - Lists would not have methods/fields, objects would not be callable. (this is a limitation due to the metadata, this could be relaxed if needed) - Error messages would be the bare minimum, would require modification later. - Type variable range does not support things like `List[int], List[bool]`, instead we require something like `List[A]` where the range of `A` is `int, bool`. This should not be a problem as we can implement this by normalizating user input, but error messages may be complicated. Some features would be implemented on the nac3embedded side, probably, so I can't say for now.

Todo:

  • CPython integration:
    • Parse top-level class/function definitions.
    • Implement with, raise and try except.
    • Iterate over possible instantiations of bounded type variables (only those with finite range)
    • Object index to type mapping.
    • Rigid type variable.
  • Code generation.
    • RPC Interface?
  • Misc.
    • Primitive definitions (magic methods)
    • Replace is_concrete with a recursive check to check for unbounded type variables. (but need tests)
    • Better error messages.
    • Add more unit tests (could wait after adding top-level definition parser).
Todo: - [ ] CPython integration: - [ ] Parse top-level class/function definitions. - [ ] Implement `with`, `raise` and `try except`. - [ ] Iterate over possible instantiations of bounded type variables (only those with finite range) - [ ] Object index to type mapping. - [ ] Rigid type variable. - [ ] Code generation. - [ ] RPC Interface? - [ ] Misc. - [x] Primitive definitions (magic methods) - [x] Replace `is_concrete` with a recursive check to check for *unbounded* type variables. (but need tests) - [ ] Better error messages. - [ ] Add more unit tests (could wait after adding top-level definition parser).

Should we close this and break it down into smaller Issues?

Should we close this and break it down into smaller Issues?

Should we close this and break it down into smaller Issues?

OK

> Should we close this and break it down into smaller Issues? OK
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#3
There is no content yet.