escape analysis #51

Open
opened 2021-10-03 17:50:15 +08:00 by sb10q · 4 comments
https://github.com/m-labs/artiq/issues/1678#issuecomment-932899551
pca006132 was assigned by sb10q 2021-10-03 17:50:15 +08:00

iirc the plan is to implement it later, when other features are finished.

and there is another plan about using region-based memory management, and we do not need escape analysis if we use that.

iirc the plan is to implement it later, when other features are finished. and there is another plan about using region-based memory management, and we do not need escape analysis if we use that.
sb10q added the
high-priority
label 2021-12-21 18:52:01 +08:00
sb10q added this to the Beta milestone 2022-02-28 10:53:49 +08:00
sb10q modified the milestone from Beta to Alpha 2022-03-31 10:28:47 +08:00

Approach: We are doing a field-sensitive intraprocedural dataflow analysis on each function to determine the lifetime of each values, and whether certain assignment/return operations are illegal based on the lifetime information. There are currently 5 different lifetimes: static, non-local, local (precise), local (imprecise) and unknown.

Lifetime Kind Static Non-local Local (precise) Local (imprecise) Unknown
Description Lifetime of global variables Lifetime of function parameters Lifetime of local data in which we can track precisely (not stored in a list) Local data that we cannot track precisely Lifetime of data which might be local/non-local/global
Can be returned? Yes Yes No No No
Can be assigned to fields of Any object Local objects Local objects Local objects Local objects

We will only track the lifetime of fields of local objects, because we cannot assign objects with non-static lifetime to other objects. The distinction between precise and imprecise is to improve the precision of lifetime tracking: for precise objects, we can use strong update (replace), but we need weak update (unification) for imprecise objects. Note that when a value is unified with imprecise local lifetime, all its fields will be downgraded from precise local lifetime to imprecise local lifetime.

Unification rules:

  1. Local (precise) unifies with another Local (precise/imprecise): Local (imprecise)
  2. X unifies with X: X
  3. Static unifies with Non-local: Non-local
  4. Otherwise: Unknown

Function call handling: We treat the result of function calls to have lifetime unknown (except constructors, which the return value has lifetime precise local), because the function may return one of its parameters (which may have any lifetime). Because users are allowed to assign global variables to parameter fields, we will unify each field (transitively) of each parameter with static lifetime to account for this possibility. If the function's return type is T, values of type T reachable from the function parameters will be downgraded from precise local to imprecise local, because it is possible that it will alias with the function's return value.

Approach: We are doing a field-sensitive intraprocedural dataflow analysis on each function to determine the lifetime of each values, and whether certain assignment/return operations are illegal based on the lifetime information. There are currently 5 different lifetimes: static, non-local, local (precise), local (imprecise) and unknown. | Lifetime Kind | Static | Non-local | Local (precise) | Local (imprecise) | Unknown | | ---- | ---- | ---- | ---- | ---- | ---- | | Description | Lifetime of global variables | Lifetime of function parameters | Lifetime of local data in which we can track precisely (not stored in a list) | Local data that we cannot track precisely | Lifetime of data which might be local/non-local/global | | Can be returned? | Yes | Yes | No | No | No | | Can be assigned to fields of | Any object | Local objects | Local objects | Local objects | Local objects | We will only track the lifetime of fields of local objects, because we cannot assign objects with non-static lifetime to other objects. The distinction between precise and imprecise is to improve the precision of lifetime tracking: for precise objects, we can use strong update (replace), but we need weak update (unification) for imprecise objects. Note that when a value is unified with imprecise local lifetime, all its fields will be downgraded from precise local lifetime to imprecise local lifetime. Unification rules: 1. Local (precise) unifies with another Local (precise/imprecise): Local (imprecise) 2. X unifies with X: X 3. Static unifies with Non-local: Non-local 4. Otherwise: Unknown Function call handling: We treat the result of function calls to have lifetime unknown (except constructors, which the return value has lifetime precise local), because the function may return one of its parameters (which may have any lifetime). Because users are allowed to assign global variables to parameter fields, we will unify each field (transitively) of each parameter with static lifetime to account for this possibility. If the function's return type is `T`, values of type `T` reachable from the function parameters will be downgraded from precise local to imprecise local, because it is possible that it will alias with the function's return value.

Plan: Support optional lifetime annotation for functions and object fields, and will be enforced by the escape analyzer. Functions will be treated as transfer functions that transform input parameter lifetimes with an additional return value. Lifetime variables can be treated as non-local lifetime, with additional capabilities: an object with lifetime X can be assigned to a field annotated with lifetime X, and unification with other lifetime variables/non-local lifetime will cause it to become a normal non-local lifetime. Example (the syntax is not finalized yet):

class Foo:
    a: Bar         // nac3: @lifetime("x")
    b: list[int32] // nac3: @lifetime("y")

some_static_list = [1, 2, 3]

// nac3: @lifetime("('a['b, 'c], 'd) -> ('a['d, 'static], 'd)")
def foo(obj: Foo, bar: Bar):
    obj.a = bar
    obj.b = some_static_list

Note that the following lifetime annotation is incorrect because 'static is different from 'c

// nac3: @lifetime("('a['b, 'c]) -> ('a['b, 'c])")
def foo2(obj: Foo):
    obj.b = some_static_list

And we will downgrade values with precise local lifetime to imprecise local lifetime according to the transfer function: if its type occurs in the transfer function output multiple times with compatible lifetime (same lifetime variable, non-local lifetime that are not another lifetime variable, or unknown lifetime).

Plan: Support optional lifetime annotation for functions and object fields, and will be enforced by the escape analyzer. Functions will be treated as transfer functions that transform input parameter lifetimes with an additional return value. Lifetime variables can be treated as non-local lifetime, with additional capabilities: an object with lifetime `X` can be assigned to a field annotated with lifetime `X`, and unification with other lifetime variables/non-local lifetime will cause it to become a normal non-local lifetime. Example (the syntax is not finalized yet): ```python class Foo: a: Bar // nac3: @lifetime("x") b: list[int32] // nac3: @lifetime("y") some_static_list = [1, 2, 3] // nac3: @lifetime("('a['b, 'c], 'd) -> ('a['d, 'static], 'd)") def foo(obj: Foo, bar: Bar): obj.a = bar obj.b = some_static_list ``` Note that the following lifetime annotation is incorrect because `'static` is different from `'c` ```python // nac3: @lifetime("('a['b, 'c]) -> ('a['b, 'c])") def foo2(obj: Foo): obj.b = some_static_list ``` And we will downgrade values with precise local lifetime to imprecise local lifetime according to the transfer function: if its type occurs in the transfer function output multiple times with compatible lifetime (same lifetime variable, non-local lifetime that are not another lifetime variable, or unknown lifetime).
Collaborator

That seems like a pretty nice tradeoff!

Here are some remarks off the top of my head:

Abstract Interpretation

I see that you are using the term "unification" a lot. Here I assume (hope) that lifetime information is not actually part of types, but some extra information that would be computed and checked after type checking, right? If so, unification might not be the best framework to work in.

I understand What you have as a (degenerate) abstract interpretation semi-lattice of the form (pardon my ASCII art):

            Local (precise)
              |
Static      Local (imprecise)
  |           |
Non-local    /
      \     /
      Unknown

And what you want to do is compute a fixed point of some annotations of each value in created/accessible in the current function.

Several things come to mind from this interpretation:

First, it seems we could easily make the lattice more interesting by replacing the Local (precise) and Local (imprecise) levels by the sublattice of sets of precise local value, possibly including all local values (the bottom of the sublattice, which would be equivalent to the old Local (imprecise)).

HOWEVER, this is just a refinement that may come later if the current system turns out to be too crude, and we don't need to do it now.

Second, what you mean by unification here would be to take the meet of elements in the lattice. This is valid for all places that are in output (i.e., positive) position. But in case of input positions, what one should choose instead would be the join.

I don't know whether this is relevant in the current system. It would certainly be if we supported first-class functions. Imagine something like f if ... else g where f takes parameters of static lifetimes as input, and g takes parameters of unknown lifetimes as input. It is wrong to take their meet here (unknown), because it may result in f being passed something that's not static, which would be unsound.

You can construct a similar problem with mutation. If l0 is a list supposed to contain global values and l1 is one supposed to contain unknown values, then (l0 if ... else l1)[0] = some_unknown_value will also be wrong. In the case of mutable fields/list-slots, what we really need is two lifetimes – one input lifetime and one output lifetime – and taking the meet of two such objects should take the meet of the output lifetimes but the join of the input lifetimes.

Memory Management

How does memory management work, exactly? Is everything stack allocated? If so, I don't currently see a poblem with the approach. But if we introduced dynamic allocations at some point, things could get tricky, for example when analayzing lcaol_obj.x = dyn_alloc(SomeClass()) if ... else some_static_value. (How would we tell which value should be deallocated and when?)

Lifetime Polymorphism

Lifetime variables can be treated as non-local lifetime, with additional capabilities: an object with lifetime X can be assigned to a field annotated with lifetime X, and unification with other lifetime variables/non-local lifetime will cause it to become a normal non-local lifetime.

I suppose you're talking about those lifetime variables we got from instantiating a function's lifetime signature? Because within the body of a function with lifetime variables declared, these variables should probably be treated as rigid (as in: they cannot be unified with anything else than themselves), right?.

class Foo:
    a: Bar         // nac3: @lifetime("x")
    b: list[int32] // nac3: @lifetime("y")

What do x and y refer to, here? Where are they bound? Are they lifetime variables of Foo? If so, why isn't Foo declared generic in them?

That seems like a pretty nice tradeoff! Here are some remarks off the top of my head: ### Abstract Interpretation I see that you are using the term "unification" a lot. Here I assume (hope) that lifetime information is not actually part of types, but some extra information that would be computed and checked after type checking, right? If so, unification might not be the best framework to work in. I understand What you have as a (degenerate) abstract interpretation semi-lattice of the form (pardon my ASCII art): ``` Local (precise) | Static Local (imprecise) | | Non-local / \ / Unknown ``` And what you want to do is compute a fixed point of some annotations of each value in created/accessible in the current function. Several things come to mind from this interpretation: First, it seems we could easily make the lattice more interesting by replacing the `Local (precise)` and `Local (imprecise)` levels by the sublattice of sets of precise local value, possibly including _all_ local values (the bottom of the sublattice, which would be equivalent to the old `Local (imprecise)`). HOWEVER, this is just a refinement that may come later if the current system turns out to be too crude, and we don't need to do it now. Second, what you mean by unification here would be to take the _meet_ of elements in the lattice. This is valid for all places that are in output (i.e., positive) position. But in case of input positions, what one should choose instead would be the _join_. I don't know whether this is relevant in the current system. It would certainly be if we supported first-class functions. Imagine something like `f if ... else g` where `f` takes parameters of static lifetimes as input, and `g` takes parameters of unknown lifetimes as input. It is wrong to take their meet here (unknown), because it may result in `f` being passed something that's _not_ static, which would be unsound. You can construct a similar problem with mutation. If `l0` is a list supposed to contain global values and `l1` is one supposed to contain unknown values, then `(l0 if ... else l1)[0] = some_unknown_value` will also be wrong. In the case of mutable fields/list-slots, what we really need is _two_ lifetimes – one input lifetime and one output lifetime – and taking the meet of two such objects should take the meet of the output lifetimes but the _join_ of the input lifetimes. ### Memory Management How does memory management work, exactly? Is everything stack allocated? If so, I don't currently see a poblem with the approach. But if we introduced dynamic allocations at some point, things could get tricky, for example when analayzing `lcaol_obj.x = dyn_alloc(SomeClass()) if ... else some_static_value`. (How would we tell which value should be deallocated and when?) ### Lifetime Polymorphism > Lifetime variables can be treated as non-local lifetime, with additional capabilities: an object with lifetime X can be assigned to a field annotated with lifetime X, and unification with other lifetime variables/non-local lifetime will cause it to become a normal non-local lifetime. I suppose you're talking about those lifetime variables we got from instantiating a function's lifetime signature? Because within the body of a function with lifetime variables declared, these variables should probably be treated as rigid (as in: they cannot be unified with anything else than themselves), right?. ```python class Foo: a: Bar // nac3: @lifetime("x") b: list[int32] // nac3: @lifetime("y") ``` What do `x` and `y` refer to, here? Where are they bound? Are they lifetime variables of `Foo`? If so, why isn't `Foo` declared generic in them?
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#51
There is no content yet.