add comment support to rustpython parser #20

Closed
opened 2021-09-24 10:51:41 +08:00 by sb10q · 8 comments
There is no content yet.
ychenfo was assigned by sb10q 2021-09-24 10:51:41 +08:00
Collaborator

The new branch with_nac3comment contains modifications to use the moded python parser to support nac3comment in ast.

The current approach is to add two more string fields to the ast::StmtKind::AnnAssign and ast::StmtKind::For which contains the nac3 comment. Not much is needed to modify the current nac3 code.

I do not seem to have the permission to push to the RustPython repository in m-labs, so currently I just temporarily forked a copy of rustpython and use it in the branch with_nac3comment, will modify later.

The new branch [with_nac3comment](https://git.m-labs.hk/M-Labs/nac3/src/branch/with_nac3comment) contains modifications to use the moded python parser to support nac3comment in ast. The current approach is to add two more string fields to the `ast::StmtKind::AnnAssign` and `ast::StmtKind::For` which contains the nac3 comment. Not much is needed to modify the current nac3 code. I do not seem to have the permission to push to the RustPython repository in m-labs, so currently I just temporarily forked a copy of rustpython and use it in the branch [with_nac3comment](https://git.m-labs.hk/M-Labs/nac3/src/branch/with_nac3comment), will modify later.
Poster
Owner

I do not seem to have the permission to push to the RustPython repository in m-labs

Just ask - fixed.

> I do not seem to have the permission to push to the RustPython repository in m-labs Just ask - fixed.
Poster
Owner

Can we avoid NAC3 references in the parser? Hopefully this comment support would be generic enough and perhaps even merged upstream.

Can we avoid NAC3 references in the parser? Hopefully this comment support would be generic enough and perhaps even merged upstream.
Collaborator

The updated with_nac3comment branch contains updates related to previous discussion, including:

  • avoid NAC3 references in the parser
  • can custom-define the prefix of the pseudo-comment to be parsed
  • all ast::StmtKind can have pseudo-comment, on top of them(correctly indented) or on the right hand side of the first line of them. And multiple pseduo-comments are supported:
# nac3: while1                   <- included in the ast::StmtKind::While node
# nac3: while2                   <- included in the ast::StmtKind::While node
# normal comment                 <- NOT included in the ast::StmtKind::While node, discard
while test: # nac3: while3       <- included in the ast::StmtKind::While node
    # nac3: simple assign0       <- included in the ast::StmtKind::Assign node
    a = 3 # nac3: simple assign1 <- included in the ast::StmtKind::Assign node

will give this ast:

Located {
    location: Location {
        row: 1,
        column: 1,
    },
    custom: (),
    node: While {
        test: Located {
            location: Location {
                row: 4,
                column: 7,
            },
            custom: (),
            node: Name {
                id: "test",
                ctx: Load,
            },
        },
        body: [
            Located {
                location: Location {
                    row: 5,
                    column: 5,
                },
                custom: (),
                node: Assign {
                    targets: [
                        Located {
                            location: Location {
                                row: 6,
                                column: 5,
                            },
                            custom: (),
                            node: Name {
                                id: "a",
                                ctx: Load,
                            },
                        },
                    ],
                    value: Located {
                        location: Location {
                            row: 6,
                            column: 9,
                        },
                        custom: (),
                        node: Constant {
                            value: Int(
                                3,
                            ),
                            kind: None,
                        },
                    },
                    type_comment: None,
                    config_comment: [
                        "simple assign0",
                        "simple assign1",
                    ],
                },
            },
        ],
        orelse: [],
        config_comment: [
            "while1",
            "while2",
            "while3",
        ],
    },
}
  • simple statements in the same line (if 1: a = a + 1) and semicolons (a + 1; print(a)) supported with pseudocomment
  • previous weird behavior caused by simple statements and ; is fixed
if 1: # nac3: this should attach to `if` can the whole program should not parse
a + 1

# the above is however equivalent to this in the previous version:

if 1:
    # nac3: this should attach to `if` can the whole program should not parse
    a + 1

now the first one is not allowed

The updated [with_nac3comment](https://git.m-labs.hk/M-Labs/nac3/src/branch/with_nac3comment) branch contains updates related to previous discussion, including: - avoid NAC3 references in the parser - can custom-define the prefix of the pseudo-comment to be parsed - all `ast::StmtKind` can have pseudo-comment, on top of them(correctly indented) or on the right hand side of the first line of them. And multiple pseduo-comments are supported: ```python # nac3: while1 <- included in the ast::StmtKind::While node # nac3: while2 <- included in the ast::StmtKind::While node # normal comment <- NOT included in the ast::StmtKind::While node, discard while test: # nac3: while3 <- included in the ast::StmtKind::While node # nac3: simple assign0 <- included in the ast::StmtKind::Assign node a = 3 # nac3: simple assign1 <- included in the ast::StmtKind::Assign node ``` will give this ast: ``` Located { location: Location { row: 1, column: 1, }, custom: (), node: While { test: Located { location: Location { row: 4, column: 7, }, custom: (), node: Name { id: "test", ctx: Load, }, }, body: [ Located { location: Location { row: 5, column: 5, }, custom: (), node: Assign { targets: [ Located { location: Location { row: 6, column: 5, }, custom: (), node: Name { id: "a", ctx: Load, }, }, ], value: Located { location: Location { row: 6, column: 9, }, custom: (), node: Constant { value: Int( 3, ), kind: None, }, }, type_comment: None, config_comment: [ "simple assign0", "simple assign1", ], }, }, ], orelse: [], config_comment: [ "while1", "while2", "while3", ], }, } ``` - simple statements in the same line (`if 1: a = a + 1`) and semicolons (`a + 1; print(a)`) supported with pseudocomment - previous weird behavior caused by simple statements and `;` is fixed ```python if 1: # nac3: this should attach to `if` can the whole program should not parse a + 1 # the above is however equivalent to this in the previous version: if 1: # nac3: this should attach to `if` can the whole program should not parse a + 1 ``` now the first one is not allowed

what do you mean by

this should attach to if can the whole program should not parse

?

what do you mean by > this should attach to `if` can the whole program should not parse ?
Collaborator

what do you mean by

this should attach to if can the whole program should not parse

?

sorry, please just ignore the content of the previous comment. Let me explain again:

I mean for this piece of code:

if 1: # nac3: some comment
a + 1

b = 1; # nac3: some other comment
c = b

it should not parse, but the previous version will accept this and regard the first comment as written for the a + 1, and the second comment as written for c = b which is equivalent to:

if 1:
    # nac3: some comment
    a + 1

b = 1
# nac3: some other comment
c = b

And in the latest version, this is fixed:

if 1: # nac3: some comment
a + 1

b = 1; # nac3: some other comment
c = b

this code will not parse

> what do you mean by > > this should attach to `if` can the whole program should not parse > > ? sorry, please just ignore the content of the previous comment. Let me explain again: I mean for this piece of code: ```python if 1: # nac3: some comment a + 1 b = 1; # nac3: some other comment c = b ``` it should not parse, but the previous version will accept this and regard the first comment as written for the `a + 1`, and the second comment as written for `c = b` which is equivalent to: ```python if 1: # nac3: some comment a + 1 b = 1 # nac3: some other comment c = b ``` And in the latest version, this is fixed: ```python if 1: # nac3: some comment a + 1 b = 1; # nac3: some other comment c = b ``` this code will not parse

I mean for this piece of code:

if 1: # nac3: some comment
a + 1

b = 1; # nac3: some other comment
c = b

it should not parse

I don't think it should parse due to indentation error? You mean

if 1: #nac3: some comment
    a + 1


b = 1; # nac3: some other comment
c = b

should not parse right? But why can't we associate the first comment to the if statement and the second comment to the b = 1 statement?

> I mean for this piece of code: > ```python > if 1: # nac3: some comment > a + 1 > > b = 1; # nac3: some other comment > c = b > ``` > it should not parse I don't think it should parse due to indentation error? You mean ```python if 1: #nac3: some comment a + 1 b = 1; # nac3: some other comment c = b ``` should not parse right? But why can't we associate the first comment to the if statement and the second comment to the `b = 1` statement?
Collaborator

But why can't we associate the first comment to the if statement and the second comment to the b = 1 statement?

Thanks for the suggestion, I have added this support in the latest version

now

b = 1; # nac3: some other comment
c = b

would parse to:

Located {
    location: Location {
        row: 1,
        column: 1,
    },
    custom: (),
    node: Assign {
        targets: [
            Located {
                location: Location {
                    row: 1,
                    column: 1,
                },
                custom: (),
                node: Name {
                    id: "b",
                    ctx: Load,
                },
            },
        ],
        value: Located {
            location: Location {
                row: 1,
                column: 5,
            },
            custom: (),
            node: Constant {
                value: Int(
                    1,
                ),
                kind: None,
            },
        },
        type_comment: None,
        config_comment: [
            "some other comment",
        ],
    },
},
Located {
    location: Location {
        row: 2,
        column: 1,
    },
    custom: (),
    node: Assign {
        targets: [
            Located {
                location: Location {
                    row: 2,
                    column: 1,
                },
                custom: (),
                node: Name {
                    id: "c",
                    ctx: Load,
                },
            },
        ],
        value: Located {
            location: Location {
                row: 2,
                column: 5,
            },
            custom: (),
            node: Name {
                id: "b",
                ctx: Load,
            },
        },
        type_comment: None,
        config_comment: [],
    },
}
> But why can't we associate the first comment to the if statement and the second comment to the `b = 1` statement? Thanks for the suggestion, I have added this support in the [latest version](https://github.com/m-labs/RustPython/commit/0acbe3642a8cb9b8c99a67904d40efda7c01e88f) now ```python b = 1; # nac3: some other comment c = b ``` would parse to: ``` Located { location: Location { row: 1, column: 1, }, custom: (), node: Assign { targets: [ Located { location: Location { row: 1, column: 1, }, custom: (), node: Name { id: "b", ctx: Load, }, }, ], value: Located { location: Location { row: 1, column: 5, }, custom: (), node: Constant { value: Int( 1, ), kind: None, }, }, type_comment: None, config_comment: [ "some other comment", ], }, }, Located { location: Location { row: 2, column: 1, }, custom: (), node: Assign { targets: [ Located { location: Location { row: 2, column: 1, }, custom: (), node: Name { id: "c", ctx: Load, }, }, ], value: Located { location: Location { row: 2, column: 5, }, custom: (), node: Name { id: "b", ctx: Load, }, }, type_comment: None, config_comment: [], }, } ```
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: M-Labs/nac3#20
There is no content yet.