Hi!
I’m developing a plugin that adds support for a new language. I tried writing a grammar for it a few times but failed, so I took an existing ANTLR4 grammar and used antlr4-intellij-adaptor to plug it into the plugin.
This is working quite well, but that grammar was made for a compiler, not for IDE use. This means the lexer only creates tokens for the parts the compiler requires, resulting in larger and less meaningful tokens. For example, whitespace is included in other token types, and no token is created for parentheses, brackets, etc.
I would like to improve the PSI tree that is built upon that grammar, by splitting tokens into smaller ones. That would not modify the file content, only improve its PSI structure.
E.g. having PsiWhiteSpace PsiElement('abc') instead of PsiElement(' abc').
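To make the goal concrete, here is a minimal sketch of the intended split, in plain Java with no IntelliJ or ANTLR dependency. TokenSplitter, Piece and the "WHITE_SPACE" type name are hypothetical names made up for illustration; they are not part of antlr4-intellij-adaptor or the platform API. It only shows how one coarse token would be cut into a whitespace piece and a remainder while keeping the same file offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: not an IntelliJ or ANTLR class.
final class TokenSplitter {

    // One fragment of a token: a type name plus its [start, end) offsets in the file.
    static final class Piece {
        final String type;
        final int start;
        final int end;

        Piece(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    // Splits leading whitespace off a token's text.
    // The pieces cover exactly the same character range as the original token,
    // so the file content is left untouched.
    static List<Piece> split(String text, int startOffset, String originalType) {
        List<Piece> pieces = new ArrayList<>();
        int i = 0;
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        if (i > 0) {
            // "WHITE_SPACE" is an assumed name standing in for a whitespace token type.
            pieces.add(new Piece("WHITE_SPACE", startOffset, startOffset + i));
        }
        if (i < text.length()) {
            pieces.add(new Piece(originalType, startOffset + i, startOffset + text.length()));
        }
        return pieces;
    }
}
```

For a token ' abc' starting at offset 10, this would yield a whitespace piece covering offsets 10 to 11 and a piece of the original type covering 11 to 14. The open question is where in the lexer/parser/PSI pipeline such a split can safely be applied.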
I’m trying to find the right place to do that:
- Modify the token stream in the ANTLRLexerAdaptor: it seems straightforward, but wouldn’t it then mess with the ANTLR parser, which would not accept the new tokens in these places?
- Modify ANTLRParserAdaptor to create the right PsiBuilder.Markers: this seems like rewriting the whole class, due to the tight coupling of the parser, lexer and PSI builder.
- Modify the AST built by PsiParser.parse(IElementType, PsiBuilder): this seems doable, but it may be error-prone to modify the tree while walking it, with all those links between nodes.
- Modify the PSI structure by hooking into ParserDefinition.createElement(ASTNode): I remember reading in the documentation that each PSI element should be backed by an AST node, so this wouldn’t be the right place?
Any advice on the best way to do that? From what I have researched, I think the best option is the third one, modifying the AST. Is there any tooling that makes this easier or safer?
Thank you!