Additional Context: LeafPatcher, Self-Injection, and Potential Workarounds
After further investigation, I wanted to add some details that may help pinpoint the right solution.
The exact mechanism: LeafPatcher.patch()
The decoded text is lost at a very specific point in the pipeline. After parsing succeeds on the decodedChars buffer, InjectionRegistrarImpl.parseFile() calls patchLeaves(), which invokes LeafPatcher.patch() to rewrite every leaf node’s text range so that the injected PSI tree’s text matches the DocumentWindow text (which mirrors the host document, escapes included). Before patching, identNode.getText() returns A; after patching, it returns \101. The decoded buffer used during parsing is effectively discarded from the PSI layer at this point.
This is a self-injection scenario
It’s worth noting that our use case is somewhat unusual: we inject our language back into itself (the host and injected language are the same). The interpolated expressions inside "%expr%" reference symbols defined in the host file’s scope.
In typical injection scenarios (e.g., SQL into Java strings, regex into Python), the injected language resolves references against its own symbol space, so the host’s escape encoding never interferes with name matching. In our case, the injected PSI’s GclFieldRef.getText() must match field names from the host GCL file — which is where the patched (escaped) text causes resolution failures.
This likely explains why this issue hasn’t surfaced more broadly — most plugins never encounter it.
Kotlin’s approach avoids the problem structurally
KotlinStringLiteralTextEscaper (in psi-api) builds a sourceOffsets array during decode() for correct offset mapping. However, Kotlin sidesteps the text identity problem entirely because template expressions (${expr}) are separate PSI children of KtStringTemplateExpression — they are never decoded from escape sequences. The identifiers inside ${} exist as first-class PSI nodes with their own literal text, so LeafPatcher never rewrites them.
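To make the offset bookkeeping concrete, here is a minimal, platform-free sketch of a decoder that records a source-offset array while decoding. It is not Kotlin's implementation and uses no IntelliJ classes. The class name EscapeDecodeSketch, the Decoded holder, and the escape syntax (a backslash followed by up to three octal digits, inferred only from the \101 example) are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical, platform-free model of an escaper's decode step.
// Assumption: escapes are a backslash followed by 1-3 octal digits (as in \101 -> A).
final class EscapeDecodeSketch {

    /** Decoded text plus, for each decoded index, the host offset it came from. */
    static final class Decoded {
        public final String text;
        public final int[] sourceOffsets; // length == text.length() + 1

        Decoded(String text, int[] sourceOffsets) {
            this.text = text;
            this.sourceOffsets = sourceOffsets;
        }
    }

    public static Decoded decode(String host) {
        StringBuilder out = new StringBuilder();
        // The decoded text is never longer than the host text, so this is large enough.
        int[] offsets = new int[host.length() + 1];
        int i = 0;
        while (i < host.length()) {
            // Record where in the host text the next decoded character starts.
            offsets[out.length()] = i;
            char c = host.charAt(i);
            if (c == '\\' && i + 1 < host.length() && isOctal(host.charAt(i + 1))) {
                int j = i + 1;
                int value = 0;
                // Consume at most three octal digits after the backslash.
                while (j < host.length() && j < i + 4 && isOctal(host.charAt(j))) {
                    value = value * 8 + (host.charAt(j) - '0');
                    j++;
                }
                out.append((char) value);
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        // Sentinel entry: the end of the decoded text maps to the end of the host text.
        offsets[out.length()] = host.length();
        return new Decoded(out.toString(), Arrays.copyOf(offsets, out.length() + 1));
    }

    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }
}
```

For the host text \101bc this yields the decoded text Abc, with decoded index 0 mapping to host offset 0 and decoded index 1 mapping to host offset 4. That mapping is enough for translating offsets, but it does not by itself preserve the decoded characters once leaf text has been rewritten to the host range.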
Workaround limitations
We investigated whether the decoded text is recoverable after injection:
- element.getText() → returns host text (\101), because leaf nodes are patched to the DocumentWindow
- element.getContainingFile().getViewProvider().getDocument().getText(element.getTextRange()) → also returns \101, because getTextRange() is remapped to the DocumentWindow, which mirrors the host document
- The original decodedChars buffer is passed to VirtualFileWindowImpl for parsing but is not accessible through any public API after LeafPatcher rewrites the tree
As far as we can tell, the decoded text is effectively lost once patchLeaves() completes. There is no PsiElement or Document API that returns the pre-patched (decoded) content.
This means our only current option is to re-decode the text manually during reference resolution, using the LiteralTextEscaper obtained from the host element. That duplicates work the platform has already done and feels fragile.
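The shape of that manual re-decoding step, reduced to plain strings, is roughly the following. This is a sketch under assumptions, not our actual plugin code: RedecodeWorkaroundSketch, redecode, and refersTo are hypothetical names, the escape syntax is again assumed to be a backslash plus up to three octal digits, and in the real plugin the decoding would be delegated to the host's LiteralTextEscaper rather than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the workaround: decode the patched (escaped) text again
// before comparing it with a declared name. No IntelliJ classes are used here.
final class RedecodeWorkaroundSketch {

    // Assumption: escapes are a backslash followed by 1-3 octal digits.
    private static final Pattern OCTAL_ESCAPE = Pattern.compile("\\\\([0-7]{1,3})");

    /** Turns patched text such as \101 back into its decoded form (A). */
    public static String redecode(String patchedText) {
        return OCTAL_ESCAPE.matcher(patchedText).replaceAll(
                m -> Matcher.quoteReplacement(
                        String.valueOf((char) Integer.parseInt(m.group(1), 8))));
    }

    /** Name matching as reference resolution would need it: compare decoded text. */
    public static boolean refersTo(String patchedRefText, String declaredName) {
        return redecode(patchedRefText).equals(declaredName);
    }
}
```

Every reference lookup has to pay for this second decode, and the result is only correct as long as it stays in sync with whatever the escaper did during injection.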
Concrete proposal
Would any of the following be considered appropriate platform-level solutions?
- A PsiElement API for decoded text: e.g., getDecodedText(), or an InjectedLanguageManager utility that retrieves text from the VirtualFileWindowImpl’s decoded buffer rather than from the patched leaf ranges.
- A user-data key on patched leaves: LeafPatcher could store the original decoded text as UserData on each leaf it modifies, making it retrievable without going through the DocumentWindow.
- An opt-out from LeafPatcher: a flag on LiteralTextEscaper (e.g., preserveDecodedText()) that tells the injection framework to skip leaf patching for fragments where the plugin handles offset mapping itself.
Any guidance on the recommended pattern would be greatly appreciated. Thank you!