generated class names now "mangled" differently

Per Bothner
Summary: The way Kawa generates class names has changed, so fewer characters
are "mangled" (encoded) when dealing with disallowed characters.

The Kawa compiler generates classes that execute on the JVM.
Generated classes include those defined by define-simple-class and define-class;
ones generated by a define-library; and the "module class" generated for each source file.
Each class has to have a name, and there are certain restrictions
on the characters in the name; the names come from Scheme symbols and lists (for
define-library), or are generated from a source file name.  These sources have almost
no restrictions on allowed characters.  So we have to encode (or "mangle")
disallowed characters in source names to ones allowed in class names.

Until now, disallowed characters were converted to a 3-character string starting with '$'.
The reason for using '$' is that it is valid for Java identifiers, so you could usually
refer to the generated classes via ugly but valid Java identifiers.  (There was an
exception for reserved words, such as |package|.)

However, the benefit of generating Java identifiers is minor (you could
always use reflection), and there are two main problems:
(1) The generated class names are ugly; unnecessarily so in the case of characters
that are allowed for the JVM but not allowed for Java.
(2) If we mangle valid JVM characters unnecessarily we're more likely to
get inconsistencies between source file names and class names.  This is especially
bad when it comes to package names, since now they might end up in an
unexpected directory.
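
To make the gap between the two rule sets concrete, here is a small Java sketch. It assumes the JVM class-file rule that a single class-name segment may not contain '.', ';', '[' or '/', and uses the standard Character.isJavaIdentifierPart for the Java side. The class and method names (NameChars, allowedInJvmClassName, allowedInJavaIdentifier) are made up for illustration and are not part of Kawa.

```java
// Sketch: compare what Java identifiers allow with what JVM class names allow.
class NameChars {
    // Assumption: characters the JVM forbids inside one (unqualified)
    // class-name segment are '.', ';', '[' and '/'.
    static boolean allowedInJvmClassName(char c) {
        return ".;[/".indexOf(c) < 0;
    }

    // Characters allowed after the first position of a Java source identifier.
    static boolean allowedInJavaIdentifier(char c) {
        return Character.isJavaIdentifierPart(c);
    }

    public static void main(String[] args) {
        // A few characters that commonly appear in Scheme symbols.
        for (char c : "-+*?!$./;[".toCharArray()) {
            System.out.printf("'%c'  Java: %-5b  JVM: %b%n",
                    c, allowedInJavaIdentifier(c), allowedInJvmClassName(c));
        }
    }
}
```

Characters such as '-' or '?' fail the Java check but pass the JVM check; those are the ones that were being mangled unnecessarily.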

So I've decided to "mangle less" - i.e. only those characters that are disallowed
in class names, not all those disallowed in Java names.  And since I changed the
mangling, I decided to switch to one proposed by John Rose:
https://blogs.oracle.com/jrose/entry/symbolic_freedom_in_the_vm
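
As a rough illustration of "mangling less", the following Java sketch escapes only the characters the JVM rejects in a class-name segment and leaves everything else alone. It is a simplification and not Kawa's actual code: the backslash escape pairs are my reading of the linked proposal and should be checked against it, and the proposal's further rules (the prefix for mangled names and the handling of literal backslashes) are deliberately left out. The names MangleSketch and mangle are hypothetical.

```java
// Simplified sketch of table-driven escaping for JVM-disallowed characters.
class MangleSketch {
    // Assumption: DANGEROUS.charAt(i) is written as '\' followed by
    // REPLACEMENT.charAt(i). Only '/', '.', ';' and '[' are handled here.
    static final String DANGEROUS = "/.;[";
    static final String REPLACEMENT = "|,?{";

    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            int k = DANGEROUS.indexOf(c);
            if (k >= 0) {
                // Disallowed in a class name: emit the two-character escape.
                out.append('\\').append(REPLACEMENT.charAt(k));
            } else {
                // Allowed by the JVM: keep as-is, even if Java would reject it.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("foo-bar?"));
        System.out.println(mangle("a/b.c"));
    }
}
```

In this sketch a name like foo-bar? passes through unchanged, whereas the older scheme would have replaced both '-' and '?' with '$' sequences.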

I believe (but haven't had it confirmed) that other languages and tools
use this mangling, so it made sense to choose it.

This change results in a binary incompatibility, so you need to re-compile everything.
However, you shouldn't need to change the source, unless you did something unusual.

Note this change only affects class and package names, as well as .class files.
It does not change how variable and procedure names are mapped to field and method names.
If it seems to make sense we might change field name mangling in the future.  Method names
are unlikely to change, because one of Kawa's convenience features is the equivalence
between foo-bar-baz and fooBarBaz - or getFooBarBaz in the case of properties.
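
For readers unfamiliar with that equivalence, here is a minimal Java sketch of the name mapping it implies: hyphenated Scheme-style names become camelCase, and a property name gets a "get" prefix. This only illustrates the naming convention; it is not how Kawa implements the lookup, and the names CamelSketch, toCamel and getterName are hypothetical.

```java
// Sketch: map a hyphenated name to camelCase and to a getter-style name.
class CamelSketch {
    // "foo-bar-baz" -> "fooBarBaz": drop each '-' and upper-case the next char.
    static String toCamel(String name) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = false;
        for (char c : name.toCharArray()) {
            if (c == '-') {
                upperNext = true;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // "foo-bar-baz" -> "getFooBarBaz": camelCase with a capitalised first letter.
    static String getterName(String property) {
        String camel = toCamel(property);
        if (camel.isEmpty()) {
            return "get";
        }
        return "get" + Character.toUpperCase(camel.charAt(0)) + camel.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(toCamel("foo-bar-baz"));
        System.out.println(getterName("foo-bar-baz"));
    }
}
```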
--
        --Per Bothner
http://per.bothner.com/