Wednesday, February 24, 2010

Scripting Refactoring -- Overthrowing the GUI

The most basic and common refactoring in any language is Rename. Whether it's Rename Variable, Rename Method, Rename Class, or Rename Namespace/Package, this simple refactoring improves code clarity and, when applied correctly, makes code easier to understand. Because reading code is the most frequent activity a programmer undertakes, Rename is one of the most powerful tools in a developer's arsenal.

Introducing Element Identification

Consider now the Rename refactoring when applied in a batch. In order to apply it, I first need to be able to unambiguously identify the element being renamed, be it a class, method, variable, etc. Let's entertain for a moment the following notation:

[Member::[Member::[Member::...]]]Member

This notation could be used to identify:

  • a namespace at any level (e.g., MyNamespace, Namespace1::Namespace2)
  • a class within a namespace (e.g., MyNamespace::MyClass, SomeClass)
  • a member of a class (e.g., AClass::someVar, ANamespace::MyClass::a)

Although this syntax resembles the C++ scoping syntax, it is important to note that it is ambiguous. Given the search query Something::someMember, it is impossible to determine whether Something is a class, struct, or namespace. Similarly, someMember could be a class or struct within a namespace, or a member variable within a class or struct.

Overcoming Ambiguity

Since we need to constrain how we select elements, let's extend our syntax a bit by adding an optional {TYPE} constraint to each member, where TYPE is a literal such as namespace, class, method, or variable:

[Member[{TYPE}]::...]Member[{TYPE}]

More cases can now be handled well:

  • Something{namespace}::someMember{class}
  • Something{class}::someMember{method}
  • Something{method}::someMember{variable}
  • Something{namespace}::Something{class}::someMember{method}
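
As a rough illustration of how such queries might be read by a tool, here is a minimal C++17 sketch of a parser for the constrained notation. The names Segment and parseQuery are hypothetical, invented for this example. The sketch does not check that names are valid identifiers or that the constraint is one of a fixed set of type literals.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// One segment of a query: a member name plus an optional constraint.
// An empty `kind` means the segment is unconstrained.
struct Segment {
    std::string name;
    std::string kind;  // e.g. "namespace", "class", "method", "variable"
};

// Hypothetical parser for queries such as
// "Something{class}::someMember{method}".
// Returns std::nullopt on malformed input: an empty name, an empty or
// unclosed "{...}", or text following the closing brace.
std::optional<std::vector<Segment>> parseQuery(const std::string& query) {
    std::vector<Segment> segments;
    std::size_t start = 0;
    while (true) {
        const std::size_t sep = query.find("::", start);
        const std::string part = query.substr(
            start, sep == std::string::npos ? sep : sep - start);

        Segment seg;
        const std::size_t open = part.find('{');
        if (open == std::string::npos) {
            seg.name = part;  // no constraint on this segment
        } else {
            // The constraint must be non-empty and must end the segment.
            if (part.back() != '}' || part.size() - open < 3) {
                return std::nullopt;
            }
            seg.name = part.substr(0, open);
            seg.kind = part.substr(open + 1, part.size() - open - 2);
        }
        if (seg.name.empty()) {
            return std::nullopt;
        }
        segments.push_back(seg);

        if (sep == std::string::npos) {
            break;
        }
        start = sep + 2;  // skip past "::"
    }
    // Invariant: the loop records at least one segment before exiting.
    assert(!segments.empty());
    return segments;
}
```

Under these assumptions, parseQuery("Something{namespace}::Something{class}::someMember{method}") yields three segments, while parseQuery("AClass::someVar") yields two segments whose kind is empty, which corresponds to the unconstrained notation introduced earlier.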

But... we're still ambiguous. Consider the following code:

void doSomething() {
    for (int i=0; i<10; ++i) { /* ... */ }
    /* ... more code here ... */
    for (int i=0; i<15; ++i) { /* ... */ }
}

Staying Useful

In the above code we have two different counter variables named i, so a query such as doSomething{method}::i{variable} matches both of them. Does that mean it's pointless to try to support batch refactorings or to try to constrain elements in a refactoring scripting language? No.
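
The remaining ambiguity can be made concrete with a small C++ sketch. It models the declarations in doSomething() as a hand-built table and matches an already-parsed query against it. Everything here (Segment, Declaration, matches, findMatches, exampleTable, and the line field) is a hypothetical construction for illustration, not part of any existing tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A name plus its kind. In a query, an empty kind means "any kind".
struct Segment {
    std::string name;
    std::string kind;
};

// A declared element: its qualified path and the line (within the
// doSomething() snippet) on which it is declared.
struct Declaration {
    std::vector<Segment> path;
    int line;
};

// True if the query matches the declaration's path segment by segment.
// Precondition: a query always names at least one member.
bool matches(const std::vector<Segment>& query, const Declaration& decl) {
    assert(!query.empty());
    if (query.size() != decl.path.size()) {
        return false;
    }
    for (std::size_t k = 0; k < query.size(); ++k) {
        if (query[k].name != decl.path[k].name) {
            return false;
        }
        if (!query[k].kind.empty() && query[k].kind != decl.path[k].kind) {
            return false;
        }
    }
    return true;
}

// Collects every declaration in the table that the query selects.
std::vector<Declaration> findMatches(const std::vector<Segment>& query,
                                     const std::vector<Declaration>& table) {
    std::vector<Declaration> hits;
    for (const Declaration& decl : table) {
        if (matches(query, decl)) {
            hits.push_back(decl);
        }
    }
    return hits;
}

// Hand-built table for the doSomething() example: the method itself
// and its two loop counters, both named i.
std::vector<Declaration> exampleTable() {
    const Segment method{"doSomething", "method"};
    const Segment counter{"i", "variable"};
    std::vector<Declaration> table;
    table.push_back(Declaration{{method}, 1});
    table.push_back(Declaration{{method, counter}, 2});
    table.push_back(Declaration{{method, counter}, 4});
    return table;
}
```

With this table, the query doSomething{method}::i{variable} selects two declarations, the counters on lines 2 and 4 of the snippet. A scripting tool built along these lines could still be useful: it could rename every match, or report the ambiguity and ask for a narrower query.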

Regular expressions don't allow everything to be queried, yet they're still useful. Even after adding positive or negative lookahead (or lookbehind) assertions, they're still limited. Yet, at the same time, they handle a few more cases and become more generally useful.

The same should hold for (most) programming languages. It should be possible to create a scripting language, or syntax, that allows us to easily identify most syntactic elements within a programming language, even if a few cases remain out of reach. Perhaps this refactoring scripting language could leverage the BNF that defines the programming language, or perhaps there's even something better. Either way, we're one step closer to overthrowing the GUI and enabling bulk and scriptable refactorings.

Yet more excerpts and discussion from my thesis to come.
