Tuesday, March 30, 2010

Scripting Refactoring -- Overthrowing the GUI (part 4)

Over the last few blog posts I've mentioned a number of different reasons why it would be nice to be able to script the application of refactorings. This post talks more about one possible scripting language that could be used to script refactorings.

Interesting Cases

Although I could just provide the grammar and some sample input, it wouldn't be very instructive or helpful. Here are some more interesting cases and considerations.

Camel Case

Camel Case is a standard Java convention that uses capital letters to separate words that form an identifier. For example, isTrue, shouldValidate, and eatsHotdogsOrHamburgers are all camel case identifier names. The first letter is typically lower case with the first letter in each subsequent word in the identifier being upper case.

Imagine that Jr. Programmer comes along and creates a class named AbstractSyntaxNode that contains the methods complexNode and simpleNode plus 20 more similarly named methods. When Mr. Senior Programmer comes along, he immediately notices that, to follow standard Java programming conventions, these should instead be named getComplexNode, getSimpleNode, and so on. Manually invoking the rename refactoring on all 22 methods would be painful, which makes this a good case for scripted refactorings. Consider the following rename refactoring script:

rename {
    AbstractSyntaxNode{class}::(.*)Node{method},
    AbstractSyntaxNode::get\1Node;
}

This, however, would result in the following method names:

  • getcomplexNode
  • getsimpleNode
  • ...

In the above case, we need an easy way to tell the script that the first letter of the captured text must be converted to upper case. Perl provides a \u escape sequence, usable on the replacement side of a substitution, that upper-cases the following character. With something similar, the replacement above could be written as get\u\1Node. Thus, to handle the camel case issue nicely, the substitution engine would need to support escape sequences of this kind.
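As a rough illustration of what such a substitution engine might do, here is a minimal Python sketch. It is not part of the proposed scripting language or its grammar; the function names expand and rename are hypothetical, and the only escapes handled are single-digit group references (\1 through \9) and a Perl-style \u.

```python
import re

def expand(template, match):
    """Expand a replacement template against a regex match.

    Assumed escapes (illustrative only): a backslash followed by one digit
    is a group reference, and backslash-u upper-cases the first character
    of whatever text is emitted next.
    """
    out = []
    upper_next = False
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            i += 2
            if nxt == "u":
                upper_next = True
                continue
            if nxt in "0123456789":
                piece = match.group(int(nxt)) or ""
            else:
                piece = nxt  # any other escaped character is taken literally
        else:
            piece = ch
            i += 1
        if upper_next and piece:
            piece = piece[0].upper() + piece[1:]
            upper_next = False
        out.append(piece)
    return "".join(out)

def rename(name, pattern, template):
    """Return the new name, or the old name unchanged if the pattern does not match."""
    m = re.fullmatch(pattern, name)
    return expand(template, m) if m else name

print(rename("complexNode", r"(.*)Node", r"get\1Node"))
print(rename("complexNode", r"(.*)Node", r"get\u\1Node"))
```

The first call reproduces the lower-case problem described above, and the second shows the effect of the \u escape on the captured text.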

Absolute and Relative Position

Absolute and relative position markers also make an interesting case. For example, I might want to rename the variable on the second line after the start of the method. Or, I might have my cursor over a variable within an editor that has a keyboard shortcut for invoking a refactoring. In these cases, I have both relative and absolute position information that is likely desirable in a refactoring scripting language.

Assume for a moment that I want to do an Extract Method refactoring of everything between lines 40 and 60, inclusive. I might write the following:

extractMethod {
    40, 60, newMethodName;
}

In this case, there's no need for column information, so using a basic integer for the line information is sufficient. But what if I wanted to inline the function referenced between columns 20 and 30 on line 42? Then I would need to write something like this:

inline {
    42:20, 42:30;
}

As many editors display the line and column information as line:column, the above syntax is fairly familiar. But, how could we go about identifying relative position indicators? Consider the following:

rename {
    @(SomeClass::someMethod{method && definition} + 2), newVariableName;
}

This shows a couple more interesting cases. First, I need to be able to identify the method definition, which I do by restricting the results of my element query to definitions in my scope clause. Second, I need to use the @ operator to turn the element reference into a position-based reference.
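To make the position forms a little more concrete, here is a small Python sketch of how an interpreter for such a script might represent them. Everything in it is an assumption for illustration: the Position type, the function names, and the lookup table mapping an element to the line of its definition are hypothetical, and a real tool would obtain that line from its own program model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Position:
    line: int
    column: Optional[int] = None  # None means the position refers to the whole line

def parse_absolute(text):
    """Parse an absolute marker such as '40' or '42:20'."""
    line, sep, column = text.strip().partition(":")
    return Position(int(line), int(column) if sep else None)

def resolve_relative(definition_lines, element, offset):
    """Resolve '@(element + offset)' to a line-based position.

    definition_lines is a hypothetical map from an element name to the
    line on which its definition starts.
    """
    return Position(definition_lines[element] + offset)

# Hypothetical data: pretend someMethod is defined on line 10.
definitions = {"SomeClass::someMethod": 10}

print(parse_absolute("40"))
print(parse_absolute("42:20"))
print(resolve_relative(definitions, "SomeClass::someMethod", 2))
```

Treating a relative reference as "definition line plus offset" is only one possible reading of the @ operator; a real design would also have to say what happens when the target line holds more than one variable.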

Conclusion

Such a scripting language could easily grow unwieldy, but the Pareto Principle suggests that there must be a fairly "easy" solution that will handle 80% of the cases. What would such a solution look like? It's really hard to say until somebody actually implements one and has some real-world users. Nevertheless, my sample input and ANTLR based grammar are linked to below.

Comments and thoughts? I'd love to hear them! Thanks.

A full set of sample inputs is available on GitHub. Similarly, the full grammar is also available.
