Tuesday, February 16, 2010

Enabling Bulk Refactorings

In my last post, I discussed some of the limitations of refactoring IDEs and enumerated three cases that they could handle better:

  • Vendor Branches - can we make refactorings apply like patches?
  • Universal Language - can we switch between vocabularies or languages?
  • In Concrete - can we overcome some of the small problems presented?

For this post, I'm going to examine the small concrete cases presented in In Concrete.

Bulk Renaming

My first case was to "Rename all my *ElementName classes in a given namespace to *Node", where the asterisk represents a wildcard.
Consider more than 100 classes like the following, spread across at least as many files. Each one might represent a different element in an XML or other non-trivial file format:

package com.example.xml.schemas;

public class ComplexTypeElement { /* ... */ }
public class CompoundTypeElement { /* ... */ }
public class SimpleTypeElement { /* ... */ }
public class AnyElement { /* ... */ }

Here's what the current Rename refactoring dialog looks like in Eclipse:

Augmenting the Rename UI

In this case, our goal is to bulk refactor the classes, since our naming convention of *ElementName turned out to be a poor one. What we essentially need is a context-sensitive, regex-based find-and-replace refactoring engine. So, let's augment the rename dialog with a few extra elements that, left alone, wouldn't alter the rename behavior:

Now, let me describe the new elements. First, I introduced a checkbox that lets the programmer specify whether the refactoring should be a wildcard refactoring. Second, I introduced an "Old name" field. During a wildcard refactoring, this field holds the pattern that is matched against the entire fully qualified class name to decide where the refactoring applies. During a standard (non-wildcard) refactoring, it simply displays the original class name. Third, I added support for backreferences in the "New name" field, so that the new class name can be derived from the old one, much as one might do with sed.
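To make the idea concrete, here is a minimal sketch of how such a wildcard rename could compute its list of renames, assuming the "Old name" field is treated as a regular expression over fully qualified class names. The class WildcardRename, its plan method, and the SchemaParser class are hypothetical names invented for illustration; none of this is Eclipse's actual API. Note also that java.util.regex writes a backreference in the replacement as $1, where sed would write \1.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Eclipse or any refactoring API.
final class WildcardRename {

    /**
     * Builds a rename plan: every fully qualified name that oldName matches in
     * full maps to its new name. Names that do not match are left out.
     */
    static Map<String, String> plan(String oldName, String newName, List<String> names) {
        // Anchor the pattern so that it must cover the whole name.
        Pattern pattern = Pattern.compile("\\A(?:" + oldName + ")\\z");
        Map<String, String> renames = new LinkedHashMap<>();
        for (String name : names) {
            Matcher matcher = pattern.matcher(name);
            if (matcher.matches()) {
                // java.util.regex spells a backreference in the replacement as $1 (sed uses \1).
                renames.put(name, matcher.replaceFirst(newName));
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        List<String> classes = List.of(
                "com.example.xml.schemas.ComplexTypeElement",
                "com.example.xml.schemas.AnyElement",
                "com.example.xml.schemas.SchemaParser"); // hypothetical class that should be left alone
        plan("(com\\.example\\.xml\\.schemas\\.\\w+)Element", "$1Node", classes)
                .forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```

A real refactoring engine would still have to update every reference to each renamed class; the sketch only covers the name-matching step that the extra dialog fields would drive.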

To handle the second case presented in my previous blog post, I could now use the following in the "Old name" and "New name" input fields, respectively:

Old name: DataNode::(.*)::get.*Instance

New name: DataNode::\1::instance

In this example, the backreference carries the class name over unchanged; the member function name needs no backreference because it is replaced outright. As with sed or awk, the syntax and structure would take a bit of learning, but learning them would likely take far less time than renaming over 100 different classes manually.
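As a rough sketch of how the "Old name" and "New name" values above might be applied to a single symbol, the following assumes the dialog accepts sed-style \1 backreferences and translates them to the $1 form that java.util.regex expects. SedStyleRename and the symbol DataNode::Parser::getParserInstance are hypothetical, chosen only to exercise the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not an existing refactoring API.
final class SedStyleRename {

    /** Converts sed-style backreferences (\1) to the $1 form used by java.util.regex. */
    static String toJavaReplacement(String sedReplacement) {
        // Regex: a backslash followed by one digit. Replacement: a literal '$' plus that digit.
        return sedReplacement.replaceAll("\\\\(\\d)", "\\$$1");
    }

    /** Returns the renamed symbol, or the symbol unchanged when oldName does not match all of it. */
    static String rename(String oldName, String newName, String symbol) {
        // Anchor the pattern so that a partial match never triggers a rename.
        Pattern anchored = Pattern.compile("\\A(?:" + oldName + ")\\z");
        // replaceFirst returns the input as-is when the pattern does not match.
        return anchored.matcher(symbol).replaceFirst(toJavaReplacement(newName));
    }

    public static void main(String[] args) {
        String oldName = "DataNode::(.*)::get.*Instance";
        String newName = "DataNode::\\1::instance"; // the text DataNode::\1::instance
        // Prints DataNode::Parser::instance
        System.out.println(rename(oldName, newName, "DataNode::Parser::getParserInstance"));
    }
}
```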

Conclusion

The above syntax is, of course, not yet sufficient for everything that might need to happen. For example, it would also match DataNode::NestedNamespace::AnotherNestedNamespace::getSomeInstance, even though only one namespace level should have been considered. Still, it's a start, and one that I'll elaborate on in later posts.
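The over-matching is easy to reproduce with java.util.regex. The sketch below compares the pattern from this post with one possible tightening, in which [^:]+ replaces .* so that a single group cannot cross a "::" separator. The tighter pattern is my own assumption about one way to narrow the match, not a finished design, and the class name NamespaceScope is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the loose pattern with an assumed tighter variant.
final class NamespaceScope {

    // The pattern from the post: (.*) is free to span several "::" separators.
    static final Pattern LOOSE = Pattern.compile("DataNode::(.*)::get.*Instance");

    // Assumed tightening: [^:]+ and [^:]* cannot contain ':', so each covers one segment only.
    static final Pattern TIGHT = Pattern.compile("DataNode::([^:]+)::get[^:]*Instance");

    /** True when the pattern matches the whole symbol. */
    static boolean matches(Pattern pattern, String symbol) {
        return pattern.matcher(symbol).matches();
    }

    public static void main(String[] args) {
        String nested = "DataNode::NestedNamespace::AnotherNestedNamespace::getSomeInstance";
        System.out.println(matches(LOOSE, nested)); // prints true: the unwanted match
        System.out.println(matches(TIGHT, nested)); // prints false: nested namespaces are rejected
    }
}
```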

The main point, however, is that it is possible to overcome many of the current limitations of refactoring engines. Yes, it will take some work, but it will be worth it in the long run. And, in the above cases at least, the implementation is trivial compared with the work already done to support a single rename refactoring.
