Thursday, March 4, 2010

Scripting Refactoring -- Overthrowing the GUI (part 2)

Last week I introduced a hokey syntax that could be used to identify various elements that were going to be refactored in bulk. The syntax was ambiguous and incapable of expressing anything of even mild complexity. Now for something better.

The proposed syntax doesn't apply to every language; attempting to cover them all would be both pointless and painful. Rather, it is intended to work with Java, C/C++, and other similarly structured languages.

A Grammar Proposal

Consider the following grammar defined in EBNF, with non-terminals in lower case and terminals in upper case, the definitions of the terminals being left implicit:

query-element: pattern ( "::" pattern )*;

pattern: ( IDENTIFIER | "/" REGEXP-LITERAL "/" ) type-specifier?;

type-specifier: "{" TYPE-LITERAL "}";
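To make the grammar concrete, here is a minimal sketch of a parser for it in Python. The terminals are left implicit in this post, so the sketch has to assume something: IDENTIFIER and TYPE-LITERAL are taken to be C-style identifiers, and a REGEXP-LITERAL is taken to be any run of characters that does not contain a slash. The names Pattern and parse_query are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pattern:
    text: str            # identifier name, or the source of the regular expression
    is_regex: bool       # True when the pattern was written as /.../
    kind: Optional[str]  # contents of {...}, or None when unconstrained


# pattern: ( IDENTIFIER | "/" REGEXP-LITERAL "/" ) type-specifier?
# Assumption: regexp literals contain no "/" and type literals look like identifiers.
_PATTERN = re.compile(
    r"(?:/(?P<regex>[^/]*)/|(?P<ident>[A-Za-z_]\w*))"
    r"(?:\{(?P<kind>[A-Za-z_]\w*)\})?"
)


def parse_query(query: str) -> List[Pattern]:
    """Parse `pattern ( "::" pattern )*` into a list of Pattern objects."""
    patterns = []
    pos = 0
    while True:
        m = _PATTERN.match(query, pos)
        if m is None:
            raise ValueError(f"expected a pattern at offset {pos}: {query!r}")
        is_regex = m.group("regex") is not None
        text = m.group("regex") if is_regex else m.group("ident")
        patterns.append(Pattern(text, is_regex, m.group("kind")))
        pos = m.end()
        if pos == len(query):
            return patterns
        if not query.startswith("::", pos):
            raise ValueError(f"expected '::' at offset {pos}: {query!r}")
        pos += 2  # skip the "::" separator
```

Calling parse_query("Temp::/test.*/{function}") under these assumptions yields two Pattern objects: a plain identifier for Temp and a regular expression constrained to the function kind. Anything the grammar does not describe is rejected with a ValueError.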

Using the above EBNF, arbitrary elements can be queried and their type constrained when necessary. Here are a few examples:

  • All classes named MyClass in any namespace - /.*/::MyClass
  • Any class containing the word Node in the AST namespace - AST::/.*Node.*/
  • The second class containing the word Node in the AST namespace - AST::/.*Node.*/[1]
  • All typedefs named subType contained in functions of the Node class - Node::/.*/::subType{typedef}
  • All variables named other contained in functions of the Node class - Node::/.*/{function}::other{variable}
  • Any function starting with test in the Temp namespace - Temp::/test.*/{function}
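How such queries would select elements is not spelled out here, so the following Python sketch is only one possible reading. It assumes that each pattern in a query matches exactly one component of an element's fully qualified path, that regular expressions must match the whole name, and that a type specifier must equal the component's kind. The symbol table ELEMENTS and the helper names are invented for illustration.

```python
import re
from typing import List, Optional, Tuple

# One query step: (text, is_regex, kind); kind is None when unconstrained.
Step = Tuple[str, bool, Optional[str]]
# One component of an element's qualified path: (name, kind).
Component = Tuple[str, str]


def step_matches(step: Step, component: Component) -> bool:
    text, is_regex, kind = step
    name, actual_kind = component
    if kind is not None and kind != actual_kind:
        return False
    if is_regex:
        # Assumption: the regular expression must match the entire name.
        return re.fullmatch(text, name) is not None
    return text == name


def query_matches(query: List[Step], path: List[Component]) -> bool:
    # Assumption: the query is anchored at the root and covers the whole path.
    return len(query) == len(path) and all(
        step_matches(s, c) for s, c in zip(query, path)
    )


# Hypothetical symbol table, made up for this example.
ELEMENTS = [
    [("AST", "namespace"), ("Node", "class")],
    [("AST", "namespace"), ("NodeList", "class")],
    [("AST", "namespace"), ("Visitor", "class")],
    [("Temp", "namespace"), ("testParse", "function")],
    [("Temp", "namespace"), ("helper", "function")],
    [("Node", "class"), ("clone", "function"), ("other", "variable")],
]


def select(query: List[Step]) -> List[str]:
    """Return the qualified names of all elements the query matches."""
    return [
        "::".join(name for name, _ in path)
        for path in ELEMENTS
        if query_matches(query, path)
    ]


# AST::/.*Node.*/
print(select([("AST", False, None), (".*Node.*", True, None)]))
# Temp::/test.*/{function}
print(select([("Temp", False, None), ("test.*", True, "function")]))
```

With this table, the first query selects AST::Node and AST::NodeList, and the second selects only Temp::testParse. A real tool would build the table from the compiler's symbol information rather than from a hand-written list.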

Admittedly, this query language does not support every query that one might want to make, but it can easily be extended. For example, instead of using a TYPE-LITERAL terminal symbol, one could use a type-literal production that allows negation, alternation, and more.
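As a rough illustration of what such a type-literal production might look like, the Python sketch below evaluates a hypothetical specifier syntax in which "|" separates alternatives and a leading "!" negates the whole expression. Neither operator is part of the grammar above; both are assumptions made purely for this example, as is the function name kind_allowed.

```python
def kind_allowed(spec: str, actual_kind: str) -> bool:
    """Evaluate a hypothetical type-literal such as 'class|struct' or '!function'.

    Assumed syntax (not part of the grammar in this post):
        type-literal: "!"? TYPE-LITERAL ( "|" TYPE-LITERAL )*
    A leading "!" negates the entire alternation.
    """
    negated = spec.startswith("!")
    body = spec[1:] if negated else spec
    alternatives = {alt.strip() for alt in body.split("|")}
    if "" in alternatives:
        raise ValueError(f"empty alternative in type specifier: {spec!r}")
    return (actual_kind in alternatives) != negated


print(kind_allowed("class|struct", "struct"))  # matches one of the alternatives
print(kind_allowed("!function", "variable"))   # anything except a function
```

Under this reading, a query such as Node::/.*/{!function} would select every member of Node that is not a function.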

Function and operator overloading, as supported by C++, also complicate matters. Specifying the fully qualified name of a function does not imply that only a single match exists; any number of overloads might share that name. This language could be expanded to allow the types of the parameters to be specified, so that any function could be fully resolved. In addition, anonymous namespaces and blocks could be identified by augmenting the language to support empty names and array indexing.
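The overloading problem can be sketched in a few lines of Python. The overload set and the resolve helper below are hypothetical, and parameter types are compared as plain strings, which sidesteps real C++ concerns such as implicit conversions, typedefs, and cv-qualifiers.

```python
from typing import List, Optional, Sequence, Tuple

Overload = Tuple[str, Tuple[str, ...]]  # (qualified name, parameter types)

# Hypothetical overload set, invented for illustration.
OVERLOADS: List[Overload] = [
    ("Node::add", ("int",)),
    ("Node::add", ("Node*",)),
    ("Node::add", ("int", "int")),
]


def resolve(name: str, params: Optional[Sequence[str]] = None) -> List[Overload]:
    """Return every overload with the given name, optionally narrowed by parameter types."""
    return [
        (n, p)
        for n, p in OVERLOADS
        if n == name and (params is None or tuple(params) == p)
    ]


print(len(resolve("Node::add")))               # the name alone is ambiguous
print(resolve("Node::add", ["int", "int"]))    # parameter types pin down one overload
```

The name alone matches all three overloads, while the name plus a parameter list matches exactly one, which is the extra precision the extended syntax would need to provide.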

Isn't It Complicated?

Yes, now we're starting to get complicated. But honestly, most of those features wouldn't matter. It's like the saying, "20% of your code handles 80% of the cases." By starting out simple, and only augmenting the grammar when a real, arguable need is established, the grammar could be expanded little by little in useful ways. It really goes back to two principles: KISS and YAGNI. Start simple, design when necessary, refactor to your goal.

Bonus points to the person who finds the grammar / example mismatch above.
