
Sunday, May 30, 2010

Property Driven Code

Sometimes I feel like this,



trapped in hard-to-maintain spaghetti code. I find myself writing fragile code again and again.
My definition of fragile code is, suppose you want to add a feature - good code, there's one place where you add that feature and it fits; fragile code, you've got to touch ten places.
Ken Thompson, Coders at Work
A program may be structured and still be spaghetti code; the two are not mutually exclusive. Embedding the application's behavior, its logic or, even worse, its data properties in the code is a path that leads to spaghetti code: pull it here and something moves on the other side.

But what do I mean by Property Driven Code, and how can it help me reduce spaghetti code?

At its core lies a Data Driven design along with Property Modeling.
Here is what that kind of model looks like:



The programmer's main job is to write Property Driven Code, that is, code that introspects data properties, mostly at run-time, and acts accordingly. Introspection, or reflection, is the magic that can reduce code complexity and hard-coded knowledge of the outer world. Yes, properties drive the code.

Property modeling, or prototype-based programming, is similar to OOP but not the same thing. Steve Yegge's great article The Universal Design Pattern is what inspired me to write my proto_t data structure, so that I can use property modeling in my C++ projects.
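To make the idea a bit more concrete, here is a minimal sketch of a prototype-style property object. It is written in Python for brevity and is not the proto_t structure mentioned above; the class name `Proto` and its `get`, `put` and `has` methods are hypothetical names chosen for illustration. An object is just a bag of properties plus an optional parent that it inherits from.

```python
class Proto:
    """A bag of properties with an optional parent to inherit from.

    Hypothetical illustration only; not the author's proto_t.
    """

    _MISSING = object()  # sentinel, so that None can be stored as a real value

    def __init__(self, parent=None, **props):
        self.parent = parent
        self.props = dict(props)

    def get(self, name, default=None):
        # Walk up the parent chain until some object defines the property.
        obj = self
        while obj is not None:
            value = obj.props.get(name, Proto._MISSING)
            if value is not Proto._MISSING:
                return value
            obj = obj.parent
        return default

    def put(self, name, value):
        # Writes always land on this object, never on the parent.
        self.props[name] = value

    def has(self, name):
        return self.get(name, Proto._MISSING) is not Proto._MISSING


# A generic field, and a more specific one that overrides a single property.
field = Proto(delimiter=",", **{"quoted?": False})
description = Proto(parent=field, name="Description", **{"quoted?": True})

print(description.get("delimiter"))  # inherited from `field`
print(description.get("quoted?"))    # overridden locally
```

There is no class hierarchy to design up front: an object is whatever its properties, and those of its parents, say it is.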

Now I'll try to illustrate the concept with a simple example. Suppose that we have to load and save tabular data from and into text files. The file formats come from third-party software and cannot be changed. Some of them are similar to CSV but can have slight differences. Others might not be CSV at all, just lines with fixed-length fields. If you think that's an unlikely scenario, then welcome to the world of Electronic Cash Registers. How would you approach that problem? Would you use well-known and tested CSV parsers? Write your own custom parser? And if you pick a library to do the job, how would you structure the different format cases? With a switch-case? OOP and polymorphic classes? Plain old if-else chains?

Let's suppose we have to parse a file that contains lines like this one:
"hello, world", 42

That's a quoted string followed by a comma delimiter and an integer. If we choose to use Properties to design a solution, we'll have to declare a format object with 2 field entries, which could have the following properties:

   name = "Description"
   maxSize = 12
   quoted? = true
   text? = true
   delimiter = ","
   

   name = "Magic Number"
   maxSize = 4
   quoted? = false
   integer? = true
   delimiter = ""


The above properties describe the line format quite accurately, so they are sufficient to drive code that parses or constructs such lines. With that format we can construct an appropriate model to store the data, customize a table control to display it, write serialization and deserialization code, and whatever else we might need. Also note that the two fields may or may not share the same properties. In the OOP world such specialization is a bit of a problem: either we'd have to spread the properties among specific classes, or put all of them in a base class and let subclasses ignore some of them. In prototypal inheritance, the Property model that is, classes do not exist. The properties define the object's class.
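As a rough sketch of what "driving code" could look like, the two field entries can be written down as plain data and handed to generic routines. The Python below is only an illustration under my own assumptions: the names `LINE_FORMAT`, `format_line` and `column_headers` are hypothetical, and the delimiter is emitted verbatim, so no blank is written after the comma.

```python
# The two field entries from the text, written as plain dictionaries.
LINE_FORMAT = [
    {"name": "Description", "maxSize": 12, "quoted?": True,
     "text?": True, "delimiter": ","},
    {"name": "Magic Number", "maxSize": 4, "quoted?": False,
     "integer?": True, "delimiter": ""},
]


def format_line(fmt, values):
    """Build one text line from `values`, guided only by field properties."""
    parts = []
    for field, value in zip(fmt, values):
        text = str(value)
        if len(text) > field["maxSize"]:
            raise ValueError(f"{field['name']}: {text!r} exceeds maxSize")
        if field.get("quoted?"):
            text = f'"{text}"'
        parts.append(text + field["delimiter"])
    return "".join(parts)


def column_headers(fmt):
    """The same properties can also label the columns of a table control."""
    return [field["name"] for field in fmt]


print(format_line(LINE_FORMAT, ["hello, world", 42]))
print(column_headers(LINE_FORMAT))
```

Neither function knows anything about this particular file format; change the dictionaries and the same code writes a different one.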

As an example, our deserialization routine can construct a regular expression from the above format, on the fly, to extract the fields from the line.
E.g., pattern = ^"([^"]{0,12})",\s*(\d{1,4})$ where \s* tolerates the blank that follows the comma in the sample line.
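Here is one way such a routine might be sketched in Python, assuming the fields are declared as dictionaries. The names `build_pattern` and `parse_line` are hypothetical, and allowing blanks after a delimiter is my own assumption, made so that the sample line parses. The pattern it builds is in the spirit of the one above, not necessarily character for character the same.

```python
import re

LINE_FORMAT = [
    {"name": "Description", "maxSize": 12, "quoted?": True,
     "text?": True, "delimiter": ","},
    {"name": "Magic Number", "maxSize": 4, "quoted?": False,
     "integer?": True, "delimiter": ""},
]


def build_pattern(fmt):
    """Turn a list of field properties into a compiled regular expression."""
    parts = []
    for field in fmt:
        size = field["maxSize"]
        if field.get("integer?"):
            body = rf"(\d{{1,{size}}})"
        elif field.get("quoted?"):
            body = rf'"([^"]{{0,{size}}})"'
        else:
            # Unquoted text: take as little as possible up to the delimiter.
            body = rf"(.{{0,{size}}}?)"
        sep = re.escape(field["delimiter"])
        if sep:
            sep += r"\s*"  # assumption: tolerate blanks after a delimiter
        parts.append(body + sep)
    return re.compile("^" + "".join(parts) + "$")


def parse_line(fmt, line):
    """Extract the fields of `line` into a dict keyed by field name."""
    match = build_pattern(fmt).match(line)
    if match is None:
        raise ValueError(f"line does not match the format: {line!r}")
    record = {}
    for field, raw in zip(fmt, match.groups()):
        record[field["name"]] = int(raw) if field.get("integer?") else raw
    return record


print(build_pattern(LINE_FORMAT).pattern)
print(parse_line(LINE_FORMAT, '"hello, world", 42'))
```

The regular expression is never typed by hand; it falls out of `maxSize`, `quoted?`, `integer?` and `delimiter`.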

That way we can create an infinite number of little parsers to extract data, even from cranky formats, simply by taking advantage of regular expressions driven by properties. Cool, isn't it? And the best part is that we don't have to hard-code format logic in our code.
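The fixed-length lines mentioned earlier are one such cranky format. As a sketch only, suppose a record is 12 characters of text followed by a 4-character number, both padded with blanks. The `fixedSize?` property and the names `FIXED_FORMAT`, `build_fixed_pattern` and `parse_fixed` are hypothetical and exist just for this illustration.

```python
import re

# Hypothetical fixed-length record: every field occupies exactly maxSize chars.
FIXED_FORMAT = [
    {"name": "Description", "maxSize": 12, "text?": True, "fixedSize?": True},
    {"name": "Magic Number", "maxSize": 4, "integer?": True, "fixedSize?": True},
]


def build_fixed_pattern(fmt):
    """One capture group of exactly maxSize characters per field."""
    groups = []
    for field in fmt:
        # Integer columns may hold digits and padding blanks only.
        chars = r"[ \d]" if field.get("integer?") else "."
        groups.append(f"({chars}{{{field['maxSize']}}})")
    return re.compile("^" + "".join(groups) + "$")


def parse_fixed(fmt, line):
    """Slice a fixed-length line into a dict keyed by field name."""
    match = build_fixed_pattern(fmt).match(line)
    if match is None:
        raise ValueError(f"line does not match the format: {line!r}")
    record = {}
    for field, raw in zip(fmt, match.groups()):
        raw = raw.strip()  # drop the padding blanks
        record[field["name"]] = int(raw) if field.get("integer?") else raw
    return record


print(parse_fixed(FIXED_FORMAT, "hello, world  42"))
```

A new file layout then means a new list of properties, not a new branch in a switch statement.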

OK, that's the basic idea. The main point is that if we declare data relationships and data properties, we can exploit run-time introspection, and that's a powerful design technique. Whatever we do, we'll have to add functionality at some point, one way or another. If you don't somehow automate the data manipulation, you'll have to do it explicitly. There's no way to avoid that.
