Sunday, April 5, 2009

Two schools of thought: random thinkings from a spider.

DRAFT todo: fix footnote indexes and further proof reading / copy editing.

There is certainly more than one way to solve most problems; it's just a matter of their merits. This paper serves to compare and contrast two common methods of solving a simple problem.

In our example, let us say we have a large and complex e-mail message. We began editing it on one computer, then copied the draft to a USB stick[0] and took it home, where we made some further adjustments. Later, we return to the first machine to finish the draft, but realize we didn't copy the file back to the USB stick! We continue to edit the draft anyway, and take it home again. We now have two different but related forms of our message. The message is quite big, and we don't want to rewrite any of the good parts, so what do we do? We have to compare and merge the two different messages into a single final draft.


How might we solve this problem?


Most people I know[1] would open each version in a mail client (e.g. Outlook, Thunderbird) or an editor (e.g. notepad.exe, gedit), place the windows side by side, and then go about visually comparing the files, either merging one into the other or using a third window. This is slow, error prone, and rather clumsy.


Since I'm not willing to take that much time, I would apply software to fix this problem[2]. This means we would need software to compare, merge, and edit the files. How might these programs work?


Common software that comes to mind: diff, patch, diff3, kdiff3, kompare, meld, windiff, and winmerge. Most decent programmers' editors support the task as well (vim, emacs, jedit, etc.), along with decent Version Control Systems (VCSs) and Integrated Development Environments (IDEs). Some programs are textual, some are graphical, some are an amalgamation of parts, while others are heavy lifters[3] in their own right.

We'll take a look at how two different styles might result in software to complete these simple tasks:

  1. Compare two or more files.[4]
  2. Merge two files into a third.
  3. Allow the editing of changes to be done.


I'll call them styles A and B.


Let's start off easy: how can we compare the files? It's rather easy for a program to do a byte by byte or character by character[5] comparison of a file's contents, but it would be nice to be able to see what actually changed, in some format we can understand.

In Style A, we write a simple program that can pretty print the differences to a text file (or better yet, an output stream[6]). If we want a more interactive interface, we can view the output file in an editor or pager, or write such a program of our own: one that understands our pretty printed format and can display it to the user. So let's say we now have a simple 'compare' program that outputs text, and a simple 'view' program that accepts the result of the compare and allows the user to browse it on their display.

In Style B, we write a program 'compview' that generates a format suitable for its own internal use; perhaps a list of change-nodes with data on where and how to display each one. Then we set out to write a viewer that renders this on the user's display, in essence creating a pager-like program, whether it is textual or graphical in nature. For ease of viewing the differences over time, someday we may add an "Export to file" feature that dumps the data to a suitable format.
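As a rough illustration of the Style A 'compare' program, here is a minimal sketch in Python. It assumes the standard library's difflib for the comparison and a unified-diff style for the pretty printed format; the function name `compare` and the labels `draft_work` and `draft_home` are made up for this example and are not taken from any real tool.

```python
import difflib
import sys


def compare(old_text, new_text, out=sys.stdout):
    """Write a line-based, human-readable comparison of two texts to a stream."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    # unified_diff yields plain text lines, so any pager, editor, or filter
    # further down the chain can read them. The file labels are hypothetical.
    for line in difflib.unified_diff(
        old_lines, new_lines, fromfile="draft_work", tofile="draft_home"
    ):
        out.write(line)


if __name__ == "__main__":
    work = "Dear Bob,\nThe meeting is on Monday.\nRegards\n"
    home = "Dear Bob,\nThe meeting is on Tuesday.\nRegards\n"
    compare(work, home)
```

Because the result goes to an ordinary stream, it can be redirected to a file or piped into whatever viewer the user prefers.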
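To make the idea of a list of change-nodes a little more concrete, here is one possible shape for it, sketched in Python. The `ChangeNode` class and `build_change_nodes` function are hypothetical names invented for this example, and difflib's SequenceMatcher stands in for whatever comparison compview would really use.

```python
import difflib
from dataclasses import dataclass


@dataclass
class ChangeNode:
    kind: str        # "replace", "delete", or "insert"
    old_start: int   # 0-based start of the affected range in the old lines
    old_end: int     # end of that range (exclusive)
    new_lines: list  # lines from the new text that take its place


def build_change_nodes(old_lines, new_lines):
    """Collect every non-matching region as a node in an internal list."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    nodes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            nodes.append(ChangeNode(tag, i1, i2, new_lines[j1:j2]))
    return nodes
```

Note that nothing outside the program can read this structure until somebody writes the export feature.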

Now let's take a step back, and look at what we can do with these kinds of solutions. Since style A developed 'compare' to output a very simple textual stream, we can view the file in any program that we like, without having to use the supplied 'view' program. We might even develop a program (or change compare) to [re]generate the output in HTML, so that we can view the comparison in a web browser instead.

In short, the design choice makes the view program almost superfluous, not to mention that it can be kept quite simple: a fallback for people without better tools[7] rather than the whole kit and caboodle. The 'compview' program built in style B will likely have a close relationship between the file comparison and viewing operations, perhaps to the point of excluding the ability to do the view in an external program without exporting to a suitable format[8], which may or may not be easy to use with other tools. In style B, even if the internal format was XML or HTML, compview would still likely contain half a web browser[9].
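The HTML regeneration idea for style A can be sketched as a small filter. This is only an illustration: the name `compare_to_html` is invented here, and it assumes the incoming lines use the common convention of a leading '+' for added text and '-' for removed text (header lines that start with those characters get marked too, which a real filter would want to handle).

```python
import html
import sys


def compare_to_html(diff_lines, out=sys.stdout):
    """Read plain diff lines and write a very simple HTML rendering."""
    out.write("<pre>\n")
    for line in diff_lines:
        # Escape the character data so the browser shows it literally.
        text = html.escape(line.rstrip("\n"))
        if line.startswith("+"):
            out.write(f"<ins>{text}</ins>\n")
        elif line.startswith("-"):
            out.write(f"<del>{text}</del>\n")
        else:
            out.write(text + "\n")
    out.write("</pre>\n")


if __name__ == "__main__":
    sample = ["-The meeting is on Monday.\n", "+The meeting is on Tuesday.\n"]
    compare_to_html(sample)
```

Since the function accepts any iterable of lines, passing it `sys.stdin` would turn it into a filter that sits after 'compare' in a pipeline.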


Now that our software can compare two files in a way we like, let's move on to the process of merging the two files.


Style A might create a program, 'merge', that understands the output of compare, or a filter that converts that output into instructions for another program ('edit') to carry out the changes itself. Some operating systems (e.g. UNIX and DOS) provide suitable editors for this task; some people might be inclined to implement their own batch or stream editor (I would suggest installing 'ex' and 'sed', or making it a *really* expressive line editor). The upside of the latter approach is that the compare format and the file merging can be more readily separated; the user could even find other 'edit' programs or intersperse the chain of commands with other compatible tools.

Style B would likely take it upon itself to conduct the merge operation directly. One downside of this would be the means by which we save or 'Export' the comparisons from compview. If a format based on compview's internal data structures was used, different versions of compview might not even be able to understand it; oh joy, now we have to remember which versions of our compare & merge app knew what! But all in all, the program will probably develop some good interactive comparison and merging features, if the programmer doesn't go postal first.
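Here is a rough sketch of that filter-plus-editor arrangement. The command syntax is modelled on ed's line-addressed 'a', 'c', and 'd' commands, but `merge_script` and `apply_script` are hypothetical stand-ins written for this example, not the real 'diff -e' or 'ed'. Lines are assumed to carry no trailing newline, and a text line consisting of a single dot would confuse this toy editor.

```python
import difflib
import re


def merge_script(old_lines, new_lines):
    """Turn a comparison into ed-style commands, last change first."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    script = []
    # Work from the bottom of the file upward so that the line numbers of
    # changes still to come are not shifted by the ones already applied.
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag == "equal":
            continue
        if tag == "insert":
            script.append(f"{i1}a")  # append after 1-based line i1
        else:
            span = f"{i1 + 1}" if i2 - i1 == 1 else f"{i1 + 1},{i2}"
            script.append(span + ("d" if tag == "delete" else "c"))
        if tag != "delete":
            script.extend(new_lines[j1:j2])
            script.append(".")  # a lone dot ends the inserted text
    return script


def apply_script(lines, script):
    """A toy 'edit' back end that carries out the commands on a list of lines."""
    result = list(lines)
    commands = iter(script)
    for command in commands:
        match = re.fullmatch(r"(\d+)(?:,(\d+))?([acd])", command)
        first = int(match.group(1))
        last = int(match.group(2) or first)
        action = match.group(3)
        text = []
        if action in "ac":
            # Consume the text block that follows, up to the lone dot.
            for line in commands:
                if line == ".":
                    break
                text.append(line)
        if action == "a":
            result[first:first] = text
        else:  # "c" replaces the addressed lines, "d" replaces them with nothing
            result[first - 1:last] = text
    return result
```

Because the script is just a list of text lines, it could be saved to a file, inspected by hand, or handed to a different editor that understands the same commands.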


What about editing the file or adjusting the file comparisons to create a more complex merge? If style A provided an 'edit' back end, we don't even need another text editor to work with compare's output, but we could use our own[11] and feed the result back into the tool chain. Style B might provide support for an external editor or build one into compview's interface. Since an external editor would mean 'compview' would have to translate its data to text for the editor, then back into the internal data, it would be a major pain. Building in an editor might be a fairly simple task or a problem; most GUI toolkits provide an editable text area, some TUI toolkits may not. Depending on the libraries being used, the programmer might have to hash out an HTML WYSIWYW[12] text editor component built into compview[13], assuming compview isn't half a web browser in its own right yet.


Bonus: change of interface


Let's say we want to convert from a graphical to a textual or from a textual to a graphical interface.

The compare and related utilities developed in style A could have been done in a largely display agnostic way; what does a file comparator need to know about user interfaces? In this case, only the 'view' program would need replacing with a tool that supports the other interface style. Another benefit: because of the separation between tools, even the changed interface may be changed yet again in some strange new way[21]. The possibilities are virtually endless, and shared libraries may be utilized to ease related tasks.


Compview, on the other hand, being developed in style B, is likely chained to its interface. If the code monkey was smart, as much of the code base as possible would be abstracted[20] away from the interface, and kept simple enough to be used as a library. If the library didn't exist previously, or was not user interface agnostic, it will be much more labour intensive to make it so (or create it) after the fact than to have done it in the first place[22]. The ever increasing bloat and complexity of compview may be its untimely downfall, because it must either adapt to the changing world or be replaced. If it cannot do so easily, it will either fall by the wayside or restrict its users through its own (lame) limitations.
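A small sketch of what that separation might look like in practice. Both view functions below consume the same plain text lines, so swapping the interface means swapping one function. The names `compare_lines`, `view_plain`, and `view_ansi` are invented for this example, and the ANSI colour codes merely stand in for some other interface style.

```python
import difflib


def compare_lines(old_lines, new_lines):
    """Interface-agnostic core: yields plain diff lines and nothing else."""
    return difflib.unified_diff(old_lines, new_lines, lineterm="")


def view_plain(diff_lines):
    """Front end one: plain text, suitable for any terminal or file."""
    return "\n".join(diff_lines)


def view_ansi(diff_lines):
    """Front end two: the same lines, coloured with ANSI escape codes."""
    colours = {"+": "\033[32m", "-": "\033[31m"}  # green for added, red for removed
    rendered = []
    for line in diff_lines:
        colour = colours.get(line[:1])
        rendered.append(f"{colour}{line}\033[0m" if colour else line)
    return "\n".join(rendered)


if __name__ == "__main__":
    old = ["Dear Bob,", "See you Monday."]
    new = ["Dear Bob,", "See you Tuesday."]
    print(view_plain(compare_lines(old, new)))
    print(view_ansi(compare_lines(old, new)))
```

The comparison code never learns which front end was used, which is the whole point.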



Discourse on the Results


Either of these programs, the compare, view, merge, and edit suite or the compview jack-of-all-trades, would be suitable for completing the task at hand; but what comes of these two courses or styles of solving it?


Style A may have a tendency to fragment things or, depending on the programmer's mind, fall into more of an 'odds and ends' grouping than a useful tool set. A great advantage is that, because each element is a separate program, they have very minimal knowledge of one another's internal workings. Because all communication is accomplished through simple streams of input and output, you can even replace parts of the suite with other tools[14]. Small or big changes could be made by swapping one program, with very little impact on how the operation is performed by the tool set overall. The only major issues are the comparison format and editor instructions[15]. Properly documenting the protocols and relationships between the programs would make things all the easier, both for maintainers and for other monkeys who may need to replace or tweak parts of it. Adding features can be done fairly easily, but making big changes may break compatibility with the last decade's version.


Style B takes a Swiss Army knife point of view: whatever needs to be accomplished should be done by 'compview' whenever possible. Depending on the competency of the programmer[16], compview can fall into many subtle traps[17]. In all probability, compview will either be troublesome to inter-operate with other such tools, or require irksome filters, or worse, have the filters built into compview in a way that may or may not be user serviceable. How closely entwined the various elements are would likely depend on the attentiveness and skill of the programmer; some people create maintainable tools, others create balls of horse dung that they may end up hating years later. Adding new features may also require massive restructuring of the program, depending on how it was implemented.


In my opinion, compview would likely be capable of becoming more efficient at what it does than the compare/view/merge/edit suite, but is more likely to become inefficient and more difficult to maintain in the long run, because it is more difficult to engineer such a complex program correctly. In a way, you could say it just has too many moving parts... Why cram the engine, transmission, and power steering into one huge moving part, when you can have 3 smaller moving parts?[18] One interesting side effect: the software created by style A may be fairly easy to script, but the program from style B would have to embed its own scripting language[23].


I generally opt for pieces that work together over a simple protocol, because it helps keep me from shooting off my own foot later[19].




Footnotes


0. Rather than using a USB stick, I actually would use Webmail or some other network solution, and avoid this kind of problem altogether.

1. Most people that I know would first have to figure out how to get the e-mail message shuffled between computers, let alone view it side by side.

2. I would probably use Vi IMproved's 'diff' mode to interactively compare, edit, and merge the files.

3. I believe the modern GNU diffutils and friends have grown horns of their own.

4. We might want to compare our final draft against the two old drafts!

5. These may or may not be the same thing, depending on who, what, where, and when.

6. I.e. allow feeding the program's output into another program, without the need for shared memory or (insecure) temporary files.

7. I like less, but using vi as a readonly 'view' is sometimes fun.

8. XML, CSV, a binary dump of internal data, etc.

9. And, just as 'compare' would without a 'compare2html' filter, it would bloat out with having to escape various character data into the format, or risk breaking interoperability with other programs (e.g. Internet Explorer, if compatibility with it is even possible in the first place).

10. In point of fact, because of the flexibility that pipes and redirection offered the UNIX system, it was possible to use the early 'diff' program and the 'ed' editor to carry out this kind of solution. To deal with the early system's simplicity, as more useful 'diff' output formats became the norm, Larry Wall's 'patch' program was created to heuristically apply the changes to the file set more effectively than was previously possible, replacing the ed program and simple 'ed diffs' once and for all (actually, patch could feed ed diffs into ed if that format was used). Since non-scriptable screen editors had become more common by then, I can't help but wonder what shape Larry Wall's patch might have taken if a more expressive program than 'ed' had been available.

11. I use Vi IMproved (vim); Emacs, KATE, jEdit, and TextMate are also good choices. I have little love for tools like Notepad, Edit, or feature-packed clones.

12. I say What You See Is What You Want because What You See Is Not Always What You Get.

13. A particularly poor programmer, or poor engineer, might make this editor component very tightly integrated with compview, rather than something that could be reused on other projects and plugged into our current one. Given the nature of compview, I think the former is a more likely psychological trap than the latter.

14. Much like patch has superseded ed for batch processing of diffs.

15. A smarter programmer will make it easy to retool 'merge' to feed instructions into a new editor; a brilliant programmer might anticipate the need ahead of time and choose to supply an instruction set telling merge how to generate said instructions for the associated 'edit' tool, rather than designing it for any specific editor component.

16. And the stress to get it done 'on time'.

17. Subtle traps to most, but obvious to me. I've had to deal with too much software that just 'sucks' over the years not to notice ;-}.

18. I'm scared to think about the auto-industry.

19. That, and I've found many more powerful and flexible tools that can be used that way than I have ever found Swiss Army knives that can match such flexibility.

20. Note that I do not mean a group of abstract base classes.

21. Such as from Tcl/Tk to C/Gtk+ or Java/Swing.

22. This is one of the traps new or casual programmers seem to fall into.

23. I would suggest a language like JavaScript or Python if possible for such a task. Unless it resembles a common language or is suitably domain specific, I dislike programs that create their own scripting or extension languages just for a specific application's plugins or automation. The last thing a user needs is to learn YOUR app's language, one so specific to your program that it is useless everywhere else. A customized dialect of LISP or a class library mated with a known language is much better.
