Tuesday, April 7, 2009

tpsh: test of expand_quotes()

$ echo 'hi bye' foo "$USER" and "~" or ~
expand_quotes ': echo  | hi bye |  foo "$USER" and "~" or ~
expand_quotes ":  foo  | $USER |  and "~" or ~
expand_quotes ":  and  | ~ |  or ~
hi  bye foo $USER and ~ or /usr/home/Terry
$

# note: 
#       the 2 spaces /displayed/ between hi and bye are a bug in
#       tpsh; echo'ing things to file via I/O redirection works
#       properly. "$USER" is not expanded because expand_parameters()
#       still needs adjustments.
# 


tpsh_parse invokes expand_quotes() to break up its input line based on the shell's quoting rules, and then proceeds to go about its business. tpsh_lex() then accepts the token buffer and begins building a new data structure from it. The tokens from tpsh_parse get analyzed and reassembled "on the quotes", i.e. it will do its check on 'hi\ ' and 'bye' and the rest as separate elements, then reassemble the argument vectors as an array reference, becoming 'hi bye' again (that is, quote expansion adds escapes to tell the lex phase where to rejoin things). After everything is said and done between parse and lex, the queue-like data structure is ready, and the argument vectors contained therein are ready to be mapped onto resolve_cmd() calls for execution.
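To make the "rejoin on the escapes" idea concrete, here is a minimal sketch in Python (tpsh itself is Perl, and this is not its code). The function name rejoin() is hypothetical, and the sketch assumes the only escape that matters is a backslash in front of a space:

```python
import re

def rejoin(expanded):
    """Split an escaped line into argument words (hypothetical helper).

    Assumption: the quote expansion has already turned 'hi bye' into
    hi\\ bye, so whitespace NOT preceded by a backslash separates words.
    """
    # Split only on whitespace that has no backslash right before it.
    words = re.split(r"(?<!\\)\s+", expanded)
    # Drop the backslashes again so each word reads as the user meant it.
    return [re.sub(r"\\(.)", r"\1", word) for word in words if word]

print(rejoin("echo hi\\ bye foo"))   # ['echo', 'hi bye', 'foo']
```

The real lex phase builds a richer structure than a flat list; the point here is only that the escapes carry enough information to put 'hi bye' back together as one argument.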


To hunt down any other bugs in the expand_quotes() subroutine, I've made it display its work, so I can see what it detects when testing the shell. The output is basically "expand_quotes QUOTE: unquoted | quoted | remainder".


As one can guess from the shell snippet above, quoting is handled recursively. Because I'm used to languages with finite stack space and no reliable tail call optimization, I almost never write recursive functions of any kind, whether they are TCO'able or not. Algorithmically, expand_quotes() is a very simple procedure.


It expects to be called with an input line, and treats multiple arguments accordingly (for now). Internally a dispatch table and a token stack are maintained; the table contains references to anonymous subroutines, to which the scanned elements are delegated for the proper expansions.

If no quotes are detected on the line, return the result of expanding it with the default delegate (for unquoted text).

Otherwise break the line on the first set of (matching) quotes.

Any text before the opening quote must be unquoted; apply the default expansion from the table.

The text between the matching quotes is quoted; apply the appropriate expansion from the table (i.e. ', ", or `).

Any text remaining after the matching quotes may or may not be quoted; invoke expand_quotes() on the remainder to find out, and apply the result.

Each expansion applied is pushed onto the token stack in the escaped form it expanded to (i.e. "'hi bye'" becomes "hi\ bye"), and the stack is returned to the caller once processing is completed.
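The steps above can be sketched roughly as follows. This is Python rather than the Perl tpsh is written in, and it is an illustration under assumptions, not the actual subroutine: the DISPATCH table, the escape_spaces() delegate and the QUOTED pattern are all made up for the example, and the delegates only escape spaces instead of doing real parameter, tilde or command expansion.

```python
import re

def escape_spaces(text):
    # Stand-in for a real expansion: mark spaces so they survive splitting.
    return text.replace(" ", "\\ ")

# Hypothetical dispatch table: quote character -> delegate.
# The None key holds the default delegate for unquoted text.
DISPATCH = {
    None: lambda text: text,
    "'": escape_spaces,
    '"': escape_spaces,
    "`": escape_spaces,
}

# First opening quote, the text up to its matching close, and the close.
QUOTED = re.compile(r"""(['"`])(.*?)\1""", re.S)

def expand_quotes(line):
    match = QUOTED.search(line)
    if match is None:
        # No quotes on the line: apply the default delegate and stop.
        return [DISPATCH[None](line)] if line else []
    stack = []
    before = line[:match.start()]
    if before:
        stack.append(DISPATCH[None](before))               # unquoted part
    stack.append(DISPATCH[match.group(1)](match.group(2)))  # quoted part
    stack.extend(expand_quotes(line[match.end():]))         # remainder
    return stack

print("".join(expand_quotes("echo 'hi bye' foo")))   # echo hi\ bye foo
```

Like the version described in this post, this naive sketch knows nothing about backslash-escaped quotes inside quotes, so "foo \"bar" would be split in the wrong place.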


With refactoring, the procedure could likely be made tail recursive, but I don't think Perl does TCO. Either way, the user's fingers or (more likely) the machine generating the input should run out of stack space before tpsh could pop a cork at the number of quotes, lol. An earlier design for expand_quotes() had more in common with finite state machines (insofar as I've seen them implemented), but was a lot more contorted than expand_quotes()'s present shape :-/.

Current bugs are in handling nested escaped quotes and multiple empty quotes (the splitter), and in removing unquoted quotes (an addition to the delegate sub for unquoted text).

# bugs in expand_quotes
$ echo 'foo \"bar'
expand_quotes ': echo  | foo \"bar |
foo  "bar
$ echo "foo \"bar"
expand_quotes ": echo  | foo \ | bar"
foo   bar"
#
# correct result would have been equal to the previous command
#
$ echo '' "" '' "" '""' '' "" '"' "'"
expand_quotes ': echo  |  |  "" '' "" '""' '' "" '"' "'"
expand_quotes ":  " |  ''  | " '""' '' "" '"' "'"
expand_quotes ': "  | "" |  '' "" '"' "'"
expand_quotes ':  ' |  ""  | "' "'"
expand_quotes ": "'  | ' |
"  ''  " "" '  ""  "' '
#
# correct result would have been:     ""   " '
# at least, that's how all bourne based shells I 
# know about treat it; I would prefer: "" " '
# i.e. without leading whitespace.
#

For some reason this makes me curious: has anyone ever explained why shell syntax allows "\"" but not '\''? (The results being " and an unclosed quote or syntax error, respectively.)
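For what it's worth, the usual explanation is that POSIX shells treat every character inside single quotes literally, backslash included, so there is nothing left that could escape a single quote; inside double quotes the backslash keeps its escaping role for a few characters, " among them. Python's standard shlex module follows the same rules in its default POSIX mode, which makes for a quick way to see the difference (this only shows the behaviour, it says nothing about how tpsh handles it):

```python
import shlex

# Inside double quotes, backslash escapes the double quote.
print(shlex.split(r'echo "\""'))        # ['echo', '"']

# Inside single quotes, backslash is an ordinary character, so the
# second ' closes the string and the third one opens a new, unclosed one.
try:
    shlex.split(r"echo '\''")
except ValueError as err:
    print("error:", err)                # error: No closing quotation

# The usual workaround: close the quotes, escape one ', reopen.
print(shlex.split(r"echo 'a'\''b'"))    # ['echo', "a'b"]
```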


When trying to solve a programming problem, I generally try the simplest solution before I try something more complex, and then evaluate a neater method. I consider the implications solutions have on efficiency, but that is about trying to avoid shooting myself in the foot later, rather than trying to optimize the code for a machine.


Somehow, I think expanding quotes is just naturally recursive in my crazy brain :-D.




EDIT
commit aeac14bd177a93b84c138a0c62e2cda49e5fe15c
Author: Terry <***snip***> 
Date:   Tue Apr 7 22:24:35 2009 +0000

     bugfix: parameters now expand within quotes via expand_quotes and may be escaped

commit 089fda7cca0049dcabdc8b9659f94dcae417074b
Author: Terry <***snip***> 

     bugfix: escaped quotes witihn quotes and multiple quotes handled correctly

     previous behaviour:

     $ echo 'foo \"bar'
     expand_quotes ': echo  | foo \"bar |
     foo  "bar
     $ echo "foo \"bar"
     expand_quotes ": echo  | foo \ | bar"
     foo   bar"
     $ echo '' "" '' "" '""' '' "" '"' "'"
     expand_quotes ': echo  |  |  "" '' "" '""' '' "" '"' "'"
     expand_quotes ":  " |  ''  | " '""' '' "" '"' "'"
     expand_quotes ': "  | "" |  '' "" '"' "'"
     expand_quotes ':  ' |  ""  | "' "'"
     expand_quotes ": "'  | ' |
     "  ''  " "" '  ""  "' '
     $

     new behaviour:

     $ echo 'foo \"bar'
     expand_quotes ': echo  | foo \"bar |
     foo  "bar
     $ echo "foo \"bar"
     expand_quotes ": echo  | foo \"bar |
     foo  "bar
     $ echo '' "" '' "" '""' '' "" '"' "'"
     expand_quotes ': echo  |  |  "" '' "" '""' '' "" '"' "'"
     expand_quotes ":   |  |  '' "" '""' '' "" '"' "'"
     expand_quotes ':   |  |  "" '""' '' "" '"' "'"
     expand_quotes ":   |  |  '""' '' "" '"' "'"
     expand_quotes ':   | "" |  '' "" '"' "'"
     expand_quotes ':   |  |  "" '"' "'"
     expand_quotes ":   |  |  '"' "'"
     expand_quotes ':   | " |  "'"
     expand_quotes ":   | ' |
     "" " '
     $
