data/TWiki/TWikiInfixParserDotPm.txt
author Colas Nahaboo <colas@nahaboo.net>
Sat, 26 Jan 2008 15:50:53 +0100
changeset 0 414e01d06fd5
permissions -rw-r--r--
RELEASE 4.2.0 freetown
---+ Package =TWiki::Infix::Parser=

A simple stack-based parser that parses infix expressions with nullary,
unary and binary operators specified using an operator table.

Escapes are supported in strings, using backslash.


%TOC%

---++ new($client_class, \%options) -> parser object

Creates a new infix parser. Operators must be added for it to be useful.

The tokeniser matches tokens in the following order: operators,
quotes (" and '), numbers, words, brackets. If token definitions overlap (e.g.
an operator '<' and a bracket operator '<<'), whichever comes first in this
order will match.

=$client_class= needs to be the _name_ of a _package_ that supports the
following two functions:
   * =newLeaf($val, $type)= - create a terminal. =$type= will be:
      1 if the terminal matched the =words= specification (see below).
      2 if the terminal matched the =numbers= specification (see below).
      3 if it is a quoted string.
   * =newNode($op, @params)= - create a new operator node. =@params=
     is a variable-length list of parameters, left to right. =$op=
     is a reference to the operator hash that was passed to =addOperator=.
These functions should throw =Error::Simple= in the event of errors.
=TWiki::Infix::Node= is such a class, ripe for subclassing.
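
The two client functions described above can be sketched as a small standalone package. This is a minimal illustration, not the code of =TWiki::Infix::Node=: the package name =MyExpr=, the =stringify= helper and the internal hash layout are hypothetical, and it assumes the parser invokes =newLeaf= and =newNode= as class methods on the package name.

```perl
use strict;
use warnings;

# Hypothetical client class; the name MyExpr is an assumption.
package MyExpr;

# Leaf type codes as listed in the description of newLeaf.
use constant { WORD => 1, NUMBER => 2, STRING => 3 };

# Called for each terminal; $type is 1 (word), 2 (number) or 3 (string).
# Assumption: invoked as a class method, so the class name arrives first.
sub newLeaf {
    my ( $class, $val, $type ) = @_;
    return bless( { op => undef, type => $type, params => [$val] }, $class );
}

# Called for each operator; $op is the operator hash, @params its operands
# in left-to-right order.
sub newNode {
    my ( $class, $op, @params ) = @_;
    return bless( { op => $op, params => [@params] }, $class );
}

# Hypothetical helper: render the tree as a prefix expression.
sub stringify {
    my $this = shift;
    return $this->{params}[0] unless $this->{op};
    return '(' . join( ' ', $this->{op}{name},
        map { $_->stringify() } @{ $this->{params} } ) . ')';
}

package main;

# Build the tree for "x + 2" by hand, the way a parser would.
my $plus = { name => '+', prec => 100, arity => 2 };
my $tree = MyExpr->newNode(
    $plus,
    MyExpr->newLeaf( 'x', MyExpr::WORD ),
    MyExpr->newLeaf( '2', MyExpr::NUMBER )
);
print $tree->stringify(), "\n";    # prints "(+ x 2)"
```

A real client class would typically signal invalid input by throwing =Error::Simple= from these functions, as noted above.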

The remaining parameters are named, and specify options that affect the
behaviour of the parser:
   1 =words=>qr//= - should be an RE specifying legal words (unquoted
     terminals that are not operators i.e. names and numbers). By default
     this is =\w+=.
     It's ok if operator names match this RE; operators always have precedence
     over atoms.
   2 =numbers=>qr//= - should be an RE specifying legal numbers (unquoted
     terminals that are not operators or words). By default
     this is =qr/[+-]?(?:\d+\.\d+|\d+\.|\.\d+|\d+)(?:[eE][+-]?\d+)?/=,
     which matches integers and floating-point numbers. Number
     matching always takes precedence over word matching (i.e. "1xy" will
     be parsed as a number followed by a word). A typical usage of this option
     is when you only want to recognise integers, in which case you would set
     this to =numbers => qr/\d+/=.
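
The interaction between the two default patterns can be tried out in plain Perl. The sketch below is not the parser's tokeniser: =split_terminals= is a hypothetical helper that only tries the quoted default =numbers= pattern before the default =words= pattern at each position, to show why "1xy" splits into a number and a word.

```perl
use strict;
use warnings;

# Default patterns as quoted in the option descriptions.
my $numbers = qr/[+-]?(?:\d+\.\d+|\d+\.|\.\d+|\d+)(?:[eE][+-]?\d+)?/;
my $words   = qr/\w+/;

# Hypothetical helper: split a string into terminals, trying the number
# pattern before the word pattern at each position.
sub split_terminals {
    my ($input) = @_;
    my @out;
    pos($input) = 0;
    while ( pos($input) < length($input) ) {
        if    ( $input =~ /\G\s+/gc )        { next }
        elsif ( $input =~ /\G($numbers)/gc ) { push @out, "number:$1" }
        elsif ( $input =~ /\G($words)/gc )   { push @out, "word:$1" }
        else { die "Cannot tokenise at offset " . pos($input) . "\n" }
    }
    return @out;
}

print join( ' ', split_terminals('1xy') ),       "\n";  # number:1 word:xy
print join( ' ', split_terminals('3.5e2 abc') ), "\n";  # number:3.5e2 word:abc
```

Replacing =$numbers= with =qr/\d+/= in this sketch would make "3.5e2" fail to tokenise at the ".", which illustrates the integer-only setting mentioned above.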

---++ ObjectMethod *addOperator* <tt>(%oper)</tt>
Add an operator to the parser.

=%oper= is a hash, containing the following fields:
   * =name= - operator string
   * =prec= - operator precedence, positive non-zero integer.
     A larger number means higher precedence.
   * =arity= - set to 1 if this operator is unary, 2 for binary. Arity 0
     is legal, should you ever need it.
   * =close= - used with bracket operators. =name= should be the open
     bracket string, and =close= the close bracket. The existence of =close=
     marks this as a bracket operator.
   * =casematters= - indicates that the parser should check case in the
     operator name (i.e. treat 'AND' and 'and' as different).
     By default operators are case insensitive. *Note* that operator
     names must be caselessly unique i.e. you can't define 'AND' and 'and'
     as different operators in the same parser. Does not affect the
     interpretation of non-operator terminals (names).
Other fields in the hash can be used for other purposes; the parse tree
generated by this parser will point to the hashes passed to this function.

Field names in the hash starting with =InfixParser_= are reserved for use
by the parser.
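
As an illustration of the field rules above, the following standalone sketch defines a few operator hashes and checks them before they would be handed to =addOperator=. The operator names, precedence values, the arity given to the bracket operator and the =check_opers= helper are all hypothetical; the parser itself is not called here.

```perl
use strict;
use warnings;

# Hypothetical operator definitions; names and precedences are made up.
my @opers = (
    { name => 'or',  prec => 100, arity => 2 },
    { name => 'and', prec => 200, arity => 2 },
    { name => 'not', prec => 300, arity => 1 },
    # Bracket operator: the presence of 'close' marks it as such.
    { name => '(', prec => 400, arity => 1, close => ')' },
);

# Hypothetical helper: return a list of problems found in the definitions,
# based only on the field rules listed above.
sub check_opers {
    my @defs = @_;
    my ( @problems, %seen );
    foreach my $op (@defs) {
        my $name = defined $op->{name} ? $op->{name} : '';
        push @problems, "missing name" if $name eq '';
        push @problems, "$name: prec must be a positive integer"
          unless defined $op->{prec} && $op->{prec} =~ /^[1-9]\d*$/;
        push @problems, "$name: arity must be 0, 1 or 2"
          unless defined $op->{arity} && $op->{arity} =~ /^[012]$/;
        push @problems, "$name: duplicates another name (ignoring case)"
          if $seen{ lc $name }++;
        push @problems, "$name: field names starting InfixParser_ are reserved"
          if grep { /^InfixParser_/ } keys %$op;
    }
    return @problems;
}

my @problems = check_opers(@opers);
print @problems ? join( "\n", @problems ) . "\n" : "ok\n";
```

Because the parse tree points back at these hashes, extra fields (for example an evaluation callback) can be stored in them as long as their names do not start with =InfixParser_=.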

---++ ObjectMethod *parse* <tt>($string) -> $parseTree</tt>
Parses =$string=, calling =newLeaf= and =newNode= in the client class
as necessary to create a parse tree. Returns the result of calling =newNode=
on the root of the parse.

Throws TWiki::Infix::Error in the event of parse errors.
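
To make the division of labour between the parser and the client class concrete, here is a generic two-stack operator-precedence sketch in plain Perl. It is not the implementation of =TWiki::Infix::Parser=: =toy_parse=, =reduce_top=, the =PrefixBuilder= client class and the operator table are hypothetical, tokens must be separated by whitespace, and only left-associative binary operators are handled.

```perl
use strict;
use warnings;

# Hypothetical client class that builds prefix strings instead of objects.
package PrefixBuilder;
sub newLeaf { my ( $class, $val, $type ) = @_; return $val }

sub newNode {
    my ( $class, $op, @params ) = @_;
    return '(' . join( ' ', $op->{name}, @params ) . ')';
}

package main;

# Toy operator table (binary operators only); precedences are made up.
my %opers = (
    '+' => { name => '+', prec => 100, arity => 2 },
    '*' => { name => '*', prec => 200, arity => 2 },
);

# Pop one operator and its two operands, then push the resulting node.
sub reduce_top {
    my ( $client, $opstack, $operands ) = @_;
    my $op    = pop @$opstack;
    my $right = pop @$operands;
    my $left  = pop @$operands;
    die "Missing operand for '$op->{name}'\n"
      unless defined $left && defined $right;
    push @$operands, $client->newNode( $op, $left, $right );
}

# Toy parser: one stack for operators, one for operands.
sub toy_parse {
    my ( $client, $string ) = @_;
    my ( @opstack, @operands );
    foreach my $token ( split ' ', $string ) {
        if ( my $op = $opers{$token} ) {
            # Reduce while the stacked operator binds at least as tightly.
            reduce_top( $client, \@opstack, \@operands )
              while @opstack && $opstack[-1]{prec} >= $op->{prec};
            push @opstack, $op;
        }
        else {
            # Leaf type 2 for integers, 1 for anything else (words).
            my $type = $token =~ /^\d+$/ ? 2 : 1;
            push @operands, $client->newLeaf( $token, $type );
        }
    }
    reduce_top( $client, \@opstack, \@operands ) while @opstack;
    die "Malformed expression\n" unless @operands == 1;
    return $operands[0];
}

print toy_parse( 'PrefixBuilder', 'a + 2 * b' ), "\n";  # (+ a (* 2 b))
```

The sketch reports errors with plain =die=; the real =parse= method is documented to throw =TWiki::Infix::Error= instead.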