data/TWiki/TWikiInfixParserDotPm.txt,v
author Colas Nahaboo <colas@nahaboo.net>
Sat, 26 Jan 2008 15:50:53 +0100
changeset 0 414e01d06fd5
permissions -rw-r--r--
RELEASE 4.2.0 freetown
head	1.1;
access;
symbols;
locks; strict;
comment	@# @;


1.1
date	2008.01.22.03.21.23;	author TWikiContributor;	state Exp;
branches;
next	;


desc
@buildrelease
@


1.1
log
@buildrelease
@
text
@---+ Package =TWiki::Infix::Parser=

A simple stack-based parser that parses infix expressions with nullary,
unary and binary operators specified using an operator table.

Escapes are supported in strings, using backslash.


%TOC%

---++ new($client_class, \%options) -> parser object

Creates a new infix parser. Operators must be added for it to be useful.

The tokeniser matches tokens in the following order: operators,
quotes (" and '), numbers, words, brackets. If you have any overlaps (e.g.
an operator '<' and a bracket operator '<<') then the first choice
in this order will match.

=$client_class= needs to be the _name_ of a _package_ that supports the
following two functions:
   * =newLeaf($val, $type)= - create a terminal. $type will be:
      1 if the terminal matched the =words= specification (see below).
      2 if it is a number that matched the =numbers= specification (see below).
      3 if it is a quoted string.
   * =newNode($op, @@params)= - create a new operator node. @@params
     is a variable-length list of parameters, left to right. $op
     is a reference to the operator hash in the \@@opers list.
These functions should throw Error::Simple in the event of errors.
TWiki::Infix::Node is such a class, ripe for subclassing.

The remaining parameters are named, and specify options that affect the
behaviour of the parser:
   1 =words=>qr//= - should be an RE specifying legal words (unquoted
     terminals that are not operators i.e. names and numbers). By default
     this is =\w+=.
     It's ok if operator names match this RE; operators always have precedence
     over atoms.
   2 =numbers=>qr//= - should be an RE specifying legal numbers (unquoted
     terminals that are not operators or words). By default
     this is =qr/[+-]?(?:\d+\.\d+|\d+\.|\.\d+|\d+)(?:[eE][+-]?\d+)?/=,
     which matches integers and floating-point numbers. Number
     matching always takes precedence over word matching (i.e. "1xy" will
     be parsed as a number followed by a word). A typical usage of this option
     is when you only want to recognise integers, in which case you would set
     this to =numbers => qr/\d+/=.
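
The precedence of number matching over word matching can be sketched in a few lines of standalone Perl. This is only an illustration, not the parser's own tokeniser: =tokenise= is a hypothetical helper, it handles only numbers and words (no operators, quotes or brackets), and it simply tries the default =numbers= pattern quoted above before the default =words= pattern.

```perl
use strict;
use warnings;

# Default patterns as quoted in the option descriptions above.
my $numbers = qr/[+-]?(?:\d+\.\d+|\d+\.|\.\d+|\d+)(?:[eE][+-]?\d+)?/;
my $words   = qr/\w+/;

# tokenise() is a hypothetical helper, not part of TWiki::Infix::Parser.
# It tries numbers before words and returns a printable trace of the tokens.
sub tokenise {
    my $input = shift;
    my $trace = '';
    while ( length $input ) {
        next if $input =~ s/^\s+//;          # skip whitespace between tokens
        if ( $input =~ s/^($numbers)// ) {   # numbers are tried first
            $trace .= "number($1) ";
        }
        elsif ( $input =~ s/^($words)// ) {  # then words
            $trace .= "word($1) ";
        }
        else {
            die "unexpected input: $input\n";
        }
    }
    $trace =~ s/ $//;                        # drop the trailing separator
    return $trace;
}

print tokenise('1xy'), "\n";        # number(1) word(xy)
print tokenise('2.5e3 abc'), "\n";  # number(2.5e3) word(abc)
```

Swapping the two branches would make ="1xy"= come out as a single word, because =\w+= also matches digits; that is why the order matters.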


---++ ObjectMethod *addOperator* <tt>(%oper)</tt>
Add an operator to the parser.

=%oper= is a hash, containing the following fields:
   * =name= - operator string
   * =prec= - operator precedence, a positive integer.
     Larger number => higher precedence.
   * =arity= - set to 1 if this operator is unary, 2 for binary. Arity 0
     is legal, should you ever need it.
   * =close= - used with bracket operators. =name= should be the open
     bracket string, and =close= the close bracket. The existence of =close=
     marks this as a bracket operator.
   * =casematters= - indicates that the parser should check case in the
     operator name (i.e. treat 'AND' and 'and' as different).
     By default operators are case insensitive. *Note* that operator
     names must be caselessly unique i.e. you can't define 'AND' and 'and'
     as different operators in the same parser. Does not affect the
     interpretation of non-operator terminals (names).
Other fields in the hash can be used for other purposes; the parse tree
generated by this parser will point to the hashes passed to this function.

Field names in the hash starting with =InfixParser_= are reserved for use
by the parser.
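
As a rough illustration of the fields above, the following standalone sketch builds a small operator table and enforces the caseless-uniqueness rule. It does not call the parser: =register_operator= and =%registry= are hypothetical stand-ins for =addOperator= and its internal table, the helper takes a hash reference for simplicity, and the operator names and precedence values are made up.

```perl
use strict;
use warnings;

# Hypothetical registry keyed by the lower-cased operator name.
my %registry;

# register_operator() is an illustrative stand-in for addOperator.
# It takes a reference to a hash holding the documented fields.
sub register_operator {
    my $op  = shift;
    my $key = lc $op->{name};

    # Operator names must be caselessly unique, even with casematters set.
    die "operator '$op->{name}' clashes with '$registry{$key}{name}'\n"
      if exists $registry{$key};

    # prec must be a positive integer.
    die "prec must be a positive integer\n"
      unless defined $op->{prec} && $op->{prec} =~ /^[1-9]\d*$/;

    $registry{$key} = $op;
    return $op;
}

# A binary operator, a case-sensitive unary operator and a bracket operator.
register_operator( { name => 'and', prec => 200, arity => 2 } );
register_operator( { name => 'not', prec => 300, arity => 1, casematters => 1 } );
register_operator( { name => '(', close => ')', prec => 400, arity => 1 } );

# 'AND' differs from 'and' only by case, so it is refused.
my $ok = eval { register_operator( { name => 'AND', prec => 200, arity => 2 } ); 1 };
print $ok ? "accepted\n" : "rejected\n";    # rejected
```

Because the table stores the hash reference itself rather than a copy, any extra fields placed in the hash stay reachable from whatever later points at the operator, which mirrors the note above about the parse tree pointing to the hashes passed in.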



---++ ObjectMethod *parse* <tt>($string) -> $parseTree</tt>
Parses =$string=, calling =newLeaf= and =newNode= in the client class
as necessary to create a parse tree. Returns the result of calling =newNode=
on the root of the parse.

Throws TWiki::Infix::Error in the event of parse errors.


@