proGRAM - natural language dependency parser


Contents

  1. Contents
  2. What the "proGRAM" is
  3. History
  4. Theoretical background
    1. Dependency tree
    2. Coverage of a node
    3. Gap in a Coverage
    4. dNg - measure of (non)projectivity
    5. DR-tree
    6. Ng - measure of (non)projectivity
    7. Contraction of DR-tree
    8. FODG (Free-order dependency grammar)
    9. Tree according a FODG
    10. Analysis
    11. Analysis according to FODG
    12. Restriction of an analysis
    13. Grammar with errors
    14. Measure Ne
  5. Implementation - proGRAM
    1. What the proGRAM computes
    2. Data (sentence) description language
    3. Grammar description language
    4. Running the proGRAM
    5. Output
    6. TreeView
    7. Sample data
    8. Sample grammar
    9. References
  6. Download

What the "proGRAM" is

The proGRAM is an environment for writing and testing dependency grammars and providing dependency analysis of sentences of natural language according a given grammar.

Main features of proGRAM are

History

First versions of proGRAM were created in 1994-1996 during the work on pivot implementation of grammar-checker for Czech (language with high degree of word-order freedom)(JEP PECO 2824). Next application of the formalism and proGRAM were translator from the English to the Czech, Robust parser of Czech etc.

Theoretical background

Dependency tree

Dependency tree (D-tree)... everybody knows.

Coverage of a node

Coverage of a node u of a (dependency) tree above any sentence is a set of horizontal indices (positions in the sentence) "covered" by the subtree with root u.

Gap in a Coverage

Coverage can be divided into continuous intervals of natural numbers. Any coverage is either one interval or it contains at least one gap.

dNg - measure of (non)projectivity

dNg is defined as number of gaps in a coverage of a node, for tree it is count as maximum over all nodes - measure of (non) projectivity. If dNg(T)=0 then the tree T is projective.
Picture shows an example of Czech sentence and its tree with dNg(T)=3:

DR-tree

Delete-rewrite tree. If we have a device with a tape (sentence is written on it) and two heads - deleting head and rewriting head, device provides an analysis by such a way that it places heads on two (different) words and then delete word under deleting head and rewite (maybe keep it unchanged) word under rewriting head. If it ends with a tape containg just one symbol and it is one of starting symbols, then sentence was right. DR-tree maps process of deleting and rewriting.

Ng - measure of (non)projectivity

Ng is the same measure as dNg is, but it is for DR-trees instead of dependency trees.

Contraction of DR-tree

Contraction of vertical edges in DR-tree is a way how to transform DR-tree into a dependency tree.

One D-tree can be the contraction of two or more different DR-trees.


For every DR-tree T holds dNg(kon(T)) <= Ng(T).

FODG (Free-order dependency grammar)

FODG is a grammar describing a work of two-head automaton above the tape where sentence is written.
Rules of FODG looks like CFG rules in Chomski normal form but (during the reduction) symbols on right-hand-side of the rule need not to be neighbours in the sentence. Subscript L or R says which of symbols will be (during reduction process) rewrited (Left or Right), other symbol will be deleted.

Tree according a FODG

...is a DR-tree where every reduction can be described by some rule of given FODG.

Analysis

Analysis is a function assigning a set of trees (maybe empty) to any string (sentence). D-analysis assigns a set of dependency trees, DR-analysis assigns a set of DR-trees.

Analysis according to FODG

...is analysis where all assigned trees are trees according to given grammar.

Restriction of an analysis

Restriction of an analysis is also analysis, all assigned sets of trees have to fulfil given condition. Eg. we can define restriction of analysis giving only projective trees.

Grammar with errors

We can easily say that some rules of grammar are "bad" - "error rules". With it we can recognise three results of parsing:
  1. Sentence got at least one tree without using error rules - correct sentence
  2. Sentence got any tree, but only using at least one error rule - incorrect sentence
  3. Sentence got no tree - unparsable sentence, we can say nothing about it!

Measure Ne

Ne(T) is a count of occurrences of error rules used inside the tree T (according to some FODG). Ne(T) is the Measure of robustness.

Implementation

The entire tool consists of the following parts: data (sentence) specification language, grammar definition language and configurable analyser (and tree viewer, if you don't like text files).

What the proGRAM computes

The analyser provides the computation of all the DR-trees and D-trees for a given sentence according to the given grammar fulfilling the limitations of the Ng, dNg and Ne measures. The output is represented as a list of DR-trees and their shapes and a list of D-trees and their shapes. These lists are interpreted by the TreeView program.

Data (sentence) description language

This language is used for the specification of the input data - morphologically and lexically processed sentences of natural language. Words can have only simple attributes except for one complex attribute for a valency frame.
The language also allows for an efficient way of the notation of morphological and lexical ambiguity - branching.

The following example shows the (simplified) specification of the Czech word "kvetiny" (flowers) in this language.

      květiny
      LEXF: kvetina
      WCL: noun
      SYNTCL: noun
      GENDER: fem
      ?
         NUM: pl
         CASE: ? voc , acc , nom !
      ,
         NUM: sg
         CASE: gen
      !
      END
This example illustrates the usage of the branching on the level of the whole sets of attributes (branching after the GENDER attribute) and also branching on the level of values (values of the CASE attribute). The entire specification corresponds to four unambiguous definitions. The question mark always denotes the place where the branching begins, commas separate the alternatives and the exclamation mark denotes the end of branching. The branchings can be nested; the resulting set of the descriptions is the set of all possible combinations (Cartesian product).

Grammar description language

This language serves for the definition of grammars with errors. Meta-rules of a grammar are described as sequences of commands checking the conditions of the applicability of the meta-rule and forming the instance of the meta-rule for the given right-hand side symbols. The variables used in the meta-rule are: X for the left-hand side symbol of the rule, A,B for the first and second right-hand side symbol of the rule and P as a temporary variable for an element of the frameset.
The head of the meta-rule can contain attributes declaring the limitation of the meta-rule to the projective use only (PROJECTIVE), the limitation of non-projectivity of the meta-rule to the closest dominant symbol (CLOSEST), or negative-only meta-rules (NEGATIVE).

Commands forming the meta-rules are as follows:

The following example presents the simplified definition of the meta-rule for adjoining the adjective as the left congruent attribute to the noun:

          METARULE LeftCongruentAttribute
             A.SYNTCL=adj
             B.SYNTCL=noun
             A.GENDER ? B.GENDER GenderDisagreement
             A.CASE   ? B.CASE   CaseDisagreement
             A.NUMBER ? B.NUMBER NumberDisagreement
             X:=B
             OK
          END_METARULE
If the interpretation of the meta-rule succeeds and all soft-conditions were met then we get the rule of the positive grammar. If any of the soft conditions is not true then the interpretation of the meta-rule continues but the resulting rule will be the negative rule of the extended grammar with the appropriate error code.

The set of attributes and the set of values are not fixed, the author of the grammar can use her/his own attributes and values. The dictionaries of attributes and values are created during the loading of the grammar and the input sentence.

Running the proGRAM

Running the proGRAM you can simply run the batch _.bat giving it the name of file containing the sentence and the options, e.g.:
       _ sentence.dat Ng:0 Ne:0 op:23RE
Called without parameters the parser writes the list of its options:
Params:
    Input:
       mx: 0..9 max number of words [unlimited]
    Parsing:
       pr: -/+/R parser [-]
       io: t/./-/d output of new item [t]
    Analysis:
       Ng:0..9 restriction by Ng  [unlimited]
      dNg:0..9 restriction by dNg [unlimited]
       Ne:0..9 restriction by Ne  [unlimited]
       an:
             n restriction by  CLOSEST rules [off]
             p restrikce by PROJECTIVE rules [off]
    Output: op:
       s: statistics
       A: names of attributes -> ATRIBS.TAB
       E: names of errors     -> ERRORS.TAB
       V: names of values     -> VALUES.TAB
       R: names of rules      -> RULES.TAB
       C: translated grammar  -> CODE.BIN
       1: DR-trees  -> TREES.OUT
       2: DR-shapes -> TREES.OUT
       3: De-shapes -> TREES.OUT
       P: parsing => FS.OUT, POOL.OUT, UPOOL.OUT, DPOOL.OUT

Output

Output of the analyser looks like this:
----------------------------------------------------------------------
 Step.Delphi (C) Tomas Holan  1993, 2001      3.5.2001 9:15:50 .585
----------------------------------------------------------------------
Params: > Ng:0 Ne:0 op:23 <
parser = ChartParser
allowed Ng <= 0
allowed dNg <= 1000
allowed errors <= 0
new item = full print
new item = duplicity
fail of rule = empty
Max.length = unlimited
Grammar: extended
Sizes of tables:
   Symbols:   1000
   Items:  60000
   Complete items:   1000
   Duplicite items:  65000
Grammar: rules.dat
Rules = 8/1165
stará:1..11 žena:12..19 zalévala:20..27 květiny:28..31 v:32..33 rohu:34..37 své:38..47 zahrady:48..51 tečka:52..52
53/19     4:19 [1(0)]    1..2    B(A)  Coverage: XX.......
54/23     19:21 [5(0)]    2..3    C(B)  Coverage: .XX......
55/53     19:25 [5(0)]    2..3    C(B)  Coverage: .XX......
56/24     20:29 [4(0)]    3..4    C(D)  Coverage: ..XX.....
57/25     21:29 [4(0)]    3..4    C(D)  Coverage: ..XX.....
58/54     22:29 [4(0)]    3..4    C(D)  Coverage: ..XX.....
59/53     23:29 [4(0)]    3..4    C(D)  Coverage: ..XX.....
60/55     32:34 [3(0)]    5..6    F(E)  Coverage: ....XX...
61/49     40:49 [2(0)]    7..8    H(G)  Coverage: ......XX.
62/50     41:50 [2(0)]    7..8    H(G)  Coverage: ......XX.
63/51     47:51 [2(0)]    7..8    H(G)  Coverage: ......XX.
64/23     53:21 [5(0)]    1..3    C(B(A))  Coverage: XXX......
65/53     53:25 [5(0)]    1..3    C(B(A))  Coverage: XXX......
66/53     54:29 [4(0)]    2..4    C(BD)  Coverage: .XXX.....
67/53     19:57 [5(0)]    2..4    C(BD)  Coverage: .XXX.....
67/53     53:57 [5(0)]    1..4    C(B(A)D)  Coverage: XXXX.....
68/56     28:60 [6(0)]    4..6    D(F(E))  Coverage: ...XXX...
69/57     29:60 [6(0)]    4..6    D(F(E))  Coverage: ...XXX...
70/58     30:60 [6(0)]    4..6    D(F(E))  Coverage: ...XXX...
71/59     31:60 [6(0)]    4..6    D(F(E))  Coverage: ...XXX...
72/24     56:60 [6(0)]    3..6    C(DF(E))  Coverage: ..XXXX...
73/25     57:60 [6(0)]    3..6    C(DF(E))  Coverage: ..XXXX...
74/54     58:60 [6(0)]    3..6    C(DF(E))  Coverage: ..XXXX...
75/53     59:60 [6(0)]    3..6    C(DF(E))  Coverage: ..XXXX...
76/60     34:63 [7(0)]    6..8    F(H(G))  Coverage: .....XXX.
77/61     35:63 [7(0)]    6..8    F(H(G))  Coverage: .....XXX.
78/62     36:63 [7(0)]    6..8    F(H(G))  Coverage: .....XXX.
79/63     37:63 [7(0)]    6..8    F(H(G))  Coverage: .....XXX.
80/53     64:29 [4(0)]    1..4    C(B(A)D)  Coverage: XXXX.....
80/53     66:60 [6(0)]    2..6    C(BDF(E))  Coverage: .XXXXX...
81/53     67:60 [6(0)]    1..6    C(B(A)DF(E))  Coverage: XXXXXX...
82/24     20:69 [4(0)]    3..6    C(D(F(E)))  Coverage: ..XXXX...
82/25     21:69 [4(0)]    3..6    C(D(F(E)))  Coverage: ..XXXX...
82/54     22:69 [4(0)]    3..6    C(D(F(E)))  Coverage: ..XXXX...
82/53     23:69 [4(0)]    3..6    C(D(F(E)))  Coverage: ..XXXX...
82/53     54:69 [4(0)]    2..6    C(BD(F(E)))  Coverage: .XXXXX...
82/53     64:69 [4(0)]    1..6    C(B(A)D(F(E)))  Coverage: XXXXXX...
82/53     19:73 [5(0)]    2..6    C(BDF(E))  Coverage: .XXXXX...
82/53     53:73 [5(0)]    1..6    C(B(A)DF(E))  Coverage: XXXXXX...
82/64     32:76 [3(0)]    5..8    F(EH(G))  Coverage: ....XXXX.
83/56     28:82 [6(0)]    4..8    D(F(EH(G)))  Coverage: ...XXXXX.
84/57     29:82 [6(0)]    4..8    D(F(EH(G)))  Coverage: ...XXXXX.
85/58     30:82 [6(0)]    4..8    D(F(EH(G)))  Coverage: ...XXXXX.
86/59     31:82 [6(0)]    4..8    D(F(EH(G)))  Coverage: ...XXXXX.
87/24     56:82 [6(0)]    3..8    C(DF(EH(G)))  Coverage: ..XXXXXX.
88/25     57:82 [6(0)]    3..8    C(DF(EH(G)))  Coverage: ..XXXXXX.
89/54     58:82 [6(0)]    3..8    C(DF(EH(G)))  Coverage: ..XXXXXX.
90/53     59:82 [6(0)]    3..8    C(DF(EH(G)))  Coverage: ..XXXXXX.
91/53     66:82 [6(0)]    2..8    C(BDF(EH(G)))  Coverage: .XXXXXXX.
92/53     67:82 [6(0)]    1..8    C(B(A)DF(EH(G)))  Coverage: XXXXXXXX.
93/24     20:84 [4(0)]    3..8    C(D(F(EH(G))))  Coverage: ..XXXXXX.
93/25     21:84 [4(0)]    3..8    C(D(F(EH(G))))  Coverage: ..XXXXXX.
93/54     22:84 [4(0)]    3..8    C(D(F(EH(G))))  Coverage: ..XXXXXX.
93/53     23:84 [4(0)]    3..8    C(D(F(EH(G))))  Coverage: ..XXXXXX.
93/53     54:84 [4(0)]    2..8    C(BD(F(EH(G))))  Coverage: .XXXXXXX.
93/53     64:84 [4(0)]    1..8    C(B(A)D(F(EH(G))))  Coverage: XXXXXXXX.
93/65     87:52 [8(0)]    3..9    C(DF(EH(G))I)  Coverage: ..XXXXXXX
94/53     19:88 [5(0)]    2..8    C(BDF(EH(G)))  Coverage: .XXXXXXX.
94/66     88:52 [8(0)]    3..9    C(DF(EH(G))I)  Coverage: ..XXXXXXX
95/53     53:88 [5(0)]    1..8    C(B(A)DF(EH(G)))  Coverage: XXXXXXXX.
95/67     89:52 [8(0)]    3..9    C(DF(EH(G))I)  Coverage: ..XXXXXXX
96/68     90:52 [8(0)]    3..9    C(DF(EH(G))I)  Coverage: ..XXXXXXX
97/68     91:52 [8(0)]    2..9    C(BDF(EH(G))I)  Coverage: .XXXXXXXX
98/68     92:52 [8(0)]    1..9    C(B(A)DF(EH(G))I)  Coverage: XXXXXXXXX
      0.22s
FS..................... 68
Items ................. 97
Complete items ........ 1
Duplicite items ....... 18

DR-trees (5)
  0: U1(-16(64(53(4,19),21),84(29,82(32,76(34,63(47,51))))),52)
  1: U1(-18(53(4,19),-12(21,84(29,82(32,76(34,63(47,51)))))),52)
  2: U1(-18(53(4,19),88(57(21,29),82(32,76(34,63(47,51))))),52)
  3: U1(92(-2(64(53(4,19),21),29),82(32,76(34,63(47,51)))),52)
  4: U1(92(67(53(4,19),57(21,29)),82(32,76(34,63(47,51)))),52)

DR-Shapes (5)
  0: C:8(C:4(C:5(B:1(A,B),C),D:6(D,F:3(E,F:7(F,H:2(G,H))))),I)
  1: C:8(C:5(B:1(A,B),C:4(C,D:6(D,F:3(E,F:7(F,H:2(G,H)))))),I)
  2: C:8(C:5(B:1(A,B),C:6(C:4(C,D),F:3(E,F:7(F,H:2(G,H))))),I)
  3: C:8(C:6(C:4(C:5(B:1(A,B),C),D),F:3(E,F:7(F,H:2(G,H)))),I)
  4: C:8(C:6(C:5(B:1(A,B),C:4(C,D)),F:3(E,F:7(F,H:2(G,H)))),I)

De-Shapes (2)
  0: 0 C:0(B:5(A:1),D:4(F:6(E:3,H:7(G:2))),I:8)
  1: 0 C:0(B:5(A:1),D:4,F:6(E:3,H:7(G:2)),I:8)
      0.22s
(C) Tomas Holan  1993, 2001

TreeView

Program TreeView interprets the file TREES.OUT created by the analyser:

Sample data

Here is a sample of file containing the sentence (Czech sentence) "Stará žena zalévala květiny v rohu své zahrady" ("(the)Old woman watered (the)flowers in (the)corner of its garden."):
        stará 
   ?
      lexf:   stara2
      wcl:    adj
      syntcl: adj
      minx:   2
      adj_type: ord
      deg:      pos
      nomform:  no
      neg:      no
      ?
	 gender:   neut
	 num:      pl
	 du:       no
	 case:  ? acc , nom !
      ,
	 gender:  fem
	 num:     sg
	 du:      no
	 case:  ? voc , nom !
      !
      cycle:	yes
   ,
   lexf:   starat
   wcl:    vb
   syntcl: v
   v_cl:   full
   refl:   se
   aspect: impf
   FRAMESET: ? ( [  ACTANT:  act  CASE:  nom  PREP:  0  ]
                 [  ACTANT:  pat  CASE:  acc  PREP:  0  ] )
                 cycle: no
             , ( [  ACTANT:  act  CASE:  nom  PREP:  0  ]
                 [  ACTANT:  pat  CASE:  clause  PREP:  aby ] )
                 cycle: no
             , ( [  ACTANT:  pat  CASE:  acc  PREP:  0  ] )
                 cycle: no
             , ( [  ACTANT:  pat  CASE:  clause  PREP:  aby ] )
                 cycle: no
             , ( )
                 cycle: yes
             !

   mode:   ind
   neg:    no
   v_form: fin

   pers:   3
   num:    sg
   tense:  pres

   ,

   lexf:   stary2
   wcl:    noun
   syntcl: noun
   minx:   1
   tant:   0
   refl:   0
   gender: fem
   num:    sg
   du:     no
   case: ? voc , nom !
   cycle:  yes
   !

   END

   žena
   ?
      LEXF:    hna2t
      WCL:     vb
      SYNTCL:  v
      FRAMESET: ? ( [  ACTANT:  act  CASE:  nom  PREP:  0  ]
                    [  ACTANT:  pat  CASE:  acc  PREP:  0  ]
                    [  ACTANT:  oadv ] )
                cycle: no
                , ( [  ACTANT:  pat  CASE:  acc  PREP:  0  ] )
                cycle: no
                , ( [  ACTANT:  pat  CASE:  acc  PREP:  0  ]
                    [  ACTANT:  oadv ] )
                cycle: no
                , ( [  ACTANT:  act  CASE:  nom  PREP:  0  ] )
                cycle: no
                , ( [  ACTANT:  act  CASE:  nom  PREP:  0  ]
                    [  ACTANT:  oadv ] )
                cycle: no
                , ( [  ACTANT:  oadv ] )
                cycle: no
                , ( )
                cycle: yes
                !
      NEG:  no
      PERS:  tecka
      NUM:  sg
      RTENSE:  con
   ,
      LEXF:  z3ena
      WCL:   noun
      SYNTCL:  noun
      TANT:  0
      REFL:  0

      GENDER:  fem
      NUM:     sg
      DU:      no
      CASE:    nom

      depprn: yes
      depnum: yes
      RIGHTGEN: yes
      cycle: yes

   !
      END


      zalévala
      LEXF:    zale2vat
      WCL:     vb
      SYNTCL:  v
      FRAMESET:  ? ( [  ACTANT:  act  CASE:  nom  PREP:  0  ]
                     [  ACTANT:  pat  CASE:  acc  PREP:  0  ] )
                 cycle: no
                 , ( [  ACTANT:  pat  CASE:  acc  PREP:  0  ] )
                 cycle: no
                 , ( [  ACTANT:  act  CASE:  nom  PREP:  0  ] )
                 cycle: no
                 , ( )
                 cycle: yes
                 !

      ?
      GENDER: neut
      NUM: pl
      ,
      GENDER: fem
      NUM: sg
      !


      END

      květiny
      LEXF: kve3tina
      WCL: noun
      SYNTCL: noun
      TANT: 0
      GENDER: fem
      ?
         NUM: pl
         DU: no
         CASE: ? voc , acc , nom !
      ,
         NUM: sg
         DU: no
         CASE: gen
      !

      depprn: yes
      depnum: yes
      RIGHTGEN: yes
      CYCLE: yes

      END


      v
      LEXF: v
      WCL: prep
      SYNTCL: prep

      CASE: ? loc , acc !

      CYCLE: yes

      END

      rohu
      LEXF: roh
      WCL: noun
      SYNTCL: noun

      TANT: 0

      GENDER: inan
      NUM: sg
      DU: no
      CASE: ? loc , voc , dat , gen !

      depprn: yes
      depnum: yes
      RIGHTGEN: yes
      CYCLE: yes

      END

      své
      LEXF: svuj
      WCL: prn
      SYNTCL: adj

      ?
      GENDER: inan
      DU: pl
      CASE: ? acc , nom !
      ,
      GENDER: fem
      NUM: pl
      DU: no
      CASE: ? acc , nom !
      ,
      GENDER: anim
      NUM: pl
      DU: no
      CASE: nom
      ,
      GENDER: neut
      NUM: sg
      DU: no
      CASE: ? acc , nom !
      ,
      GENDER: fem
      NUM: sg
      DU: no
      CASE: ? loc , dat , gen !
      !

      CYCLE: yes

      END

      zahrady
      LEXF: zahrada
      WCL: noun
      SYNTCL: noun

      TANT: 0

      GENDER: fem
      ?
      NUM: pl
      DU: no
      CASE: ? voc , acc , nom !
      ,
      NUM: sg
      DU: no
      CASE: gen
      !

      depprn: yes
      depnum: yes
      RIGHTGEN: yes
      CYCLE: yes

      END

      tečka
      LEXF:   period
      WCL:    int
      SYNTCL: int

      END

Sample grammar

Very simple grammar is contained in the download package. It contain simple rules for

   

References

References

Download

download