top of page

Computational Linguistics Final Project

  • Writer: Morgan Lim
    Morgan Lim
  • Mar 19, 2021
  • 3 min read

Updated: Apr 2, 2021

The goal of the final project for this course was to create a project that would relate and expand upon the various topics covered throughout the quarter. I created a FSTA to check if trees adhere to subjacency and principle A. I chose this topic because I found FSTAs challenging and fun during this course and have always found syntax to be the same way. This project turned out to be a lot more challenging than I expected it to be, but I learned a lot. I had a lot of trouble implementing a tree grammar that was more complicated than what we did in assignment 8 and had an even harder time making sure the two trees were compatible for intersecting. At first, my grammars were extremely long and repetitive, but I managed to redesign them so that I could use as few transitions as possible. I also difficulty understanding how an intersection of two trees would work and had to spend a lot of time just sitting and wrapping my head around it.


The code I imported from the course was the code from Assignment 8. I used generates and under in order to check that my tree grammars were working the way they should. I also needed to define my Tree and Automaton type, which I took from TreeGrammars.hs.


Subjacency

fsta_sub is the tree grammar to enforce subjacency in a tree. This means that TPs, NPs, and CPs are bounding nodes, and wh-words cannot cross them more than once. CPs do not count if the specifier is empty. Bounding nodes are marked with a “**” in the tree, “spec t” marks an empty CP specifier, and “t” is where the wh-word starts movement. I wanted to focus on just the wh-word movement, so the tense/verb movement is not shown in the trees. Instead, I combined them into one category for the C head.

To test that fsta_sub worked, I created test cases with grammatical and ungrammatical sentences. These are the Haskell commands I used to test:


*Final> generates fsta_sub t1
True
*Final> generates fsta_sub t2
False
*Final> generates fsta_sub t3
True
*Final> generates fsta_sub t4
False

Principle A

fsta_pa enforces principle A, which states that an anaphor must be bound in its binding domain and cannot be the subject of the phrase heading the domain. Binding domains are marked with “**”. Principle A is enforced by ensuring that anaphors (APf or APm) are c-commanded by an antecedent (NPf or NPm) and only allowing the bound state (B) to be a final state (in addition to just plainWords).


Test cases:
*Final> generates fsta_pa t5
True
*Final> generates fsta_pa t6
False
*Final> generates fsta_pa t7
False


I originally had a symbol for every bounding node and every phrase heading the binding domain, which not only created a lot of transitions, it also made it impossible to intersect fsta_sub and fsta_pa. I realized that I could combine many of the transitions and then understood how useful it was to use “**” instead, which I didn’t entirely understand before. To make it possible to create an intersection of the grammars, I split my strings up into several categories that could be marked as a state that was relevant to what the grammar was enforcing, and if it was not it could be marked as a plain word.


Intersection of Tree Grammars

intersect takes two tree grammars and returns a grammar which accepts trees that both given grammars will accept. This function is done very similarly to the FSA intersection from the assignment 4 handout. The new states are tuples made up of the states from both given grammars such that each state from one grammar is paired with every state from the other. Then tuples of these tuples were also generated. The transition candidates were every possible pair from the previously generated tuples, every string, and the new states. They were then filtered out by checking that each element of the possible transitions had a corresponding string and new state that matched the tuple it is paired with.


Additionally, since there are no initial states, I realized that the leaf nodes of the trees would act as these and were not included in finding the transitions. These leaf nodes were found the same way the new transitions were found, but with an empty set instead of a pair of tuples.


To test intersect, I made two small grammars so that I could manually check that all the proper states and transitions were being returned. I then created trees that would be accepted by the intersection of fsta_sub and fsta_pa . I tested these trees in addition to the ones from testing the grammars alone.

Test cases:


*Final> intersect fsta_ex1 fsta_ex2
([(A,D),(A,E),(F,D),(F,E)],
["*","**","a","b"],
[(A,D)],
[([(A,D),(F,E)],"*",(F,E)),
([(A,E),(F,D)],"*",(F,D)),
([(F,D),(A,E)],"*",(A,E)),
([(F,E),(A,D)],"*",(A,D)),
([],"a",(A,D)),
([],"b",(F,E))])
*Final> generates (intersect fsta_sub fsta_pa) t8
True
*Final> generates (intersect fsta_sub fsta_pa) t9
True
*Final> generates (intersect fsta_sub fsta_pa) t1
True
*Final> generates (intersect fsta_sub fsta_pa) t5
True
*Final> generates (intersect fsta_sub fsta_pa) t6
False
*Final> generates (intersect fsta_sub fsta_pa) t2
False

 
 
 

Comments


bottom of page