The Linguistics Department Colloquium Series 2008-2009

 

Jason Riggle

University of Chicago

Thursday, November 13th, 2008
4:30 PM in Wells Hall A-607

 

"Complexity, Learnability, & Constraint-Based Grammars"

In this talk, I present results on the learnability of phonological grammars for two constraint-based models, Harmonic Grammar (HG; Legendre, Miyata, and Smolensky 1990) and Optimality Theory (OT; Prince and Smolensky 1993). I first establish that grammars in these models are learnable from reasonably sized samples of data and then present a learning algorithm for OT that is guaranteed to make no more than k log₂ k mistakes when learning grammars with k constraints. I demonstrate that this mistake bound is within a logarithmic factor of the best possible mistake bound for any OT/HG learning algorithm. The proposed learning algorithm calculates the number of rankings that are consistent with a set of data. This makes possible a simple and effective Bayesian heuristic to guide learning: all else equal, choose candidates that are preferred by the highest number of rankings consistent with previous observations. This general strategy can be applied to OT, HG, or any parameterized model of grammar, and it associates with each language generated by the theory an abstract quantity, the p-volume, that measures the fraction of the parameter space corresponding to grammars that generate that language.
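
As a rough illustration of counting consistent rankings and of the p-volume, the sketch below enumerates every ranking of k constraints by brute force. This is not the algorithm presented in the talk, and exhaustive enumeration is only feasible for very small k. The function names (prefers, consistent_rankings, p_volume) and the encoding of each datum as a (winner, loser) pair of violation vectors are assumptions made for this example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def prefers(ranking, winner, loser):
    """Return True if `winner` beats `loser` under `ranking`.

    `ranking` lists constraint indices from highest- to lowest-ranked;
    `winner` and `loser` are tuples of violation counts per constraint.
    """
    for c in ranking:
        if winner[c] != loser[c]:
            # The highest-ranked constraint that distinguishes them decides.
            return winner[c] < loser[c]
    return False  # identical violation profiles: no strict preference


def consistent_rankings(k, data):
    """All rankings of k constraints under which every winner beats its loser."""
    return [r for r in permutations(range(k))
            if all(prefers(r, w, l) for w, l in data)]


def p_volume(k, data):
    """Fraction of the k! rankings that are consistent with `data`."""
    return Fraction(len(consistent_rankings(k, data)), factorial(k))
```

With three constraints, the single datum ((0, 1, 0), (1, 0, 0)) only requires constraint 0 to outrank constraint 1, so p_volume(3, [((0, 1, 0), (1, 0, 0))]) is 1/2. Adding the datum ((0, 0, 1), (0, 1, 0)) also forces constraint 1 above constraint 2, which leaves one ranking out of six.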

The p-volume seems to encode 'restrictiveness' in a way similar to Tesar and Prince's (1999) r-measure. Preliminary investigations indicate that p-volume is significantly correlated with typological frequency (cf. Bane and Riggle 2008). This fact is neatly explained if language learners use a strategy sometimes called a Gibbs learner, wherein they keep track of the region of the parameter space consistent with previous observations but make guesses according to a single hypothesis grammar randomly selected from that region. Upon making an error, the Gibbs learner updates the parameter region and randomly selects a new hypothesis grammar from that region. Following this strategy, learners will be predisposed towards grammars with large p-volume in cases where the hypotheses are underdetermined by the data. Moreover, priors other than the 'flat' distribution over rankings can be included to implement models of ranking bias.
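
The Gibbs learning strategy described above can be sketched for a toy OT setting as follows. This is a minimal illustration under stated assumptions, not the implementation from the talk: the region is stored as an explicit list of rankings (practical only for small k), the prior over rankings is flat, the observations are assumed to be mutually consistent, and the names gibbs_learn and prefers are hypothetical.

```python
import random
from itertools import permutations


def prefers(ranking, winner, loser):
    """True if `winner` beats `loser` under `ranking` (highest-ranked first)."""
    for c in ranking:
        if winner[c] != loser[c]:
            return winner[c] < loser[c]
    return False


def gibbs_learn(k, observations, seed=0):
    """Process (winner, loser) pairs in order, resampling after each error.

    Returns the final hypothesis ranking, the region of rankings as of the
    last update, and the number of mistakes made along the way.
    """
    rng = random.Random(seed)
    region = list(permutations(range(k)))  # flat prior: every ranking allowed
    hypothesis = rng.choice(region)        # initial guess, chosen uniformly
    seen = []
    mistakes = 0
    for winner, loser in observations:
        seen.append((winner, loser))
        if not prefers(hypothesis, winner, loser):
            mistakes += 1
            # Shrink the region to rankings consistent with everything seen.
            region = [r for r in region
                      if all(prefers(r, w, l) for w, l in seen)]
            # Assumes consistent data, so the region is never empty here.
            hypothesis = rng.choice(region)
    return hypothesis, region, mistakes
```

Because each new hypothesis is drawn uniformly from the consistent region, a language generated by many of the remaining rankings is proportionally more likely to be the learner's guess, which is the bias towards large p-volume mentioned above. A non-flat prior could be modelled by replacing the uniform choice with a weighted one.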

One of the primary assets of this strategy is that it allows linguistic theory to be informed by the relative frequencies of patterns in linguistic typologies rather than only by the Boolean distinction of whether or not a pattern is attested. Though some of the frequency asymmetries surely come from non-linguistic historical accidents, a model of learning that is able to account for some of the frequency variance is clearly of interest and makes a range of predictions that can be tested in experimental settings.


 

 

 


 
