Blake H. Allen




I'm a software engineer at Yelp's San Francisco headquarters, where I work on the search data team.

Previously, I completed my Ph.D. program at the University of British Columbia’s Department of Linguistics. My academic research focuses on designing and developing software that allows computers to make informed, data-driven judgments in language-related tasks.

You can email me at

Project index

(A list of papers and presentations in chronological order can be found in my CV.)

Dissertation: Sublexical Morphology
A computationally implemented model of inflectional morphology based on form-to-form mappings, with an algorithm that learns to predict probability distributions over candidates for unknown inflected forms.

Phonological CorpusTools
(Collaborators: Kathleen Currie Hall, Michael McAuliffe, Michael Fry, Scott Mackie)
Software designed to assist phonologists and other linguists in performing quantitative analyses of transcribed corpora, using either a GUI or a command-line interface.

Sublexical phonology learner
(Collaborator: Michael Becker)
An implemented learning algorithm for base-derivative morphological mappings. Uses multiple probabilistic (Maximum Entropy harmonic) grammars, and generates derivative predictions for novel words. Check out the online version here!
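
As a rough illustration of how a single Maximum Entropy harmonic grammar turns constraint violations into a probability distribution over candidates, here is a minimal Python sketch. It is not the learner's own code: the function name, the constraint names, the weights, and the violation counts are all invented for this example, and the actual learner combines multiple such grammars.

```python
import math

def maxent_probabilities(candidates, weights):
    """Return P(candidate) proportional to exp(-weighted violations)."""
    # Penalty of a candidate = weighted sum of its constraint violations.
    penalties = {
        form: sum(weights[c] * n for c, n in violations.items())
        for form, violations in candidates.items()
    }
    # Normalize exp(-penalty) over the whole candidate set.
    z = sum(math.exp(-p) for p in penalties.values())
    return {form: math.exp(-p) / z for form, p in penalties.items()}

# Hypothetical weights and violation profiles (assumptions, not real data).
weights = {"IDENT-voice": 2.0, "*VOICED-CODA": 1.0}
candidates = {
    "bed": {"*VOICED-CODA": 1},
    "bet": {"IDENT-voice": 1},
}
probs = maxent_probabilities(candidates, weights)
```

Because the faithfulness constraint is weighted more heavily in this toy setup, the faithful candidate "bed" receives more probability than "bet", but neither candidate is ruled out entirely.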

Learning long-distance phonotactics as tier-based strictly local languages
(Collaborator: Kevin McMullin)
We hypothesize that all phonotactics can be expressed as a conjoined set of tier-based strictly 2-local languages. To formalize and test this idea, we implement a learning algorithm for such TSL-2 languages with minimal input assumptions. We also extend the idea of TSL-2 languages into a probabilistic framework more amenable to real-world language data.
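
To make the notion of a TSL-2 language concrete, the following Python sketch learns a categorical TSL-2 grammar by recording which pairs of segments are adjacent on a tier. It is a simplification under stated assumptions: the tier is handed to the learner here, whereas the project aims for minimal input assumptions, and the function names and toy sibilant-harmony data are invented for illustration.

```python
def project(word, tier):
    """Keep only the segments of the word that belong to the tier."""
    return [seg for seg in word if seg in tier]

def tier_bigrams(word, tier):
    """Pairs of tier-adjacent segments, with '#' marking word edges."""
    padded = ["#"] + project(word, tier) + ["#"]
    return list(zip(padded, padded[1:]))

def learn_allowed(words, tier):
    """Collect every tier bigram attested in the training words."""
    allowed = set()
    for word in words:
        allowed.update(tier_bigrams(word, tier))
    return allowed

def is_grammatical(word, tier, allowed):
    """A word is accepted if all of its tier bigrams were attested."""
    return all(pair in allowed for pair in tier_bigrams(word, tier))

# Toy data (an assumption): "s" and "S" never co-occur within a word.
tier = {"s", "S"}
training = ["sasa", "SaSa", "sata", "taSa", "tata"]
allowed = learn_allowed(training, tier)
```

With this toy data, `is_grammatical("satasa", tier, allowed)` holds because the two sibilants agree across the intervening segments, while `is_grammatical("saSa", tier, allowed)` fails because the pair ("s", "S") was never attested on the tier.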

Constraint/feature inference
The traditional view that phonological constraints are stated over featurally defined natural classes of segments has been challenged on the basis of both experimental evidence and learnability issues. I investigate what alternatives might exist to natural class-based constraints, as well as how such constraints could be inferred from naturalistic data.

Learning allomorphy-based grammars
(Collaborators: Diana Archangeli and Douglas Pulleyblank)
An implementation of the allomorphy-based morpho-phonological grammar model of Archangeli & Pulleyblank, as well as an algorithm for learning these grammars. We show that this model is particularly amenable to representing "agglutinating" inflectional systems in which a single word can contain numerous component morphemes. To the end of developing a learning algorithm for allomorphy-based grammars, we also propose an information-theoretic learnability criterion to be used as part of an objective function.

Japanese pitch accent
This project uses experimental data about Japanese pitch accent to support the concept of markedness conflation, in which markedness constraints can penalize subsets of each other. My results indicate, however, that such constraints must be evaluated using a formalism like MaxEnt Harmonic Grammar that allows cumulative ganging effects, rather than under Optimality Theoretic assumptions. These results also suggest that a difference in sonority between front unrounded vowels and back rounded vowels must be encoded in the constraints, contra typical assumptions about sonority. In a broader light, then, this paper argues that probabilistic models and constraint (feature) sets similar to those used in machine learning are also indispensable for explaining facts about natural language.
Pre-print version

English syllabification and grammar comparison
In addressing questions of how much of the data available in the lexicon is actually used by speakers in their grammars, researchers typically compare experimental data to lexical patterns directly. In this paper, I propose that complementary information can also be found by using statistical learning to generate grammars from both experimental data and lexical data and then comparing those grammars. The paper specifically looks at the question of whether English syllabification judgments can be predicted wholly on the basis of word-edge phonotactics in the English lexicon, using this topic to illustrate the grammar-comparison methodology.
Pre-print version

Phonotactic leak
(Collaborator: Gunnar Ó. Hansson)
Andrew Martin demonstrated that phonotactic restrictions that apply categorically at one level of morphology may also manifest as probabilistic patterns at another level of morphology. In this presentation, I show that this kind of phonotactic "leak" exhibits limitations in at least one case: that of Lyman's law (voiced obstruent co-occurrence restrictions) in Japanese compounds. The differences between this pattern and the ones described by Martin provide guidance for future research on the typology of phonotactic leak.
Slides from presentation at NWLC 2013

Phonology-phonetics interface
I have also investigated the interface between phonology and articulatory phonetics.
  • A paper with Bryan Gick and Ian Stavness demonstrates, using various articulatory measures and computational modeling, that bracing of the tongue against other oral surfaces is a constant and crucial aspect of speech production.
  • A paper on Yoruba vowel harmony with Douglas Pulleyblank and Oladiipo Ajiboye indicates that tongue root movement plays a crucial role in distinguishing vowels in the language, based on evidence from an ultrasound study.
  • Also using ultrasound, a presentation with Kathleen Hall and others at LabPhon 2014 provided evidence that the relative phonological contrast between segments affects the articulation of those segments.

eNunciate — web-based visual tools for linguistics and Japanese pedagogy
This project uses phonetics-based instruction and ultrasound video demonstrations to help linguistics students learn the sounds of the International Phonetic Alphabet, and to help students of Japanese perceive and produce numerous challenging contrasts in Japanese.
Project website


Phonological CorpusTools
A program with a graphical interface that enables phonologists to quickly and easily perform several types of quantitative analyses (functional load, predictability of distribution, acoustic similarity, etc.) on any corpus.

An implementation in Python of both the MaxEnt Grammar Tool and the UCLA Phonotactic Learner, with some differences in features.

Ultrasound data processing/analysis package [GitHub]
A set of scripts designed to automate the process of extracting and analyzing linguistic ultrasound data. Designed to work in conjunction with EdgeTrak and produce analyses using SSANOVA.

OT/HG constraint translator
A script that converts featurally defined Optimality Theory or Harmonic Grammar constraints into easily evaluable regular expressions.
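
The general idea of such a translation can be sketched in Python as follows. This is not the script itself: the tiny feature table, the function names, and the example constraint are assumptions made for illustration. Each feature bundle is expanded to the segments it picks out, and the sequence of bundles becomes a sequence of regular-expression character classes.

```python
import re

# Toy feature table (an assumption; real inventories are much larger).
FEATURES = {
    "p": {"voice": "-", "son": "-"},
    "b": {"voice": "+", "son": "-"},
    "t": {"voice": "-", "son": "-"},
    "d": {"voice": "+", "son": "-"},
    "n": {"voice": "+", "son": "+"},
    "a": {"voice": "+", "son": "+"},
}

def natural_class(bundle):
    """Segments whose features match every value in the bundle."""
    return sorted(
        seg for seg, feats in FEATURES.items()
        if all(feats.get(f) == v for f, v in bundle.items())
    )

def constraint_to_regex(bundles):
    """Turn a sequence of feature bundles into a compiled regex."""
    classes = []
    for bundle in bundles:
        segs = natural_class(bundle)
        if not segs:
            raise ValueError(f"no segments match {bundle}")
        classes.append("[" + "".join(re.escape(s) for s in segs) + "]")
    return re.compile("".join(classes))

def count_violations(pattern, form):
    """Count matches, using a lookahead so overlapping ones are included."""
    return len(re.findall("(?=" + pattern.pattern + ")", form))

# *[-son, +voice][-son, -voice]: no voiced obstruent before a voiceless one.
agree = constraint_to_regex([
    {"son": "-", "voice": "+"},
    {"son": "-", "voice": "-"},
])
```

Here `agree.pattern` is `[bd][pt]`, so a form such as "abta" incurs one violation and "apta" incurs none. Counting matches rather than testing for a single match keeps the output usable for Harmonic Grammar, where the number of violations matters.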
