Blake H. Allen


Hello!

I'm a software engineer at Yelp's San Francisco headquarters, where I work on the data and search team.

I'm just about to wrap up my Ph.D. program at the University of British Columbia’s Department of Linguistics. My research focuses on designing and developing software that allows computers to make informed, data-driven judgments in language-related tasks. You can email me at b.allen@alumni.ubc.ca.



Upcoming and recent papers/presentations

  • Paper (in review) with Michael Becker: "Learning alternations from surface forms with sublexical phonology".

  • Paper (accepted by the Journal of Speech, Language, and Hearing Research) with Bryan Gick and Ian Stavness: "Speaking tongues are actively braced".

  • Workshop on Computational Methods for Descriptive and Theoretical Morphology, International Morphology Meeting (mid-February 2016 in Vienna): a workshop presentation titled "Sublexical Morphology: learning and generalizing probabilistic inflectional morphology".

  • International Morphology Meeting (mid-February 2016 in Vienna): a paper presentation titled "Multiple bases and empirical priors in paradigm inference: experimental evidence from Icelandic". Collaborator: Gunnar Ó. Hansson.

  • Annual Meeting on Phonology 2015 (October 10, 2015 in Vancouver): a poster presentation titled "Justified naivety: limits on constraint conjunction in inflectional morphology".

  • Workshop on Modeling Variability in Speech, Institute for Natural Language Processing (October 2, 2015 in Stuttgart): a paper presentation titled "Calculating functional load with pronunciation variants". (Presentation by Kathleen Currie Hall.)


Project index


    (A list of papers and presentations in chronological order can be found in my CV.)

    Dissertation: Sublexical Morphology
    A computationally implemented model of inflectional morphology based on form-to-form mappings, with a learning algorithm that learns to predict probability distributions over candidates for unknown inflected forms.

    Phonological CorpusTools
    (Collaborators: Kathleen Currie Hall, Michael McAuliffe, Michael Fry, Scott Mackie)
    Software designed to assist phonologists and other linguists in performing quantitative analyses of transcribed corpora, using either a GUI or a command-line interface.

    Sublexical phonology learner
    (Collaborator: Michael Becker)
    An implemented learning algorithm for base-derivative morphological mappings. It uses multiple probabilistic grammars (Maximum Entropy Harmonic Grammars) and generates derivative predictions for novel words. An online version is also available.
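    To illustrate what a single Maximum Entropy Harmonic Grammar does when it scores candidate derivatives, here is a minimal Python sketch. It assumes each candidate's constraint violations have already been counted; the function name, the constraint names (*VTV, Ident(voice)), the weights, and the forms are all hypothetical. It does not show how the sublexical learner partitions the lexicon or combines its multiple grammars.

```python
import math


def maxent_probabilities(weights, violations):
    """Probability of each candidate under one MaxEnt Harmonic Grammar (sketch).

    weights:    constraint name -> non-negative weight
    violations: candidate -> {constraint name -> violation count}
    """
    # Penalty: the weighted sum of a candidate's violations (higher is worse).
    penalties = {
        cand: sum(weights[c] * n for c, n in viols.items())
        for cand, viols in violations.items()
    }
    # Unnormalized score exp(-penalty), then normalize over the candidate set.
    scores = {cand: math.exp(-h) for cand, h in penalties.items()}
    z = sum(scores.values())
    return {cand: s / z for cand, s in scores.items()}


# Hypothetical example: two candidate derivatives of a base /kat/ plus suffix -i.
weights = {"*VTV": 1.0, "Ident(voice)": 2.0}
violations = {
    "kati": {"*VTV": 1, "Ident(voice)": 0},
    "kadi": {"*VTV": 0, "Ident(voice)": 1},
}
probs = maxent_probabilities(weights, violations)
```

    Here kati has penalty 1 and kadi penalty 2, so kati receives probability 1/(1 + e^-1), roughly 0.73, and kadi roughly 0.27. Rather than picking a single winner, the grammar yields a probability distribution over candidates.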

    Learning long-distance phonotactics as tier-based strictly local languages
    (Collaborator: Kevin McMullin)
    We hypothesize that all phonotactics can be expressed as the intersection of a set of tier-based strictly 2-local (TSL-2) languages. To formalize and test this idea, we implement a learning algorithm for TSL-2 languages with minimal input assumptions. We also extend TSL-2 languages into a probabilistic framework more amenable to real-world language data.
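    A TSL-2 grammar can be pictured as a tier (a subset of segments) plus a set of banned adjacent pairs on that tier. The Python sketch below shows how such a grammar accepts or rejects a word, together with a deliberately naive learner that bans every tier bigram unattested in the data. It simply assumes the tier is supplied in advance and is not probabilistic, so it should not be read as this project's learning algorithm; the function names, the boundary symbol #, and the toy sibilant-harmony data (s and S standing in for [s] and [ʃ]) are illustrative assumptions.

```python
from itertools import product

BOUNDARY = "#"  # word-edge symbol added to every tier projection


def tier_bigrams(word, tier):
    """Project `word` onto `tier` and return its adjacent pairs, with word edges."""
    projected = [BOUNDARY] + [seg for seg in word if seg in tier] + [BOUNDARY]
    return set(zip(projected, projected[1:]))


def learn_forbidden(words, tier):
    """Naive learner for a given tier: ban every tier bigram unattested in `words`."""
    attested = set()
    for word in words:
        attested |= tier_bigrams(word, tier)
    symbols = set(tier) | {BOUNDARY}
    return {pair for pair in product(symbols, repeat=2) if pair not in attested}


def is_well_formed(word, tier, forbidden):
    """A word is in the TSL-2 language iff none of its tier bigrams is banned."""
    return not (tier_bigrams(word, tier) & forbidden)


# Toy sibilant harmony: only "s" and "S" are on the tier.
TIER = {"s", "S"}
TRAINING = ["sasos", "SaSoS", "pata", "sapa", "Sapa"]
FORBIDDEN = learn_forbidden(TRAINING, TIER)
```

    With this data the learner bans exactly (s, S) and (S, s), so sotas is accepted and sotaS is rejected, regardless of how many non-tier segments intervene. Projecting to a tier is what lets a strictly local check capture a long-distance pattern.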

    Constraint/feature inference
    The traditional view that phonological constraints are stated over featurally defined natural classes of segments has been challenged on the basis of both experimental evidence and learnability issues. I investigate what alternatives might exist to natural class-based constraints, as well as how such constraints could be inferred from naturalistic data.

    Learning allomorphy-based grammars
    (Collaborators: Diana Archangeli and Douglas Pulleyblank)
    An implementation of the allomorphy-based morpho-phonological grammar model of Archangeli & Pulleyblank, as well as an algorithm for learning these grammars. We show that this model is particularly amenable to representing "agglutinating" inflectional systems, in which a single word can contain numerous component morphemes. Toward a learning algorithm for allomorphy-based grammars, we also propose an information-theoretic learnability criterion for use in an objective function.

    Japanese pitch accent
    This project uses experimental data on Japanese pitch accent to support the concept of markedness conflation, in which one markedness constraint can penalize a subset of the structures penalized by another. My results indicate, however, that such constraints must be evaluated in a formalism like MaxEnt Harmonic Grammar, which allows cumulative (ganging) effects, rather than under Optimality Theoretic assumptions. These results also suggest that, contra typical assumptions about sonority, the constraints must encode a sonority difference between front unrounded vowels and back rounded vowels. More broadly, this paper argues that probabilistic models and constraint (feature) sets similar to those used in machine learning are also indispensable for explaining facts about natural language.
    Pre-print version
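    The difference between cumulative (ganging) evaluation and strict domination can be shown with a toy tableau. In this Python sketch, three hypothetical constraints C1, C2, C3 and made-up weights stand in for real ones; they are not the constraints or data from the pitch accent study. Candidate A violates the two lower-ranked constraints once each, and candidate B violates only the top-ranked one.

```python
import math

# Hypothetical constraints, listed in OT ranking order (C1 >> C2 >> C3).
RANKING = ["C1", "C2", "C3"]
WEIGHTS = {"C1": 3.0, "C2": 2.0, "C3": 2.0}

# A violates the two lower constraints; B violates only the top one.
VIOLATIONS = {
    "A": {"C1": 0, "C2": 1, "C3": 1},
    "B": {"C1": 1, "C2": 0, "C3": 0},
}


def ot_winner(violations, ranking):
    """Strict domination: compare violation profiles lexicographically in ranking order."""
    return min(violations, key=lambda cand: [violations[cand][c] for c in ranking])


def maxent_probabilities(violations, weights):
    """MaxEnt HG: P(cand) is proportional to exp(-weighted sum of its violations)."""
    scores = {
        cand: math.exp(-sum(weights[c] * n for c, n in viols.items()))
        for cand, viols in violations.items()
    }
    z = sum(scores.values())
    return {cand: s / z for cand, s in scores.items()}
```

    Under the ranking C1 >> C2 >> C3, `ot_winner` returns A categorically, because B violates the top constraint. Under the weights 3, 2, 2, A's two lower-weighted violations gang up to a penalty of 4 against B's 3, so `maxent_probabilities` favors B, with probability about 0.73.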

    English syllabification and grammar comparison
    In addressing questions of how much of the information available in the lexicon speakers actually use in their grammars, researchers typically compare experimental data to lexical patterns directly. In this paper, I propose that complementary information can also be found by using statistical learning to generate grammars from both experimental data and lexical data and then comparing those grammars. The paper specifically looks at the question of whether English syllabification judgments can be predicted wholly on the basis of word-edge phonotactics in the English lexicon, using this topic to illustrate the grammar-comparison methodology.
    Pre-print version

    Phonotactic leak
    (Collaborator: Gunnar Ó. Hansson)
    Andrew Martin demonstrated that phonotactic restrictions that apply categorically at one level of morphology may also manifest as probabilistic patterns at another level. In this presentation, I show that this kind of phonotactic "leak" is limited in at least one case: Lyman's Law (a co-occurrence restriction on voiced obstruents) in Japanese compounds. The differences between this pattern and the ones described by Martin provide guidance for future research on the typology of phonotactic leak.
    Slides from presentation at NWLC 2013

    Phonology-phonetics interface
    I have also investigated the interface between phonology and articulatory phonetics.
    • A paper with Bryan Gick and Ian Stavness uses articulatory measures and computational modeling to demonstrate that bracing the tongue against other oral surfaces is a constant and crucial aspect of speech production.
    • A paper on Yoruba vowel harmony with Douglas Pulleyblank and Oladiipo Ajiboye indicates that tongue root movement plays a crucial role in distinguishing vowels in the language, based on evidence from an ultrasound study.
    • Also using ultrasound, a presentation with Kathleen Hall and others at LabPhon 2014 provided evidence that the relative phonological contrast between segments affects the articulation of those segments.

    eNunciate — web-based visual tools for linguistics and Japanese pedagogy
    This project uses phonetics-based instruction and ultrasound video demonstrations to help linguistics students learn the sounds of the International Phonetic Alphabet, and to help students of Japanese perceive and produce a number of challenging Japanese contrasts.
    Project website


Tools


    Phonological CorpusTools
    A program with a graphical interface that enables phonologists to quickly and easily perform several types of quantitative analyses (functional load, predictability of distribution, acoustic similarity, etc.) on any corpus.
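    One simple way to operationalize functional load is to count the minimal pairs in a lexicon that a contrast distinguishes. The Python sketch below counts such pairs for two segments, assuming one character per segment and words stored as plain strings. It only illustrates the measure; it is not Phonological CorpusTools' implementation or API, and the function name and toy lexicon are invented.

```python
def minimal_pair_count(lexicon, x, y):
    """Count word pairs in `lexicon` that differ only by x vs. y at a single position."""
    words = set(lexicon)
    pairs = set()
    for word in words:
        for i, seg in enumerate(word):
            if seg == x:
                # Swap this occurrence of x for y and see whether that word exists.
                partner = word[:i] + y + word[i + 1:]
                if partner in words:
                    # frozenset makes the pair unordered, so it is counted once.
                    pairs.add(frozenset((word, partner)))
    return len(pairs)


# Hypothetical toy lexicon.
LEXICON = ["pat", "bat", "pit", "bit", "tap", "tab", "mat"]
```

    In this toy lexicon, `minimal_pair_count(LEXICON, "p", "b")` finds three pairs (pat/bat, pit/bit, tap/tab). A real analysis would also need to handle multi-character segments and decide how to normalize the count.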

    PhoMEnt
    An implementation in Python of both the MaxEnt Grammar Tool and the UCLA Phonotactic Learner, with some differences in functionality.
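    A MaxEnt grammar learner fits constraint weights by maximizing the likelihood of observed candidate frequencies, often with a Gaussian prior that pulls each weight toward a mean. The Python sketch below does this with plain projected gradient ascent and a fixed step size. It is not PhoMEnt's code: the function names, parameter names, and defaults (sigma, mu, rate, steps) are assumptions, and practical implementations typically use a more robust optimizer.

```python
import math


def candidate_probs(weights, tableau):
    """MaxEnt probabilities for one tableau: a list of (violation vector, frequency)."""
    scores = [math.exp(-sum(w * v for w, v in zip(weights, viols))) for viols, _ in tableau]
    z = sum(scores)
    return [s / z for s in scores]


def fit_weights(tableaux, n_constraints, sigma=10.0, mu=0.0, rate=0.01, steps=2000):
    """Fit non-negative weights by projected gradient ascent on
    log-likelihood + Gaussian prior (mean mu, standard deviation sigma)."""
    weights = [0.0] * n_constraints
    for _ in range(steps):
        # Gradient of the log prior: -(w - mu) / sigma^2 for each weight.
        grad = [-(w - mu) / sigma ** 2 for w in weights]
        for tableau in tableaux:
            probs = candidate_probs(weights, tableau)
            total = sum(freq for _, freq in tableau)
            for k in range(n_constraints):
                # d(log-likelihood)/dw_k = expected violations - observed violations.
                expected = total * sum(p * viols[k] for p, (viols, _) in zip(probs, tableau))
                observed = sum(freq * viols[k] for viols, freq in tableau)
                grad[k] += expected - observed
        # Step uphill, then clip so that weights stay non-negative.
        weights = [max(0.0, w + rate * g) for w, g in zip(weights, grad)]
    return weights


# Hypothetical input: candidate 1 violates constraint 0 and was seen 75 times;
# candidate 2 violates constraint 1 and was seen 25 times.
TABLEAU = [([1, 0], 75), ([0, 1], 25)]
WEIGHTS = fit_weights([TABLEAU], n_constraints=2)
```

    The fitted grammar assigns candidate 1 a probability close to the observed 0.75. Weight 0 stays at 0, and weight 1 ends up just below ln 3, slightly shrunk by the prior. The clipping step is what keeps weights non-negative, as Harmonic Grammar requires.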

    Ultrasound data processing/analysis package [GitHub]
    A set of scripts designed to automate the process of extracting and analyzing linguistic ultrasound data. Designed to work in conjunction with EdgeTrak and produce analyses using SSANOVA.

    OT/HG constraint translator
    A script that converts featurally defined Optimality Theory or Harmonic Grammar constraints into easily evaluable regular expressions.
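    The core idea of translating a featural constraint into a regular expression can be sketched in Python. Each feature bundle is expanded into a character class containing the segments of its natural class, and a lookahead ensures that overlapping violations are all counted. The feature table, the bundle format, the function names, and the example constraint *[+voice, -sonorant][+voice, -sonorant] are hypothetical and do not reflect the original script's input format.

```python
import re

# Hypothetical toy feature table: segment -> feature values.
FEATURES = {
    "p": {"voice": "-", "sonorant": "-"},
    "b": {"voice": "+", "sonorant": "-"},
    "t": {"voice": "-", "sonorant": "-"},
    "d": {"voice": "+", "sonorant": "-"},
    "m": {"voice": "+", "sonorant": "+"},
    "a": {"voice": "+", "sonorant": "+"},
}


def natural_class(bundle):
    """Segments whose features match every value in `bundle` (e.g. {"voice": "+"})."""
    return sorted(seg for seg, feats in FEATURES.items()
                  if all(feats.get(f) == v for f, v in bundle.items()))


def constraint_to_regex(bundles):
    """Translate a sequence of feature bundles into a regex that matches violations."""
    parts = []
    for bundle in bundles:
        segs = natural_class(bundle)
        # An empty natural class can never match; "(?!)" is a regex that always fails.
        parts.append("[" + "".join(re.escape(s) for s in segs) + "]" if segs else "(?!)")
    # Wrapping in a lookahead makes each match zero-width, so overlapping violations count.
    return re.compile("(?=" + "".join(parts) + ")")


def count_violations(pattern, word):
    return len(pattern.findall(word))


# *[+voice, -sonorant][+voice, -sonorant]: no two adjacent voiced obstruents.
NO_VOICED_CLUSTER = constraint_to_regex([{"voice": "+", "sonorant": "-"}] * 2)
```

    Here the constraint compiles to `(?=[bd][bd])`, and `count_violations(NO_VOICED_CLUSTER, "abdda")` returns 2 (for bd and dd). Character classes assume one character per segment; multi-character segments would need alternation instead.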
