\[ \definecolor{firebrick}{RGB}{178,34,34} \newcommand{\red}[1]{{\color{firebrick}{#1}}} \] \[ \definecolor{green}{RGB}{107,142,35} \newcommand{\green}[1]{{\color{green}{#1}}} \] \[ \definecolor{blue}{RGB}{0,0,205} \newcommand{\blue}[1]{{\color{blue}{#1}}} \] \[ \newcommand{\den}[1]{[\![#1]\!]} \] \[ \newcommand{\set}[1]{\{#1\}} \]

overview

 

  • what is probabilistic pragmatics?

 

  • example: vanilla rational speech act model

 

  • extensions & applications:
    • individual differences
    • embedded scalars

probabilistic pragmatics

what is probabilistic pragmatics?

  • definitely one thing: a piece of bad terminology
    • probabilities play a role, but they are not the main thing

 

  • blunt umbrella term for a set of approaches unified by family resemblance
    • game theoretic pragmatics, "Bayesian pragmatics"
    • modern post-Grice (Geurts, Lauer, Optimality or Relevance Theory)

 

  • gradient concept characterized by a property cluster

Franke & Jäger (2016), Probabilistic pragmatics, ZfS 35(1):3-44

key properties

  1. probabilistic
    • language users are usually uncertain about relevant contextual elements
    • probabilities are a good tool to capture uncertainty
  2. interactive
    • pragmatics is not only about readings of sentences
    • explicitly consider speaker and listener perspectives
  3. rationalistic
    • language use as goal-oriented, purposeful behavior
  4. computational
    • formally precise, implementable (likely: quantitative) formulations
  5. data-oriented
    • pragmatic theory feeds full data-generating model for experimental data

 

\(\Rightarrow\) Bayesian as a consequence of particular implementations of 1-3

levels of analysis

[figure: levels of analysis]

rational analysis

A rational analysis is an explanation of an aspect of human behavior based
on the assumption that it is optimized somehow to the structure of the 
environment. ... [T]he term does not imply any actual logical deduction in 
choosing optimal behavior, only that the behavior will be optimized. 
                                                    (Anderson 1991, p. 471)

example: reference game paradigm

  • speaker and listener observe a fixed set \(T\) of referents, e.g.:

    [figure: example referents]

  • speaker knows which referent \(t \in T\) she wants to talk about

  • speaker can choose a message \(m\) from set \(M = \{ \text{blue}, \text{green}, \text{square}, \text{circle} \}\)

  • listener tries to recover intended referent based on message

  • communication is successful if guess matches intended referent

[signaling game!]

rational speech act model


literal listener picks literal interpretation (uniformly at random):

\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]) \]


Gricean speaker approximates informativity-maximization (with parameter \(\lambda\)):

\[ P_{S}(m \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


pragmatic listener uses Bayes' rule to infer likely world states:

\[ P_L(t \mid m \, ; \, \lambda) \propto P(t) \cdot P_S(m \mid t \, ; \, \lambda) \]

(cf. Benz 2006, Frank & Goodman 2012)
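
A minimal runnable sketch of these three definitions, in Python; the three-referent context is a hypothetical stand-in for the stimulus figure above:

import numpy as np

# hypothetical reference game in the spirit of Frank & Goodman (2012):
# three referents, four one-word messages, Boolean semantics sem[t, m]
referents = ["blue square", "blue circle", "green square"]
messages  = ["blue", "green", "square", "circle"]
sem = np.array([[1, 0, 1, 0],    # blue square
                [1, 0, 0, 1],    # blue circle
                [0, 1, 1, 0]],   # green square
               dtype=float)

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def rsa(sem, lam=1.0, prior=None):
    prior = np.ones(sem.shape[0]) / sem.shape[0] if prior is None else prior
    ll = normalize(sem * prior[:, None], axis=0)  # P_LL(t | m) ∝ P(t | [[m]])
    sp = normalize(ll ** lam, axis=1)             # P_S(m | t) ∝ exp(lam · log P_LL(t | m))
    pl = normalize(prior[:, None] * sp, axis=0)   # P_L(t | m) ∝ P(t) · P_S(m | t)
    return ll, sp, pl

ll, sp, pl = rsa(sem, lam=1.0)
print(np.round(pl, 2))  # "blue" now favors the blue square: the blue circle
                        # could have been called "circle"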

individual variation

simple & complex reference games

 

[figure: simple & complex reference game displays]

Franke & Degen (2016), Reasoning in reference games, PLoS ONE 11(5)

complexity of reasoning

[figure: reasoning chains]

individual variation in reasoning depth

[figure: model formulas]


[figure: predictions by reasoner type]

(cf. Camerer 2006, Franke 2011, Jäger 2014)

data

  • 60 subjects each for production & comprehension
  • 12 trials of each critical condition per subject
  • production:

    [figure: production results]

  • comprehension:

    [figure: comprehension results]

population-level modeling

  • maximum-likelihood fit of the RSA model to population-level data (see the sketch below)
  • correlation \(r = 0.997\), \(p < 0.0001\)
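
A minimal sketch of such a fit (function and variable names are illustrative), assuming production data as a count matrix counts[t, m] over targets and messages and the vanilla speaker rule from above:

import numpy as np
from scipy.optimize import minimize_scalar

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def fit_lambda(counts, sem):
    """ML estimate of the rationality parameter lambda from production
    counts[t, m] (how often message m was produced for target t),
    under the vanilla RSA speaker with a uniform referent prior."""
    prior = np.ones(sem.shape[0]) / sem.shape[0]
    def neg_log_lik(lam):
        ll = normalize(sem * prior[:, None], axis=0)  # literal listener
        sp = normalize(ll ** lam, axis=1)             # speaker P_S(m | t; lam)
        return -(counts * np.log(sp + 1e-12)).sum()   # multinomial log-likelihood
    return minimize_scalar(neg_log_lik, bounds=(0.01, 20), method="bounded").x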

     

[figure: population-level model fit]

data-generating model

[figure: graphical representation of the data-generating model]

results

[figure: by-subject results]


  • visual impression corroborated by Bayesian model comparison:
    • RSA-style Gricean-speaker only is tenable
    • RSA-style Gricean-listener only is not

upshot

 

  • how "rational" and "interactive" subjects are is to be settled by data
  • RSA-style pragmatics blends into statistical data-generating models

embedded scalars

a Quantity conflict


Quantity implicature: example

utterance: "I own some of Johnny Cash's albums."    
---implicates--->  I don't own them all.


Traditionalism

the reason S didn't say "all" is that it's not true


Grammaticalism

parse the sentence as:

I own O(some) of JC's albums.

where:

\[ O(x) = x \cap \bigcap_{y \in ALT^+(x)} \neg y \]

alternative


rational probabilistic reasoning about putative meaning enrichments

(joint work in progress with Leon Bergen)

nested Aristotelians

toy language

9 possible sentences

[none | some | all] of the monsters drank [none | some | all] of their water


7 possible worlds

100 · 010 · 110 · 011 · 101 · 001 · 111

[figure: one monster/water configuration per world]

semantics

    NN NS NA SN SS SA AN AS AA
100  0  1  1  1  0  0  1  0  0
110  0  0  1  1  1  0  0  0  0
101  0  0  0  1  1  1  0  0  0
111  0  0  0  1  1  1  0  0  0
010  1  0  1  0  1  0  0  1  0
011  1  0  0  0  1  1  0  1  0
001  1  0  0  0  1  1  0  1  1
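
A short Python sketch that reproduces this truth table, assuming that the three bits of a world label record which categories (drank none / some but not all / all of their water) are instantiated by at least one monster, and that "some" is construed weakly (at least some):

from itertools import product

# world bit i = 1 iff at least one monster is in category i,
# categories: N = drank none, S = drank some but not all, A = drank all
worlds = ["100", "110", "101", "111", "010", "011", "001"]

# weak construal of the embedded quantifier:
# 'drank none' = {N}, 'drank some' = {S, A}, 'drank all' = {A}
INNER = {"N": "N", "S": "SA", "A": "A"}

def true_in(q1, q2, world):
    """'[q1] of the monsters drank [q2] of their water', evaluated in world."""
    present = [c for c, bit in zip("NSA", world) if bit == "1"]
    hits = [c in INNER[q2] for c in present]
    return {"N": not any(hits), "S": any(hits), "A": all(hits)}[q1]

sentences = ["".join(p) for p in product("NSA", repeat=2)]  # NN, NS, ..., AA
for w in worlds:
    print(w, [int(true_in(q1, q2, w)) for q1, q2 in sentences])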

parses

000:       [ N | S | A]  of the ... drank   [ N | S | A]  of the ...
001:       [ N | S | A]  of the ... drank O([ N | S | A]) of the ...
010:     O([ N | S | A]) of the ... drank   [ N | S | A]  of the ...
011:     O([ N | S | A]) of the ... drank O([ N | S | A]) of the ...
100:  O(   [ N | S | A]  of the ... drank   [ N | S | A]  of the ...)
101:  O(   [ N | S | A]  of the ... drank O([ N | S | A]) of the ...)
110:  O( O([ N | S | A]) of the ... drank   [ N | S | A]  of the ...)
111:  O( O([ N | S | A]) of the ... drank O([ N | S | A]) of the ...)

lexical exhaustification

O(N) = N        O(A) = A        O(S) = "some but not all"    

sentential exhaustification

\[ O(x) = x \cap \bigcap_{y \in ALT^+(x)} \neg y \ \ \ , \text{if consistent} \]

where \(ALT^+(x)\) contains all sentences stronger than \(x\) that are obtainable by replacing N, S, or A for one another
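
A Python sketch of this operator applied at sentence level, over the literal truth table above; it treats "stronger" as strictly stronger and falls back to the plain denotation when exhaustification would be inconsistent (the latter is one way to read the "if consistent" clause):

# denotations from the 'semantics' table: each sentence's set of worlds
worlds    = ["100", "110", "101", "111", "010", "011", "001"]
sentences = ["NN", "NS", "NA", "SN", "SS", "SA", "AN", "AS", "AA"]
truth = [[0,1,1,1,0,0,1,0,0],
         [0,0,1,1,1,0,0,0,0],
         [0,0,0,1,1,1,0,0,0],
         [0,0,0,1,1,1,0,0,0],
         [1,0,1,0,1,0,0,1,0],
         [1,0,0,0,1,1,0,1,0],
         [1,0,0,0,1,1,0,1,1]]
den = {s: {w for w, row in zip(worlds, truth) if row[j]}
       for j, s in enumerate(sentences)}

def O(x):
    """Remove every world covered by a strictly stronger N/S/A-alternative;
    fall back to the plain denotation if nothing would remain."""
    stronger = [den[y] for y in sentences if y != x and den[y] < den[x]]
    exh = den[x] - set().union(*stronger) if stronger else den[x]
    return exh if exh else den[x]

print(O("SS"))  # {'110'}: the globally exhaustified reading of 'some ... some'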

possible readings

   sentence parse s100 s110 s101 s111 s010 s011 s001
       NN  p000    0    0    0    0    1    1    1
       NN  p100    0    0    0    0    1    1    0       
       
       NS  p000    1    0    0    0    0    0    0
       NS  p001    1    0    1    0    0    0    1
       NS  p101    0    0    1    0    0    0    1
       
      xNA  p000    1    1    0    0    1    0    0
      xNA  p100    0    1    0    0    1    0    0
      
       SN  p000    1    1    1    1    0    0    0
       SN  p010    0    1    1    1    0    0    0
       
       SS  p000    0    1    1    1    1    1    1
       SS  p001    0    1    0    1    1    1    0
       SS  p010    0    1    1    1    0    0    0
       SS  p011    0    1    0    1    0    1    0
       SS  p100    0    1    0    0    0    0    0
       
       SA  p000    0    0    1    1    0    1    1
       SA  p010    0    0    1    1    0    1    0
       
       AS  p000    0    0    0    0    1    1    1
       AS  p100    0    0    0    0    1    1    0
       AS  p001    0    0    0    0    1    0    0

disambiguation


Traditionalism

  • only "parses" 000 and 100
  • contextual selection given:
    • speaker knowledge
    • contextual relevance


Grammaticalism

  • all parses possible
  • some unspecified syntactic disambiguation mechanism

a bunch of models

rational speech act model


literal listener picks literal interpretation (uniformly at random):

\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]) \]


Gricean speaker approximates informativity-maximization (with parameter \(\lambda\)):

\[ P_{S}(m \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


pragmatic listener uses Bayes' rule to infer likely world states:

\[ P_L(t \mid m \, ; \, \lambda) \propto P(t) \cdot P_S(m \mid t \, ; \, \lambda) \]

(cf. Benz 2006, Frank & Goodman 2012)

lexical uncertainty

idea: reasoning over the speaker's lexicon \(\red{l}\)


\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]^{\red{l}}) \]


\[ P_{S}(m \mid t \, , \red{l} \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


\[ P_L(t, \red{l} \mid m \, ; \, \lambda) \propto P(t) \cdot P(\red{l}) \cdot P_{S}(m \mid t \, , \red{l} \, ; \, \lambda) \]


\[ \red{l} \in \set{\text{p000}, \text{p011}} \]

(cf. Bergen et al. to appear, Potts et al. 2015)
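
A Python sketch of these three definitions, assuming the semantics is passed in as a Boolean array over lexica, states and messages (the array layout and names are illustrative):

import numpy as np

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def lexical_uncertainty_listener(sems, lam=1.0, t_prior=None, l_prior=None):
    """sems[l, t, m] = 1 iff message m is true of state t under lexicon l
    (here l ranges over p000 and p011). Returns P_L(t, l | m) as an array
    of shape (L, T, M); assumes no message is contradictory and no state
    is undescribable under any lexicon."""
    L, T, M = sems.shape
    t_prior = np.ones(T) / T if t_prior is None else t_prior
    l_prior = np.ones(L) / L if l_prior is None else l_prior
    ll = normalize(sems * t_prior[None, :, None], axis=1)  # P_LL(t | m) ∝ P(t | [[m]]^l)
    sp = normalize(ll ** lam, axis=2)                      # P_S(m | t, l): normalized over m only
    joint = l_prior[:, None, None] * t_prior[None, :, None] * sp
    return normalize(joint, axis=(0, 1))                   # P_L(t, l | m) ∝ P(t) P(l) P_S(m | t, l)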

speaker-intended lexical narrowing

idea: speaker can choose to intend a lexical narrowing \(\red{l}\)


\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]^{\red{l}}) \]


\[ P_{S}(m, \red{l} \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


\[ P_L(t, \red{l} \mid m \, ; \, \lambda) \propto P(t) \cdot P(\red{l}) \cdot P_{S}(m, \red{l} \mid t \, ; \, \lambda) \]


\[ \red{l} \in \set{\text{p000}, \text{p001}, \text{p010}, \text{p011}} \]

speaker-intended parses

idea: speaker can choose to intend a syntactic parse \(\red{p}\)


\[ P_{LL}(t \mid m) \propto P(t \mid [\![m]\!]^{\red{p}}) \]


\[ P_{S}(m, \red{p} \mid t \, ; \, \lambda) \propto \exp(\lambda \cdot \log P_{LL}(t \mid m)) \]


\[ P_L(t, \red{p} \mid m \, ; \, \lambda) \propto P(t) \cdot P(\red{p}) \cdot P_{S}(m, \red{p} \mid t \, ; \, \lambda) \]


\[ \red{p} \in \set{\text{p000}, \text{p001}, \text{p010}, \text{p011}, \text{p100}, \text{p101}, \text{p110}, \text{p111}} \]
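
A Python sketch under the same assumptions as before; the only structural change is that the speaker now chooses message and parse jointly, so the speaker distribution is normalized over \((m, p)\) rather than over \(m\) alone. Restricting the parse set to p000-p011 gives the speaker-intended lexical narrowing model above:

import numpy as np

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def intended_parse_listener(sems, lam=1.0, t_prior=None, p_prior=None):
    """sems[p, t, m] = 1 iff message m is true of state t under parse p.
    Returns P_L(t, p | m) as an array of shape (P, T, M)."""
    P, T, M = sems.shape
    t_prior = np.ones(T) / T if t_prior is None else t_prior
    p_prior = np.ones(P) / P if p_prior is None else p_prior
    ll = normalize(sems * t_prior[None, :, None], axis=1)  # P_LL(t | m) ∝ P(t | [[m]]^p)
    sp = normalize(ll ** lam, axis=(0, 2))                 # P_S(m, p | t): joint over messages AND parses
    joint = p_prior[:, None, None] * t_prior[None, :, None] * sp
    return normalize(joint, axis=(0, 1))                   # P_L(t, p | m) ∝ P(t) P(p) P_S(m, p | t)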

disambiguation by logical strength

idea: weighted parses introduce graded semantics


\[ P_S(m \mid t \, ; \, \lambda) \propto \sum_{p \in \mathcal{P}} \exp(\lambda \cdot w(m,p)) \cdot \delta_{t \in \den{m}^p} \]


\(w(m,p)\) is the rank of \(\den{m}^p\) in the ordering of \(\set{\den{m}^p \mid p \in \mathcal{P}}\) by logical strength
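
A Python sketch of this speaker rule, with some assumptions the slide leaves implicit: logically stronger readings get larger weights \(w\), denotation size stands in for logical strength (the readings of a single sentence are nested here, so the two orderings coincide), and the rule is normalized over messages:

import numpy as np

def strength_weights(dens):
    """dens: denotations (sets of worlds) of one message under all parses.
    w = rank of the reading by logical strength; the strongest distinct
    reading (smallest denotation) gets the largest rank."""
    distinct = sorted({frozenset(d) for d in dens}, key=len)  # strongest first
    rank = {d: len(distinct) - i for i, d in enumerate(distinct)}
    return [rank[frozenset(d)] for d in dens]

def strongest_meaning_speaker(dens_by_message, t, lam=1.0):
    """P_S(m | t) ∝ sum_p exp(lam * w(m, p)) * 1[t in [[m]]^p];
    dens_by_message holds one list of per-parse denotations per message."""
    scores = np.array([
        sum(np.exp(lam * w) for w, d in zip(strength_weights(dens), dens) if t in d)
        for dens in dens_by_message
    ])
    return scores / scores.sum()  # assumes t is describable by at least one reading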

experiment

design & participants

  • 80 participants via MTurk
  • two parts:
    1. production
    2. comprehension

[screenshot: experiment interface]

production

[screenshots: production trials]

comprehension

[screenshot: comprehension trial]

results: production

results: comprehension

      100  110  101  111  010  011  001
 NN   0.26 0.09 0.06 0.05 0.20 0.15 0.19
 NS   0.53 0.07 0.12 0.05 0.05 0.04 0.14
xNA   0.29 0.25 0.03 0.04 0.30 0.04 0.05
 SN   0.15 0.25 0.26 0.24 0.03 0.04 0.03
 SS   0.02 0.23 0.11 0.21 0.14 0.22 0.06
 SA   0.03 0.04 0.26 0.25 0.03 0.25 0.14
 AN   0.69 0.07 0.05 0.04 0.05 0.03 0.06
 AS   0.03 0.04 0.03 0.05 0.41 0.27 0.17
 AA   0.07 0.03 0.05 0.06 0.04 0.05 0.71

model comparison

model comparison by BIC


derived approximate Bayes factors

           lex_int  exh      SM       lex_unc  rsa
lex_int    1        -        -        -        -
exh        8.89     1        -        -        -
SM         71       7.98     1        -        -
lex_unc    239      26.9     3.37     1        -
rsa        6.63e+09 7.45e+08 9.33e+07 2.76e+07 1
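
A standard way to derive approximate Bayes factors from BIC scores, presumably what is meant here:

\[ \text{BF}_{ij} \approx \exp\left( \tfrac{1}{2} \, (\text{BIC}_j - \text{BIC}_i) \right) \]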

preferred parses

parse-choice model: \(P_L(p \mid m ; \hat{\lambda})\)

     p000  p001  p010  p011  p100  p101  p110  p111
 NN  0.131 0.131 0.131 0.131 0.119 0.119 0.119 0.119
 NS  0.105 0.158 0.105 0.158 0.105 0.132 0.105 0.132
xNA  0.133 0.133 0.133 0.133 0.117 0.117 0.117 0.117
 SN  0.136 0.136 0.121 0.121 0.121 0.121 0.121 0.121
 SS  0.159 0.139 0.118 0.121 0.085 0.139 0.118 0.121
 SA  0.134 0.134 0.122 0.122 0.122 0.122 0.122 0.122
 AN  0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
 AS  0.151 0.106 0.151 0.106 0.136 0.106 0.136 0.106
 AA  0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125

weights in strongest meaning model:

strongest   2nd     3rd     4th
0.362       0.274   0.207   0.156

upshot

 

  • probabilistic pragmatics can complement "traditional" analyses
  • model comparison is paramount

conclusions

conclusions


  • probabilistic pragmatics connects:
    • linguistic theory,
    • statistical analyses, and
    • experimental data
  • individuated by a cluster concept:
    • probabilistic
    • interactive
    • rationalistic
    • computational
    • data-oriented
