Semantic Domains
 A semantic field is a group of words related by their meaning
 it consists of words from some domain
 e.g. in English the field "Rain" may include words like "rain", "drizzle", "downpour", "raindrop", "puddle": all of these words can be used to talk about rain
Words within one domain are related by lexical relations
 there are two kinds of lexical relations:
 syntagmatic relations: relations between words that collocate (i.e. are frequently used together, e.g. "bird" and "fly")
 paradigmatic relations: synonyms, antonyms, etc., e.g. "big" and "large"
Forming lexical relations:
 we learn lexical relations to speak correctly
 each of us has a large mental dictionary organized into a big network of lexical relations
 this network has clusters: semantic fields
The Theory of Semantic Fields
The lexicon (the words of a natural language) is structured into Semantic Fields
 Semantic Fields: "clusters" of semantically related terms
 relations among concepts that belong to the same Semantic Field are very dense
 while concepts from different Semantic Fields are typically unrelated
 the Theory of Semantic Fields claims that the words of a language are structured into a set of such fields
Structural Semantics models relations between words, as in WordNet
 The Theory of Semantic Fields goes further by introducing an additional aggregation level
 Semantic Fields form higher-level abstractions
 relations between them are much more stable than relations between words
 even if word senses change over time, the fields stay the same
 so Semantic Fields are usually consistent across languages, cultures and time
 there's a strong connection between Semantic Fields of different languages; such connections don't exist among the terms themselves
Limitation of this theory:
 it doesn't provide any objective criteria for identifying Semantic Fields in a language
 solution to this problem: Semantic Domains, a computational approach to finding Semantic Fields
 it uses the lexical coherence assumption: words from the same field should co-occur in texts
Semantic Domains
Semantic Domains are Semantic Fields that are characterized by a set of domain words which often occur in texts about the corresponding domain
 words belonging to the same lexical field are called "domain words"
 usually a large portion of the language's terminology is characterized by domain words
Lexical Coherence Assumption
Basic hypothesis:
 a great percentage of the concepts expressed in the same text belong to the same domain
 it's a basic property of any natural language: domain-specific words co-occur in the same texts; this property is called "lexical coherence"
 there are common areas of human knowledge, such as Economics, Politics, Law, Science, etc., and all these areas demonstrate lexical coherence
So, what about Semantic Fields?
 Semantic Fields are lexically coherent: words from one SF tend to co-occur in texts
 we call such fields "Semantic Domains": Semantic Fields characterized by lexically coherent words
Lexical coherence assumption:
 we assume that real-world documents are lexically coherent
 this guarantees the existence of Semantic Domains
 this is also confirmed experimentally: if you count, in real texts, the percentage of words that belong to the same domain, you'll see that most words belong to one domain
There are 3 types of words:
 Text-Related Domain words: words that have at least one sense contributing to determining the domain of the whole text
 e.g. the word "bank" in a text about economy
 Text-Unrelated Domain words: words that come from some non-generic domain, but don't contribute to the domain of the text
 e.g. the word "church" in a text about economy
 Text-Unrelated Generic words: words that don't bring any relevant domain information
Let's put the lexical coherence assumption more formally:
 "One domain per discourse" ($\approx$ text, document) assumption
 if a word is used in one sense somewhere in a discourse
 then other occurrences of this word in that discourse should have the same sense
 a smarter way of putting it: "multiple occurrences of a word in coherent portions of text tend to share the same domain"
The lexical coherence assumption allows us to represent Semantic Domains by sets of domain-specific texts
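A hedged mini-sketch of this in Python: given a hand-made word → domain lexicon (the words and labels below are invented for illustration; real labels would come from a resource like WordNet Domains), the domain of a lexically coherent text can be read off by a simple majority vote over its domain words:

```python
from collections import Counter

# Hypothetical word -> domain lexicon (invented for illustration).
lexicon = {"bank": "Economy", "loan": "Economy", "interest": "Economy",
           "church": "Religion"}

text = "the bank approved the loan at a low interest rate".split()

# Count the domains of the domain words occurring in the text.
domain_counts = Counter(lexicon[w] for w in text if w in lexicon)
print(domain_counts.most_common(1))  # [('Economy', 3)]: most domain words share one domain
```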
Role of Semantic Domains
Characterizing word senses (i.e. lexical concepts)
 typically by assigning domain labels to words in a lexicon
 e.g. "crane" has senses in Zoology and Construction
 WordNet Domains is an extension of WordNet that adds domain information
Characterizing texts
 can use Semantic Domains for text categorization
 at the textual level, semantic domains are clusters of texts on similar topics
 so we can see Semantic Domains as collections of domain-specific texts
from a practical point of view, Semantic Domains are lists of related terms that describe a particular subject/area
Representation
Domain Sets
 domain relation: two words are domain-related if they belong to the same domain
 a domain set is used to describe the semantic classes of texts (the domains)
 and the semantic classes of strongly related lexical concepts (the domain concepts)
 so a domain set should relate each word to one or more domains
Requirements of an "ideal" domain set:
 completeness: every possible text should be assigned to at least one domain
 balance: the number of texts belonging to each domain should be uniform
 separability: the same text/concept can't be assigned to more than one domain
Usually not achievable:
 it's quite difficult to define a complete domain set general enough to represent all possible aspects of human knowledge
 and it's also not possible to collect a corpus that covers all human knowledge
 a certain degree of overlap is unavoidable (e.g. math/physics)
Domain Model
We can easily obtain a term-based representation of documents, e.g. by using Vector Space Models
 but VSMs suffer from the lexical ambiguity problem
 domain terms are typically highly correlated within texts: they tend to co-occur inside the same types of text
 this is justified by the lexical coherence property of natural languages (Leacock et al., 1996)
A Domain Model is a computational model of Semantic Domains used to represent domain information
 it describes relations at the term level
 it does that by defining a set of term clusters (see also Term Clustering)
 each cluster represents a semantic domain: a set of terms that often co-occur in texts with similar topics
 it's also a way to represent domain information at the textual level
Domain Model:
 is a matrix that describes the degree of association between terms in the vocabulary and Semantic Domains
 rows are indexed by words
 columns are the corresponding domains
Domain Model is a shallow model for lexical semantics, but it captures both ambiguity and variability
A DM is represented by an $n \times k$ rectangular matrix $D$
 $D$ contains the domain relevance of each term w.r.t. each domain
E.g.

        Medicine   CS
HIV     1          0
AIDS    1          0
virus   0.5        0.5
laptop  0          1

Formally,
 let $\mathcal D = \{ D_1 , \ ... \ , D_k \}$ be a set of domains
 and we have $n$ words $V = \{ w_1, \ ... \ , w_n \}$ ($n$ is the vocabulary size)
 then $D$ is an $n \times k$ matrix, where $D_{iz}$ is the domain relevance of term $w_i$ w.r.t. domain $D_z$
 let $R(D_z, o)$ denote domain relevance of domain $D_z$ w.r.t. some linguistic object $o$ (text, term, concept)
 it gives a measure of association between $D_z$ and $o$
 typically higher values indicate stronger association, and the values often range from 0 to 1
DMs can describe ambiguity and variability:
 ambiguity: by associating one term with several domains
 variability: by associating different terms with the same domain
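A minimal numpy sketch of this structure, reusing the toy values from the table above (the `relevance` helper is mine, not from the literature):

```python
import numpy as np

# Toy domain model: rows are indexed by words, columns by domains.
domains = ["Medicine", "CS"]
vocab = ["HIV", "AIDS", "virus", "laptop"]
D = np.array([
    [1.0, 0.0],   # HIV    -> Medicine only
    [1.0, 0.0],   # AIDS   -> Medicine only
    [0.5, 0.5],   # virus  -> ambiguity: one term, two domains
    [0.0, 1.0],   # laptop -> CS only
])

def relevance(word, domain):
    """Domain relevance D_iz of term w_i w.r.t. domain D_z."""
    return D[vocab.index(word), domains.index(domain)]

print(relevance("virus", "Medicine"))  # 0.5: "virus" is ambiguous
# Variability: "HIV" and "AIDS" are different terms with the same domain vector.
print(D[vocab.index("HIV")], D[vocab.index("AIDS")])  # [1. 0.] [1. 0.]
```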
A Domain Model defines a Domain Space
Obtaining Domain Models
 Domain Models can be obtained via unsupervised learning or manual annotation
 e.g. we can use WordNet Domains
 or perform Term Clustering
domain relations among terms can be detected by analyzing co-occurrences in a corpus
 this is motivated by the lexical coherence assumption
 co-occurring terms have a good chance of showing domain relations
WordNet-Based Domain Model
WordNet Domains is an extension of WordNet:
 each synset here is annotated with one or more domain labels
 it has ~ 200 domain labels
Using WordNet Domains for building a domain model:
 let $\mathcal D = \{ D_1 , \ ... \ , D_k \}$ be the set of domains of WordNet Domains
 and $\mathcal C = \{ c_1 , \ ... \ , c_s \}$ the set of concepts (synsets) of WordNet
 then let $\text{senses}(w)$ be a set of all synsets that contain $w$: $\text{senses}(w) = \{c \mid c \in \mathcal C, \text{$c$ is a sense of $w$} \}$
 let $R_s: \mathcal D \times \mathcal C \to \mathbb R$ be a domain relevance function for concepts
 $\text{dom}(c)$ is a domain assignment function, $\text{dom}(c) \subseteq \mathcal D$: it returns the set of domains associated with a synset $c$
 $R_s(D, c) = \begin{cases}
1 / |\text{dom}(c)| & \text{ if } D \in \text{dom}(c) \\
1 / k & \text{ if } \text{dom}(c) = \{ \text{Factotum} \} \\
0 & \text{ otherwise } \\
\end{cases}$
 Factotum is a generic label for all non-domain words
 $k$ is the cardinality of $\mathcal D$
 $R_s(D, c) \approx$ estimated prior probability of the domain given the concept
This is for synsets, not words
 now let $V = \{ w_1 , \ ... \ , w_n \}$ be the vocabulary
 then the domain relevance of a word is a function $R: \mathcal D \times V \to \mathbb R$
 define $R$ as $$R(D_z, w_i) = \cfrac{1}{ |\text{senses}(w_i)| } \sum\limits_{c \in \text{senses}(w_i)} R_s(D_z, c)$$
 so it's the average relevance over all of $w_i$'s senses
 if $w$ has only one sense $c$, then $R(D_z, w) = R_s(D_z, c)$
 a word with several senses ("polysemous") will be less relevant than a word with few senses
 words with just one sense ("monosemic") will be the most relevant: they provide more information about the domain
This is consistent with the phenomenon that less frequent words are more informative: they tend to have fewer senses
The domain model $D$ is defined as $D_{ij} = R(D_j, w_i)$
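A hedged Python sketch of these two formulas; the synset names and the `dom`/`senses` tables are invented stand-ins for the real WordNet Domains annotations:

```python
DOMAINS = ["Medicine", "Computer_Science", "Zoology"]  # toy domain set, k = 3
K = len(DOMAINS)

# Hypothetical dom(c): synset -> set of domain labels.
dom = {
    "virus.n.01": {"Medicine"},          # infectious agent
    "virus.n.02": {"Computer_Science"},  # computer virus
    "crane.n.01": {"Zoology"},
}
# Hypothetical senses(w): word -> synsets containing it.
senses = {"virus": ["virus.n.01", "virus.n.02"], "crane": ["crane.n.01"]}

def R_s(domain, c):
    """Domain relevance of a concept (synset): R_s(D, c)."""
    d = dom[c]
    if d == {"Factotum"}:
        return 1.0 / K
    return 1.0 / len(d) if domain in d else 0.0

def R(domain, word):
    """Domain relevance of a word: average of R_s over its senses."""
    cs = senses[word]
    return sum(R_s(domain, c) for c in cs) / len(cs)

print(R("Medicine", "virus"))  # 0.5: polysemous, relevance is split across senses
print(R("Zoology", "crane"))   # 1.0: monosemic (in this toy lexicon), fully relevant
```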
Limitations:
 $\mathcal D$ is fixed because WordNet Domains is fixed
 WordNet Domains is limited: it's not complete
 and the lexicon covered by WordNet Domains is also limited
Corpus-Based Acquisition of Domain Models
We want to automatically extract domain models from a corpus:
 to avoid subjectivity
 to obtain more flexible models
Term Clustering techniques are usually used for this
 usually soft clustering techniques are needed: we want one term to be able to belong to several clusters
 there are several options:
 Fuzzy C-Means, the Information Bottleneck method, etc.
 here we'll use Latent Semantic Analysis
LSA works by projecting the Term VSM and the Text VSM into a common LSA space using linear transformations
 it takes into account not only first-order (shallow) relations between terms, i.e. their co-occurrence in texts
 but also second-order relations: semantic relations established through co-occurrence
Do SVD:
 $T = W \Sigma P^T$
 $W$ (for Words) contains the orthonormal eigenvectors of $T T^T$: word vectors
 $P$ (for Passages) contains the orthonormal eigenvectors of $T^T T$: document vectors
 truncated SVD: use $\Sigma_k$, keeping the first $k$ singular values and setting the rest to 0
 $T_k = W \Sigma_k P^T \approx T$ is the best rank-$k$ approximation
Now let's define the domain matrix
 $D = I^{\text{N}} W \sqrt{\Sigma}$
 $I^{\text{N}}$ is a diagonal matrix s.t. $I^{\text{N}}_{ii} = \cfrac{1}{\| w_i' \|}$
 where $w_i'$ is the $i$th row of $W \sqrt{\Sigma}$, the matrix of principal-component loadings for words
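A minimal numpy sketch of this construction (the term-document counts are toy values; a real corpus would be far larger):

```python
import numpy as np

# Term-by-document matrix T: rows are terms, columns are texts (toy counts).
T = np.array([
    [2.0, 1.0, 0.0, 0.0],   # "HIV"
    [1.0, 2.0, 0.0, 0.0],   # "AIDS"
    [1.0, 1.0, 1.0, 1.0],   # "virus"
    [0.0, 0.0, 2.0, 1.0],   # "laptop"
])

k = 2  # number of latent "domains" kept by the truncated SVD

# T = W Sigma P^T; truncate to the first k singular values.
W, sigma, Pt = np.linalg.svd(T, full_matrices=False)
loadings = W[:, :k] * np.sqrt(sigma[:k])  # W sqrt(Sigma): loadings for words

# I^N: length-normalize each row (each word's vector).
D = loadings / np.linalg.norm(loadings, axis=1, keepdims=True)
print(D.round(2))  # n x k domain matrix, one row per term
```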
Domain Space
Domain Models define the Domain Space
Once a DM is determined, we can define a Domain Space
 it's a geometric space where terms and documents can be represented as vectors
 it's a Vector Space Model
There are some problems with VSMs:
 the Text VSM can't deal with lexical ambiguity and variability
 e.g. "he's affected by AIDS" and "HIV is a virus" don't have any words in common
 so in the Text VSM their similarity is 0: the vectors are orthogonal even though the concepts are related
 on the other hand, the similarity between "the laptop has a virus" and "HIV is a virus" is not 0, due to the ambiguity of "virus"
Term VSM:
 suffers from feature sparseness
 if we want to model domain relations, we're mostly interested in domain-specific words
 such words are quite infrequent compared to non-domain words, so their vectors are very sparse, especially in a large corpus
 so the similarity between domain words tends to 0
 and the overall results are not very meaningful or interesting
Domain Spaces ftw
so a Domain Space is a cluster-based representation for estimating term and text meaning
 it's a vector space where both terms and texts can be compared
 once a domain space is defined by a matrix $D$, we can represent both terms and texts by domain vectors
 domain vectors: vectors that represent the relevance of a linguistic object to each domain
The Domain Space:
 is an instance of the Generalized Vector Space Model
 for a text $t_i$ in the Text VSM, the mapping is
 $t_i' = t_i (I^{\text{idf}} D)$ (left multiplication because $t_i$ is a row vector in $\mathbb R^n$: multiplying on the right by the $n \times k$ matrix $I^{\text{idf}} D$ maps it into $\mathbb R^k$)
 where $I^{\text{idf}}$ is a diagonal matrix s.t. $I^{\text{idf}}_{ii} = \text{idf}(w_i)$, the inverse document frequency of word $w_i$ (see TF-IDF)
 so we've defined a mapping function and thus have a generalized VSM
In the domain space the vector representation of terms and documents is "augmented" by domain relations represented by the domain model
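A sketch of the mapping $t' = t (I^{\text{idf}} D)$ on the toy domain model from the table above, with an identity matrix standing in for the real idf values; it reproduces the AIDS/HIV example: zero similarity in the Text VSM, high similarity in the Domain Space:

```python
import numpy as np

vocab = ["HIV", "AIDS", "virus", "laptop"]
D = np.array([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # toy domain model
I_idf = np.eye(len(vocab))  # identity stand-in; real idf values go on the diagonal

def to_domain_space(t):
    """t' = t (I_idf D): project a term-frequency row vector into the domain space."""
    return t @ (I_idf @ D)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

t1 = np.array([0.0, 1.0, 0.0, 0.0])  # "he's affected by AIDS"
t2 = np.array([1.0, 0.0, 1.0, 0.0])  # "HIV is a virus"

print(cosine(t1, t2))                                    # 0.0 in the Text VSM
print(cosine(to_domain_space(t1), to_domain_space(t2)))  # ~0.95 in the Domain Space
```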
Geometrically, the Domain Space reduces the impact of ambiguity and variability:
 it does so by introducing a dense (non-sparse) space
So the advantages of the Domain Space are:
 lower dimensionality
 sparseness is avoided
 duality: it allows direct and uniform similarity computation between texts and terms
Domain Kernel
Domain Kernel is a similarity function for terms and documents in the domain space. Domain Kernel is a Mercer Kernel, so it can be used in any kernelbased algorithm.
This kernel is represented by a Domain Model matrix $D$
 $K : \mathbb R^n \cup V \to \mathbb R^k$
 maps texts $t \in \mathbb R^n$ and terms $w \in V$ into Domain Space: $t' \in \mathbb R^k$ and $w' \in \mathbb R^k$
$K$ is defined as
 $K(w) = w_i'$ if $w = w_i \in V$
 $K(w) = \cfrac{\sum_{t \in T} \text{tf}(w, t) \cdot t'}{\left\| \sum_{t \in T} \text{tf}(w, t) \cdot t' \right\|}$ if $w \not\in V$
 $K(t) = t (I^{\text{idf}} D) = t'$ for documents
 $\text{tf}(w, t)$ is the term frequency of $w$ in text $t$
 $I^{\text{idf}}$ is a diagonal matrix with IDFs: $I^{\text{idf}}_{ii} = \cfrac{1}{ |\{ t \in T \mid \text{tf}(w_i, t) > 0 \}| }$
Can compute the similarity using cosine
$K$ is defined for any term and text
 $K$ is a Mercer kernel by construction: it's a dot product, but unlike many other kernels, it reduces the dimensionality instead of increasing it
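A sketch of the out-of-vocabulary case $K(w)$ for $w \not\in V$ on the same toy model: the unseen term's domain vector is the normalized tf-weighted sum of the domain vectors of the texts it occurs in (the corpus and tf counts below are invented):

```python
import numpy as np

vocab = ["HIV", "AIDS", "virus", "laptop"]
D = np.array([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
I_idf = np.eye(len(vocab))  # stand-in for the real idf diagonal

def K_text(t):
    """K(t) = t (I_idf D): domain vector of a text."""
    return t @ (I_idf @ D)

# Texts (as tf row vectors over the vocabulary) in which the unseen word occurs.
corpus = [np.array([1.0, 0.0, 1.0, 0.0]),   # "HIV is a virus"
          np.array([0.0, 1.0, 0.0, 0.0])]   # "he's affected by AIDS"
tf_w = [2.0, 1.0]  # toy tf(w, t) of the unseen word in each text

# K(w) for w not in V: tf-weighted sum of text domain vectors, length-normalized.
v = sum(f * K_text(t) for f, t in zip(tf_w, corpus))
K_w = v / np.linalg.norm(v)
print(K_w.round(2))  # unit-length domain vector, mostly in the Medicine domain
```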
Usage
 after that, we can use Domain Models for many NLP tasks
 e.g. we can use a domain model to estimate topic similarity
Domain Kernels can be used with any instance-based algorithm in many NLP applications
References
 Leacock, Claudia, et al. "Towards building contextual representations of word senses using statistical models." (1996)