We present CoSimRank, a graph-theoretic similarity measure that is efficient because it can compute a single node similarity without having to compute the similarities of the entire graph. We present equivalent formalizations that show CoSimRank’s close relationship to Personalized PageRank and SimRank and also show how we can take advantage of fast matrix multiplication algorithms to compute CoSimRank. Another advantage of CoSimRank is that it can be flexibly extended from basic node-node similarity to several other graph-theoretic similarity measures. In an experimental evaluation on the tasks of synonym extraction and bilingual lexicon extraction, CoSimRank is faster or more accurate than previous approaches.
Graph-theoretic algorithms have been successfully applied to many problems in NLP [23]. These algorithms are often based on PageRank [2] and other centrality measures (e.g., [7]). An alternative for tasks involving similarity is SimRank [15]. SimRank is based on the simple intuition that nodes in a graph should be considered as similar to the extent that their neighbors are similar. Unfortunately, SimRank has time complexity $\mathcal{O}\left({n}^{3}\right)$ (where $n$ is the number of nodes in the graph) and therefore does not scale to the large graphs that are typical of NLP.
This paper introduces CoSimRank (code available at code.google.com/p/cistern), a new graph-theoretic algorithm for computing node similarity that combines features of SimRank and PageRank. Our key observation is that to compute the similarity of two nodes, we need not consider all other nodes in the graph as SimRank does; instead, CoSimRank starts random walks from the two nodes and computes their similarity at each time step. This offers large savings in computation time if we only need the similarities of a small subset of all ${n}^{2}$ node pairs.
These two cases – computing a few similarities and computing many similarities – correspond to two different representations we can compute CoSimRank on: a vector representation, which is fast for only a few similarities, and a matrix representation, which can take advantage of fast matrix multiplication algorithms.
CoSimRank can be used to compute many variations of basic node similarity – including similarity for graphs with weighted and typed edges and similarity for sets of nodes. Thus, CoSimRank has the added advantage of being a flexible tool for different types of applications.
The extension of CoSimRank to similarity across graphs is important for the application of bilingual lexicon extraction: given a set of correspondences between nodes in two graphs $A$ and $B$ (corresponding to two different languages), a pair of nodes $\left(a\in A,b\in B\right)$ is a good candidate for a translation pair if their node similarity is high. In an experimental evaluation, we show that CoSimRank is more efficient and more accurate than both SimRank and PageRank-based algorithms.
This paper is structured as follows. Section 2 discusses related work. Section 3 introduces CoSimRank. In Section 4, we compare CoSimRank and SimRank. By providing some useful extensions, we demonstrate the great flexibility of CoSimRank (Section 5). We perform an experimental evaluation of CoSimRank in Section 6. Section 7 summarizes the paper.
Our work is unsupervised. We therefore do not review graph-based methods that make extensive use of supervised learning (e.g., de Melo and Weikum (2012)).
Since the original version of SimRank [15] has complexity $\mathcal{O}\left({n}^{4}\right)$, many extensions have been proposed to speed up its calculation. A Monte Carlo algorithm, which is scalable to the whole web, was suggested by Fogaras and Rácz (2005). However, in an evaluation of this algorithm we found that it does not give competitive results (see Section 6). A matrix representation of SimRank called SimFusion [33] improves the computational complexity from $\mathcal{O}\left({n}^{4}\right)$ to $\mathcal{O}\left({n}^{3}\right)$. Lizorkin et al. (2010) also reduce complexity to $\mathcal{O}\left({n}^{3}\right)$ by selecting essential node pairs and using partial sums. They also give a useful overview of SimRank, SimFusion and the Monte Carlo methods of Fogaras and Rácz (2005). A non-iterative computation for SimRank was introduced by Li et al. (2010); this is especially useful for dynamic graphs. However, all of these methods have to run SimRank on the entire graph and are not efficient enough for very large graphs. We are interested in applications that only need a fraction of all $\mathcal{O}\left({n}^{2}\right)$ pairwise similarities. The algorithm we propose below is an order of magnitude faster in such applications because it is based on a local formulation of the similarity measure. (A reviewer suggests that CoSimRank is an efficient version of SimRank in a way analogous to SALSA’s [19] relationship to HITS [16], in that different aspects of similarity are decoupled.)
Apart from SimRank, many other similarity measures have been proposed. Leicht et al. (2006) introduce a similarity measure that is also based on the idea that nodes are similar when their neighbors are, but that is designed for bipartite graphs; Jeh and Widom (2002) also proposed a SimRank variant for bipartite graphs. However, most graphs in NLP are not bipartite.
Another important similarity measure is cosine similarity of Personalized PageRank (PPR) vectors. We will refer to this measure as PPR+cos. Hughes and Ramage (2007) find that PPR+cos has high correlation with human similarity judgments on WordNet-based graphs. Agirre et al. (2009) use PPR+cos for WordNet and for cross-lingual studies. Like CoSimRank, PPR+cos is efficient when computing single node pair similarities; we therefore use it as one of our baselines below. This method is also used by Chang et al. (2013) for semantic relatedness. They also experimented with Euclidean distance and KL-divergence. Interestingly, a simpler method performed best when comparing with human similarity judgments: only the vector entries corresponding to the two compared nodes are used for the similarity score. Rao et al. (2008) compared PPR+cos to other graph-based similarity measures such as shortest-path and bounded-length random walks. PPR+cos performed best except for a new similarity measure based on commute time. We do not compare against this new measure as it uses the graph Laplacian and so cannot be computed for a single node pair.
One reason CoSimRank is efficient is that we need only compute a few iterations of the random walk. This is often true of this type of algorithm; cf. [32].
LexRank [7] is similar to PPR+cos in that it combines PageRank and cosine; it initializes the sentence similarity matrix of a document using cosine and then applies PageRank to compute lexical centrality. Despite this superficial relatedness, applications like lexicon extraction that look for similar entities and applications that look for central entities are quite different.
In addition to faster versions of SimRank, there has also been work on extensions of SimRank. Dorow et al. (2009) and Laws et al. (2010) extend SimRank to edge weights, edge labels and multiple graphs. We use their Multi-Edge Extraction (MEE) algorithm as one of our baselines below. A similar graph of dependency structures was built by Minkov and Cohen (2008). They applied different similarity measures, e.g., cosine of dependency vectors and a new algorithm called path-constrained graph walk, to synonym extraction [26]. We compare CoSimRank with their results in our experiments (see Section 6).
Some other applications of SimRank or other graph-based similarity measures in NLP include work on document similarity [21], the transfer of sentiment information between languages [31] and named entity disambiguation [9]. Hoang and Kan (2010) use SimRank for related work summarization. Muthukrishnan et al. (2010) combine link-based similarity and content-based similarity for document clustering and classification.
These approaches use at least one of cosine similarity, PageRank and SimRank. CoSimRank can either be interpreted as an efficient version of SimRank or as a version of Personalized PageRank for similarity measurement. The novelty is that we compute similarity for vectors that are induced using a new algorithm, so that the similarity measurement is much more efficient when an application only needs a fraction of all $\mathcal{O}\left({n}^{2}\right)$ pairwise similarities.
We first give an intuitive introduction of CoSimRank as a Personalized PageRank (PPR) derivative. Later on, we will give a matrix formulation to compare CoSimRank with SimRank.
Haveliwala (2002) introduced Personalized PageRank – or topic-sensitive PageRank – based on the idea that the uniform damping vector ${p}^{\left(0\right)}$ can be replaced by a personalized vector, which depends on node $i$. We usually set ${p}^{\left(0\right)}\left(i\right)={e}_{i}$, with ${e}_{i}$ being a vector of the standard basis, i.e., the $i^{\text{th}}$ entry is 1 and all other entries are 0. The PPR vector of node $i$ is given by:
$${p}^{\left(k\right)}\left(i\right)=dA{p}^{\left(k-1\right)}\left(i\right)+\left(1-d\right){p}^{\left(0\right)}\left(i\right)$$  (1) 
where $A$ is the stochastic matrix of the Markov chain, i.e., the row normalized adjacency matrix. The damping factor $d\in \left(0,1\right)$ ensures that the computation converges. The PPR vector after $k$ iterations is given by ${p}^{\left(k\right)}$.
To visualize this formula, one can imagine a random surfer starting at node $i$ and following one of the links with probability $d$ or jumping back to the starting node $i$ with probability $\left(1-d\right)$. Entry $j$ of the converged PPR vector of node $i$ represents the probability that the random surfer is on node $j$ after an unlimited number of steps.
To simulate the behavior of SimRank we will simplify this equation and set the damping factor $d=1$. We will re-add a damping factor later in the calculation.
$${p}^{\left(k\right)}=A{p}^{\left(k1\right)}$$  (2) 
Note that the personalization vector ${p}^{\left(0\right)}$ was eliminated, but is still present as the starting vector of the iteration.
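The simplified iteration of Eq. 2 can be sketched in a few lines of NumPy; the toy graph and the function name below are hypothetical, chosen only for illustration. Since $A$ is row-stochastic, the random-surfer distribution update is written as $p \leftarrow A^{T}p$ (for the symmetric toy graph below, $A=A^{T}$):

```python
import numpy as np

# Hypothetical toy graph: an undirected triangle 0-1-2.
adj = np.array([[0., 1., 1.],
                [1., 0., 1.],
                [1., 1., 0.]])
A = adj / adj.sum(axis=1, keepdims=True)  # row-normalized adjacency matrix

def ppr_vector(A, i, k):
    """k steps of the simplified iteration (Eq. 2), started from e_i.

    With a row-stochastic A, the distribution update is p <- A^T p,
    so each p^(k) remains a probability vector (1-norm 1).
    """
    p = np.zeros(A.shape[0])
    p[i] = 1.0                 # p^(0) = e_i
    for _ in range(k):
        p = A.T @ p
    return p
```

After two steps from node 0, half of the probability mass has returned to node 0 and the rest is split evenly between its two neighbors.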
Let $p\left(i\right)$ be the PPR vector of node $i$. The cosine of two vectors $u$ and $v$ is computed by dividing the inner product $\langle u,v\rangle$ by the lengths of the vectors. The cosine of two PPR vectors can be used as a similarity measure for the corresponding nodes [12, 1]:
$$s\left(i,j\right)=\frac{\langle p\left(i\right),p\left(j\right)\rangle}{|p\left(i\right)|\,|p\left(j\right)|}$$  (3) 
This measure $s\left(i,j\right)$ looks at the probability that a random walker is at a certain node after an unlimited number of steps. This is potentially problematic as the example in Figure 1 shows. The PPR vectors of suit and dress will have some weight on tailor, which is good. However, the PPR vector of law will also have a nonzero weight for tailor. So law and dress are similar because of the node tailor. This is undesirable.
We can prevent this type of spurious similarity by taking into account the path the random surfer took to get to a particular node. We formalize this by defining CoSimRank $s\left(i,j\right)$ as follows:
$$s\left(i,j\right)=\sum _{k=0}^{\infty}{c}^{k}\langle{p}^{\left(k\right)}\left(i\right),{p}^{\left(k\right)}\left(j\right)\rangle$$  (4) 
where ${p}^{\left(k\right)}\left(i\right)$ is the PPR vector of node $i$ from Eq. 2 after $k$ iterations. We compare the PPR vectors at each time step $k$. The sum of all similarities is the value of CoSimRank, i.e., the final similarity. We add a damping factor $c$, so that early meetings are more valuable than later meetings.
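A minimal sketch of Eq. 4 for a single node pair, truncated after a few iterations; the toy graph and all names are hypothetical, and the walk update is written as $p \leftarrow A^{T}p$ for a row-stochastic $A$:

```python
import numpy as np

# Hypothetical toy graph: an undirected path 0-1-2-3.
adj = np.array([[0., 1., 0., 0.],
                [1., 0., 1., 0.],
                [0., 1., 0., 1.],
                [0., 0., 1., 0.]])
A = adj / adj.sum(axis=1, keepdims=True)  # row-normalized adjacency matrix

def cosimrank(A, i, j, c=0.8, iters=5):
    """Truncated CoSimRank (Eq. 4): sum of damped inner products
    of the two PPR vectors at each time step k."""
    n = A.shape[0]
    p, q = np.eye(n)[i], np.eye(n)[j]
    s = p @ q                  # k = 0 term: <e_i, e_j>
    for k in range(1, iters + 1):
        p, q = A.T @ p, A.T @ q
        s += c**k * (p @ q)
    return s
```

Note that $s(i,i)\ge 1$ because the $k=0$ term already contributes $\langle e_i,e_i\rangle=1$, and all values stay below the bound $1/(1-c)$ derived later in this section.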
To compute the similarity of two vectors $u$ and $v$, we use the inner product $\langle\cdot,\cdot\rangle$ in Eq. 4. This is similar to cosine similarity except that the 1-norm is used instead of the 2-norm. Since our vectors are probability vectors, we have
$$\frac{\langle p\left(i\right),p\left(j\right)\rangle}{|p\left(i\right)|\,|p\left(j\right)|}=\langle p\left(i\right),p\left(j\right)\rangle$$ 
for the 1-norm. (This type of similarity measure has also been used and investigated by Ó Séaghdha and Copestake (2008), Cha (2007), Jebara et al. (2004) (probability product kernel) and [13] (Fisher kernel), among others.)
Without expensive normalization, we can give a simple matrix formalization of CoSimRank and compute it efficiently using fast matrix multiplication algorithms.
Later on, the following iterative computation of CoSimRank will prove useful:
$${s}^{\left(k\right)}\left(i,j\right)={c}^{k}\langle{p}^{\left(k\right)}\left(i\right),{p}^{\left(k\right)}\left(j\right)\rangle+{s}^{\left(k-1\right)}\left(i,j\right)$$  (5) 
The matrix formulation of CoSimRank is:
${S}^{\left(0\right)}$  $=E$  
${S}^{\left(1\right)}$  $=cA{A}^{T}+{S}^{\left(0\right)}$  
${S}^{\left(2\right)}$  $={c}^{2}{A}^{2}{\left({A}^{T}\right)}^{2}+{S}^{\left(1\right)}$  
$\mathrm{\dots}$  
${S}^{\left(k\right)}$  $={c}^{k}{A}^{k}{\left({A}^{T}\right)}^{k}+{S}^{\left(k-1\right)}$  (6) 
We will see in Section 5 that this formulation is the basis for a very efficient version of CoSimRank.
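Under the same assumptions as before (hypothetical toy graph, walk direction $p \leftarrow A^{T}p$), Eq. 6 can be sketched directly with matrix powers; note that $A^{k}(A^{T})^{k}=A^{k}(A^{k})^{T}$:

```python
import numpy as np

# Hypothetical toy graph: an undirected path 0-1-2-3.
adj = np.array([[0., 1., 0., 0.],
                [1., 0., 1., 0.],
                [0., 1., 0., 1.],
                [0., 0., 1., 0.]])
A = adj / adj.sum(axis=1, keepdims=True)
n, c, K = 4, 0.8, 2

S = np.eye(n)                   # S^(0) = E
Ak = np.eye(n)                  # running power A^k
for k in range(1, K + 1):
    Ak = A @ Ak
    S = c**k * (Ak @ Ak.T) + S  # Eq. 6: S^(k) = c^k A^k (A^T)^k + S^(k-1)
```

Each entry $S[i,j]$ then equals the truncated sum of Eq. 4 for the pair $(i,j)$, and $S$ is symmetric by construction.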
As the PPR vectors have only positive values, we can easily see in Eq. 4 that the CoSimRank of a node pair is monotonically non-decreasing. For the dot product of two vectors, the Cauchy-Schwarz inequality gives the upper bound:
$$\langle u,v\rangle\le \parallel u\parallel \parallel v\parallel $$ 
where $\parallel x\parallel $ is the norm of $x$. From Eq. 2 we get ${\parallel {p}^{\left(k\right)}\parallel}_{1}=1$, where $\parallel \cdot {\parallel}_{1}$ is the 1-norm. We also know from elementary functional analysis that the 1-norm is the biggest of all $p$-norms, so $\parallel {p}^{\left(k\right)}\parallel \le 1$. It follows that CoSimRank grows more slowly than a geometric series and converges if $c<1$:
$$s\left(i,j\right)\le \sum _{k=0}^{\mathrm{\infty}}{c}^{k}=\frac{1}{1c}$$ 
If an upper bound of $1$ is desired for $s\left(i,j\right)$ (instead of $1/\left(1c\right)$), then we can use ${s}^{\prime}$:
$${s}^{\prime}\left(i,j\right)=\left(1c\right)s\left(i,j\right)$$ 
The original SimRank equation can be written as follows [15]:
$$r\left(i,j\right)=\begin{cases}1, & \text{if } i=j\\ \dfrac{c}{|N\left(i\right)|\,|N\left(j\right)|}\displaystyle\sum _{k\in N\left(i\right),\,l\in N\left(j\right)}r\left(k,l\right), & \text{else}\end{cases}$$ 
where $N\left(i\right)$ denotes the nodes connected to $i$. SimRank is computed iteratively. With $A$ being the normalized adjacency matrix we can write SimRank in matrix formulation:
${R}^{\left(0\right)}$  $=E$  
${R}^{\left(k\right)}$  $=\max\left\{cA{R}^{\left(k-1\right)}{A}^{T},{R}^{\left(0\right)}\right\}$  (7) 
where the maximum of two matrices refers to the elementwise maximum. We will now prove by induction that the matrix formulation of CoSimRank (Eq. 6) is equivalent to:
$${S}^{\prime\left(k\right)}=cA{S}^{\prime\left(k-1\right)}{A}^{T}+{S}^{\left(0\right)}$$  (8) 
and thus very similar to SimRank (Eq. 7).
The base case ${S}^{\left(1\right)}={S}^{\mathrm{\prime}\left(1\right)}$ is trivial. Inductive step:
${S}^{\prime\left(k\right)}\stackrel{\left(\text{8}\right)}{=}cA{S}^{\prime\left(k-1\right)}{A}^{T}+{S}^{\left(0\right)}$  
$=cA\left({c}^{k-1}{A}^{k-1}{\left({A}^{T}\right)}^{k-1}+{S}^{\left(k-2\right)}\right){A}^{T}+{S}^{\left(0\right)}$  
$={c}^{k}{A}^{k}{\left({A}^{T}\right)}^{k}+cA{S}^{\left(k-2\right)}{A}^{T}+{S}^{\left(0\right)}$  
$={c}^{k}{A}^{k}{\left({A}^{T}\right)}^{k}+{S}^{\left(k-1\right)}\stackrel{\left(\text{6}\right)}{=}{S}^{\left(k\right)}$  \qed 
Comparing Eqs. 7 and 8, we see that SimRank and CoSimRank are very similar except that they initialize the similarities on the diagonal differently. Whereas SimRank sets each of these entries back to one at each iteration, CoSimRank adds one. Thus, when computing the two similarity measures iteratively, the diagonal element $\left(i,i\right)$ will be set to 1 by both methods for those initial iterations for which this entry is 0 in $cA{S}^{\left(k-1\right)}{A}^{T}$ (i.e., before applying either max or add). The methods diverge when the entry is $\ne 0$ for the first time.
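The difference between the two update rules can be made concrete with a short numerical sketch (the toy graph is hypothetical): SimRank resets the diagonal to 1 via the element-wise max of Eq. 7, while CoSimRank keeps adding $E$ as in Eq. 8, so its diagonal grows above 1.

```python
import numpy as np

# Hypothetical toy graph: an undirected path 0-1-2-3.
adj = np.array([[0., 1., 0., 0.],
                [1., 0., 1., 0.],
                [0., 1., 0., 1.],
                [0., 0., 1., 0.]])
A = adj / adj.sum(axis=1, keepdims=True)
n, c, K = 4, 0.8, 5
E = np.eye(n)

R, S = E.copy(), E.copy()
for _ in range(K):
    R = np.maximum(c * A @ R @ A.T, E)  # SimRank (Eq. 7): diagonal reset to 1
    S = c * A @ S @ A.T + E             # CoSimRank (Eq. 8): E is added
```

After a few iterations, every diagonal entry of `R` is exactly 1, while the diagonal of `S` exceeds 1.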
Complexity of computing all ${n}^{\mathrm{2}}$ similarities. The matrix formulas of both SimRank (Eq. 7) and CoSimRank (Eq. 8) have time complexity $\mathcal{O}\left({n}^{3}\right)$ or – if we want to take the higher efficiency of computation for sparse graphs into account – $\mathcal{O}\left(d{n}^{2}\right)$ where $n$ is the number of nodes and $d$ the average degree. Space complexity is $\mathcal{O}\left({n}^{2}\right)$ for both algorithms.
Complexity of computing ${k}^{\mathrm{2}}\mathrm{\ll}{n}^{\mathrm{2}}$ similarities. In most cases, we only want to compute ${k}^{2}$ similarities for $k$ nodes. For CoSimRank, we compute the $k$ PPR vectors in $\mathcal{O}\left(kdn\right)$ (Eq. 2) and compute the ${k}^{2}$ similarities in $\mathcal{O}\left({k}^{2}n\right)$ (Eq. 5). If $k\ge d$, then the time complexity of CoSimRank is $\mathcal{O}\left({k}^{2}n\right)$. If we only compute a single similarity, then the complexity is $\mathcal{O}\left(dn\right)$. In contrast, the complexity of SimRank is the same as in the all-similarities case: $\mathcal{O}\left(d{n}^{2}\right)$. It is not obvious how to design a lower-complexity version of SimRank for this case. Thus, we have reduced SimRank’s cubic time complexity to a quadratic time complexity for CoSimRank or – assuming that the average degree $d$ does not depend on $n$ – SimRank’s quadratic time complexity to linear time complexity for the case of computing few similarities.
Space complexity for computing ${k}^{2}$ similarities is $\mathcal{O}\left(kn\right)$ since we need only store $k$ vectors, not the complete similarity matrix. This complexity can be exploited even for the all similarities application: If the matrix formulation cannot be used because the $\mathcal{O}\left({n}^{2}\right)$ similarity matrix is too big for available memory, then we can compute all similarities in batches – and if desired in parallel – whose size is chosen such that the vectors of each batch still fit in memory.
In summary, CoSimRank and SimRank have similar space and time complexities for computing all ${n}^{2}$ similarities. For the more typical case that we only want to compute a fraction of all similarities, we have recast the global SimRank formulation as a local CoSimRank formulation. As a result, time and space complexities are much improved. In Section 6, we will show that this is also true in practice.
We will show now that the basic CoSimRank algorithm can be extended in a number of ways and is thus a flexible tool for different NLP applications.
The use of weighted edges was first proposed in the PageRank patent. It is straightforward and easy to implement by replacing the row normalized adjacency matrix $A$ with an arbitrary stochastic matrix $P$. We can use this edge weighted PageRank for CoSimRank.
We often want to compute the similarity of nodes in two different graphs with a known node-node correspondence; this is the scenario we are faced with in the lexicon extraction task (see Section 6). A variant of SimRank for this task was presented by Dorow et al. (2009). We will now present an equivalent method for CoSimRank. We denote the number of nodes in the two graphs $U$ and $V$ by $|U|$ and $|V|$, respectively. We compute PPR vectors $p\in {\mathbb{R}}^{|U|}$ and $q\in {\mathbb{R}}^{|V|}$ for each graph. Let ${S}^{\left(0\right)}\in {\mathbb{R}}^{|U|\times |V|}$ be the known node-node correspondences. The analog of CoSimRank (Eq. 4) for two graphs is then:
$$s\left(i,j\right)=\sum _{k=0}^{\mathrm{\infty}}{c}^{k}\sum _{\left(u,v\right)\in {S}^{\left(0\right)}}{p}_{u}^{\left(k\right)}\left(i\right){q}_{v}^{\left(k\right)}\left(j\right)$$  (9) 
The matrix formulation (cf. Eq. 6) is:
$${S}^{\left(k\right)}={c}^{k}{A}^{k}{S}^{\left(0\right)}{\left({B}^{T}\right)}^{k}+{S}^{\left(k-1\right)}$$  (10) 
where $A$ and $B$ are row-normalized adjacency matrices. We can interpret ${S}^{\left(0\right)}$ as a change of basis. A similar approach for word embeddings was published by Mikolov et al. (2013). They call ${S}^{\left(0\right)}$ the translation matrix.
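A sketch of Eq. 10 on two hypothetical toy graphs with an assumed identity seed matrix ${S}^{(0)}$ (the graphs, sizes and seed correspondences are all made up for illustration):

```python
import numpy as np

# Graph U: triangle 0-1-2; graph V: path 0-1-2 (both hypothetical).
Uadj = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
Vadj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A = Uadj / Uadj.sum(axis=1, keepdims=True)
B = Vadj / Vadj.sum(axis=1, keepdims=True)

S0 = np.eye(3)      # assumed seed: node i of U corresponds to node i of V
c, K = 0.8, 3

S = S0.copy()
Ak, Bk = np.eye(3), np.eye(3)
for k in range(1, K + 1):
    Ak, Bk = A @ Ak, B @ Bk
    S = c**k * (Ak @ S0 @ Bk.T) + S  # Eq. 10: c^k A^k S^(0) (B^T)^k + S^(k-1)
```

Since every term is non-negative, $S$ dominates ${S}^{(0)}$ entrywise; rows of $S$ rank candidate correspondences for each node of $U$.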
To be able to directly compare to prior work in our experiments, we also present a method to integrate a set of typed edges $\mathcal{T}$ in the CoSimRank calculation. For this we will compute a similarity matrix for each edge type $\tau $ and merge them into one matrix for the next iteration:
$${S}^{\left(k\right)}=\left(\frac{c}{|\mathcal{T}|}\sum _{\tau \in \mathcal{T}}{A}_{\tau}{S}^{\left(k-1\right)}{B}_{\tau}^{T}\right)+{S}^{\left(0\right)}$$  (11) 
This formula corresponds to a random surfer model in which two surfers only meet if they are on the same node and used the same edge type to get there. A stricter requirement would be that they use the same edge type at every step of their journey:
$${S}^{\left(k\right)}=\frac{{c}^{k}}{{|\mathcal{T}|}^{k}}\sum _{\tau \in {\mathcal{T}}^{k}}\left(\prod _{i=1}^{k}{A}_{{\tau}_{i}}\right){S}^{\left(0\right)}\left(\prod _{i=0}^{k-1}{B}_{{\tau}_{k-i}}^{T}\right)+{S}^{\left(k-1\right)}$$  (12) 
We will not use Eq. 12 due to its space complexity.
CoSimRank can also be used to compute the similarity $s\left(V,W\right)$ of two sets $V$ and $W$ of nodes, e.g., short text snippets. We are not including this method in our experiments, but we give the equation here because traditional document similarity measures (e.g., cosine similarity) perform poorly on this task, although alternatives with good results are known [30]. For a set $V$, the initial PPR vector is given by:
$${p}_{i}^{\left(0\right)}\left(V\right)=\begin{cases}\dfrac{1}{|V|}, & \text{if } i\in V\\ 0, & \text{else}\end{cases}$$ 
We then reuse Eq. 4 to compute $s\left(V,W\right)$:
$$s\left(V,W\right)=\sum _{k=0}^{\infty}{c}^{k}\langle{p}^{\left(k\right)}\left(V\right),{p}^{\left(k\right)}\left(W\right)\rangle$$ 
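The node-set variant reuses the same machinery; a hypothetical sketch (graph, set contents and function names are made up for illustration, with the walk again written as $p \leftarrow A^{T}p$):

```python
import numpy as np

# Hypothetical toy graph: an undirected path 0-1-2-3.
adj = np.array([[0., 1., 0., 0.],
                [1., 0., 1., 0.],
                [0., 1., 0., 1.],
                [0., 0., 1., 0.]])
A = adj / adj.sum(axis=1, keepdims=True)

def set_start(nodes, n):
    """Initial PPR vector of a node set V: mass 1/|V| on each node of V."""
    p = np.zeros(n)
    p[list(nodes)] = 1.0 / len(nodes)
    return p

def cosimrank_sets(A, V, W, c=0.8, iters=5):
    """s(V, W) as in Eq. 4, started from the set vectors p^(0)(V), p^(0)(W)."""
    p = set_start(V, A.shape[0])
    q = set_start(W, A.shape[0])
    s, ck = p @ q, 1.0
    for _ in range(iters):
        p, q = A.T @ p, A.T @ q
        ck *= c
        s += ck * (p @ q)
    return s
```

The measure is symmetric in its two set arguments and obeys the same $1/(1-c)$ bound as the node-node case.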
In summary, modifications proposed for SimRank (weighted and typed edges, similarity across graphs) as well as modifications proposed for PageRank (sets of nodes) can also be applied to CoSimRank. This makes CoSimRank a very flexible similarity measure.
We will test the first three extensions experimentally in the next section and leave similarity of node sets for future work.
We evaluate CoSimRank for the tasks of synonym extraction and bilingual lexicon extraction. We use the basic version of CoSimRank (Eq. 4) for synonym extraction and the twograph version (Eq. 9) for lexicon extraction, both with weighted edges. Our motivation for this application is that two words that are synonyms of each other should have similar lexical neighbors and that two words that are translations of each other should have neighbors that correspond to each other; thus, in each case the nodes should be similar in the graphtheoretic sense and CoSimRank should be able to identify this similarity.
We use the English and German graphs published by Laws et al. (2010), including edge weighting and normalization. Nodes are nouns, adjectives and verbs occurring in Wikipedia. There are three types of edges, corresponding to three types of syntactic configurations extracted from the parsed Wikipedias: adjectivenoun, verbobject and nounnoun coordination. Table 1 gives examples and number of nodes and edges.
[Table 1: edge types (with examples) and graph statistics (numbers of nodes and edges).] 
We propose CoSimRank as an efficient algorithm for computing the similarity of nodes in a graph. Consequently, we compare against the two main methods for this task in NLP: SimRank and extensions of PageRank.
We also compare against the MEE (Multi-Edge Extraction) variant of SimRank [6], which handles labeled edges more efficiently than SimRank:
$${S}^{\prime\left(k\right)}=\frac{c}{|\mathcal{T}|}\sum _{\tau \in \mathcal{T}}{A}_{\tau}{S}^{\left(k-1\right)}{B}_{\tau}^{T}$$ 
$${S}^{\left(k\right)}=\max\left\{{S}^{\prime\left(k\right)},{S}^{\left(0\right)}\right\}$$ 
where ${A}_{\tau}$ is the rownormalized adjacency matrix for edge type $\tau $ (see edge types in Table 1).
Apart from SimRank, extensions of PageRank are the main methods for computing the similarity of nodes in graphs in NLP (e.g., Hughes and Ramage (2007), Agirre et al. (2009) and other papers discussed in related work). Generally, these methods compute the Personalized PageRank for each node (see Eq. 1). When the computation has converged, the similarity of two nodes is given by the cosine similarity of the Personalized PageRank vectors. We implemented this method for our experiments and call it PPR+cos.
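For concreteness, the PPR+cos baseline can be sketched as follows (hypothetical toy graph and names; Eq. 1 with the distribution update written as $p \leftarrow A^{T}p$ for a row-stochastic $A$, followed by a cosine with the 2-norm):

```python
import numpy as np

# Hypothetical toy graph: an undirected triangle 0-1-2.
adj = np.array([[0., 1., 1.],
                [1., 0., 1.],
                [1., 1., 0.]])
A = adj / adj.sum(axis=1, keepdims=True)

def ppr(A, i, d=0.8, iters=20):
    """Personalized PageRank (Eq. 1) with personalization vector e_i."""
    n = A.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    p = e.copy()
    for _ in range(iters):
        p = d * (A.T @ p) + (1 - d) * e
    return p

def ppr_cos(A, i, j):
    """PPR+cos: cosine (2-norm) of the two converged PPR vectors."""
    u, v = ppr(A, i), ppr(A, j)
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```

Unlike CoSimRank, this baseline compares only the converged vectors, not the vectors at each time step.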
We use TS68, a test set of 68 synonym pairs published by Minkov and Cohen (2012), for evaluation. This gold standard lists a single word as the correct synonym even if there are several equally acceptable near-synonyms (see Table 3 for examples). We call this the one-synonym evaluation. Three native English speakers were asked to mark synonyms that were proposed by a baseline or by CoSimRank, i.e., ranked in the top 10. If all three of them agreed on a word as being a synonym in at least one meaning, we added it as a correct answer to the test set. We call this the “extended” evaluation (see Table 2).
Synonym extraction is run on the English graph. To calculate PPR+cos, we computed $20$ iterations with a decay factor of $0.8$ and used the cosine similarity with the 2-norm in the denominator to compare two vectors. For the other three methods, we also used a decay factor of $0.8$ and computed $5$ iterations. Recall that CoSimRank uses the simple inner product $\langle\cdot,\cdot\rangle$ to compare vectors.
P@1  P@10  MRR  
one-synonym  
PPR+cos  20.6%  52.9%  0.32 
SimRank  25.0%  61.8%  0.37 
CoSimRank  25.0%  61.8%  0.37 
Typed CoSimRank  23.5%  63.2%  0.37 
extended  
PPR+cos  32.6%  73.5%  0.48 
SimRank  45.6%  83.8%  0.59 
CoSimRank  45.6%  83.8%  0.59 
Typed CoSimRank  44.1%  83.8%  0.59 
Our evaluation measures are the proportion of test words with a correct answer at the top position (P@1), the proportion with a correct answer in one of the top 10 positions (P@10) and Mean Reciprocal Rank (MRR). CoSimRank’s MRR scores of 0.37 (one-synonym) and 0.59 (extended) are the same as or better than all baselines (see Table 2). CoSimRank and SimRank have the same P@1 and P@10 accuracy (although they differed on some decisions). CoSimRank is better than PPR+cos on both evaluations, but as this test set is very small, the results are not significant. Table 3 shows a sample of synonyms proposed by CoSimRank.
Minkov and Cohen (2012) tested cosine and random-walk measures on grammatical relationships (similar to our setup) as well as on co-occurrence statistics. The MRR scores for these methods range from 0.29 to 0.59. (MRR is equivalent to MAP as reported by Minkov and Cohen (2012) when there is only one correct answer.) Their best number (0.59) is better than our one-synonym result; however, they performed manual postprocessing of results – e.g., discarding words that are morphologically or semantically related to other words in the list – so our fully automatic results cannot be directly compared.
keyword  expected  extracted 

movie  film  film 
modern  contemporary  contemporary 
demonstrate  protest  show 
attractive  appealing  beautiful 
economic  profitable  financial 
close  shut  open 
We evaluate lexicon extraction on TS1000, a test set of 1000 items [17], each consisting of an English word and its German translations. For lexicon extraction, we use the same parameters as in the synonym extraction task for all four similarity measures. We use a seed dictionary of 12,630 word pairs to establish node-node correspondences between the two graphs. We remove a search keyword from the seed dictionary before calculating similarities for it, something that the architecture of CoSimRank makes easy because we can use a different seed dictionary ${S}^{\left(0\right)}$ for every keyword.
Both CoSimRank methods outperform SimRank significantly (see Table 4). The difference between CoSimRank with and without typed edges is not significant. (This observation was also made for SimRank on a smaller graph and test set [17].)
PPR+cos’s performance at $14.8\mathrm{\%}$ correct translations is much lower than SimRank and CoSimRank. The disadvantage of this similarity measure is significant and even more visible in bilingual lexicon extraction than in synonym extraction (see Table 2). The reason might be that we are no longer comparing the whole PPR vector, but only entries which occur in the seed dictionary (see Eq. 9). As the seed dictionary contains 12,630 word pairs, only every fourth entry of the PPR vector (the German graph has 47,439 nodes) is used for the similarity calculation. This is also true for CoSimRank, but CoSimRank seems to be more stable because we compare more than one vector. († in Table 4: significantly worse than CoSimRank, ${\alpha}{=}{0.05}$, one-tailed Z-test.)
P@1  P@10  
PPR+cos  14.8%†  45.7%† 
SimRank MEE  48.0%†  76.0%† 
CoSimRank  61.1%  84.0% 
Typed CoSimRank  61.4%  83.9% 
We also experimented with the method of Fogaras and Rácz (2005). We tried a number of different ways of modifying it for weighted graphs: (i) running the random walks with the weighted adjacency matrix as Markov matrix, (ii) storing the weight (product of each edge weight) of a random walk and using it as a factor if two walks meet and (iii) a combination of both. We needed about 10,000 random walks in all three conditions. As a result, the computational time was approximately 30 minutes per test word, so this method is even slower than SimRank for our application. The accuracies P@1 and P@10 were worse in all experiments than those of CoSimRank.
Table 5 compares the run time performance of CoSimRank with the baselines. We ran all experiments on a 64-bit Linux machine with 64 Intel Xeon X7560 2.27GHz CPUs and 1TB RAM. The calculated time is the sum of the time spent in user mode and the time spent in kernel mode. The actual wall clock time was significantly lower as we used up to 64 CPUs.
Compared to SimRank, CoSimRank is more than 40 times faster on synonym extraction and six times faster on lexicon extraction. SimRank is at a disadvantage because it computes all similarities in the graph regardless of the size of the test set; it is particularly inefficient on synonym extraction because the English graph contains a large number of edges (see Table 1).
Compared to PPR+cos, CoSimRank is roughly four times faster on synonym extraction and has comparable performance on lexicon extraction. We compute 20 iterations of PPR+cos to reach convergence and then calculate a single cosine similarity. For CoSimRank, we need only compute five iterations to reach convergence, but we have to compute a vector similarity in each iteration. The counteracting effects of fewer iterations and more vector similarity computations can give either CoSimRank or PPR+cos an advantage, as is the case for synonym extraction and lexicon extraction, respectively.
CoSimRank should generally be three times faster than typed CoSimRank since the typed version has to repeat the computation for each of the three types. This effect is only visible on the larger test set (lexicon extraction) because the general computation overhead is about the same on a smaller test set.
synonym extraction  lexicon extraction  
(68 word pairs)  (1000 word pairs)  
PPR+cos  2,228  2,195 
SimRank  23,423  14,418 
CoSimRank  524  2,342 
Typed CoSimRank  615  6,108 
Here we address inducing a bilingual lexicon from a seed set based on grammatical relations found by a parser. An alternative approach is to induce a bilingual lexicon from Wikipedia's interwiki links [29]. These two approaches have different strengths and weaknesses; e.g., the interwiki-link-based approach does not require a seed set, but it can only be applied to comparable corpora that consist of corresponding – although not necessarily "parallel" – documents.
Despite these differences it is still interesting to compare the two algorithms. Rapp et al. (2012) kindly provided their test set to us. It contains 1000 English words and a single correct German translation for each. We evaluate on a subset we call TS774 that consists of the 774 test word pairs that are in the intersection of the words covered by the WINTIAN Wikipedia data [29] and the words covered by our data. Most of the 226 missing word pairs are adverbs, prepositions and plural forms that are absent from our graphs because of how they were constructed (lemmatization, restriction to adjectives, nouns and verbs, etc.).
Table 6 shows that CoSimRank is slightly, but not significantly, worse than WINTIAN on P@1 (43.0 vs 43.8), but significantly better on P@10 (73.6 vs 55.4).^4 (^4 We achieved better results for CoSimRank by optimizing the damping factor, but in this paper we only report results for a fixed damping factor of 0.8.) The reason could be that CoSimRank is a more effective algorithm than WINTIAN; but the different initializations (seed set vs interwiki links) or the different linguistic representations (grammatical relations vs bag-of-words) could also be responsible.
              P@1      P@10
WINTIAN      43.8%    55.4%†
CoSimRank    43.0%    73.6%
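The P@1 and P@10 scores above can be computed with a simple helper. This is a hypothetical sketch (the function name and data layout are our own), assuming a single accepted gold translation per test word, as in TS774.

```python
def precision_at_k(ranked_translations, gold, k):
    """Fraction of test words whose gold translation appears among the
    top-k ranked candidates.

    `ranked_translations` maps each test word to its ranked candidate
    list; `gold` maps each test word to its one accepted translation.
    """
    hits = sum(1 for word, translation in gold.items()
               if translation in ranked_translations.get(word, [])[:k])
    return hits / len(gold)
```

With a single gold translation per word, a near-miss such as road for Straße (street) counts as an error at k=1, which is why we call the TS774 numbers conservative.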
The results on TS774 can be considered conservative since only one translation is accepted as correct; in reality, other translations might also be acceptable (e.g., both street and road for Straße). In contrast, TS1000 accepts more than one correct translation. Additionally, TS774 was created by translating English words into German (using Google Translate), and we are testing the reverse direction. We are thus bound to fail whenever the original English word is a less common translation of an ambiguous German word. For example, the English word gulf was translated by Google to Golf, but the most common sense of Golf is the sport, so our algorithm incorrectly translates it back to golf.
As we can see in Table 7, we also face the problems discussed by Laws et al. (2010): the algorithm sometimes picks co-hyponyms (which can still be seen as reasonable) and antonyms (which are clear errors).
keyword      gold standard   CoSimRank
arm          poor            impoverished
erreichen    reach           achieve
gehen        go              walk
direkt       directly        direct
weit         far             further
breit        wide            narrow
reduzieren   reduce          increase
Stunde       hour            second
Westen       west            southwest
Junge        boy             child
Contrary to our intuition, the edge-typed variant of CoSimRank did not perform significantly better than the non-edge-typed version. Looking at Table 1, we see that there is only one edge type connecting adjectives; the same is true for verbs. The random surfer only has a real choice between different edge types when she is on a noun node. Combined with the fact that only the last edge type is important, this means that edge types have no effect at all for random surfers meeting at adjectives or verbs.
Two possible solutions would be (i) to use more fine-grained edge types or (ii) to apply Eq. 5.3, in which the edge type of every step is important. However, this would increase the memory needed for the computation.
We have presented CoSimRank, a new similarity measure that can be computed for a single node pair without computing the similarities of the entire graph. We gave two different formalizations of CoSimRank: (i) a derivation from Personalized PageRank and (ii) a matrix representation that can take advantage of fast matrix multiplication algorithms. We also presented extensions of CoSimRank for a number of applications, thus demonstrating its flexibility as a similarity measure.
We showed that CoSimRank is superior to SimRank in time and space complexity; and we demonstrated that CoSimRank performs better than PPR+cos on two similarity computation tasks.
Acknowledgments. This work was supported by DFG (SCHU 2246/22).