Similarity rank

From Knoesis wiki
Revision as of 14:25, 20 July 2010 by Paul


Similarity rank is designed to rank similar items in a graph based on Wikipedia. First, the Wikipedia graph is converted into a matrix, and then cosine similarity is used to calculate the similarity value.


Wikipedia Graph

A DirectedSparseMultigraph from the JUNG package is used to create the graph. There are four types of edges: RegularEdge, SeeAlsoEdge, RegularBackEdge, and SeeAlsoBackEdge. The graph is then transformed into a matrix in which each row and column represents a vertex in the graph and the value of a cell is the weight of the edge between the two vertices. Since a matrix cell can hold only one value, while a multigraph can have several edges between the same pair of vertices, a single weight must be chosen. The highest weight among the edges is used, unless the vertices are doubly linked, in which case a different (higher) value is used, chosen from the constants DoubleRegularWeight and DoubleSeeAlsoWeight. The matrix is sparse and uses a SortedArrayMap for the rows.
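The weight selection described above could look roughly like the following sketch. The edge-type names mirror the ones on this page, but the numeric weights, the EdgeWeightSketch class, the cellWeight method, and the reading of "doubly-linked" as "both a forward edge and the matching back edge exist" are assumptions made for illustration only, not the actual implementation.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: collapse several parallel edges into one matrix cell value.
final class EdgeWeightSketch {

    // Edge kinds named after the page; the numeric weights are invented.
    enum EdgeType {
        REGULAR(1.0), SEE_ALSO(2.0), REGULAR_BACK(0.5), SEE_ALSO_BACK(1.0);

        final double weight;

        EdgeType(double weight) {
            this.weight = weight;
        }
    }

    // Stand-ins for DoubleRegularWeight and DoubleSeeAlsoWeight (values assumed).
    static final double DOUBLE_REGULAR_WEIGHT = 3.0;
    static final double DOUBLE_SEE_ALSO_WEIGHT = 4.0;

    // Returns the single weight stored in the cell for one pair of vertices.
    static double cellWeight(Set<EdgeType> edges) {
        double best = 0.0; // no edge means an empty (zero) cell
        for (EdgeType e : edges) {
            best = Math.max(best, e.weight);
        }
        // Assumption: "doubly linked" means a forward edge plus its back edge.
        if (edges.contains(EdgeType.REGULAR) && edges.contains(EdgeType.REGULAR_BACK)) {
            best = Math.max(best, DOUBLE_REGULAR_WEIGHT);
        }
        if (edges.contains(EdgeType.SEE_ALSO) && edges.contains(EdgeType.SEE_ALSO_BACK)) {
            best = Math.max(best, DOUBLE_SEE_ALSO_WEIGHT);
        }
        return best;
    }

    public static void main(String[] args) {
        Set<EdgeType> edges = EnumSet.of(EdgeType.REGULAR, EdgeType.REGULAR_BACK);
        System.out.println(cellWeight(edges));
    }
}
```

With these assumed weights, a pair connected only by a RegularEdge and a SeeAlsoEdge would get the SeeAlsoEdge weight, while a pair with both a RegularEdge and a RegularBackEdge would get the doubled constant instead.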


The ranking

Cosine similarity is used to calculate the ranking:


 \text{similarity} = \cos(\theta) = {A \cdot B \over \|A\| \|B\|}.

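Here A and B are the matrix rows of the two vertices being compared. The calculation takes advantage of the sorted rows to compute the intersection of the two rows, since only the columns that are non-zero in both rows contribute to the dot product.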
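As an illustration of why sorted rows help, the sketch below computes the cosine similarity of two sparse rows with a single merge-style pass over their column indices. It does not use the SortedArrayMap mentioned earlier; each row is represented here by a pair of parallel arrays (sorted column indices and their weights), and the class and method names are hypothetical.

```java
// Hypothetical sketch: cosine similarity of two sparse rows with sorted columns.
final class SparseCosineSketch {

    // Dot product over the intersection of two rows whose column indices are
    // sorted in ascending order; runs in O(|A| + |B|) time.
    static double dot(int[] colsA, double[] valsA, int[] colsB, double[] valsB) {
        int i = 0;
        int j = 0;
        double sum = 0.0;
        while (i < colsA.length && j < colsB.length) {
            if (colsA[i] < colsB[j]) {
                i++; // column only in A, contributes nothing
            } else if (colsA[i] > colsB[j]) {
                j++; // column only in B, contributes nothing
            } else {
                sum += valsA[i] * valsB[j]; // shared column
                i++;
                j++;
            }
        }
        return sum;
    }

    // Euclidean norm of a row; zero cells are simply absent from the array.
    static double norm(double[] vals) {
        double sumOfSquares = 0.0;
        for (double v : vals) {
            sumOfSquares += v * v;
        }
        return Math.sqrt(sumOfSquares);
    }

    // cos(theta) = (A . B) / (||A|| ||B||); returns 0 if either row is empty.
    static double cosine(int[] colsA, double[] valsA, int[] colsB, double[] valsB) {
        double denominator = norm(valsA) * norm(valsB);
        if (denominator == 0.0) {
            return 0.0;
        }
        return dot(colsA, valsA, colsB, valsB) / denominator;
    }

    public static void main(String[] args) {
        int[] colsA = {0, 2, 5};
        double[] valsA = {1.0, 2.0, 2.0};
        int[] colsB = {2, 3, 5};
        double[] valsB = {2.0, 1.0, 2.0};
        System.out.println(cosine(colsA, valsA, colsB, valsB));
    }
}
```

In the example rows, only columns 2 and 5 appear in both, so the dot product is 2·2 + 2·2 = 8, and both norms are 3, giving a similarity of 8/9.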