Similarity rank

Similarity rank ranks similar items in a graph built from Wikipedia. The Wikipedia graph is first converted into a matrix representation, and cosine similarity between rows of that matrix is used to calculate the similarity value.

Wikipedia Graph

The graph is built as a DirectedSparseMultigraph from the JUNG package. There are four types of edges: RegularEdge, SeeAlsoEdge, RegularBackEdge, SeeAlsoBackEdge. The graph is then transformed into a matrix in which each row and each column represents a vertex of the graph, and the value of a cell is the weight of the edge between the two vertices. Since a matrix cell can hold only one value while the multigraph may contain several edges between the same pair of vertices, a single weight must be chosen. By default the highest weight among the edges is used, but if the two vertices are doubly linked, a different (higher) value is used instead, chosen from the constants DoubleRegularWeight and DoubleSeeAlsoWeight. The matrix is sparse and uses a SortedArrayMap for the rows.
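The weight selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the numeric weights, the `EdgeType` enum, the `cellWeight` helper, and the reading of "doubly linked" as "a forward edge and a back edge of the same kind" are all hypothetical and not taken from the actual implementation. Only the edge type names and the two constant names come from the description.

```java
import java.util.Set;

final class EdgeWeightSketch {
    // Hypothetical per-type weights; the real values are not given in the text.
    enum EdgeType {
        REGULAR(1.0), SEE_ALSO(2.0), REGULAR_BACK(0.5), SEE_ALSO_BACK(1.5);

        final double weight;

        EdgeType(double weight) {
            this.weight = weight;
        }
    }

    // Constant names follow the text; the values are assumed for illustration.
    static final double DoubleRegularWeight = 3.0;
    static final double DoubleSeeAlsoWeight = 4.0;

    // Collapses all edges between one pair of vertices into a single cell value.
    static double cellWeight(Set<EdgeType> edges) {
        if (edges.isEmpty()) {
            return 0.0; // no edge: the cell stays empty in the sparse matrix
        }
        // Assumption: "doubly linked" means both directions of the same kind exist.
        if (edges.contains(EdgeType.SEE_ALSO) && edges.contains(EdgeType.SEE_ALSO_BACK)) {
            return DoubleSeeAlsoWeight;
        }
        if (edges.contains(EdgeType.REGULAR) && edges.contains(EdgeType.REGULAR_BACK)) {
            return DoubleRegularWeight;
        }
        // Otherwise use the highest weight among the edges present.
        double best = 0.0;
        for (EdgeType e : edges) {
            best = Math.max(best, e.weight);
        }
        return best;
    }
}
```

With these assumed values, `cellWeight(EnumSet.of(EdgeType.REGULAR, EdgeType.SEE_ALSO))` picks the see-also weight, while a pair linked by both RegularEdge and RegularBackEdge gets DoubleRegularWeight.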

The ranking

Cosine similarity is used to calculate the ranking:

\text{similarity} = \cos(\theta) = {A \cdot B \over \|A\| \|B\|}.

The calculation takes advantage of the sorted rows to compute the intersection of two rows: only columns that are present in both rows contribute to the dot product.
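A minimal sketch of how sorted sparse rows allow the dot product to be computed with a single merge-style pass is shown below. It does not use the actual SortedArrayMap class; as a stand-in, each row is assumed to be a pair of parallel arrays (column indices sorted in ascending order, and the matching weights). The class and method names are hypothetical.

```java
final class SparseCosineSketch {
    // Dot product of two sparse rows whose column indices are sorted ascending.
    // Both rows are walked in one pass; only shared columns contribute.
    static double dot(int[] colsA, double[] valsA, int[] colsB, double[] valsB) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < colsA.length && j < colsB.length) {
            if (colsA[i] == colsB[j]) {
                sum += valsA[i] * valsB[j];
                i++;
                j++;
            } else if (colsA[i] < colsB[j]) {
                i++; // column only in row A, skip it
            } else {
                j++; // column only in row B, skip it
            }
        }
        return sum;
    }

    // Euclidean norm of a sparse row (missing entries are zero).
    static double norm(double[] vals) {
        double sumOfSquares = 0.0;
        for (double v : vals) {
            sumOfSquares += v * v;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity; returns 0 when either row is empty or all zeros.
    static double cosine(int[] colsA, double[] valsA, int[] colsB, double[] valsB) {
        double denominator = norm(valsA) * norm(valsB);
        if (denominator == 0.0) {
            return 0.0;
        }
        return dot(colsA, valsA, colsB, valsB) / denominator;
    }
}
```

Because both index arrays are sorted, the intersection costs time proportional to the number of non-zero entries in the two rows rather than to the number of vertices in the graph.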
