The Spearmans Rank Correlation Coefficient is one of the several correlation calculations used to calculate the 'strength' of a relationship between 2 sets of data, this calculated value is in the range -1..1. For more information on correlations look here.
It works by giving each value in each set of data a rank between 1 and at most n (in descending order), if 2 or more values are eqaul then the rank is averaged between these values, so for instance a set of 7 values and the respective ranks:
x | 72 | 112 | 46 | 97 | 46 | 46 | 52 |
---|---|---|---|---|---|---|---|
rx | 3 | 1 | 6 | 2 | 6 | 6 | 4 |
As you can see the value of 46 is given the rank 6, because (5+6+7)/3 = 6, now that I have described how sets of values are ranked, lets look at the equation:
Where d = the difference in rank between the 2 sets of data, n = the number of values in each set (7)
So using the example above as our first set if we now include a second set of data (must also be 7 values):
y | 20 | 2 | 7 | 11 | 4 | 12 | 7 |
---|---|---|---|---|---|---|---|
ry | 1 | 7 | 4.5 | 3 | 6 | 2 | 4.5 |
As before the value 7 shares 4th and 5th (4+5/2). Now we can sum the squares of the differences between ranks:
= 22 + -62 + 1.52 + -12 + 02 + 42 + -0.52
= 4 + 36 + 2.25 + 1 + 0 + 16 + 0.25
= 59.5
This gives a final result of: ρ = -0.0625 (a very slight negative correlation). Now the write a program which can calculate this for LINQ, firstly we need to order the values in descending order, numbered from 1 to n, the best way to do this is by using fluent syntax, as this supports the selecting of a value along with its index (something that isn't possible with query syntax). We then can group the values if the numbers are the same:
double[] rankings = (from n in values join grp in groupedValues on n equals grp.Key select grp.Average(g => g.IndexedRank)).ToArray();With the rankings we can combine them and the calculate the sum of the differences by using aggregation:
int n = rankings.Length; // Note that I have expanded the n^2(n - 1) to use less code double rho = 1 - ((6 * sigmaDiff) / (Math.Pow(n, 3) - n));Putting it all together, using our 2 sets as a test harness:
using System; using System.Collections.Generic; using System.Linq; public static class Extensions { public static double SpearmansCoeff(this IEnumerable<int> current, IEnumerable<int> other) { if (current.Count() != other.Count()) throw new ArgumentException("Both collections of data must contain an equal number of elements"); double[] ranksX = GetRanking(current); double[] ranksY = GetRanking(other); var diffPair = ranksX.Zip(ranksY, (x, y) => new { x, y }); double sigmaDiff = diffPair.Sum(s => Math.Pow(s.x - s.y, 2)); int n = current.Count(); // Spearman's Coefficient of Correlation // ρ = 1 - ((6 * sum of rank differences^2) / (n(n^2 - 1)) double rho = 1 - ((6 * sigmaDiff) / (Math.Pow(n, 3) - n)); return rho; } static double[] GetRanking(IEnumerable<int> values) { var groupedValues = values.OrderByDescending(n => n) .Select((val, i) => new { Value = val, IndexedRank = i + 1 }) .GroupBy(i => i.Value); double[] rankings = (from n in values join grp in groupedValues on n equals grp.Key select grp.Average(g => g.IndexedRank)).ToArray(); return rankings; } } class Program { static void Main(string[] args) { int[] setX = { 72, 112, 46, 97, 46, 46, 52 }; int[] setY = { 20, 2, 7, 11, 4, 12, 7 }; double rho = setX.SpearmansCoeff(setY); Console.WriteLine("Spearman's Rank Correlation Coefficient: {0}", rho); } }Spearman's Rank Correlation Coefficient: -0.0625