The Spearmans Rank Correlation Coefficient is one of the several correlation calculations used to calculate the 'strength' of a relationship between 2 sets of data, this calculated value is in the range -1..1. For more information on correlations look here.
It works by giving each value in each set of data a rank between 1 and at most n (in descending order), if 2 or more values are eqaul then the rank is averaged between these values, so for instance a set of 7 values and the respective ranks:
| x | 72 | 112 | 46 | 97 | 46 | 46 | 52 |
|---|---|---|---|---|---|---|---|
| rx | 3 | 1 | 6 | 2 | 6 | 6 | 4 |
As you can see the value of 46 is given the rank 6, because (5+6+7)/3 = 6, now that I have described how sets of values are ranked, lets look at the equation:
Where d = the difference in rank between the 2 sets of data, n = the number of values in each set (7)
So using the example above as our first set if we now include a second set of data (must also be 7 values):
| y | 20 | 2 | 7 | 11 | 4 | 12 | 7 |
|---|---|---|---|---|---|---|---|
| ry | 1 | 7 | 4.5 | 3 | 6 | 2 | 4.5 |
As before the value 7 shares 4th and 5th (4+5/2). Now we can sum the squares of the differences between ranks:
= 22 + -62 + 1.52 + -12 + 02 + 42 + -0.52
= 4 + 36 + 2.25 + 1 + 0 + 16 + 0.25
= 59.5
This gives a final result of: ρ = -0.0625 (a very slight negative correlation). Now the write a program which can calculate this for LINQ, firstly we need to order the values in descending order, numbered from 1 to n, the best way to do this is by using fluent syntax, as this supports the selecting of a value along with its index (something that isn't possible with query syntax). We then can group the values if the numbers are the same:
double[] rankings = (from n in values join grp in groupedValues on n equals grp.Key select grp.Average(g => g.IndexedRank)).ToArray();With the rankings we can combine them and the calculate the sum of the differences by using aggregation:
int n = rankings.Length; // Note that I have expanded the n^2(n - 1) to use less code double rho = 1 - ((6 * sigmaDiff) / (Math.Pow(n, 3) - n));Putting it all together, using our 2 sets as a test harness:
using System;
using System.Collections.Generic;
using System.Linq;
public static class Extensions
{
public static double SpearmansCoeff(this IEnumerable<int> current, IEnumerable<int> other)
{
if (current.Count() != other.Count())
throw new ArgumentException("Both collections of data must contain an equal number of elements");
double[] ranksX = GetRanking(current);
double[] ranksY = GetRanking(other);
var diffPair = ranksX.Zip(ranksY, (x, y) => new { x, y });
double sigmaDiff = diffPair.Sum(s => Math.Pow(s.x - s.y, 2));
int n = current.Count();
// Spearman's Coefficient of Correlation
// ρ = 1 - ((6 * sum of rank differences^2) / (n(n^2 - 1))
double rho = 1 - ((6 * sigmaDiff) / (Math.Pow(n, 3) - n));
return rho;
}
static double[] GetRanking(IEnumerable<int> values)
{
var groupedValues = values.OrderByDescending(n => n)
.Select((val, i) => new { Value = val, IndexedRank = i + 1 })
.GroupBy(i => i.Value);
double[] rankings = (from n in values
join grp in groupedValues on n equals grp.Key
select grp.Average(g => g.IndexedRank)).ToArray();
return rankings;
}
}
class Program
{
static void Main(string[] args)
{
int[] setX = { 72, 112, 46, 97, 46, 46, 52 };
int[] setY = { 20, 2, 7, 11, 4, 12, 7 };
double rho = setX.SpearmansCoeff(setY);
Console.WriteLine("Spearman's Rank Correlation Coefficient: {0}", rho);
}
}
Spearman's Rank Correlation Coefficient: -0.0625

1 comment:
Ryan,
Very impressive. I am looking for someone to do a LINQ project for me. It involves writing several basic and some advanced risk functions to analyze return data. Would you be interested?
You can contact me at mansoor.razzaq@gmail.com
Post a Comment