## Thursday, October 16, 2008

### How Long The Long Tail?

In The Long Tail, Chris Anderson shows how much the shift from brick-and-mortar to internet stores increases selection, and how much of total sales that increased selection accounts for (see chart).

Aside: As you can see in the graph from the July 8, 2008 edition of The Long Tail, Rhapsody has 4.5 million tracks, Netflix 90,000 DVDs and Amazon 5 million book titles. In the July 11, 2006 hardcover edition those numbers were 1.5 million, 55,000 and 3.7 million, respectively, and in the original October 2004 Wired article they were 735,000, 25,000 and 2.3 million. That is an amazing increase in just four years.
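To put the aside's numbers side by side, here is a small Python sketch that computes how many times each catalogue grew between the October 2004 Wired article and the 2008 edition. It only restates the figures quoted above; the dictionary and function names are my own.

```python
# Catalogue sizes quoted in the aside: (Oct 2004 Wired article, 2006 hardcover, 2008 edition)
CATALOGUE = {
    "Rhapsody tracks": (735_000, 1_500_000, 4_500_000),
    "Netflix DVDs": (25_000, 55_000, 90_000),
    "Amazon book titles": (2_300_000, 3_700_000, 5_000_000),
}


def growth_multiple(sizes):
    """How many times larger the latest catalogue size is than the earliest."""
    return sizes[-1] / sizes[0]


for name, sizes in CATALOGUE.items():
    print(f"{name}: {growth_multiple(sizes):.1f}x")
```

By these figures Rhapsody's catalogue grew roughly sixfold, Netflix's about 3.6 times, and Amazon's a little more than doubled.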

But after finishing the book, I was still left with three questions.

First, how do you quantify the value to the consumer of this additional selection?

Second, how many titles account for the top and bottom 50% of sales in brick-and-mortar stores, and how much does that number increase for internet stores?

Third, how long is the long tail? By that I mean: do the top 1% of titles account for 5%, 20% or 40% of sales? Does this differ between books, music and movies, and if so, why? As the number of books, songs and movies created each year increases, does the shape of the long tail change?

While I am still not sure of the answers to these questions, I was able to find a mathematical framework for estimating how long the long tail is. In The Black Swan, Nassim Nicholas Taleb describes the Pareto power-law distribution and gives estimates of its exponent for several phenomena.
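To get a feel for what a Pareto distribution implies for sales, you can simulate a catalogue. The sketch below is my own illustration, not something from either book: it draws synthetic per-title sales with Python's standard `random.paretovariate` (shape parameter `alpha`, minimum value 1) and reports the share of total sales captured by the top 1% of titles. The function name, sample size and seed are arbitrary choices.

```python
import random


def simulated_top_share(alpha, top_fraction, n_items=200_000, seed=42):
    """Simulate per-title sales drawn from a Pareto(alpha) distribution
    (minimum value 1) and return the share of total sales held by the
    top `top_fraction` of titles.

    The sales figures are synthetic, not real data.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    sales = sorted((rng.paretovariate(alpha) for _ in range(n_items)), reverse=True)
    k = max(1, int(n_items * top_fraction))  # number of titles in the "head"
    return sum(sales[:k]) / sum(sales)


print(f"top 1% share with alpha=1.5: {simulated_top_share(1.5, 0.01):.3f}")
```

Because the tail is heavy, individual runs are noisy; as the sample grows, the estimate settles toward the closed-form value given by the formula in the next paragraph.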

Knowing the exponent, you can determine what share of the total the top x% of items will get using this formula: $Y = X^{1-1/A}$, where $Y$ is the share, $X$ is the top fraction of items written as a decimal (0.01 for the top 1%), and $A$ is the exponent, which must be greater than 1 (you have no idea how long it took me to figure out this formula).
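For anyone who wants to see where the formula comes from, here is a short derivation. It assumes a continuous Pareto distribution of sales per title with minimum value $x_m$ and exponent $A > 1$; the symbols $x_m$, $t$ and $f$ are introduced here only for the derivation.

$$
\begin{aligned}
f(x) &= \frac{A\,x_m^{A}}{x^{A+1}}, \quad x \ge x_m \\
X &= \Pr(x > t) = \left(\frac{x_m}{t}\right)^{A} \\
Y &= \frac{\int_t^{\infty} x\,f(x)\,dx}{\int_{x_m}^{\infty} x\,f(x)\,dx}
   = \frac{x_m^{A}\,t^{1-A}}{x_m}
   = \left(\frac{x_m}{t}\right)^{A-1}
   = X^{1-1/A}
\end{aligned}
$$

The first line is the Pareto density, the second is the fraction of titles selling more than some threshold $t$, and the third is the share of total sales those titles capture (the common factor $A/(A-1)$ cancels). The requirement $A > 1$ keeps total sales finite.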

Here is a table with the share of the top 1%, 20% and 50% for different exponents:
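This kind of table is easy to regenerate. Below is a minimal Python sketch, assuming the formula above holds exactly; the helper name `top_share` and the list of exponents are my own illustrative choices (1.16 and 1.27 appear later in this post, the others are arbitrary).

```python
def top_share(top_fraction, exponent):
    """Share of the total captured by the top `top_fraction` of items,
    using Y = X ** (1 - 1/A) for a Pareto exponent A > 1."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if exponent <= 1:
        raise ValueError("exponent must be greater than 1")
    return top_fraction ** (1 - 1 / exponent)


# Illustrative exponents: 1.16 and 1.27 appear later in the post, the rest are arbitrary.
EXPONENTS = [1.1, 1.16, 1.27, 1.5, 2.0, 3.0]
TOP_FRACTIONS = [0.01, 0.20, 0.50]

# Header row: "A" followed by one column per top fraction.
print("A".ljust(6) + "".join(f"top {x:.0%}".rjust(10) for x in TOP_FRACTIONS))
for a in EXPONENTS:
    cells = "".join(f"{top_share(x, a):.0%}".rjust(10) for x in TOP_FRACTIONS)
    print(f"{a:<6}" + cells)
```

With $A = 1.5$ the sketch reproduces the 22%, 58% and 79% used in the next paragraph.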

For example, the exponent for the number of books sold in the US is estimated at 1.5. Plugging the values in, you see that the top 1% of book titles should account for 22% of sales ($0.01^{1-1/1.5} \approx 0.22$). The top 20% of book titles should account for 58% of sales, and the top 50% for 79%. Strangely, this doesn't quite mesh with the book-publishing figures in The Long Tail. According to those numbers, 0.1% of book titles accounted for 23% of sales, which implies an exponent of 1.27, and 1.86% accounted for 65%, which implies an exponent of 1.12.
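Going the other way, from an observed share back to an exponent, just means solving the formula for $A$: taking logs gives $1 - 1/A = \ln Y / \ln X$. Here is a small Python sketch of that inversion (the function name `implied_exponent` is hypothetical), checked against the two Long Tail figures.

```python
import math


def implied_exponent(top_fraction, share):
    """Solve share = top_fraction ** (1 - 1/A) for the Pareto exponent A.

    Taking logs gives ln(share) = (1 - 1/A) * ln(top_fraction), so
    1 - 1/A = ln(share) / ln(top_fraction).
    """
    # The formula only yields A > 1 when the top items hold more than
    # their proportional share.
    if not 0 < top_fraction < share < 1:
        raise ValueError("need 0 < top_fraction < share < 1")
    b = math.log(share) / math.log(top_fraction)  # b = 1 - 1/A, in (0, 1)
    return 1 / (1 - b)


# Book-publishing figures quoted from The Long Tail
print(f"{implied_exponent(0.001, 0.23):.2f}")   # top 0.1% -> 23% of sales; prints 1.27
print(f"{implied_exponent(0.0186, 0.65):.2f}")  # top 1.86% -> 65% of sales; prints 1.12
```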

The power-law distribution is famous for the 80/20 rule: 80% of the effects come from 20% of the causes. If you are curious, the exponent needed to make the 80/20 rule fit (the top 20% gets 80% of the value) is $\log_4 5$, or about 1.16.
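That value comes from setting $X = 0.2$ and $Y = 0.8$ in the formula and solving for $A$:

$$
\begin{aligned}
0.2^{\,1-1/A} &= 0.8 \\
\left(1 - \tfrac{1}{A}\right)\ln 0.2 &= \ln 0.8 \\
\tfrac{1}{A} &= 1 - \frac{\ln 0.8}{\ln 0.2} = \frac{\ln 0.25}{\ln 0.2} \\
A &= \frac{\ln 0.2}{\ln 0.25} = \frac{\ln 5}{\ln 4} = \log_4 5 \approx 1.161
\end{aligned}
$$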

I would be interested to know the best estimates of the exponents for the book, music and movie industries, and how they have changed with the advent of long-tail retailers.