Long tail


In statistics and business, a long tail of some distributions of numbers is the portion of the distribution having many occurrences far from the "head" or central part of the distribution. The distribution could involve popularities, random numbers of occurrences of events with various probabilities, and so on. The term is often used loosely, with no definition or an arbitrary one, but precise definitions are possible.
In statistics, the term long-tailed distribution has a narrow technical meaning, and is a subtype of heavy-tailed distribution. Intuitively, a distribution is long-tailed if, for any fixed amount, when a quantity exceeds a high level, it almost certainly exceeds it by at least that amount: large quantities are probably even larger. Note that, in this technical sense, there is no such thing as "the long tail" of a distribution, only the property of a distribution being long-tailed.
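The intuitive statement above is commonly written as a limit. The following is a sketch of that standard formulation; the symbols X (the random quantity), x (the high level) and t (the fixed amount) are notation introduced here for illustration.

```latex
\lim_{x \to \infty} \Pr\left[\, X > x + t \mid X > x \,\right] = 1
\qquad \text{for all } t > 0 .
```

Read aloud: once X is known to exceed a sufficiently high level x, the probability that it also exceeds x + t approaches 1, whatever fixed amount t is chosen.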
In business, the term long tail is applied to rank-size distributions or rank-frequency distributions, which often form power laws and are thus long-tailed distributions in the statistical sense. This is used to describe the retailing strategy of selling many unique items with relatively small quantities sold of each, usually in addition to selling fewer popular items in large quantities. Sometimes an intermediate category is also included, variously called the body, belly, torso, or middle. The specific cutoff of what part of a distribution is the "long tail" is often arbitrary, but in some cases may be specified objectively; see segmentation of rank-size distributions.
The long tail concept has found some ground for application, research, and experimentation. The term is used in online business, mass media, microfinance, user-driven innovation, knowledge management, social network mechanisms, economic models, marketing, and IT security threat hunting within a security operations center (SOC).

History

Distributions with long tails have been studied by statisticians since at least 1946. The term has also been used in the finance and insurance business for many years. The work of Benoît Mandelbrot in the 1950s and later has led to him being referred to as "the father of long tails".
The long tail was popularized by Chris Anderson in an October 2004 Wired magazine article, in which he mentioned Amazon.com, Apple and Yahoo! as examples of businesses applying this strategy. Anderson elaborated the concept in his book The Long Tail: Why the Future of Business is Selling Less of More.

Business

The distribution and inventory costs of businesses successfully applying a long tail strategy allow them to realize significant profit out of selling small volumes of hard-to-find items to many customers instead of only selling large volumes of a reduced number of popular items. The total sales of this large number of "non-hit items" is called "the long tail".
Given enough choice, a large population of customers, and negligible stocking and distribution costs, the selection and buying pattern of the population results in the demand across products having a power law distribution or Pareto distribution.
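What power-law demand looks like in practice can be sketched with a small simulation. This does not derive the power law from customer behaviour; it simply assumes that the item of popularity rank r is bought with weight 1/r^exponent (a Zipf-like assumption) and tallies the resulting purchases. The function name, the catalogue size and the number of purchases are all hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def simulate_purchases(n_items, n_purchases, exponent=1.0, seed=0):
    """Tally purchases when the item of rank r is chosen with weight 1 / r**exponent.

    The Zipf-like weighting is an assumption made for this sketch,
    not a result taken from the article.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ranks = list(range(1, n_items + 1))
    weights = [1.0 / r ** exponent for r in ranks]
    picks = rng.choices(ranks, weights=weights, k=n_purchases)
    return Counter(picks)


counts = simulate_purchases(n_items=1000, n_purchases=50_000)

# A few popular items sell heavily; most items sell only a handful of copies.
for rank in (1, 10, 100, 1000):
    print(f"rank {rank:>4}: {counts[rank]} purchases")
```

Plotting the counts against rank on logarithmic axes would give a roughly straight line, which is the usual visual signature of a power law.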
Whether a quantity follows a normal distribution or a long-tailed one depends on the mechanism that generates it. Chris Anderson argues that while quantities such as human height or IQ follow a normal distribution, scale-free networks with preferential attachment produce power law distributions, because some nodes are more connected than others.

Statistical meaning

The long tail is the name for a long-known feature of some statistical distributions. In "long-tailed" distributions a high-frequency or high-amplitude population is followed by a low-frequency or low-amplitude population which gradually "tails off" asymptotically. The events at the far end of the tail have a very low probability of occurrence.
As a rule of thumb, for such population distributions the majority of occurrences are accounted for by the first 20% of items in the distribution. What is unusual about a long-tailed distribution is that the most frequently occurring 20% of items represent less than 50% of occurrences; or in other words, the least frequently occurring 80% of items are more important as a proportion of the total population.
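The share of occurrences accounted for by the most frequent 20% of items can be computed directly from a list of counts. The following is a minimal sketch; the function name and both sample lists are made up for illustration and do not come from any cited data.

```python
def head_share(counts, head_fraction=0.2):
    """Return the fraction of all occurrences contributed by the top items.

    `counts` holds one occurrence count per item; `head_fraction` is the
    fraction of items treated as the "head" (20% by default).
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("counts must contain at least one occurrence")
    head_size = max(1, round(len(ordered) * head_fraction))
    return sum(ordered[:head_size]) / total


# Made-up data: a concentrated market and a flatter one.
concentrated = [80, 5, 5, 5, 5]
flatter = [20, 15, 12, 11, 10, 9, 8, 6, 5, 4]

print(head_share(concentrated))  # top 20% of items hold 0.8 of occurrences
print(head_share(flatter))       # top 20% of items hold 0.35 of occurrences
```

In the second list the head holds less than half of all occurrences, so the remaining 80% of items carry the larger share of the total.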
Power law distributions or functions characterize an important number of behaviors from nature and human endeavor. This fact has given rise to a keen scientific and social interest in such distributions, and the relationships that create them. The observation of such a distribution often points to specific kinds of mechanisms, and can often indicate a deep connection with other, seemingly unrelated systems. Examples of behaviors that exhibit long-tailed distribution are the occurrence of certain words in a given language, the income distribution of a business or the intensity of earthquakes.
Chris Anderson's and Clay Shirky's articles highlight special cases in which we are able to modify the underlying relationships and evaluate the impact on the frequency of events. In those cases the infrequent, low-amplitude events – the long tail, represented here by the portion of the curve to the right of the 20th percentile – can become the largest area under the line. This suggests that a variation of one mechanism or relationship can significantly shift the frequency of occurrence of certain events in the distribution. The shift has a crucial effect in probability and in the customer demographics of businesses like mass media and online sellers.
However, the long tails characterizing distributions such as the Gutenberg–Richter law or the words-occurrence Zipf's law, and those highlighted by Anderson and Shirky, are of very different, if not opposite, nature: Anderson and Shirky refer to frequency-rank relations, whereas the Gutenberg–Richter law and Zipf's law are probability distributions. Therefore, in these latter cases "tails" correspond to large-intensity events such as large earthquakes and the most popular words, which dominate the distributions. By contrast, the long tails in the frequency-rank plots highlighted by Anderson and Shirky would rather correspond to short tails in the associated probability distributions, and therefore illustrate an opposite phenomenon compared to the Gutenberg–Richter law and Zipf's law.

Chris Anderson and Clay Shirky

The use of the phrase the long tail in business, as "the notion of looking at the tail itself as a new market" of consumers, was introduced by Chris Anderson. The concept drew in part from a February 2003 essay by Clay Shirky, "Power Laws, Weblogs and Inequality", which noted that a relative handful of weblogs have many links going into them but "the long tail" of millions of weblogs may have only a handful of links going into them. Anderson described the effects of the long tail on current and future business models beginning with a series of speeches in early 2004 and with the publication of a Wired magazine article in October 2004. Anderson later extended it into the book The Long Tail: Why the Future of Business is Selling Less of More.
Anderson argues that products in low demand or that have a low sales volume can collectively make up a market share that rivals or exceeds the relatively few current bestsellers and blockbusters, if the store or distribution channel is large enough. Anderson cites earlier research by Erik Brynjolfsson, Yu Hu, and Michael D. Smith, that showed that a significant portion of Amazon.com's sales come from obscure books that are not available in brick-and-mortar stores. The long tail is a potential market and, as the examples illustrate, the distribution and sales channel opportunities created by the Internet often enable businesses to tap that market successfully.
In his Wired article, Anderson opens with an anecdote about creating a niche market for books on Amazon. He writes about a book titled Touching the Void, about a near-death mountain climbing accident that took place in the Peruvian Andes. Anderson states that the book got good reviews but did not have much commercial success. However, ten years later a book titled Into Thin Air by Jon Krakauer was published, and Touching the Void began to sell again. Anderson realized that this was due to Amazon's recommendations. This created a niche market for those who enjoy books about mountain climbing, even though it is not considered a popular genre, which supports the long tail theory.
An Amazon employee described the long tail as follows: "We sold more books today that didn't sell at all yesterday than we sold today of all the books that did sell yesterday."
Anderson has explained the term as a reference to the tail of a demand curve. The term has since been rederived from an XY graph that is created when charting popularity against inventory. In such a graph, Amazon's book sales would be represented along the vertical axis, while the book or movie ranks are along the horizontal axis. The total volume of low-popularity items exceeds the volume of high-popularity items.

Academic research

Effects of online access

In his Wired article, Chris Anderson cites earlier research by Erik Brynjolfsson, Yu Hu, and Michael D. Smith, who first used a log-linear curve on an XY graph to describe the relationship between Amazon.com sales and sales ranking. They found that a large proportion of Amazon.com's book sales come from obscure books that were not available in brick-and-mortar stores. They then quantified the potential value of the long tail to consumers. In an article published in 2003, these authors showed that, while most of the discussion about the value of the Internet to consumers has revolved around lower prices, consumer benefit from access to increased product variety in online book stores is ten times larger than their benefit from access to lower prices online. Thus, the primary value of the internet to consumers comes from releasing new sources of value by providing access to products in the long tail.

The longer tail over time

A study by Erik Brynjolfsson, Yu Hu, and Michael D. Smith finds that the long tail has grown longer over time, with niche books accounting for a larger share of total sales. Their analyses suggested that by 2008, niche books accounted for 36.7% of Amazon's sales while the consumer surplus generated by niche books has increased at least fivefold from 2000 to 2008. In addition, their new methodology finds that, while the widely used power laws are a good first approximation for the rank-sales relationship, the slope may not be constant for all book ranks, with the slope becoming progressively steeper for more obscure books.
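A power-law rank-sales relationship appears as a straight line when both axes are logarithmic, and the slope of that line is the exponent. The sketch below fits that slope by ordinary least squares and compares the head and the tail of a curve. It is not the methodology of the study; the data are synthetic, built with an exponent of -0.8 up to rank 100 and -1.2 beyond it, purely to show what a slope that steepens for more obscure items looks like.

```python
import math


def loglog_slope(ranks, sales):
    """Least-squares slope and intercept of log(sales) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sales]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def synthetic_sales(rank):
    """Made-up sales curve: exponent -0.8 up to rank 100, then -1.2."""
    if rank <= 100:
        return 1000.0 * rank ** -0.8
    return 1000.0 * 100 ** -0.8 * (rank / 100) ** -1.2


head_ranks = list(range(1, 101))
tail_ranks = list(range(101, 1001))
head_slope, _ = loglog_slope(head_ranks, [synthetic_sales(r) for r in head_ranks])
tail_slope, _ = loglog_slope(tail_ranks, [synthetic_sales(r) for r in tail_ranks])

print(f"head slope: {head_slope:.2f}, tail slope: {tail_slope:.2f}")
```

A single fit over all ranks would report one intermediate slope and hide the change, which is why fitting segments separately is useful when the slope is suspected not to be constant.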
In support of their findings, Wenqi Zhou and Wenjing Duan find not only a longer tail but also a fatter tail through an in-depth analysis of consumer software download patterns in their paper "Online user reviews, product variety, and the long tail". The demand for all products decreases, but the decrease for the hits is more pronounced, indicating that demand shifts from the hits to the niches over time. In addition, they observe a superstar effect in the presence of the long tail: a small number of very popular products still dominates the demand.

Goodbye Pareto principle, welcome the new distribution

In a 2006 working paper titled "Goodbye Pareto Principle, Hello Long Tail", Erik Brynjolfsson, Yu Hu, and Duncan Simester found that, by greatly lowering search costs, information technology in general and Internet markets in particular could substantially increase the collective share of hard-to-find products, thereby creating a longer tail in the distribution of sales.
They used a theoretical model to show how a reduction in search costs will affect the concentration in product sales. By analyzing data collected from a multi-channel retailing company, they showed empirical evidence that the Internet channel exhibits a significantly less concentrated sales distribution, when compared with traditional channels. An 80/20 rule fits the distribution of product sales in the catalog channel quite well, but in the Internet channel, this rule needs to be modified to a 72/28 rule in order to fit the distribution of product sales in that channel. The difference in the sales distribution is highly significant, even after controlling for consumer differences.
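Rules such as "80/20" or "72/28" describe the point at which the top share of products and their share of sales add up to 100%. One way to locate that point in a list of per-product sales is sketched below. This is an illustrative calculation, not the estimation procedure used in the paper; the function name and the sample figures are hypothetical, and integer sales figures are assumed so that the comparison is exact.

```python
def pareto_split(sales):
    """Find the smallest group of top sellers whose sales share is at least
    one minus their share of the product count.

    Returns (share_of_sales, share_of_products), for example (0.8, 0.2)
    for a distribution that follows an 80/20 rule.
    """
    ordered = sorted(sales, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    running = 0
    for k, value in enumerate(ordered, start=1):
        running += value
        # Same test as running / total >= 1 - k / n, kept in integers.
        if running * n >= (n - k) * total:
            return running / total, k / n
    raise ValueError("sales must be a non-empty list with a positive total")


print(pareto_split([80, 5, 5, 5, 5]))  # (0.8, 0.2): an 80/20 distribution
print(pareto_split([1] * 10))          # (0.5, 0.5): perfectly even sales
```

The less concentrated the sales distribution, the closer the split moves toward 50/50, which is the direction of the shift from 80/20 to 72/28 described above.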

Demand-side and supply-side drivers

The key supply-side factor that determines whether a sales distribution has a long tail is the cost of inventory storage and distribution. Where inventory storage and distribution costs are insignificant, it becomes economically viable to sell relatively unpopular products; however, when storage and distribution costs are high, only the most popular products can be sold. For example, a traditional movie rental store has limited shelf space, which it pays for in the form of building overhead; to maximize its profits, it must stock only the most popular movies to ensure that no shelf space is wasted. Because an online video rental provider stocks movies in centralized warehouses, its storage costs are far lower and its distribution costs are the same for a popular or unpopular movie. It is therefore able to build a viable business stocking a far wider range of movies than a traditional movie rental store. Those economics of storage and distribution then enable the advantageous use of the long tail: for example, Netflix finds that in aggregate, "unpopular" movies are rented more than popular movies.
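The supply-side argument can be made concrete with a toy model. The sketch below assumes that the expected profit of the title at popularity rank r is the top title's profit divided by r^exponent, and that a title is worth stocking only if that profit exceeds the per-title cost of storage and distribution. Both assumptions, the function name and all figures are hypothetical and chosen only to illustrate the mechanism.

```python
def viable_catalog_size(n_titles, top_title_profit, holding_cost, exponent=1.0):
    """Count how many titles earn more than they cost to hold.

    Assumes (for illustration only) that the title at rank r earns
    top_title_profit / r**exponent, so profit falls steadily with rank.
    """
    count = 0
    for rank in range(1, n_titles + 1):
        expected_profit = top_title_profit / rank ** exponent
        if expected_profit <= holding_cost:
            break  # every later rank earns even less
        count += 1
    return count


# Same demand, three different per-title holding costs.
for cost in (100.0, 1.0, 0.01):
    size = viable_catalog_size(100_000, top_title_profit=10_000.0, holding_cost=cost)
    print(f"holding cost {cost}: {size} titles worth stocking")
```

Under these assumptions, lowering the holding cost from 100 to 1 raises the viable catalogue from 99 titles to 9,999, and at a cost of 0.01 all 100,000 titles in the catalogue are worth stocking.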
An MIT Sloan Management Review article titled "From Niches to Riches: Anatomy of the Long Tail" examined the long tail from both the supply side and the demand side and identifies several key drivers. On the supply side, the authors point out how e-tailers' expanded, centralized warehousing allows for more offerings, thus making it possible for them to cater to more varied tastes.
On the demand side, tools such as search engines, recommendation software, and sampling tools are allowing customers to find products outside their geographic area. The authors also look toward the future to discuss second-order, amplified effects of the long tail, including the growth of markets serving smaller niches.
Not all recommender systems are equal, however, when it comes to expanding the long tail. Some recommenders can exhibit a bias toward popular products, creating positive feedback, and actually reduce the long tail. A Wharton study details this phenomenon along with several ideas that may promote the long tail and greater diversity.
A recent study conducted by Wenqi Zhou and Wenjing Duan further points out that the demand side factor and the supply side factor interplay to influence the long tail formation of user choices. Consumers' reliance on online user reviews to choose products is significantly influenced by the quantity of products available. Specifically, they find that the impacts of both positive and negative user reviews are weakened as product variety goes up. In addition, the increase in product variety reduces the impact of user reviews on popular products more than it does on niche products.

Networks, crowds, and the long tail

The "crowds" of customers, users and small companies that inhabit the long-tail distribution can perform collaborative and assignment work, giving rise to new production models.
The demand-side factors that lead to the long tail can be amplified by the "networks of products" which are created by hyperlinked recommendations across products. An MIS Quarterly article by Gal Oestreicher-Singer and Arun Sundararajan shows that categories of books on Amazon.com which are more central and thus influenced more by their recommendation network have significantly more pronounced long-tail distributions. Their data across 200 subject areas shows that a doubling of this influence leads to a 50% increase in revenues from the least popular one-fifth of books.

Turnover within the long tail

The long-tail distribution applies at a given point in time, but over time the relative popularity of the sales of the individual products will change. Although the distribution of sales may appear to be similar over time, the positions of the individual items within it will vary. For example, new items constantly enter most fashion markets. A recent fashion-based model of consumer choice, which is capable of generating power law distributions of sales similar to those observed in practice, takes into account turnover in the relative sales of a given set of items, as well as innovation, in the sense that entirely new items become offered for sale.
There may be an optimal inventory size, given the balance between sales and the cost of keeping up with the turnover. An analysis based on this pure fashion model indicates that, even for digital retailers, the optimal inventory may in many cases be less than the millions of items that they can potentially offer. In other words, by proceeding further and further into the long tail, sales may become so small that the marginal cost of tracking them in rank order, even at a digital scale, might be optimised well before a million titles, and certainly before infinite titles. This model can provide further predictions into markets with long-tail distribution, such as the basis for a model for optimizing the number of each individual item ordered, given its current sales rank and the total number of different titles stocked.

Business models

Competitive impact

Before a long tail works, only the most popular products are generally offered. When the cost of inventory storage and distribution falls, a wide range of products becomes available. This can, in turn, have the effect of reducing demand for the most popular products. For example, a small website that focuses on niches of content can be threatened by a larger website which has a wide variety of Web content. The big website covers more variety while the small website has only a few niches to choose from.
The competitive threat from these niche sites is reduced by the cost of establishing and maintaining them and the effort required for readers to track multiple small web sites. These factors have been transformed by easy and cheap web site software and the spread of RSS. Similarly, mass-market distributors like Blockbuster may be threatened by distributors like LoveFilm, which supply the titles that Blockbuster doesn't offer because they are not already very popular.

Internet companies

Some of the most successful Internet businesses have used the long tail as part of their business strategy. Examples include eBay, Yahoo!, Google, Amazon, and the iTunes Store among the major companies, along with smaller Internet companies like Audible and LoveFilm. Purely digital retailers have almost no marginal cost, which benefits online services, whereas physical retailers face fixed limits on the number of products they can carry. Internet retailers can still sell physical goods, but with a virtually unlimited selection and with reviews and recommendations. The Internet has also opened up larger territories in which to sell, so that sellers are no longer confined to local markets as physical retailers such as Target or even Walmart are. For digital and hybrid retailers, there is no longer a perimeter on market demand.

Video and multiplayer online games

The adoption of video games and massively multiplayer online games such as Second Life as tools for education and training is starting to show a long-tailed pattern. It costs significantly less to modify a game than to create unique training applications, such as those for training in business, commercial flight, and military missions. This has led some to envision a time in which game-based training devices or simulations will be available for thousands of different job descriptions.

Microfinance and microcredit

The banking business has used internet technology to reach an increasing number of customers. The most important shift in business model due to the long tail has come from the various forms of microfinance developed.
As opposed to e-tailers, microfinance is a distinctly low-technology business. Its aim is to offer very small credits to lower-middle-class, lower-class, and poor people who would otherwise be ignored by the traditional banking business. The banks that have followed this strategy of selling services to the low-frequency long tail of the sector have found that it can be an important niche, long ignored by consumer banks. The recipients of small credits tend to be very good payers of loans, despite their non-existent credit history. They are also willing to pay higher interest rates than the standard bank or credit card customer. It is also a business model that fills an important developmental role in an economy.
Grameen Bank in Bangladesh has successfully followed this business model. In Mexico, the banks Compartamos and Banco Azteca also serve this customer demographic, with an emphasis on consumer credit. Kiva.org is an organization that provides microcredits to people worldwide, using intermediaries called small microfinance organizations to distribute crowdsourced donations made by Kiva.org lenders.

User-driven innovation

According to the user-driven innovation model, companies can rely on users of their products and services to do a significant part of the innovation work. Users want products that are customized to their needs. They are willing to tell the manufacturer what they really want and how it should work. Companies can make use of a series of tools, such as interactive and internet based technologies, to give their users a voice and to enable them to do innovation work that is useful to the company.
Given the diminishing cost of communication and information sharing, long-tailed user driven innovation will gain importance for businesses.
In following a long-tailed innovation strategy, the company uses the model to tap into a large group of users who are in the low-intensity area of the distribution. It is their collaboration and aggregated work that results in an innovation effort. Social innovation communities formed by groups of users can rapidly perform the trial-and-error process of innovation, share information, and test and diffuse the results.
Eric von Hippel of MIT's Sloan School of Management defined the user-led innovation model in his book Democratizing Innovation. Among his conclusions is the insight that as innovation becomes more user-centered the information needs to flow freely, in a more democratic way, creating a "rich intellectual commons" and "attacking a major structure of the social division of labor".

Marketing

The drive to build a market and obtain revenue from the consumer demographic of the long tail has led businesses to implement a series of long-tail marketing techniques, most of them based on extensive use of internet technologies.
The long tail has possible implications for culture and politics. Where the opportunity cost of inventory storage and distribution is high, only the most popular products are sold. But where the long tail works, minority tastes become available and individuals are presented with a wider array of choices. The long tail presents opportunities for various suppliers to introduce products in niche categories, which encourages the diversification of products. These niche products open opportunities for suppliers while concomitantly satisfying the demands of many individuals, thereby lengthening the tail. In situations where popularity is currently determined by the lowest common denominator, a long-tail model may lead to improvement in a society's level of culture. The opportunities that arise from the long tail can greatly affect a society's culture, because suppliers face few storage constraints and demands that could not previously be met can now be satisfied. At the end of the long tail, the conventional profit-making business model ceases to exist; instead, people tend to come up with products for varied reasons, such as expression rather than monetary benefit. In this way, the long tail opens up a large space for authentic works of creativity.

Cultural diversity

Television is a good example of this: Chris Anderson defines long-tail TV in the context of "content that is not available through traditional distribution channels but could nevertheless find an audience." Thus, the advent of services such as television on demand, pay-per-view and even premium cable subscription services such as HBO and Showtime opens up the opportunity for niche content to reach the right audiences in an otherwise mass medium. These may not always attract the highest level of viewership, but their business distribution models make that of less importance. As the opportunity cost goes down, the choice of TV programs grows and cultural diversity increases.

Distribution of independent content

Often presented as a phenomenon of interest primarily to mass market retailers and web-based businesses, the long tail also has implications for the producers of content, especially those whose products could not – for economic reasons – find a place in pre-Internet information distribution channels controlled by book publishers, record companies, movie studios, and television networks. Looked at from the producers' side, the long tail has made possible a flowering of creativity across all fields of human endeavour. One example of this is YouTube, where thousands of diverse videos – whose content, production value or lack of popularity make them inappropriate for traditional television – are easily accessible to a wide range of viewers.

Contemporary literature

The intersection of viral marketing, online communities and new technologies that operate within the long tail of consumers and business is described in William Gibson's novel Pattern Recognition.

Military applications and security

In military thinking, John Robb applies the long tail to developments in insurgency and terrorist movements, showing how technology and networking allow the long tail of disgruntled groups and criminals to take on the nation state and have a chance to win.

Criticisms

A 2008 study by Anita Elberse, professor of business administration at Harvard Business School, calls the long tail theory into question, citing sales data which shows that the Web magnifies the importance of blockbuster hits. On his blog, Chris Anderson responded to the study, praising Elberse and the academic rigor with which she explores the issue but drawing a distinction between their respective interpretations of where the "head" and "tail" begin. Elberse defined head and tail using percentages, while Anderson uses absolute numbers. Similar results were published by Serguei Netessine and Tom F. Tan, who suggest that head and tail should be defined by percentages rather than absolute numbers.
Also in 2008, a sales analysis of an unnamed UK digital music service by economist Will Page and high-tech entrepreneur Andrew Bud found that sales exhibited a log-normal distribution rather than a power law; they reported that 80% of the music tracks available sold no copies at all over a one-year period. Anderson responded by stating that the study's findings are difficult to assess without access to its data.
