Sulston score

The Sulston score is an equation used in DNA mapping to numerically assess the likelihood that a given "fingerprint" similarity between two DNA clones is merely a result of chance. Used as such, it is a test of statistical significance. That is, low values imply that similarity is significant, suggesting that two DNA clones overlap one another and that the given similarity is not just a chance event. The name is an eponym that refers to John Sulston by virtue of his being the lead author of the paper that first proposed the equation's use.

The overlap problem in mapping

Each clone in a DNA mapping project has a "fingerprint", i.e. a set of DNA fragment lengths inferred by enzymatically digesting the clone, separating the resulting fragments on a gel, and estimating their lengths from their gel locations. For each pairwise clone comparison, one can count how many lengths from each set match up. Cases with at least one match indicate that the clones might overlap, because matching lengths may represent the same DNA. However, the underlying sequences for each match are not known, so two fragments whose lengths match may still represent different sequences. In other words, matches do not conclusively indicate overlaps. The problem is instead one of using matches to classify overlap status probabilistically.
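As a minimal sketch of the pairwise comparison described above, the following Python function counts how many lengths in one fingerprint lie within a tolerance of some length in the other. The function name `count_matches`, the tolerance value, and the example fingerprints are hypothetical, and counting each length on the second clone at most once is only one possible matching convention, not a rule stated by the source.

```python
from bisect import bisect_left


def count_matches(clone_a, clone_b, tolerance):
    """Count lengths in clone_b lying within `tolerance` of any length in clone_a.

    Illustrative convention (an assumption): a length on clone_b counts once
    if at least one length on clone_a falls inside its tolerance window.
    """
    sorted_a = sorted(clone_a)
    matches = 0
    for length in clone_b:
        # Find the first length on clone_a that is >= length - tolerance.
        idx = bisect_left(sorted_a, length - tolerance)
        # It is a match if that length is also <= length + tolerance.
        if idx < len(sorted_a) and sorted_a[idx] <= length + tolerance:
            matches += 1
    return matches


# Hypothetical fingerprints (fragment lengths in arbitrary units).
fingerprint_a = [512, 1030, 1544, 2210, 3075, 4100]
fingerprint_b = [515, 1550, 2900, 4098]
print(count_matches(fingerprint_a, fingerprint_b, tolerance=7))  # prints 3
```

The count alone says nothing about whether the matches are meaningful; that is the question a probabilistic score is meant to address.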

Mathematical scores in overlap assessment

Biologists have used a variety of means to discern clone overlaps in DNA mapping projects. While many are biological, i.e. looking for shared markers, others are basically mathematical, usually adopting probabilistic and/or statistical approaches.

Sulston score exposition

The Sulston score is rooted in the concepts of Bernoulli and binomial processes, as follows. Consider two clones, $\alpha$ and $\beta$, having $m$ and $n$ measured fragment lengths, respectively, where $m \ge n$. That is, clone $\alpha$ has at least as many fragments as clone $\beta$, but usually more. The Sulston score is the probability that at least $h$ fragment lengths on clone $\beta$ will be matched by any combination of lengths on $\alpha$. Intuitively, we see that, at most, there can be $n$ matches. Thus, for a given comparison between two clones, one can measure the statistical significance of a match of $h$ fragments, i.e. how likely it is that this match occurred simply as a result of random chance. Very low values would indicate a significant match that is highly unlikely to have arisen by pure chance, while higher values would suggest that the given match could be just a coincidence.
In what follows, let us refer to individual fragment lengths simply as lengths. Consider a specific length $j$ on clone $\beta$ and a specific length $i$ on clone $\alpha$. These two lengths are arbitrarily selected from their respective sets $i \in \{1, 2, \dots, m\}$ and $j \in \{1, 2, \dots, n\}$. We assume that the gel location of fragment $j$ has been determined and we want the probability of the event $E_{i,j}$ that the location of fragment $i$ will match that of $j$. Geometrically, $i$ will be declared to match $j$ if it falls inside the window of size $2t$ around $j$, i.e. within a tolerance $t$ on either side. Since fragment $i$ could occur anywhere in the gel of length $G$, we have $P(E_{i,j}) = 2t/G$. The probability that $i$ does not match $j$ is simply the complement, i.e. $P(E_{i,j}^C) = 1 - 2t/G$, since it must either match or not match.
Now, let us expand this to compute the probability that no length on clone $\alpha$ matches the single particular length $j$ on clone $\beta$. This is simply the intersection of all individual trials $i \in \{1, 2, \dots, m\}$ where the event $E_{i,j}^C$ occurs, i.e. $P(E_{1,j}^C \cap E_{2,j}^C \cap \cdots \cap E_{m,j}^C)$. This can be restated verbally as: length 1 on clone $\alpha$ does not match length $j$ on clone $\beta$ and length 2 does not match length $j$ and length 3 does not match, etc. Since each of these trials is assumed to be independent, the probability is simply
$$P(E_{1,j}^C) \times P(E_{2,j}^C) \times \cdots \times P(E_{m,j}^C) = (1 - 2t/G)^m.$$
Of course, the actual event of interest is the complement: i.e. there is not "no matches". In other words, the probability of one or more matches is $p = 1 - (1 - 2t/G)^m$. Formally, $p$ is the probability that at least one band on clone $\alpha$ matches band $j$ on clone $\beta$.
This event is taken as a Bernoulli trial having a "success" probability of $p$ for band $j$. However, we want to describe the process over all the bands on clone $\beta$. Since $p$ is constant, the number of matches is distributed binomially. Given $h$ observed matches, the Sulston score $S$ is simply the probability of obtaining at least $h$ matches by chance according to
$$S = \sum_{k=h}^{n} \binom{n}{k} \, p^k \, (1 - p)^{n-k},$$
where $\binom{n}{k}$ are binomial coefficients.
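The derivation above can be sketched in a few lines of Python. The sketch assumes the usual form of the score: a per-band match probability of 1 - (1 - 2t/G)^m, where t is the match tolerance, G the gel length, and m the number of bands on the larger clone, followed by a binomial upper tail over the n bands of the smaller clone given h observed matches. The function name `sulston_score` and the numeric inputs are hypothetical.

```python
from math import comb


def sulston_score(m, n, h, tolerance, gel_length):
    """Probability of at least h chance matches among n bands (binomial tail).

    m, n: band counts on the larger and smaller clone (m >= n is assumed).
    h: number of observed matches, with 0 <= h <= n.
    tolerance, gel_length: match tolerance t and gel length G, same units.
    """
    if not 0 <= h <= n:
        raise ValueError("h must satisfy 0 <= h <= n")
    # Probability that at least one of the m bands lands in the 2t window.
    p = 1.0 - (1.0 - 2.0 * tolerance / gel_length) ** m
    # Upper tail of the binomial distribution with n trials and success p.
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(h, n + 1))


# Hypothetical comparison: 30 and 25 bands, 20 observed matches.
print(sulston_score(m=30, n=25, h=20, tolerance=7, gel_length=3300))
```

For large n or very small scores, a log-domain sum or a library survival function would be numerically safer than the direct sum used here.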

Mathematical refinement

In a 2005 paper, Michael Wendl gave an example showing that the assumption of independent trials is not valid. So, although the traditional Sulston score does indeed represent a probability distribution, it is not actually the distribution characteristic of the fingerprint problem. Wendl went on to give the general solution for this problem in terms of the Bell polynomials, showing that the traditional score overpredicts P-values by orders of magnitude. This solution provides a basis for determining when a problem has sufficient information content to be treated by the probabilistic approach, and it is also a general solution to the birthday problem of two types.
A disadvantage of the exact solution is that its evaluation is computationally intensive and, in fact, not feasible for comparing large clones. Some fast approximations for this problem have been proposed.
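One informal way to probe the independence assumption is to simulate band placement directly. The sketch below is neither the Bell-polynomial solution nor one of the published approximations; it is a plain Monte Carlo estimate under the assumption, made here for illustration only, that band positions on both clones are uniformly distributed along the gel. The function name `simulate_match_tail` and all parameter values are hypothetical.

```python
import random
from bisect import bisect_left


def simulate_match_tail(m, n, h, tolerance, gel_length, trials=10000, seed=0):
    """Estimate P(at least h of n bands are matched) by random band placement.

    Assumption (illustrative): positions on both clones are independent and
    uniform on [0, gel_length]; a band on the smaller clone is matched if any
    band on the larger clone lies within `tolerance` of it.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Place the m bands of the larger clone and sort them for searching.
        bands_a = sorted(rng.uniform(0, gel_length) for _ in range(m))
        matched = 0
        for _ in range(n):
            pos = rng.uniform(0, gel_length)
            # First band on the larger clone at or above pos - tolerance.
            idx = bisect_left(bands_a, pos - tolerance)
            if idx < m and bands_a[idx] <= pos + tolerance:
                matched += 1
        if matched >= h:
            hits += 1
    return hits / trials


# Hypothetical comparison: 30 and 25 bands, at least 3 matched by chance.
print(simulate_match_tail(m=30, n=25, h=3, tolerance=7, gel_length=3300))
```

An estimate of this kind can be compared with the closed-form score for the same inputs, but its resolution is limited to roughly one over the number of trials, so very small probabilities cannot be resolved this way.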
