Next Generation Sequencing techniques have brought new insights into -omics data analysis, mostly thanks to their reliability in detecting biological variants. This reliability is usually measured using a value called the Phred quality score (or Q score).
The Phred score of a base is an integer value that represents the estimated probability that the base was called incorrectly. Mathematically, a Q score is logarithmically related to the base-calling error probability P, and can be calculated using the following formula:
Q = -10 × log10(P)
In practice, a quality score of 20 means there is a 1 in 100 chance that the base is called incorrectly; a quality score of 40 means that chance is 1 in 10,000.
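As a minimal sketch of the formula above, the conversion can be written in both directions in Python. The function names `phred_to_error_prob` and `error_prob_to_phred` are illustrative choices, not part of any library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Return the error probability P for a Phred score Q, i.e. P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Return the Phred score Q for an error probability P, i.e. Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Print the error probability and the accuracy for a few common scores.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.2%}")
```

Real Phred scores are stored as integers, so a score computed from an arbitrary probability would normally be rounded before use.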
The Phred score is inversely related to the error probability, and therefore directly related to the base call accuracy: a higher Q score means a more reliable base call. The following table shows this relationship:
Phred Quality Score | Probability of incorrect base call | Base call accuracy |
---|---|---|
10 | 1 in 10 | 90% |
20 | 1 in 100 | 99% |
30 | 1 in 1000 | 99.9% |
40 | 1 in 10000 | 99.99% |
In FASTQ files, Phred quality scores are encoded as ASCII characters, so that the quality score of each base is specified by a single character. Older Illumina data (pipeline versions before 1.8) used an ASCII offset of 64 (Phred+64); current NGS data almost always uses an offset of 33 (Phred+33), where the ASCII code of the character is the quality score plus 33:
Q Score | ASCII char | Q Score | ASCII char | Q Score | ASCII char | Q Score | ASCII char |
---|---|---|---|---|---|---|---|
0 | ! | 11 | , | 22 | 7 | 32 | A |
1 | " | 12 | - | 23 | 8 | 33 | B |
2 | # | 13 | . | 24 | 9 | 34 | C |
3 | $ | 14 | / | 25 | : | 35 | D |
4 | % | 15 | 0 | 26 | ; | 36 | E |
5 | & | 16 | 1 | 27 | < | 37 | F |
6 | ' | 17 | 2 | 28 | = | 38 | G |
7 | ( | 18 | 3 | 29 | > | 39 | H |
8 | ) | 19 | 4 | 30 | ? | 40 | I |
9 | * | 20 | 5 | 31 | @ | 41 | J |
10 | + | 21 | 6 | | | | |
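The table above can be regenerated with a couple of lines of Python, which is also a quick way to check a single entry. This is only a sketch: `phred_to_char` and `char_to_phred` are hypothetical helper names, and the `offset` parameter is an assumption added here so the same code also covers the older offset of 64.

```python
def phred_to_char(q: int, offset: int = 33) -> str:
    """Encode a Phred score as its ASCII character (offset 33 by default)."""
    if q < 0:
        raise ValueError("Phred scores cannot be negative")
    return chr(q + offset)


def char_to_phred(c: str, offset: int = 33) -> int:
    """Decode a single ASCII character back to its Phred score."""
    return ord(c) - offset


# Rebuild the Q score -> character mapping for scores 0 to 41.
table = {q: phred_to_char(q) for q in range(42)}

for q, c in table.items():
    print(q, c)
```

With `offset=64`, the same score maps to a different character, which is why the encoding of a file has to be known before its quality line is decoded.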
Although many Python, Biopython and stand-alone tools exist for handling Phred quality scores, a simple way to convert an ASCII character to its corresponding quality score from the terminal is the following command:
python3 -c 'print(ord("<ASCII>")-33)'
Or, when working in a Python3 session:
print(ord("<ASCII>")-33)
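In both cases, replace <ASCII> with the actual ASCII character. For example, using I returns 40. Characters that clash with the quoting, such as the double quote ("), the single quote (') or the backslash, need to be escaped first.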
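The same idea extends from a single character to a whole quality line of a FASTQ record. The snippet below is a minimal sketch: `decode_quality` is a hypothetical helper, and the quality string is made up for illustration rather than taken from real data.

```python
from typing import List


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into a list of Phred scores."""
    scores = [ord(c) - offset for c in quality]
    # A negative score means the string was not encoded with this offset.
    if any(s < 0 for s in scores):
        raise ValueError(f"quality string is not valid for offset {offset}")
    return scores


# Hypothetical quality line, assumed to use the offset of 33.
quality_line = "II5?+!"
print(decode_quality(quality_line))  # [40, 40, 20, 30, 10, 0]
```

Because the quality line is held in a Python string here, none of the shell quoting issues of the one-liner apply.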