Introduction
I've been working on some tooling for pulling certificate information out of the certificate transparency logs on and off for a while. I started looking at this again after a few weeks away from it and I've forgotten quite a lot! I've even forgotten some of the basics of what makes a certificate. In this post I want to dive into the structure of a certificate, what it is made of at a high level. I won't talk much about how certificates are used in protocols (e.g. Transport Layer Security (TLS)).
This post started as a reference for myself, but other folks may find it interesting or useful. It has a lot of external references to RFCs, which are stored in footnotes. As of now there is a small bug with how footnotes work where the footnote itself is hidden behind the top bar; see dev.to #7760 for more info. I might try to fix it myself this weekend.
Quick note, SSL certificates are X.509 certificates. The term SSL certificate is deeply ingrained on the web, and even though the SSL protocol should no longer be used this term is still used everywhere.
Information in a certificate
We'll use the openssl CLI to retrieve a certificate, then we can start looking into its structure. If the openssl CLI is not installed, you should be able to install it through your operating system's package manager.
$ openssl s_client -connect google.com:443 2>/dev/null < /dev/null \
| sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > google.com.crt
$ cat google.com.crt
-----BEGIN CERTIFICATE-----
MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw
QjELMAkGA1UEBhMCVVMxHjAcBgNVBAoTFUdvb2dsZSBUcnVzdCBTZXJ2aWNlczET
MBEGA1UEAxMKR1RTIENBIDFPMTAeFw0yMDA0MTUyMDE2NDdaFw0yMDA3MDgyMDE2
...
V+hT9mqgeN10ryOWyN74CvBaw73K3hobSkDAyQS1HkbAqJP9VTuvjZl4PE0ndaIN
yiz/84k5xbSwxO++BuJgMUwj+WaLcvDW
-----END CERTIFICATE-----
$ openssl x509 -text -noout -in google.com.crt
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
9b:f1:8b:d7:9b:46:58:99:02:00:00:00:00:63:98:71
Signature Algorithm: sha256WithRSAEncryption
Issuer: C = US, O = Google Trust Services, CN = GTS CA 1O1
Validity
Not Before: Apr 15 20:16:47 2020 GMT
Not After : Jul 8 20:16:47 2020 GMT
Subject: C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:8e:a4:03:0d:0c:a7:1d:52:28:80:ba:89:51:b9:
45:7a:7a:60:33:a5:ab:25:a4:05:c8:32:d9:b6:5c:
...
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature
X509v3 Extended Key Usage:
TLS Web Server Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Subject Key Identifier:
D0:7D:02:36:9B:CD:47:0B:C5:9C:51:0F:27:A7:70:65:5A:C5:50:E9
X509v3 Authority Key Identifier:
keyid:98:D1:F8:6E:10:EB:CF:9B:EC:60:9F:18:90:1B:A0:EB:7D:09:FD:2B
Authority Information Access:
OCSP - URI:http://ocsp.pki.goog/gts1o1
CA Issuers - URI:http://pki.goog/gsr2/GTS1O1.crt
X509v3 Subject Alternative Name:
DNS:*.google.com, DNS:*.android.com, DNS:*.appengine.google.com, ...
X509v3 Certificate Policies:
Policy: 2.23.140.1.2.2
Policy: 1.3.6.1.4.1.11129.2.5.3
X509v3 CRL Distribution Points:
Full Name:
URI:http://crl.pki.goog/GTS1O1.crl
CT Precertificate SCTs:
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : B2:1E:05:CC:8B:A2:CD:8A:20:4E:87:66:F9:2B:B9:8A:
25:20:67:6B:DA:FA:70:E7:B2:49:53:2D:EF:8B:90:5E
Timestamp : Apr 15 21:16:49.089 2020 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:45:02:20:26:77:E7:A4:C6:F9:D3:C0:0E:95:15:3C:
A2:08:F0:DB:77:9F:1F:7A:EC:7A:26:9B:E8:82:95:33:
...
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : 5E:A7:73:F9:DF:56:C0:E7:B5:36:48:7D:D0:49:E0:32:
7A:91:9A:0C:84:A1:12:12:84:18:75:96:81:71:45:58
Timestamp : Apr 15 21:16:49.137 2020 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:45:02:21:00:FC:5A:10:9B:63:81:BB:16:81:8B:D5:
88:AF:09:A1:D8:83:FD:C3:86:CB:B1:CD:55:71:FF:76:
...
Signature Algorithm: sha256WithRSAEncryption
20:69:ba:0b:e5:b4:7a:36:f7:4f:d2:b2:0f:0d:c1:10:b0:12:
7e:13:f9:f1:ca:6c:a0:c2:46:21:fb:8a:fd:a8:66:a9:96:43:
...
There's quite a lot of information in the certificate. Before we break this down, a quick side note on the openssl command we used in the above code block. Feel free to skip this next section if you understood it.
Side note on the openssl command
The command we used was:
$ openssl s_client -connect google.com:443 2>/dev/null < /dev/null \
| sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > google.com.crt
openssl s_client is used for connecting to hosts over TLS (and originally SSL, but no server should be using this anymore...). With -connect we tell it to connect to google.com on port 443. 2>/dev/null redirects anything the openssl s_client command writes to stderr into /dev/null; essentially, it says to ignore stderr. < /dev/null feeds /dev/null into the stdin of the process, in this case the stdin of openssl. Reading from /dev/null always returns an end of file.
If you don't do this stdin redirect, the openssl s_client command will hang waiting for input on stdin. With the redirect, as soon as it tries to read from stdin it reads an end of file and the read finishes. The last part of the command, | sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > google.com.crt, pipes the output from openssl s_client into sed, which pulls the certificate out of the output and redirects it to a file called google.com.crt.
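If sed range expressions are unfamiliar, the same extraction can be sketched in Python. This is only an illustration: the function name extract_pem_blocks and the sample text are made up for this sketch, and the PEM body in the sample is truncated to a single line, so it is not a complete certificate. Unlike sed, which prints whole lines, the regular expression matches from the BEGIN marker to the END marker.

```python
import re

# Matches everything from a BEGIN CERTIFICATE marker to the next
# END CERTIFICATE marker, which is what the sed range selects.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_pem_blocks(text):
    """Return every PEM certificate block found in text, in order."""
    return PEM_BLOCK.findall(text)


# Hypothetical stand-in for s_client output: noise around one PEM block.
sample_output = (
    "CONNECTED(00000003)\n"
    "-----BEGIN CERTIFICATE-----\n"
    "MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw\n"
    "-----END CERTIFICATE-----\n"
    "---\n"
)

blocks = extract_pem_blocks(sample_output)
print(blocks[0])
```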
So why does it wait for input? There are a few commands you can send it. Normally I'd just send Q to quit, however you can also tell it to send an HTTP request. Try it! Run openssl s_client -connect google.com:443 then, when it hangs, type in:
GET / HTTP/1.1
And press enter twice. It should return the HTML/CSS/JS that makes up google.com. That was a bit of a digression, back to certs now.
A breakdown of the main fields
Now we have google.com's cert, what do all the fields mean? Let's break it down one by one. As we do this I will also mention the type and structure of the given field in the ASN.1 (Abstract Syntax Notation One)[1] specification for certificates and try to explain things which may not be obvious. Wherever I outline a part of the definition I will start it with a comment # ASN.1. I will also strip out pieces of information which are not relevant to a particular section and replace them with "...". The full ASN.1 definition can be found in Appendix A.1 of RFC 5280 - X.509 Public Key Infrastructure.
Note that the ASN.1 spec for certificates describes a high level representation of certificate information. As bytes, this information is further encoded as DER (Distinguished Encoding Rules). I won't go into the details of ASN.1 -> DER encoding[2], but this post should lay some groundwork to make this encoding clearer in a future post.
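To make the DER step slightly more concrete, here is a minimal sketch (not a full parser) that base64-decodes the first line of the PEM block shown earlier and reads the first two DER headers. Every DER element is a tag byte, a length, then the content. The helper name read_tlv_header is made up for this sketch, and it assumes single-byte tags.

```python
import base64

# First base64 line of the PEM block shown earlier (64 characters -> 48 bytes).
FIRST_PEM_LINE = "MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw"


def read_tlv_header(data, offset):
    """Return (tag, length, content_offset) for the DER element at offset."""
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        # Short form: the length fits in the low 7 bits.
        return tag, first, offset + 2
    # Long form: the low 7 bits say how many bytes hold the length.
    num_bytes = first & 0x7F
    start = offset + 2
    length = int.from_bytes(data[start:start + num_bytes], "big")
    return tag, length, start + num_bytes


der_prefix = base64.b64decode(FIRST_PEM_LINE)

# Outer Certificate SEQUENCE, then the TBSCertificate SEQUENCE inside it.
cert_tag, cert_len, tbs_start = read_tlv_header(der_prefix, 0)
tbs_tag, tbs_len, tbs_body = read_tlv_header(der_prefix, tbs_start)

print(f"Certificate:    tag={cert_tag:#x} length={cert_len}")
print(f"TBSCertificate: tag={tbs_tag:#x} length={tbs_len}")
```

Tag 0x30 is a SEQUENCE, which lines up with both Certificate and TBSCertificate being defined as SEQUENCE in the ASN.1.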
Above we used openssl to pull out the information from an existing cert. The fields mentioned there can have many possible values. The main structure is defined as follows in ASN.1 - don't worry if you don't understand it, we will cover each field in depth.
# ASN.1
Certificate ::= SEQUENCE {
tbsCertificate TBSCertificate,
signatureAlgorithm AlgorithmIdentifier,
signature BIT STRING }
TBSCertificate ::= SEQUENCE {
version [0] Version DEFAULT v1,
serialNumber CertificateSerialNumber,
signature AlgorithmIdentifier,
issuer Name,
validity Validity,
subject Name,
subjectPublicKeyInfo SubjectPublicKeyInfo,
issuerUniqueID [1] IMPLICIT UniqueIdentifier OPTIONAL,
-- If present, version MUST be v2 or v3
subjectUniqueID [2] IMPLICIT UniqueIdentifier OPTIONAL,
-- If present, version MUST be v2 or v3
extensions [3] Extensions OPTIONAL
-- If present, version MUST be v3 -- }
Certificate
Certificate is an ASN.1 SEQUENCE. A SEQUENCE is an ordered list of values. In this case a list with tbsCertificate, signatureAlgorithm and signature. First let's look at signature and signatureAlgorithm.
Signature and Signature Algorithm
In the openssl output for google.com's cert both Signature[3] and Signature Algorithm[4] can be seen right at the bottom:
Signature Algorithm: sha256WithRSAEncryption
20:69:ba:0b:e5:b4:7a:36:f7:4f:d2:b2:0f:0d:c1:10:b0:12:
7e:13:f9:f1:ca:6c:a0:c2:46:21:fb:8a:fd:a8:66:a9:96:43:
...
I've stripped out part of the Signature and replaced it with "..." as it's not relevant. The Signature Algorithm field indicates the algorithm used by the issuing Certificate Authority (CA) to sign this certificate. Here its value is sha256WithRSAEncryption. For more information on this algorithm see Section 5 of RFC 4055 - Additional Algorithms and Identifiers for RSA ... and RFC 2313 - PKCS #1: RSA Encryption. signature and signatureAlgorithm are defined as follows in the ASN.1 spec:
# ASN.1
Certificate ::= SEQUENCE {
...
signatureAlgorithm AlgorithmIdentifier,
signature BIT STRING }
signature has type BIT STRING, which is simply a string of bits. In this case those bits contain a digital signature computed from the tbsCertificate field of this cert, using the algorithm defined in signatureAlgorithm. openssl prints these bits as colon-separated hexadecimal, e.g. 20:69:ba:0b:e5:b4... signatureAlgorithm is of type AlgorithmIdentifier. AlgorithmIdentifier itself is a SEQUENCE, in this case a list with two values, algorithm and parameters.
# ASN.1
AlgorithmIdentifier ::= SEQUENCE {
algorithm OBJECT IDENTIFIER,
parameters ANY DEFINED BY algorithm OPTIONAL }
-- contains a value of the type
-- registered for use with the
-- algorithm object identifier value
Here algorithm has type OBJECT IDENTIFIER. An OBJECT IDENTIFIER, or OID, is a standard way of identifying objects, standardised by the International Telecommunication Union (ITU). RFC 3061 - A URN Namespace of Object Identifiers describes a URN namespace for OIDs. OIDs are definitely not something I'll go into more detail on in this post, it is a big topic! For the curious, the OID for sha256WithRSAEncryption is 1.2.840.113549.1.1.11[5]. It is defined in RFC 4055 Section 5; to fully understand that definition you may have to go down a rabbit hole of RFCs...
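As a small illustration of how that dotted OID relates to the bytes inside the certificate, here is a sketch of decoding the content bytes of a DER OBJECT IDENTIFIER. The function name decode_oid is made up for this sketch. The nine input bytes are what the algorithm OID looks like in google.com's cert once the first PEM line shown earlier is base64-decoded.

```python
def decode_oid(content):
    """Decode the content bytes of a DER OBJECT IDENTIFIER into dotted form."""
    subidentifiers = []
    value = 0
    for byte in content:
        # Each byte carries 7 bits of the number; the high bit means "more follows".
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            subidentifiers.append(value)
            value = 0
    # The first two arcs are packed into one number as 40 * arc1 + arc2.
    first = subidentifiers[0]
    arc1 = min(first // 40, 2)
    arc2 = first - 40 * arc1
    return ".".join(str(n) for n in [arc1, arc2] + subidentifiers[1:])


# Content bytes of the signature algorithm OID in google.com's cert.
SHA256_WITH_RSA = bytes.fromhex("2a864886f70d01010b")
print(decode_oid(SHA256_WITH_RSA))
```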
parameters defines parameters to the specified algorithm. We will see an example of this in the Subject and Subject Public Key Info section.
TBSCertificate
TBSCertificate contains the information on the subject of the certificate and the issuing CA. We will cover all fields except extensions, issuerUniqueID and subjectUniqueID. There is quite a lot in extensions, so I will leave it for another post. issuerUniqueID and subjectUniqueID are optional and do not appear in google.com's cert. It is also recommended that these not be set[6].
Version
In the cert we decoded, the Version[7] is as follows:
Version: 3 (0x2)
Version is defined as:
# ASN.1
TBSCertificate ::= SEQUENCE {
version [0] Version DEFAULT v1,
...
}
...
Version ::= INTEGER { v1(0), v2(1), v3(2) }
This says the version field of a certificate can be one of three values: 0 which means the version is v1, 1 which means the version is v2 and 2 which means the version is v3. It defaults to v1, which is 0. This explains why there is a 3 and a 0x2 in the output from openssl: it's showing the version number (3) and the real value of the field (2).
So what does version mean? There are three versions of X.509 certificates out in the wild, and it just indicates which version of the X.509 spec a given cert is using. I will gloss over the differences in the versions in this post as it's not too important for an overview. You will mainly see X.509 v3 certificates in the wild.
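The 3 versus 0x2 distinction can be checked against the raw bytes. The sketch below reads the version field from the first PEM line shown earlier. The function name read_version and the fixed offset of 8 bytes are assumptions that hold for this particular cert (two 4-byte SEQUENCE headers come first); a real parser would walk the headers instead.

```python
import base64

# First base64 line of the PEM block shown earlier.
FIRST_PEM_LINE = "MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw"

VERSION_NAMES = {0: "v1", 1: "v2", 2: "v3"}


def read_version(tbs_content):
    """Return the raw version INTEGER from the start of TBSCertificate content."""
    if tbs_content[0] != 0xA0:
        # No [0] tag: the field was omitted, so DEFAULT v1 (0) applies.
        return 0
    # Layout: A0 <len> 02 <len> <value bytes>
    if tbs_content[2] != 0x02:
        raise ValueError("expected an INTEGER inside the [0] tag")
    int_len = tbs_content[3]
    return int.from_bytes(tbs_content[4:4 + int_len], "big")


der_prefix = base64.b64decode(FIRST_PEM_LINE)
# Assumption for this cert only: skip the two 4-byte SEQUENCE headers.
tbs_content = der_prefix[8:]

raw = read_version(tbs_content)
print(f"Version: {raw + 1} ({hex(raw)})")
```

The printed line has the same shape as the openssl output: the human-readable version number followed by the raw field value.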
Serial Number
The Serial Number[8] is a unique integer given to the certificate. It is unique among all certificates issued by the same CA, e.g. DigiCert or Let's Encrypt, not globally unique, meaning two certs from different CAs could potentially have the same Serial Number. Serial Number looks as follows in the cert we decoded:
Serial Number:
9b:f1:8b:d7:9b:46:58:99:02:00:00:00:00:63:98:71
And its ASN.1 definition:
# ASN.1
TBSCertificate ::= SEQUENCE {
...
serialNumber CertificateSerialNumber,
...
}
CertificateSerialNumber ::= INTEGER
It is of type CertificateSerialNumber, which is just an alias for an integer.
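To see that the colon-separated hex really is one integer, here is a sketch that pulls the serial number bytes out of the first PEM line shown earlier and formats them the way openssl does. The function name format_serial and the hard-coded offset 13 are assumptions that only hold for this cert. The sketch also assumes a non-negative serial number.

```python
import base64

# First base64 line of the PEM block shown earlier.
FIRST_PEM_LINE = "MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw"


def format_serial(content):
    """Return (integer, colon-separated hex) for DER INTEGER content bytes."""
    # DER INTEGERs are big-endian two's complement.
    number = int.from_bytes(content, "big", signed=True)
    hex_digits = f"{number:x}"
    if len(hex_digits) % 2:
        hex_digits = "0" + hex_digits
    pairs = [hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2)]
    return number, ":".join(pairs)


der_prefix = base64.b64decode(FIRST_PEM_LINE)

# Assumption for this cert only: the serialNumber INTEGER starts at byte 13.
if der_prefix[13] != 0x02:
    raise ValueError("expected an INTEGER tag at offset 13")
length = der_prefix[14]
content = der_prefix[15:15 + length]

number, text = format_serial(content)
print(text)
```

The content is 17 bytes long even though openssl prints 16. The extra leading 00 byte is there because the first real byte (0x9b) has its high bit set, and without the padding the two's complement value would be read as negative.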
Signature Algorithm
This field[9] must contain the same algorithm as the signatureAlgorithm field outside the tbsCertificate, which we discussed in the Signature and Signature Algorithm section. And in this case it does:
Signature Algorithm: sha256WithRSAEncryption
This is the signature field in the TBSCertificate:
# ASN.1
TBSCertificate ::= SEQUENCE {
...
signature AlgorithmIdentifier,
...
}
We looked at AlgorithmIdentifier in Signature and Signature Algorithm.
This field is duplicated, why?
This was interesting as the reason for the duplication was not clear. We saw the Signature Algorithm appear in the top level Certificate and it also appears inside the TBSCertificate. Section 4.1.2.3 of RFC 5280, which talks about the signature field within TBSCertificate, states:
This field MUST contain the same algorithm identifier as the signatureAlgorithm field in the sequence Certificate (Section 4.1.1.2)
But nowhere in that RFC does it give a reason. Section 1 of RFC 6211 - Cryptographic Message Syntax (CMS) Algorithm Identifier Protection Attribute gives some clue as to a possible reason, which is to prevent algorithm substitution attacks:
The Cryptographic Message Syntax [CMS], unlike X.509/PKIX certificates [RFC5280], is vulnerable to algorithm substitution attacks. In an algorithm substitution attack, the attacker changes either the algorithm being used or the parameters of the algorithm in order to change the result of a signature verification process. In X.509 certificates, the signature algorithm is protected because it is duplicated in the TBSCertificate.signature field with the proviso that the validator is to compare both fields as part of the signature validation process.
I don't know enough about this to comment further. But I am currently investigating. Once I understand more I will write about it.
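The comparison the RFC 6211 quote describes can be sketched as a plain byte comparison, on the assumption that two identical AlgorithmIdentifier values have identical DER bytes. This is only an illustration of the idea, not how any particular validator does it. The function name algorithms_match is made up, the offsets only hold for this cert, and because the outer signatureAlgorithm lives in the part of the PEM that was elided above, a copy of the inner value stands in for it.

```python
import base64

# First base64 line of the PEM block shown earlier.
FIRST_PEM_LINE = "MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw"


def algorithms_match(tbs_signature, outer_signature_algorithm):
    """Compare two DER-encoded AlgorithmIdentifier values byte for byte."""
    return tbs_signature == outer_signature_algorithm


der_prefix = base64.b64decode(FIRST_PEM_LINE)

# Assumption for this cert only: TBSCertificate.signature is bytes 32..46.
inner = der_prefix[32:47]

# Stand-in for the outer field: a well-formed cert repeats the same bytes.
outer_ok = bytes(inner)

# Hypothetical tampered copy: the last OID byte is changed from 0x0b
# (sha256WithRSAEncryption) to 0x05 (sha1WithRSAEncryption).
outer_tampered = inner[:12] + b"\x05" + inner[13:]

print(algorithms_match(inner, outer_ok))
print(algorithms_match(inner, outer_tampered))
```

Because the inner copy is covered by the signature and the outer copy is not, a mismatch between the two is a sign that the outer field was altered after signing.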
Issuer
The Issuer[10] field is a unique identifier for the CA issuing this certificate.
Issuer: C = US, O = Google Trust Services, CN = GTS CA 1O1
The cert we decoded was issued by Google Trust Services. Google has a number of CAs under Google Trust Services, see https://pki.goog/ for more details. The Issuer field along with the Serial Number will uniquely identify a certificate, as long as the Issuer is a globally trusted CA.
Issuer is defined as a Name in the spec:
# ASN.1
TBSCertificate ::= SEQUENCE {
...
issuer Name,
...
}
Name itself is a bit weird:
# ASN.1
Name ::= CHOICE { -- only one possibility for now --
rdnSequence RDNSequence }
RDNSequence ::= SEQUENCE OF RelativeDistinguishedName
...
RelativeDistinguishedName ::= SET SIZE (1..MAX) OF AttributeTypeAndValue
-- AttributeTypeAndValue is defined as follows
AttributeTypeAndValue ::= SEQUENCE {
type AttributeType,
value AttributeValue }
AttributeType ::= OBJECT IDENTIFIER
AttributeValue ::= ANY -- DEFINED BY AttributeType
CHOICE defines a list of options to pick from. So, Name is a choice with a single option, an RDNSequence, and it even says there is only one possibility for now. I don't really know why this is, it could be for forwards compatibility on changes to the spec.
Have a look at the definition of RDNSequence, it is defined as SEQUENCE OF RelativeDistinguishedName. This is different from the SEQUENCE which we saw in the Signature and Signature Algorithm section. SEQUENCE OF defines a list of values which are all the same type, in this case they are all RelativeDistinguishedName.
We haven't seen SET yet either. This defines an unordered set of objects. SET SIZE (1..MAX) OF AttributeTypeAndValue defines a set of AttributeTypeAndValues containing at least one element. MAX is an ASN.1 keyword meaning the size has no upper bound.
In short, Name is a list of OID/value pairs where the value is some object bound by that OID. Section 2.3 of RFC 2253 - LDAP v3 describes RelativeDistinguishedName and its encoding in more depth.
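The nesting of SEQUENCE OF, SET OF and SEQUENCE is easier to see on real bytes. The first three PEM lines shown earlier happen to contain the whole Issuer, so the sketch below walks it. All function names are made up for this sketch. The starting offset of 47 only holds for this cert, only short-form lengths are handled, and the values are assumed to be ASCII strings.

```python
import base64

# First three base64 lines of the PEM block shown earlier.
PEM_LINES = [
    "MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw",
    "QjELMAkGA1UEBhMCVVMxHjAcBgNVBAoTFUdvb2dsZSBUcnVzdCBTZXJ2aWNlczET",
    "MBEGA1UEAxMKR1RTIENBIDFPMTAeFw0yMDA0MTUyMDE2NDdaFw0yMDA3MDgyMDE2",
]

# Short names openssl prints for these attribute type OIDs.
SHORT_NAMES = {"2.5.4.6": "C", "2.5.4.10": "O", "2.5.4.3": "CN"}


def read_tlv(data, offset):
    """Return (tag, content, next_offset); short-form lengths only."""
    tag, length = data[offset], data[offset + 1]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    start = offset + 2
    return tag, data[start:start + length], start + length


def decode_oid(content):
    """Decode OBJECT IDENTIFIER content bytes into dotted form."""
    numbers, value = [], 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            numbers.append(value)
            value = 0
    arc1 = min(numbers[0] // 40, 2)
    return ".".join(str(n) for n in [arc1, numbers[0] - 40 * arc1] + numbers[1:])


def parse_name(name_content):
    """Return a list of RDNs, each a list of (oid, value) pairs."""
    rdns, offset = [], 0
    while offset < len(name_content):
        # Each element of the RDNSequence is a SET (one RDN).
        _, set_content, offset = read_tlv(name_content, offset)
        pairs, inner = [], 0
        while inner < len(set_content):
            # Each member of the SET is a SEQUENCE { type, value }.
            _, atv, inner = read_tlv(set_content, inner)
            _, oid_bytes, value_at = read_tlv(atv, 0)
            _, value_bytes, _ = read_tlv(atv, value_at)
            pairs.append((decode_oid(oid_bytes), value_bytes.decode("ascii")))
        rdns.append(pairs)
    return rdns


der_prefix = base64.b64decode("".join(PEM_LINES))
# Assumption for this cert only: the issuer Name starts at byte 47.
_, issuer_content, _ = read_tlv(der_prefix, 47)
issuer = parse_name(issuer_content)

print(", ".join(f"{SHORT_NAMES[oid]} = {value}" for rdn in issuer for oid, value in rdn))
```

The printed line should match the Issuer line from the openssl output, with each RDN here holding exactly one OID/value pair.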
Validity
Validity[11] specifies the time window in which a certificate is valid.
Validity
Not Before: Apr 15 20:16:47 2020 GMT
Not After : Jul 8 20:16:47 2020 GMT
The cert we decoded is valid from Apr 15 20:16:47 2020 GMT to Jul 8 20:16:47 2020 GMT. The certificate is invalid outside of this timeframe.
# ASN.1
TBSCertificate ::= SEQUENCE {
...
validity Validity,
...
}
...
Validity ::= SEQUENCE {
notBefore Time,
notAfter Time }
Time ::= CHOICE {
utcTime UTCTime,
generalTime GeneralizedTime }
Validity has two fields which are both of type Time. Time can be one of two types, UTCTime[12] or GeneralizedTime[13]. Dates before the year 2050 must be encoded as UTCTime and dates in or after the year 2050 must be encoded as GeneralizedTime. This is outlined in the definition of Validity in RFC 5280.
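The 2050 cut-off exists because UTCTime only has a two-digit year. RFC 5280 says a two-digit year of 50 or more means 19YY and anything lower means 20YY. Below is a sketch of parsing both forms under that rule. The function names are made up for this sketch, and the two UTCTime strings are how the Not Before and Not After values above would look in YYMMDDHHMMSSZ form.

```python
from datetime import datetime, timezone


def parse_utctime(value):
    """Parse YYMMDDHHMMSSZ using the RFC 5280 two-digit year rule."""
    if len(value) != 13 or not value.endswith("Z"):
        raise ValueError("expected YYMMDDHHMMSSZ")
    yy = int(value[:2])
    century = 1900 if yy >= 50 else 2000
    full = f"{century + yy:04d}{value[2:]}"
    return datetime.strptime(full, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def parse_generalizedtime(value):
    """Parse YYYYMMDDHHMMSSZ, the form used for 2050 and later."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def is_within_validity(not_before, not_after, moment):
    """Return True if moment falls inside the validity window (inclusive)."""
    return not_before <= moment <= not_after


not_before = parse_utctime("200415201647Z")
not_after = parse_utctime("200708201647Z")

print(not_before.isoformat())
print(not_after.isoformat())
print(is_within_validity(not_before, not_after,
                         datetime(2020, 6, 1, tzinfo=timezone.utc)))
```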
Certificate validity and the Brazilian government
I stumbled across this issue on GitHub while on my certificate information seeking adventures. As mentioned above, dates before the year 2050 should be encoded as UTCTime, but the Brazilian government had their own specification which required the use of GeneralizedTime for all dates. This is a good example of what you see in an RFC/specification not being exactly what you see in a real system.
Subject and Subject Public Key Info
Subject[14] identifies the owner of the public key in the Subject Public Key Info[15] section, that is, the "thing" this certificate identifies. In this case it's identifying Google domains.
Subject: C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
Its definition is:
# ASN.1
TBSCertificate ::= SEQUENCE {
...
subject Name,
...
}
Name was explained in the Issuer section, so it should be clear what this is. Subject Public Key Info has a few fields. Let's look at its ASN.1 definition first.
# ASN.1
TBSCertificate ::= SEQUENCE {
...
subjectPublicKeyInfo SubjectPublicKeyInfo,
...
}
...
SubjectPublicKeyInfo ::= SEQUENCE {
algorithm AlgorithmIdentifier,
subjectPublicKey BIT STRING }
...
Let's see the actual value of Subject Public Key Info in the certificate again:
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:0f:45:4e:2f:0c:a7:88:9a:b9:24:ff:57:50:dc:
f1:ab:6e:dd:3e:7f:82:26:30:a7:12:9f:81:8a:27:
9d:7d:06:2e:d3:e2:50:3b:ce:6c:2d:2e:5b:32:ce:
7d:eb:86:06:7c:8c:29:2b:47:61:de:f0:ca:f8:b7:
98:00:21:6a:34
ASN1 OID: prime256v1
NIST CURVE: P-256
Public Key Algorithm defines the algorithm the key can be used with. This is the algorithm field of the AlgorithmIdentifier inside SubjectPublicKeyInfo defined above. The ASN1 OID line is that AlgorithmIdentifier's parameters field, which for id-ecPublicKey names the curve (prime256v1, also known as NIST P-256). The pub bytes are the BIT STRING. The BIT STRING itself is constrained by the type of algorithm. The information in the fields the BIT STRING decodes to is beyond this post. If interested see Section 2.3.5 of RFC 3279 - Algorithms and Identifiers X.509 PKI ... for more on id-ecPublicKey.
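Without going into the details, here is a small peek at what those bytes look like for an elliptic curve key. The sketch assumes the key is an uncompressed point, which is signalled by the leading 04 byte, followed by the X and Y coordinates of equal length. The function names are made up for this sketch, and it does not check that the point is actually on the curve.

```python
# The pub bytes exactly as printed in the openssl output above.
PUB_DUMP = """
04:0f:45:4e:2f:0c:a7:88:9a:b9:24:ff:57:50:dc:
f1:ab:6e:dd:3e:7f:82:26:30:a7:12:9f:81:8a:27:
9d:7d:06:2e:d3:e2:50:3b:ce:6c:2d:2e:5b:32:ce:
7d:eb:86:06:7c:8c:29:2b:47:61:de:f0:ca:f8:b7:
98:00:21:6a:34
"""


def parse_hex_dump(dump):
    """Turn openssl's colon-separated, line-wrapped hex into bytes."""
    return bytes.fromhex("".join(dump.split()).replace(":", ""))


def split_uncompressed_point(point):
    """Split 04 || X || Y into its two coordinates."""
    if point[0] != 0x04:
        raise ValueError("not an uncompressed point")
    if (len(point) - 1) % 2:
        raise ValueError("coordinates must have equal length")
    coord_len = (len(point) - 1) // 2
    return point[1:1 + coord_len], point[1 + coord_len:]


point = parse_hex_dump(PUB_DUMP)
x, y = split_uncompressed_point(point)

print(f"total bytes: {len(point)}")
print(f"x ({len(x) * 8} bit): {x.hex()}")
print(f"y ({len(y) * 8} bit): {y.hex()}")
```

Each coordinate comes out as 256 bits, which is where the (256 bit) in the openssl output comes from.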
Conclusion
If you made it to here, kudos! I hope this has given you a better understanding of what really makes up a certificate, and the sheer complexity around X.509 in general. There is a lot I glossed over in this post which I hope to drill deeper into in the future.
In the next post I'll cover the TBSCertificate extensions field.
[1] ASN.1 - see Introduction to ASN.1.
[2] ASN.1 to DER encoding - see ASN.1 encoding rules.
[3] Signature - see Section 4.1.1.3 of RFC 5280.
[4] Signature Algorithm in Certificate - see Section 4.1.1.2 of RFC 5280.
[5] OID for sha256WithRSAEncryption - see https://oidref.com/1.2.840.113549.1.1.11.
[6] issuerUniqueID and subjectUniqueID - see Section 4.1.2.8 of RFC 5280.
[7] Version - see Section 4.1.2.1 of RFC 5280.
[8] Serial Number - see Section 4.1.2.2 of RFC 5280.
[9] Signature Algorithm in TBSCertificate - see Section 4.1.2.3 of RFC 5280.
[10] Issuer - see Section 4.1.2.4 of RFC 5280.
[11] Validity - see Section 4.1.2.5 of RFC 5280.
[12] UTCTime - see Section 4.1.2.5.1 of RFC 5280.
[13] GeneralizedTime - see Section 4.1.2.5.2 of RFC 5280.
[14] Subject - see Section 4.1.2.6 of RFC 5280.
[15] Subject Public Key Info - see Section 4.1.2.7 of RFC 5280.
Top comments (1)
Great breakdown of an X.509 certificate! I learned that the presence of two Signature Algorithm fields is an intentional duplication and not, as I assumed, one reference to the algorithm used to sign this certificate and another reference to the way the certificate subject signs their own messages to others.
By the way, the pervasiveness of the use of the term SSL vs. TLS in the industry is likely due to its instant familiarity, while using TLS alone (vs. with SSL, as SSL/TLS) takes a moment to spark recognition.
Interestingly, however, TLS technically is SSL! We can all agree that SSL 3.0 and earlier should never again be used, but SSL 3.3 and 3.4 are perfectly fine to use, given TLS 1.0 and 1.1 are also known as SSL 3.1 and 3.2, which are also deprecated.
In other words, the currently recommended versions, TLS 1.2 and 1.3, are also known as SSL 3.3 and 3.4.