Cover image by Shannon Potter on Unsplash
If you do web development you will, at some point, encounter three particular terms: URI, URL and URN (the last one is less familiar, but you may have encountered ARNs in AWS).
You may also have seen URI and URL being used interchangeably, but it's important to note they are not the same thing even if they are used for very similar purposes: finding things and finding things on the internet.
Let's break down what those acronyms mean:
- URI stands for Uniform Resource Identifier
- URL stands for Uniform Resource Locator
- URN stands for Uniform Resource Name
URLs and URNs are specific classifications of URIs.
URIs can look very different from one another (examples from rfc3986#section-1.1.2):
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
Follow me in a deep dive into URIs, URLs and URNs and some good old RFC digging. Put on your safety 🥽, grab your ☕ and let's go!!
What's a URI
A Uniform Resource Identifier is a generic way to uniquely identify any resource.
The complete definition is in RFC 3986, where you can hunt for all the details.
It takes the form of a string with this syntax:
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
There are 5 components:
- scheme (required), arbitrary but there are popular ones like mailto, https, ftp or arn
- authority (optional), for user information and top level namespace (usually a domain or IP address) with the syntax
  authority = [ userinfo "@" ] host [ ":" port ]
- path (required but can be empty - you know, parsers 🤷), a hierarchical structure separated by /
- query (optional), it starts with ? and can contain ? and / (like path)
- fragment (optional), it starts with # and runs until the end of the URI
This is quite convoluted and the RFC is incredibly detailed. The Wikipedia page for URI helps!
It's important to understand the URI as it's the foundation on which URL and URN are based.
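To make the five components concrete, here is a minimal sketch using Python's standard `urllib.parse.urlsplit`. Keep in mind that `urlsplit` is a lenient splitter rather than a strict RFC 3986 validator, that the `describe` helper is a name made up for this example, and that the last URI in the loop is purely illustrative.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a URI into the five generic components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


for uri in (
    "http://www.ietf.org/rfc/rfc2396.txt",
    "mailto:John.Doe@example.com",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
    "https://example.com:8042/over/there?name=ferret#nose",  # illustrative only
):
    print(uri, "->", describe(uri))
```

Notice that `mailto:` and `urn:` URIs have an empty authority: everything after the scheme ends up in the path.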
What's a URL
The Uniform Resource Locator is a string representation for a resource available via the Internet.
It has its own RFC, RFC 1738, where we find all the familiar names and strings we see as Web developers.
It defines some specific schemes we all know and love:
ftp File Transfer protocol
http Hypertext Transfer Protocol
gopher The Gopher protocol
mailto Electronic mail address
news USENET news
nntp USENET news using NNTP access
telnet Reference to interactive sessions
wais Wide Area Information Servers
file Host-specific file names
prospero Prospero Directory Service
(wait, what is wais??? Think I'm too young for that!)
and it defines the usual "Internet" scheme syntax for all URL schemes that involve the use of an IP-based protocol:
//<user>:<password>@<host>:<port>/<url-path>
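As a quick sketch of how that syntax maps to actual values, the standard `urllib.parse.urlsplit` exposes each piece as an attribute. The URL below is invented for illustration, credentials included.

```python
from urllib.parse import urlsplit

# Hypothetical URL, made up for this example only.
url = "ftp://alice:s3cret@ftp.example.com:2121/pub/readme.txt"
parts = urlsplit(url)

print("user:    ", parts.username)
print("password:", parts.password)
print("host:    ", parts.hostname)
print("port:    ", parts.port)  # parsed as an int
print("url-path:", parts.path)
```

One caveat worth knowing: the newer RFC 3986 deprecates the `user:password` form in the userinfo part, so it's best not to rely on it in anything you build today.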
I'm young enough that I basically only used ftp, http and mailto! (And well... watching Star Wars over telnet) Did you use some of the others? Let me know in the comments, I want to read your story!!
What's a URN
Quite simply, it's a URI with the urn scheme. URNs are location-independent and persistent identifiers.
This means there is only 1 unique URN for a given resource in a given namespace forever (or until that resource doesn't exist any more).
URNs are defined by RFC 8141.
Being location-independent and persistent makes them useful for some very interesting use cases.
Their syntax definition (rfc8141#section-2) is rather more complex; here is a simplified version:
URN = "urn" ":" NID ":" NSS [ "?+" r-component ] [ "?=" q-component ] [ "#" f-component ]
It reads much like a URI with several components:
- urn is the scheme
- NID (required), the namespace identifier
- NSS (required), the namespace specific string
- r-component (optional), parameters to pass to URN resolution services; note that its use is discouraged: "Thus, r-components SHOULD NOT be used for URNs before their semantics have been standardized."
- q-component (optional), query parameters for the named resource or the service supplying the named resource
- f-component (optional), a fragment representing the location or region for the named resource, ignored during URN equivalence operations.
It should be noted that a public registry for URN namespaces exists, and is maintained at IANA.
Does it mean you need to register a namespace before using it? No if you plan to use it internally, yes if you want it to be internet-global (like xmpp or uuid).
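If you want to play with the simplified URN syntax, here is a rough sketch of it as a Python regular expression. It is deliberately looser than the full RFC 8141 grammar (for instance, it does not check which characters are allowed in the NSS), and the names `URN_PATTERN`, `Urn`, `parse_urn` and `equivalence_key` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern: "urn" ":" NID ":" NSS [ "?+" r ] [ "?=" q ] [ "#" f ]
URN_PATTERN = re.compile(
    r"^urn:"
    r"(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):"
    r"(?P<nss>[^?#]+)"
    r"(?:\?\+(?P<r>[^#]*?))?"
    r"(?:\?=(?P<q>[^#]*))?"
    r"(?:#(?P<f>.*))?$",
    re.IGNORECASE,  # the "urn" scheme and the NID are case-insensitive
)


class Urn(NamedTuple):
    nid: str
    nss: str
    r_component: Optional[str]
    q_component: Optional[str]
    f_component: Optional[str]


def parse_urn(text: str) -> Urn:
    match = URN_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a URN under this simplified grammar: {text!r}")
    return Urn(match["nid"], match["nss"], match["r"], match["q"], match["f"])


def equivalence_key(text: str) -> str:
    """Drop the r-, q- and f-components and lowercase the NID.

    Assumption: percent-encoding normalization is skipped in this sketch.
    """
    urn = parse_urn(text)
    return f"urn:{urn.nid.lower()}:{urn.nss}"


print(parse_urn("urn:example:foo-bar-baz-qux?+CCResolve:cc=uk?=op=map#part"))
print(equivalence_key("URN:Example:a123#frag") == equivalence_key("urn:example:a123"))
```

The `equivalence_key` helper reflects the point made above about the f-component being ignored when comparing URNs; RFC 8141 leaves the r- and q-components out of that comparison as well.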
So cool but where to use them?
AWS
If you have experience with Amazon Web Services you will have encountered ARNs: Amazon Resource Names. By their definition:
Amazon Resource Names (ARNs) uniquely identify AWS resources. We require an ARN when you need to specify a resource unambiguously across all of AWS, such as in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls.
Sounds familiar? The format too is very URN-like (there are different formats, look at the docs!):
arn:partition:service:region:account-id:resource-type/resource-id
At first glance it does not appear to be an RFC-compliant URN, but it's extremely similar.
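Here is a small sketch of how the ARN format above could be pulled apart in Python. It only covers the general colon-separated layout; `Arn` and `parse_arn` are hypothetical names, and the sample ARN uses a placeholder account ID.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(text: str) -> Arn:
    # Split on the first five colons only: the resource part
    # may itself contain ":" or "/".
    parts = text.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {text!r}")
    return Arn(*parts[1:])


# Illustrative ARN; the region field is empty here.
print(parse_arn("arn:aws:iam::123456789012:user/johndoe"))
```

The prefix is `arn` rather than `urn`, which is one visible reason a strict URN parser would reject it.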
GCP
Google Cloud Platform relies on URIs to identify resources on the platform.
[Resource names](https://cloud.google.com/apis/design/resource_names) are scheme-less URIs similar to:
logging.googleapis.com/projects/myproject123/locations/global/buckets/my-bucket
logging.googleapis.com is the authority, and the path identifies the resource. Because the path is hierarchical, it is possible to represent the GCP resource structure this way (project -> collection -> resource).
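A minimal sketch of walking that hierarchy in Python (3.9+), assuming the path always alternates between a collection name and a resource ID, as it does in the example above. `parse_resource_name` is a made-up helper, not part of any Google library.

```python
def parse_resource_name(name: str) -> tuple[str, dict[str, str]]:
    """Return the authority and an ordered {collection: id} mapping."""
    authority, *segments = name.split("/")
    if len(segments) % 2 != 0:
        raise ValueError(f"expected collection/id pairs in: {name!r}")
    # Even positions are collection names, odd positions are resource IDs.
    pairs = dict(zip(segments[0::2], segments[1::2]))
    return authority, pairs


authority, pairs = parse_resource_name(
    "logging.googleapis.com/projects/myproject123/locations/global/buckets/my-bucket"
)
print(authority)
for collection, resource_id in pairs.items():
    print(f"  {collection} -> {resource_id}")
```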
Another at-scale example is LinkedIn:
URNs are used to represent foreign associations to an entity (persons, organizations, and so on) in an API. A URN is a string-based identifier with the format:
urn:{namespace}:{entityType}:{id}
Express foreign keys
Simple relational database designs generally rely on (autoincrementing) int for row IDs in tables. This system is effective and works in a single-database scenario.
When scaling to multiple DBs or distributed applications (e.g. microservices), integers are not enough anymore. Some common problems are:
- conflicting autoincrementing numbers: being auto-incremental, they are exposed to possible race conditions when creating records
- too generic: the system (or its operators) cannot tell just by looking at the ID what kind of resource it refers to. If you think that is not so important, Atlassian recently blew up 883 customer websites due to a similar confusion: a script included IDs for websites and not apps in the Atlassian backend ecosystem. Those IDs were then used for deletion, but the thing deleted wasn't, as expected, the customer app instance but their entire website.
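To illustrate the "too generic" problem, here is a small sketch of URN-style identifiers used as foreign keys, following the urn:{namespace}:{entityType}:{id} shape. Everything here is hypothetical: the acme namespace, the `EntityUrn` type and the `delete_app` function are invented for the example and are not how any of the companies mentioned actually do it.

```python
from typing import NamedTuple


class EntityUrn(NamedTuple):
    namespace: str
    entity_type: str
    entity_id: str

    @classmethod
    def parse(cls, text: str) -> "EntityUrn":
        parts = text.split(":", 3)
        if len(parts) != 4 or parts[0] != "urn":
            raise ValueError(f"not an entity URN: {text!r}")
        return cls(*parts[1:])


def delete_app(ref: str) -> str:
    """Pretend deletion that refuses anything that is not an app."""
    urn = EntityUrn.parse(ref)
    if urn.entity_type != "app":
        raise ValueError(f"refusing to delete: expected an app, got {urn.entity_type!r}")
    return f"deleted app {urn.entity_id}"


print(delete_app("urn:acme:app:42"))
try:
    delete_app("urn:acme:site:42")  # same number, different kind of thing
except ValueError as error:
    print(error)
```

With a bare integer, 42 could be an app or a site and nothing would stop the call; with the type embedded in the identifier, the mix-up is caught before anything is deleted.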
Do you have any other examples of URNs being used in systems? I'm curious to know about them so please let me know in the comments!