A fundamental skill today is being able to consume APIs, be they RESTful or otherwise. From using standard libraries in C++ to the hyper-object-oriented world of the .NET SDK’s huge array of built-in functionality, APIs are everywhere you look. However, We’re going to focus on a specific kind of API.

A not-insignificant portion of my time is spent figuring out how a given web-based API is shaped; that is, what kinds of requests it can respond to, and what kinds of data can it accept and return. The general term for this kind of exploration has a sort of negative connotation in some circles: reverse engineering. Those two words might bring to mind images of hex editors, command lines and hoodies, but it really is just a general term for figuring out how a system, program or service operates.

In order to do get useful information out of an API, you have to model it. This usually involves writing out data structures which correspond to the requests you need to send and the responses you expect to get back. Converting those structures to your domain is a slightly different topic, but we’ll go over that, as well.

In this series, we’ll talk about my method for reverse engineering and modeling web-based APIs, specifically those using HTTP. There are plenty of non-HTTP-based APIs that operate over the Web, but the simplest place to start is with the simplest kind of API: The REST(ful) API.

Additionally, while I might pick a specific language to show code snippets here, the concepts in this series should be completely agnostic. We’re talking about how to model and consume APIs, not the best possible way to model and consume APIs for your language of choice.

Prerequisites

This series is not for brand new programmers. For this series, you’ll need to have some prior knowledge. I’m going to assume you have basic understanding of the following concepts:

  • Sending and receiving HTTP requests.
  • Plaintext serialization formats, like JSON or XML.
  • Converting to and from JSON (or XML, or whatever serialization format you need to use).
    • This can mean using the built-in serializing functionality in whatever framework you use (System.Text.Json in .NET, e.g.), or using whatever JSON reader functionality exists (Kotlin’s JsonReader, e.g.). Heck, if you’re serializing your own JSON, you probably don’t need this series, but you’re welcome to keep reading!
  • Some knowledge of what REST(ful) APIs are. We’ll get to that in a second.
  • How to use curl. This isn’t a requirement, you can just copy-paste the curl commands I use, but it’s always good to learn what a command is doing before you paste it into your terminal!

Essentially, I’m targeting folks who are getting started on their first job in full-stack/back-end software development, or at least the first one where they need to concern themselves with consunming network resources.

As stated before, the code that follows will be in a specific language: C#. You don’t need to know C# in order to follow along, but it can’t hurt! Any code that is .NET specific will have some explanation on what it does.

Why C# instead of, say, Python?

Because I said so, dangit. Types are fun. Python doesn’t have real types, it has hash maps pretending to be types. Though, more importantly (and seriously), having to handle various data types, especially mismatched types is a really important concept, and, also, .NET has a great built-in JSON parser.1 😃

REST APIs for Fun and Profit

I’m not going to waste your time explaining what a REST(ful) API is. In fact, I’m going to drop the parenthetical there and just refer to them as “REST APIs”, because you can learn all of that somewhere else. There is a multitude of developers who will get very pedantic and explain that most “REST” APIs are hardly RESTful, and then go into detail about how much they hate modern web development, all while never actually telling you what a RESTful API is.

For our purposes, and in flagrant violation of the sacred pact of web developers, I’m going to define a REST API as follows:

Any HTTP-based API that can be used to gather data.

Yep. That’s it. We’re going to agree, for the rest of this article, that all a REST API needs to do is be able to receive some kind of HTTP request, and respond with some kind of response. I’m not going to get particular about whether resources are sorted into correct hierarchies based on featureset and the path, whether GETs should have side effects, whether query strings should be used instead of POST body content, or the cool (imo) possibility of getting a SEARCH HTTP verb. If you can send a request off to a server and get some data back, that’s our “RESTAPI.

Which API?

We’re going to be using an API from one of my favorite services: The Internet Archive API.

For full transparency, the extent of my knowledge of this API comes from my (very brief) work on an Android app I started building. The pourpose of this app was not, indeed, to be an example of how to properly map an API, but working on it is actually one of the things that inspired this blog series.

Where do we start?

The first step is always to check the documentation. Most APIs have documentation. Many even have useful documentation. Most have fairly bad documentation. The Internet Archive’s docs sit somewhere in the middle, and that’s actually a big reason why I decided to use them!

Note, this isn’t going to be a deep dive into how IA’s docs are structures, or how the IA API works.

Key Terms

One of the best ways to tackle any sort of research (this is a life lesson, my dudes) is to know how to find what you’re looking for. Very often, you don’t actually know what you’re looking for, you just know what kind of task you’re trying to accomplish. This is why people use the term “Google-Fu”. Knowing how to craft a search query is really, really important. The best way to craft a good search query is to know the key terms you’re looking for. So, let’s define a few key terms right here:

  • HTTP methods - If you find an HTTP method in documentation, you’ve probably just found the format of the requests they expect to receive. You’ll also most likely find examples of responses that come back when you send those requests. This is the Good Stuff™️. This is the first bit of information we’re going to need to start actually querying our API.
    • GET
    • HEAD
    • POST
    • etc.
  • schema - a schema describes the structure of data. If you’ve ever worked with a database, you know what a schema is, and you know that that is exactly what you’re looking for. This is the Good Stuff™️. This is likely going to describe in detail what kind of data we’re receiving, and will likely be tied to the request information we found in the previous term. This can be a breadcrumb to our request/response documentation.
  • metadata - usually, if you find the term “metadata”, it’ll be tied to the sorts of headers and queries that you can use to access the data the metadata describes.

Let’s take a look at IA’s documentation and see what we can find!

Documentation exploration with a little bit of curling

IA's Documentation sidebar

Right there on the sidebar we can see some promising candidates. The “Item Metadata API” listings in particular should make you click them, especially because it has both read and write sections. That indicates that we’re going to be able to access some form of metadata using the info on those pages.

The overview page is very helpful, but doesn’t give us much to get started with. We’ll circle back around to it later. The Read page is much more promising. Right at the top of the page, they have an example URL complete with the prescription of using a GET request, and then they provide an example GET request as the very next thing! Hey, let’s toss that into curl and see what happens!

This is where things get good. curl https://archive.org/metadata/xfetch > xfetch.json gives us a nice, juicy blob of JSON to work with. Let’s break it down:

{
    "created": 1660841107,
    "d1": "ia600308.us.archive.org",
    "d2": "ia800308.us.archive.org",
    "dir": "/21/items/xfetch",
    ...
}

The first few members of the JSON tell us when this specific Item was created, and tells us two servers where it is hosted. That is an implementation detail of the IA API, so we won’t look too hard at that. Importantly, though, we’ve been told where we can find the resource in question with the dir member. After that, we see a member that’s called files and has a list of JSON objects, all of which look more or less like this:

{
    "name": "xfetch.pdf",
    "source": "original",
    "format": "Text PDF",
    "mtime": "1479169618",
    "size": "419170",
    "md5": "ee68c235c2cfc1007a5ab998d21d643c",
    "crc32": "4db3022d",
    "sha1": "abceccfc74a577ed06a5aeb7bebe365e9ff8946d"
}

That name certainly looks like a filename to me! So, now that we’ve parsed through a few chunks of information about the xfetch item on IA, let’s start putting stuff together. One of the best methods for mapping out an API is just doing it. We have a server name (two of them!), we have a directory path, and we have a file name. What happens if we put those together? Cue curl: curl -LO https://ia600308.us.archive.org/21/items/xfetch/xfetch.pdf. Would you look at that!

xfetch pdf

Using nothing but the documentation and the information gleaned from the metadata, we were able to intuit the process of grabbing the actual data that we want from the API. In this case, that data is a PDF file, but it could be anything!


I think that’s a good enough starting point to be a full post. We established our scoped definition for what a “RESTful API” is, we’ve talked a bit about the tools and knowledge we’re going to need moving forward, and we got started with figuring out what kinds of data we’re going to be dealing with via our API’s documentation. Next time, we’ll dive in to modeling the requests and responses we plan to use to interact with the API.

  1. Just feel lucky I’m not doing this in F#, the greatest programming language in the world.