The increasing complexity of containerized software libraries means the author-consumer divide has never been wider.
February 21, 2024
Think about your favorite software library. How did you learn to use it?
You probably started with the README to get an idea of what the thing is, installed it via your favorite package manager, and then did some combination of the following:
looking at examples
reading docs
exploring the function signatures in your IDE
With time, you wanted more from the library and started to expand your knowledge of it. The rationale behind initially confusing functions became apparent. Eventually, you might have even become a power user.
When you first started, there was a knowledge divide between the library's author, who knows where everything is and how it works, and you, the consumer. You needed to bridge some of that divide to get started with the library. As time passed and you wanted more from the library, you closed that gap further. Let’s call this gap the author-consumer divide.
The author-consumer divide is unavoidable, as it must be crossed by anyone looking to use any piece of software. I assert that the author-consumer divide is much harder to cross today than it was 10 years ago, and this is significantly hampering modern software delivery.
To see why, let’s first consider how to cross the divide in the world of software libraries.
You can explore the source code in your IDE.
Functions and variables are (hopefully) well named and commented.
Some languages have formal conventions for generating documentation from comments (e.g. Javadoc, Python docstrings).
Statically typed languages declare types for variables and function return values.
With many software libraries, it's quite feasible to learn a library simply by importing it and exploring the public API.
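As a minimal sketch of that kind of exploration, the Python snippet below lists a module's public functions together with their signatures and the first line of their docstrings. The standard library's `json` module is only a stand-in for "your favorite library", and the helper name `describe_public_api` is hypothetical.

```python
import inspect
import json


def describe_public_api(module):
    """Return {name: (signature, first docstring line)} for a module's public functions."""
    api = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or ""
        summary = doc.splitlines()[0] if doc else ""
        api[name] = (str(inspect.signature(obj)), summary)
    return api


if __name__ == "__main__":
    # Print each public function the way an IDE tooltip might show it.
    for name, (signature, summary) in describe_public_api(json).items():
        print(f"{name}{signature}")
        print(f"    {summary}")
```

Everything this sketch needs (names, arguments, documentation) ships inside the library itself, which is roughly what an IDE relies on for tab-completion.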
Now consider the modern world of containerized services. A containerized service can be thought of as the cloud-native equivalent of a software library: it is instantiated, it has functions (endpoints) that are callable, the functions have arguments and return values which have a structure to them, and its API is likely versioned.
Yet you (the consumer) have to work much harder to accomplish the same outcome on a containerized service.
Containerized service “library” vs. a regular software library:

| Your desired outcome | What you need to do for a regular software library | What you need to do for a containerized service “library” |
| --- | --- | --- |
| Get a runtime for the library | Run the library using your language's toolchain. Resource allocation and separation are handled by the OS. | First, get access to a Docker engine or a Kubernetes cluster. Then, create an environment (e.g. a Docker network or a Kubernetes Namespace). |
| Assemble a library’s dependencies | Depending on the library, use your language’s package manager. Transitive dependencies of the correct version are pulled in automatically. | Read the docs to determine the dependency containers (assuming the author has even documented them). Then, try to find the versions of the dependency containers that work with the container you want to instantiate. |
| Instantiate the library’s dependencies + the library itself | Look for constructors in the API and call them. Use type hints and in-code comments as necessary. | First, provide a pile of ENTRYPOINT, CMD, and ENV strings (usually with some volume mounts) for all the container’s dependencies. In the best case, use the Dockerfile for hints on what these need to be. In the normal case, hope the author remembered to put these in the docs. Recurse the above for all dependencies, until you’ve instantiated the container you want. |
| Call library functions | Write libraryObject. in your IDE and tab-complete. | First, figure out what port numbers the container is listening on. Then, decipher what each port number represents. Then, figure out the protocol of the port (HTTP? gRPC? Binary?). Then, get a client that can talk that protocol (curl? psql?). Then, figure out how to access the port (Do I have to bind the port locally? Do I have conflicts? Is there an endpoint on the web somewhere? Do I need auth?). Then, figure out what shape of argument data is needed to call the function. Then, figure out what shape of response data comes back. |
| Connect one library to another | Pass the objects from Library A to Library B. | Figure out the IP address and port information for Service A. Then ensure that network connectivity is allowed between the two. Then ensure that auth is permitted between the two. Then figure out where and how to pass the IP address and port into Service B. |
| Inspect library logs | Usually, pass the same logger that you're using for your application into the library. Search using your IDE. | If you're lucky, your organization invested the money and SRE time to configure a log aggregator that you can use to search. If not, you need to first find the container ID, then use docker logs or kubectl logs to get the logs, and then filter using grep. You may need to shell into the container because the logs are being written to a file inside the container's filesystem. |
| Debug library logic | Add a breakpoint in library code. Step through with the debugger as needed. | Either figure out how to do remote debugging from your IDE, connected to the container, or resort to adding print statements and rebuilding the container on each iteration. Sometimes you need to shell into the container to explore the state of the filesystem. |
| Debug library performance | Use the profiler of your IDE. | If you're lucky, your organization invested the money and SRE time to configure an APM. If not, first find the container ID, then monitor docker stats or kubectl top to see how the container is performing. |
From the size of the rightmost column alone, it’s no wonder that many developers prefer to work with raw binaries over containers.
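To make the "instantiate" and "connect" rows a little more concrete, here is a rough Python sketch that only assembles the `docker run` arguments for two services and prints them; it does not execute anything. The `ServiceSpec` class, the `example/api:1.0` image, the `demo-net` network, and the `DB_HOST`/`DB_PORT` variables are all hypothetical, and the sketch assumes both containers share a user-defined Docker network so that one can reach the other by container name.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ServiceSpec:
    """Everything a consumer must discover before a container can start (hypothetical)."""
    name: str
    image: str
    env: dict = field(default_factory=dict)
    ports: dict = field(default_factory=dict)    # host port -> container port
    volumes: dict = field(default_factory=dict)  # host path -> container path
    command: list = field(default_factory=list)  # overrides the image's CMD


def docker_run_argv(spec, network):
    """Assemble the `docker run` arguments for one service; nothing is executed."""
    argv = ["docker", "run", "--detach", "--name", spec.name, "--network", network]
    for key, value in spec.env.items():
        argv += ["--env", f"{key}={value}"]
    for host_port, container_port in spec.ports.items():
        argv += ["--publish", f"{host_port}:{container_port}"]
    for host_path, container_path in spec.volumes.items():
        argv += ["--volume", f"{host_path}:{container_path}"]
    return argv + [spec.image] + spec.command


# Hypothetical two-service setup: an API that depends on a database.
database = ServiceSpec(
    name="db",
    image="postgres:16",
    env={"POSTGRES_PASSWORD": "example"},
    volumes={"/tmp/pgdata": "/var/lib/postgresql/data"},
)
api = ServiceSpec(
    name="api",
    image="example/api:1.0",  # placeholder image name, not a real image
    # "Connecting" the services: the address of `db` is passed in as plain strings.
    env={"DB_HOST": database.name, "DB_PORT": "5432"},
    ports={8080: 8080},
)

if __name__ == "__main__":
    # Dependencies first, then the service that needs them.
    for spec in (database, api):
        print(shlex.join(docker_run_argv(spec, network="demo-net")))
```

Note that every value in the two specs is an untyped string the consumer had to find somewhere, and the link between the services is a pair of environment variables rather than an object passed from one library to another.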
Fortunately, these problems largely result from the recency of containerization relative to regular software libraries and the corresponding gap in tooling development. For example, the package managers that are now ubiquitous in programming language ecosystems were only developed in the 1990s, the first documentation generators were developed around the same time, and modern IDE debuggers are more recent still. The containerized world has yet to catch up, and the pain of crossing the author-consumer divide is but one symptom.
We’re building the Kurtosis tooling to solve this pain and make building distributed applications as easy as building single-server apps. The next post in this series will discuss the principles we're designing around as we build.