This article is a transcription of a talk I delivered at the office (Pivotal - Toronto!).
The event was an internal thing, a session of three lightning talks at the end of the day.
If you’re even more curious, go check out the series of articles I’m creating around /proc! You can find all of them in A month of Proc.
Now, onto the transcriptions and slides!
In this very short talk, I want to cover a filesystem that I’m 100% sure everyone here has already used, even if just indirectly.
I think that because everyone, at some point, has needed to answer the following two questions:
- how much RAM is my process consuming? and
- how much RAM does my whole system have available?
Naturally, to answer those, you probably used either ps or something similar.
But what does /proc have to do with this?
To answer that, we need to go back to operating systems basics and remember that the OS plays two roles:
- First, managing resources, like scheduling tasks to run on a limited set of CPUs; and
- second, providing abstractions that let users consume the resources the OS manages.
So, if we think about where software runs on a machine at any given moment, it can be in one of two possible spaces:
- userspace, where code is sandboxed inside this thing called a process and can’t touch hardware directly; and
- kernelspace, where code has access to pretty much all of the hardware and can execute essentially any instruction the machine supports.
Given that a user program can’t query the hardware directly, it first needs to ask the OS for that information; the OS, which can talk to the hardware, then hands the answer back to the program.
But how does a process talk to the kernel in the first place?
How can it ask that question?
The answer is syscalls: a set of well-defined interfaces that let a user program request a specific service from the kernel.
So, let’s say we want to count how many bytes a file has:
To do that, we use two system calls:
- one for opening a file, and
- another for reading it.
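Here’s a minimal sketch of those steps in Python, whose os.open and os.read are thin wrappers over the open(2) and read(2) syscalls (plus close(2) to hand the descriptor back). The function name count_bytes and the 4096-byte chunk size are just illustrative choices.

```python
import os


def count_bytes(path: str, chunk_size: int = 4096) -> int:
    """Count the bytes in a file using only open/read/close syscalls."""
    fd = os.open(path, os.O_RDONLY)  # open(2): the kernel gives us a file descriptor
    try:
        total = 0
        while True:
            chunk = os.read(fd, chunk_size)  # read(2): up to chunk_size bytes
            if not chunk:  # an empty read means we reached end of file
                break
            total += len(chunk)
        return total
    finally:
        os.close(fd)  # close(2): release the descriptor
```

Counting by reading, rather than asking stat(2) for the size, has a nice side effect: it also works for most files under /proc, which usually report a size of 0.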
For us, users, that’s awesome!
We don’t need to care about the semantics of the filesystem where the file lives: all we want to do is read a file, which might live on disk, tape, the network, or RAM!
That doesn’t matter to the consumer of such a contract. It’s up to the kernel to figure out which filesystem is responsible and delegate the call to it.
So, at this point, you might imagine that there’s a syscall like get_my_tcp_stats()! But that’s not the case.
Whenever a system call gets added, it becomes part of the kernel API, which means a lot of work to document and test it, plus a commitment to support it essentially forever!
For that reason, there are relatively few of them (check the syscall.h header for the full list).
Knowing that, this is where /proc fits in.
Given that Linux abstracts the idea of reading a file, why not change what happens underneath, replacing reading from disk with simply “writing back specific information”?
For instance, consider the path that a read to an EXT4 filesystem takes: the read syscall is handled, and its arguments are passed down to an abstract interface, the virtual file system (VFS), which is then responsible for passing the actual operation to whoever owns that file (in this case, EXT4).
So, given that the kernel already provides an interface common to all filesystems, with each filesystem implementing an adapter that adheres to it, why not just implement that interface and let the user get the data from a pseudo filesystem?
That’s exactly what procfs does! It implements the virtual filesystem interface, exposing files whose reads retrieve information from the kernel.
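To make the idea concrete, here’s a toy model in Python, not kernel code. A hypothetical FileSystem class plays the part of the VFS contract, DiskFS stands in for EXT4 by returning bytes stored earlier, and ProcFS stores nothing at all: each file’s content is generated at read time by a function (loosely similar to how procfs files are backed by callbacks in the kernel). All class names are made up for illustration.

```python
from abc import ABC, abstractmethod
from collections.abc import Callable


class FileSystem(ABC):
    """Hypothetical stand-in for the interface every filesystem implements."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class DiskFS(FileSystem):
    """Plays the role of EXT4: returns bytes that were stored beforehand."""

    def __init__(self, blocks: dict[str, bytes]):
        self.blocks = blocks

    def read(self, path: str) -> bytes:
        if path not in self.blocks:
            raise FileNotFoundError(path)
        return self.blocks[path]


class ProcFS(FileSystem):
    """Plays the role of procfs: nothing is stored, content is computed per read."""

    def __init__(self, generators: dict[str, Callable[[], str]]):
        self.generators = generators

    def read(self, path: str) -> bytes:
        if path not in self.generators:
            raise FileNotFoundError(path)
        return self.generators[path]().encode()


class VFS:
    """Routes each read to the filesystem mounted at the longest matching prefix."""

    def __init__(self):
        self.mounts: dict[str, FileSystem] = {}

    def mount(self, prefix: str, fs: FileSystem) -> None:
        self.mounts[prefix] = fs

    def read(self, path: str) -> bytes:
        for prefix in sorted(self.mounts, key=len, reverse=True):
            root = prefix.rstrip("/")  # "/" becomes "", "/proc" stays "/proc"
            if path == prefix or path.startswith(root + "/"):
                # Hand the owning filesystem a path relative to its mount point.
                return self.mounts[prefix].read(path[len(root):] or "/")
        raise FileNotFoundError(path)
```

With DiskFS mounted at / and ProcFS at /proc, a read of anything under /proc is routed to ProcFS, while every other path falls through to DiskFS; the caller uses the same read either way.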
So, what kind of things are exposed? Well, a bunch!
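For example, the two questions from the beginning of the talk can be answered by reading /proc directly. A minimal sketch, assuming a Linux system: VmRSS in /proc/<pid>/status is the process’s resident memory, and MemAvailable in /proc/meminfo (present since Linux 3.14) is the kernel’s estimate of memory available for new workloads; both are reported in kB, meaning 1024-byte units. The helper names are hypothetical, and parsing is kept separate from file access so it can be tried on sample text.

```python
def field_kib(text: str, key: str) -> int:
    """Return the number from a 'Key:   1234 kB' line, in KiB."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == key:
            return int(rest.split()[0])
    raise KeyError(key)


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()


def process_rss_kib(pid: str = "self") -> int:
    """How much RAM is my process consuming? (resident set size)"""
    return field_kib(read_text(f"/proc/{pid}/status"), "VmRSS")


def system_available_kib() -> int:
    """How much RAM does my whole system have available?"""
    return field_kib(read_text("/proc/meminfo"), "MemAvailable")
```

Tools like ps ultimately get their numbers from files like these, which is why /proc was involved all along, even if only indirectly.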
To get an idea of what some of those files are, we can look at the ones exposed for networking.
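As one example, /proc/net/dev has a line of counters per network interface. The sketch below assumes its usual layout (two header lines, then an interface name followed by 8 receive counters and 8 transmit counters, each group starting with bytes) and pulls out the received and transmitted byte totals; parse_net_dev and read_net_dev are hypothetical helper names.

```python
def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    stats: dict[str, tuple[int, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # fields[0:8] are receive counters, fields[8:16] transmit; bytes come first.
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


def read_net_dev(path: str = "/proc/net/dev") -> dict[str, tuple[int, int]]:
    with open(path) as f:
        return parse_net_dev(f.read())
```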
And not only that! It supports not only reads, but also writes!
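Writes follow the same open(2)/write(2) path as any other file. A minimal sketch, assuming the usual convention that a dotted sysctl key like fs.file-max (the system-wide limit on open files) maps to the file /proc/sys/fs/file-max; the helper names are hypothetical, and writing requires root.

```python
def sysctl_path(key: str) -> str:
    """Map a dotted sysctl key to its file under /proc/sys.

    Naive on purpose: assumes no path component itself contains a dot.
    """
    return "/proc/sys/" + key.replace(".", "/")


def read_sysctl(key: str) -> str:
    with open(sysctl_path(key)) as f:
        return f.read().strip()


def write_sysctl(key: str, value: str) -> None:
    """Change a kernel tunable by writing to its /proc/sys file (needs root)."""
    with open(sysctl_path(key), "w") as f:
        f.write(value + "\n")
```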
If you’ve used sysctl before to change things like the total number of open files your system can handle, or a particular behavior of tcp, that’s touching