How arrays work one level below the usual abstractions, from RAM layout to Two Sum
Intro
The array is the most fundamental data structure in computer science. To build a solid mental model of how arrays work, it helps to go one level lower, straight to the RAM. Once you see how the hardware is laid out, the behavior of an array follows from it.
With that mental model in mind, it becomes easy to understand how arrays behave under the hood, how std::vector in C++ and list in Python behave differently and how arrays can be used to solve Two Sum, one of the classic LeetCode problems.
Think of the RAM as a Huge Array
The main memory (DRAM) of a computer is a large collection of memory cells, which are basically tiny boxes that can store a small piece of data. Think of a massive warehouse filled with row after row of storage shelves.
For the CPU to use these boxes, it needs a way to reference them and that’s why memory addresses exist. Every cell in the RAM holds exactly 8 bits (1 byte) of information and carries a unique address, a number that increases sequentially and is usually written in hexadecimal, something like 0x7ffe5367e044. It is like labeling every shelf in the warehouse with its own unique serial number, counted straight up from zero.
Now take a step back and the layout becomes a long chain of data. Every byte sits at a fixed position in sequential order, so the RAM is logically nothing more than one gigantic one-dimensional array, baked into the hardware itself.
This hardware array forces two constraints on every piece of software running above it.
First, the CPU can do exactly two things with a cell:
Read the byte at a given address.
Write a byte to a given address.
There is no hardware instruction that inserts a new block between two existing ones and none that deletes a shelf to make the warehouse smaller. The size of the physical array is fixed and so is the ordering. Every higher-level data structure has to be built out of those two primitives.
Second, a box is reached by going straight to its address, not by scanning the others to find it. That is what makes a single memory access O(1). The time to read or write a cell does not depend on where in memory it sits.
Arrays in Memory
So far the whole RAM has been one big array, but a program almost never gets all of it. Instead it requests a small, contiguous slice from the OS and that slice is what’s usually called an array.
The contiguous part is what makes it fast and useful. Because all the elements in an array sit right next to each other, the address of any element is just a small calculation, base + index * elementSize, where base is the address of the first element. Example: Element 7 of an int-array that starts at address 1000 lives at 1000 + 7 * 4 = 1028. The CPU doesn't search for it, it computes the address and reads it directly, which is why element 7 and element 7000 cost the same single read. The real walltime of walking the array is a different question, once the cache decides, but that comes later. For now this is the O(1) access from the last section, with the arithmetic behind it.
But the speed comes with a fundamental tradeoff. The size of the array must be fixed the moment you allocate it, because the slice has neighbors on both sides, whom you don’t want to rob. The order is fixed too, so inserting a value in the middle means pushing every element behind it one cell over to make room. For an array of n elements that's up to n shifts, which makes a middle insert O(n). Deleting works the same way in reverse.
Sorting the array adds one more rule on top of this. The elements stay in order, which makes search cheap. Instead of checking elements one by one, you look at the middle element and ask whether your target sits in the left or right half. Because the array is ordered, that one comparison tells you which side to keep, so you throw the other half away and repeat. Each step halves what's left, which brings search down from O(n) to O(log n). The catch is that keeping the order isn't free. Every insert has to find its spot first and then shift the rest, so you trade O(n) writes for cheaper reads.
Dynamic Array
As mentioned in the last section the array has a hard limit. You need to fix its size when you allocate it and once every cell is full there's nowhere to grow into. So what do you do when you need to append one more value and there's no room?
The trick is to not grow the array at all. You allocate a new, bigger block somewhere else in memory, copy every existing element into it, free the old block and switch to the new one. A dynamic array wraps all of this so you never see it happen. C++ calls it std::vector and Python calls it list. In both cases an append looks like a single cheap operation, even though it sometimes triggers an expensive full copy underneath.
The interesting question is how much bigger the new block should be. Say you grow by exactly one cell each time. Then every append has to copy the entire array, which is n copies for n appends, so building it up is O(n²). The more efficient way is to grow geometrically. When the array runs out of space you allocate double the size, so a copy only happens at 1, 2, 4, 8, 16 elements and gets rarer the bigger the array becomes at the cost of over-allocating memory. Spread the cost of those occasional copies across all the cheap appends in between and each one works out to O(1) on average. That's what amortized O(1) means: most appends just write into a free cell, a few are expensive and the average stays constant.
This is also where C++ and Python start to behave differently and that's next.
Arrays in Python and C++
Both languages give you a dynamic array, but what sits inside the block is not the same thing and that difference decides how fast they are.
A C++ std::vector<int> stores the integers themselves, packed back to back (a std::array if you want it fixed-size). Element 7, for example, is four bytes after element 6, exactly the base + index * elementSize layout from before. When you walk through the vector, the CPU reads one contiguous run of memory and the hardware is built for that. It loads memory in chunks called cache lines (usually 64 bytes), so reading one int usually pulls the next fifteen along with it for free and a prefetcher notices the straight-line pattern and fetches ahead. The data you want is usually already close to the CPU.
A Python list looks the same from the outside but stores something else inside. The block is contiguous too, but it holds pointers, not values. In Python everything is an object, so every integer lives on the heap as its own object and the list just keeps the addresses. Reading one element means loading the pointer from the block, then following it to wherever the actual int lives. That extra hop on every read is what makes a list of numbers slower to walk through than a flat block of values. This is why heavy numerical work in Python uses NumPy, whose arrays store the raw values back to back instead of pointers.
Same idea, implemented in a different way. C++ keeps the values where the math says they are, Python keeps a tidy table of pointers to values living all over the heap. Both index in O(1), but the constant hiding inside that O(1) is much smaller when the data is actually contiguous. That gap is exactly what shows up when you run Two Sum next.
Two Sum
Two Sum gives you an array of integers plus a target, then asks for the indices of the two numbers that add up to it. For [2, 7, 11, 15] with target 9 the answer is [0, 1], because the values at those indices sum to 9.
The most direct way is to check every pair. For each element you walk the rest of the array and ask whether it completes the target.
def two_sum(nums, target):
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] + nums[j] == target:
return [i, j]
Every line in here is the one array operation from the whole post: nums[i] and nums[j] are index lookups, each an O(1) read straight to an address. These single reads are cheap. The cost is in how many of them are needed to solve the problem. The outer loop runs n times, the inner loop runs up to n times for each, so you do on the order of n² reads. That's O(n²). For ten elements it's nothing. For a million it's a trillion reads. That's where you can use your laptop as a radiator.
The same code, submitted twice on LeetCode: Python at 1719 ms, C++ at 40 ms. Same algorithm, same O(n²), a factor of forty between them. Most of that is the interpreter, CPython runs every read and every add as bytecode while C++ compiles straight to machine instructions. The memory layout from the last section is the other part: the C++ reads stay in cache lines, the Python reads chase pointers across the heap. The big-O is identical, the wall clock is not.
So the array gives you the fast part for free, the cheap indexed access and leaves you with the expensive part, the sheer number of pairs. Getting rid of that quadratic count is the actual problem and you can't do it with the array alone. The usual fix is a hash map, which trades extra memory for a single O(n) pass. That is a structure of its own, built on the array you just saw. The rest of the series goes the same way, from the hardware up.