Python’s
`defaultdict`

is an incredibly useful tool
to specify default values for missing keys
(RealPython has a
great primer
on them).

I found I was taking up so much memory to store the value 0 for an algorithm which
counted rather infrequent elements in a huge sequence split into many chunks.
Using a `defaultdict(int)`

for each distinct element
to store which indices had a non-zero count of the respective element
greatly optimised it,
but I yearned for a `defaultlist`

equivalent to standardise this approach and
generally make my code more expressive.

## Humble beginnings

There are some opinionated decisions of what constitutes a “defaultlist”,
so whilst an
excellent implementation already exists by c0fec0de,
I disagreed with some behaviours that led me to implement my own.
My idea of a `defaultlist`

is to be a glorified wrapper of the `defaultdict`

.
We can use the
`MutableSequence`

abstract base class to mirror the interface one would expect from a builtin `list`

.

```
class defaultlist(MutableSequence):
def __getitem__(self, i):
...
def __setitem__(self, i, value):
...
def __delitem__(self, i):
...
def __len__(self):
...
def insert(self):
...
```

Firstly we can initialise an internal `defaultdict`

in `self._ddict`

,
taking a `default_facotry`

argument.
Each key of `self._ddict`

will represent an index of a list.

Note that `defaultdict`

does not require a `default_factory`

and will raise a `KeyError`

when you try to access a missing key in such scenarios.
For a list variant this behaviour would be nonsensical as
missing elements between existing ones really aren’t erroneous,
so we can just make `default_factory`

return `None`

if not specified.

```
def __init__(self, default_factory=None):
self._ddict = defaultdict(default_factory or defaultlist._none_factory)
@staticmethod
def _none_factory():
return None
```

The get and set item methods,
which determine the behaviour of square brackets notation (e.g. `my_dlist[i]`

),
can also directly interface with the internal `defaultdict`

we initialised.
This means that all elements of our `defaultlist`

will default
just like its dictionary counter-part.

```
def __getitem__(self, i):
return self._ddict[i]
def __setitem__(self, i, v):
self._ddict[i] = v
```

A delete item method
(called like `del my_dlist[i]`

)
requires a bit more work.
The `list`

builtin will reindex elements above the deleted index,
and so we can emulate that behaviour
by decrementing the keys of our internal `defaultdict`

.
Conversely an insert method can increment the keys of `self._ddict`

before safely assigning the value at `i`

.

```
def __delitem__(self, i):
# I decided to be agnostic if the element exists or not
try:
del self._ddict[i]
except KeyError:
pass
larger_keys = [k for k in self._ddict.keys() if k > i]
reindexed_subdict = {k - 1: self._ddict[k] for k in larger_keys}
for k in larger_keys:
del self._ddict[k]
self._ddict.update(reindexed_subdict)
def insert(self, i, value):
larger_keys = [k for k in self._ddict.keys() if k >= i]
reindexed_subdict = {k + 1: self._ddict[k] for k in larger_keys}
for k in larger_keys:
del self._ddict[k]
self._ddict.update(reindexed_subdict)
self._ddict[i] = value
```

As we’re treating keys in the `self._ddict`

as indexes,
we can treat the highest existing index as the deciding factor
for what constitutes the length
(e.g. result of `len(my_dlist)`

)
of our `defaultlist`

.

```
def __len__(self):
try:
return max(self._ddict.keys()) + 1
except ValueError:
return 0
```

We are currently assuming positive indices are being passed,
but `list`

does take in negative indices
that translate to the length of the list `n`

minus (or rather plus) the negative index.
We can support this behaviour by inspecting and modifying the passed index
before we actually use it on our internal `defaultdict`

.

```
def _actualise_index(self, i):
if i >= 0:
return i
else:
n = len(self)
if i >= -n:
return n + i
else:
raise IndexError("negative list index larger than list length, unresolvable")
def __getitem__(self, i):
i = self._actualise_index(i)
return self._ddict[i]
def __setitem__(self, i, v):
i = self._actualise_index(i)
self._ddict[i] = v
def __delitem__(self, key: Union[int, slice]):
i = self._actualise_index(key)
try:
del self._ddict[i]
except KeyError:
pass
...
```

As we’ve successfully overridden the `MutableSequence`

interface,
it will conveniently deal with second-order features of a list
such as iteration (`for x in my_dlist`

)
and appending (`my_dlist.append(x)`

).

## Bounding infinity

Currently, iterating over a `defaultlist`

will go on forever.
Iteration of an object is determined by it’s `__iter__()`

method,
and inheriting `MutableSequence`

comes with such a method
which will `__getitem__(i)`

from 0 until an `IndexError`

is raised.
As `defaultlist`

is an infinite field of elements,
no such error is ever raised.

I decided its iteration capabilities should be bounded to the length property (defined earlier).

```
def __iter__(self):
for k in range(len(self)):
yield self._ddict[k]
```

This provides an easy way for user to specify iteration bounds
using slice notation.
Say you were implementing an algorithm that divvied-up a sequence into 100 chunks
and counted in each chunk using `counts = defaultlist(int)`

(i.e. default the count to 0),
then you can specify the first 100 elements of `counts`

when it’s time for a for loop.

```
for count in counts[:100]:
...
```

…that is, if slicing is supported.

We can actually detect the `stop:start:step`

notation—which
is infact a builtin
`slice`

object—in
our `__getitem__()`

method
by inspecting the passed `key`

with an instance check (i.e. `isinstance()`

).

A builtin `list`

handles slices as a range bounded by the list’s length—for
the most part,
check out my
list reference implementation
for a complete specification of how `list`

handles slicing.
We however don’t need to worry about going out-of-bounds
as the `defaultlist`

is essentially an infinite field of elements.
All we need to do then is handle `None`

values
before making a 1-to-1 translation from `slice`

to the `range`

builtin.

```
def __getitem__(self, key):
if isinstance(key, int):
i = self._actualise_index(key)
return self._ddict[i]
elif isinstance(key, slice):
start = key.start or 0
stop = key.start or 0
step = key.step or 1
if key.start is None:
start = 0 if step > 0 else len(self)
else:
start = self._actualise_index(key.start)
if key.stop is None:
stop = len(self) if step > 0 else 0
else:
stop = self._actualise_index(key.stop)
srange = range(start, stop, step)
dlist = defaultlist(self._ddict.default_factory)
dlist.extend(self._ddict[i] for i in srange)
return dlist
```

At this point, we’ve got ourselves a pretty functional `defaultlist`

!

## Filling in the gaps

### Complete slicing support

We can also support slices for setting and deleting elements.
Like with the `insert`

method,
it’s simply a matter of reindexing the internal `defaultdict`

to accommodate for the addition and/or removal of existing elements
… but as you can see,
it’s a bit of a headache to keep track of things.

```
def _determine_srange(self, slice_):
step = slice_.step or 1
if slice_.start is None:
start = 0 if step > 0 else len(self)
else:
start = self._actualise_index(slice_.start)
if slice_.stop is None:
stop = len(self) if step > 0 else 0
else:
stop = self._actualise_index(slice_.stop)
return range(start, stop, step)
def __getitem__(self, key):
if isinstance(key, int):
i = self._actualise_index(key)
return self._ddict[i]
elif isinstance(key, slice):
srange = self._determine_srange(key)
dlist = defaultlist(self._ddict.default_factory)
dlist.extend(self._ddict[i] for i in srange)
return dlist
def __setitem__(self, key, value):
if isinstance(key, int):
i = self._actualise_index(key)
self._ddict[i] = value
elif isinstance(key, slice):
values = list(value) if isinstance(value, Iterable) else [value]
nvalues = len(values)
srange = self._determine_srange(key)
if srange:
srange_min = min(srange[0], srange[-1])
srange_max = max(srange[0], srange[-1])
else:
srange_min = srange_max = srange.start
for i in srange:
try:
del self._ddict[i]
except KeyError:
pass
diff = nvalues - len(srange)
if diff:
keys = list(self._ddict.keys())
threshold = srange_min + nvalues
reindexed_subdict = {}
mid_keys = [k for k in keys if srange_min <= k < srange_max]
for i, k in enumerate(sorted(mid_keys), start=threshold):
reindexed_subdict[i] = self._ddict[k]
larger_keys = [k for k in self._ddict.keys() if k > srange_max]
for k in larger_keys:
i = k + diff
reindexed_subdict[i] = self._ddict[k]
self._ddict.update(reindexed_subdict)
for i, v in enumerate(values, start=srange_min):
self[i] = v
def __delitem__(self, key):
if isinstance(key, int):
i = self._actualise_index(key)
try:
del self._ddict[i]
except KeyError:
pass
larger_keys = [k for k in self._ddict.keys() if k > i]
reindexed_subdict = {k - 1: self._ddict[k] for k in larger_keys}
for k in larger_keys:
del self._ddict[k]
self._ddict.update(reindexed_subdict)
elif isinstance(key, slice):
srange = self._determine_srange(key)
if srange:
for i in srange:
try:
del self._ddict[i]
except KeyError:
pass
srange_min = min(srange[0], srange[-1])
srange_max = max(srange[0], srange[-1])
reindexed_subdict = {}
mid_keys = [k for k in self._ddict.keys() if srange_min <= k < srange_max]
for i, k in enumerate(values, start=srange_min):
reindexed_subdict[i] = self._ddict[k]
larger_keys = [k for k in self._ddict.keys() if k > srange_max]
for k in larger_keys:
i = k - len(srange)
reindexed_subdict[i] = self._ddict[k]
for k in chain(mid_keys, larger_keys):
del self._ddict[k]
self._ddict.update(reindexed_subdict)
```

### Equality checks

We might also want to emulate how `list`

can be equality checked (i.e. `a == b`

notation)
with other `list`

objects.
Equality check behaviour is specified
in the `__eq__(self, other)`

dunder method
of the first instance (i.e. `a`

in `a == b`

).
Rather than caring if a `defaultlist`

is being checked with another `defaultlist`

however,
I found broadly check against all
`Sequence`

-like types
to make it much more versatile.

```
def __eq__(self, other):
if isinstance(other, Sequence):
if len(self) != len(other):
return False
else:
return all(a == b for a, b in zip(self, other))
else:
return False
```

Note that `zip(self, other)`

is using the `__iter__()`

methods
of both the `defaultlist`

instance and `other`

.
This means you can check if an iteration of a `defaultlist`

would actually match up with your expectations
using the nice brackets notation that Python uses for its `list`

.

```
>>> my_dlist = defaultlist(int)
>>> my_dlist[2] = 2
>>> my_dlist == [0, 0, 2]
True
```

### Finding first instances

`MutableSequence`

also adds the
`index()`

method to our `defaultlist()`

automatically,
but like with `__iter__()`

it expects an `IndexError`

from our get method—as
this never comes to fruition,
`index()`

can go on forever.

All this means is we need to override `index()`

to bound the iteration to it’s pseudo-length.
Like with our `__eq__()`

method above,
note that `enumerate(self)`

is calling `__iter__()`

.

```
def index(self, x):
for i, v in enumerate(self):
if v == x:
return i
else:
raise ValueError(f"'{x}' is not in defaultlist")
```

This also fixes the automatic `remove()`

method too,
as that relies upon `index()`

to find the `i`

to then delete.

## Fin

A hopefully production-ready `defaultlist`

is available in my coinflip library,
available in the `coinflip.collections.defaultlist`

namespace.
Its source code is available on
GitHub
(and don’t forget tests).

Huge thanks to c0fec0de for their `defaultlist`

implementation,
which really helped me think deeply about what I wanted the API to be for my own take.
Also I suppose a shout out to Guido, who I *think* implemented the much beloved `defaultdict`

. Maybe it was Raymond Hettinger…?
I couldn’t work this one out—git blame and all—so if anyone knows this then please let me know!