Saturday, November 28, 2015

LRU Cache in Java - Part 2

Introduction

In the previous post I described the notion of an LRU cache and showed how it can be easily implemented using the LinkedHashMap data structure. But sometimes there is a requirement to implement an LRU cache in Java without such a convenient building block as LinkedHashMap. A typical example is an interview question. So let's take a look at how this can be done.

Implementation

This time I'm going to use a plain old HashMap to guarantee constant-time access and modification, and build a doubly linked list out of the cache elements manually to track the least recently used element efficiently.

To begin with, let's define a class Node which represents an element of a doubly linked list.
class Node<K, V> {
    K key;
    V value;

    Node<K, V> next;
    Node<K, V> prev;

    Node(final K key, final V value) {
        this.key = key;
        this.value = value;
    }
}
Since this class is an implementation detail of our LRU cache abstraction, let's make it a private nested class. Should it be a static nested class or a non-static one (a.k.a. an inner class)? That's a good question. Let's take a look at the pros and cons of each approach.

Node as Inner Class

The following code snippet shows how class Node can be implemented as an inner class of the enclosing LRUCache class:
public class LRUCache<K, V> {
    // ...
    private Node head;
    private Node tail;
    // ...
    private class Node {
        K key;
        V value;

        Node next;
        Node prev;

        Node(final K key, final V value) {
            this.key = key;
            this.value = value;
        }
    }
    // ...
}
Class Node is non-static, hence the generic type parameters of class LRUCache (K and V) are accessible from it. So there's no need to make class Node generic itself. This makes the code a bit more compact, but it introduces an additional memory cost per Node instance: each instance holds a hidden reference to the enclosing LRUCache instance (usually 4 bytes on 32-bit JVMs and 8 bytes on 64-bit JVMs without compressed oops). For further information about inner classes and enclosing instances I suggest reading this section of the JLS.

Obviously, this per-instance memory overhead is undesirable when designing a high-performance data structure like a cache. So let's take a look at the other approach.
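The hidden reference can actually be observed with reflection: javac compiles an inner class with a synthetic field (conventionally named this$0) pointing to the enclosing instance, while a static nested class gets no such field. Here is a small sketch; the class names are just for illustration:

```java
import java.lang.reflect.Field;

public class NestingDemo {
    class Inner {}          // non-static: carries a hidden reference to NestingDemo
    static class Nested {}  // static: no hidden reference

    public static void main(String[] args) {
        // javac adds a synthetic field (typically named "this$0") to Inner
        for (Field field : Inner.class.getDeclaredFields()) {
            System.out.println(field.getType().getSimpleName() + " " + field.getName());
        }
        // the static nested class declares no fields at all
        System.out.println(Nested.class.getDeclaredFields().length);  // prints 0
    }
}
```

With javac/HotSpot the loop prints the synthetic NestingDemo-typed field, making the extra per-instance reference explicit.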

Node as Static Nested Class

The following code snippet shows how class Node can be implemented as a static nested class of the LRUCache class:
public class LRUCache<K, V> {
    // ...
    private Node<K, V> head;
    private Node<K, V> tail;
    // ...
    private static class Node<K1, V1> {
        K1 key;
        V1 value;

        Node<K1, V1> next;
        Node<K1, V1> prev;

        Node(final K1 key, final V1 value) {
            this.key = key;
            this.value = value;
        }
    }
    // ...
}
Note that class Node is made generic with its own set of type parameters (K1 and V1) because the type parameters of class LRUCache (K and V) are not accessible from a static context. Strictly speaking, the type parameters of class Node could also be named K and V: since the outer parameters are not in scope inside a static nested class anyway, there is no ambiguity for the compiler. I decided to name them differently to make it obvious to the reader that they are distinct parameters.

This approach is a bit more verbose but much more memory-efficient. So let's stick to it in our LRUCache implementation.

Operations on Doubly Linked List

In order to implement LRUCache we need to be able to perform the following operations on the doubly linked list:
  • insert new node into the head position of the list (head of the list represents the most recently used element in the cache);
  • remove a node from any position in the list (when a node is accessed it should be removed from its current position in the list and placed in the head position as the most recently used one).
Here's how the aforementioned operations can be implemented. The implementations are pretty self-explanatory, so I won't describe them in detail:
/**
 * Inserts a node to the beginning of the list (in head position).
 * @param node new node to be inserted
 */
private void insertNode(final Node<K, V> node) {
    node.next = head;
    node.prev = null;

    if (head != null) {
        head.prev = node;
    }

    head = node;

    if (tail == null) {
        tail = head;
    }
}

/**
 * Removes a node from the list.
 * @param node a node to be removed
 */
private void removeNode(final Node<K, V> node) {
    if (node.prev != null) {
        node.prev.next = node.next;
    } else {
        head = node.next;
    }

    if (node.next != null) {
        node.next.prev = node.prev;
    } else {
        tail = node.prev;
    }
}
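Since every cache operation is built on these two helpers, it is worth checking that they keep head and tail consistent in the edge cases (removing the head, removing the tail, emptying the list). Here is a self-contained sketch using a simplified, String-keyed Node:

```java
import java.util.ArrayList;
import java.util.List;

public class ListDemo {
    static class Node {
        final String key;
        Node next;
        Node prev;

        Node(final String key) {
            this.key = key;
        }
    }

    private Node head;
    private Node tail;

    // same logic as insertNode above, minus the generics
    void insertNode(final Node node) {
        node.next = head;
        node.prev = null;
        if (head != null) {
            head.prev = node;
        }
        head = node;
        if (tail == null) {
            tail = head;
        }
    }

    // same logic as removeNode above, minus the generics
    void removeNode(final Node node) {
        if (node.prev != null) {
            node.prev.next = node.next;
        } else {
            head = node.next;
        }
        if (node.next != null) {
            node.next.prev = node.prev;
        } else {
            tail = node.prev;
        }
    }

    List<String> keysFromHead() {
        final List<String> keys = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            keys.add(n.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        final ListDemo list = new ListDemo();
        final Node a = new Node("a");
        final Node b = new Node("b");
        final Node c = new Node("c");
        list.insertNode(a);  // list: a
        list.insertNode(b);  // list: b, a
        list.insertNode(c);  // list: c, b, a
        list.removeNode(a);  // tail removal: c, b
        list.removeNode(c);  // head removal: b
        System.out.println(list.keysFromHead());  // prints [b]
        System.out.println(list.tail.key);        // prints b
    }
}
```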

Operations on Cache

A cache abstraction needs two operations in the simplest case:
  • get value from the cache corresponding to a certain key;
  • insert/update value in the cache corresponding to a certain key.
Here they are:
/**
 * Returns a value corresponding to the provided key.
 * @param key key to be searched
 * @return value corresponding to the key, or null if the key is missing
 */
public V get(final K key) {
    if (map.containsKey(key)) {
        final Node<K, V> node = map.get(key);

        // remove node from its current position
        // and insert it to the beginning of the list
        removeNode(node);
        insertNode(node);

        return node.value;
    }

    return null;
}

/**
 * Inserts key-value pair in the cache. If cache size exceeds capacity
 * then the least recently used element is evicted from cache.
 * @param key key to be inserted
 * @param value value corresponding to the key
 */
public void set(final K key, final V value) {
    if (map.containsKey(key)) {
        // replace old value with the new one
        final Node<K, V> node = map.get(key);
        node.value = value;

        // remove node from its current position
        // and insert it to the beginning of the list
        removeNode(node);
        insertNode(node);
    } else {
        final Node<K, V> node = new Node<>(key, value);

        // remove least recently used node
        if (map.size() >= capacity) {
            map.remove(tail.key);
            removeNode(tail);
        }

        // insert new node in the beginning of the list
        insertNode(node);
        map.put(key, node);
    }
}

All Together

Now it is time to put all the pieces together. Let's add the missing instance fields (capacity and map), create a constructor, and assemble the complete LRUCache implementation from start to end:
import java.util.HashMap;
import java.util.Map;

/**
 * This class is an implementation of LRU (least recently used) cache.
 * It is NOT thread-safe and uses HashMap under the hood.
 * @param <K> key type
 * @param <V> value type
 */
public class LRUCache<K, V> {
    private final int capacity;
    private final Map<K, Node<K, V>> map = new HashMap<>();

    private Node<K, V> head;
    private Node<K, V> tail;

    /**
     * Creates new LRUCache instance of the specified capacity.
     * @param capacity cache size
     */
    public LRUCache(final int capacity) {
        this.capacity = capacity;
    }

    /**
     * Returns a value corresponding to the provided key.
     * @param key key to be searched
     * @return value corresponding to the key, or null if the key is missing
     */
    public V get(final K key) {
        if (map.containsKey(key)) {
            final Node<K, V> node = map.get(key);

            // remove node from its current position
            // and insert it to the beginning of the list
            removeNode(node);
            insertNode(node);

            return node.value;
        }

        return null;
    }

    /**
     * Inserts key-value pair in the cache. If cache size exceeds capacity
     * then the least recently used element is evicted from cache.
     * @param key key to be inserted
     * @param value value corresponding to the key
     */
    public void set(final K key, final V value) {
        if (map.containsKey(key)) {
            // replace old value with the new one
            final Node<K, V> node = map.get(key);
            node.value = value;

            // remove node from its current position
            // and insert it to the beginning of the list
            removeNode(node);
            insertNode(node);
        } else {
            final Node<K, V> node = new Node<>(key, value);

            // remove least recently used node
            if (map.size() >= capacity) {
                map.remove(tail.key);
                removeNode(tail);
            }

            // insert new node in the beginning of the list
            insertNode(node);
            map.put(key, node);
        }
    }

    /**
     * Represents a node in a doubly linked list which is used to track
     * least recently used element in the cache.
     * @param <K1> key type
     * @param <V1> value type
     */
    private static class Node<K1, V1> {
        K1 key;
        V1 value;

        Node<K1, V1> next;
        Node<K1, V1> prev;

        Node(final K1 key, final V1 value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Inserts a node to the beginning of the list (in head position).
     * @param node new node to be inserted
     */
    private void insertNode(final Node<K, V> node) {
        node.next = head;
        node.prev = null;

        if (head != null) {
            head.prev = node;
        }

        head = node;

        if (tail == null) {
            tail = head;
        }
    }

    /**
     * Removes a node from the list.
     * @param node a node to be removed
     */
    private void removeNode(final Node<K, V> node) {
        if (node.prev != null) {
            node.prev.next = node.next;
        } else {
            head = node.next;
        }

        if (node.next != null) {
            node.next.prev = node.prev;
        } else {
            tail = node.prev;
        }
    }
}
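As a sanity check, here is a condensed copy of the class (the get method uses a single map lookup instead of containsKey followed by get, which is equivalent here because stored values are never null) together with a scenario that exercises the eviction order:

```java
import java.util.HashMap;
import java.util.Map;

public class LRUCacheDemo {
    static class LRUCache<K, V> {
        private final int capacity;
        private final Map<K, Node<K, V>> map = new HashMap<>();
        private Node<K, V> head;
        private Node<K, V> tail;

        LRUCache(final int capacity) {
            this.capacity = capacity;
        }

        V get(final K key) {
            final Node<K, V> node = map.get(key);
            if (node == null) {
                return null;
            }
            removeNode(node);
            insertNode(node);
            return node.value;
        }

        void set(final K key, final V value) {
            final Node<K, V> existing = map.get(key);
            if (existing != null) {
                existing.value = value;
                removeNode(existing);
                insertNode(existing);
            } else {
                if (map.size() >= capacity) {
                    map.remove(tail.key);  // evict the least recently used entry
                    removeNode(tail);
                }
                final Node<K, V> node = new Node<>(key, value);
                insertNode(node);
                map.put(key, node);
            }
        }

        private void insertNode(final Node<K, V> node) {
            node.next = head;
            node.prev = null;
            if (head != null) {
                head.prev = node;
            }
            head = node;
            if (tail == null) {
                tail = head;
            }
        }

        private void removeNode(final Node<K, V> node) {
            if (node.prev != null) {
                node.prev.next = node.next;
            } else {
                head = node.next;
            }
            if (node.next != null) {
                node.next.prev = node.prev;
            } else {
                tail = node.prev;
            }
        }

        private static class Node<K1, V1> {
            final K1 key;
            V1 value;
            Node<K1, V1> next;
            Node<K1, V1> prev;

            Node(final K1 key, final V1 value) {
                this.key = key;
                this.value = value;
            }
        }
    }

    public static void main(String[] args) {
        final LRUCache<String, Integer> cache = new LRUCache<>(2);
        cache.set("a", 1);
        cache.set("b", 2);
        cache.get("a");     // "a" becomes the most recently used entry
        cache.set("c", 3);  // capacity exceeded: evicts "b", not "a"
        System.out.println(cache.get("b"));  // prints null
        System.out.println(cache.get("a"));  // prints 1
        System.out.println(cache.get("c"));  // prints 3
    }
}
```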
Thank you for reading!
I hope you enjoyed the article :)

Thursday, November 19, 2015

LRU Cache in Java - Part 1

Introduction

Caching is an optimization technique which can dramatically boost application performance when used appropriately. There are a lot of different approaches to caching, ranging from simple homegrown classes to full-blown caching solutions. In this post I will demonstrate how to implement an LRU (least recently used) cache in Java in a few lines of code.

What is LRU Cache?

An LRU cache is a data structure which stores key-value pairs and has a fixed capacity. When an attempt is made to add an element to the cache, the current cache size (the number of key-value pairs already in the cache) is compared to its capacity (the maximum allowed number of key-value pairs). Further actions depend on the result of the comparison:
  • If size is less than capacity then:
    1. New key-value pair is added to the cache.
    2. New key-value pair is marked as the most recently used one.
  • Otherwise:
    1. The least recently used key-value pair is evicted from the cache.
    2. New key-value pair is added to the cache.
    3. New key-value pair is marked as the most recently used one.
In other words, an LRU cache is a key-value mapping data structure which grows to a certain size and then evicts the least recently used key-value pairs to avoid exceeding the given capacity.

Implementation

In order to be efficient, an LRU cache implementation should carry out the following operations in constant (or at least amortized constant) time:
  • Find a value with the given key.
  • Insert (replace) a value with the given key.
  • Figure out the least recently used key-value pair.
The first and the second operations can be easily implemented with a hash map. The third can be achieved with a linked list: after an item is looked up in the hash map, it can be moved to the head of the linked list (the item's position in the hash map remains unchanged). Hence the tail of the list always contains the least recently used item.

As you can see, we need a kind of hybrid data structure which stores its elements in a hash map, but at the same time maintains certain ordering of the elements in a linked list. Do we need to implement it from scratch? The answer is no. Java Collections Framework provides such a data structure: meet LinkedHashMap.

In accordance with documentation, LinkedHashMap is a hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).

Unfortunately, insertion-order doesn't work for an LRU cache implementation. Is there a way to tell LinkedHashMap to use access-order instead? Luckily, there is a constructor which allows for doing this:
public LinkedHashMap(int initialCapacity, float loadFactor,
                     boolean accessOrder) {
    // ...
}
Passing true as the accessOrder parameter tells the LinkedHashMap instance to use access-order instead of insertion-order.
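A quick demonstration of the difference: with access-order enabled, every successful get moves the touched entry to the end of the iteration order, so the least recently used entry always comes first when iterating.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        final Map<String, Integer> map =
                new LinkedHashMap<>(16, 0.75F, true);  // access-order enabled
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        map.get("a");  // touching "a" moves it to the end of the iteration order

        // least recently used first, most recently used last
        System.out.println(map.keySet());  // prints [b, c, a]
    }
}
```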

So far so good; now we need a way to enforce the cache eviction policy. One approach is to find all methods which insert elements into a LinkedHashMap (put, putAll, putIfAbsent, etc.) and override them appropriately. However, this approach is fragile and cumbersome. A better one exists: the protected method removeEldestEntry is invoked after a new element is inserted into a LinkedHashMap to determine whether the eldest entry should be removed. It is the best place to enforce the cache eviction policy.

Now it is time to gather all the pieces of the puzzle together. Here is the complete implementation of LRU cache in Java:
import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LRUCache(final int capacity) {
        super(16, 0.75F, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
        // this hook runs after the new entry has been inserted, so evict
        // only when the size has actually exceeded the capacity
        return size() > capacity;
    }
}
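A short, self-contained scenario shows the eviction in action (note that removeEldestEntry runs after the new entry has been inserted, which is why the condition uses a strict inequality: otherwise the cache would never hold more than capacity minus one entries):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCacheTest {
    static class LRUCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LRUCache(final int capacity) {
            super(16, 0.75F, true);
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
            // invoked after insertion, hence the strict inequality
            return size() > capacity;
        }
    }

    public static void main(String[] args) {
        final LRUCache<String, Integer> cache = new LRUCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes the most recently used entry
        cache.put("c", 3);  // evicts "b", the least recently used entry
        System.out.println(cache.keySet());  // prints [a, c]
    }
}
```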
Thanks for reading!

Thursday, November 5, 2015

Secret Way of Returning Array from Method in Java

Introduction

This post is like a little travel in time. It is quite short and doesn't bring a lot of practical value; rather, it describes a small historical quirk that was better known in the early days of Java. So without further ado, let me show you an ancient, obsolete, but still working way of declaring a method that returns an array.

Traditional Way

Just for completeness, here is a small example of the traditional syntax used to define a method which returns an array in Java:
int[] traditionalMethod() {
    return new int[] {1, 2, 3};
}

Old School Way

And here is the old-school way, which was better known in the early days of Java but nowadays can probably be found only in the JLS:
int oldSchoolMethod() [] {
    return new int[] {1, 2, 3};
}
Notice how the square brackets are placed after the parentheses. It is worth noting that this way of declaring a method which returns an array is obsolete and discouraged (the JLS keeps it only for compatibility with very old code), but I think it is still a funny trick worth knowing :)
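The two styles can even be mixed: bracket pairs written after the parameter list are appended to the type written before the method name. So the following (admittedly horrible) declaration returns int[][]:

```java
import java.util.Arrays;

public class OldSchoolDemo {
    // the result type is int[] plus one extra dimension, i.e. int[][]
    static int[] mixedMethod() [] {
        return new int[][] { {1, 2}, {3, 4} };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(mixedMethod()));  // prints [[1, 2], [3, 4]]
    }
}
```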