Misapplied Math

Trading, Data Science, CS

There's a major crosswalk bisecting one of the busiest streets in Chicago; I stop there on a near daily basis as said crosswalk also leads to the best running trail downtown. On sunny weekends, crowds grow fifty pedestrians large before the light turns red, alleviating the congestion. It's a great place to people watch, especially if you're interested in crowd dynamics or game theory. The size of the crowd aside, a few things make this particular crosswalk special:

  • Cross traffic is sparse, and in the absence of a car, the light will never turn without a pedestrian hitting the walk button.
  • The crowds are large enough to form human traffic jams at times. When this happens, there's ambiguity as to who arrived first, and who should hit the button.
  • It's a tourist heavy spot, so most pedestrians aren't aware of how long the wait is, or the aforementioned rules of the game.

Having studied it casually for a year, I feel justified in my habit of always hitting the crosswalk button.

The Volunteer's Dilemma refers to a broad set of game theoretic problems applicable to everything from wireless network design to employee incentive packages. When a stranger murdered Kitty Genovese outside of her Brooklyn apartment while thirty seven bystanders watched without so much as calling the police, the Volunteer's Dilemma went mainstream. Economists demonstrated an unintuitive result: the size of the crowd worked to her detriment, and not to her favor.

Assume that individuals witnessing a crime will act independently and call the police with probability pp. For Kitty, the probability of no one calling, (1p)37(1 - p)^{37} (a minuscule quantity even for depressingly small values of pp) isn't congruous with the outcome. Instead, assume that an individual will always call the cops if they know with certainty that no one else will. However, no one wants to deal with paperwork at 3am. Given the option, witnesses will bank on another member of the group rising to the occasion. It's a simple but intuitive model of how self motivated agents act in a group.

To make this notion concrete, consider the following payout scheme. Each witness has the option of calling the police. If no one calls, all players receive a payout of 00. If at least one player calls, all players who didn't call receive a payout of 11, and those who did receive a payout of 12\frac{1}{2}. It's easy to see that there are no pure-strategy Nash Equilibrium (NE) for this game. However, we can find a mixed-strategy NE quite easily.

Players will randomize iif the payout from each of kk actions is equal. We denote the probability of an individual calling in the mixed-strategy NE as pp^*. Thus:

U(call)=U(dont call)=1P(at least one other person calls)=12P(at least one other person calls)=1(1p)n1p=1(12)1n1\begin{aligned} &U(\text{call}) = U(\text{don't call}) = 1 * P(\text{at least one other person calls}) = \frac{1}{2} \\ &P(\text{at least one other person calls}) = 1 - (1-p)^{n - 1} \\ &p^* = 1 - \left(\frac{1}{2}\right)^{\frac{1}{n - 1}} \end{aligned}

It follows immediately that the probability of no one calling is (1p)n=(12)nn1(1-p^*)^n = \left(\frac{1}{2}\right)^\frac{n}{n - 1}. That function is monotonically increasing in nn, asymptotically approaching a max of 12\frac{1}{2}. As nn increases, the marginal importance of any one player is diminished. The above holds as NE requires that it's impossible for a player to alter their strategy in a manner that ensures a uniformly superior outcome. Given that all players know the equilibrium probability, they're happy to use a weighted coin as the basis of their decision – there's no way to outperform. When n=2n = 2, our equilibrium probability is 12\frac{1}{2}. In the case of Kitty, where n=37n = 37, p.02p \approx .02, and the probability of no one calling is .49\approx .49.

As is commonplace in game theory, the equilibrium for a repeated play game looks quite different. In fact, it's relatively easy to show that a strategy in which all nn players take turns volunteering is both utility maximizing and a NE. So there you have it, the age old adage ``play nice with others'' has a mathematical justification, as long as your play group stays the same.


Testing Custom Java Collections

Over the years I've built up a suite of unit tests for various types of java collections. When I started doing so, I didn't know of any good off-the-shelf means of generic collections testing. The JDK itself has some unit/regression tests, but not as many as you would think, and that aside they rely on a custom build system/testing framework.

I've used google-guava in a few projects but recently discovered their not so well advertised test support library: guava-testlib. I tried it out out on my most recent project (a custom random access list with O(1)\O(1) deque operations) and I'm very impressed with the test coverage and flexibility. The documentation is sparse but once you get it working you'll have a test suite with hundreds to thousands of unit tests generated for your list, map, or set. Guava-testlib also generates tests for the non-standard collections in google-guava.

The library relies on JUnit, and most of what I found online detailing its use pertains to JUnit 3. There was a substantial API break between JUnit 3 and JUnit 4, as JUnit 3 relies on naming conventions and inheritance whereas JUnit 4 uses less intrusive annotations. I struggled for a bit trying to figure out how to get things working without resorting to the old JUnit 3 APIs. I arrived at something that works, but it's still a little dirty as it indirectly relies on a magic static method named suite() from the JUnit 3 days.

ArrayListTest.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
package com.indigo.collections;

import com.google.common.collect.testing.*;
import com.google.common.collect.testing.features.CollectionFeature;
import com.google.common.collect.testing.features.CollectionSize;
import com.google.common.collect.testing.features.ListFeature;
import junit.framework.TestSuite;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Your test class must be annotated with {@link RunWith} to specify that it's a
 * test suite and not a single test.
 */
@RunWith(Suite.class)
/**
 * We need to use static inner classes as JUnit only allows for empty "holder"
 * suite classes.
 */
@Suite.SuiteClasses({
        ArrayListTest.GuavaTests.class,
        ArrayListTest.AdditionalTests.class,
})
public class ArrayListTest {

    /**
     * Add your additional test cases here.
     */
    public static class AdditionalTests {

        @Test
        public void testFoo() {

        }

    }

    /**
     * This class will generate the guava test suite. It needs a public static
     * magic method called {@link GuavaTests#suite()} to do so.
     */
    public static class GuavaTests {

        /**
         * The method responsible for returning the {@link TestSuite} generated
         * by one of the {@link FeatureSpecificTestSuiteBuilder} classes.
         * This method must be public static method with no arguments named
         * "suite()".
         *
         * @return An instance of {@link TestSuite} for collection testing.
         */
        public static TestSuite suite() {
            /**
             * guava-testlib has a host of test suite builders available,
             * all descending from {@link FeatureSpecificTestSuiteBuilder}.
             * The
             * {@link FeatureSpecificTestSuiteBuilder#usingGenerator(Object)}
             * is the main entry point in that collections are tested by
             * implementing {@link TestCollectionGenerator} and providing an
             * instance of the interface to the test suite builder via the
             * #usingGenerator(Object) method. Most of suite builder classes
             * provide a convenience method such as
             * {@link ListTestSuiteBuilder.using()} that streamline the process
             * of creating a builder.
             *
             */
            return ListTestSuiteBuilder
                    // The create method is called with an array of elements
                    // that should populate the collection.
                    .using(new TestStringListGenerator() {

                        @Override
                        protected List<String> create(String[] elements) {
                            return new ArrayList<>(Arrays.asList(elements));
                        }


                    })
                    // You can optionally give a name to your test suite. This
                    // name is used by JUnit and other tools during report
                    // generation.
                    .named("ArrayList tests")
                    // Guava has a host of "features" in the
                    // com.google.common.collect.testing.features package. Use
                    // them to specify how the collection should behave, and
                    // what operations are supported.
                    .withFeatures(
                            ListFeature.GENERAL_PURPOSE,
                            ListFeature.SUPPORTS_ADD_WITH_INDEX,
                            ListFeature.SUPPORTS_REMOVE_WITH_INDEX,
                            ListFeature.SUPPORTS_SET,
                            CollectionFeature.SUPPORTS_ADD,
                            CollectionFeature.SUPPORTS_REMOVE,
                            CollectionFeature.SUPPORTS_ITERATOR_REMOVE,
                            CollectionFeature.ALLOWS_NULL_VALUES,
                            CollectionFeature.GENERAL_PURPOSE,
                            CollectionSize.ANY
                    ).createTestSuite();
        }


    }


}


Day Twelve: The Bellman-Ford Algorithm

TL/DR

If you need to find the shortest path for a graph with negative weights, the Bellman-Ford algorithm can do so. Dijkstra's algorithm has better time complexity, but Bellman-Ford can handle graphs with negative weights, and even be used to detect negative cycles.

Detecting Negative Cycles

Today's post is short and sweet – I might add code later but the algorithm is quite simple. For graphs with negative weights, Dijkstra's algorithm fails. Negative weights introduce the potential for negative cycles, in which case a shortest path might not exist as a shorter path could always be constructed by transversing the cycle once again. That breaks Dijkstra. The Bellman-Ford algorithm can detect this when it happens. Unfortunately, the generality comes at the cost of time complexity. Dijkstra's algorithm runs in O(E+Vlog(E))\O(\|E\| + \|V\|\log(\|E\|)). Bellman-Ford takes O(EV))\O(\|E\| \cdot \|V\|)), where VV and EE are the sizes of the vertex and edge sets, respectively. There are modifications to Bellman-Ford with slightly better time complexity, but Bellman-Ford is quite simple to implement.

Finding Negative Cycles

Once you know that there is a negative cycle, you still need to find it. There are fast algorithms to do so, but depth-first search (DFS) is arguably the simplest route. You can run DFS on a subgraph that's produced as a bi product running Bellman-Ford (the shortest path edge set that's generated along the way) for slightly better time complexity. There's a break even point between just running depth-first search on the full graph or starting off with Bellman-Ford to check if there's a negative cycle in the first place. The run-time complexity of DFS is exponential for graphs with a branching factor greater than one so it's not cheap. For really small graphs you're almost certainly better off with a vectorized brute force search.

Why do we Care?

Finding a path via DFS, or the shortest path via Bellman-Ford/Dijkstra's algorithm/A-star has lots of applications in optimization and logistics. Here's one nice example. Graphs with negative cycles are a special case of the shortest path problem, and once upon a time, currencies were inefficient enough that triangular arbitrage persisted for some time. These days, it's a purely high frequency game.