Failing The Turing Test: June 2008

Saturday, June 21, 2008

Where Jeff Atwood Is Wrong On Stackoverflow.Com

I have been reading Jeff Atwood's Coding Horror Blog for many years and have high regards for it, but I am starting to get a little disillusioned by some of his later writings and also several comments on the Stackoverflow.Com Podcast. Anyway, considering all that interesting and entertaining stuff he has been blogging over time I suppose it's also OK if I can't identify with his opinion once in a while.

There is this blog entry on the Model/View/Controller design pattern, and Jeff and Joel Spolsky also discussed that during their podcast. I consider the "Model=HTML, View=CSS and Controller=Browser"-explanation misleading (I would not go so far to call it "completely wrong", it's just not a great analogy). HTML and CSS are declarative languages, not Models/Views in an OOP sense. The browser application's code and data structures are the building blocks for Model, View and Controller. So the browser IS Model, View and Controller.

By definition, the Model represents application state, the View defines how Model data is rendered into the UI and the Controller takes user input and changes the Model accordingly, resp. invokes application functionality.

One could favorably argue that HTML is a source of data for the browser's Model, and that CSS also affects the way the View does the rendering. But Model and View themselves are something else, they are browser-internal software components that represent web content (Model) resp. are responsible for visualizing that web content on a UI platform (View). Remember the View serves as an Observer on Model data - that just doesn't fit with CSS and HTML. And that's why I think the comparison is a little bit far-fetched.

Take a CAD application as another example of a Model that contains some kind of graphical artefacts simply because that's the nature of the application (just like web content in case of a browser). The CAD Model might consist of shapes and polygon definitions, and the CAD View then renders that into a two-dimensional UI.

Here is a diagram taken from the J2EE Architecture blueprints that pretty much sums up things (yeah I know this article refers to N-tier architecture, but the diagram is valid for all MVC-secnarios).

Jeff Atwood noted that he could not find any MVC documentation that really pleased him and that he could do better (hey, why not take the GoF book - MVC is described in the introduction chapter). Sorry to say that in this case he didn't comply with his own high standards.

Then there was a remark about SqlServer as being pretty much standalone self-tuning. As much as I appreciate SqlServer, it's really not self-tuning. Don't get me started on all that query optimizer issues I have been through (admitted this has gotten better in the latest versions). It's also a must to define a customized maintenance plan on SqlServer. But maintenance plans are kind of cumbersome or even impossible on desktop/express engines, and on those engines self-tuning would be essential (as there is no database admin to take care of such things). If I had to name a self-tuning database, it would be Sybase SQLAnywhere Personal Server - no big surprise as it has been designed for smaller-than-enterprise usage.

Finally Jeff stated that he sees no difference between computer science and software engineering. I'd like to quote Steve McConnell at this point (and by the way, coincidently the term "Coding Horror" descends from Steve McConnell's book "Code Complete"):

Scientists learn what is true, how to test hypotheses, and how to extend knowledge in their field. Engineers learn what is true, what is useful, and how to apply well-understood knowledge to solve practical problems.

Scientists must keep up to date with the latest research. Engineers must be familiar with knowledge that has already proven to be reliable and effective. If you are doing science, you can afford to be narrow and specialized. If you are doing engineering, you need a broad understanding of all the factors that affect the product you are designing.

Scientists don't have to be regulated because they are chiefly accountable to other scientists. Engineers do have to be regulated because they are chiefly accountable to the public.

An undergraduate science education prepares students to continue their studies. An undergraduate engineering education prepares students to enter the workforce immediately after completing their studies.

But hey, just as Jeff remarked repeatedly, it's also fine to disagree every now and then.

Thursday, June 19, 2008

Battleship Game Algorithm Explained (Part 3)

This is part 3 of the mini-series explaining the inner workings of the Cubido C# Pirates Winner Algorithm. We have talked about how to find the cell best suited for a trial shot. Now let's see what we can do to improve the accuracy of sinking shots, that is the follow-up shots once a trial shot hit on a ship.

So what needs to be decided at this point is in which direction to continue shooting after the first hit. The stakes are high - after all it's possible to produce three unnecessary misses until we know for sure were the ship is really located.

Let's have a look at the following scenario, and suppose that there is one ship left of size 4, one ship of size 3 and one ship of size 2:

It somehow seems tempting to select the eastern neighbor cell for the next sink shot, because of the large unknown space in that direction. But as it turns out that would be a mistake.

A better approach is to calculate the probabilities for the ships that are still left on the battlefield to be located on each unknown neighbor cell (of course considering the fact where we have hit the ship already). This is very similar to our strategy when placing trial shots.

The ship of size 2 can be located in either of the four directions, so this does not really help.

There is 1 possibility for the ship of size 3 to be located on the western neighbor cell, 2 possibilities on the northern neighbor cell, 2 possibilities on the eastern neighbor cell and 2 possibilities on the southern neighbor cell. So the western cell is out of the game.

Regarding the ship of size 4, there is 1 possibility to the west, 2 to the north, 2 to the east and 3 to the south. The southern neighbor is our target of choice.

In case we miss we'll give it another try the next round by applying the same mechanism. After all the probabilities have changed by then (the ship is then more likely to be positioned horizontally).

Once we hit a second ship cell we can be sure about its orientation, but may still not know about the exact location. So we continue to calculate probabilities for neighbor cells on each side until the ship has been sunk. After that we store the information about which ship size just has been sunk, mark all cells surrounding the ship with State Impossible, and switch back to trialshot mode (as explained in part 1).

In part 4 we are going to put all of that together, and have a look at the resulting sourcecode.

Previous Postings:

Followup Postings:

Battleship Game Algorithm Explained (Part 4)

Wednesday, June 18, 2008

Battleship Game Algorithm Explained (Part 2)

This is part 2 of the mini-series explaining the inner workings of the Cubido C# Pirates Winner Algorithm. We previously examined how to basically choose the next battlefield cell to shoot at, that is by calculating each cell's probability to contain a ship. If there is more than one top-ranked cell, we need to decide among those cells.

So far we have done our best to hit a ship. But let's face it, it's more likely to hit the water. How can we benefit from a missed shot as well?

By applying a checkerboard-like shooting pattern we will find each ship by only having to hit only 50% of the cells (and that's a worst case scenario). E.g. there is no way for a ship of size 2 to hide anywhere here:

Great, we now have an additional selection criteria for choosing the next target cell: the checkerboard pattern.

Still there might be plenty of cells left with equal ship location probabilities which also fit into the checkerboard. Which other value can be gained from a missed shot? The value of learning something about unknown neighbor cells! When an unknown cell A has a water cell neighbor B (or an impossible cell neigbor, which is a cell that adjoins a sunk ship), this clearly limits the options for a ship to be located on cell A - because cell B is out of the game for a joined ship location.

If cell A is surrounded by two water/impossible cells the probabilities decrease further, and even more of course with three water/impossible neighbors (just one direction left to place a ship). Finally if we happen to shoot at the last remaining unknown neighbor cell of A (and hit water only), we can abandon that cell A for future consideration all together.

So what we are going to do is to count the water and impossible neighbors for each of our potential target's unknown neighbors (our target's unknown neighbors' water/impossible neighbors so to speak), and prefer the one target with the highest score.

This cell here might be a good choice for a shot (neighbor score is 7, 2 water neighbors for the western unknown neighbor + 2 water/impossible neighbors for the northern unknown neighbor + 0 for the eastern impossible neighbor + 3 water/impossible neighbors for the southern unknown neighbor):

In the following case that cell would score even higher (neighbor score is 9, 3 water neighbors for the western unknown neighbor + 3 water/impossible neighbors for the northern unknown neighbor + 0 for the eastern impossible neighbor + 3 water/impossible neighbors for the southern unknown neighbor):

So the neigbor score becomes our second target selection criteria, right after to the ship location probability.

If there are cells with equal ship location probability and equal neighbor score, we choose one that fits the checkerboard (by the way, applying the neighbor score almost naturally leads to a checkerboard as well, so this is just to go sure for special cases, e.g. a more or less empty battlefield). If more than one of the top-ranked cells fits the checkerboard, we'll randomly select any of them.

In part 3 we will discuss how to proceed once the first cell of a ship has been hit. There is a lot of potential for tuning as well.

Previous Postings:

Battleship Game Algorithm Explained (Part 1)

Followup Postings:

Tuesday, June 17, 2008

Battleship Game Algorithm Explained (Part 1)

Several people have approached me about this, so here is the first article in a mini-series describing the Cubido C# Pirates Winner Algorithm. It's final score was 38.19 shots (average shot count on 10,000 battleship games). 17 contentors scored under 39.00 shots, so this clearly was a very close match. 83 algorithms took part overall. The sourcecode is available, too.

Please note:

The code was not optimized for speed and has plenty of potential for performance improvement, esp. regarding nested looping. The competition rules included an upper time limit of 500ms per shot. This implementation runs at 1ms per shot in average, so there was no point in tuning that esp. as readability might have suffered.

The algorithm does not take into consideration opponents' behavior. The C# Pirates competition environment itself placed the ships randomly, and did not provide state storage on any larger scope than a single game, so it was not possible to react on opponent's behavior when shooting, or on different ship placement strategies.

Here is the spec: The battlefield consists of 10x10 cells. Ships are laid out either horizontally or vertically, with a margin of at least one cell between each ship. There is one ship of size 5, one of size 4, two of size 3 and one of size 2. Battleship algorithms are being called back on each round and must return the cell to shoot at next. The state of each cell can be queried through the C# Pirates API. Those states are: Unknown (cell has not been shot at yet), Water (cell has been shot at before and contains no ship) and Ship (cell has been shot at before and contains a ship).

Unknown cell

Water cell

Ship cell

So on each round of the game there is a decision to be made: at which cell to shoot at next. It soon becomes clear that we need two different modes of operation: Trialshot mode and Sinkshot mode. Trial shots occure while scanning the battlefield for ships. Sink shots happen once a ship has been hit in order to continue sinking it. We are going to talk about trial shots in part 1 and part 2, while sink shots will be covered in part 3.

In trialshot mode it's clearly a good idea to choose the cell that is most likely to be occupied by a ship. The probability of a ship residing at a certain cell can be calculated by scanning for unknown neighbor cells in each direction.

Let's have a look at an example. Say that there are the following ships left to be sunk: One of size 5, one of size 3 and one of size 2 (we know that because we kept track on which ships we sank already), and we are about to examine the cell marked with a shot impact.

There are following possibilities for this cell to contain a ship: 1 option for a ship of size 5 (filling out the whole horizontal space), 1 option for a ship of size 3 (to the east), and 2 options for a ship of size 2 (north and east). So this gives us 4 options for ships to be placed on that very cell.

What about the next unknown cell?

Again there is 1 option for a ship of size 5 (horizontally), 2 options for a ship of size 3 (both horizontally, one to the east, the other one with the proposed target cell located in the center of the ship), and 3 options for a ship of size 2 (east, west and south). There is no way for a ship to fit to the north although the northern neighbor has not been shot at yet, as it's itself restricted by the sunken ship next to it (one cell margin between ships). Still, 6 options beat 4 options, so the algorithm would choose the second cell among those two cells.

The fact that there is a margin of one cell between ships creates another cell state that we must consider: the cell has not been shot at, but is next to a ship, hence cannot be part of a ship itself. Let's call that state Impossible, and keep this information in an internal data structure between callbacks, because the C# Pirates API does not support that state, and querying all neighbors of state Unknown each time is kind of cumbersome.

So what we basically do is to iterate over all cells of state Unknown, calculate their probabilities to contain a ship, and select the highest-rated option. The question is what to do when there are several target cells with the same probability (which by the way is quite likely to happen). Making a good decision at that point is crucial and will differentiate our algorithm from competing ones which apply the same basic strategy. We will discuss that in part 2.

Followup Postings: