How complex is English, and why?

posted on: Saturday, March 29, 2014 (2:38 pm)

Last night, I was reading a thread on reddit titled "When did the use of gender start showing up in language, and what purpose did it serve?". One of the top posts, by a user named ggchappell, said that the original poster seemed to have it the wrong way around: languages "start complicated and sometimes get simpler", especially when those languages are used for trade. In response to this, user mojitz asked,

"The simplest languages are those that ended up being used for trade between multiple groups of people."
Does English conform to this generalization by your estimate?
I thought that was a really exciting question, and was prompted to write the response below.


Depending on how you define "simple", English is a relatively simple language, but probably not because of trade.

The question of how to define language complexity (or, in fact, if languages differ in complexity in the first place) has been a hotly contested, very political issue in the world of linguistics. Because of this, it’s really only been in the last few years that we’ve made substantive advances toward finding quantitative metrics of language complexity. Some really exciting work was done a few years ago by Max Bane, who applied Kolmogorov complexity – a concept from information theory that measures how densely you can compress some data without losing any information – to the inflectional and affixal morphology of various languages [1]. This is to say, instead of measuring how complex a language’s sound system is, or how difficult that language is to learn, Bane was looking at how complex the words of a language are in terms of their internal structure: the rules dictating which prefixes and suffixes you can (or must) add to a "base" word form, e.g. "type" becoming "typed" to indicate the past tense.
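
To give a feel for the compression idea, here is a toy illustration using zlib on raw bytes (this is emphatically not Bane's metric, which is computed over morphological analyses of word forms rather than plain text): the more internal regularity some data has, the further it can be compressed without loss, and hence the lower its estimated complexity.

import os
import zlib

def compression_ratio(data):
    # Compressed size over original size: the more internal regularity the
    # data has, the smaller this ratio (a crude proxy for Kolmogorov complexity).
    return len(zlib.compress(data, 9)) / float(len(data))

print(compression_ratio(b"abab" * 200))   # highly regular: ratio close to 0
print(compression_ratio(os.urandom(800))) # essentially random: ratio around 1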

If we use Bane’s metric for complexity, we end up with languages like Hungarian and Latin (which, as others have pointed out, has a very rich inflectional system) being ranked as most complex, while morphologically "isolating" languages like Vietnamese and Chinese – languages that have on average fewer units of meaning per word (contrast "establish" with "anti-dis-establish-ment-arian-ism") – are ranked as the least complex.

Slightly more complex than isolating languages is a class of languages called "pidgins". Pidgins are communicative systems formed when two or more groups of people without a common language need to communicate. Getting back to what /u/ggchappell said, pidgins are very frequently the result of different language speakers coming together to engage in trade. Moreover, pidgins are very widely held to be particularly simple languages - in pidgins, we very often observe things like the reduplication of words or morphemes to augment their intensity (e.g., in Nigerian Pidgin, "sikisiki", meaning "someone who is always sickly", and "E big well well" to mean "It is very big" [2]). When children grow up learning a pidgin, it transitions into what linguists call a "creole", and usually as a result of this undergoes an increase in regularization and linguistic complexity. However, children don’t normally grow up learning a pidgin as a result of trade – this is much more likely to be indicative of one of two situations: either multiple groups of people without a common language have been brought together as a result of slavery, or one group of people has invaded another. In the latter scenario, we see the establishment of a "prestige" language – a language of the elite and the learned, the language of the conquerors – and a "vulgar" language, the language of the conquered masses.

Now, this is where things start to get really neat. Let’s take a quick look at the history of England (and English) from the 9th century through to the 13th century. In 9th century England, Old English was being spoken, a Germanic language strongly related to other continental European languages. In northern England, however, English was hardly being spoken at all: the Old Norse-speaking Danes had invaded and established the Danelaw. At this time Old Norse and Old English would have been mutually intelligible, much like modern-day Swedish and Norwegian – a speaker of Old English would have been able to understand a speaker of Old Norse, and vice-versa, although probably with some difficulty. It’s very hard to overstate the amount of influence Old Norse had on Old English, but to give an example, the pronouns "they", "their" and "them" – core components of Modern English – derive directly from the Old Norse "þeir" ("they") [3]. But the amount of change Old English experienced interacting with Old Norse pales in comparison to what happened next.

In 1066, William the Conqueror invaded England and, in doing so, established French as a prestige language. Now suddenly we see incredible amounts of French influence on Old English vocabulary and grammar, so much so that by the 12th or 13th century Old English isn’t even Old English anymore: it’s Middle English. To give you an idea of how radical a change this was, Modern English speakers can understand Middle English, whereas Old English might as well have no relation to what we think of as English today. Middle English is the language of Chaucer, who in the 14th century wrote lines like

"Whan that Aprill, with his shoures soote The droghte of March hath perced to the roote And bathed every veyne in swich licour" [4].
Middle English is the language of Orm, who wrote in the introduction of his 12th century book Orrmulum,
"[Th]iss boc iss nemmned Orrmulum, for[th]i [th]att Orrm itt wrohhte" [5].
Old English is the language Beowulf is written in, which contains such unintelligible passages as
"Hwæt! We Gardena in geardagum [th]eodcyninga [th]rym gefrunon hu [th]a æ[th]elingas elle fremedon" [6].
About 90% of the words in Modern English are from either Latin or Greek, with the vast majority of these having come through French during this time period or later [7] (although roughly half of those used in everyday speech are still of Germanic origin [8]).

Let’s get back to Bane. You might have noticed that I didn’t mention where English ranked in Bane’s complexity survey. Let me tell you: English is much less complex than Latin. English is much less complex than Icelandic, which has changed so little as to remain mutually intelligible with Old Norse. English is less complex than French or German. In fact, in Bane’s ranking of 20 languages using his complexity metric, English was less complex than any other European language, falling somewhere between Dutch and Maori. More notably, though, the English language’s complexity score, at 16.88%, is closer to that of Nigerian Pidgin (9.8%) than to that of French (23.05%). Indeed, it has been argued that Old English underwent creolization on its way to becoming Middle English [9].

In short, and to answer your question, yes, English can be construed as being one of the least complex European languages, not because of trade – but because of conquest.


  1. Bane, Max, 2008, "Quantifying and measuring morphological complexity", Proceedings of the 26th West Coast Conference on Formal Linguistics, http://www.lingref.com/cpp/wccfl/26/paper1657.pdf .
  2. Ugot, Mercy and Ogundipe, Afolabi, 2011, "Reduplication in Nigerian Pidgin: A Versatile Communication Tool?", Pakistan Journal of Social Sciences, http://www.medwelljournals.com/fulltext/?doi=pjssci.2011.227.233 .
  3. Harper, Douglas, n.d., "they (pron.)", Online Etymology Dictionary, http://www.etymonline.com/index.php?term=they .
  4. Eliot, Charles W., 1909, "English poetry I: from Chaucer to Gray", The Harvard Classics, http://www.bartleby.com/40/0101.html .
  5. Breen, Katharine, 2010, "Imagining an English Reading Public, 1150-1400", Volume 79 of Cambridge Studies in Medieval Literature, http://books.google.co.uk/books?id=HPW8KrD88OUC&lpg=PA116&dq=boc%20is%20nemmned%20Orrmulum&pg=PA116#v=onepage&q&f=false .
  6. Slade, Benjamin, 2012, "Beowulf: diacritically-marked text and facing translation", http://www.heorot.dk/beowulf-rede-text.html .
  7. Dictionary.com, "Word FAQs", http://dictionary.reference.com/help/faq/language/t16.html .
  8. Nation, I.S.P., 2001, "Learning Vocabulary in Another Language", Cambridge Applied Linguistics.
  9. Danchev, Andrei, 1997, "The Middle English creolization hypothesis revisited", Trends in Linguistics Studies and Monographs 103.

Tags: english, history, information theory, linguistics

Making pictures using triangles (an exercise in stochastic hill-climbing algorithms) – Part 1

posted on: Wednesday, December 4, 2013 (4:02 am)

In this blog post, I’d like to introduce a project I’ve been tinkering with on and off (mostly off) for the past few months. The idea behind the project was pretty simple: to write a program which, given a set number of triangles and a blank canvas, adjusts the coordinates, color, and transparency of the triangles over time in an attempt to replicate a given image. There were two primary motivations behind my project, at least initially:

  • to create (lossily) compressed versions of the source images that still look decent (and might even reduce image noise), and
  • to have those compressed versions themselves be scalable to any size.
While I found that my program met these goals fairly adequately, what I was both surprised and delighted by was its aesthetically pleasing and almost artful representations of the source images. Therefore, before delving into the more technical aspects of the implementation, I’d like to present a few samples of the program’s results. The first example uses 50 triangles, while the other two use 100.
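
Before the examples, here is a rough sketch of the kind of stochastic hill-climbing loop involved, written against numpy and PIL. This is an illustration under my own simplifying assumptions rather than the actual program: it only perturbs vertex positions, whereas the real project also mutates color and transparency, and it runs far fewer iterations than the examples below.

import random
import numpy as np
from PIL import Image, ImageDraw

def render(triangles, size):
    # Rasterize (points, rgba) triangles onto a blank canvas.
    canvas = Image.new("RGB", size, (255, 255, 255))
    draw = ImageDraw.Draw(canvas, "RGBA")
    for points, rgba in triangles:
        draw.polygon(points, fill=rgba)
    return canvas

def error(candidate, target):
    # Sum of squared per-pixel differences between candidate and target.
    a = np.asarray(candidate, dtype=np.float64)
    b = np.asarray(target, dtype=np.float64)
    return ((a - b) ** 2).sum()

def mutate(triangles):
    # Copy the triangle list and randomly nudge one vertex of one triangle.
    new = [(list(points), rgba) for points, rgba in triangles]
    points, _ = random.choice(new)
    i = random.randrange(len(points))
    x, y = points[i]
    points[i] = (x + random.randint(-10, 10), y + random.randint(-10, 10))
    return new

def hill_climb(triangles, target, iterations=10000):
    best = error(render(triangles, target.size), target)
    for _ in range(iterations):
        candidate = mutate(triangles)
        score = error(render(candidate, target.size), target)
        if score < best:  # keep only improving mutations
            triangles, best = candidate, score
    return triangles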

Example 1

[Images: output at iteration 9,500 (image size 3.38KB; compressed SVG 0.88KB), iteration 50,000 (2.95KB; SVG 0.88KB), and iteration 1,293,000 (2.74KB; SVG 0.88KB), alongside the original image (6.53KB).]

This was one of the very first tests of my program, and is in my opinion also one of its most successful and impressive recreations. As you might infer from the stark difference between iterations 9,500 and 50,000 and the more moderate difference between iterations 50,000 and 1,293,000, the program’s output converges on the target image roughly logarithmically: early iterations yield large improvements, while later ones bring progressively smaller gains. This trend is plotted in the graph below.
[Graph: recreation error over iterations]
The corresponding vector for the final result (below) is actually noticeably less accurate than the JPEG posted above, because the program has exploited JPEG artifacts as well as the specific dimensions of the target in creating its reproduction. When the triangles are used to create an SVG, these artifacts are no longer present and the image becomes less faithful.
One of the most noticeable issues with the program’s image is that it entirely fails to capture the eyes in the source image. I explored various methods of solving this issue, which I will discuss in a later post on implementation. Overall, though, I thought that the program produced a more pleasing (if less accurate) image, kilobyte-for-kilobyte, than comparable "traditional" compression techniques (as pictured below), especially if enlarged.
[Images: the program’s final output (image size 2.74KB; compressed SVG 0.88KB), the original image (6.53KB), and the image compressed with Paint.NET (1.91KB; 1.58KB compressed).]
To finish off this example: a video of the program’s progress, played at 30 frames per second with one frame captured every 500 iterations. The "jumps" in activity visible at 0:23 and 0:56 are instances when I made updates to the program.

Example 2

[Images: output at iteration 737,500 (image size 4.03KB; compressed SVG 1.54KB) and the original image (3.51KB).]

This example yet again highlighted an issue with the image similarity evaluation methods employed by the program. While many aspects of the source image are replicated with high fidelity, the facial features of Christian Bale are not. Below again is a video of the program’s progress over time.

Example 3

[Images: output at iteration 769,500 (image size 3.27KB; compressed SVG 1.54KB) and the original image (12.9KB).]

In this instance, I experimented with another evaluation method that incorporated a saliency comparison between the target image and the output. Although the results presented here seem promising, other tests using the method were lacklustre, especially given that the saliency map calculations added a lot of processing time to each iteration.


In my next post, I’ll detail the specifics of the program. In the meanwhile, the program itself can be downloaded here; it requires numpy and PIL to run.

Tags: image processing, numpy, programming, python

The Anonymized Prisoner's Dilemma Challenge – Exploiting Limited Knowledge

posted on: Tuesday, August 20, 2013 (7:28 am)

The competition

Recently, a couple of friends and I entered the Brilliant.org Hunger Games competition, in which competitors create programs that are pitted against one another in an arena of sorts. In any given round of the competition, each program must choose, for every other program, whether to cooperate with it or take advantage of it. In the Hunger Games competition, these choices are framed as "hunting" and "slacking".

Tit-for-tat

At face value, the competition is a very standard iterated prisoner’s dilemma. For the general case, an extremely effective strategy is already well known: tit-for-tat. The essence of this strategy is to reciprocate everything done to you in the previous round, and to cooperate with everyone in the first round in a show of "good faith". However, the Hunger Games competition had a slight twist: although your program would be able to see the overall reputation of each opponent (id est how many times they hunted versus slacked), no other identifying information would be supplied. This limited knowledge makes true tit-for-tat impossible, since your opponents are effectively anonymized. Although you may know that a player you decided to cooperate with in the previous round defected against you, there’s no way to retaliate against that same player in this round with 100% confidence.
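
For reference, classic tit-for-tat is only a few lines once opponents can be identified. The sketch below is purely illustrative (it is not the competition’s API); the dictionary keyed on opponent identity is exactly what the anonymization takes away.

def tit_for_tat(opponent_ids, previous_moves=None):
    # Cooperate ("hunt") with everyone in the first round; afterwards, mirror
    # whatever each opponent did to us in the previous round. This requires
    # stable opponent identities, which the competition withholds.
    if previous_moves is None:
        return {oid: 'hunt' for oid in opponent_ids}
    return {oid: previous_moves[oid] for oid in opponent_ids}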

Our strategy

My team’s strategy, nicknamed "The Underminer", both compensated for this lack of knowledge and managed to exploit it. We started with the assumption that tit-for-tat is possible, to a degree. As the number of rounds increases, the reputations of individual opponents become solidified, making reputation itself a means of identification. Although a player with a reputation of 1.0 in round one could drop to a reputation of 0.5 in round two, a player with a reputation of 1.0 in round 20 can only drop to about 0.95. Based on this, one can say that the ranking of players by reputation remains mostly static: the worst-ranked opponent will have a very difficult time becoming the highest-ranked. While this is far from true in the first few rounds, at a certain point changing ranks becomes almost impossible. This phenomenon can be imagined like a zipper: before any zipping has occurred, there’s a very large degree of freedom of motion available. However, as you begin to zip, the freedom of motion becomes more and more constrained, until none remains.
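
The arithmetic behind those bounds is simple. As a toy calculation (assuming, purely for illustration, that reputation is hunts divided by total decisions and that each round adds one decision, which simplifies the actual competition):

def reputation_bounds(current_reputation, rounds_played):
    # Lowest and highest possible reputation after one more round, if
    # reputation = hunts / total decisions so far.
    hunts = current_reputation * rounds_played
    total_next = rounds_played + 1
    return hunts / total_next, (hunts + 1) / total_next

print(reputation_bounds(1.0, 1))   # (0.5, 1.0): ranks can still change freely
print(reputation_bounds(1.0, 20))  # (~0.95, 1.0): the "zipper" has closed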

Undermining

While our program implements tit-for-tat as described above in most circumstances, there’s a very specific scenario in which it deviates from this otherwise optimal strategy. As mentioned, the "tit-for-tat" allowed by the Hunger Games competition is not foolproof, since player identities can only be approximately tracked at the beginning of the game. Assuming other players are also tracking opponents by reputation, we can exploit this limitation in knowledge by occasionally attempting to "undermine" another player, assuming their identity. This technique is probably best illustrated by example. Suppose in some round we’re ranked 2nd by reputation at 0.7, while another player is ranked 3rd at 0.6 reputation. Assuming both we and the 3rd ranked player slacked completely this round, there would be no way for us to displace our opponent as 3rd ranked player, since they already have the "bad reputation" head-start. However, the likelihood of our opponent slacking completely this round is very low. In fact, the decisions of our opponent can be estimated given their current reputation, the number of rounds elapsed so far, and a rating of how certain we want our estimation to be, by using the lower bound of the Wilson score interval. While this formula is most often used to rank items based on user reviews (and is employed perhaps most famously by Reddit’s "best" comment-sorting algorithm), in this case we can use it to infer the "true" cooperation rate of opponents and, based on this, their (probable) reputation at the end of the round. Supposing in this circumstance that we predict with a large amount of certainty that our opponent’s reputation at the end of this round will be at worst 0.55, and we can manage to lower our reputation below that, then we choose to slack completely this round. Assuming that the other player remained at 0.6 reputation, while we dove down to 0.5, from a third player’s perspective, this is what happened: the 2nd ranked player went from 0.7 reputation to 0.6 reputation, and the 3rd ranked player went from 0.6 reputation to 0.5 reputation. For the third player to make the assumption that the 2nd ranked player went under the 3rd ranked player - going from 0.7 reputation to 0.5 - would be a strange and unintuitive leap of logic. So, in this way, we can choose to take the advantageous route of fully slacking while passing off some of the negative repercussions of this to a worse-ranked player.
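
The lower bound itself is easy to compute. Here is a minimal sketch (not our competition code); z = 1.96 corresponds to roughly 95% confidence.

from math import sqrt

def wilson_lower_bound(successes, trials, z=1.96):
    # Lower bound of the Wilson score interval for a Bernoulli proportion:
    # a conservative estimate of an opponent's "true" hunting rate given
    # how often they have hunted so far.
    if trials == 0:
        return 0.0
    p = successes / float(trials)
    denominator = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * sqrt((p * (1 - p) + z * z / (4 * trials)) / trials)
    return (centre - margin) / denominator

print(wilson_lower_bound(12, 20))  # e.g. an opponent who hunted 12 times in 20 decisions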

[Graph: player reputation over time]

In the above, we can see The Underminer repeatedly performing this tactic against a standard tit-for-tat bot. After each undermine, The Underminer reverts to a basic tit-for-tat strategy, which in this simulation caused its reputation to increase over time. As soon as it’s again in a position where it can undermine the tit-for-tat player, it does, repeatedly reaping the rewards of doing so.

Unfortunately, I can’t yet reveal whether The Underminer was successful or not – mostly because I haven’t found out myself. Hopefully, however, in two weeks’ time the results of the Hunger Games competition will be announced, at which point I’ll write another analysis on how our program fared overall. In the meantime, you can go check out the source code for our program as well as the testing/simulation program we used at this github repo.

Tags: anonymity, game theory, probability, programming, python

Generating Contrast and Saliency Maps on the GPU (Using Python and OpenCL)

posted on: Saturday, April 13, 2013 (8:21 am)

In the field of Computational Neuroscience, saliency maps are a means of graphically representing the areas of any visual scene presenting the most "bottom-up" saliency to a human observer (i.e. those most likely to draw the viewer's attention). Although the generation of these maps is not particularly difficult on a conceptual level, doing so is quite computationally expensive if using a serial approach. Below, I provide code for quickly generating the component contrast maps needed to build a saliency map by parallelizing the task on the GPU, as adapted from MATLAB code provided by Vicente Ordonez of SUNY. To run this, you'll need pyopencl v0.92, numpy, and PIL.

[Images: the original image and its saliency map.]

from PIL import Image
import itertools
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.elementwise import ElementwiseKernel
import math

## initialization of GPU context, queue, kernel
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
kernel_args = "int width, int len_min_width, float *image, float *contrast"
kernel_code = '''
contrast[i/3] = contrast[i/3] +
 (!(
     (i < width) ||
     (i > len_min_width) || 
     (i % width == 0) ||
     ((i+1) % width == 0) ||
     ((i+2) % width == 0) ||
     ((i+3) % width == 0) ||
     ((i-1) % width == 0) ||
     ((i-2) % width == 0)
   )
 )?
 (
     pown((image[i] - image[i-width]), 2) +
     pown((image[i] - image[i-width-3]), 2) +
     pown((image[i] - image[i-width+3]), 2) +
     pown((image[i] - image[i-3]), 2) +
     pown((image[i] - image[i+3]), 2) +
     pown((image[i] - image[i+width]), 2) +
     pown((image[i] - image[i+width-3]), 2) +
     pown((image[i] - image[i+width+3]), 2)
 ) : 0''' # for each rgb value in image, if not part of a pixel on the image
          # border, sum the squares of the differences between it and the
          # corresponding color values of each surrounding pixel, then add
          # this to the corresponding output pixel's value
contrast_kernel = ElementwiseKernel(ctx,
                                    kernel_args,
                                    kernel_code,
                                    "contrast")

flatten = lambda l: list(itertools.chain(*l))

def contrast_map(image):
    
    width = image.size[0]
    height = image.size[1]
    
    ## creating numpy arrays
    image = np.array(flatten(list(image.getdata()))).astype(np.float32) #image array
    contrast_map = np.zeros((height*width)).astype(np.float32) #blank write array
    
    ## send arrays to the gpu:
    image_gpu = cl_array.to_device(ctx,queue,image) 
    contrast_map_gpu = cl_array.to_device(ctx,queue,contrast_map) 
    
    contrast_kernel(width*3,(image.size - width - 1),image_gpu,contrast_map_gpu) #executing kernel
    contrast_map = contrast_map_gpu.get().astype(np.float32) #retrieving contrast map from gpu
    contrast_map = np.nan_to_num(contrast_map) #conversion of NaN values to zero
    
    ## normalization:
    contrast_map += max(contrast_map.min(),0)
    contrast_map /= contrast_map.max()
    contrast_map *= 255
    return contrast_map.astype(np.uint8)

def saliency_map(image):
    
    width = image.size[0]
    height = image.size[1]
    
    resizes = int(math.floor(math.log(min(width,height),2)) - 3) #calculating necessary number of
                                                                 #images for gaussian pyramid
    resized_images = [image.resize((width/factor,height/factor),Image.BICUBIC) 
                      for factor in [2**x for x in range(resizes)]] #resizing images
    contrast_resized_images = map(contrast_map, resized_images) #generating contrast maps

    ## resizing contrast maps back to original image size:
    images = list()
    for i in range(len(contrast_resized_images)):
        image = Image.new("L",resized_images[i].size)
        image.putdata(list(contrast_resized_images[i]))
        images.append(np.array(list(image.resize((width,height)).getdata())))

    ## combining images:    
    result = np.zeros_like(images[0]).astype(np.float32)
    for image in images:
        result += image.astype(np.float32)
        
    ## normalization:
    result += max(result.min(),0)
    result /= result.max()
    result *= 255
    output = Image.new("L",(width,height))
    output.putdata(list(result.astype(np.uint8)))
    return output
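
To tie the above together, a minimal usage sketch (the filename is only a placeholder; any RGB image will do):

if __name__ == '__main__':
    source = Image.open('input.jpg').convert('RGB') # hypothetical input file
    output = saliency_map(source)
    output.save('input_saliency.png')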

Tags: code, cognitive science, gpgpu, image processing, numpy, opencl, programming, python

Visualizing k-means clustering (using d3.js)

posted on: Friday, March 22, 2013 (8:22 am)

Having learned a little about d3.js at a hackathon I attended, I decided to put it to good use. Below is an interactive implementation of the k-means clustering algorithm, visualized using d3.js. Click to place points. Centroids are (initially) placed on randomly chosen points, and the number of clusters is set by a common rule of thumb, k ≈ √(n/2). You can either step through the algorithm one simultaneous point-assignment and centroid-readjustment at a time, or watch the process run in real time.


//config values:
var colors = ['coral','green','steelblue','seagreen','yellow','cyan','salmon','skyblue','pink']
var radius = 8;
var trans_secs = 2;
var update_ms = 150;
var svg_size = {width:600,height:400};

//initialization
var mouse_x;
var mouse_y;
var new_points = false;
var points = new Array();
var centers = new Array();
var algo_started = false;
var svg = d3.select('#kmeans').append('svg')
.attr('width',svg_size.width)
.attr('height',svg_size.height);
canvas = document.getElementById('kmeans');
var stream = false;
var stop_update = false;

//@ http://jsfromhell.com/array/shuffle [v1.0]
function shuffle(o){ //v1.0
    for(var j, x, i = o.length; i; j = parseInt(Math.random() * i), x = o[--i], o[i] = o[j], o[j] = x);
    return o;
};

function safeShuffle(o){
	var x = new Array();
	for(var i = 0; i < o.length; i++){ 
		x[i] = o[i]; 
	}
	return shuffle(x);
}

function addPoint(){
	//add new point to canvas (d3)
	if (algo_started) {return true;}
	var rel_x = mouse_x - canvas.offsetLeft;
	var rel_y = mouse_y - canvas.offsetTop;
	if (points.length >= 1 && (points[points.length-1].x == rel_x)
		&& (points[points.length-1].y == rel_y)){ //making sure double-clicks don't get counted as two points
		return true;
	}
	points.push({x:(rel_x),y:(rel_y),plotted:false,nearest_center:null});
	new_points = true;
}

//tracking mouse:
document.captureEvents(Event.MOUSEMOVE);
document.onmousemove = getMousePosition;
function getMousePosition(mp) {
	mouse_x = mp.pageX;
	mouse_y = mp.pageY;
	return true;
}

function capitalize(string)
{
    return string.charAt(0).toUpperCase() + string.slice(1);
}

function createCenters () {
	//creating centroids
	document.getElementById('create').style.display = 'none';
	document.getElementById('move').style.display = 'block';
	algo_started = true;
	rand_points = safeShuffle(points);
	for (var i = 0; i < Math.round(Math.sqrt(rand_points.length / 2)); i++){
		centers.push({x:rand_points[i].x, y:rand_points[i].y, color:colors[i%colors.length], old_x:null, old_y:null});
	}
	svg.selectAll('.center')
	.data(centers)
	.enter()
	.append('circle')
	.attr('cx', function(d){return d.x;})
	.attr('cy', function(d){return d.y;})
	.attr('r', radius)
	.attr('style', function(d){return 'fill:' + d.color;})
	.attr('class','center')
   	.append("svg:title")
   	.text(function(d){return capitalize(d.color) + ' cluster center'});
}

function moveCenters () {
	//recalculate centroid positions given their assigned points
	for (var i = 0; i < centers.length; i++){
		var center = centers[i];
		center.old_x = center.x;
		center.old_y = center.y;
		var new_x = 0;
		var new_y = 0;
		var point_count = 0;
		for (var j = 0; j < points.length; j++){
			if (points[j].nearest_center == center){
				new_x += points[j].x;
				new_y += points[j].y;
				point_count++;
			}
		}
		center.x = Math.round(new_x / point_count);
		center.y = Math.round(new_y / point_count);
	}
	svg.selectAll('.center')
	.data(centers)
	.transition()
	.ease('linear')
	.duration(trans_secs * 1000)
	.attr('cx', function(d){return d.x;})
	.attr('cy', function(d){return d.y;});

	//check to see if centroids have failed to move any more (i.e. algorithm complete)
	var all_finalized = true;
	var svg_centers = svg.selectAll('.center')[0];
	for (var i = 0; i < centers.length; i++){
		var center = centers[i];
		var svg_center = {x: Math.round(svg_centers[i].cx.baseVal.value),
			y: Math.round(svg_centers[i].cy.baseVal.value)};
		all_finalized &= (center.x == center.old_x 
			&& center.y == center.old_y 
			&& Math.round(center.x) == svg_center.x 
			&& Math.round(center.y) == svg_center.y
			);
	}
	if (all_finalized){
		stream = false;
		document.getElementById('movecenters').disabled = true;
		document.getElementById('togglestream').disabled = true;
		document.getElementById('final').innerHTML = "Finalized!";
		stop_update = true;
	}
}

function eucDist(a,b){
	return Math.sqrt(Math.pow((a.x - b.x),2) + Math.pow((a.y - b.y),2));
}

function updatePoints(){
	//either creates new svg objects based on clicked points,
	//or re-colors points based on current centroid locations
	if (new_points && !algo_started){
		svg.selectAll('.point')
		.data(points)
		.enter()
		.append('circle')
		.attr('cx', function(d){return d.x;})
		.attr('cy', function(d){return d.y;})
		.attr('r', radius)
		.attr('class','point');
		new_points = false;
	}
	if (algo_started){
		for (var i = 0; i < points.length; i++){
			var point = points[i];
			var center_candidate = null;
			var center_candidate_dist = 99999;
			var svg_centers = svg.selectAll('.center')[0]
			for (var j = 0; j < centers.length; j++){
				var svg_center = svg_centers[j];
				center_dist = eucDist(point,{x:svg_center.cx.baseVal.value,y:svg_center.cy.baseVal.value});
				if (center_dist < center_candidate_dist){
					center_candidate_dist = center_dist;
					center_candidate = centers[j];
				}
			}
			point.nearest_center = center_candidate;
			point.color = 'light' + point.nearest_center.color;
		}
		svg.selectAll('.point')
		.data(points)
		.attr('style', function(d){return 'fill:' + d.color;});
	}
}

function updateDisplay(){
	document.getElementById('numpoints').innerHTML = points.length;
	document.getElementById('numclusters').innerHTML = Math.round(Math.sqrt(points.length / 2));
}

function toggleStream(){
	stream ^= true;
}

window.setInterval(update,update_ms);
function update(){
	if (stop_update){return true;}
	updatePoints();
	updateDisplay();
	if (stream) { moveCenters(); }
}

Tags: code, d3, javascript, machine learning, programming, web