I am looking for a Python implementation of the k-means algorithm, with examples, to cluster and cache my database of coordinates.

I did a similar implementation for images. You can use 2D arrays instead of RGB values. It's very naive but works for me: github.com/keremgocen/patternrecognotes. – Kerem, May 18 '15 at 1:35
Update: (Eleven years after this original answer, it's probably time for an update.)
First off, are you sure you want k-means? This page gives an excellent graphical summary of some different clustering algorithms. Beyond the graphic, I'd suggest looking especially at the parameters that each method requires and deciding whether you can provide them (e.g., k-means requires the number of clusters, but maybe you don't know that before you start clustering).
Here are some resources:
Old answer:
SciPy's clustering implementations work well, and they include a k-means implementation.
There's also scipy-cluster, which does agglomerative clustering; this has the advantage that you don't need to decide on the number of clusters ahead of time.
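As a rough illustration of that difference, here is a minimal sketch using scipy.cluster.vq and scipy.cluster.hierarchy. The synthetic blobs, the 'average' linkage method, and the distance threshold of 2.5 are assumptions chosen for this toy data, not values from the answer.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for real coordinates: three tight, well-separated 2-D blobs
rng = np.random.default_rng(0)
blob_centers = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
points = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in blob_centers])

# k-means: the number of clusters (3 here) has to be supplied up front
centroids, distortion = kmeans(points, 3)
km_labels, _ = vq(points, centroids)  # index of the nearest centroid per point

# Agglomerative clustering: build the merge tree, then cut it at a distance
# threshold instead of naming a cluster count (2.5 is an assumed threshold)
tree = linkage(points, method='average')
hc_labels = fcluster(tree, t=2.5, criterion='distance')
n_found = len(set(hc_labels))  # number of clusters implied by the threshold
```

The threshold still has to be chosen, so the hierarchical route trades one parameter (the cluster count) for another (a distance scale), which may or may not be easier to supply for a given dataset.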

Why is scipy preferred over sklearn for k-means? Having used both recently, I found I liked sklearn's implementation more. – Apr 3 '19 at 18:05
SciPy's kmeans2() has some numerical problems: others have reported error messages such as "Matrix is not positive definite - Cholesky decomposition cannot be computed" in version 0.6.0, and I just encountered the same in version 0.7.1.
For now, I would recommend using PyCluster instead. Example usage:
>>> import numpy
>>> import Pycluster
>>> points = numpy.vstack([numpy.random.multivariate_normal(mean,
...                                                         0.03 * numpy.diag([1,1]),
...                                                         20)
...                        for mean in [(1, 1), (2, 4), (3, 2)]])
>>> labels, error, nfound = Pycluster.kcluster(points, 3)
>>> labels # Cluster number for each point
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
>>> error # The within-cluster sum of distances for the solution
1.7721661785401261
>>> nfound # Number of times this solution was found
1

It also seems that the scipy cluster kmeans function does not accept a distance method and always uses Euclidean. Another reason to use PyCluster? – Sid, Feb 22 '12 at 21:04

Just hit the error mentioned... I see the cluster groupings in your example, but can you get the cluster "center"? – monkut, May 24 '12 at 3:18

@monkut, use
numpy.vstack([points[labels == i].mean(0) for i in range(labels.max() + 1)])
to get the centers of the clusters. – May 24 '12 at 9:04
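To make that one-liner concrete, here is a small sketch on hand-made data. The points and labels arrays are invented for illustration; in practice labels would come from the clustering call.

```python
import numpy as np

# Five 2-D points with made-up cluster labels 0, 0, 1, 1, 2
points = np.array([[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0], [9.0, 9.0]])
labels = np.array([0, 0, 1, 1, 2])

# For each label i, select the member points with a boolean mask and
# average them column-wise; stack the per-cluster means into one array
centers = np.vstack([points[labels == i].mean(0)
                     for i in range(labels.max() + 1)])
```

Row i of centers is the mean of the points labelled i, so this assumes labels run from 0 to labels.max() with no label left unused.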
You can get rid of the error in kmeans2 by using the keyword argument minit='points'. – Aug 27 '14 at 5:44
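A minimal sketch of that suggestion, reusing the shape of the test data from the answer above. With minit='points', the initial centroids are drawn from the observations themselves. The seed only makes the synthetic data repeatable and is an assumption of this example.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

np.random.seed(0)  # repeatable synthetic data only
points = np.vstack([np.random.multivariate_normal(mean, 0.03 * np.eye(2), 20)
                    for mean in [(1, 1), (2, 4), (3, 2)]])

# minit='points' picks k of the observations as the starting centroids
centroids, labels = kmeans2(points, 3, minit='points')
```

kmeans2 returns the centroids directly along with the label of each point, so the cluster centers do not need to be recomputed afterwards.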
For continuous data, k-means is very easy.
You need a list of your means; for each data point, find the mean it is closest to and average the new data point into it. Your means will then represent the recent salient clusters of points in the input data.
I do the averaging continuously, so there is no need to keep the old data to obtain the new average. Given the old average k, the next data point x, and a constant n, which is the number of past data points to keep the average over, the new average is
k*(1 - 1/n) + x*(1/n)
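A short worked example of this update, taking the new data point x as the term weighted by 1/n (as the code in this answer does). The function name update_mean is made up for this sketch.

```python
def update_mean(k, x, n):
    """Blend the new point x into the old average k with weight 1/n."""
    return k * (1 - 1 / n) + x * (1 / n)

# Start from an average of 10 and feed in the value 20 three times with n = 5
mean = 10.0
for x in [20.0, 20.0, 20.0]:
    mean = update_mean(mean, x, n=5)
```

Each step moves the average one fifth of the way toward the new point (roughly 12.0, then 13.6, then 14.88), so older data fades out gradually without ever being stored.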
Here is the full code in Python:
from __future__ import division
from random import random

# init means and data to random values
# use real data in your code
means = [random() for i in range(10)]
data = [random() for i in range(1000)]

param = 0.01  # plays the role of 1/n; bigger numbers make the means change faster
              # must be between 0 and 1

for x in data:
    closest_k = 0
    smallest_error = float('inf')
    # find the index of the mean closest to this data point
    for i, mean in enumerate(means):
        error = abs(x - mean)
        if error < smallest_error:
            smallest_error = error
            closest_k = i
    # move only the closest mean toward the data point
    means[closest_k] = means[closest_k]*(1 - param) + x*param
You could just print the means when all the data has passed through, but it's much more fun to watch them change in real time. I used this on frequency envelopes of 20 ms bits of sound, and after talking to it for a minute or two, it had consistent categories for the short 'a' vowel, the long 'o' vowel, and the 's' consonant. Weird!
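Since the question is about coordinates rather than scalars, here is a sketch of the same online idea for points of any dimension. The names online_kmeans and squared_distance, the rate argument (the equivalent of param above), and the toy data are all assumptions of this example, not part of the answer's code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def online_kmeans(points, initial_means, rate=0.01):
    """Feed points one at a time, nudging the nearest mean toward each point."""
    means = [tuple(m) for m in initial_means]  # work on a copy
    for x in points:
        # index of the mean closest to x
        closest = min(range(len(means)),
                      key=lambda i: squared_distance(x, means[i]))
        # move that mean a fraction `rate` of the way toward x
        means[closest] = tuple(m * (1 - rate) + xi * rate
                               for m, xi in zip(means[closest], x))
    return means

# Toy stream alternating between two locations
points = [(0.0, 0.0), (10.0, 10.0)] * 100
means = online_kmeans(points, [(1.0, 1.0), (9.0, 9.0)], rate=0.1)
```

On this stream the two means settle very close to (0, 0) and (10, 10). With real data, the starting means and the rate matter a good deal, since a mean that is never the closest one is never updated.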

This is a great online-learning k-means algorithm! But there is a bug in the last row of the code: one tab should be removed from this row: means[closest_k] = means[closest_k]*(1-param) + x*(param) – lai, Jul 24 '15 at 9:49
(Years later) this kmeans.py under is-it-possible-to-specify-your-own-distance-function-using-scikits-learn-k-means is straightforward and reasonably fast; it uses any of the 20-odd metrics in scipy.spatial.distance.
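The linked kmeans.py is not reproduced here. As a rough sketch of the general idea only, the loop below assigns points with scipy.spatial.distance.cdist under a chosen metric and recomputes each center as the plain mean of its members. The function name kmeans_with_metric and the toy data are assumptions, and note that the mean is only the natural center for Euclidean distance, so with other metrics this is a heuristic.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kmeans_with_metric(points, init_centers, metric="euclidean", max_iter=100):
    """Lloyd-style iterations with a pluggable assignment metric (a sketch)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # distance from every point to every center under the chosen metric
        labels = cdist(points, centers, metric=metric).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return centers, labels

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = kmeans_with_metric(points, [[0.0, 0.0], [10.0, 10.0]],
                                     metric="cityblock")
```

Any metric name accepted by cdist (for example "cityblock", "cosine", or "chebyshev") can be passed in; the initial centers are supplied by the caller here to keep the sketch deterministic.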
From Wikipedia: you could use SciPy's K-means clustering and vector quantization module.
Or you could use a Python wrapper for OpenCV, ctypes-opencv.
Or you could use OpenCV's new Python interface and its kmeans implementation.
scikit-learn's KMeans() is the simplest way to apply k-means clustering in Python. Fitting clusters is as simple as:
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
This code snippet shows how to store centroid coordinates and predict clusters for an array of coordinates.
>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 4], [4, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>>> kmeans.labels_
array([0, 0, 0, 1, 1, 1], dtype=int32)
>>> kmeans.predict([[0, 0], [4, 4]])
array([0, 1], dtype=int32)
>>> kmeans.cluster_centers_
array([[ 1.,  2.],
       [ 4.,  2.]])
(courtesy of scikit-learn's documentation, linked above)
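Since the question also asks about caching, one possible approach is to save the fitted centroids to disk and later assign new coordinates to the nearest cached centroid without refitting. This is only a sketch: the file name centroids.npy, the temporary directory, and the choice of n_init=10 are assumptions of this example.

```python
import os
import tempfile

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Cache the centroid coordinates (hypothetical file in a temp directory)
cache_path = os.path.join(tempfile.mkdtemp(), "centroids.npy")
np.save(cache_path, kmeans.cluster_centers_)

# Later: reload the centroids and label new points by nearest centroid
centers = np.load(cache_path)
new_points = np.array([[0.0, 0.0], [4.0, 4.0]])
dists = np.linalg.norm(new_points[:, None, :] - centers[None, :, :], axis=2)
cached_labels = dists.argmin(axis=1)
```

For Euclidean k-means, the nearest-centroid assignment from the cached array should agree with kmeans.predict on the same points, so the fitted model object itself does not have to be kept around.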
Python's Pycluster and pyplot can be used for k-means clustering and for visualization of 2D data. A recent blog post, Stock Price/Volume Analysis Using Python and PyCluster, gives an example of clustering using PyCluster on stock data.