Segmentation

An essential processing step in TRamWAy consists of dividing the space and time support of the localization microscopy data into time segments and microdomains so as to resolve the dynamics at these different locations in space and time.

The individual molecule locations or displacements are assigned to these segments/microdomains. While the simplest models of the dynamics will independently apply to each spatio-temporal microdomain, others will introduce global regularization terms, e.g. smoothing priors, or compute spatial and/or temporal variations, e.g. gradients or derivatives, which requires the corresponding inference procedures to jointly apply to all the microdomains simultaneously.

Here, the terms microdomain, cell and bin are used interchangeably.

This notebook focuses on the RWAnalyzer class and its tesseller, time and sampler attributes, together with the modules of same names. All of these symbols are exported by the tramway.analyzer package.

The tesseller and time attributes drive the spatial and temporal segmentations respectively, while the sampler attribute articulates the two segmentations and assigns molecule locations to the different microdomains.

As a result, to demonstrate the tesseller or time features, the sample method of the sampler attribute is called anyway.

from tramway.analyzer import *
import numpy as np

from src.segmentation import set_notebook_theme
set_notebook_theme()

from matplotlib import pyplot as plt

from src.segmentation import preset_analyzer, preset_analyzers

Tessellating

An instanced RWAnalyzer object features a tesseller attribute that can be set using any initializer provided by the tramway.analyzer.tesseller module:

a = RWAnalyzer()

a.tesseller = tesseller.TessellerPlugin('grid')

Alternatively, the tessellers module (also exported by the tramway.analyzer package) exports more convenient wrappers:

a = RWAnalyzer()

a.tesseller = tessellers.Squares

tessellers.Squares is equivalent to tesseller.from_plugin('grid') and both rely on the tramway.tessellation.grid module, but the default parameters for the wrappers may differ from the original «plugin»'s defaults.

Standard methods

Most methods feature a resolution attribute that allows to size the spatial bin. These methods are illustrated with and without this attribute explicitly set:

Square grid
Hexagonal grid
k-means
GWR
k-d tree
Random mesh

Beware the name resolution may be misleading, as the resolution attribute actually reflects the desired size of the spatial bins, instead of the density of bins per space unit. To increase the resolution, the bin size should be decreased, so as the value for the resolution attribute, hence the possible confusion.

Square grid

a, b, translocations = preset_analyzers(2)

# first analyzer with default parameters (left figure)
a.tesseller = tessellers.Squares

# second analyzer with explicitly set attribute `resolution` (right figure)
b.tesseller = tessellers.Squares
b.tesseller.resolution = .05

assignment_a = a.sampler.sample(translocations)
assignment_b = b.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='default resolution')
b.tesseller.mpl.plot(assignment_b, axes=axes[1], title='resolution = 0.05 µm')

png

Hexagonal grid

a, b, translocations = preset_analyzers(2)

# first analyzer with default parameters (left figure)
a.tesseller = tessellers.Hexagons

# second analyzer with explicitly set attribute `resolution` (right figure)
b.tesseller = tessellers.Hexagons
b.tesseller.resolution = .05

assignment_a = a.sampler.sample(translocations)
assignment_b = b.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='default resolution')
b.tesseller.mpl.plot(assignment_b, axes=axes[1], title='resolution = 0.05 µm')

png

k-means

a, b, translocations = preset_analyzers(2)

# first analyzer with default parameters (left figure)
a.tesseller = tessellers.KMeans

# second analyzer with explicitly set attribute `resolution` (right figure)
b.tesseller = tessellers.KMeans
b.tesseller.resolution = .05

assignment_a = a.sampler.sample(translocations)
assignment_b = b.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='default resolution')
b.tesseller.mpl.plot(assignment_b, axes=axes[1], title='resolution = 0.05 µm')

png

GWR

While the Growing-When-Required method is adaptive similarly to the k-means method, it is the only available method in TRamWAy that can fit the local average displacement length in addition to the local density. Basically, it fits the displacement length wherever the density is high and, at the same time, exhibits more flexibility in low-density areas, making this method a good candidate for heterogeneously dense data.

a, b, translocations = preset_analyzers(2)

# first analyzer with default parameters (left figure)
a.tesseller = tessellers.GWR

# second analyzer with explicitly set attribute `resolution` (right figure)
b.tesseller = tessellers.GWR
b.tesseller.resolution = .05

assignment_a = a.sampler.sample(translocations)
assignment_b = b.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='default resolution')
b.tesseller.mpl.plot(assignment_b, axes=axes[1], title='resolution = 0.05 µm')

png

Other methods

Some tessellation methods are not available as wrappers, but as plugins instead.

k-d tree

This method inherits from the quad-tree algorithm in InferenceMAP. It does not rely on the Euclidean distance to partition the space.

a, b, translocations = preset_analyzers(2)

# first analyzer with default parameters (left figure)
a.tesseller = tesseller.TessellerPlugin('kdtree')

# second analyzer with explicitly set attribute `resolution` (right figure)
b.tesseller = tesseller.TessellerPlugin('kdtree')
b.tesseller.resolution = .05

assignment_a = a.sampler.sample(translocations)
assignment_b = b.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='default resolution')
b.tesseller.mpl.plot(assignment_b, axes=axes[1], title='resolution = 0.05 µm')

png

Random

Random microdomain centers can be sampled from the location data.

The number of microdomains can be controlled with the avg_probability attribute that is basically the inverse of the desired number of microdomains.

The resolution attribute currently does not apply.

a, b, translocations = preset_analyzers(2)

# first analyzer with default parameters (left figure)
a.tesseller = tesseller.TessellerPlugin('random')

# second analyzer with explicitly set attribute `resolution` (right figure)
b.tesseller = tesseller.TessellerPlugin('random')
b.tesseller.avg_probability = 2e-3

assignment_a = a.sampler.sample(translocations)
assignment_b = b.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='default resolution')
b.tesseller.mpl.plot(assignment_b, axes=axes[1], title='resolution = 0.05 µm')

png

Merging microdomains

For microdomains with too few assigned locations, on the basis of a Voronoi-based partition, microdomains can be merged.

For example, if we reuse our default random mesh and merge the microdomains (or cells) with less than 20 locations:

# right figure
c = preset_analyzer()

c.tesseller = tesseller.TessellerPlugin('random')

c.tesseller.post_processing = cell_mergers.ByTranslocationCount
c.tesseller.post_processing.count_threshold = 20
c.tesseller.post_processing.update_centroids = True

assignment_c = c.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='original random mesh')
c.tesseller.mpl.plot(assignment_c, axes=axes[1], title='random mesh with merged cells')

png

Time windowing

Time segments can be defined using the time attribute.

As for all such attributes, a similarly-named module is exported by tramway.analyzer, with a series of constructors. In the case of the time attribute, the main constructor defines a sliding time window:

a, translocations = preset_analyzer(True)

a.time = time.SlidingWindow(duration=60) # in s

An overlap between successive time segments can be introduced defining a shift less than the window duration:

a.time.window_shift = 30 # in s

If the time attribute is not explicitly set, the sample method simply skips the time segmentation.

In any case, the attribute exposes several methods to handle the Partition object returned by sampler.sample().

assignment = a.sampler.sample(translocations)

a.time.n_time_segments(assignment)

import pandas as pd
pd.set_option('display.max_rows', 6)

t0s, t1s, counts = [], [], []
for t, assignment_t in a.time.as_time_segments(assignment, return_times=True):
    t0s.append(t[0])
    t1s.append(t[1])
    counts.append(len(assignment_t.points))

pd.DataFrame({'t0': t0s, 't1': t1s, '#points': counts})

	t0	t1	#points
0	20.48	80.48	186
1	50.48	110.48	536
2	80.48	140.48	823
...	...	...	...
22	680.48	740.48	916
23	710.48	770.48	984
24	740.48	800.48	924

25 rows × 3 columns

Sampling

The assignment operates in different ways for the spatial segmentation and the temporal segmentation: * time segments are explicitly defined and therefore the assignment is straight-forward, * space bins are often defined as bin centers, and a default sampler will build the Voronoi graph for these bin centers and deals the molecule location into a partition of the space support. In this case, a sampler can also introduce some degree of overlap based on criteria such as a desired minimum number of locations per microdomain.

Note that neither the time nor the sampler attributes need to be explicitly set. The undefined sampler attribute does exposes the sample method and falls back on a Voronoi-based partition of the space support.

If explicitly defined, the sampler attribute exposes the min_location_count attribute that discards the locations in the bins/cells with fewer locations than requested:

# left figure
a, translocations = preset_analyzer(True)
a.tesseller = tesseller.TessellerPlugin('random')
assignment_a = a.sampler.sample(translocations)

# right figure
b = preset_analyzer()
b.tesseller = tesseller.TessellerPlugin('random')

b.sampler = sampler.VoronoiSampler()
b.sampler.min_location_count = 100

assignment_b = b.sampler.sample(translocations)

# plot
_, axes = plt.subplots(1, 2, figsize=(12, 6))
a.tesseller.mpl.plot(assignment_a, axes=axes[0], title='random mesh',
                     location_options=dict(color=None, size=3, alpha=1))
b.tesseller.mpl.plot(assignment_b, axes=axes[1], title='random mesh with discarded cells',
                     location_options=dict(color=None, size=3, alpha=1))

png

The locations in gray pertain to discarded Voronoi cells.

Another approach to controlling the minimum number of points per bin consists in dynamically resizing the bins to assign additional points taken to number bins. In this case, the bins overlap.

The sampler.Knn sampler initially operates just like sampler.VoronoiSampler and then selectively extend the bins that count too few data points.

a, translocations = preset_analyzer(True)
a.tesseller = tesseller.from_plugin('random')

a.sampler = sampler.Knn(100)

assignment = a.sampler.sample(translocations)

tessellation = assignment.tessellation

# deactivate all Voronoi cells
tessellation.cell_label = np.zeros(tessellation.number_of_cells, dtype=bool)

# activate a few Voronoi cells
example_cell_indices = [2, 3, 38, 48, 51, 55, 75, 87, 113, 123, 124]
tessellation.cell_label[example_cell_indices] = True

# update the assignment
kwargs = assignment.param['partition']
assignment.cell_index = tessellation.cell_index(translocations[['x','y']], **kwargs)

# plot
_, axes = plt.subplots(1, 2, figsize=(14, 7))
a.tesseller.mpl.plot_cell_indices(assignment, axes=axes[0], title='cell indices',
                                  fontsize=8)
a.tesseller.mpl.plot(assignment, axes=axes[1], title='example extended cells',
                     location_options=dict(color=None, size=3, alpha=1))
centroids = tessellation.cell_centers[example_cell_indices]
axes[1].plot(centroids[:,0], centroids[:,1], 'bo', markersize=10);

png

Data structures

The object returned by the sample method stores the SPT data (points), the segmentation (tessellation) and the assignment of the data points to the bins (cell_index):

assignment = assignment_a
print(assignment)

tessellation:    <class 'tramway.tessellation.random.RandomMesh'>
points:          <class 'pandas.core.frame.DataFrame'>
cell_index:      <class 'numpy.ndarray'>
location_count:  <class 'numpy.ndarray'>
number_of_cells: 135
bounding_box:              n        x       y       t      dx      dy    dt
                 min     3.0  40.9835  6.1101   20.48 -0.5226 -0.5081  0.04
                 max  1856.0  41.9829  7.1098  795.04  0.5025  0.5157  0.04
@tessellation:   min_distance:       0.061511017201744324
                 avg_distance:       0.1537775430043608
                 max_distance:       None
                 min_probability:    0.001653302471687195
                 avg_probability:    0.00661320988674878
                 max_probability:    None
                 ref_distance:       0.0768887715021804
@partition:      min_location_count: 0

Attributes under @tessellation are sets of arguments passed to the segmentation method.

A temporal segmentation is also keyworded tessellation:

a, translocations = preset_analyzer(True)
a.time = time.SlidingWindow(duration=60)
time_assignment = a.sampler.sample(translocations)

print(time_assignment)

tessellation:    <class 'tramway.tessellation.window.SlidingWindow'>
points:          <class 'pandas.core.frame.DataFrame'>
cell_index:      <class 'tuple'>
location_count:  None
number_of_cells: 13
bounding_box:    None
@partition:      min_location_count: 0

The temporal segmentation alone has little interest for the common use cases TRamWAy was designed for. As a consequence, the time attribute always come as an additional dimension once the spatial segmentation is defined, and it handles the additional boilerplate for distinguishing the spatial component of the overall segmentation.

print(a.time.get_spatial_segmentation(time_assignment))

None

a.tesseller = tessellers.KMeans
spatiotemporal_assignment = a.sampler.sample(translocations)

a.time.get_spatial_segmentation(spatiotemporal_assignment)

<tramway.tessellation.kmeans.KMeansMesh at 0x7f829b4987c0>