# Inference

This document partially covers the following modules:

`tramway.analyzer.mapper`

`tramway.analyzer.browser`

Like most pages of this documentation, this one originates from an IPython notebook; as a consequence, the code examples are run sequentially, hence the inclusion of a few transition steps.

**TRamWAy** features multiple models to infer physical properties of random walks, including diffusivity, effective potential, drift, force, etc.
Most of the available models are based on the overdamped Langevin equation.

## Models for random walks

For an extensive introduction, please refer to the official documentation and to the InferenceMAP paper and webpage.

Briefly, we consider the problem of maximizing the posterior probability of the model parameters \(\theta\) given the observed particle displacements \(T_i = \{(\Delta t_j, \Delta\textbf{r}_j)\}_j\), partitioned into bins or microdomains whose index is denoted by \(i\):

\[ P(\theta | T) \propto \prod_i P(T_i | \theta_i) \times P(\theta) \]

The various models **TRamWAy** features give different formulae for the local likelihoods \(P(T_i|\theta_i)\).
In addition, \(P(\theta)\) can be treated as a regularization term, often in the shape of an improper prior (not normalized).
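In practice, the maximum a posteriori estimate is obtained by minimizing the negative log-posterior, i.e. the sum of the per-bin negative log-likelihoods plus the regularization term. This can be sketched as follows (a schematic illustration only; `neg_loglik` and `neg_logprior` are placeholders standing in for the model-specific formulae discussed below):

```python
import numpy as np

def map_cost(theta, bins, neg_loglik, neg_logprior):
    """Negative log-posterior: per-bin data terms plus a global
    regularization term over the parameter vector theta."""
    data_term = sum(neg_loglik(T_i, th_i) for T_i, th_i in zip(bins, theta))
    return data_term + neg_logprior(theta)

# toy instance: Gaussian data terms and a quadratic smoothing prior
# that penalizes differences between neighboring bins (1D chain)
bins = [np.array([1., 1.2, .8]), np.array([2., 2.1]), np.array([1.9])]
neg_loglik = lambda T_i, th_i: np.sum((T_i - th_i) ** 2)
neg_logprior = lambda th: .1 * np.sum(np.diff(th) ** 2)
cost = map_cost(np.array([1., 2., 2.]), bins, neg_loglik, neg_logprior)
```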

The main models are recapped in the following table:

| model name | model parameters | equation for the individual displacements |
|---|---|---|
| D | diffusivity \(D\) | \(\frac{d\textbf{r}}{dt} = \sqrt{2D(\textbf{r})}\,\xi(t)\) |
| DD | diffusivity \(D\), drift \(\textbf{a}\) | \(\frac{d\textbf{r}}{dt} = \sqrt{2D(\textbf{r})}\,\xi(t) + \textbf{a}(\textbf{r})\) |
| DF | diffusivity \(D\), force \(\textbf{F}\) | \(\frac{d\textbf{r}}{dt} = \sqrt{2D(\textbf{r})}\,\xi(t) + D(\textbf{r})\frac{\textbf{F}(\textbf{r})}{k_{\textrm{B}}T}\) |
| DV | diffusivity \(D\), effective potential \(V\) | \(\frac{d\textbf{r}}{dt} = \sqrt{2D(\textbf{r})}\,\xi(t) - D(\textbf{r})\frac{\nabla V(\textbf{r})}{k_{\textrm{B}}T}\) |
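As an illustration, the displacements prescribed by the DD equation can be simulated with an Euler-Maruyama scheme (a minimal standalone sketch with arbitrary constant diffusivity and drift; TRamWAy infers these parameters rather than simulating them):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dd(n_steps, dt, D, drift):
    """Euler-Maruyama integration of dr/dt = sqrt(2 D) xi(t) + a
    with constant diffusivity D (um^2/s) and drift a (um/s)."""
    drift = np.asarray(drift, dtype=float)
    # each displacement: a*dt plus a Gaussian step of variance 2*D*dt per axis
    steps = drift * dt + np.sqrt(2 * D * dt) * rng.standard_normal((n_steps, drift.size))
    return np.vstack([np.zeros(drift.size), np.cumsum(steps, axis=0)])

traj = simulate_dd(10000, .04, D=.2, drift=[.1, 0.])
displ = np.diff(traj, axis=0)
# in 2D, E[ ||dr - a dt||^2 ] = 4 D dt, hence a moment-based estimate of D
D_hat = np.mean(np.sum((displ - displ.mean(axis=0)) ** 2, axis=1)) / (4 * .04)
```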

The stochastic variant of the DV model is covered in more detail below.

```
from tramway.analyzer import *
import numpy as np
```

```
from src.inference import set_notebook_theme
# this actually does something only for *dark* themes
set_notebook_theme('light')
from matplotlib import pyplot as plt
```

The preset analyzer is programmed to load a designated SPT data file, extract the data in a single region of interest, build a k-means-based tessellation, and assign the particle locations to the different Voronoi cells, with an additional adjustment to ensure that at least 10 locations are associated with each cell.

```
from src.inference import reset_data, preset_analyzer
reset_data()
```

### Localization error

All the models above derive a similar likelihood function. As an example, for the DV model, at bin \(i\) with assigned displacements \(T_i\), the local likelihood is:

\[ P(T_i | D_i, V_i) = \prod_j \frac{1}{4\pi\left(D_i\Delta t_j + \sigma^2\right)} \exp\left(-\frac{\left\|\Delta\textbf{r}_j + D_i\frac{(\nabla V)_i}{k_{\textrm{B}}T}\Delta t_j\right\|^2}{4\left(D_i\Delta t_j + \sigma^2\right)}\right) \]

with \(\sigma^2\) the experimental localization error.

Because \(\sigma\) is a parameter of the experimental setup and protocol, it can be set and found in the `spt_data` attribute:

```
a, _ = preset_analyzer()
a.spt_data.localization_precision
```

```
0.02
```

`localization_precision` is actually a property that sets/reflects the `localization_error` attribute, which corresponds to \(\sigma^2\):

```
a.spt_data.localization_error
```

```
0.0004
```

Such a piece of information is required at several stages of a typical processing chain.
For convenience, the `localization_precision` and `localization_error` attributes are also exposed by the `RWAnalyzer` object and the `mapper` attribute, and point to the same value in memory, with the `localization_precision` properties returning the square root of `localization_error`.
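The pairing between the two attributes can be summarized as follows (a standalone sketch, not the actual `RWAnalyzer` implementation):

```python
class Localization:
    """Minimal sketch: localization_error stores sigma squared,
    and localization_precision is a derived view equal to sigma."""

    def __init__(self, error=None):
        self.localization_error = error  # sigma^2, in um^2

    @property
    def localization_precision(self):
        return self.localization_error ** .5  # sigma, in um

    @localization_precision.setter
    def localization_precision(self, sigma):
        self.localization_error = sigma * sigma

loc = Localization()
loc.localization_precision = .02  # sigma = 20 nm
# loc.localization_error now holds sigma^2, i.e. 0.0004 um^2
```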

### D

Assuming the dynamics is solely diffusive in nature, the following likelihood is considered at bin \(i\), with observed displacements \(T_i\):

\[ P(T_i | D_i) = \prod_j \frac{1}{4\pi\left(D_i\Delta t_j + \sigma^2\right)} \exp\left(-\frac{\|\Delta\textbf{r}_j\|^2}{4\left(D_i\Delta t_j + \sigma^2\right)}\right) \]

with \(\sigma^2\) the localization error.
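The effect of the localization error can be checked on synthetic data: localization noise inflates the apparent diffusivity by \(\sigma^2/\Delta t\), which a corrected moment-based estimate subtracts (a standalone sketch assuming a constant frame interval and Gaussian noise of standard deviation \(\sigma\) per coordinate):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, D_true, sigma = .04, .2, .02  # s, um^2/s, um
n = 100000

# true positions diffuse; observed positions add localization noise
true_steps = np.sqrt(2 * D_true * dt) * rng.standard_normal((n + 1, 2))
true_pos = np.cumsum(true_steps, axis=0)
obs_pos = true_pos + sigma * rng.standard_normal(true_pos.shape)
displ = np.diff(obs_pos, axis=0)

# the naive estimate ignores the noise and overestimates D by sigma^2/dt
D_naive = np.mean(np.sum(displ ** 2, axis=1)) / (4 * dt)
D_hat = D_naive - sigma ** 2 / dt  # corrected estimate
```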

```
d_analyzer, assignment = preset_analyzer()
d_analyzer.mapper = mapper.MapperPlugin('d')
```

```
%%time
d_map = d_analyzer.mapper.infer(assignment)
```

```
CPU times: user 2.46 s, sys: 81.6 ms, total: 2.55 s
Wall time: 2.48 s
```

See the Data structures section for more information about the returned object.

```
_, axes = plt.subplots(1, 2, figsize=(16,6))
d_analyzer.mapper.mpl.plot(d_map, 'diffusivity', assignment, axes=axes[0], title='diffusivity (not regularized)', unit='std')
d_analyzer.mapper.mpl.plot(d_map, 'diffusivity', assignment, axes=axes[1], title='same fig. + particle locations (in red)', unit='std', overlay_locations=True)
```

### Smoothing regularizers

To properly estimate the local diffusivity with few data points, we make use of regularizing priors that smooth the diffusivity landscape.

The `'d'`, `'dd'`, `'df'` and `'dv'` models feature the following default penalty on the diffusivity \(D\):

\[ -\ln P(D) = \lambda \sum_i \mathcal{A}_i \left\|(\nabla D)_i\right\|^2 \]

with \(\lambda\) a weighting hyperparameter, \(\mathcal{A}_i\) the surface area (or volume) of bin \(i\), and \((\nabla D)_i\) the gradient of \(D\) at bin \(i\).
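On a regular grid of identical square bins, this penalty can be sketched with a finite-difference gradient (a toy stand-in for TRamWAy's mesh-based gradient; the bin area and the weight \(\lambda\) are set to 1):

```python
import numpy as np

def smoothing_penalty(D, bin_area=1., lam=1.):
    """lam * sum_i bin_area * ||(grad D)_i||^2 on a regular 2D grid,
    with a finite-difference estimate of the gradient."""
    gy, gx = np.gradient(D)
    return lam * bin_area * np.sum(gx ** 2 + gy ** 2)

flat = np.full((8, 8), .2)                      # uniform landscape
ramp = np.tile(np.linspace(0., .7, 8), (8, 1))  # steep linear ramp
# the flat landscape is not penalized; the ramp is
penalty_flat, penalty_ramp = smoothing_penalty(flat), smoothing_penalty(ramp)
```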

```
reg_d_analyzer, assignment = preset_analyzer()
reg_d_analyzer.mapper = mapper.MapperPlugin('d')
```

The \(\lambda\) hyperparameter can be set as follows:

```
m = assignment.number_of_cells # number of bins
n = len(assignment) # number of assigned translocations
roi_surface_area = 2.5 * 2.5 # in µm²
# how large can a difference in values be over a 1-µm distance?
expected_high_delta_D = .3 # in µm²s⁻¹
diffusivity_prior_z = expected_high_delta_D ** 2 * roi_surface_area * m
```

```
reg_d_analyzer.mapper.diffusivity_prior = (30 * n) / diffusivity_prior_z
```

Although the above value of `30` still needs to be adjusted, *e.g.* depending on the size of the expected patterns in the diffusivity landscape, the proposed normalization makes this value less variable from one dataset to another.

Indeed, the log-likelihood term in the cost function is a sum over all translocations (`n`), while the regularizing term is a sum over all bins (`m`).

Be careful, however, when exploring the hyperparameter space: the inference may take much longer to converge at higher weights on the regularizing prior.
If this happens, as long as at least one iteration has completed, you can interrupt the interpreter with `Ctrl`+`C`, or the *Interrupt the kernel* button in a notebook.
The solution at the last iteration is returned and program execution proceeds.

In addition to this smoothing prior, we can also prevent the estimated diffusivity values from becoming negative, using a Jeffreys prior:

```
reg_d_analyzer.mapper.jeffreys_prior = True
```

A better way to prevent the diffusivity from taking negative values would be to remove the static trajectories. Such a preprocessing step is skipped here, but is systematically performed by the building blocks mentioned further below.

Here, we stick to the Jeffreys prior so as to allow comparison with the regularization-free approach.

```
%%time
reg_d_map = reg_d_analyzer.mapper.infer(assignment)
```

```
CPU times: user 8h 10min 58s, sys: 47.5 s, total: 8h 11min 46s
Wall time: 8h 10min 42s
```

See the Data structures section for more information about the returned object.

```
_, axes = plt.subplots(1, 2, figsize=(16,6))
d_analyzer.mapper.mpl.plot(d_map, 'diffusivity', assignment, axes=axes[0], title='diffusivity (not regularized)', unit='std')
reg_d_analyzer.mapper.mpl.plot(reg_d_map, 'diffusivity', assignment, axes=axes[1], title='diffusivity (regularized)', unit='std')
```

See also the Spatial smoothing priors section of the original documentation.

### DD

This model estimates the local drift \(\textbf{a}(\textbf{r})\) in a straightforward fashion, with the following likelihood at bin \(i\) and observed displacements \(T_i\):

\[ P(T_i | D_i, \textbf{a}_i) = \prod_j \frac{1}{4\pi\left(D_i\Delta t_j + \sigma^2\right)} \exp\left(-\frac{\|\Delta\textbf{r}_j - \textbf{a}_i\Delta t_j\|^2}{4\left(D_i\Delta t_j + \sigma^2\right)}\right) \]

with \(\sigma^2\) the localization error.

```
dd_analyzer, assignment = preset_analyzer()
dd_analyzer.mapper = mapper.MapperPlugin('dd')
```

The `mapper` attribute for `'dd'` features the `diffusivity_prior` and `drift_prior` smoothing hyperparameters.

```
%%time
dd_maps = dd_analyzer.mapper.infer(assignment)
```

```
CPU times: user 3.81 s, sys: 15 µs, total: 3.81 s
Wall time: 3.79 s
```

See the Data structures section for more information about the returned object.

```
_, axes = plt.subplots(1, 2, figsize=(16,6))
dd_analyzer.mapper.mpl.plot(dd_maps, 'diffusivity', assignment, axes=axes[0], title='diffusivity (not regularized)', unit='std')
dd_analyzer.mapper.mpl.plot(dd_maps, 'drift', assignment, axes=axes[1], title='drift (not regularized)', unit='log[ µm / s ]', transform='log')
```

### Arrow scaling in field maps

```
_, axes = plt.subplots(1, 2, figsize=(16,6))
common_args = (dd_maps, 'drift', assignment)
common_kwargs = dict(unit='std', transform='log')
dd_analyzer.mapper.mpl.plot(*common_args, **common_kwargs,
        axes=axes[0], title='drift (constant size arrows)',
        arrow_size='constant', cell_arrow_ratio=1.3) # cell_arrow_ratio is optional
dd_analyzer.mapper.mpl.plot(*common_args, **common_kwargs,
        axes=axes[1], title='drift (same as above)')
```

### DF

Building upon the DD model, the drift is explained by a force \(\textbf{F}\) and friction \(\gamma\): \(\textbf{a}(\textbf{r}) = \frac{\textbf{F}(\textbf{r})}{\gamma(\textbf{r})}\). In addition, assuming equilibrium: \(\gamma(\textbf{r}) = \frac{k_{\textrm{B}}T}{D(\textbf{r})}\).
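These relations can be checked with a quick numeric example (arbitrary values; energies are expressed in units of \(k_{\textrm{B}}T\)):

```python
kT = 1.     # energies in units of kB*T
D = .2      # diffusivity, in um^2/s
F = 1.5     # force, in kT/um
gamma = kT / D     # friction, from the equilibrium relation
drift = F / gamma  # a = F/gamma, in um/s
# substituting gamma gives the DF drift term: a = D*F/kT
assert abs(drift - D * F / kT) < 1e-12
```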

As a consequence, the following likelihood is considered at bin \(i\), with observed displacements \(T_i\):

\[ P(T_i | D_i, \textbf{F}_i) = \prod_j \frac{1}{4\pi\left(D_i\Delta t_j + \sigma^2\right)} \exp\left(-\frac{\left\|\Delta\textbf{r}_j - D_i\frac{\textbf{F}_i}{k_{\textrm{B}}T}\Delta t_j\right\|^2}{4\left(D_i\Delta t_j + \sigma^2\right)}\right) \]

with \(\sigma^2\) the localization error.

```
df_analyzer, assignment = preset_analyzer()
df_analyzer.mapper = mapper.MapperPlugin('df')
```

The `mapper` attribute for `'df'` features the `diffusivity_prior` and `force_prior` smoothing hyperparameters.

```
%%time
df_maps = df_analyzer.mapper.infer(assignment)
```

```
CPU times: user 6.26 s, sys: 124 ms, total: 6.38 s
Wall time: 6.24 s
```

See the Data structures section for more information about the returned object.

```
_, axes = plt.subplots(1, 2, figsize=(16,6))
df_analyzer.mapper.mpl.plot(df_maps, 'diffusivity', assignment, axes=axes[0], title='diffusivity (not regularized)', unit='std')
df_analyzer.mapper.mpl.plot(df_maps, 'force', assignment, axes=axes[1], title='force (not regularized)', clim=[-1,5], clip=3, unit='std', transform='log')
```

### DV

Building upon the DF model, the force field is assumed to be conservative and result from a potential energy: \(\textbf{F}(\textbf{r}) = -\nabla V(\textbf{r})\).

The following likelihood is considered at bin \(i\), with observed displacements \(T_i\):

\[ P(T_i | D_i, V_i) = \prod_j \frac{1}{4\pi\left(D_i\Delta t_j + \sigma^2\right)} \exp\left(-\frac{\left\|\Delta\textbf{r}_j + D_i\frac{(\nabla V)_i}{k_{\textrm{B}}T}\Delta t_j\right\|^2}{4\left(D_i\Delta t_j + \sigma^2\right)}\right) \]

with \(\sigma^2\) the localization error. In practice, TRamWAy estimates the dimensionless ratio \(\frac{V}{k_{\textrm{B}}T}\) and \(V\) is expressed as multiples of \(k_{\textrm{B}}T\).
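The conservative-force relation \(\textbf{F} = -\nabla V\) can be checked numerically on a toy potential (a standalone sketch using NumPy's finite-difference gradient on a regular grid, not TRamWAy's mesh-based gradient):

```python
import numpy as np

# harmonic well V(x, y) = k/2 (x^2 + y^2), expressed in kT units
k = 4.
x = np.linspace(-1., 1., 201)
y = np.linspace(-1., 1., 201)
X, Y = np.meshgrid(x, y, indexing='ij')
V = .5 * k * (X ** 2 + Y ** 2)

# F = -grad V; np.gradient returns one derivative per axis
dV_dx, dV_dy = np.gradient(V, x, y)
Fx, Fy = -dV_dx, -dV_dy
# for a harmonic well the force is the restoring force F = -k r,
# e.g. at (x, y) = (0.5, 0): Fx is approximately -2 and Fy approximately 0
```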

```
dv_analyzer, assignment = preset_analyzer()
dv_analyzer.mapper = models.DV()
```

`mapper.MapperPlugin('dv')` is also available, but its naive implementation makes it far too slow for the example 635-bin segmentation.
We recommend using `models.DV()` instead, which wraps an alternative implementation with a faster gradient calculation for the cost function to be minimized.

Note, however, that `models.DV()` also features an alternative regularizer.

The `mapper` attribute for both `mapper.MapperPlugin('dv')` and `models.DV()` features the `diffusivity_prior` and `potential_prior` smoothing hyperparameters.

The effective potential is already implicitly smoothed, due to the calculation of \(\nabla V\). The DV model is slow for the same reason: the \(\nabla V\) calculation couples neighboring bins, which requires all the parameters to be jointly optimized.

### Stochastic DV

A faster stochastic implementation of the DV model is available in two flavors: either fully stochastic (the `'stochastic.dv'` plugin or `models.DV(variant='stochastic')`), or two-phase, with the stochastic approach used first and the standard approach second for fine-tuning (the `'semi.stochastic.dv'` plugin or `models.DV(start='stochastic')`).

The stochastic method is described in a published article. As observed in that work, the stochastic variant performs well at reaching a basin around the solution, but underperforms within the basin. That is why we recommend the semi-stochastic approach, with a predetermined number of stochastic steps followed by non-stochastic fine-tuning.
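The two-phase rationale can be illustrated on a toy least-squares problem, unrelated to TRamWAy's actual optimizer: noisy stochastic gradient steps on mini-batches reach the basin quickly, and full-gradient steps then fine-tune within the basin:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 3))
theta_true = np.array([1., -2., .5])
b = A @ theta_true
theta = np.zeros(3)

# phase 1: stochastic gradient steps on random mini-batches (fast, noisy)
for _ in range(200):
    idx = rng.integers(0, len(b), size=10)
    grad = 2 * A[idx].T @ (A[idx] @ theta - b[idx]) / len(idx)
    theta -= .05 * grad

# phase 2: full-gradient fine-tuning within the basin
for _ in range(200):
    grad = 2 * A.T @ (A @ theta - b) / len(b)
    theta -= .05 * grad
```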

Among other benefits of the reduced complexity of the inference, this method makes spatio-temporal inference possible, with the dynamics parameters regularized across time.

```
sdv_analyzer, assignment = preset_analyzer()
sdv_analyzer.mapper = models.DV(start='stochastic')
```

#### Setting the smoothing priors

The `'stochastic.dv'` model features an alternative penalty on the diffusivity \(D\) (and similarly on the effective potential \(V\)), defined over all the existing bin/microdomain indices \(i\) and the subsets \(N_i\) of neighbor bins.

This penalty is used as a prior probability function but, just like the default smoothing prior of the other algorithms, it is not a proper probability function. By default, no normalization is performed, to allow maximum control over the regularization effect, especially across multiple datasets.

Here we normalize as follows:

```
m = assignment.number_of_cells # number of bins
n = len(assignment) # number of assigned translocations
roi_surface_area = 2.5 * 2.5 # in µm²
# how large can a difference in values be over a 1-µm distance?
expected_high_delta_D = .3 # in µm² s⁻¹
expected_high_delta_V = 5 # in kT
diffusivity_prior_z = expected_high_delta_D ** 2 * roi_surface_area * m
potential_prior_z = expected_high_delta_V ** 2 * roi_surface_area * m
```

Because \(\nabla V\) already smooths the landscape of \(V\), we can penalize the spatial variations of \(V\) to a lesser extent than those of \(D\).

Compared with the original smoothing prior from InferenceMAP, we consider lower weights for this prior, usually 4 times lower, to account for how \(\nabla V\) is calculated and for how many times each difference is counted and scaled.

Again, higher weights on the diffusivity smoothing prior usually make the inference take quite a lot longer. If the inference takes too long, you can interrupt it with `Ctrl`+`C`, or the *Interrupt the kernel* square button in a notebook.

```
sdv_analyzer.mapper.diffusivity_prior = (7.5 * n) / diffusivity_prior_z
sdv_analyzer.mapper.potential_prior = (0.5 * n) / potential_prior_z
```

```
((7.5 * n) / diffusivity_prior_z, (0.5 * n) / potential_prior_z)
```

```
(259.2755905511811, 0.06222614173228346)
```

#### Running the inference

This is done by calling the `infer` method, passing as input argument the assignment previously returned by the `sample` method.

```
%%time
sdv_maps = sdv_analyzer.mapper.infer(assignment)
```

```
CPU times: user 3min 24s, sys: 9.39 s, total: 3min 33s
Wall time: 9min 5s
```

See the Data structures section for more information about the returned object.

#### Plotting the inferred parameters

```
_, axes = plt.subplots(1, 2, figsize=(16,6))
sdv_analyzer.mapper.mpl.plot(sdv_maps, 'diffusivity', assignment, axes=axes[0], title='diffusivity', unit='std')
sdv_analyzer.mapper.mpl.plot(sdv_maps, 'potential', assignment, axes=axes[1], title='effective potential', unit='std')
```

## Saving parameter maps to *.rwa* files

In the above example, with a single set of maps on a single region of interest, we can explicitly dump the above assignment and maps in a *.rwa* file:

```
import os.path
a = sdv_analyzer # any of the previously-defined analyzers accesses the same file and can be used here
rwa_file = os.path.splitext(a.spt_data.filepath)[0] + '.rwa'
with a.spt_data.autosaving() as analysis_tree:
    analysis_tree.rwa_file = rwa_file
    analysis_tree['my roi'] = assignment
    analysis_tree['my roi']['my maps 0'] = reg_d_map
    analysis_tree['my roi']['my maps 1'] = sdv_maps
```

This appends the new records to the existing analyses if the file already exists.

The structure of an analysis tree is flexible, but preferably includes the full SPT data at the root node (as a `DataFrame` object), the different assignments as immediate child nodes (as `Partition` objects), and finally the inferred maps as grandchildren (as `Maps` objects):
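The parent/child layout can be mimicked with nested mappings (a toy picture of the hierarchy only, not the actual analysis-tree data structure):

```python
# each node carries an artefact and named child analyses
toy_tree = {
    'artefact': 'full SPT data (DataFrame)',
    'children': {
        'my roi': {
            'artefact': 'assignment (Partition)',
            'children': {
                'my maps 0': {'artefact': 'inferred maps (Maps)', 'children': {}},
                'my maps 1': {'artefact': 'inferred maps (Maps)', 'children': {}},
            },
        },
    },
}
```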

```
print(analysis_tree)
```

```
<class 'pandas.core.frame.DataFrame'>
'my roi' <class 'tramway.tessellation.base.Partition'>
'my maps 0' <class 'tramway.inference.base.Maps'>
'my maps 1' <class 'tramway.inference.base.Maps'>
```

This analysis tree can be loaded using the `load_rwa` function exported by the `tramway.helper` module:

```
%%script echo skipping
from tramway.helper import load_rwa
reloaded_analysis_tree = load_rwa(rwa_file)
print(reloaded_analysis_tree)
assignment = reloaded_analysis_tree['my roi'].data
reg_d_map = reloaded_analysis_tree['my roi']['my maps 0'].data
```

```
skipping
```

Generally speaking, inspecting a *.rwa* file may be easier without the help of the `RWAnalyzer` object, and the reader is referred to the command-line interface or the `tramway.helper` package for functions like `load_rwa`, `cell_plot` or `map_plot`.

Indeed, an `RWAnalyzer` object cannot be loaded directly from a *.rwa* file.
In particular, a single `RWAnalyzer` object may handle multiple files, and a *.rwa* file may contain analysis artefacts generated by different `RWAnalyzer` objects.

In addition, even a specific analysis path from a *.rwa* file does not contain enough information to rebuild an `RWAnalyzer` object.

First, the region of interest would need to be defined in an additional file. Such a file can be generated as follows:

```
roi_file = os.path.splitext(a.spt_data.filepath)[0] + '-roi.txt'
f = single(a.spt_data)
f.roi.to_ascii_file(roi_file)
```

See the notebook/section about ROI for more information.

However, an analysis based on an `RWAnalyzer` object is mostly declarative: the entire computation can be described as a series of attribute assignments, and eventually triggered by a single line of code.

## Segmenting and inferring all at once

To further ease the manipulation of data files, including *.rwa* files, the `tramway.analyzer` package features some convenient building blocks, especially `tessellate_and_infer`.

For example, to reproduce the steps of the DV inference above, on top of a k-means-based tessellation with local adjustments to reach a minimum number of 10 data points per space bin:

```
from tramway.analyzer import *
import numpy as np
import pandas as pd
a = RWAnalyzer()
a.spt_data.from_ascii_file('../data/Image_traj.txt')
a.spt_data.frame_interval = .04 # in s
a.spt_data.localization_precision = .02 # in µm
a.roi.from_ascii_files() # default filepath will be '../data/Image_traj-roi.txt'
a.tesseller = tessellers.KMeans
a.tesseller.resolution = .05 # in µm
a.sampler = sampler.Knn(10)
a.mapper = models.DV(start='stochastic')
# details about the hyperparameters below are omitted
a.mapper.diffusivity_prior = 259.27
a.mapper.potential_prior = 0.062226
```

```
a.pipeline.append_stage(stages.tessellate_and_infer(sampling_label='my roi', map_label='my map 0'))
```

The last expression above does not run the computation; it only schedules it for later execution by the `run` method. The resulting analysis artefacts are saved in a *.rwa* file whose name, by default, is derived from the SPT data filename, replacing the original extension with the *.rwa* extension.
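The declare-then-run pattern can be sketched in plain Python (a generic illustration of how `append_stage` and `run` separate scheduling from execution; this is not the actual `RWAnalyzer` internals):

```python
class ToyPipeline:
    """Collect stages now, execute them later, in order."""

    def __init__(self):
        self._stages = []

    def append_stage(self, stage):
        self._stages.append(stage)  # scheduling only; nothing runs here

    def run(self):
        for stage in self._stages:
            stage()                 # actual execution happens here

trace = []
pipeline = ToyPipeline()
pipeline.append_stage(lambda: trace.append('tessellate'))
pipeline.append_stage(lambda: trace.append('infer'))
assert trace == []  # declaring stages triggers no computation
pipeline.run()
```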

Note that `tessellate_and_infer` (and similarly `tessellate`) discards the static trajectories.
This additional preprocessing step was not performed by the `preset_analyzer` function, to keep it as simple as possible.

With a few additional lines, the computation can be dispatched to a computer cluster instead of being run on the local machine. This, however, is beyond the scope of this section.

The computation can be triggered by calling the `run` method:

```
%%script echo skipping
a.pipeline.run()
```

```
skipping
```

We skip the execution of the above `run` call so that the present document (originally a Jupyter notebook) can be run in a limited amount of time.

## Post-processing

Instead, if the *.rwa* file already exists, we can load and manipulate it:

```
stages.reload(skip_missing=False)(a)
```

To show the same maps as above, in a different flavor, let us plot 3D surfaces:

```
%%capture
import sys
!"{sys.executable}" -m pip install plotly nbformat
```

```
from plotly.io import renderers
renderers.default = 'notebook'
from plotly.subplots import make_subplots
from IPython.display import HTML, display
for r in a.roi.as_support_regions(): # in this example, we have a single ROI
    assignment = r.get_sampling('my roi')
    maps = assignment.get_child('my maps 1')
    fig = make_subplots(rows=1, cols=2,
            specs=[[{'type': 'surface'}, {'type': 'surface'}]],
            subplot_titles=['diffusivity', 'effective potential'])
    a.mapper.plotly.plot_surface(maps, 'diffusivity', assignment, fig=fig, col=0,
            colorbar=dict(thickness=20, x=0.46))
    a.mapper.plotly.plot_surface(maps, 'potential', assignment, fig=fig, col=1,
            colorbar=dict(thickness=20, x=1.02))
    fig.update_layout(
        scene1=dict(
            zaxis=dict(range=[-.08,.15]),
            camera=dict(eye=dict(x=.8, y=.8, z=1.4)),
        ),
        scene2=dict(
            zaxis=dict(range=[-2,8.3]),
            camera=dict(eye=dict(x=.5, y=.5, z=2.)),
        ),
    )
    # fig.show() # nbconvert cannot export the figure this way
    display(HTML(fig.to_html())) # .. so we call this expression instead
```