SPINNA
======

SPINNA is a module for analyzing the oligomerization of proteins using super-resolution microscopy data. For more information, please refer to the publication `L. A. Masullo, R. Kowalewski, et al. Nature Comm, 2025 <https://doi.org/10.1038/s41467-025-59500-z>`_.

As of Picasso 0.8.4, SPINNA supports labeling efficiency fitting as described in `J. Hellmeier, S. Strauss, et al. Nature Methods, 2024 <https://doi.org/10.1038/s41592-024-02242-5>`_. See the "Fitting labeling efficiency" section below for instructions.

Overview of the GUI
-------------------

The GUI consists of three tabs that can be navigated at the top of the screen: 

1. *Structures*: to define the structures used in simulations.
2. *Simulate*: to simulate any combination of structures with user-defined parameters as well as to fit the proportions of structures to experimental data.
3. *Mask generation*: to generate masks for simulations with heterogeneous densities of molecular targets.

Structures tab
--------------

.. image:: ../docs/spinna_structures_tab.png
   :alt: structures_tab

This tab allows the user to define the model structures for SPINNA. The outline of the tab is shown above. Follow these steps to create new structures:

1. Click *Add a new structure* in the *Structures summary* box (top right corner).
2. Enter the name of the structure in the new dialog and confirm by clicking *OK*.
3. The structure is now loaded and its name is displayed in the *Preview* box (left panel).
4. To add molecular targets, navigate to the box *Molecular targets* (bottom right corner).
5. Click *Add a molecular target*. This creates a new row in the *Molecular targets* box. Please specify the following: name of the molecular target (e.g., EGFR), x, y and z coordinates (in nm). Please note that the structure will be rotated around the origin (i.e., x = y = z = 0 nm) during simulations. 
6. It is possible to delete each molecular target by clicking its corresponding delete button (*x* in the *Molecular targets* box).
7. The user can navigate between structures by clicking on their names in the *Structures summary* box. 
8. The *Preview* box allows the user to see the currently loaded structure, rotate it in 3D, show/hide legend and scale bar (whose length is adjustable) as well as save the current view as a .png file. 
9. Once at least two molecular targets are defined for the given structure, it is possible to add a new molecular target by clicking with the right mouse button on the structure view.

The image above illustrates the example structures generated for simulations of EGFR described in the main text. Once the structures are ready to use, save them by clicking *Save all structures* in the *Structures summary* box. 

**Note**: the user must ensure that no typos are introduced in the names of the molecular targets, since SPINNA will interpret these as separate molecular target species.
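
The effect of rotating a structure around the origin can be illustrated with a minimal sketch. The coordinates, function names and the restriction to rotations around the z axis are assumptions made for illustration only; this is not SPINNA's internal code. A structure centered on the origin keeps its center in place after rotation, whereas an off-center structure is displaced.

.. code-block:: python

    import math
    import random

    # Hypothetical dimers (name, x, y, z in nm); not taken from the SPINNA sources.
    centered = [("EGFR", -10.0, 0.0, 0.0), ("EGFR", 10.0, 0.0, 0.0)]
    off_center = [("EGFR", 0.0, 0.0, 0.0), ("EGFR", 20.0, 0.0, 0.0)]


    def rotate_about_z(structure, angle):
        """Rotate every target around the z axis passing through the origin."""
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        return [
            (name, x * cos_a - y * sin_a, x * sin_a + y * cos_a, z)
            for name, x, y, z in structure
        ]


    def centroid(structure):
        """Mean x, y and z position of all targets in the structure."""
        n = len(structure)
        return tuple(sum(t[i] for t in structure) / n for i in (1, 2, 3))


    rng = random.Random(0)
    rotated = rotate_about_z(centered, rng.uniform(0.0, 2.0 * math.pi))
    print(centroid(rotated))  # stays at the origin (up to rounding)
    print(centroid(rotate_about_z(off_center, math.pi / 2)))  # center has moved

Placing the geometric center of a structure at x = y = z = 0 nm therefore keeps the simulated position of the structure independent of its orientation.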

Simulate tab
------------

.. image:: ../docs/spinna_simulate_tab_before_load.png
   :alt: simulate_tab_before_load

This tab is used for fitting the SPINNA model (see **Structures tab** above) to experimental data, displaying nearest neighbor distances (NND) and saving simulated molecules that can later be loaded into Picasso: Render. The image above shows the outline of the tab before loading data.
 
Load data and parameters
~~~~~~~~~~~~~~~~~~~~~~~~

1. Click the *Load structures* button in the top left corner of the window. Upon loading, new widgets will appear in the GUI.  
2. For each detected molecular target species, load the experimental data. The file must be saved in the .hdf5 format used for localization files in other Picasso modules, see `here <https://picassosr.readthedocs.io/en/latest/files.html#hdf5-files>`_.
3. Next, input the label uncertainty, labeling efficiency and observed density in the *Load data* box. Alternatively, load a mask to simulate a heterogeneous distribution by clicking *Masks* in the bottom left corner of the box. For more information about masks, see **Mask generation tab**.
4. Moreover, in the *Load data* box, the user can change the dimensionality of the simulation. If a 3D simulation is chosen without a mask, the user needs to input the range of z coordinates of the simulated molecular targets by clicking *Z range*. In the "Optional settings" dialog, the user can change the rotation mode (random rotations around the z axis (2D), random rotations around all three axes (3D) or no rotations). Additionally, the fitting mode can be set to one of "bayesian", "coarse to fine" or "brute force". The chosen fitting mode applies to all fitting workflows (*Find best fitting combination*, *Compare models* and *Fit LE*). For more information about the fitting modes, see **Fitting modes** below.

Fitting
~~~~~~~

Within the *Fitting* box:

1. To generate the search space, i.e., the set of stoichiometries tested in SPINNA, click the button *Generate parameter search space* and define the number of simulation repeats and granularity. For more information, see Supplementary Figure 2 in the `SPINNA publication <https://doi.org/10.1038/s41467-025-59500-z>`_.
2. To save the fitting scores for each tested stoichiometry, tick *Save fitting scores*. The user will be asked to input the name of the resulting .csv file.
3. To obtain the result’s uncertainty, check the *Bootstrap* box, which will resample from the best fitting model 20 times and rerun SPINNA on the resampled datasets. Note that this will increase the computation time.
4. To test different SPINNA models, click *Compare models*. A dialog will open, asking the user to input the range of tested label uncertainties (the user can choose whether or not to fit the label uncertainty) and the candidate SPINNA models. For example, the user may want to explore models with different spacings between the structures or different shapes. We recommend choosing a lower granularity when comparing models, since the fitting may take a long time. A single progress dialog is displayed throughout the comparison; its title shows the current round number (``[Round X/Y]``) so the user knows how many SPINNA rounds remain. The fitting mode selected in *Optional settings* is honored.
5. To run SPINNA, click *Find best fitting combination*. The progress dialog will be displayed.
6. After the fitting is finished, specify the name for saving a fit summary file (.txt). This file includes all the information about the fitting, the parameters and the results. The user may also choose not to save the file by clicking *Cancel* in the dialog. Additionally, the fitted stoichiometry is displayed in the *Single simulation* box and the NND histograms are shown in the *Plotting* box, see image below.
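
To give a feeling for how the search space grows with granularity, the sketch below enumerates all combinations of structure proportions on a regular grid. It assumes, purely for illustration, that granularity is the number of equal steps between 0% and 100%; the exact definition used by SPINNA is given in the publication and may differ. The function name ``proportion_grid`` is hypothetical.

.. code-block:: python

    from itertools import product


    def proportion_grid(n_structures, granularity):
        """Proportion vectors (in %) on a regular grid that sum to 100%."""
        combos = []
        for steps in product(range(granularity + 1), repeat=n_structures):
            if sum(steps) == granularity:
                combos.append(tuple(100.0 * s / granularity for s in steps))
        return combos


    # Three structures (e.g., monomer, dimer, trimer) tested in 25% steps.
    grid = proportion_grid(n_structures=3, granularity=4)
    print(len(grid))  # number of candidate stoichiometries
    print(grid[:3])

Under this assumption the number of candidates rises quickly with both the granularity and the number of structures, which is why lower granularity is advisable for long-running workflows such as *Compare models*.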

Fitting labeling efficiency
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since v0.10.1 the labeling efficiency (LE) fit has its own workflow. Whenever exactly two molecular targets are loaded, a *Fit labeling efficiencies* button is shown in the *Fitting* box. The user no longer needs to load the three "monomer A / monomer B / heterodimer AB" structures manually — SPINNA constructs them internally from the two target names alone.

Clicking *Fit labeling efficiencies* opens a small dialog with three sections:

1. **Fit label uncertainty** (checkbox) — when checked, the dialog exposes a *From / To / Step* row per target so SPINNA can search for the best label uncertainty. When unchecked, the current value from the *Load data* box is used as a fixed input for that target.
2. **Fit heterodimer distance** (checkbox) — when checked, the dialog exposes a *From / To / Step* row in nm. When unchecked, a single fixed distance is used (entered in the *Distance (nm)* field).
3. **Save fit scores** (checkbox) — when checked, the user selects a folder where SPINNA saves the fit scores for every candidate.

The dialog also displays a live "Estimated SPINNA rounds" preview that updates as the spin boxes change, so the user can gauge how long the fit will take before starting it.
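
The relation between the fitted proportions of the three structures and the labeling efficiencies can be sketched as follows. The sketch assumes that every copy of target A is paired with exactly one copy of target B in the sample and that the two labels bind independently; the function names are hypothetical and this is not necessarily how SPINNA computes the result internally. Under these assumptions, an observed monomer A is a pair whose B label is missing, so the fraction of detected A targets that have a B partner estimates the LE of B, and vice versa.

.. code-block:: python

    def expected_counts(n_pairs, le_a, le_b):
        """Expected observed structures when every A is paired with one B."""
        heterodimers = n_pairs * le_a * le_b      # both labels detected
        monomers_a = n_pairs * le_a * (1 - le_b)  # only A detected
        monomers_b = n_pairs * (1 - le_a) * le_b  # only B detected
        return monomers_a, monomers_b, heterodimers


    def labeling_efficiencies(monomers_a, monomers_b, heterodimers):
        """Recover the LE of A and B (as fractions) from structure counts."""
        le_a = heterodimers / (heterodimers + monomers_b)
        le_b = heterodimers / (heterodimers + monomers_a)
        return le_a, le_b


    # Forward-simulate counts for known LEs and recover them again.
    counts = expected_counts(n_pairs=1000, le_a=0.6, le_b=0.5)
    print(labeling_efficiencies(*counts))

Only the ratios of the three counts matter, so the same expressions apply to fitted proportions given in percent.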

Fitting modes
~~~~~~~~~~~~~

Since v0.10.0, in the "Optional settings", the user can choose between three fitting modes: "bayesian", "coarse to fine" and "brute force". The chosen mode is honored by *Find best fitting combination*, *Compare models* and *Fit LE...*. In the "bayesian" mode, the search space is explored using Bayesian optimization with Gaussian process regression. This is a more efficient way to explore the search space, especially when it is large, and it is recommended as the default fitting mode. In the "coarse to fine" mode, a coarse grid of structure combinations is tested, which consists of 10% of evenly distributed structure combinations. Then, a finer grid is tested around the best combination from the coarse grid. In the "brute force" mode, all combinations of structures are tested sequentially. The "coarse to fine" mode is recommended for faster fitting, especially when the search space is large. Previously, only brute force mode was available.
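
The idea behind the "coarse to fine" mode can be sketched on a toy problem. The score function, the size of the refinement window and all names below are assumptions made for illustration; SPINNA's own implementation and scoring differ. Roughly 10% of evenly spaced candidates are scored first, and only the neighborhood of the best coarse candidate is then scored in full.

.. code-block:: python

    def coarse_to_fine(candidates, score, coarse_fraction=0.1, window=5):
        """Two-stage search over an ordered candidate list (lower score is better)."""
        step = max(1, round(1 / coarse_fraction))
        coarse_idx = range(0, len(candidates), step)
        best = min(coarse_idx, key=lambda i: score(candidates[i]))
        # Refine: score every candidate within `window` of the coarse optimum.
        lo = max(0, best - window)
        hi = min(len(candidates), best + window + 1)
        best = min(range(lo, hi), key=lambda i: score(candidates[i]))
        return candidates[best]


    def toy_score(dimer_percent):
        """Hypothetical fit score with its minimum at 37% dimers."""
        return (dimer_percent - 37.0) ** 2


    candidates = [float(p) for p in range(0, 101)]  # 0%, 1%, ..., 100% dimers
    print(coarse_to_fine(candidates, toy_score))

In this toy case 22 candidates are scored instead of 101. The approach relies on the score varying smoothly across the search space; a narrow optimum between coarse grid points could be missed.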

.. image:: ../docs/spinna_simulate_tab_after_fit.png
   :alt: simulate_tab_after_fit

Single simulation
~~~~~~~~~~~~~~~~~

SPINNA allows the user to run a single simulation to visually inspect NNDs for a specific set of proportions of structures as well as to save the positions of the simulated molecular targets in an .hdf5 format. Once the model structures and simulation parameters in the *Load data* box are defined:

1. Specify area/volume (in the case of homogeneous distribution of structures, i.e., no masking) - it should equal the area/volume of the experimental data.
2. To save the positions of molecules from a simulation, tick *Save positions of simulated molecules*. The user will be asked to enter the name of the resulting file.
3. Click *Run a single simulation*. This will generate and display NND histogram(s) of the simulated molecular targets (solid lines) and (if loaded) of the experimental data (histogram bars). 
4. If fitting was completed before, the user can retrieve the best fitting combination of proportions of structures by clicking *Best fitting combination* at the bottom of the *Input proportions of structures* box.

Plotting
~~~~~~~~

The *Plotting* box, located in the top right corner of the GUI, displays the NND histograms for simulated (solid lines) and experimental data (histogram bars).

- The NND plots can be saved by clicking *Save plots* and the plotted values (bins and frequencies) by *Save values*. 
- *# simulations* controls how many simulation results are accumulated to draw the NND histograms. The higher the value, the smoother the resulting histograms.
- *Plot settings* opens a new dialog that allows the user to show/hide plot legend, adjust the histogram bin size and min. and max. plotted distances, among others, see below.

.. image:: ../docs/spinna_nnd_plot_settings.png
   :scale: 40 %
   :alt: nnd_plot_settings

If the loaded structures include several molecular target species, several NND histograms are plotted, one for each pair of molecular target species. These can be explored by clicking the left and right arrows in the *Plotting* box.
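
For readers unfamiliar with NND histograms, the following minimal sketch shows how the k-th nearest neighbor distances within one molecular target species can be computed and binned. It uses a brute-force search in pure Python with made-up coordinates; the function names are hypothetical and SPINNA's own implementation is more efficient.

.. code-block:: python

    import math


    def nearest_neighbor_distances(points, k=1):
        """Distance from each point to its k-th nearest neighbor (brute force)."""
        nnd = []
        for i, p in enumerate(points):
            dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
            nnd.append(dists[k - 1])
        return nnd


    def histogram(values, bin_size, max_dist):
        """Count values in equal-width bins covering [0, max_dist)."""
        n_bins = math.ceil(max_dist / bin_size)
        counts = [0] * n_bins
        for v in values:
            if 0 <= v < max_dist:
                counts[int(v // bin_size)] += 1
        return counts


    # Made-up 2D positions in nm.
    points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (50.0, 50.0)]
    nnd = nearest_neighbor_distances(points)
    counts = histogram(nnd, bin_size=5.0, max_dist=100.0)
    print(nnd)
    print(counts)

Here ``bin_size`` and ``max_dist`` play the role of the bin size and maximum plotted distance in *Plot settings*, and ``k`` corresponds to the order of the plotted neighbor.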

Mask generation tab
-------------------

.. image:: ../docs/spinna_mask_generation_tab.png
   :alt: mask_generation_tab

This tab allows the user to create a density/binary mask capable of recovering the heterogeneous density distribution present in the experimental data. 
 
1. Click *Load molecules* to open the .hdf5 file with molecules/localizations that will be used to generate the mask. 
2. Adjust bin size and Gaussian blur to be applied to the mask. Since v0.9.6, the user can choose anisotropic bin size and Gaussian blur with one value in the xy plane and another value in the z direction.
3. The mask can be generated in 3D and/or converted to a binary mask.
4. Click *Generate mask*. This may take a while, especially for a 3D mask. The mask will be displayed automatically. The legend in the *Navigation* box displays the probability of finding a molecular target per pixel/voxel. 
5. The density mask can be thresholded at any user-defined probability value. By default, the Otsu threshold is used (Otsu. *Automatica*, 1975). 
6. To explore the mask, use the buttons in the *Navigation* box. Alternatively, arrow keys can be used too. For 3D masks, the user can slice through individual z planes using the slider.
7. Once the mask is ready, click *Save mask*. This saves a numpy array in the .npy format.
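
The principle of a density mask can be sketched with NumPy. The example bins synthetic 2D coordinates into pixels, normalizes the counts to probabilities, applies a user-defined threshold and stores the result in the .npy format. It is an illustration under simplifying assumptions (no Gaussian blur, a mean-based threshold instead of Otsu, hypothetical function name) and does not reproduce SPINNA's mask generation.

.. code-block:: python

    import os
    import tempfile

    import numpy as np


    def density_mask(x, y, bin_size):
        """Bin 2D coordinates into a mask of per-pixel probabilities (sums to 1)."""
        x_edges = np.arange(x.min(), x.max() + bin_size, bin_size)
        y_edges = np.arange(y.min(), y.max() + bin_size, bin_size)
        counts, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges))
        return counts / counts.sum()


    # Synthetic molecule positions standing in for the loaded .hdf5 data.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1000.0, size=500)
    y = rng.uniform(0.0, 1000.0, size=500)

    mask = density_mask(x, y, bin_size=100.0)
    binary_mask = mask > mask.mean()  # illustrative threshold, not Otsu

    # Round trip through the .npy format.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "mask.npy")
        np.save(path, mask)
        reloaded = np.load(path)

    print(mask.shape, float(mask.sum()))

A saved mask can be inspected in the same way with ``np.load`` before it is used in the *Simulate* tab.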


Command window - batch analysis
-------------------------------

SPINNA can be run directly from the command window to allow fast and efficient batch analysis, either to analyze many datasets, to analyze the same datasets with many user settings, or both. The runs are listed in a .csv file, one per row. For more information on using Picasso from the command window, see `here <https://picassosr.readthedocs.io/en/latest/cmd.html>`_. To run SPINNA batch analysis, run ``python -m picasso spinna -p NAME_OF_CSV_FILE``. The following arguments are available:

- ``-a`` or ``--asynch`` switches off the multiprocessing mode. If not specified, multiprocessing is used.
- ``-v`` or ``--verbose`` switches on the verbose mode, i.e., a progress bar for each row is displayed. If not specified, the verbose mode is off. 
- ``-b`` or ``--bootstrap`` switches on the bootstrap mode, i.e., the best fitting model is resampled 20 times and SPINNA is rerun on the resampled datasets. If not specified, the bootstrap mode is off.

Each row in the .csv file specifies the parameters for one SPINNA run. Define the following column names (i.e., the values typed into the first row):

- *structures_filename* : Path to the file with structures saved (.yaml), see **Structures tab** above. Required unless ``le_fitting=1``, in which case the monomer/heterodimer structures are built internally from the two ``exp_data_TARGET`` columns.
- *exp_data_TARGET* : Path to the file with experimental data (.hdf5) for each molecular target species. Each target in the structures must have a corresponding column, for example, *exp_data_EGFR*.
- *le_TARGET* : Labeling efficiency (%) for each molecular target species. Ignored when ``le_fitting=1``.
- *label_unc_TARGET* : Label uncertainty (nm) for each molecular target species. When ``le_fitting=1``, this may be a comma-separated list of candidates (e.g. ``"3,4,5,6"``); a single value disables the per-target search.
- *granularity* : Granularity used in parameter search space generation. The higher the value, the more combinations of structure counts will be tested.
- *save_filename* : Name of the .txt file where the results will be saved.
- *NND_bin* : Bin size (nm) for plotting the NND histogram(s).
- *NND_maxdist* : Maximum distance (nm) for plotting the NND histogram(s).
- *sim_repeats* : Number of simulation repeats.

Depending on whether a homo- or heterogeneous distribution is used, the following columns must be present:

For a homogeneous distribution:

- *area* or *volume* : Area (2D simulation) or volume (3D simulation) of the simulated ROI (um^2 or um^3). For 2D rows, *area* is optional: if omitted, the area is read from the experimental data metadata key ``Area (um^2)`` (written by Picasso when picks/areas are saved).
- *z_range* : Applicable only when *volume* is provided. Defines the range of z coordinates (nm) of simulated molecular targets.
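
A batch file for a single 2D run with a homogeneous distribution could be written as in the sketch below. The column names follow the list above; all file names and numerical values are placeholders, and a comma-delimited file is assumed.

.. code-block:: python

    import csv

    # One row per SPINNA run; every path and value here is a placeholder.
    row = {
        "structures_filename": "egfr_structures.yaml",
        "exp_data_EGFR": "egfr_molecules.hdf5",
        "le_EGFR": 50,          # labeling efficiency (%)
        "label_unc_EGFR": 5,    # label uncertainty (nm)
        "granularity": 20,
        "save_filename": "egfr_fit_summary.txt",
        "NND_bin": 1,           # nm
        "NND_maxdist": 100,     # nm
        "sim_repeats": 10,
        "area": 400,            # um^2
    }

    with open("spinna_batch.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)

The resulting file can then be passed to ``python -m picasso spinna -p spinna_batch.csv``. Further rows can be added with additional ``writer.writerow`` calls, as long as they share the same columns.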

For a heterogeneous distribution:

- *mask_filename_TARGET* : Name of the .npy file with the mask saved for each molecular target species.

Optional columns are:

- *rotation_mode* : Random rotations mode used in analysis. Values must be one of {*3D*, *2D*, *None*}. Default: *2D*.
- *nn_plotted* : Number of nearest neighbors plotted, default: 4.
- *le_fitting* : 0 if standard SPINNA is run, 1 if labeling efficiency fitting is to be performed. When set to 1, monomer A, monomer B and heterodimer structures are built internally for each candidate ``distances`` value, label uncertainty is fit per target from the comma-separated candidates in ``label_unc_TARGET``, and the per-target LE is recovered from the fitted structure proportions. Exactly two ``exp_data_*`` columns must be present; the first maps to ``target_a``. ``-b/--bootstrap`` is ignored on LE-fitting rows. If the column is not provided, standard SPINNA is run. For more details, see `Hellmeier, Strauss, et al. Nature Methods, 2024 <https://doi.org/10.1038/s41592-024-02242-5>`_.
- *distances* : Comma-separated list of candidate heterodimer distances in nm (e.g. ``"5,10,15,20"``). A single value fixes the distance. Required when ``le_fitting=1``; ignored otherwise.

The full column reference can also be printed from the command line via ``python -m picasso spinna --columns``.
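
For a labeling efficiency fitting row, several cells hold comma-separated candidate lists. The sketch below, with hypothetical target names ``A`` and ``B`` and placeholder paths and values, relies on Python's ``csv`` module to quote such cells so that they are not split into separate columns. It assumes a comma-delimited file.

.. code-block:: python

    import csv

    # LE-fitting row: no structures file, exactly two exp_data_* columns.
    le_row = {
        "exp_data_A": "target_a_molecules.hdf5",
        "exp_data_B": "target_b_molecules.hdf5",
        "label_unc_A": "3,4,5,6",   # candidates (nm), searched per target
        "label_unc_B": "5",         # single value: no search for this target
        "granularity": 20,
        "save_filename": "le_fit_summary.txt",
        "NND_bin": 1,
        "NND_maxdist": 100,
        "sim_repeats": 10,
        "area": 400,
        "le_fitting": 1,
        "distances": "5,10,15,20",  # candidate heterodimer distances (nm)
    }

    with open("spinna_le_batch.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(le_row))
        writer.writeheader()
        writer.writerow(le_row)

When such a file is edited by hand or in a spreadsheet program, the candidate lists need the same quoting, e.g. ``"5,10,15,20"``.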


SPINNA in Python
----------------

SPINNA functions can also be run in a Python script directly. Examples are presented in ``picasso/samples/SampleNotebook4.ipynb``.