File Types

This section describes the file formats and naming conventions used in Picasso.

Movie Files

Picasso accepts three types of raw movie files: TIFF (preferably from μManager), raw binary data (file extension “.raw”) and the Nikon format .nd2.

When loading raw binary files, the user will be prompted for movie metadata such as the number of frames, number of pixels, etc. Alternatively, this metadata can be supplied by an accompanying metadata file with the same filename as the raw binary file, but with the extension .yaml. See YAML Metadata Files for more details.
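
The naming rule for the accompanying metadata file can be sketched in a few lines. This is only an illustration of the convention described above, not Picasso's own code; the helper metadata_path and the file name experiment_01.raw are hypothetical.

```python
from pathlib import Path


def metadata_path(movie_path):
    """Return the path of the YAML file that would accompany a movie file."""
    # Same directory and base name; only the extension changes to .yaml.
    return Path(movie_path).with_suffix(".yaml")


raw_movie = Path("experiment_01.raw")  # hypothetical file name
sidecar = metadata_path(raw_movie)
print(sidecar.name)
if sidecar.exists():
    print("metadata file found")
else:
    print("no metadata file; the metadata would have to be entered by hand")
```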

HDF5 Files

HDF5 is a generic and efficient binary file format for storing data. In Picasso, HDF5 files are used for storing tabular data of localizations with the file extension .hdf5. Furthermore, Picasso saves the statistical properties of groups of localizations in an HDF5 file.

Generally, several datasets can be stored within an HDF5 file. These datasets are accessed by specifying a path within the HDF5 file, similar to a file path in an operating system. When saving localizations, Picasso stores tabular data under the path /locs. When saving statistical properties of groups of localizations, Picasso saves the table under the path /groups.

An HDF5 file can be opened with various software packages. In Picasso, we use pandas for this purpose. For example, to open localizations, pandas.read_hdf(PATH_TO_LOCALIZATIONS, key="locs") is used. The key argument can be adjusted for other datasets. The available keys can be verified using pandas.HDFStore(PATH_TO_FILE).keys().

Note: Picasso HDF5 files are accompanied by YAML metadata files which are read together using locs, info = picasso.io.load_locs. See sections “Localization HDF5 Files” and “YAML Metadata Files” below for more details on the minimum requirements to process HDF5 files in Picasso.
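
To illustrate the dataset paths described above, the following sketch writes a tiny table to /locs and reads it back. It uses h5py and NumPy instead of pandas, which is a choice made for this example only; the file name, the column data types and the values are made up and do not come from a real measurement.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical three-row localization table with the minimum columns.
# The dtypes are an assumption chosen for this demo.
locs = np.zeros(
    3,
    dtype=[("frame", "u4"), ("x", "f4"), ("y", "f4"), ("lpx", "f4"), ("lpy", "f4")],
)
locs["frame"] = [0, 0, 1]
locs["x"] = [10.5, 20.25, 10.75]

path = os.path.join(tempfile.mkdtemp(), "demo_locs.hdf5")

# Write the table as a dataset under the path /locs.
with h5py.File(path, "w") as f:
    f.create_dataset("locs", data=locs)

# Read it back: list the top-level datasets, then load /locs into memory.
with h5py.File(path, "r") as f:
    keys = list(f.keys())
    loaded = f["locs"][...]

print(keys)
print(loaded.dtype.names)
```

Note that this sketch does not create the YAML metadata file that Picasso expects next to a localization file.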

Importing HDF5 files in MATLAB and Origin

In MATLAB, execute the command locs = h5read(filename, dataset). Replace dataset with /locs for localization files and with /groups for pick property files.

In Origin, select File > Import > HDF5 or drag and drop the file into the main window.

Localization HDF5 Files

Picasso’s localization HDF5 files are accompanied by a YAML metadata file with the same filename, but with the extension .yaml. See YAML Metadata Files for more details. Both the HDF5 file and the metadata are read together with locs, info = picasso.io.load_locs. The localization table is stored as a dataset of the HDF5 file under the path /locs. This table can be explored by opening the HDF5 file with Picasso: Filter. The localization table can have an unlimited number of columns. Table 1 explains the main column names in Picasso.

Table 1: Name, description and data type for the main columns used in Picasso.

| Column Name | Description | C Data Type |
| --- | --- | --- |
| frame | The frame in which the localization occurred, starting with zero for the first frame. | unsigned long |
| x | The subpixel x coordinate in camera pixels. | float |
| y | The subpixel y coordinate in camera pixels. | float |
| photons | The total number of detected photons from this event, not including background or camera offset. | float |
| sx | The Point Spread Function width in camera pixels. | float |
| sy | The Point Spread Function height in camera pixels. | float |
| bg | The number of background photons per pixel, not including the camera offset. | float |
| lpx | The localization precision in x direction, in camera pixels, as estimated by the Cramer-Rao Lower Bound (Mortensen et al., Nat Meth, 2010 and Smith et al., Nat Meth, 2010). | float |
| lpy | The localization precision in y direction, in camera pixels, as estimated by the Cramer-Rao Lower Bound (Mortensen et al., Nat Meth, 2010 and Smith et al., Nat Meth, 2010). | float |
| net_gradient | The net gradient of this spot, defined as the sum of gradient vector magnitudes within the fitting box, projected to the spot center. | float |
| z | (Optional) The z coordinate fitted in 3D, in nm. Note that the units differ from those of the x and y coordinates, which are in camera pixels. | float |
| lpz | (Optional) The localization precision in z direction, in nm. | float |
| d_zcalib | (Optional) The value of the D function used for z fitting with astigmatism, see the supplement to Huang et al. 2008. | float |
| likelihood | (Optional) The log-likelihood of the fit. Only available for MLE fitting. | float |
| iterations | (Optional) The number of iterations of the fit procedure. Only available for MLE fitting. | long |
| group | (Optional) An identifier to assign multiple localizations to groups, for example by picking regions of interest or clustering. | long |
| group_input | (Optional) Assigned after clustering if the input localizations had a “group” column. This makes it possible to trace back which input group a clustered group originated from. | long |
| len | (Optional) The length of the event in frames, if localizations from consecutive frames have been linked. | long |
| n | (Optional) The number of localizations in this event, if localizations from consecutive frames have been linked. This can differ from the “len” column because of the transient dark time tolerance. | long |
| photon_rate | (Optional) The mean number of photons per frame, if localizations from consecutive frames have been linked. The total number of photons is stored in the “photons” column. | float |
| x_pick_rot | (Optional) Projection of localizations onto the axis of the rectangular pick in camera pixels. Only available after saving rectangular pick(s). | float |
| y_pick_rot | (Optional) Projection of localizations against the axis of the rectangular pick in camera pixels. Can be used to plot a profile along the pick. Only available after saving rectangular pick(s). | float |
| photons_unc | (Optional) The uncertainty of the photon number estimate, as given by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |
| bg_unc | (Optional) The uncertainty of the background estimate, as given by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |
| sx_unc | (Optional) The uncertainty of the sx estimate (camera pixels), as given by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |
| sy_unc | (Optional) The uncertainty of the sy estimate (camera pixels), as given by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |

The minimum required columns are: x, y, frame, lpx and lpy. For 3D data, the column z is also required. Since v0.9.5, Picasso supports the lpz column but it is not necessary for rendering (although recommended for accurate rendering in 3D).
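
A quick check of a column list against these minimum requirements could look like the sketch below. The helper missing_columns and the example column list are hypothetical and only restate the rule given above; this is not how Picasso itself validates files.

```python
REQUIRED_2D = ("x", "y", "frame", "lpx", "lpy")


def missing_columns(column_names, is_3d=False):
    """Return the required columns absent from column_names, in a fixed order."""
    # For 3D data, z is required in addition to the 2D columns.
    required = REQUIRED_2D + (("z",) if is_3d else ())
    present = set(column_names)
    return [name for name in required if name not in present]


columns = ["frame", "x", "y", "photons", "sx", "sy", "bg", "lpx", "lpy"]
print(missing_columns(columns))              # 2D data: nothing missing
print(missing_columns(columns, is_3d=True))  # 3D data: z is missing
```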

Molecular maps (cluster centers) HDF5 Files

Molecular maps generated with RESI or by single-protein resolution imaging (i.e., generated by SMLM clusterer or G5M) can be opened in Picasso: Render just like localizations. The column names change slightly to reflect the different data type. Table 2 explains the main column names in molecular maps.

Table 2: Name, description and data type for the main columns used in molecular maps in Picasso.

| Column Name | Description | C Data Type |
| --- | --- | --- |
| frame | Mean frame of the localizations around the molecule/cluster center. | float |
| std_frame | Standard deviation of the frames of the localizations around the molecule. | float |
| x/y/z | Spatial coordinates of the molecule/cluster center (camera pixels). | float |
| std_x/std_y/std_z | Standard deviation of the localizations around the molecule/cluster center in the respective directions (camera pixels). | float |
| photons | Mean number of photons per localization around the molecule/cluster center. | float |
| sx | Mean Point Spread Function width/height of localizations around the molecule/cluster center (camera pixels). | float |
| bg | Mean background photons per pixel per localization around the molecule/cluster center. | float |
| lpx/lpy/lpz | Position uncertainty of the molecule/cluster center in the respective directions (camera pixels). | float |
| ellipticity | Mean ellipticity of localizations around the molecule/cluster center. | float |
| net_gradient | Mean net gradient of localizations around the molecule/cluster center. | float |
| n/n_locs | Number of localizations assigned to the molecule/cluster center. “n” is the old convention for DBSCAN, HDBSCAN and SMLM clusterer. | unsigned long |
| n_events | Number of binding events assigned to the molecule/cluster center. | unsigned long |
| group | Cluster ID assigned to the molecule/cluster center. | unsigned long |
| group_input | (Optional) Previous group ID of the localizations around the molecule/cluster center, if they had a ‘group’ column. | unsigned long |
| area/volume | (Non-G5M) Area (2D) or volume (3D) of the ellipse/ellipsoid defined by the radius = 2 * std_x/y/z. | float |
| convexhull | (Non-G5M) Area (2D) or volume (3D) of the convex hull of the localizations assigned to the molecule/cluster center. | float |
| fitted_sigma | (Only G5M, 2D) Fitted sigma of the Gaussian component representing the molecule (camera pixels). | float |
| fitted_sigma_x/y/z | (Only G5M, 3D) Fitted sigma of the Gaussian component representing the molecule in the respective directions (camera pixels). | float |
| rel_sigma | (Only G5M, 2D) ‘fitted_sigma’ divided by the average localization precision around the molecule. | float |
| rel_sigma_x/y/z | (Only G5M, 3D) ‘fitted_sigma_x/y/z’ divided by the average localization precision in the respective directions around the molecule. | float |
| p_val | (Only G5M) P-value of the molecule being a true positive detection according to the G5M model. | float |
| mol_log_likelihood | (Only G5M) Log-likelihood of the molecule according to the G5M model. | float |
| group_log_likelihood | (Only G5M) Log-likelihood of the GMM fitted to the preclustered localizations around the molecule. | float |
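
The area/volume entry of Table 2 can be made concrete with a short calculation. The sketch below assumes that “radius = 2 * std_x/y/z” means an ellipse (or ellipsoid) whose semi-axes are twice the standard deviation in each direction; the function names are hypothetical and the exact computation in Picasso may differ.

```python
import math


def ellipse_area(std_x, std_y):
    """Area of an ellipse with semi-axes 2*std_x and 2*std_y (2D case)."""
    return math.pi * (2 * std_x) * (2 * std_y)


def ellipsoid_volume(std_x, std_y, std_z):
    """Volume of an ellipsoid with semi-axes 2*std_x, 2*std_y, 2*std_z (3D case)."""
    return 4 / 3 * math.pi * (2 * std_x) * (2 * std_y) * (2 * std_z)


# With the standard deviations in camera pixels, the results are in
# squared and cubed camera pixels, respectively.
print(ellipse_area(0.05, 0.08))
print(ellipsoid_volume(0.05, 0.08, 0.1))
```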

HDF5 Pick Property Files

When selecting File > Save pick properties in Picasso: Render, the properties of picked regions are stored in an HDF5 file. Within the HDF5 file, the data table is stored under the path /groups. Each row in the “groups” table corresponds to one picked region. For each localization property (see Table 1), two columns are generated in the groups table: the mean and the standard deviation of the respective column over the localizations in a pick region. For example, if the localization table contains a column len, the “groups” table will contain the columns len_mean and len_std.

Furthermore, the following columns are included:

  • group: the group identifier;

  • n_events: the number of binding events in the region;

  • n_units: the number of units from a qPAINT measurement;

  • len_cdf and dark_cdf: estimates of mean bright and dark times, respectively, obtained by fitting the distributions to the CDF of the exponential distribution. Units: frames;

  • locs: the number of localizations in the region;

  • len_mean and dark_mean: mean bright and dark times, respectively, obtained by averaging over all binding events, rather than fitting to the CDF. Units: frames;

  • len_std and dark_std: standard deviation of bright and dark times, respectively.
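
The naming scheme of the per-pick columns can be illustrated with a small sketch that groups localization rows by their group identifier and derives a mean and a standard deviation column per property. The function pick_properties and the sample rows are hypothetical, and the use of the population standard deviation is an assumption; the sketch does not reproduce Picasso's own calculation or the CDF fits.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def pick_properties(rows, columns):
    """Summarize localization rows per group as <column>_mean and <column>_std."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row["group"]].append(row)

    table = []
    for group, members in sorted(by_group.items()):
        # One output row per picked region.
        summary = {"group": group, "locs": len(members)}
        for col in columns:
            values = [m[col] for m in members]
            summary[f"{col}_mean"] = fmean(values)
            summary[f"{col}_std"] = pstdev(values)  # assumption: population std
        table.append(summary)
    return table


# Made-up localizations from two picked regions.
rows = [
    {"group": 0, "len": 2, "photons": 1000.0},
    {"group": 0, "len": 4, "photons": 3000.0},
    {"group": 1, "len": 5, "photons": 2000.0},
]
for entry in pick_properties(rows, ["len", "photons"]):
    print(entry)
```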

YAML Metadata Files

YAML files are document-oriented text files that can be opened and changed with any text editor. In Picasso, YAML files are used to store metadata of movie or localization files. Each localization HDF5 file must be accompanied by a YAML file with the same filename but the extension .yaml. If this YAML metadata file is deleted, Picasso will fail to process the localization file!

The metadata file must contain the keys: Width, Height (size of the field of view in camera pixels), Frames (number of frames in the movie), and Pixelsize (effective camera pixel size after magnification in nm). Example files can be found here
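
As a rough illustration of these requirements, the sketch below checks an already parsed metadata dictionary for the four required keys and uses Pixelsize to convert a coordinate from camera pixels to nm. The helper names and the example values are hypothetical, and parsing the YAML file itself is left out.

```python
REQUIRED_KEYS = ("Width", "Height", "Frames", "Pixelsize")


def missing_keys(info):
    """Return the required metadata keys that are absent from info."""
    return [key for key in REQUIRED_KEYS if key not in info]


def pixels_to_nm(value_px, info):
    """Convert a length in camera pixels to nm using the Pixelsize entry."""
    return value_px * info["Pixelsize"]


# Hypothetical metadata for a 512 x 512 pixel movie with 10000 frames.
info = {"Width": 512, "Height": 512, "Frames": 10000, "Pixelsize": 130}

print(missing_keys(info))
print(pixels_to_nm(2.5, info))
```

The same conversion would apply to x, y, lpx and lpy, which are stored in camera pixels, but not to z and lpz, which are already in nm.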

Raw binary files (i.e., with extension .raw) may be accompanied by a YAML metadata file that stores the movie dimensions and related metadata. The metadata file is not required in this case, but it saves typing in the metadata each time the movie is loaded with Picasso: Localize. To generate such a YAML metadata file, load the raw movie into Picasso: Localize, then enter all required information in the dialog that appears. Check the checkbox Save info to yaml file and click OK. The movie will be loaded and the metadata saved in a YAML file. This file will be detected the next time the raw movie is loaded, so the metadata does not need to be entered again.