File Types
This section describes the file formats and naming conventions used in Picasso.
Movie Files
Picasso accepts three types of raw movie files: TIFF (preferably from μManager), raw binary data (file extension “.raw”) and the Nikon format .nd2.
When loading raw binary files, the user will be prompted for movie metadata such as the number of frames, number of pixels, etc. Alternatively, this metadata can be supplied by an accompanying metadata file with the same filename as the raw binary file, but with the extension .yaml. See YAML Metadata Files for more details.
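The naming convention for the accompanying metadata file (same filename, extension .yaml) can be sketched in Python as follows. The helper name `metadata_path` and the example filename are hypothetical and not part of Picasso; the sketch only derives the expected path and checks whether such a file exists.

```python
from pathlib import Path


def metadata_path(movie_path):
    """Return the path of the YAML metadata file expected next to a movie file.

    Hypothetical helper (not part of Picasso): it only swaps the extension
    for ".yaml", following the naming convention described above.
    """
    return Path(movie_path).with_suffix(".yaml")


# Example with a made-up filename
yaml_path = metadata_path("data/experiment_01.raw")
print(yaml_path.name)      # experiment_01.yaml
print(yaml_path.exists())  # True only if the metadata file is present
```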
HDF5 Files
HDF5 is a generic and efficient binary file format for storing data. In Picasso, HDF5 files are used for storing tabular data of localizations with the file extension .hdf5. Furthermore, Picasso saves the statistical properties of groups of localizations in an HDF5 file.
Generally, several datasets can be stored within an HDF5 file. These datasets are accessible by specifying a path within the HDF5 file, similar to a path of an operating system. When saving localizations, Picasso stores tabular data under the path /locs. When saving statistical properties of groups of localizations, Picasso saves the table under the path /groups.
An HDF5 file can be opened with various software packages. In Picasso, we use pandas for this purpose. For example, to open localizations, pandas.read_hdf(PATH_TO_LOCALIZATIONS, key="locs") is used. The key argument can be adjusted for other datasets. The available keys can be verified using pandas.HDFStore(PATH_TO_FILE).keys().
Note: Picasso HDF5 files are accompanied by YAML metadata files; both are read together using locs, info = picasso.io.load_locs(PATH_TO_LOCALIZATIONS). See sections “Localization HDF5 Files” and “YAML Metadata Files” below for more details on the minimum requirements to process HDF5 files in Picasso.
Importing HDF5 files in MATLAB and Origin
In MATLAB, execute the command locs = h5read(filename, dataset). Replace dataset with /locs for localization files and with /groups for pick property files.
In Origin, select File > Import > HDF5 or drag and drop the file into the main window.
Localization HDF5 Files
Picasso’s localization HDF5 files are accompanied by a YAML metadata file with the same filename, but with the extension .yaml. See YAML Metadata Files for more details. Both the HDF5 file and the metadata are read with locs, info = picasso.io.load_locs(PATH_TO_LOCALIZATIONS). The localization table is stored as a dataset of the HDF5 file under the path /locs. This table can be explored by opening the HDF5 file with Picasso: Filter. The localization table can have an unlimited number of columns. Table 1 explains the main column names in Picasso.
| Column Name | Description | C Data Type |
|---|---|---|
| frame | The frame in which the localization occurred, starting with zero for the first frame. | unsigned long |
| x | The subpixel x coordinate in camera pixels. | float |
| y | The subpixel y coordinate in camera pixels. | float |
| photons | The total number of detected photons from this event, not including background or camera offset. | float |
| sx | The Point Spread Function width in camera pixels. | float |
| sy | The Point Spread Function height in camera pixels. | float |
| bg | The number of background photons per pixel, not including the camera offset. | float |
| lpx | The localization precision in x direction, in camera pixels, as estimated by the Cramer-Rao Lower Bound (Mortensen et al., Nat Meth, 2010 and Smith et al., Nat Meth, 2010). | float |
| lpy | The localization precision in y direction, in camera pixels, as estimated by the Cramer-Rao Lower Bound (Mortensen et al., Nat Meth, 2010 and Smith et al., Nat Meth, 2010). | float |
| net_gradient | The net gradient of this spot which is defined by the sum of gradient vector magnitudes within the fitting box, projected to the spot center. | float |
| z | (Optional) The z coordinate fitted in 3D in nm. Please note the units are different for x and y coordinates. | float |
| lpz | (Optional) The localization precision in z direction in nm. | float |
| d_zcalib | (Optional) The value of the D function used for z fitting with astigmatism, see the supplement to Huang et al. 2008. | float |
| likelihood | (Optional) The log-likelihood of the fit. Only available for MLE fitting. | float |
| iterations | (Optional) The number of iterations of the fit procedure. Only available for MLE fitting. | long |
| group | (Optional) An identifier to assign multiple localizations to groups, for example by picking regions of interest or clustering. | long |
| group_input | | long |
| len | (Optional) The length of the event in frames, if localizations from consecutive frames have been linked. | long |
| n | (Optional) The number of localizations in this event, if localizations from consecutive frames have been linked, potentially diverging from the “len” column due to a transient dark time tolerance. | long |
| photon_rate | (Optional) The mean number of photons per frame, if localizations from consecutive frames have been linked. The total number of photons is set in the “photons” column. | float |
| x_pick_rot | (Optional) Projection of localizations onto the axis of the rectangular pick in camera pixels. Only available after saving rectangular pick(s). | float |
| y_pick_rot | (Optional) Projection of localizations against the axis of the rectangular pick in camera pixels. Can be used to plot profile along the pick. Only available after saving rectangular pick(s). | float |
| photons_unc | (Optional) The uncertainty of the photons estimation as estimated by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |
| bg_unc | (Optional) The uncertainty of the background estimation as estimated by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |
| sx_unc | (Optional) The uncertainty of the sx estimation (camera pixels) as estimated by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |
| sy_unc | (Optional) The uncertainty of the sy estimation (camera pixels) as estimated by the Cramer-Rao Lower Bound of the Maximum Likelihood fit. | float |
The minimum required columns are: x, y, frame, lpx and lpy. For 3D data, the column z is also required. Since v0.9.5, Picasso supports the lpz column but it is not necessary for rendering (although recommended for accurate rendering in 3D).
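The column requirement stated above can be checked before handing a table to Picasso. The following Python sketch is only an illustration: the helper `missing_columns` is hypothetical and not part of Picasso, and it works on plain column names, however the table was read.

```python
# Minimum columns named in the text; "z" is added for 3D data.
REQUIRED_2D = ("frame", "x", "y", "lpx", "lpy")


def missing_columns(columns, is_3d=False):
    """List the required columns absent from a localization table.

    Hypothetical helper (not part of Picasso). `columns` is any iterable of
    column names, for example the columns of a table read from /locs.
    """
    required = REQUIRED_2D + (("z",) if is_3d else ())
    present = set(columns)
    return [name for name in required if name not in present]


print(missing_columns(["frame", "x", "y", "photons", "lpx", "lpy"]))   # []
print(missing_columns(["frame", "x", "y", "lpx", "lpy"], is_3d=True))  # ['z']
```

An empty list means the table meets the minimum requirements; lpz is deliberately not checked because it is optional.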
Molecular maps (cluster centers) HDF5 Files
Molecular maps generated with RESI or by single-protein resolution imaging (i.e., generated by SMLM clusterer or G5M) can be opened in Picasso: Render just like localizations. The column names change slightly to reflect the different data type. Table 2 explains the main column names in molecular maps.
| Column Name | Description | C Data Type |
|---|---|---|
| frame | Mean frame of the localizations around the molecule/cluster center. | float |
| std_frame | St. dev. of frames of the localizations around the molecule. | float |
| x/y/z | Spatial coordinates of the molecule/cluster center (camera pixels). | float |
| std_x/std_y/std_z | St. dev. of the localizations around the molecule/cluster center in respective directions (camera pixels). | float |
| photons | Mean number of photons per localization around the molecule/cluster center. | float |
| sx | Mean Point Spread Function width/height of localizations around the molecule/cluster center (camera pixels). | float |
| bg | Mean background photons per pixel per localization around the molecule/cluster center. | float |
| lpx/lpy/lpz | Molecule’s/cluster center’s position uncertainty in respective directions (camera pixels). | float |
| ellipticity | Mean ellipticity of localizations around the molecule/cluster center. | float |
| net_gradient | Mean net gradient of localizations around the molecule/cluster center. | float |
| n/n_locs | Number of localizations assigned to the molecule/cluster center. “n” is the old convention for DBSCAN, HDBSCAN and SMLM clusterer. | unsigned long |
| n_events | Number of binding events assigned to the molecule/cluster center. | unsigned long |
| group | Cluster ID assigned to the molecule/cluster center. | unsigned long |
| group_input | (Optional) Previous group ID of the localizations around the molecule/cluster center, if they had a ‘group’ column. | unsigned long |
| area/volume | (Non-G5M) Area (2D) or volume (3D) of the ellipse/ellipsoid defined by the radius = 2 * std_x/y/z. | float |
| convexhull | (Non-G5M) Area (2D) or volume (3D) of the convex hull of the localizations assigned to the molecule/cluster center. | float |
| fitted_sigma | (Only G5M, 2D) Fitted sigma of the Gaussian component representing the molecule (camera pixels). | float |
| fitted_sigma_x/y/z | (Only G5M, 3D) Fitted sigma of the Gaussian component representing the molecule in respective directions (camera pixels). | float |
| rel_sigma | (Only G5M, 2D) ‘fitted_sigma’ divided by the average localization precision around the molecule. | float |
| rel_sigma_x/y/z | (Only G5M, 3D) ‘fitted_sigma_x/y/z’ divided by the average localization precision in respective directions around the molecule. | float |
| p_val | (Only G5M) P-value of the molecule being a true positive detection according to the G5M model. | float |
| mol_log_likelihood | (Only G5M) Log-likelihood of the molecule according to the G5M model. | float |
| group_log_likelihood | (Only G5M) Log-likelihood of the GMM fitted to the preclustered localizations around the molecule. | float |
HDF5 Pick Property Files
When selecting File > Save pick properties in Picasso: Render, the properties of picked regions are stored in an HDF5 file. Within the HDF5 file, the data table is stored in the path /groups.
Each row in the “groups” table corresponds to one picked region. For each localization property (see Table 1), two columns are generated in the groups table: the mean and the standard deviation of the respective column over the localizations in a pick region. For example, if the localization table contains a column len, the “groups” table will contain the columns len_mean and len_std.
Furthermore, the following columns are included:
- group: the group identifier;
- n_events: the number of binding events in the region;
- n_units: the number of units from a qPAINT measurement;
- len_cdf and dark_cdf: estimates of mean bright and dark times, respectively, obtained by fitting the distributions to the CDF of the exponential distribution. Units: frames;
- locs: the number of localizations in the region;
- len_mean and dark_mean: mean bright and dark times, respectively, obtained by averaging over all binding events, rather than fitting to the CDF. Units: frames;
- len_std and dark_std: standard deviation of bright and dark times, respectively.
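The naming scheme of the per-pick statistics (one `_mean` and one `_std` column per localization property) can be illustrated with a short Python sketch. It is not Picasso's implementation: the function name `pick_statistics` is hypothetical, the input values are made up, and the use of the population standard deviation is an assumption, since the text does not state which estimator is used.

```python
from statistics import fmean, pstdev


def pick_statistics(locs_in_pick):
    """Summarize one picked region as <column>_mean and <column>_std entries.

    `locs_in_pick` maps a column name to the list of values of all
    localizations in the pick. Assumption: population standard deviation.
    """
    row = {}
    for column, values in locs_in_pick.items():
        row[f"{column}_mean"] = fmean(values)
        row[f"{column}_std"] = pstdev(values)
    return row


# Made-up values for a pick containing three localizations
row = pick_statistics({"len": [2, 4, 6], "photons": [900.0, 1000.0, 1100.0]})
print(sorted(row))      # ['len_mean', 'len_std', 'photons_mean', 'photons_std']
print(row["len_mean"])  # 4.0
```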
YAML Metadata Files
YAML files are document-oriented text files that can be opened and changed with any text editor. In Picasso, YAML files are used to store metadata of movie or localization files.
Each localization HDF5 file must be accompanied by a YAML file with the same filename but the extension .yaml. Deleting this YAML metadata file will cause Picasso to fail when it processes the localization file!
The metadata file must contain the keys: Width, Height (size of the field of view in camera pixels), Frames (number of frames in the movie), and Pixelsize (effective camera pixel size after magnification in nm). Example files can be found here
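As a rough illustration of these requirements, the Python sketch below checks a metadata dictionary for the four required keys and uses Pixelsize to convert a length from camera pixels to nanometers. The helper names are hypothetical and not part of Picasso, the values are placeholders, and the dictionary stands in for the already parsed content of a YAML file (parsing itself is left out).

```python
# Keys named in the text as mandatory for the metadata file.
REQUIRED_KEYS = ("Width", "Height", "Frames", "Pixelsize")


def missing_metadata_keys(info):
    """Return the required metadata keys absent from `info` (a dict)."""
    return [key for key in REQUIRED_KEYS if key not in info]


def pixels_to_nm(value_px, info):
    """Convert a length in camera pixels to nanometers using Pixelsize (nm)."""
    return value_px * info["Pixelsize"]


# Placeholder values, not taken from a real measurement
info = {"Width": 512, "Height": 512, "Frames": 10000, "Pixelsize": 130}
print(missing_metadata_keys(info))  # []
print(pixels_to_nm(2.5, info))      # 325.0
```

The conversion is relevant because x, y and the precisions lpx, lpy are stored in camera pixels, whereas z and lpz are stored in nm.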
Raw binary files (i.e., with extension .raw) may be accompanied by a YAML metadata file that stores the movie dimensions and related information. The metadata file is not required in this case, but it saves typing in the metadata each time the movie is loaded with Picasso: Localize. To generate such a YAML metadata file, load the raw movie into Picasso: Localize, then enter all required information in the dialog that appears. Check the checkbox Save info to yaml file and click OK. The movie will be loaded and the metadata saved in a YAML file. This file will be detected the next time this raw movie is loaded, and the metadata does not need to be entered again.