Detailed API documentation

This page documents the different modules and functions in the nestcheck package.

data_processing

Module containing functions for loading and processing output files produced by nested sampling software.

Background: threads

nestcheck’s error estimates and diagnostics rely on the decomposition of a nested sampling run into multiple runs, each with a single live point. We refer to these constituent single live point runs as threads. See “Sampling Errors In Nested Sampling Parameter Estimation” (Higson et al. 2018) for a detailed discussion, including an algorithm for dividing nested sampling runs into their constituent threads.

Nested sampling run format

nestcheck stores nested sampling runs in a standard format as python dictionaries. For a run with \(n_\mathrm{samp}\) samples, the keys are:

logl: 1d numpy array

Loglikelihood values (floats) for each sample. Shape is (\(n_\mathrm{samp}\),).

thread_labels: 1d numpy array

Integer label for each point representing which thread each point belongs to. Shape is (\(n_\mathrm{samp}\),). For some thread label k, the thread’s start (birth) log-likelihood and end log-likelihood are given by thread_min_max[k, :].

thread_min_max: 2d numpy array

Shape is (\(n_\mathrm{threads}\), 2). Each row with index k contains the logl from within which the first point in the thread with label k was sampled (the “birth contour”) and the logl of the final point in the thread. The birth contour is -inf if the thread began by sampling from the whole prior.

theta: 2d numpy array

Parameter values for samples - each row represents a sample. Shape is (\(n_\mathrm{samp}\), d) where d is number of dimensions.

nlive_array: 1d numpy array

Number of live points present between the previous point and this point.

output: dict (optional)

Dict containing extra information about the run.

Samples are arranged in ascending order of logl.

Processing nested sampling software output

To process output files for a nested sampling run into the format described above, the following information is required:

  • Samples’ loglikelihood values;

  • Samples’ parameter values;

  • Information allowing decomposition into threads and identifying each thread’s birth contour (starting logl).

The first two items are self-explanatory, but the latter is more challenging as it can take different formats and may not be provided by all nested sampling software packages.

Sufficient information for thread decomposition and calculating the number of live points (including for dynamic nested sampling) is provided by a list of the loglikelihoods from within which each point was sampled (the points’ birth contours). This is output by PolyChord >= v1.13 and MultiNest >= v3.11, and is used in the output processing for these packages via the birth_inds_given_contours and threads_given_birth_inds functions. Also sufficient is a list of the indexes of the point which was removed at the step when each point was sampled (“birth indexes”), as this can be mapped to the birth contours and vice versa.

process_dynesty_run does not require the birth_inds_given_contours and threads_given_birth_inds functions as dynesty results objects already include thread labels via their samples_id property. If the dynesty run is dynamic, the batch_bounds property is need to determine the threads’ starting birth contours.

Adding a new processing function for another nested sampling package

You can add new functions to process output from other nested sampling software, provided the output files include the required information for decomposition into threads. Depending on how this information is provided you may be able to adapt process_polychord_run or process_dynesty_run. If thread decomposition information if provided in a different format, you will have to write your own helper functions to process the output into the nestcheck dictionary format described above.

nestcheck.data_processing.batch_process_data(file_roots, **kwargs)[source]

Process output from many nested sampling runs in parallel with optional error handling and caching.

The result can be cached using the ‘save_name’, ‘save’ and ‘load’ kwargs (by default this is not done). See save_load_result docstring for more details.

Remaining kwargs passed to parallel_utils.parallel_apply (see its docstring for more details).

Parameters:
file_roots: list of strs

file_roots for the runs to load.

base_dir: str, optional

path to directory containing files.

process_func: function, optional

function to use to process the data.

func_kwargs: dict, optional

additional keyword arguments for process_func.

errors_to_handle: error or tuple of errors, optional

which errors to catch when they occur in processing rather than raising.

save_name: str or None, optional

See nestcheck.io_utils.save_load_result.

save: bool, optional

See nestcheck.io_utils.save_load_result.

load: bool, optional

See nestcheck.io_utils.save_load_result.

overwrite_existing: bool, optional

See nestcheck.io_utils.save_load_result.

Returns:
list of ns_run dicts

List of nested sampling runs in dict format (see the module docstring for more details).

nestcheck.data_processing.birth_inds_given_contours(birth_logl_arr, logl_arr, **kwargs)[source]

Maps the iso-likelihood contours on which points were born to the index of the dead point on this contour.

MultiNest and PolyChord use different values to identify the inital live points which were sampled from the whole prior (PolyChord uses -1e+30 and MultiNest -0.179769313486231571E+309). However in each case the first dead point must have been sampled from the whole prior, so for either package we can use

init_birth = birth_logl_arr[0]

If there are many points with the same logl_arr and dup_assert is False, these points are randomly assigned an order (to ensure results are consistent, random seeding is used).

Parameters:
logl_arr: 1d numpy array

logl values of each point.

birth_logl_arr: 1d numpy array

Birth contours - i.e. logl values of the iso-likelihood contour from within each point was sampled (on which it was born).

dup_assert: bool, optional

See ns_run_utils.check_ns_run_logls docstring.

dup_warn: bool, optional

See ns_run_utils.check_ns_run_logls docstring.

Returns:
birth_inds: 1d numpy array of ints

Step at which each element of logl_arr was sampled. Points sampled from the whole prior are assigned value -1.

nestcheck.data_processing.process_dynesty_run(results)[source]

Transforms results from a dynesty run into the nestcheck dictionary format for analysis. This function has been tested with dynesty v9.2.0.

Note that the nestcheck point weights and evidence will not be exactly the same as the dynesty ones as nestcheck calculates logX volumes more precisely (using the trapezium rule).

This function does not require the birth_inds_given_contours and threads_given_birth_inds functions as dynesty results objects already include thread labels via their samples_id property. If the dynesty run is dynamic, the batch_bounds property is need to determine the threads’ starting birth contours.

Parameters:
results: dynesty results object

N.B. the remaining live points at termination must be included in the results (dynesty samplers’ run_nested method does this if add_live_points=True - its default value).

Returns:
ns_run: dict

Nested sampling run dict (see the module docstring for more details).

nestcheck.data_processing.process_error_helper(root, base_dir, process_func, errors_to_handle=(), **func_kwargs)[source]

Wrapper which applies process_func and handles some common errors so one bad run does not spoil the whole batch.

Useful errors to handle include:

OSError: if you are not sure if all the files exist AssertionError: if some of the many assertions fail for known reasons; for example is there are occasional problems decomposing runs into threads due to limited numerical precision in logls.

Parameters:
root: str

File root.

base_dir: str

Directory containing file.

process_func: func

Function for processing file.

errors_to_handle: error type or tuple of error types

Errors to catch without throwing an exception.

func_kwargs: dict

Kwargs to pass to process_func.

Returns:
run: dict

Nested sampling run dict (see the module docstring for more details) or, if an error occured, a dict containing its type and the file root.

nestcheck.data_processing.process_multinest_run(file_root, base_dir, **kwargs)[source]

Loads data from a MultiNest run into the nestcheck dictionary format for analysis.

N.B. producing required output file containing information about the iso-likelihood contours within which points were sampled (where they were “born”) requies MultiNest version 3.11 or later.

Parameters:
file_root: str

Root name for output files. When running MultiNest, this is determined by the nest_root parameter.

base_dir: str

Directory containing output files. When running MultiNest, this is determined by the nest_root parameter.

kwargs: dict, optional

Passed to ns_run_utils.check_ns_run (via process_samples_array)

Returns:
ns_run: dict

Nested sampling run dict (see the module docstring for more details).

nestcheck.data_processing.process_polychord_run(file_root, base_dir, process_stats_file=True, **kwargs)[source]

Loads data from a PolyChord run into the nestcheck dictionary format for analysis.

N.B. producing required output file containing information about the iso-likelihood contours within which points were sampled (where they were “born”) requies PolyChord version v1.13 or later and the setting write_dead=True.

Parameters:
file_root: str

Root for run output file names (PolyChord file_root setting).

base_dir: str

Directory containing data (PolyChord base_dir setting).

process_stats_file: bool, optional

Should PolyChord’s <root>.stats file be processed? Set to False if you don’t have the <root>.stats file (such as if PolyChord was run with write_stats=False).

kwargs: dict, optional

Options passed to ns_run_utils.check_ns_run.

Returns:
ns_run: dict

Nested sampling run dict (see the module docstring for more details).

nestcheck.data_processing.process_polychord_stats(file_root, base_dir)[source]

Reads a PolyChord <root>.stats output file and returns the information contained in a dictionary.

Parameters:
file_root: str

Root for run output file names (PolyChord file_root setting).

base_dir: str

Directory containing data (PolyChord base_dir setting).

Returns:
output: dict

See PolyChord documentation for more details.

nestcheck.data_processing.process_samples_array(samples, **kwargs)[source]

Convert an array of nested sampling dead and live points of the type produced by PolyChord and MultiNest into a nestcheck nested sampling run dictionary.

Parameters:
samples: 2d numpy array

Array of dead points and any remaining live points at termination. Has #parameters + 2 columns: param_1, param_2, … , logl, birth_logl

kwargs: dict, optional

Options passed to birth_inds_given_contours

Returns:
ns_run: dict

Nested sampling run dict (see the module docstring for more details). Only contains information in samples (not additional optional output key).

nestcheck.data_processing.sample_less_than_condition(choices_in, condition)[source]

Creates a random sample from choices without replacement, subject to the condition that each element of the output is greater than the corresponding element of the condition array.

condition should be in ascending order.

nestcheck.data_processing.threads_given_birth_inds(birth_inds)[source]

Divides a nested sampling run into threads, using info on the indexes at which points were sampled. See “Sampling errors in nested sampling parameter estimation” (Higson et al. 2018) for more information.

Parameters:
birth_inds: 1d numpy array

Indexes of the iso-likelihood contours from within which each point was sampled (“born”).

Returns:
thread_labels: 1d numpy array of ints

labels of the thread each point belongs to.

ns_run_utils

Functions for performing basic operations on nested sampling runs; such as working out point weights and splitting and combining runs.

Nested sampling runs are stored in a standard format as python dictionaries (see the data_processing module docstring for more details).

nestcheck.ns_run_utils.array_given_run(ns_run)[source]

Converts information on samples in a nested sampling run dictionary into a numpy array representation. This allows fast addition of more samples and recalculation of nlive.

Parameters:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

Returns:
samples: 2d numpy array

Array containing columns [logl, thread label, change in nlive at sample, (thetas)] with each row representing a single sample.

nestcheck.ns_run_utils.check_ns_run(run, dup_assert=False, dup_warn=False)[source]

Checks a nestcheck format nested sampling run dictionary has the expected properties (see the data_processing module docstring for more details).

Parameters:
run: dict

nested sampling run to check.

dup_assert: bool, optional

See check_ns_run_logls docstring.

dup_warn: bool, optional

See check_ns_run_logls docstring.

Raises:
AssertionError

if run does not have expected properties.

nestcheck.ns_run_utils.check_ns_run_logls(run, dup_assert=False, dup_warn=False)[source]

Check run logls are unique and in the correct order.

Parameters:
run: dict

nested sampling run to check.

dup_assert: bool, optional

Whether to raise and AssertionError if there are duplicate logl values.

dup_warn: bool, optional

Whether to give a UserWarning if there are duplicate logl values (only used if dup_assert is False).

Raises:
AssertionError

if run does not have expected properties.

nestcheck.ns_run_utils.check_ns_run_members(run)[source]

Check nested sampling run member keys and values.

Parameters:
run: dict

nested sampling run to check.

Raises:
AssertionError

if run does not have expected properties.

nestcheck.ns_run_utils.check_ns_run_threads(run)[source]

Check thread labels and thread_min_max have expected properties.

Parameters:
run: dict

Nested sampling run to check.

Raises:
AssertionError

If run does not have expected properties.

nestcheck.ns_run_utils.combine_ns_runs(run_list_in, **kwargs)[source]

Combine a list of complete nested sampling run dictionaries into a single ns run.

Input runs must contain any repeated threads.

Parameters:
run_list_in: list of dicts

List of nested sampling runs in dict format (see data_processing module docstring for more details).

kwargs: dict, optional

Options for check_ns_run.

Returns:
run: dict

Nested sampling run dict (see data_processing module docstring for more details).

nestcheck.ns_run_utils.combine_threads(threads, assert_birth_point=False)[source]

Combine list of threads into a single ns run. This is different to combining runs as repeated threads are allowed, and as some threads can start from log-likelihood contours on which no dead point in the run is present.

Note that if all the thread labels are not unique and in ascending order, the output will fail check_ns_run. However provided the thread labels are not used it will work ok for calculations based on nlive, logl and theta.

Parameters:
threads: list of dicts

List of nested sampling run dicts, each representing a single thread.

assert_birth_point: bool, optional

Whether or not to assert there is exactly one point present in the run with the log-likelihood at which each point was born. This is not true for bootstrap resamples of runs, where birth points may be repeated or not present at all.

Returns:
run: dict

Nested sampling run dict (see data_processing module docstring for more details).

nestcheck.ns_run_utils.dict_given_run_array(samples, thread_min_max)[source]

Converts an array of information about samples back into a nested sampling run dictionary (see data_processing module docstring for more details).

N.B. the output dict only contains the following keys: ‘logl’, ‘thread_label’, ‘nlive_array’, ‘theta’. Any other keys giving additional information about the run output cannot be reproduced from the function arguments, and are therefore ommitted.

Parameters:
samples: numpy array

Numpy array containing columns [logl, thread label, change in nlive at sample, (thetas)] with each row representing a single sample.

thread_min_max’: numpy array, optional

2d array with a row for each thread containing the likelihoods at which it begins and ends. Needed to calculate nlive_array (otherwise this is set to None).

Returns:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

nestcheck.ns_run_utils.get_logw(ns_run, simulate=False)[source]

Calculates the log posterior weights of the samples (using logarithms to avoid overflow errors with very large or small values).

Uses the trapezium rule such that the weight of point i is

\[w_i = \mathcal{L}_i (X_{i-1} - X_{i+1}) / 2\]
Parameters:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

simulate: bool, optional

Should log prior volumes logx be simulated from their distribution (if false their expected values are used).

Returns:
logw: 1d numpy array

Log posterior masses of points.

nestcheck.ns_run_utils.get_logx(nlive, simulate=False)[source]

Returns a logx vector showing the expected or simulated logx positions of points.

The shrinkage factor between two points

\[t_i = X_{i-1} / X_{i}\]

is distributed as the largest of \(n_i\) uniform random variables between 1 and 0, where \(n_i\) is the local number of live points.

We are interested in

\[\log(t_i) = \log X_{i-1} - \log X_{i}\]

which has expected value \(-1/n_i\).

Parameters:
nlive_array: 1d numpy array

Ordered local number of live points present at each point’s iso-likelihood contour.

simulate: bool, optional

Should log prior volumes logx be simulated from their distribution (if False their expected values are used).

Returns:
logx: 1d numpy array

log X values for points.

nestcheck.ns_run_utils.get_run_threads(ns_run)[source]

Get the individual threads from a nested sampling run.

Parameters:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

Returns:
threads: list of numpy array

Each thread (list element) is a samples array containing columns [logl, thread label, change in nlive at sample, (thetas)] with each row representing a single sample.

nestcheck.ns_run_utils.get_w_rel(ns_run, simulate=False)[source]

Get the relative posterior weights of the samples, normalised so the maximum sample weight is 1. This is calculated from get_logw with protection against numerical overflows.

Parameters:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

simulate: bool, optional

See the get_logw docstring for more details.

Returns:
w_rel: 1d numpy array

Relative posterior masses of points.

nestcheck.ns_run_utils.log_subtract(loga, logb)[source]

Numerically stable method for avoiding overflow errors when calculating \(\log (a-b)\), given \(\log (a)\), \(\log (a)\) and that \(a > b\).

See https://hips.seas.harvard.edu/blog/2013/01/09/computing-log-sum-exp/ for more details.

Parameters:
loga: float
logb: float

Must be less than loga.

Returns:
log(a - b): float
nestcheck.ns_run_utils.run_estimators(ns_run, estimator_list, simulate=False)[source]

Calculates values of list of quantities (such as the Bayesian evidence or mean of parameters) for a single nested sampling run.

Parameters:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

estimator_list: list of functions for estimating quantities from nested

sampling runs. Example functions can be found in estimators.py. Each should have arguments: func(ns_run, logw=None).

simulate: bool, optional

See get_logw docstring.

Returns:
output: 1d numpy array

Calculation result for each estimator in estimator_list.

error_analysis

Sampling error estimates and diagnostic tests for nested sampling runs.

nestcheck.error_analysis.bootstrap_resample_run(ns_run, threads=None, ninit_sep=False, random_seed=False)[source]

Bootstrap resamples threads of nested sampling run, returning a new (resampled) nested sampling run.

Get the individual threads for a nested sampling run.

Parameters:
ns_run: dict

Nested sampling run dictionary.

threads: None or list of numpy arrays, optional
ninit_sep: bool

For dynamic runs: resample initial threads and dynamically added threads separately. Useful when there are only a few threads which start by sampling the whole prior, as errors occur if none of these are included in the bootstrap resample.

random_seed: None, bool or int, optional

Set numpy random seed. Default is to use None (so a random seed is chosen from the computer’s internal state) to ensure reliable results when multiprocessing. Can set to an integer or to False to not edit the seed.

Returns:
ns_run_temp: dict

Nested sampling run dictionary.

nestcheck.error_analysis.implementation_std(vals_std, vals_std_u, bs_std, bs_std_u, **kwargs)[source]

Estimates varaition of results due to implementation-specific effects. See ‘nestcheck: diagnostic tests for nested sampling calculations’ (Higson et al. 2019) for more details.

Uncertainties on the output are calculated numerically using the fact that (from central limit theorem) our uncertainties on vals_std and bs_std are (approximately) normally distributed. This is needed as results from standard error propagation techniques are not valid when the uncertainties are not small compared to the result.

Parameters:
vals_std: numpy array

Standard deviations of results from repeated calculations.

vals_std_u: numpy array

\(1\sigma\) uncertainties on vals_std_u.

bs_std: numpy array

Bootstrap error estimates. Each element should correspond to the same element in vals_std.

bs_std_u: numpy array

\(1\sigma\) uncertainties on vals_std_u.

nsim: int, optional

Number of simulations to use to numerically calculate the uncertainties on the estimated implementation-specific effects.

random_seed: int or None, optional

Numpy random seed. Use to get reproducible uncertainties on the output.

Returns:
imp_std: numpy array

Estimated standard deviation of results due to implementation-specific effects.

imp_std_u: numpy array

\(1\sigma\) uncertainties on imp_std.

imp_frac: numpy array

imp_std as a fraction of vals_std.

imp_frac_u:

\(1\sigma\) uncertainties on imp_frac.

nestcheck.error_analysis.pairwise_distances(dist_list, earth_mover_dist=True, energy_dist=True)[source]

Applies statistical_distances to each unique pair of distribution samples in dist_list.

Parameters:
dist_list: list of 1d arrays
earth_mover_dist: bool, optional

Passed to statistical_distances.

energy_dist: bool, optional

Passed to statistical_distances.

Returns:
ser: pandas Series object

Values are statistical distances. Index levels are: calculation type: name of statistical distance. run: tuple containing the index in dist_list of the pair of samples arrays from which the statistical distance was computed.

nestcheck.error_analysis.run_bootstrap_values(ns_run, estimator_list, **kwargs)[source]

Uses bootstrap resampling to calculate an estimate of the standard deviation of the distribution of sampling errors (the uncertainty on the calculation) for a single nested sampling run.

For more details about bootstrap resampling for estimating sampling errors see ‘Sampling errors in nested sampling parameter estimation’ (Higson et al. 2018).

Parameters:
ns_run: dict

Nested sampling run dictionary.

estimator_list: list of functions for estimating quantities (such as the

Bayesian evidence or mean of parameters) from nested sampling runs. Example functions can be found in estimators.py. Each should have arguments: func(ns_run, logw=None)

n_simulate: int
ninit_sep: bool, optional

For dynamic runs: resample initial threads and dynamically added threads separately. Useful when there are only a few threads which start by sampling the whole prior, as errors occur if none of these are included in the bootstrap resample.

flip_skew: bool, optional

Determine if distribution of bootstrap values should be flipped about its mean to better represent our probability distribution on the true value - see “Bayesian astrostatistics: a backward look to the future” (Loredo, 2012 Figure 2) for an explanation. If true, the samples \(X\) are mapped to \(2 \mu - X\), where \(\mu\) is the mean sample value. This leaves the mean and standard deviation unchanged.

random_seeds: list, optional

list of random_seed arguments for bootstrap_resample_run. Defaults to range(n_simulate) in order to give reproducible results.

Returns:
output: 1d numpy array

Sampling error on calculation result for each estimator in estimator_list.

nestcheck.error_analysis.run_ci_bootstrap(ns_run, estimator_list, **kwargs)[source]

Uses bootstrap resampling to calculate credible intervals on the distribution of sampling errors (the uncertainty on the calculation) for a single nested sampling run.

For more details about bootstrap resampling for estimating sampling errors see ‘Sampling errors in nested sampling parameter estimation’ (Higson et al. 2018).

Parameters:
ns_run: dict

Nested sampling run dictionary.

estimator_list: list of functions for estimating quantities (such as the

Bayesian evidence or mean of parameters) from nested sampling runs. Example functions can be found in estimators.py. Each should have arguments: func(ns_run, logw=None)

cred_int: float
n_simulate: int
ninit_sep: bool, optional
Returns:
output: 1d numpy array

Credible interval on sampling error on calculation result for each estimator in estimator_list.

nestcheck.error_analysis.run_std_bootstrap(ns_run, estimator_list, **kwargs)[source]

Uses bootstrap resampling to calculate an estimate of the standard deviation of the distribution of sampling errors (the uncertainty on the calculation) for a single nested sampling run.

For more details about bootstrap resampling for estimating sampling errors see ‘Sampling errors in nested sampling parameter estimation’ (Higson et al. 2018).

Parameters:
ns_run: dict

Nested sampling run dictionary.

estimator_list: list of functions for estimating quantities (such as the

Bayesian evidence or mean of parameters) from nested sampling runs. Example functions can be found in estimators.py. Each should have arguments: func(ns_run, logw=None)

kwargs: dict

kwargs for run_bootstrap_values

Returns:
output: 1d numpy array

Sampling error on calculation result for each estimator in estimator_list.

nestcheck.error_analysis.run_std_simulate(ns_run, estimator_list, n_simulate=None)[source]

Uses the ‘simulated weights’ method to calculate an estimate of the standard deviation of the distribution of sampling errors (the uncertainty on the calculation) for a single nested sampling run.

Note that the simulated weights method is not accurate for parameter estimation calculations.

For more details about the simulated weights method for estimating sampling errors see ‘Sampling errors in nested sampling parameter estimation’ (Higson et al. 2018).

Parameters:
ns_run: dict

Nested sampling run dictionary.

estimator_list: list of functions for estimating quantities (such as the

bayesian evidence or mean of parameters) from nested sampling runs. Example functions can be found in estimators.py. Each should have arguments: func(ns_run, logw=None)

n_simulate: int
Returns:
output: 1d numpy array

Sampling error on calculation result for each estimator in estimator_list.

nestcheck.error_analysis.run_thread_values(run, estimator_list)[source]

Helper function for parallelising thread_values_df.

Parameters:
ns_run: dict

Nested sampling run dictionary.

estimator_list: list of functions
Returns:
vals_array: numpy array

Array of estimator values for each thread. Has shape (len(estimator_list), len(theads)).

nestcheck.error_analysis.statistical_distances(samples1, samples2, earth_mover_dist=True, energy_dist=True)[source]

Compute measures of the statistical distance between samples.

Parameters:
samples1: 1d array
samples2: 1d array
earth_mover_dist: bool, optional

Whether or not to compute the Earth mover’s distance between the samples.

energy_dist: bool, optional

Whether or not to compute the energy distance between the samples.

Returns:
1d array

diagnostics_tables

High-level functions for calculating results of error analysis and diagnostic tests for batches of nested sampling runs.

nestcheck.diagnostics_tables.bs_values_df(run_list, estimator_list, estimator_names, n_simulate, **kwargs)[source]

Computes a data frame of bootstrap resampled values.

Parameters:
run_list: list of dicts

List of nested sampling run dicts.

estimator_list: list of functions

Estimators to apply to runs.

estimator_names: list of strs

Name of each func in estimator_list.

n_simulate: int

Number of bootstrap replications to use on each run.

kwargs:

Kwargs to pass to parallel_apply.

Returns:
bs_values_df: pandas data frame

Columns represent estimators and rows represent runs. Each cell contains a 1d array of bootstrap resampled values for the run and estimator.

nestcheck.diagnostics_tables.error_values_summary(error_values, **summary_df_kwargs)[source]

Get summary statistics about calculation errors, including estimated implementation errors.

Parameters:
error_values: pandas DataFrame

Of format output by run_list_error_values (look at it for more details).

summary_df_kwargs: dict, optional

See pandas_functions.summary_df docstring for more details.

Returns:
df: pandas DataFrame

Table showing means and standard deviations of results and diagnostics for the different runs. Also contains estimated numerical uncertainties on results.

nestcheck.diagnostics_tables.estimator_values_df(run_list, estimator_list, **kwargs)[source]

Get a dataframe of estimator values.

NB when parallelised the results will not be produced in order (so results from some run number will not nessesarily correspond to that number run in run_list).

Parameters:
run_list: list of dicts

List of nested sampling run dicts.

estimator_list: list of functions

Estimators to apply to runs.

estimator_names: list of strs, optional

Name of each func in estimator_list.

parallel: bool, optional

Whether or not to parallelise - see parallel_utils.parallel_apply.

save_name: str or None, optional

See nestcheck.io_utils.save_load_result.

save: bool, optional

See nestcheck.io_utils.save_load_result.

load: bool, optional

See nestcheck.io_utils.save_load_result.

overwrite_existing: bool, optional

See nestcheck.io_utils.save_load_result.

Returns:
df: pandas DataFrame

Results table showing calculation values and diagnostics. Rows show different runs. Columns have titles given by estimator_names and show results for the different functions in estimators_list.

nestcheck.diagnostics_tables.pairwise_dists_on_cols(df_in, earth_mover_dist=True, energy_dist=True)[source]

Computes pairwise statistical distance measures.

Parameters:
df_in: pandas data frame

Columns represent estimators and rows represent runs. Each data frane element is an array of values which are used as samples in the distance measures.

earth_mover_dist: bool, optional

Passed to error_analysis.pairwise_distances.

energy_dist: bool, optional

Passed to error_analysis.pairwise_distances.

Returns:
df: pandas data frame with kl values for each pair.
nestcheck.diagnostics_tables.run_list_error_summary(run_list, estimator_list, estimator_names, n_simulate, **kwargs)[source]

Wrapper which runs run_list_error_values then applies error_values summary to the resulting dataframe. See the docstrings for those two funcions for more details and for descriptions of parameters and output.

nestcheck.diagnostics_tables.run_list_error_values(run_list, estimator_list, estimator_names, n_simulate=100, **kwargs)[source]

Gets a data frame with calculation values and error diagnostics for each run in the input run list.

NB when parallelised the results will not be produced in order (so results from some run number will not nessesarily correspond to that number run in run_list).

Parameters:
run_list: list of dicts

List of nested sampling run dicts.

estimator_list: list of functions

Estimators to apply to runs.

estimator_names: list of strs

Name of each func in estimator_list.

n_simulate: int, optional

Number of bootstrap replications to use on each run.

thread_pvalue: bool, optional

Whether or not to compute KS test diaganostic for correlations between threads within a run.

bs_stat_dist: bool, optional

Whether or not to compute statistical distance between bootstrap error distributions diaganostic.

parallel: bool, optional

Whether or not to parallelise - see parallel_utils.parallel_apply.

save_name: str or None, optional

See nestcheck.io_utils.save_load_result.

save: bool, optional

See nestcheck.io_utils.save_load_result.

load: bool, optional

See nestcheck.io_utils.save_load_result.

overwrite_existing: bool, optional

See nestcheck.io_utils.save_load_result.

Returns:
df: pandas DataFrame

Results table showing calculation values and diagnostics. Rows show different runs (or pairs of runs for pairwise comparisons). Columns have titles given by estimator_names and show results for the different functions in estimators_list.

nestcheck.diagnostics_tables.thread_values_df(run_list, estimator_list, estimator_names, **kwargs)[source]

Calculates estimator values for the constituent threads of the input runs.

Parameters:
run_list: list of dicts

List of nested sampling run dicts.

estimator_list: list of functions

Estimators to apply to runs.

estimator_names: list of strs

Name of each func in estimator_list.

kwargs:

Kwargs to pass to parallel_apply.

Returns:
df: pandas data frame

Columns represent estimators and rows represent runs. Each cell contains a 1d numpy array with length equal to the number of threads in the run, containing the results from evaluating the estimator on each thread.

plots

Functions for diagnostic plots of nested sampling runs.

Includes functions for plots described ‘nestcheck: diagnostic tests for nested sampling calculations’ (Higson et al. 2019).

nestcheck.plots.alternate_helper(x, alt_samps, func=None)[source]

Helper function for making fgivenx plots of functions with 2 array arguments of variable lengths.

nestcheck.plots.average_by_key(dict_in, key)[source]

Helper function for plot_run_nlive.

Try returning the average of dict_in[key] and, if this does not work or if key is None, return average of whole dict.

Parameters:
dict_in: dict

Values should be arrays.

key: str
Returns:
average: float
nestcheck.plots.bs_param_dists(run_list, **kwargs)[source]

Creates posterior distributions and their bootstrap error functions for input runs and estimators.

For a more detailed description and some example use cases, see ‘nestcheck: diagnostic tests for nested sampling calculations’ (Higson et al. 2019).

Parameters:
run_list: dict or list of dicts

Nested sampling run(s) to plot.

fthetas: list of functions, optional

Quantities to plot. Each must map a 2d theta array to 1d ftheta array - i.e. map every sample’s theta vector (every row) to a scalar quantity. E.g. use lambda x: x[:, 0] to plot the first parameter.

labels: list of strs, optional

Labels for each ftheta.

ftheta_lims: list, optional

Plot limits for each ftheta.

n_simulate: int, optional

Number of bootstrap replications to be used for the fgivenx distributions.

random_seed: int, optional

Seed to make sure results are consistent and fgivenx caching can be used.

figsize: tuple, optional

Matplotlib figsize in (inches).

nx: int, optional

Size of x-axis grid for fgivenx plots.

ny: int, optional

Size of y-axis grid for fgivenx plots.

cache: str or None

Root for fgivenx caching (no caching if None).

parallel: bool, optional

fgivenx parallel option.

rasterize_contours: bool, optional

fgivenx rasterize_contours option.

tqdm_kwargs: dict, optional

Keyword arguments to pass to the tqdm progress bar when it is used in fgivenx while plotting contours.

Returns:
fig: matplotlib figure
nestcheck.plots.kde_plot_df(df, xlims=None, **kwargs)[source]

Plots kde estimates of distributions of samples in each cell of the input pandas DataFrame.

There is one subplot for each dataframe column, and on each subplot there is one kde line.

Parameters:
df: pandas data frame

Each cell must contain a 1d numpy array of samples.

xlims: dict, optional

Dictionary of xlimits - keys are column names and values are lists of length 2.

num_xticks: int, optional

Number of xticks on each subplot.

figsize: tuple, optional

Size of figure in inches.

nrows: int, optional

Number of rows of subplots.

ncols: int, optional

Number of columns of subplots.

normalize: bool, optional

If true, kde plots are normalized to have the same area under their curves. If False, their max value is set to 1.

legend: bool, optional

Should a legend be added?

legend_kwargs: dict, optional

Additional kwargs for legend.

Returns:
fig: matplotlib figure
nestcheck.plots.param_logx_diagram(run_list, **kwargs)[source]

Creates diagrams of a nested sampling run’s evolution as it iterates towards higher likelihoods, expressed as a function of log X, where X(L) is the fraction of the prior volume with likelihood greater than some value L.

For a more detailed description and some example use cases, see ‘nestcheck: diagnostic tests for nested sampling calculations” (Higson et al. 2019).

Parameters:
run_list: dict or list of dicts

Nested sampling run(s) to plot.

fthetas: list of functions, optional

Quantities to plot. Each must map a 2d theta array to 1d ftheta array - i.e. map every sample’s theta vector (every row) to a scalar quantity. E.g. use lambda x: x[:, 0] to plot the first parameter.

labels: list of strs, optional

Labels for each ftheta.

ftheta_lims: dict, optional

Plot limits for each ftheta.

plot_means: bool, optional

Should the mean value of each ftheta be plotted?

n_simulate: int, optional

Number of bootstrap replications to use for the fgivenx distributions.

random_seed: int, optional

Seed to make sure results are consistent and fgivenx caching can be used.

logx_min: float, optional

Lower limit of logx axis.

figsize: tuple, optional

Matplotlib figure size (in inches).

colors: list of strs, optional

Colors to plot run scatter plots with.

colormaps: list of strs, optional

Colormaps to plot run fgivenx plots with.

npoints: int, optional

How many points to have in the logx array used to calculate and plot analytical weights.

cache: str or None

Root for fgivenx caching (no caching if None).

parallel: bool, optional

fgivenx parallel optional

point_size: float, optional

size of markers on scatter plot (in pts)

thin: float, optional

factor by which to reduce the number of samples before plotting the scatter plot. Must be in half-closed interval (0, 1].

rasterize_contours: bool, optional

fgivenx rasterize_contours option.

tqdm_kwargs: dict, optional

Keyword arguments to pass to the tqdm progress bar when it is used in fgivenx while plotting contours.

Returns:
fig: matplotlib figure
nestcheck.plots.plot_bs_dists(run, fthetas, axes, **kwargs)[source]

Helper function for plotting uncertainties on posterior distributions using bootstrap resamples and the fgivenx module. Used by bs_param_dists and param_logx_diagram.

Parameters:
run: dict

Nested sampling run to plot.

fthetas: list of functions

Quantities to plot. Each must map a 2d theta array to 1d ftheta array - i.e. map every sample’s theta vector (every row) to a scalar quantity. E.g. use lambda x: x[:, 0] to plot the first parameter.

axes: list of matplotlib axis objects
ftheta_lims: list, optional

Plot limits for each ftheta.

n_simulate: int, optional

Number of bootstrap replications to use for the fgivenx distributions.

colormap: matplotlib colormap

Colors to plot fgivenx distribution.

mean_color: matplotlib color as str

Color to plot mean of each parameter. If None (default) means are not plotted.

nx: int, optional

Size of x-axis grid for fgivenx plots.

ny: int, optional

Size of y-axis grid for fgivenx plots.

cache: str or None

Root for fgivenx caching (no caching if None).

parallel: bool, optional

fgivenx parallel option.

rasterize_contours: bool, optional

fgivenx rasterize_contours option.

smooth: bool, optional

fgivenx smooth option.

flip_axes: bool, optional

Whether or not plot should be rotated 90 degrees anticlockwise onto its side.

tqdm_kwargs: dict, optional

Keyword arguments to pass to the tqdm progress bar when it is used in fgivenx while plotting contours.

Returns:
cbar: matplotlib colorbar

For use in higher order functions.

nestcheck.plots.plot_run_nlive(method_names, run_dict, **kwargs)[source]

Plot the allocations of live points as a function of logX for the input sets of nested sampling runs of the type used in the dynamic nested sampling paper (Higson et al. 2019). Plots also include analytically calculated distributions of relative posterior mass and relative posterior mass remaining.

Parameters:
method_names: list of strs
run_dict: dict of lists of nested sampling runs.

Keys of run_dict must be method_names.

logx_given_logl: function, optional

For mapping points’ logl values to logx values. If not specified the logx coordinates for each run are estimated using its numbers of live points.

logl_given_logx: function, optional

For calculating the relative posterior mass and posterior mass remaining at each logx coordinate.

logx_min: float, optional

Lower limit of logx axis. If not specified this is set to the lowest logx reached by any of the runs.

ymax: bool, optional

Maximum value for plot’s nlive axis (yaxis).

npoints: int, optional

Number of points to have in the fgivenx plot grids.

figsize: tuple, optional

Size of figure in inches.

post_mass_norm: str or None, optional

Specify method_name for runs use form normalising the analytic posterior mass curve. If None, all runs are used.

cum_post_mass_norm: str or None, optional

Specify method_name for runs use form normalising the analytic cumulative posterior mass remaining curve. If None, all runs are used.

Returns:
fig: matplotlib figure
nestcheck.plots.rel_posterior_mass(logx, logl)[source]

Calculate the relative posterior mass for some array of logx values given the likelihood, prior and number of dimensions. The posterior mass at each logX value is proportional to L(X)X, where L(X) is the likelihood. The weight is returned normalized so that the integral of the weight with respect to logX is 1.

Parameters:
logx: 1d numpy array

Logx values at which to calculate posterior mass.

logl: 1d numpy array

Logl values corresponding to each logx (same shape as logx).

Returns:
w_rel: 1d numpy array

Relative posterior mass at each input logx value.

nestcheck.plots.weighted_1d_gaussian_kde(x, samples, weights)[source]

Gaussian kde with weighted samples (1d only). Uses Scott bandwidth factor.

When all the sample weights are equal, this is equivalent to

kde = scipy.stats.gaussian_kde(theta) return kde(x)

When the weights are not all equal, we compute the effective number of samples as the information content (Shannon entropy)

nsamp_eff = exp(- sum_i (w_i log(w_i)))

Alternative ways to estimate nsamp_eff include Kish’s formula

nsamp_eff = (sum_i w_i) ** 2 / (sum_i w_i ** 2)

See https://en.wikipedia.org/wiki/Effective_sample_size and “Effective sample size for importance sampling based on discrepancy measures” (Martino et al. 2017) for more information.

Parameters:
x: 1d numpy array

Coordinates at which to evaluate the kde.

samples: 1d numpy array

Samples from which to calculate kde.

weights: 1d numpy array of same shape as samples

Weights of each point. Need not be normalised as this is done inside the function.

Returns:
result: 1d numpy array of same shape as x

Kde evaluated at x values.

estimators

Functions for estimating quantities from nested sampling runs. Each estimator function should have arguments:

def estimator_func(self, ns_run, logw=None, simulate=False):
    ...

Any additional arguments required for the function should be keyword arguments.

The logw argument allows the log weights for the points in the run to be provided - this is useful if many estimators are being calculated from the same run as it allows logw to only be calculated once. If it is not specified, logw is calculated from the run when required.

The simulate argument is passed to ns_run_utils.get_logw, and is only used if the function needs to calculate logw.

nestcheck.estimators.count_samples(ns_run, **kwargs)[source]

Number of samples in run.

Unlike most estimators this does not require log weights, but for convenience will not throw an error if they are specified.

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

Returns:
int
nestcheck.estimators.evidence(ns_run, logw=None, simulate=False)[source]

Bayesian evidence \(\log \mathcal{Z}\).

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

logw: None or 1d numpy array, optional

Log weights of samples.

simulate: bool, optional

Passed to ns_run_utils.get_logw if logw needs to be calculated.

Returns:
float
nestcheck.estimators.get_latex_name(func_in, **kwargs)[source]

Produce a latex formatted name for each function for use in labelling results.

Parameters:
func_in: function
kwargs: dict, optional

Kwargs for function.

Returns:
latex_name: str

Latex formatted name for the function.

nestcheck.estimators.logz(ns_run, logw=None, simulate=False)[source]

Natural log of Bayesian evidence \(\log \mathcal{Z}\).

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

logw: None or 1d numpy array, optional

Log weights of samples.

simulate: bool, optional

Passed to ns_run_utils.get_logw if logw needs to be calculated.

Returns:
float
nestcheck.estimators.param_cred(ns_run, logw=None, simulate=False, probability=0.5, param_ind=0)[source]

One-tailed credible interval on the value of a single parameter (component of theta).

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

logw: None or 1d numpy array, optional

Log weights of samples.

simulate: bool, optional

Passed to ns_run_utils.get_logw if logw needs to be calculated.

probability: float, optional

Quantile to estimate - must be in open interval (0, 1). For example, use 0.5 for the median and 0.84 for the upper 84% quantile. Passed to weighted_quantile.

param_ind: int, optional

Index of parameter for which the credible interval should be calculated. This corresponds to the column of ns_run[‘theta’] which contains the parameter.

Returns:
float
nestcheck.estimators.param_mean(ns_run, logw=None, simulate=False, param_ind=0, handle_indexerror=False)[source]

Mean of a single parameter (single component of theta).

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

logw: None or 1d numpy array, optional

Log weights of samples.

simulate: bool, optional

Passed to ns_run_utils.get_logw if logw needs to be calculated.

param_ind: int, optional

Index of parameter for which the mean should be calculated. This corresponds to the column of ns_run[‘theta’] which contains the parameter.

handle_indexerror: bool, optional

Make the function function return nan rather than raising an IndexError if param_ind >= ndim. This is useful when applying the same list of estimators to data sets of different dimensions.

Returns:
float
nestcheck.estimators.param_squared_mean(ns_run, logw=None, simulate=False, param_ind=0)[source]

Mean of the square of single parameter (second moment of its posterior distribution).

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

logw: None or 1d numpy array, optional

Log weights of samples.

simulate: bool, optional

Passed to ns_run_utils.get_logw if logw needs to be calculated.

param_ind: int, optional

Index of parameter for which the second moment should be calculated. This corresponds to the column of ns_run[‘theta’] which contains the parameter.

Returns:
float
nestcheck.estimators.r_cred(ns_run, logw=None, simulate=False, probability=0.5)[source]

One-tailed credible interval on the value of the radial coordinate (magnitude of theta vector).

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

logw: None or 1d numpy array, optional

Log weights of samples.

simulate: bool, optional

Passed to ns_run_utils.get_logw if logw needs to be calculated.

probability: float, optional

Quantile to estimate - must be in open interval (0, 1). For example, use 0.5 for the median and 0.84 for the upper 84% quantile. Passed to weighted_quantile.

Returns:
float
nestcheck.estimators.r_mean(ns_run, logw=None, simulate=False)[source]

Mean of the radial coordinate (magnitude of theta vector).

Parameters:
ns_run: dict

Nested sampling run dict (see the data_processing module docstring for more details).

logw: None or 1d numpy array, optional

Log weights of samples.

simulate: bool, optional

Passed to ns_run_utils.get_logw if logw needs to be calculated.

Returns:
float
nestcheck.estimators.weighted_quantile(probability, values, weights)[source]

Get quantile estimate for input probability given weighted samples using linear interpolation.

Parameters:
probability: float

Quantile to estimate - must be in open interval (0, 1). For example, use 0.5 for the median and 0.84 for the upper 84% quantile.

values: 1d numpy array

Sample values.

weights: 1d numpy array

Corresponding sample weights (same shape as values).

Returns:
quantile: float

io_utils

Helper functions for saving, loading and input/output.

nestcheck.io_utils.pickle_load(name, extension='.pkl')[source]

Load data with pickle.

Parameters:
name: str

Path to save to (includes dir, excludes extension).

extension: str, optional

File extension.

Returns:
Contents of file path.
nestcheck.io_utils.pickle_save(data, name, **kwargs)[source]

Saves object with pickle.

Parameters:
data: anything picklable

Object to save.

name: str

Path to save to (includes dir, excludes extension).

extension: str, optional

File extension.

overwrite existing: bool, optional

When the save path already contains file: if True, file will be overwritten, if False the data will be saved with the system time appended to the file name.

nestcheck.io_utils.save_load_result(func)[source]

Saves and/or loads func output (must be picklable).

nestcheck.io_utils.timing_decorator(func)[source]

Prints the time func takes to execute.

parallel_utils

Parallel wrapper functions using the concurrent.futures module.

nestcheck.parallel_utils.parallel_apply(func, arg_iterable, **kwargs)[source]

Apply function to iterable with parallelisation and a tqdm progress bar.

Roughly equivalent to

>>> [func(*func_pre_args, x, *func_args, **func_kwargs) for x in
     arg_iterable]

but will not necessarily return results in input order.

Parameters:
func: function

Function to apply to list of args.

arg_iterable: iterable

argument to iterate over.

func_args: tuple, optional

Additional positional arguments for func.

func_pre_args: tuple, optional

Positional arguments to place before the iterable argument in func.

func_kwargs: dict, optional

Additional keyword arguments for func.

parallel: bool, optional

To turn off parallelisation if needed.

parallel_warning: bool, optional

To turn off warning for no parallelisation if needed.

max_workers: int or None, optional

Number of processes. If max_workers is None then concurrent.futures.ProcessPoolExecutor defaults to using the number of processors of the machine. N.B. If max_workers=None and running on supercomputer clusters with multiple nodes, this may default to the number of processors on a single node.

Returns:
results_list: list of function outputs
nestcheck.parallel_utils.parallel_map(func, *arg_iterable, **kwargs)[source]

Apply function to iterable with parallel map, and hence returns results in order. functools.partial is used to freeze func_pre_args and func_kwargs, meaning that the iterable argument must be the last positional argument.

Roughly equivalent to

>>> [func(*func_pre_args, x, **func_kwargs) for x in arg_iterable]
Parameters:
func: function

Function to apply to list of args.

arg_iterable: iterable

argument to iterate over.

chunksize: int, optional

Perform function in batches

func_pre_args: tuple, optional

Positional arguments to place before the iterable argument in func.

func_kwargs: dict, optional

Additional keyword arguments for func.

parallel: bool, optional

To turn off parallelisation if needed.

parallel_warning: bool, optional

To turn off warning for no parallelisation if needed.

max_workers: int or None, optional

Number of processes. If max_workers is None then concurrent.futures.ProcessPoolExecutor defaults to using the number of processors of the machine. N.B. If max_workers=None and running on supercomputer clusters with multiple nodes, this may default to the number of processors on a single node.

Returns:
results_list: list of function outputs
nestcheck.parallel_utils.select_tqdm()[source]

If running in a jupyter notebook, then returns tqdm_notebook. Otherwise returns a regular tqdm progress bar.

Returns:
progress: function

pandas_functions

Useful transformations and operations on pandas DataFrames.

nestcheck.pandas_functions.array_ratio_std(values_n, sigmas_n, values_d, sigmas_d)[source]

Gives error on the ratio of 2 floats or 2 1-dimensional arrays given their values and uncertainties. This assumes the covariance = 0, and that the input uncertainties are small compared to the corresponding input values. _n and _d denote the numerator and denominator respectively.

Parameters:
values_n: float or numpy array

Numerator values.

sigmas_n: float or numpy array

\(1\sigma\) uncertainties on values_n.

values_d: float or numpy array

Denominator values.

sigmas_d: float or numpy array

\(1\sigma\) uncertainties on values_d.

Returns:
std: float or numpy array

\(1\sigma\) uncertainty on values_n / values_d.

nestcheck.pandas_functions.efficiency_gain_df(method_names, method_values, est_names, **kwargs)[source]

Calculated data frame showing

\[\mathrm{efficiency\,gain} = \frac{\mathrm{Var[base\,method]}}{\mathrm{Var[new\,method]}}\]

See the dynamic nested sampling paper (Higson et al. 2019) for more details.

The standard method on which to base the gain is assumed to be the first method input.

The output DataFrame will contain rows:
mean [dynamic goal]: mean calculation result for standard nested

sampling and dynamic nested sampling with each input dynamic goal.

std [dynamic goal]: standard deviation of results for standard

nested sampling and dynamic nested sampling with each input dynamic goal.

gain [dynamic goal]: the efficiency gain (computational speedup)

from dynamic nested sampling compared to standard nested sampling. This equals (variance of standard results) / (variance of dynamic results); see the dynamic nested sampling paper for more details.

Parameters:
method names: list of strs
method values: list

Each element is a list of 1d arrays of results for the method. Each array must have shape (len(est_names),).

est_names: list of strs

Provide column titles for output df.

true_values: iterable of same length as estimators list

True values of the estimators for the given likelihood and prior.

Returns:
results: pandas data frame

Results data frame.

nestcheck.pandas_functions.get_eff_gain(base_std, base_std_unc, meth_std, meth_std_unc, adjust=1)[source]

Calculates efficiency gain for a new method compared to a base method. Given the variation in repeated calculations’ results using the two methods, the efficiency gain is:

\[\mathrm{efficiency\,gain} = \frac{\mathrm{Var[base\,method]}}{\mathrm{Var[new\,method]}}\]

The uncertainty on the efficiency gain is also calculated.

See the dynamic nested sampling paper (Higson et al. 2019) for more details.

Parameters:
base_std: 1d numpy array
base_std_unc: 1d numpy array

Uncertainties on base_std.

meth_std: 1d numpy array
meth_std_unc: 1d numpy array

Uncertainties on base_std.

Returns:
gain: 1d numpy array
gain_unc: 1d numpy array

Uncertainties on gain.

nestcheck.pandas_functions.paper_format_efficiency_gain_df(eff_gain_df)[source]

Transform efficiency gain data frames output by nestcheck into the format shown in the dynamic nested sampling paper (Higson et al. 2019).

Parameters:
eff_gain_df: pandas DataFrame

DataFrame of the from produced by efficiency_gain_df.

Returns:
paper_df: pandas DataFrame
nestcheck.pandas_functions.rmse_and_unc(values_array, true_values)[source]

Calculate the root meet squared error and its numerical uncertainty.

With a reasonably large number of values in values_list the uncertainty on sq_errors should be approximately normal (from the central limit theorem). Uncertainties are calculated via error propagation: if \(\sigma\) is the error on \(X\) then the error on \(\sqrt{X}\) is \(\frac{\sigma}{2 \sqrt{X}}\).

Parameters:
values_array: 2d numpy array

Array of results: each row corresponds to a different estimate of the quantities considered.

true_values: 1d numpy array

Correct values for the quantities considered.

Returns:
rmse: 1d numpy array

Root-mean-squared-error for each quantity.

rmse_unc: 1d numpy array

Numerical uncertainties on each element of rmse.

nestcheck.pandas_functions.summary_df(df_in, **kwargs)[source]

Make a panda data frame of the mean and std devs of an array of results, including the uncertainties on the values.

This is similar to pandas.DataFrame.describe but also includes estimates of the numerical uncertainties.

The output DataFrame has multiindex levels:

‘calculation type’: mean and standard deviations of the data. ‘result type’: value and uncertainty for each quantity.

calculation type result type column_1 column_2 … mean value mean uncertainty std value std uncertainty

Parameters:
df_in: pandas DataFrame
true_values: array

Analytical values if known for comparison with mean. Used to calculate root mean squared errors (RMSE).

include_true_values: bool, optional

Whether or not to include true values in the output DataFrame.

include_rmse: bool, optional

Whether or not to include root-mean-squared-errors in the output DataFrame.

Returns:
df: MultiIndex DataFrame
nestcheck.pandas_functions.summary_df_from_array(results_array, names, axis=0, **kwargs)[source]

Make a panda data frame of the mean and std devs of an array of results, including the uncertainties on the values.

This function converts the array to a DataFrame and calls summary_df on it.

Parameters:
results_array: 2d numpy array
names: list of str

Names for the output df’s columns.

axis: int, optional

Axis on which to calculate summary statistics.

Returns:
df: MultiIndex DataFrame

See summary_df docstring for more details.

nestcheck.pandas_functions.summary_df_from_list(results_list, names, **kwargs)[source]

Make a panda data frame of the mean and std devs of each element of a list of 1d arrays, including the uncertainties on the values.

This just converts the array to a DataFrame and calls summary_df on it.

Parameters:
results_list: list of 1d numpy arrays

Must have same length as names.

names: list of strs

Names for the output df’s columns.

kwargs: dict, optional

Keyword arguments to pass to summary_df.

Returns:
df: MultiIndex DataFrame

See summary_df docstring for more details.

nestcheck.pandas_functions.summary_df_from_multi(multi_in, inds_to_keep=None, **kwargs)[source]

Apply summary_df to a multiindex while preserving some levels.

Parameters:
multi_in: multiindex pandas DataFrame
inds_to_keep: None or list of strs, optional

Index levels to preserve.

kwargs: dict, optional

Keyword arguments to pass to summary_df.

Returns:
df: MultiIndex DataFrame

See summary_df docstring for more details.

dummy_data

Create dummy nested sampling run data for testing.

nestcheck.dummy_data.get_dummy_dynamic_run(nsamples, **kwargs)[source]

Generate dummy data for a dynamic nested sampling run.

Loglikelihood values of points are generated from a uniform distribution in (0, 1), sorted, scaled by logl_range and shifted by logl_start (if it is not -np.inf). Theta values of each point are each generated from a uniform distribution in (0, 1).

Parameters:
nsamples: int

Number of samples in thread.

nthread_init: int

Number of threads in the inital run (starting at logl=-np.inf).

nthread_dyn: int

Number of threads in the inital run (starting at randomly chosen points in the initial run).

ndim: int, optional

Number of dimensions.

seed: int, optional

If not False, the seed is set with np.random.seed(seed).

logl_start: float, optional

logl at which thread starts.

logl_range: float, optional

Scale factor applied to logl values.

nestcheck.dummy_data.get_dummy_run(nthread, nsamples, **kwargs)[source]

Generate dummy data for a nested sampling run.

Log-likelihood values of points are generated from a uniform distribution in (0, 1), sorted, scaled by logl_range and shifted by logl_start (if it is not -np.inf). Theta values of each point are each generated from a uniform distribution in (0, 1).

Parameters:
nthreads: int

Number of threads in the run.

nsamples: int

Number of samples in thread.

ndim: int, optional

Number of dimensions.

seed: int, optional

If not False, the seed is set with np.random.seed(seed).

logl_start: float, optional

logl at which thread starts.

logl_range: float, optional

Scale factor applied to logl values.

nestcheck.dummy_data.get_dummy_thread(nsamples, **kwargs)[source]

Generate dummy data for a single nested sampling thread.

Log-likelihood values of points are generated from a uniform distribution in (0, 1), sorted, scaled by logl_range and shifted by logl_start (if it is not -np.inf). Theta values of each point are each generated from a uniform distribution in (0, 1).

Parameters:
nsamples: int

Number of samples in thread.

ndim: int, optional

Number of dimensions.

seed: int, optional

If not False, the seed is set with np.random.seed(seed).

logl_start: float, optional

logl at which thread starts.

logl_range: float, optional

Scale factor applied to logl values.

write_polychord_output

Functions for writing PolyChord-format output files given a nested sampling run dictionary stored in the nestcheck format.

nestcheck.write_polychord_output.run_dead_birth_array(run, logl_init=-1e+30, **kwargs)[source]

Converts input run into an array of the format of a PolyChord <root>_dead-birth.txt file. Note that this in fact includes live points remaining at termination as well as dead points.

Parameters:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

logl_init: float, optional

Value used to identify the inital live points which were sampled from the whole prior. Default value is set to -1e30 as in PolyChord.

kwargs: dict, optional

Options for check_ns_run.

Returns:
samples: 2d numpy array

Array of dead points and any remaining live points at termination. Has #parameters + 2 columns: param_1, param_2, … , logl, birth_logl

nestcheck.write_polychord_output.write_run_output(run, **kwargs)[source]

Writes PolyChord output files corresponding to the input nested sampling run. The file root is

root = os.path.join(run['output']['base_dir'],
                    run['output']['file_root'])

Output files which can be made with this function (see the PolyChord documentation for more information about what each contains):

  • [root].stats

  • [root].txt

  • [root]_equal_weights.txt

  • [root]_dead-birth.txt

  • [root]_dead.txt

Files produced by PolyChord which are not made by this function:

  • [root].resume: for resuming runs part way through (not relevant for a completed run).

  • [root]_phys_live.txt and [root]phys_live-birth.txt: for checking runtime progress (not relevant for a completed run).

  • [root].paramnames: for use with getdist (not needed when calling getdist from within python).

Parameters:
ns_run: dict

Nested sampling run dict (see data_processing module docstring for more details).

write_dead: bool, optional

Whether or not to write [root]_dead.txt and [root]_dead-birth.txt.

write_stats: bool, optional

Whether or not to write [root].stats.

posteriors: bool, optional

Whether or not to write [root].txt.

equals: bool, optional

Whether or not to write [root]_equal_weights.txt.

stats_means_errs: bool, optional

Whether or not to calculate mean values of \(\log \mathcal{Z}\) and each parameter, and their uncertainties.

fmt: str, optional

Formatting for numbers written by np.savetxt. Default value is set to make output files look like the ones produced by PolyChord.

n_simulate: int, optional

Number of bootstrap replications to use when estimating uncertainty on evidence and parameter means.

logl_init: float, optional

Value used to identify the inital live points which were sampled from the whole prior. Default value is set to -1e30 as in PolyChord.

nestcheck.write_polychord_output.write_stats_file(run_output_dict)[source]

Writes a dummy PolyChord format .stats file for tests functions for processing stats files. This is written to:

base_dir/file_root.stats

Also returns the data in the file as a dict for comparison.

Parameters:
run_output_dict: dict

Output information to write to .stats file. Must contain file_root and base_dir. If other settings are not specified, default values are used.

Returns:
output: dict

The expected output of nestcheck.process_polychord_stats(file_root, base_dir)