3.1. mbl.schema.RandomHeisenbergEDSchema

class mbl.schema.RandomHeisenbergEDSchema(*args, **kwargs)

Bases: pandera.model.SchemaModel

Check if all columns in a dataframe have a column in the Schema.

Parameters
  • check_obj (pd.DataFrame) – the dataframe to be validated.

  • head – validate the first n rows. Rows overlapping with tail or sample are de-duplicated.

  • tail – validate the last n rows. Rows overlapping with head or sample are de-duplicated.

  • sample – validate a random sample of n rows. Rows overlapping with head or tail are de-duplicated.

  • random_state – random seed for the sample argument.

  • lazy – if True, lazily evaluates the dataframe against all validation checks and raises SchemaErrors. Otherwise, raises SchemaError as soon as one occurs.

  • inplace – if True, applies coercion to the object of validation, otherwise creates a copy of the data.
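The head, tail and sample options restrict validation to a subset of rows. The following is a minimal pandas sketch of how such a subset could be assembled. It assumes that the selections are concatenated and that overlapping rows are identified by their index label and kept only once. `rows_to_validate` is a hypothetical helper used for illustration, not part of pandera's API.

```python
import pandas as pd


def rows_to_validate(df, head=None, tail=None, sample=None, random_state=None):
    """Hypothetical helper: select the rows that head/tail/sample would validate.

    Assumes the selections are concatenated and rows whose index label has
    already been selected are dropped, so overlapping rows are checked once.
    """
    parts = []
    if head is not None:
        parts.append(df.head(head))
    if tail is not None:
        parts.append(df.tail(tail))
    if sample is not None:
        parts.append(df.sample(n=sample, random_state=random_state))
    if not parts:
        # No subsampling arguments: the whole dataframe is validated.
        return df
    combined = pd.concat(parts)
    # Keep only the first occurrence of each index label.
    return combined[~combined.index.duplicated()]


df = pd.DataFrame({"level_id": range(10)})
print(rows_to_validate(df, head=6, tail=6).index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In this example, head=6 and tail=6 both select rows 4 and 5. Each of those rows appears only once in the result.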

Returns

validated DataFrame

Raises

SchemaError – when DataFrame violates built-in or custom checks.

Return type

pandera.typing.common.DataFrameBase[pandera.model.TSchemaModel]

Example

Calling schema.validate returns the dataframe.

>>> import pandas as pd
>>> import pandera as pa
>>>
>>> df = pd.DataFrame({
...     "probability": [0.1, 0.4, 0.52, 0.23, 0.8, 0.76],
...     "category": ["dog", "dog", "cat", "duck", "dog", "dog"]
... })
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         str, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })
>>>
>>> schema_withchecks.validate(df)[["probability", "category"]]
   probability category
0         0.10      dog
1         0.40      dog
2         0.52      cat
3         0.23     duck
4         0.80      dog
5         0.76      dog

__init__()

Methods

  • __init__()
  • bipartite_entropy_bounded_in(df)
  • bounded_in(series)
  • close_to_integer(series)
  • example(*[, size]) – Generate an example DataFrame of a particular size.
  • monotonically_increasing(series)
  • strategy(*[, size]) – Create a hypothesis strategy for generating a DataFrame.
  • to_schema() – Create DataFrameSchema from the SchemaModel.
  • to_yaml([stream]) – Convert Schema to yaml using io.to_yaml.
  • validate(check_obj[, head, tail, sample, ...]) – Check if all columns in a dataframe have a column in the Schema.

Attributes

  • bipartite_entropy
  • disorder
  • edge_entropy
  • en
  • level_id
  • offset
  • penalty
  • s_target
  • seed
  • system_size
  • total_sz
  • trial_id

level_id: pandera.typing.pandas.Series[int] = 'level_id'
en: pandera.typing.pandas.Series[float] = 'en'
total_sz: pandera.typing.pandas.Series[float] = 'total_sz'
edge_entropy: pandera.typing.pandas.Series[float] = 'edge_entropy'
bipartite_entropy: pandera.typing.pandas.Series[float] = 'bipartite_entropy'
system_size: pandera.typing.pandas.Series[int] = 'system_size'
disorder: pandera.typing.pandas.Series[float] = 'disorder'
trial_id: pandera.typing.pandas.Series[str] = 'trial_id'
seed: pandera.typing.pandas.Series[int] = 'seed'
penalty: pandera.typing.pandas.Series[float] = 'penalty'
s_target: pandera.typing.pandas.Series[int] = 's_target'
offset: pandera.typing.pandas.Series[float] = 'offset'
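The annotations above fix each column's name and element type. The sketch below writes the same specification as a plain dictionary. It assumes that `int`, `float` and `str` map to the default pandas dtypes. The dictionary is then used to build an empty frame and to list columns missing from an input. `COLUMN_TYPES`, `empty_frame` and `missing_columns` are hypothetical names used for illustration and are not part of the mbl package.

```python
import pandas as pd

# Column names and element types copied from the annotations above.
COLUMN_TYPES = {
    "level_id": int,
    "en": float,
    "total_sz": float,
    "edge_entropy": float,
    "bipartite_entropy": float,
    "system_size": int,
    "disorder": float,
    "trial_id": str,
    "seed": int,
    "penalty": float,
    "s_target": int,
    "offset": float,
}


def empty_frame() -> pd.DataFrame:
    """Return a zero-row DataFrame with one typed column per schema field."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in COLUMN_TYPES.items()}
    )


def missing_columns(df: pd.DataFrame) -> list:
    """Return the schema columns that `df` does not contain, in schema order."""
    return [name for name in COLUMN_TYPES if name not in df.columns]


print(empty_frame().dtypes)
print(missing_columns(pd.DataFrame({"level_id": [0], "en": [-1.2]})))
```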

classmethod monotonically_increasing(series)
Parameters

series (pandera.typing.pandas.Series[float])

Return type

bool
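`monotonically_increasing` returns a single `bool`. Pandera treats a scalar result like this as a verdict on the whole column, not on individual rows. The source documents only the signature, so the body below is a hypothetical sketch. It assumes a non-strict ordering test in which equal neighbours are allowed.

```python
import pandas as pd


def monotonically_increasing(series: pd.Series) -> bool:
    """Hypothetical sketch: True when no value is smaller than the one before it.

    Equal neighbours are allowed (non-strict ordering is an assumption).
    """
    return bool(series.is_monotonic_increasing)


print(monotonically_increasing(pd.Series([-1.5, -0.25, -0.25, 0.75])))  # True
print(monotonically_increasing(pd.Series([0.0, -1.0])))  # False
```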

classmethod close_to_integer(series)
Parameters

series (pandera.typing.pandas.Series[float])

Return type

pandera.typing.pandas.Series[bool]
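Like `bounded_in`, `close_to_integer` returns `Series[bool]`, which gives one result per element. This lets pandera report exactly which rows fail. The following is a hypothetical sketch of such an element-wise check. The `atol` tolerance parameter is an assumption, because the source documents only the signature.

```python
import pandas as pd


def close_to_integer(series: pd.Series, atol: float = 1e-8) -> pd.Series:
    """Hypothetical sketch: element-wise test that each value lies within
    `atol` of the nearest integer. The tolerance is an assumed default."""
    return (series - series.round()).abs() <= atol


values = pd.Series([1.0, 2.0000000001, 0.5, -3.0])
print(close_to_integer(values).tolist())  # [True, True, False, True]
```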

classmethod bounded_in(series)
Parameters

series (pandera.typing.pandas.Series[float])

Return type

pandera.typing.pandas.Series[bool]

classmethod bipartite_entropy_bounded_in(df)
Parameters

df (pandas.core.frame.DataFrame)

Return type

pandera.typing.pandas.Series[bool]
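`bipartite_entropy_bounded_in` takes the whole dataframe rather than a single series, which suggests that the bound depends on another column. The following hypothetical sketch assumes that `bipartite_entropy` is a half-chain von Neumann entropy in nats for a spin-1/2 chain of L = `system_size` sites. Under that assumption the entropy satisfies 0 <= S <= floor(L/2) * ln 2. The units, the bound and the `tol` parameter are all assumptions, not taken from the source.

```python
import numpy as np
import pandas as pd


def bipartite_entropy_bounded_in(df: pd.DataFrame, tol: float = 1e-10) -> pd.Series:
    """Hypothetical sketch of a dataframe-wide check.

    Assumes entropies are in nats and bounded by floor(L/2) * ln 2 for a
    spin-1/2 chain of L = system_size sites; `tol` absorbs rounding error.
    """
    upper = (df["system_size"] // 2) * np.log(2.0)
    s = df["bipartite_entropy"]
    return (s >= -tol) & (s <= upper + tol)


df = pd.DataFrame({
    "system_size": [8, 8, 10],
    "bipartite_entropy": [0.5, 3.0, 3.4],
})
print(bipartite_entropy_bounded_in(df).tolist())  # [True, False, True]
```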

class Config

Bases: pandera.typing.config.BaseConfig

name: Optional[str] = 'RandomHeisenbergEDSchema'

name of schema