1NB Interact
Features
- Interact with your data using AI.
- The AI generates Python code that is executed in an executer; the AI observes the result.
- With workers, your data does not leave your computer.
Choose executer
| Executer | Features | Privacy | User requirements | Tokens needed |
|---|---|---|---|---|
| 1nb executer | Code executes in 1NB's server | Data saved securely in 1NB | None | Compute + AI |
| Self hosted executer | Code executes in User's computer | Data saved securely in 1NB | Start a worker | AI |
| Worker | Code executes in User's computer | Data resides in User's computer | Start a worker | AI |
OneNB: CLI/Python API for 1NB
Features
- Use 1NB with your existing Git repository and Jupyter notebooks
- Export data directly from your notebooks to 1nb:
```python
from OneNB.exports import DataX

with DataX.writer(".df") as wr:
    my_dataframe.to_pickle(wr)
```
- Directly import data from a saved tag in 1nb into your jupyter notebooks
```python
from OneNB.i.myrepo.tag import MyData

data_df = pd.read_csv(MyData.reader())
```
- Import data from other notebooks you are working on (pipelining):
```python
from OneNB.working import DataX
```
- The worker will automatically rerun notebooks (or new interacts) on your local computer, so you can use your data without it leaving your system
- And many more
Installation
Supported Python version: >= 3.10
Linux/Mac
Install the wheel (download) with pip (preferably in a virtual env):
```
python3 -m venv ~/.1nbvenv
source ~/.1nbvenv/bin/activate
python3 -m pip install OneNB-X.X.X-py3-none-any.whl
```
If you have sudo permission, make the 1nb command accessible from anywhere:
```
sudo ln -s ~/.1nbvenv/bin/1nb /usr/local/bin/1nb
```
Otherwise, activate the environment when running 1nb commands:
```
source ~/.1nbvenv/bin/activate
1nb init
```
Windows
Warning: the worker function does not currently run on Windows.
Make sure that the Python version is at least 3.10: python --version
Install the wheel (download) with pip (preferably in a virtual env):
```
cd C:\path\to\home\
python -m venv .1nbvenv
.1nbvenv\Scripts\activate
python -m pip install C:\path\to\downloads\OneNB-X.X.X-py3-none-any.whl
```
When running 1nb commands, activate the env:
```
C:\path\to\home\.1nbvenv\Scripts\activate
1nb init
```
Create a repository
To be able to submit code & data to your 1nb repository, you will first need to create a repository locally. Since 1nb heavily uses version control with Git, you can only create a 1nb repo in a Git repository.
```
git init
1nb init --name <repository name>
```
Executing this will set up the necessary 1nb files (in .1nb). It will also add those files to the Git staging area, so make sure to run git commit afterwards.
Choose your data storage
Your data (exports) can be stored in one of several ways:
| Type | Provider | Options | Stores in | Local executer | 1NB executer | Command |
|---|---|---|---|---|---|---|
| Local | | | Local machine | yes | no | 1nb init local |
| Cloud | AWS S3 | | 1NB's secure data store | yes | yes | 1nb init cloud s3 |
| Cloud | AWS S3 | Bucket name | User's S3 bucket | yes | no | 1nb init cloud --bucket <bucket> s3 |
| Cloud | Azure | Bucket name | User's Azure data store | yes | no | 1nb init cloud --bucket <bucket> azure |
| Cloud | GCP | Bucket name | User's GCP data store | yes | no | 1nb init cloud --bucket <bucket> gcp |
Use 1nb init --help to access more options.
- Local
  - Data is stored only on your local system.
- Cloud (AWS S3/Azure/GCS)
  - Data is stored in the cloud (Azure and GCS are not functional yet).
  - Cloud credentials will be accessed from the current system (they will not be saved).
  - You can specify your own bucket and a key prefix with --bucket and --key-prefix.
  - If --bucket is not set, data will be saved securely in 1nb's remote storage.
  - You can provide an AWS profile with --profile in case a non-default set of AWS credentials is to be used.
An executer is needed to run Interact (New). If 1NB is chosen, the code will be executed on 1nb's server; a local worker will execute the code on the user's system.
Additional options
- --requirements-file: Packages needed by your Python project (other than OneNB). These packages will be automatically installed in a virtual env before running the project notebooks.
- --profile: AWS profile from which credentials will be used if cloud S3 storage is used.

```
1nb init --name <repository name> --requirements-file <pip requirements file>
```
Local code
Python module (.py) files are not saved in 1nb. If your notebooks depend on them, the code (as well as any additional metadata) needs to be packaged in setuptools style with a pyproject.toml.
This will make the package pip installable.
Finally, add the package path to your requirements file above: ./my/project/path
Prepare a Notebook
Saving data
```python
from OneNB.exports import MyData
```
This code creates a data export named MyData. Writing data to it is similar to using the open syntax for saving binary data to disk:
```python
with MyData.open(mode="wb", suffix=".npy") as out:
    np.save(out, array)
```
More examples:
```python
import pickle
from OneNB.exports import labels

vocab = {...}
with labels.open(suffix=".pkl", mode='wb') as fe:
    pickle.dump(vocab, fe)
```
```python
import pandas as pd
from OneNB.exports import Sales

out = pd.DataFrame(...)
with Sales.open(suffix=".csv") as f:
    out.to_csv(f)
```
A suffix is required to identify the type of data being saved. To save text data, change the mode to text: .open(mode="t", suffix=".json").
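For example, a minimal sketch of saving a dictionary as JSON text (Metrics is just an illustrative export name, not part of the API):
```python
import json
from OneNB.exports import Metrics  # illustrative export name

metrics = {"accuracy": 0.93, "epochs": 10}
with Metrics.open(mode="t", suffix=".json") as f:
    json.dump(metrics, f)
```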
This code can be run from any script or notebook, as many times as needed. The export name MyData is a unique identifier, so exporting with the same name from other locations will overwrite it.
Parameterize your data & notebooks
The parameters in a notebook should additionally be defined in a cell that is tagged parameters. This cell should be the first cell of the notebook:
```python
year = 2023
foo = "bar"
```
Later:
```python
p1 = dict(year=year, foo=foo)
from OneNB.exports._p1 import Report
```
Defining the parameters in this way makes it easy to modify key assumptions when executing the notebook. The parameter values should be simple literals and not include objects.
Also, if the parameter values are simple (boolean, string, number), 1nb will allow the user to modify them in the web UI when executing the notebook later.
Warning: you should include all the variables from the parameters cell in the import definition (p1) too. Failing to do so will result in errors at the push stage.
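Putting the pieces together, a minimal sketch of a later cell that exports data under the current parameter set might look like this (build_report is a hypothetical helper, not part of OneNB):
```python
# Bind the parameter values and import the parameterized export
p1 = dict(year=year, foo=foo)
from OneNB.exports._p1 import Report

# Data written here is associated with the current parameter dictionary
report_df = build_report(year=year, foo=foo)  # hypothetical helper returning a DataFrame
with Report.open(suffix=".csv") as f:
    report_df.to_csv(f)
```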
Mark Results
Often a notebook has some key cells that contain the final result, such as charts or text output. Tagging these cells as result will make the output from those cells easily viewable in the web UI. Code from these cells will not be shown.
Save run
After executing your notebook with a parameter set, you should add it to 1nb:
```
1nb addnb --nb notebook.ipynb
```
This operation will identify the parameters in the notebook and save them for later upload. It needs to be done every time you have a notebook ready with a unique parameter set.
If this notebook is not the one that was originally executed (e.g. it was renamed or moved after execution), add the original one with --original. In that case --nb should be the notebook that was saved after execution:
```
1nb addnb --nb notebook.ipynb --original original.notebook.ipynb
```
Push
Before your work can be uploaded, it first needs to be committed and tagged in Git:
```
git commit
git tag reportsv1
1nb push
```
On performing 1nb push, all the data exports and notebooks are uploaded to the cloud. If this is the first push, use 1nb push --create instead. After this you should be able to view the repository on 1nb.ai.
Worker
To use interact nb (with self-hosted executer / keep data locally) or run nb (rerun notebooks), you will need to create a worker process. Install the Python client and run:
```
1nb worker --repo <repository>
```
If you are using interact nb + keep data locally, you will also need to provide the file path to the data:
```
1nb worker --repo <repository> --use-data <path>
```
If the repository does not already exist, add --create-repo after the repository name: --repo <repository> --create-repo.
This feature is currently not supported on Windows.
Reuse data
Data can be loaded with syntax similar to exporting, but with the tag/commit info to identify the revision of the code that exported it. The data then needs to be read as usual in Python, using .reader() wherever a file handle is expected:
```python
from OneNB.i.repository.tag import MyData

# example: read csv
import pandas as pd
my_data = pd.read_csv(MyData.reader())

# example: read pickle
import pickle
my_data = pickle.load(MyData.reader())

# parameterized data
p = {"year": 2023, "foo": "bar"}
from OneNB.i.repository.tag._p import Report
```
Exporting data from such a script/notebook will produce a dependency on the exports MyData and Report. Viewing the export in the web UI will list these dependencies.
Also, if this is a notebook and the imported exports are themselves generated by notebooks, 1nb will automatically create a pipeline such that those upstream notebooks are rerun (if required) before rerunning this notebook.
If the repository name has special characters that make Python throw a SyntaxError, replace those characters with an underscore (_).
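For example, for a hypothetical repository named my-repo with tag reportsv1, the import would be written as:
```python
from OneNB.i.my_repo.reportsv1 import MyData
```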
How it works
- When you execute a jupyter notebook that imports/exports data from/to 1NB:
  - It identifies the path of the executing notebook
  - Fetches the data from 1NB if data is being imported from a saved tag
    - For local data storage only the data path is fetched
  - When data is being exported, it stores the binary data into an internal file
    - For local storage only the path to this internal file is uploaded to 1nb
  - Stores the import/export metadata to .1nb/exports.json
  - If notebook parameters are used, it also associates the data with the current parameter dictionary
- After executing and saving the notebook, use 1nb addnb
  - It saves a copy of the notebook to .1nb/
  - Allows you to re-execute the notebook while only changing the parameters cell.
  - On re-executing and repeating addnb, it will record all the executions with unique parameter sets.
- On finalizing your work
  - Save your work to git: git commit, git tag, git push
    - Tagging the commit allows 1nb to associate the exported data with this version tag
  - It identifies the current git branch and parent git branches:
    - If a parent branch has a notebook with the same name, then the current notebook is recorded as a revision of the parent
    - It only uses the local branch name, not the remote
  - Push the notebooks and data (metadata) with 1nb push
- On Push
  - Notebooks that export data are always uploaded to 1NB
  - Exported data is uploaded according to the data storage setting
  - Local Python modules (.py) are not uploaded to 1NB by default. Follow "Local code" above to package local code so notebooks that depend on it will not fail on rerun/interact.
- Worker
  - You start a worker with 1nb worker
  - It listens for active tasks from 1NB
  - It creates a virtualenv (or reuses existing ones)
  - Executes the incoming code and sends back the results
  - When the notebook is finalized, it will itself save and push the notebook back to 1NB
API
Data Exports
class Exported(ModuleType, SaveableValue)
Export data to 1nb. You can write to this like a file (only writing is supported) using the file open syntax:
```python
from OneNB.exports import exp

with exp.open(mode="wt", suffix=".txt") as f:
    f.write("Hello world")
```
Other writers: text_writer, writer, write_csv, write_as_json and write_stream
open
@contextmanager def open(*, suffix: str, mode: str = "wt", encoding: str = "utf-8")
Writer similar to file open:
```python
with _object_.open(suffix=".txt") as f:
    f.write("Hello world")
```
Arguments:
- mode - 't' for text data, 'b' for binary data
- suffix - file name suffix to signify the file type
- encoding - byte encoding format if text data, default utf-8
text_writer
def text_writer(file_ext: str)
A file-like writer for text data. Shorthand for .open
writer
def writer(file_ext: str)
A file-like writer for binary data. Shorthand for .open
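A minimal usage sketch for saving a NumPy array (Features is an illustrative export name):
```python
import numpy as np
from OneNB.exports import Features  # illustrative export name

features = np.zeros((10, 3))
# writer(file_ext) is the binary shorthand for .open
with Features.writer(".npy") as f:
    np.save(f, features)
```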
write_csv
def write_csv(data: list[list[str]])
Write an object (list[list[str]]) as CSV using the excel dialect
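A minimal usage sketch (Scores is an illustrative export name):
```python
from OneNB.exports import Scores  # illustrative export name

rows = [["name", "score"], ["alice", "0.91"], ["bob", "0.87"]]
Scores.write_csv(rows)
```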
write_as_json
def write_as_json(data, cls: Type[json.JSONEncoder] | None = None)
Write an object as JSON; cls is the same as in json.dump
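A minimal usage sketch (Config is an illustrative export name):
```python
from OneNB.exports import Config  # illustrative export name

Config.write_as_json({"year": 2023, "foo": "bar"})
```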
write_stream
def write_stream(stream: io.BytesIO | io.StringIO, file_ext: str, encoding: str)
Write data directly from an IO buffer
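A minimal usage sketch (Summary is an illustrative export name):
```python
import io
from OneNB.exports import Summary  # illustrative export name

buf = io.StringIO()
buf.write("year,total\n2023,100\n")
Summary.write_stream(buf, ".csv", "utf-8")
```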
save_value
def save_value(data: any, _special: any = None)
Save a value directly, outside of data storage. Can save up to 100 bytes and supports only native Python types.
Note: this will be saved in 1NB, regardless of this repository's data storage settings.
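A minimal usage sketch (BestThreshold is an illustrative export name):
```python
from OneNB.exports import BestThreshold  # illustrative export name

# Small native-Python values only (up to 100 bytes); saved in 1NB itself
BestThreshold.save_value(0.42)
```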
open_no_upload
def open_no_upload(*, fn_write_bytes_to: Optional[WriteToBuffer] = None, fn_returning_fileobj: Optional[ReturnsReadable] = None, source_type: str, params: dict[str, bool | int | str | None])
Directly load data from an external source, and store the source parameters.
```python
exp.open_no_upload(source_type="my_file", params=("/path/to/my/file",))
```
```python
import boto3

s3_client = boto3.client("s3")

def download_from_s3(buffer, bucket, key):
    s3_client.download_fileobj(bucket, key, buffer)

exp.open_no_upload(source_type="my_s3", fn_write_bytes_to=download_from_s3, params=("my_bucket", "my_key"))
```
Arguments:
- fn_write_bytes_to (optional) - a function that writes the data into a buffer; the buffer is supplied as the first parameter
- fn_returning_fileobj (optional) - a function that returns a readable file-like object that points to the data
- source_type - string for user reference
- params - parameters passed to the function (or Python's builtin open(file, ...) function)
Data Import
class Imported(ModuleType, SupportsBytes, LocalStorePaths)
Buffered imported data.
Get a file-like readable object with .reader(). Use it wherever a readable file-like object can be used.
```python
import json
from OneNB.i.repo.tag import MyData

json.load(MyData.reader())
```
If it is a value, use .value
reader
def reader()
Get a readable binary file-like object that can be used wherever a file-like object is expected.
```python
pd.read_csv(MyData.reader())
```
value
@property def value()
Get the value if it was stored with the Exported.save_value or Exported.open_no_upload methods.
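A minimal usage sketch, assuming BestThreshold was exported with save_value (repo and tag are placeholders):
```python
from OneNB.i.repo.tag import BestThreshold

threshold = BestThreshold.value  # the value stored with save_value / open_no_upload
```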