1NB Interact
Features
- Interact with your data using AI.
- The AI generates Python code that is executed in an executer; the AI observes the result.
- With workers, your data does not leave your computer.
Choose executer
| Executer | Features | Privacy | User requirements | Tokens needed |
|---|---|---|---|---|
| 1nb executer | Code executes in 1NB's server | Data saved securely in 1NB | None | Compute + AI |
| Self hosted executer | Code executes in User's computer | Data saved securely in 1NB | Start a worker | AI |
| Worker | Code executes in User's computer | Data resides in User's computer | Start a worker | AI |
OneNB: CLI/Python API for 1NB
Features
- Use 1NB with your existing Git repository and Jupyter notebooks
- Export data directly from your notebooks to 1nb:
```python
from OneNB.exports import DataX

with DataX.writer(".df") as wr:
    my_dataframe.to_pickle(wr)
```
- Directly import data from a saved tag in 1nb into your jupyter notebooks
```python
from OneNB.i.myrepo.tag import MyData

data_df = pd.read_csv(MyData.reader())
```
- Import data from other notebooks you are working on (pipelining):
```python
from OneNB.working import DataX
```
- The worker will automatically rerun notebooks (or new interacts) on your local computer, so you can use your data without it leaving your system
- And many more
Installation
Supported Python version: >= 3.10
Linux/Mac
Install the wheel (download) with pip (preferably in a virtual env):
```
python3 -m venv ~/.1nbvenv
source ~/.1nbvenv/bin/activate
python3 -m pip install OneNB-X.X.X-py3-none-any.whl
```
If you have sudo permission, make the 1nb command accessible from anywhere:
```
sudo ln -s ~/.1nbvenv/bin/1nb /usr/local/bin/1nb
```
Otherwise, activate the environment when running 1nb commands:
```
source ~/.1nbvenv/bin/activate
1nb init
```
Windows
Warning: the worker function does not currently run on Windows.
Make sure that the Python version is at least 3.10: python --version
Install the wheel (download) with pip (preferably in a virtual env):
```
cd C:\path\to\home\
python -m venv .1nbvenv
.1nbvenv\Scripts\activate
python -m pip install C:\path\to\downloads\OneNB-X.X.X-py3-none-any.whl
```
When running 1nb commands, activate the env:
```
C:\path\to\home\.1nbvenv\Scripts\activate
1nb init
```
Create a repository
To be able to submit code & data to your 1nb repository, you will first need to create a repository locally. Since 1nb heavily uses version control with Git, you can only create a 1nb repo in a Git repository.
```
git init
1nb init --name <repository name>
```
Executing this will set up the necessary 1nb files (in .1nb). It will also add those files to the Git staging area, so make sure to run git commit afterwards.
Choose your data storage
Your data (exports) can be stored in one of several ways:
| Type | Provider | Options | Stores in | Local executer | 1NB executer | Command |
|---|---|---|---|---|---|---|
| Local | | | Local machine | yes | no | 1nb init local |
| Cloud | AWS S3 | | 1NB's secure data store | yes | yes | 1nb init cloud s3 |
| Cloud | AWS S3 | Bucket name | User's S3 bucket | yes | no | 1nb init cloud --bucket <bucket> s3 |
| Cloud | Azure | Bucket name | User's Azure data store | yes | no | 1nb init cloud --bucket <bucket> azure |
| Cloud | GCP | Bucket name | User's GCP data store | yes | no | 1nb init cloud --bucket <bucket> gcp |
Use 1nb init --help to access more options.
- Local
  - Data is stored only on your local system.
- Cloud (AWS S3/Azure/GCS)
  - Data is stored in the cloud (Azure and GCS are not functional yet).
  - Cloud credentials will be accessed from the current system (they will not be saved).
  - You can specify your own bucket and a key prefix with --bucket and --key-prefix.
  - If --bucket is not set, data will be saved securely in 1nb's remote storage.
  - You can provide an AWS profile with --profile in case a non-default set of AWS credentials is to be used.
An executer is needed to run Interact (New). If 1NB is chosen, the code will be executed on 1nb's server; a local worker will execute the code on the user's system.
Additional options
- --requirements-file: Packages needed by your Python project (other than OneNB). These packages will be automatically installed in a virtual env before running the project notebooks.
- --profile: AWS profile from which credentials will be used if cloud S3 storage is used.

```
1nb init --name <repository name> --requirements-file <pip requirements file>
```
Local code
Python module (.py) files are not saved in 1nb. If your notebooks depend on them, the code (as well as any additional metadata) needs to be packaged in setuptools style with a pyproject.toml.
This will make the package pip installable.
Finally, add the package path to your requirements file above: ./my/project/path
Prepare a Notebook
Saving data
```python
from OneNB.exports import MyData
```
This code creates a data export named MyData. Writing data to it is similar to using the open syntax for saving binary data to disk:
```python
with MyData.open(mode="wb", suffix=".npy") as out:
    np.save(out, array)
```
More examples:
```python
import pickle
from OneNB.exports import labels

vocab = {...}
with labels.open(suffix=".pkl", mode='wb') as fe:
    pickle.dump(vocab, fe)
```
```python
import pandas as pd
from OneNB.exports import Sales

out = pd.DataFrame(...)
with Sales.open(suffix=".csv") as f:
    out.to_csv(f)
```
A suffix is required to identify the type of data being saved. To save text data, change the mode to text: .open(mode="t", suffix=".json").
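For example, a minimal sketch of saving a dictionary as JSON text (Metrics is just an illustrative export name, not part of the API):
```python
import json
from OneNB.exports import Metrics  # illustrative export name

metrics = {"accuracy": 0.93, "epochs": 10}
with Metrics.open(mode="t", suffix=".json") as f:
    json.dump(metrics, f)
```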
This code can be run from any script or notebook, as many times as needed. The export name MyData is a unique identifier, so exporting with the same name from other locations will overwrite it.
Parameterize your data & notebooks
The parameters in a notebook should additionally be defined in a cell that is tagged parameters. This cell should be the first cell of the notebook:
```python
year = 2023
foo = "bar"
```
Later:
```python
p1 = dict(year=year, foo=foo)
from OneNB.exports._p1 import Report
```
Defining the parameters in this way makes it easy to modify key assumptions when executing the notebook. The parameter values should be simple literals and not include objects.
Also, if the parameter values are simple (boolean, string, number), 1nb will allow the user to modify them in the web UI when executing the notebook later.
Warning: you should include all the variables from the parameters cell in the import definition (p1) too. Failing to do so will result in errors at the push stage.
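Putting the pieces together, a minimal sketch of a later cell that exports data under the current parameter set might look like this (build_report is a hypothetical helper, not part of OneNB):
```python
# Bind the parameter values and import the parameterized export
p1 = dict(year=year, foo=foo)
from OneNB.exports._p1 import Report

# Data written here is associated with the current parameter dictionary
report_df = build_report(year=year, foo=foo)  # hypothetical helper returning a DataFrame
with Report.open(suffix=".csv") as f:
    report_df.to_csv(f)
```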
Mark Results
Often a notebook has some key cells that contain the final result, such as charts or text output. Tagging these cells as result will make the output from those cells easily viewable in the web UI. Code from these cells will not be shown.
Save run
After executing your notebook with a parameter set, you should add it to 1nb:
```
1nb addnb --nb notebook.ipynb
```
This operation will identify the parameters in the notebook and save them for later upload. It needs to be done every time you have a notebook ready with a unique parameter set.
If this notebook is not the one that was originally executed (e.g. it was renamed or moved after execution), add the original one with --original. In that case --nb should be the notebook that was saved after execution:
```
1nb addnb --nb notebook.ipynb --original original.notebook.ipynb
```
Push
Before your work can be uploaded, it first needs to be committed and tagged in Git:
```
git commit
git tag reportsv1
1nb push
```
On performing 1nb push, all the data exports and notebooks are uploaded to the cloud. If this is the first push, use 1nb push --create instead. After this you should be able to view the repository on 1nb.ai.
Worker
To use interact nb (with self-hosted executer / keep data locally) or run nb (rerun notebooks), you will need to create a worker process. Install the Python client and run:
```
1nb worker --repo <repository>
```
If you are using interact nb + keep data locally, you will also need to provide the file path to the data:
```
1nb worker --repo <repository> --use-data <path>
```
If the repository does not already exist, add --create-repo after the repository name: --repo <repository> --create-repo.
This feature is currently not supported on Windows.
Reuse data
Data can be loaded with syntax similar to exporting, but with the tag/commit info to identify the revision of the code that exported it. The data then needs to be read as usual in Python, using .reader() wherever a file handle is expected:
```python
from OneNB.i.repository.tag import MyData

# example: read csv
import pandas as pd
my_data = pd.read_csv(MyData.reader())

# example: read pickle
import pickle
my_data = pickle.load(MyData.reader())

# parameterized data
p = {"year": 2023, "foo": "bar"}
from OneNB.i.repository.tag._p import Report
```
Exporting data from such a script/notebook will produce a dependency on the exports MyData and Report. Viewing the export in the web UI will list these dependencies.
Also, if this is a notebook and the imported exports are themselves generated by notebooks, 1nb will automatically create a pipeline such that those upstream notebooks are rerun (if required) before rerunning this notebook.
If the repository name has special characters that make Python throw a SyntaxError, replace those characters with an underscore (_).
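For example, for a hypothetical repository named my-repo with tag reportsv1, the import would be written as:
```python
from OneNB.i.my_repo.reportsv1 import MyData
```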
How it works
- When you execute a jupyter notebook that imports/exports data from/to 1NB:
  - It identifies the path of the executing notebook
  - Fetches the data from 1NB if data is being imported from a saved tag
    - For local data storage only the data path is fetched
  - When data is being exported, it stores the binary data into an internal file
    - For local storage only the path to this internal file is uploaded to 1nb
  - Stores the import/export metadata to .1nb/exports.json
  - If notebook parameters are used, it also associates the data with the current parameter dictionary
- After executing and saving the notebook, use 1nb addnb
  - It saves a copy of the notebook to .1nb/
  - Allows you to re-execute the notebook while only changing the parameters cell.
  - On re-executing and repeating addnb, it will record all the executions with unique parameter sets.
- On finalizing your work
  - Save your work to git: git commit, git tag, git push
    - Tagging the commit allows 1nb to associate the exported data with this version tag
  - It identifies the current git branch and parent git branches:
    - If a parent branch has a notebook with the same name, then the current notebook is recorded as a revision of the parent
    - It only uses the local branch name, not the remote
  - Push the notebooks and data (metadata) with 1nb push
- On Push
  - Notebooks that export data are always uploaded to 1NB
  - Exported data is uploaded according to the data storage setting
  - Local Python modules (.py) are not uploaded to 1NB by default. Follow "Local code" above to package local code so notebooks that depend on it will not fail on rerun/interact.
- Worker
  - You start a worker with 1nb worker
  - It listens for active tasks from 1NB
  - It creates a virtualenv (or reuses existing ones)
  - Executes the incoming code and sends back the results
  - When the notebook is finalized, it will itself save and push the notebook back to 1NB
API
Data Exports
class Exported(ModuleType, SaveableValue)
Export data to 1nb. You can write to this like a file (only writing is supported) using the file open syntax:
```python
from OneNB.exports import exp

with exp.open(mode="wt", suffix=".txt") as f:
    f.write("Hello world")
```
Other writers: text_writer, writer, write_csv, write_as_json and write_stream
open
@contextmanager def open(*, suffix: str, mode: str = "wt", encoding: str = "utf-8")
Writer similar to file open:
```python
with _object_.open(suffix=".txt") as f:
    f.write("Hello world")
```
Arguments:
- mode - 't' for text data, 'b' for binary data
- suffix - file name suffix to signify the file type
- encoding - byte encoding format if text data, default utf-8
text_writer
def text_writer(file_ext: str)
A file-like writer for text data. Shorthand for .open
writer
def writer(file_ext: str)
A file-like writer for binary data. Shorthand for .open
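A minimal usage sketch for saving a NumPy array (Features is an illustrative export name):
```python
import numpy as np
from OneNB.exports import Features  # illustrative export name

features = np.zeros((10, 3))
# writer(file_ext) is the binary shorthand for .open
with Features.writer(".npy") as f:
    np.save(f, features)
```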
write_csv
def write_csv(data: list[list[str]])
Write an object (list[list[str]]) as CSV using the excel dialect
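A minimal usage sketch (Scores is an illustrative export name):
```python
from OneNB.exports import Scores  # illustrative export name

rows = [["name", "score"], ["alice", "0.91"], ["bob", "0.87"]]
Scores.write_csv(rows)
```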
write_as_json
def write_as_json(data, cls: Type[json.JSONEncoder] | None = None)
Write an object as JSON; cls is the same as in json.dump
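A minimal usage sketch (Config is an illustrative export name):
```python
from OneNB.exports import Config  # illustrative export name

Config.write_as_json({"year": 2023, "foo": "bar"})
```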
write_stream
def write_stream(stream: io.BytesIO | io.StringIO, file_ext: str, encoding: str)
Write data directly from an IO buffer
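A minimal usage sketch (Summary is an illustrative export name):
```python
import io
from OneNB.exports import Summary  # illustrative export name

buf = io.StringIO()
buf.write("year,total\n2023,100\n")
Summary.write_stream(buf, ".csv", "utf-8")
```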
save_value
def save_value(data: any, _special: any = None)
Save a value directly, outside of data storage. Can save up to 100 bytes and supports only native Python types.
Note: this will be saved in 1NB, regardless of this repository's data storage settings.
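A minimal usage sketch (BestThreshold is an illustrative export name):
```python
from OneNB.exports import BestThreshold  # illustrative export name

# Small native-Python values only (up to 100 bytes); saved in 1NB itself
BestThreshold.save_value(0.42)
```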
open_no_upload
def open_no_upload(*, fn_write_bytes_to: Optional[WriteToBuffer] = None, fn_returning_fileobj: Optional[ReturnsReadable] = None, source_type: str, params: dict[str, bool | int | str | None])
Directly load data from an external source, and store the source parameters.
```python
exp.open_no_upload(source_type="my_file", params=("/path/to/my/file",))
```
```python
import boto3

s3_client = boto3.client("s3")

def download_from_s3(buffer, bucket, key):
    s3_client.download_fileobj(bucket, key, buffer)

exp.open_no_upload(source_type="my_s3", fn_write_bytes_to=download_from_s3, params=("my_bucket", "my_key"))
```
Arguments:
- fn_write_bytes_to (optional) - a function that writes the data into a buffer; the buffer is supplied as the first parameter
- fn_returning_fileobj (optional) - a function that returns a readable file-like object that points to the data
- source_type - string for user reference
- params - parameters passed to the function (or Python's builtin open(file, ...) function)
Data Import
class Imported(ModuleType, SupportsBytes, LocalStorePaths)
Buffered imported data.
Get a file-like readable object with .reader(). Use it wherever a readable file-like object can be used.
```python
import json
from OneNB.i.repo.tag import MyData

json.load(MyData.reader())
```
If it is a value, use .value
reader
def reader()
Get a readable binary file-like object that can be used wherever a file-like object is expected.
```python
pd.read_csv(MyData.reader())
```
value
@property def value()
Get the value if it was stored with the Exported.save_value or Exported.open_no_upload methods.
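A minimal usage sketch, assuming BestThreshold was exported with save_value (repo and tag are placeholders):
```python
from OneNB.i.repo.tag import BestThreshold

threshold = BestThreshold.value  # the value stored with save_value / open_no_upload
```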