Rely more on ADBC when possible

José Valim 2025-10-24 21:12:06 +02:00
parent d66895b382
commit 158c949b36
5 changed files with 22 additions and 171 deletions

@@ -62,13 +62,6 @@ defmodule Livebook.Notebook.Learn do
cover_filename: "learn-deploy.svg"
}
},
%{
path: Path.join(__DIR__, "learn/intro_to_explorer.livemd"),
details: %{
description: "Intuitive data visualizations and data pipelines on the fly.",
cover_filename: "explorer.png"
}
},
%{
path: Path.join(__DIR__, "learn/intro_to_vega_lite.livemd"),
details: %{

@@ -1,131 +0,0 @@
# Data transform with Explorer
```elixir
Mix.install([
{:kino_explorer, "~> 0.1.20"}
])
```
## Introduction
To explore and transform data in Livebook we use two libraries:
* The [`explorer`](https://hexdocs.pm/explorer/)
package brings series (one-dimensional) and dataframes (two-dimensional)
for fast data exploration to Elixir.
* The [`kino_explorer`](https://hexdocs.pm/kino_explorer/)
package automatically renders an `Explorer.DataFrame` or `Explorer.Series`
as a data table.
We will make extensive use of `Explorer.DataFrame`'s functions, so it's handy
to alias the module to something shorter. We will also
[`require`](https://hexdocs.pm/elixir/Kernel.SpecialForms.html#require/2)
the `Explorer.DataFrame` module to use its
[querying facilities](https://hexdocs.pm/explorer/Explorer.Query.html#content):
```elixir
alias Explorer.DataFrame, as: DF
require Explorer.DataFrame
```
All set, let's go.
## Quick introduction to Explorer and data tables
In short, Explorer is the DataFrame library for Elixir. It brings the
essential data analysis, exploration, and transformation tools to the
Elixir ecosystem, while the integration provided by KinoExplorer makes
it effortless to visualize and interact with data through data tables.
Data tables offers a variety of features to make viewing data more
convenient. You can easily select cells, columns or even the entire table,
switch between paging and infinite scrolling, sort the columns or search
for specific data. [Keyboard shortcuts](#shortcuts), including copy
selection, are also available. Let's render a DataFrame to see an example:
```elixir
Explorer.Datasets.fossil_fuels()
|> DF.filter(contains(country, "A") and year < 2013)
|> DF.select(["year", "country", "total"])
```
<!-- livebook:{"branch_parent_index":0} -->
## The Data transform smart cell
Data tables let us quickly view the raw data, without modifying it,
while the Data transform cell allows us to transform it, creating insightful
and flexible data pipelines and seeing the results on the fly.
Before we explore its features, we need some data to work with.
```elixir
teams =
DF.new(
weekday: [
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday"
],
team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
hour: [10, 9, 10, 10, 11, 15, nil, 16, 14, 16]
)
```
<!-- livebook:{"attrs":{"assign_to":"weekdays","data_frame":"teams","data_frame_alias":"Elixir.DF","operations":[{"active":true,"column":"hour","message":null,"operation_type":"fill_missing","scalar":null,"strategy":"forward","type":"integer"},{"active":true,"column":"hour","filter":"greater","message":null,"operation_type":"filters","type":"integer","value":"10"},{"active":true,"column":"team","filter":"not equal","message":null,"operation_type":"filters","type":"string","value":"B"},{"active":true,"direction":"asc","operation_type":"sorting","sort_by":"weekday"},{"active":true,"names_from":"team","operation_type":"pivot_wider","values_from":"hour"}]},"chunks":null,"kind":"Elixir.KinoExplorer.DataTransformCell","livebook_object":"smart_cell"} -->
```elixir
weekdays =
teams
|> DF.mutate(hour: fill_missing(hour, :forward))
|> DF.filter(hour > 10 and team != "B")
|> DF.arrange(asc: weekday)
|> DF.pivot_wider("team", "hour")
```
Let's break down what happened in the previous Data transform cell.
Currently, the Data transform cell supports several operations, such as `sorting`,
`filter`, `pivot_wider`, and more. Each operation has its own colored card and
you can move the cards to reorder the operations and see the changes in real time.
Except for `pivot_wider`, you can have multiple operations of any type. If two
similar operations are in a row, they are grouped, and a single query command is
generated for them in the code. However, you still have individual control over each.
You can also use the toggle button to enable/disable the operations. This is
particularly useful for seeing the implications of each step in your pipeline
without having to rewrite it.
The initial state is purely a suggestion. You can easily add and remove operations
to get the pipeline that meets your needs.
Finally, the `assign to` field allows you to save the resulting DataFrame in a
variable for later use in the notebook or in conjunction with other Smart Cells.
For example, to plot a chart using the Chart cell.
## Shortcuts
| Key Combo | Description |
| ------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------- |
| <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd> | Moves the currently selected cell and clears other selections |
| <kbd>shift</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd> | Extends the current selection range in the direction pressed |
| <kbd>alt</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd> | Moves the currently selected cell and retains the current selection |
| <kbd>ctrl</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd> | Moves the selection as far as possible in the direction pressed |
| <kbd>ctrl</kbd> + <kbd>shift</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd> | Extends the selection as far as possible in the direction pressed |
| <kbd>shift</kbd> + <kbd>home</kbd> <kbd>end</kbd> | Extends the selection as far as possible in the direction pressed |
| <kbd>ctrl</kbd> + <kbd>A</kbd> | Selects all cells |
| <kbd>shift</kbd> + <kbd>␣</kbd> | Selects the current row |
| <kbd>ctrl</kbd> + <kbd>␣</kbd> | Selects the current col |
| <kbd>esc</kbd> | Clear the current selection |
| <kbd>ctrl</kbd> + <kbd>C</kbd> | Copies the current selection |
| <kbd>ctrl</kbd> + <kbd>home</kbd> <kbd>end</kbd> | Moves the selection to the first/last cell in the data table |
| <kbd>ctrl</kbd> + <kbd>shift</kbd> + <kbd>home</kbd> <kbd>end</kbd> | Extends the selection to the first/last cell in the data table |

@@ -10,10 +10,8 @@ Mix.install([
Throughout the Learning section, we have used Kino several times.
Sometimes we use built-in Kinos, such as using `Kino.Control` and
`Kino.Frame` to [deploy applications](/learn/notebooks/deploy-apps),
other times we used custom Kinos tailored for
[data exploration](/learn/notebooks/intro-to-explorer) or
[plotting](/learn/notebooks/intro-to-vega-lite).
`Kino.Frame` to [deploy applications](/learn/notebooks/deploy-apps)
or for [plotting](/learn/notebooks/intro-to-vega-lite).
In this notebook, we will explore several of the built-in Kinos.
`kino` is already listed as a dependency, so let's get started.

@@ -17,7 +17,7 @@ defmodule Livebook.Runtime.Definitions do
kino_db = %{
name: "kino_db",
dependency: %{dep: {:kino_db, "~> 0.3.0"}, config: []}
dependency: %{dep: {:kino_db, "~> 0.4.0"}, config: []}
}
exqlite = %{
@@ -50,11 +50,6 @@ defmodule Livebook.Runtime.Definitions do
dependency: %{dep: {:torchx, ">= 0.0.0"}, config: [nx: [default_backend: Torchx.Backend]]}
}
kino_explorer = %{
name: "kino_explorer",
dependency: %{dep: {:kino_explorer, "~> 0.1.20"}, config: []}
}
kino_flame = %{
name: "kino_flame",
dependency: %{dep: {:kino_flame, "~> 0.1.5"}, config: []}
@@ -85,6 +80,11 @@ defmodule Livebook.Runtime.Definitions do
dependency: %{dep: {:yaml_elixir, "~> 2.0"}, config: []}
}
adbc = %{
name: "adbc",
dependency: %{dep: {:adbc, "~> 0.8"}, config: []}
}
windows? = match?({:win32, _}, :os.type())
nx_backend_package = if(windows?, do: torchx, else: exla)
@@ -119,7 +119,6 @@ defmodule Livebook.Runtime.Definitions do
name: "DuckDB",
packages: [
kino_db,
kino_explorer,
%{
name: "adbc",
dependency: %{dep: {:adbc, ">= 0.0.0"}, config: [adbc: [drivers: [:duckdb]]]}
@@ -130,7 +129,6 @@ defmodule Livebook.Runtime.Definitions do
name: "Google BigQuery",
packages: [
kino_db,
kino_explorer,
%{
name: "adbc",
dependency: %{dep: {:adbc, ">= 0.0.0"}, config: [adbc: [drivers: [:bigquery]]]}
@@ -155,7 +153,6 @@ defmodule Livebook.Runtime.Definitions do
name: "Snowflake",
packages: [
kino_db,
kino_explorer,
%{
name: "adbc",
dependency: %{dep: {:adbc, ">= 0.0.0"}, config: [adbc: [drivers: [:snowflake]]]}
@@ -225,16 +222,6 @@ defmodule Livebook.Runtime.Definitions do
}
]
},
%{
kind: "Elixir.KinoExplorer.DataTransformCell",
name: "Data transform",
requirement_presets: [
%{
name: "Default",
packages: [kino_explorer]
}
]
},
%{
kind: "Elixir.Kino.RemoteExecutionCell",
name: "Remote execution",
@@ -318,24 +305,28 @@ defmodule Livebook.Runtime.Definitions do
%{
type: :file_action,
file_types: ["text/csv"],
description: "Create a dataframe",
description: "Load into DuckDB",
source: """
df =
Kino.FS.file_path("{{NAME}}")
|> Explorer.DataFrame.from_csv!()\
Adbc.download_driver!(:duckdb)
db = Kino.start_child!({Adbc.Database, driver: :duckdb})
conn = Kino.start_child!({Adbc.Connection, database: db})
path = Kino.FS.file_path("{{NAME}}")
Adbc.Connection.query!(conn, "SELECT * FROM read_csv($1)", [path])
""",
packages: [kino, kino_explorer]
packages: [kino, kino_db, adbc]
},
%{
type: :file_action,
file_types: [".parquet"],
description: "Create a dataframe",
description: "Load into DuckDB",
source: """
df =
Kino.FS.file_spec("{{NAME}}")
|> Explorer.DataFrame.from_parquet!(lazy: true)\
Adbc.download_driver!(:duckdb)
db = Kino.start_child!({Adbc.Database, driver: :duckdb})
conn = Kino.start_child!({Adbc.Connection, database: db})
path = Kino.FS.file_path("{{NAME}}")
Adbc.Connection.query!(conn, "SELECT * FROM read_parquet($1)", [path])
""",
packages: [kino, kino_explorer]
packages: [kino, kino_db, adbc]
},
%{
type: :file_action,

Binary file not shown.