Mirror of https://github.com/livebook-dev/livebook.git

Rely more on ADBC when possible

parent d66895b382
commit 158c949b36

5 changed files with 22 additions and 171 deletions
@@ -62,13 +62,6 @@ defmodule Livebook.Notebook.Learn do
         cover_filename: "learn-deploy.svg"
       }
     },
-    %{
-      path: Path.join(__DIR__, "learn/intro_to_explorer.livemd"),
-      details: %{
-        description: "Intuitive data visualizations and data pipelines on the fly.",
-        cover_filename: "explorer.png"
-      }
-    },
     %{
       path: Path.join(__DIR__, "learn/intro_to_vega_lite.livemd"),
       details: %{

@@ -1,131 +0,0 @@

# Data transform with Explorer

```elixir
Mix.install([
  {:kino_explorer, "~> 0.1.20"}
])
```

## Introduction

To explore and transform data in Livebook we use two libraries:

* The [`explorer`](https://hexdocs.pm/explorer/)
  package brings series (one-dimensional) and dataframes (two-dimensional)
  for fast data exploration to Elixir.

* The [`kino_explorer`](https://hexdocs.pm/kino_explorer/)
  package automatically renders an `Explorer.DataFrame` or `Explorer.Series`
  as a data table.
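
For instance, returning a series (or a dataframe) as the last expression of a cell is enough to get an interactive table; a minimal illustration:

```elixir
# Rendered automatically as a data table by kino_explorer, no extra calls needed.
Explorer.Series.from_list(["a", "b", "c"])
```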

We will make extensive use of `Explorer.DataFrame`'s functions, so it's handy
to alias the module to something shorter. We will also
[`require`](https://hexdocs.pm/elixir/Kernel.SpecialForms.html#require/2)
the `Explorer.DataFrame` module to use its
[querying facilities](https://hexdocs.pm/explorer/Explorer.Query.html#content):

```elixir
alias Explorer.DataFrame, as: DF
require Explorer.DataFrame
```

All set, let's go.

## Quick introduction to Explorer and data tables

In short, Explorer is the DataFrame library for Elixir. It brings the
essential data analysis, exploration, and transformation tools to the
Elixir ecosystem, while the integration provided by KinoExplorer makes
it effortless to visualize and interact with data through data tables.

Data tables offer a variety of features to make viewing data more
convenient. You can easily select cells, columns, or even the entire table,
switch between paging and infinite scrolling, sort the columns, or search
for specific data. [Keyboard shortcuts](#shortcuts), including copy
selection, are also available. Let's render a DataFrame to see an example:

```elixir
Explorer.Datasets.fossil_fuels()
|> DF.filter(contains(country, "A") and year < 2013)
|> DF.select(["year", "country", "total"])
```

<!-- livebook:{"branch_parent_index":0} -->

## The Data transform smart cell

Data tables let us quickly view the raw data, without modifying it,
while the Data transform cell allows us to transform it, creating insightful
and flexible data pipelines and seeing the results on the fly.

Before we explore its features, we need some data to work with.

```elixir
teams =
  DF.new(
    weekday: [
      "Monday",
      "Tuesday",
      "Wednesday",
      "Thursday",
      "Friday",
      "Monday",
      "Tuesday",
      "Wednesday",
      "Thursday",
      "Friday"
    ],
    team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    hour: [10, 9, 10, 10, 11, 15, nil, 16, 14, 16]
  )
```

<!-- livebook:{"attrs":{"assign_to":"weekdays","data_frame":"teams","data_frame_alias":"Elixir.DF","operations":[{"active":true,"column":"hour","message":null,"operation_type":"fill_missing","scalar":null,"strategy":"forward","type":"integer"},{"active":true,"column":"hour","filter":"greater","message":null,"operation_type":"filters","type":"integer","value":"10"},{"active":true,"column":"team","filter":"not equal","message":null,"operation_type":"filters","type":"string","value":"B"},{"active":true,"direction":"asc","operation_type":"sorting","sort_by":"weekday"},{"active":true,"names_from":"team","operation_type":"pivot_wider","values_from":"hour"}]},"chunks":null,"kind":"Elixir.KinoExplorer.DataTransformCell","livebook_object":"smart_cell"} -->

```elixir
weekdays =
  teams
  |> DF.mutate(hour: fill_missing(hour, :forward))
  |> DF.filter(hour > 10 and team != "B")
  |> DF.arrange(asc: weekday)
  |> DF.pivot_wider("team", "hour")
```

Let's break down what happened in the previous Data transform cell.

Currently, the Data transform cell supports several operations, such as `sorting`,
`filter`, `pivot_wider`, and more. Each operation has its own colored card, and
you can move the cards to reorder the operations and see the changes in real time.

Except for `pivot_wider`, you can have multiple operations of any type. If two
similar operations are in a row, they are grouped, and a single query command is
generated for them in the code. However, you still have individual control over each.
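
To make the grouping concrete, here is a minimal sketch (not code generated by the cell): writing the two filters as separate steps gives the same result as the single grouped `DF.filter/2` call above.

```elixir
# Adjacent filter operations are grouped into a single DF.filter/2 call;
# the two pipelines below are equivalent.
separate =
  teams
  |> DF.filter(hour > 10)
  |> DF.filter(team != "B")

grouped = DF.filter(teams, hour > 10 and team != "B")

DF.n_rows(separate) == DF.n_rows(grouped)
```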

You can also use the toggle button to enable or disable the operations. This is
particularly useful for seeing the implications of each step in your pipeline
without having to rewrite it.

The initial state is purely a suggestion. You can easily add and remove operations
to get the pipeline that meets your needs.

Finally, the `assign to` field allows you to save the resulting DataFrame in a
variable for later use in the notebook or in conjunction with other Smart Cells,
for example to plot a chart with the Chart cell.
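
Since the assigned `weekdays` is an ordinary `Explorer.DataFrame`, any later cell can keep working with it; a small sketch:

```elixir
# Inspect the pivoted column names of the dataframe produced above.
DF.names(weekdays)
```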

## Shortcuts

| Key Combo                                                                                | Description                                                          |
| ---------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd>                                      | Moves the currently selected cell and clears other selections        |
| <kbd>shift</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd>                   | Extends the current selection range in the direction pressed         |
| <kbd>alt</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd>                     | Moves the currently selected cell and retains the current selection  |
| <kbd>ctrl</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd>                    | Moves the selection as far as possible in the direction pressed      |
| <kbd>ctrl</kbd> + <kbd>shift</kbd> + <kbd>↑</kbd> <kbd>↓</kbd> <kbd>←</kbd> <kbd>→</kbd> | Extends the selection as far as possible in the direction pressed    |
| <kbd>shift</kbd> + <kbd>home</kbd> <kbd>end</kbd>                                        | Extends the selection as far as possible in the direction pressed    |
| <kbd>ctrl</kbd> + <kbd>A</kbd>                                                           | Selects all cells                                                    |
| <kbd>shift</kbd> + <kbd>␣</kbd>                                                          | Selects the current row                                              |
| <kbd>ctrl</kbd> + <kbd>␣</kbd>                                                           | Selects the current column                                           |
| <kbd>esc</kbd>                                                                           | Clears the current selection                                         |
| <kbd>ctrl</kbd> + <kbd>C</kbd>                                                           | Copies the current selection                                         |
| <kbd>ctrl</kbd> + <kbd>home</kbd> <kbd>end</kbd>                                         | Moves the selection to the first/last cell in the data table         |
| <kbd>ctrl</kbd> + <kbd>shift</kbd> + <kbd>home</kbd> <kbd>end</kbd>                      | Extends the selection to the first/last cell in the data table       |

@@ -10,10 +10,8 @@ Mix.install([

 Throughout the Learning section, we have used Kino several times.
 Sometimes we use built-in Kinos, such as using `Kino.Control` and
-`Kino.Frame` to [deploy applications](/learn/notebooks/deploy-apps),
-other times we used custom Kinos tailored for
-[data exploration](/learn/notebooks/intro-to-explorer) or
-[plotting](/learn/notebooks/intro-to-vega-lite).
+`Kino.Frame` to [deploy applications](/learn/notebooks/deploy-apps)
+or for [plotting](/learn/notebooks/intro-to-vega-lite).

 In this notebook, we will explore several of the built-in Kinos.
 `kino` is already listed as a dependency, so let's get started.

@@ -17,7 +17,7 @@ defmodule Livebook.Runtime.Definitions do

   kino_db = %{
     name: "kino_db",
-    dependency: %{dep: {:kino_db, "~> 0.3.0"}, config: []}
+    dependency: %{dep: {:kino_db, "~> 0.4.0"}, config: []}
   }

   exqlite = %{

@@ -50,11 +50,6 @@ defmodule Livebook.Runtime.Definitions do
     dependency: %{dep: {:torchx, ">= 0.0.0"}, config: [nx: [default_backend: Torchx.Backend]]}
   }

-  kino_explorer = %{
-    name: "kino_explorer",
-    dependency: %{dep: {:kino_explorer, "~> 0.1.20"}, config: []}
-  }
-
   kino_flame = %{
     name: "kino_flame",
     dependency: %{dep: {:kino_flame, "~> 0.1.5"}, config: []}

@@ -85,6 +80,11 @@ defmodule Livebook.Runtime.Definitions do
     dependency: %{dep: {:yaml_elixir, "~> 2.0"}, config: []}
   }

+  adbc = %{
+    name: "adbc",
+    dependency: %{dep: {:adbc, "~> 0.8"}, config: []}
+  }
+
   windows? = match?({:win32, _}, :os.type())
   nx_backend_package = if(windows?, do: torchx, else: exla)

@@ -119,7 +119,6 @@ defmodule Livebook.Runtime.Definitions do
       name: "DuckDB",
       packages: [
         kino_db,
-        kino_explorer,
         %{
           name: "adbc",
           dependency: %{dep: {:adbc, ">= 0.0.0"}, config: [adbc: [drivers: [:duckdb]]]}

@@ -130,7 +129,6 @@ defmodule Livebook.Runtime.Definitions do
       name: "Google BigQuery",
       packages: [
         kino_db,
-        kino_explorer,
         %{
           name: "adbc",
           dependency: %{dep: {:adbc, ">= 0.0.0"}, config: [adbc: [drivers: [:bigquery]]]}

@@ -155,7 +153,6 @@ defmodule Livebook.Runtime.Definitions do
       name: "Snowflake",
       packages: [
         kino_db,
-        kino_explorer,
         %{
           name: "adbc",
           dependency: %{dep: {:adbc, ">= 0.0.0"}, config: [adbc: [drivers: [:snowflake]]]}

@@ -225,16 +222,6 @@ defmodule Livebook.Runtime.Definitions do
         }
       ]
     },
-    %{
-      kind: "Elixir.KinoExplorer.DataTransformCell",
-      name: "Data transform",
-      requirement_presets: [
-        %{
-          name: "Default",
-          packages: [kino_explorer]
-        }
-      ]
-    },
     %{
       kind: "Elixir.Kino.RemoteExecutionCell",
       name: "Remote execution",

@@ -318,24 +305,28 @@ defmodule Livebook.Runtime.Definitions do
     %{
       type: :file_action,
       file_types: ["text/csv"],
-      description: "Create a dataframe",
+      description: "Load into DuckDB",
       source: """
-      df =
-        Kino.FS.file_path("{{NAME}}")
-        |> Explorer.DataFrame.from_csv!()\
+      Adbc.download_driver!(:duckdb)
+      db = Kino.start_child!({Adbc.Database, driver: :duckdb})
+      conn = Kino.start_child!({Adbc.Connection, database: db})
+      path = Kino.FS.file_path("{{NAME}}")
+      Adbc.Connection.query!(conn, "SELECT * FROM read_csv($1)", [path])
       """,
-      packages: [kino, kino_explorer]
+      packages: [kino, kino_db, adbc]
     },
     %{
       type: :file_action,
       file_types: [".parquet"],
-      description: "Create a dataframe",
+      description: "Load into DuckDB",
       source: """
-      df =
-        Kino.FS.file_spec("{{NAME}}")
-        |> Explorer.DataFrame.from_parquet!(lazy: true)\
+      Adbc.download_driver!(:duckdb)
+      db = Kino.start_child!({Adbc.Database, driver: :duckdb})
+      conn = Kino.start_child!({Adbc.Connection, database: db})
+      path = Kino.FS.file_path("{{NAME}}")
+      Adbc.Connection.query!(conn, "SELECT * FROM read_parquet($1)", [path])
       """,
-      packages: [kino, kino_explorer]
+      packages: [kino, kino_db, adbc]
     },
     %{
       type: :file_action,

Binary file not shown (before: 73 KiB).