torchaudio: A Framework for Audio Research and Production
Hello everyone, my name is Vincent Kendall Miller, and I'm a software engineer at Facebook. Today, I'll be discussing torchaudio in the context of the summer hackathon. The goal of torchaudio is to provide building blocks that allow other researchers and engineers to bring research to production.
To achieve this goal, we're building around three core functionalities: IO, transforms, and compatibility.
First, let's talk about IO. Our first functionality is to read and save tensors from various file formats like MP3, WAV, and FLAC files. We can also download and use common audio datasets, where samples are loaded in parallel using torch multiprocessing workers. This allows us to process large datasets efficiently.
Here's an example of how we can load a dataset and process it:
```python
import torch
# Load the dataset (assumes the file holds a pickled Dataset object)
dataset = torch.load('path/to/dataset.pth')
# Create a data loader; num_workers > 0 loads samples in parallel worker processes
data_loader = torch.utils.data.DataLoader(dataset, batch_size=32, num_workers=4)
# Iterate over the batches
for batch in data_loader:
    # Process the batch
    processed_batch = ...
```
In this example, we load a dataset from a file, create a data loader, and then iterate over the batches. Setting the data loader's `num_workers` argument makes torch multiprocessing workers load the samples in parallel.
Our second functionality is transforms for audio and signal processing. We provide spectrogram, MFCC, and resampling transforms as neural network modules in torchaudio. These transforms are written using pure PyTorch operations, which allows us to run the computations on the GPU and compile them with TorchScript.
Here's an example of how we can use a transform to process audio data:
```python
import torchaudio
# Placeholder: waveform tensor of shape (channel, time)
audio_data = ...
# Create a spectrogram transform
transform = torchaudio.transforms.Spectrogram()
# Apply the transform to the audio data
output = transform(audio_data)
```
In this example, we load audio data, create a spectrogram transform, and apply it to the data. The output is a spectrogram tensor that can be used for further processing.
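To give a feel for what "pure PyTorch operations" means here, the following is a minimal sketch of a spectrogram written as a neural network module and compiled with TorchScript. The `PowerSpectrogram` class and its default parameters are hypothetical and are not torchaudio's own implementation; the input is random noise standing in for one second of 16 kHz audio.

```python
import torch
from torch import nn


class PowerSpectrogram(nn.Module):
    """Hypothetical minimal spectrogram module (not torchaudio's implementation)."""

    def __init__(self, n_fft: int = 400, hop_length: int = 200):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # A buffer moves with the module when it is sent to another device
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (channel, time) -> complex STFT: (channel, freq, frames)
        stft = torch.stft(
            waveform,
            self.n_fft,
            hop_length=self.hop_length,
            window=self.window,
            return_complex=True,
        )
        # Squared magnitude gives the power spectrogram
        return stft.abs().pow(2)


# Compile the module with TorchScript, then apply it like any other module
transform = torch.jit.script(PowerSpectrogram())
waveform = torch.randn(1, 16000)  # stand-in for 1 second of 16 kHz audio
spec = transform(waveform)
```

Because the module only uses torch operations, moving it and its input to a GPU with `.to("cuda")` should be enough to run the same computation there.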
Our third functionality is compatibility with the C++ library Kaldi. We provide reading and writing of Kaldi's binary files as well as equivalent features like spectrogram and fbank.
Here's an example of how we can read a Kaldi file:
```python
from torchaudio import kaldi_io
# Placeholder: path to a Kaldi archive (.ark) file
file_data = ...
# read_mat_ark yields (utterance key, matrix) pairs; it needs the optional kaldi_io package
for key, matrix in kaldi_io.read_mat_ark(file_data):
    # Each matrix arrives as a torch tensor
    data = matrix
```
In this example, we iterate over a Kaldi archive with `read_mat_ark` and receive each stored matrix as a torch tensor.
Additionally, we provide a code snippet that uses the LibriSpeech dataset. This dataset is usually too large to fit in memory, so each data point is loaded lazily, on demand, as it is needed.
```python
import torch
import torchaudio
# Index LibriSpeech on disk (assumes it is already extracted under this root);
# no audio is read into memory yet
dataset = torchaudio.datasets.LIBRISPEECH('path/to/dataset')
# Create a data loader; batch_size=1 avoids collating clips of different lengths
data_loader = torch.utils.data.DataLoader(dataset, batch_size=1)
# Iterate over the batches; each data point is read from disk only when requested
for batch in data_loader:
    # Process the batch
    processed_batch = ...
# Indexing the dataset directly also loads a single data point on demand
data_point = dataset[0]
```
In this example, we create a data loader and iterate over the batches. The dataset object only indexes the files on disk; the audio for a data point is read when that data point is requested, either by the data loader or by indexing the dataset directly.
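To make the on-demand idea concrete, here is a minimal sketch that uses only the Python standard library. The `LazyWavDataset` class and the `write_wav` helper are hypothetical names for illustration and are not part of torchaudio; the sketch also assumes mono 16-bit WAV files.

```python
import os
import struct
import tempfile
import wave


class LazyWavDataset:
    """Hypothetical dataset that keeps only file paths and reads audio on access."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.reads = 0  # number of files actually read so far

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The file is opened and decoded only when this item is requested
        with wave.open(self.paths[index], "rb") as f:
            n_frames = f.getnframes()
            raw = f.readframes(n_frames)
            rate = f.getframerate()
        self.reads += 1
        samples = list(struct.unpack("<%dh" % n_frames, raw))
        return samples, rate


def write_wav(path, samples, rate=16000):
    """Write a mono 16-bit WAV file so the sketch has data to read."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(struct.pack("<%dh" % len(samples), *samples))


# Create three tiny clips in a temporary directory
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, "clip%d.wav" % i)
    write_wav(path, [i] * 100)
    paths.append(path)

dataset = LazyWavDataset(paths)  # nothing has been read yet
samples, rate = dataset[1]       # only clip1.wav is read here
```

The `reads` counter is there only to show that constructing the dataset touches no audio; memory use stays proportional to the items actually requested.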
We also provide a compatibility interface that allows us to read Kaldi files as torch tensors. This means we can integrate our framework with other tools and libraries that support Kaldi.
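```python
from torchaudio import kaldi_io
# Placeholder: path to a Kaldi script (.scp) file
file_data = ...
# Collect every entry as a torch tensor, keyed by utterance ID
tensors = {key: matrix for key, matrix in kaldi_io.read_mat_scp(file_data)}
```
In this example, `read_mat_scp` yields each entry of the file as a key and a torch tensor, and we collect them in a dictionary. It relies on the optional `kaldi_io` Python package being installed.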
Finally, an upcoming feature will allow us to use filter banks with fairseq. This will enable us to transcribe audio data using these filter bank features.
```python
import torchaudio
# Placeholder: waveform tensor of shape (channel, time)
audio_data = ...
# Stand-in for the upcoming interface: the existing Kaldi-compatible filter bank
output = torchaudio.compliance.kaldi.fbank(audio_data)
```
The fairseq interface is not final, so this example uses the existing Kaldi-compatible `fbank` function as a stand-in. The output is a tensor of filter bank features with one row per frame.
We also provide an example training pipeline for speech recognition that uses a decoder interface. This lets us swap in a variety of different decoding algorithms when turning model outputs into transcripts.
```python
# Placeholders: the pipeline's model, input audio, and decoder are not specified here
model = ...
waveform = ...
decoder = ...
# Hypothetical decoder interface: map the model's output to a transcript
transcript = decoder(model(waveform))
```
In this example, the model, the input, and the decoder are placeholders. The decoder is treated as any callable that maps model outputs to text, so the decoding algorithm can change without touching the rest of the pipeline.
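As one concrete decoding algorithm, here is a minimal sketch of greedy decoding in plain Python. It assumes a CTC-style model whose output is a score per label at every time step, with label 0 reserved as the blank; the `greedy_ctc_decode` function, the label set, and the scores are all hypothetical and chosen only for illustration.

```python
def greedy_ctc_decode(emissions, labels, blank=0):
    """Pick the best label per frame, collapse repeats, then drop blanks."""
    # Index of the highest score in each frame
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]
    decoded = []
    previous = None
    for index in best_path:
        # Keep a label only when it differs from the previous frame and is not blank
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)


labels = ["-", "a", "b"]  # "-" is the blank label (an assumption of this sketch)
emissions = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a again, collapsed with the previous frame
    [0.9, 0.05, 0.05],  # blank, separates the two runs of "a"
    [0.2, 0.7, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
]
text = greedy_ctc_decode(emissions, labels)
```

A beam search or a language-model-based decoder could sit behind the same function signature, which is the point of having a decoder interface.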
Thank you for your attention, and I hope this provides a comprehensive overview of torchaudio and its capabilities.