OMR Act 2 - Filetype Agnostic Music Rendering?

Our rendering engine is heavily dependent on the SVG format, where we use simple primitives such as groups, lines, paths and translations. However, looking at our music renderer from a machine learning perspective, we may want to use this library to create ground truth data without the need for manual labelling. Example outputs would be a pixel-wise segmentation mask or class-wise bounding boxes, depending on the use case.
Example outputs: a rendered PNG, a rendered segmentation mask, and bounding boxes.

As we form SVG groups early on, any detailed information about the location of classes such as note heads, note stems or staff lines is lost in the process. But could we do better? Could we offer a filetype-agnostic model that contains all the logic to place things but none of the SVG-related semantics? The answer is yes! We only have to tweak the model we described in Act 1, and interestingly, the required changes are minor: they clean up the code and leave us with a better separation of concerns.

To achieve something filetype agnostic, we have to throw away everything that is inherent to SVG and introduce one indirection. This indirection is a RenderModel struct, constructed from a Score, which handles all the complicated logic of where to place note heads, stems and beams, which padding should be applied, and whether the measures fit in one row. With this RenderModel in place, the placement logic only has to be calculated once, and all kinds of outputs can be created from it. The public interface will look as follows:
impl RenderModel {
    /// Handles all the complicated logic to build a RenderModel
    pub fn from_score(score: &Score, score_max_width: f64) -> RenderModel;

    // RenderModel to SvgDocument
    pub fn svg(&self) -> SvgDocument;

    // RenderModel to Bitmap (= image::RgbaImage)
    pub fn bitmap(&self) -> Bitmap;

    // RenderModel to SegmentationMask
    pub fn segmentation_mask(&self, element_classes: ElementClasses) -> SegmentationMask;

    // RenderModel to BoundingBoxes (= Vec<BoundingBox>)
    pub fn bounding_boxes(&self, element_classes: ElementClasses) -> BoundingBoxes;
}
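To illustrate the intended flow, a hypothetical call site could look like the sketch below; the score and element_classes values are assumed to already exist, and ElementClasses is assumed to implement Clone.
// Hypothetical usage sketch: build the model once, then derive any output from it.
let model = RenderModel::from_score(&score, 1200.0);

let svg = model.svg();      // vector output
let png = model.bitmap();   // raster output (image::RgbaImage)
let mask = model.segmentation_mask(element_classes.clone()); // pixel-wise ground truth
let boxes = model.bounding_boxes(element_classes);           // class-wise ground truth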
Now, for the structure of the RenderModel itself, we have to focus on the fundamental building blocks that were needed to render an SVG. Starting with the symbols for music notation, we narrowed them down to three categories: lines, glyphs and text. These are captured in an ElementData enum which is part of a RenderElement struct. Alongside the ElementData, this struct also contains an ElementClass which holds the information about the class used for the machine learning task. In addition to that, the end-user should be able to specify which classes to use during the dataset creation covered in Act 3.
#[derive(Debug, Clone)]
enum ElementData {
    /// A straight line segment
    Line(Line),
    /// A positioned glyph from the music font
    Glyph(rusttype::PositionedGlyph),
    /// A piece of text
    Text(Text),
}

#[derive(Debug, Clone)]
pub struct RenderElement {
    /// The geometric data to draw
    data: ElementData,
    /// The class used for the machine learning task
    cls: ElementClass,
}

#[derive(Debug, Clone, Copy)]
pub enum ElementClass {
    Background,
    AccidentalNatural,
    AccidentalSharp,
    AccidentalFlat,
    Barline,
    Beats,
    BeatsType,
    Beam,
    ClefG,
    ClefF,
    NoteWhole,
    NoteHalf,
    NoteFilled,
    NoteDot,
    NoteStem,
    NoteStemArm,
    NoteRest,
    Staff,
    StaffHelper,
}
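The ElementClasses type that appears in the public interface is not spelled out in this post. As a rough assumption, it could be little more than a wrapper around the list of classes the end-user cares about, with every other element falling back to Background:
// Hypothetical sketch only: one possible shape for ElementClasses.
// Classes that are not listed would end up as ElementClass::Background
// in the segmentation mask and would be skipped for bounding boxes.
#[derive(Debug, Clone)]
pub struct ElementClasses {
    classes: Vec<ElementClass>,
}

impl ElementClasses {
    pub fn new(classes: Vec<ElementClass>) -> Self {
        ElementClasses { classes }
    }
}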
Now that we can represent a single element, we have to connect them. A key part of building an SVG was creating groups, connecting them hierarchically and translating each element. This is not inherent to SVG alone, and thus we define a RenderTree which holds data in the form of a Vec<RenderElement>, children in the form of other RenderTrees, as well as an optional translation Translate that is applied to both the data and the children. Finally, the RenderModel struct simply represents the root of the tree and serves as the public API endpoint.
#[derive(Debug, Clone)]
struct RenderTree {
    /// Translation applied to this node's data and to all of its children
    transform: Option<Translate>,
    /// Elements attached directly to this node
    data: Vec<RenderElement>,
    /// Nested subtrees
    children: Vec<RenderTree>,
}

#[derive(Debug, Clone)]
struct Translate {
    x: f64,
    y: f64,
}

#[derive(Debug, Clone)]
pub struct RenderModel {
    tree: RenderTree,
}
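To give a feeling for how the different outputs can be derived from this structure, a depth-first traversal that accumulates the optional translations might look roughly like the following sketch; the visit method and its callback are assumptions, not the actual implementation.
impl RenderTree {
    // Sketch: walk the tree depth-first, accumulating the parent offset so that
    // every RenderElement is visited together with its absolute position.
    fn visit<F>(&self, off_x: f64, off_y: f64, f: &mut F)
    where
        F: FnMut(&RenderElement, f64, f64),
    {
        let (x, y) = match &self.transform {
            Some(t) => (off_x + t.x, off_y + t.y),
            None => (off_x, off_y),
        };
        for element in &self.data {
            f(element, x, y);
        }
        for child in &self.children {
            child.visit(x, y, f);
        }
    }
}
Each output format (SVG, bitmap, segmentation mask, bounding boxes) could then be implemented as one such traversal with a format-specific callback.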
Sparing the implementation details, we are now able to generate SVGs, bitmaps, segmentation masks or bounding boxes given a RenderModel instance. With that, we are set to start with the real part: creating some cool datasets that we can use in a machine learning pipeline!

Act 1 - Writing a Music Renderer from Scratch
Act 3 - Creating Simple Datasets for OMR