
From Concept to Creation: The Journey of Building Montagix SDK


Aug 22, 2023

Building a video editor is no easy challenge; doing it in the browser is even harder. In this article, we discuss the issues we faced, the decisions we made, and the insights we gathered while developing Montagix SDK.

Before diving in, let’s briefly cover the problem Montagix SDK solves. Historically, video editing on the web has required specialized applications, be it Adobe Premiere Pro or web-based tools such as veed.io. Capable as they are, these tools pull the user out of their flow. Imagine the following use case: you are uploading a video to a platform and it doesn’t match the required dimensions, what do you do? Or say you have recorded a video in Loom and want to remove a specific section from it, what do you do?

The examples go on and on, but in most situations you would have to take the video, upload it somewhere else to edit it, download the result, and then share it or re-upload it to your preferred platform. If you have limited experience with editing tools, that adds even more steps to the process. Montagix SDK steps in to offer a seamless solution: developers integrate it directly into their websites and add exactly the editing experience they need, whether that’s letting users trim videos on the page or embedding a fully-fledged video editor, without anyone ever leaving the site.


Underlying Technology: Canvas Decisions

We can all agree that images, videos, and text should be rendered on a canvas, since it is highly performant. A canvas exposes two main rendering contexts, 2D and WebGL, each tailored to different use cases.

Initially, the 2D canvas API was largely bound to the CPU, but as browsers evolved, many started hardware-accelerating 2D canvas operations, using the GPU for better performance on operations such as scaling, translation, and rotation. The WebGL canvas, on the other hand, was explicitly designed to take advantage of the GPU, allowing developers to write shader programs that run directly on it.
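As a quick illustration (not Montagix code), here is how the two contexts are obtained and how WebGL support can be probed; note that a given canvas can only hold one context type, so each gets its own element:

```ts
// Minimal sketch: the two main canvas contexts.
const canvas2d = document.createElement('canvas');
const ctx2d = canvas2d.getContext('2d'); // simple drawing API, hardware-accelerated in most modern browsers

const canvasGl = document.createElement('canvas');
// WebGL is GPU-first and programmable via shaders; check support and fall back if needed.
const gl = canvasGl.getContext('webgl2') ?? canvasGl.getContext('webgl');
if (!gl) {
  console.warn('WebGL unavailable; a 2D-canvas fallback would be needed here');
}
```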

Both technologies seemed like good solutions, but after a few experiments we came to the realization that processing effects and filters on lower-end devices is far more problematic. Ensuring the editor’s smooth performance across devices was crucial for us. Moreover, there is one more feature the SDK is going to support for which performant rendering is imperative, but more on that in another article.

We decided to go with the WebGL canvas for all the reasons mentioned above. Its main disadvantages are its complexity and the difficulty of debugging it. Pixi.js became our go-to rendering engine thanks to its ability to abstract away the complexities of WebGL behind intuitive APIs. With its v8 release introducing WebGPU support, making rendering roughly 2.5x faster, our choice felt even more justified.

We’re harnessing the power of Pixi.js for drawing the elements on the screen, moving them around, resizing, rotating, applying filters and effects, loading and unloading, preloading, and much more.
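To give a feel for the kind of work Pixi.js takes off our hands, here is a minimal, hypothetical sketch (Pixi v7-style API and a made-up asset name, not actual Montagix code) that loads an image, transforms it, and applies a filter:

```ts
import { Application, Assets, Sprite, BlurFilter } from 'pixi.js';

// Hypothetical sketch: render one clip thumbnail with a transform and a filter.
const app = new Application({ width: 1280, height: 720, backgroundAlpha: 0 });
document.body.appendChild(app.view as HTMLCanvasElement);

const texture = await Assets.load('clip-thumbnail.webp'); // placeholder asset
const clip = new Sprite(texture);

// Position, resize, and rotate the element on the stage.
clip.position.set(100, 50);
clip.scale.set(0.5);
clip.rotation = Math.PI / 12;

// Filters and effects run as WebGL shaders under the hood.
clip.filters = [new BlurFilter(4)];

app.stage.addChild(clip);
```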



Structuring Video Display

Deciding what to display on the screen, and when to show it to the user, is another challenge. Here we took inspiration from VideoContext.js, an experimental library that leverages scene graphs to decide how elements are displayed on the screen. In essence, everything is structured as a node, and based on each node’s start and end time we decide what is displayed.
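As a rough illustration of the idea (not the actual Montagix data model), a node can carry its own timing, and the renderer simply asks which nodes are active at the playhead:

```ts
// Hypothetical sketch of a scene-graph node with timing information.
interface ClipNode {
  id: string;
  startTime: number; // seconds on the timeline
  endTime: number;
  children: ClipNode[];
}

// A node is visible whenever the playhead falls inside its time range.
function isActive(node: ClipNode, currentTime: number): boolean {
  return currentTime >= node.startTime && currentTime < node.endTime;
}

// Collect every node that should be on screen at the given time.
function activeNodes(root: ClipNode, currentTime: number): ClipNode[] {
  const result: ClipNode[] = [];
  const walk = (node: ClipNode) => {
    if (isActive(node, currentTime)) result.push(node);
    node.children.forEach(walk);
  };
  walk(root);
  return result;
}
```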



Handling User Uploads

Moving on, you might be wondering what happens when a user uploads a clip, be it an image, a video file, or an audio file.

Images are converted to the WebP format, which offers outstanding compression; this optimization typically makes the image 10x smaller (in some cases even more). The image is also stored in a serializable form, as a Uint8Array, in our storage system for easy retrieval.
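A minimal sketch of that conversion (hypothetical function and parameter names, not the SDK’s actual code) might look like this:

```ts
// Hypothetical sketch: re-encode an uploaded image as WebP and return raw bytes for storage.
async function imageToWebpBytes(file: File, quality = 0.8): Promise<Uint8Array> {
  const bitmap = await createImageBitmap(file);

  // Draw the image onto an offscreen canvas at its native size.
  const canvas = new OffscreenCanvas(bitmap.width, bitmap.height);
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('2D context unavailable');
  ctx.drawImage(bitmap, 0, 0);

  // Encode as WebP; browsers that cannot encode WebP fall back to PNG.
  const blob = await canvas.convertToBlob({ type: 'image/webp', quality });

  // A Uint8Array is easy to serialize into IndexedDB-backed storage.
  return new Uint8Array(await blob.arrayBuffer());
}
```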

When it comes to video files, the situation changes slightly, as we can’t apply the same kind of compression on the spot. Compressing a video involves splitting it into a series of images (decoding), then transforming all those images into a highly optimized format (encoding). Imagine a 15-minute video running at 30fps: that would mean working with 27,000 images in the browser. This is one of the few places in the SDK where a server is needed, since it can process this amount of data in seconds, whereas the browser would take several minutes. We’ve tried some tricks to avoid using a server, but they didn’t show a big improvement. We may revisit this in the near future.
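Purely for illustration (an assumption about what such a server-side step could look like, not our exact pipeline), a re-encode on the server can be as simple as shelling out to ffmpeg:

```ts
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

// Hypothetical server-side re-encode: decode the upload and re-encode it
// with H.264 at a reasonable quality/size trade-off.
async function compressUpload(inputPath: string, outputPath: string): Promise<void> {
  await run('ffmpeg', [
    '-i', inputPath,
    '-c:v', 'libx264', // widely supported codec
    '-crf', '23',      // constant-rate-factor quality target
    '-preset', 'fast', // encoding speed vs. compression trade-off
    '-c:a', 'aac',
    outputPath,
  ]);
}
```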


Storage

Storage is another interesting aspect of the video editor. Desktop applications typically have access to the native file system, making it easy to store files in whatever directory the app can reach. There was a similar initiative for the web some time ago, but it didn’t gain enough traction due to security concerns and the available storage alternatives. Though the Filesystem API has potential, its limited browser support made us look for alternatives.

Local Storage is off the list right away because of its 5–10 MB limit. IndexedDB, however, can store 50 GB or even more, depending on the browser. To avoid tying our logic to IndexedDB alone, we’re using BrowserFS, an emulated file system for the web, and it is working amazingly well.
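A minimal sketch of that setup (assuming BrowserFS’s documented configure/BFSRequire API, not our exact wiring) looks like this:

```ts
import * as BrowserFS from 'browserfs';

// Hypothetical sketch: back a Node-style fs API with IndexedDB via BrowserFS.
BrowserFS.configure({ fs: 'IndexedDB', options: {} }, (err) => {
  if (err) throw err;

  const fs = BrowserFS.BFSRequire('fs');
  const { Buffer } = BrowserFS.BFSRequire('buffer');

  // Pretend these are the WebP bytes produced for an uploaded image.
  const clipBytes = new Uint8Array([0x52, 0x49, 0x46, 0x46]);

  // Store and read back the clip as if it were an ordinary file.
  fs.writeFile('/intro.webp', Buffer.from(clipBytes), (writeErr) => {
    if (writeErr) throw writeErr;
    fs.readFile('/intro.webp', (readErr, data) => {
      if (readErr) throw readErr;
      console.log('clip restored,', data.length, 'bytes');
    });
  });
});
```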

With all these pieces in place, it’s a breeze to track the current time of the playing video, preload and prepare the next clip in the background by retrieving it from storage and adding it to the scene (kept invisible), then display it on screen, and finally unload it. This way we keep memory consumption low and ensure that only the clips that are needed stay loaded.
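Conceptually (a sketch of the lifecycle with hypothetical helper names, not the SDK’s internals), each clip moves through three states driven by the playhead:

```ts
// Hypothetical sketch of the preload -> show -> unload lifecycle for a single clip.
type ClipState = 'unloaded' | 'preloaded' | 'visible';

interface ManagedClip {
  startTime: number;              // timeline seconds
  endTime: number;
  state: ClipState;
  display?: { visible: boolean }; // stand-in for the renderer's display object
}

// Hypothetical helpers that talk to storage and the renderer.
declare function preloadFromStorage(clip: ManagedClip): void;
declare function unload(clip: ManagedClip): void;

const PRELOAD_AHEAD = 2; // start loading this many seconds before the clip is needed

function tick(clip: ManagedClip, currentTime: number): void {
  if (clip.state === 'unloaded' && currentTime >= clip.startTime - PRELOAD_AHEAD) {
    // Fetch bytes from storage and add the clip to the scene, kept invisible for now.
    preloadFromStorage(clip);
    clip.state = 'preloaded';
  } else if (clip.state === 'preloaded' && currentTime >= clip.startTime) {
    clip.display!.visible = true; // show it exactly when its time range begins
    clip.state = 'visible';
  } else if (clip.state === 'visible' && currentTime >= clip.endTime) {
    unload(clip);                 // free memory as soon as the clip is done
    clip.state = 'unloaded';
  }
}
```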



Rendering

You might be familiar with ffmpeg for video compilation. Although feasible, ffmpeg falls short on advanced features such as special effects and filters. We took a different approach: we capture each video frame, combine the frames into a video, synchronize the newly generated video with the recorded audio, and produce the final file.

Let’s dive a bit deeper into how these steps fit together, starting with capturing the frames. A 15-minute video at 30fps results in 27,000 frames, and each uncompressed 1920×1080 frame takes 1920 × 1080 × 3 bytes = 6,220,800 bytes, roughly 6.22 MB, for a total of 6.22 MB × 27,000 ≈ 168 GB. Doesn’t look very good, right? This amount of data requires efficient compression, which we achieve by capturing and encoding short video segments in succession. Parallel processing with workers and OffscreenCanvas speeds things up further.
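One way to implement segment encoding inside a worker is the WebCodecs VideoEncoder; the sketch below is purely an illustration under that assumption, not necessarily what the SDK ships:

```ts
// worker.ts - hypothetical sketch: encode a short segment of frames with WebCodecs.
const chunks: EncodedVideoChunk[] = [];

const encoder = new VideoEncoder({
  output: (chunk) => chunks.push(chunk), // collect compressed chunks for muxing later
  error: (e) => console.error(e),
});

encoder.configure({
  codec: 'avc1.42001f', // H.264 baseline
  width: 1920,
  height: 1080,
  bitrate: 5_000_000,
  framerate: 30,
});

// Frames arrive from the main thread as ImageBitmaps (e.g. captured from the canvas).
self.onmessage = ({ data }: MessageEvent<{ bitmap: ImageBitmap; timestampUs: number }>) => {
  const frame = new VideoFrame(data.bitmap, { timestamp: data.timestampUs });
  encoder.encode(frame);
  frame.close();       // release frame and bitmap memory promptly
  data.bitmap.close();
};
```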

When it comes to audio, we tried several solutions, all of them involving the AudioContext API with a custom destination so that the end user doesn’t hear the audio while it is being recorded. However, the main challenges we encountered were performance and synchronization. The approach we settled on uses a customized version of ffmpeg.wasm: we traverse the scene graph looking for audio nodes and build an ffmpeg command that generates the entire audio track for us, and it works flawlessly.
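To make that concrete, here is a hypothetical sketch of the kind of command such a traversal could produce (assuming ffmpeg.wasm’s classic createFFmpeg API and made-up file names and offsets):

```ts
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';

// Hypothetical sketch: mix two audio nodes from the scene graph into one track.
const ffmpeg = createFFmpeg();
await ffmpeg.load();

ffmpeg.FS('writeFile', 'voice.mp3', await fetchFile('/clips/voice.mp3'));
ffmpeg.FS('writeFile', 'music.mp3', await fetchFile('/clips/music.mp3'));

// adelay shifts each node to its start time on the timeline (in ms, per channel),
// and amix merges the delayed streams into a single track.
await ffmpeg.run(
  '-i', 'voice.mp3',
  '-i', 'music.mp3',
  '-filter_complex',
  '[0:a]adelay=2000|2000[a0];[1:a]adelay=5000|5000[a1];[a0][a1]amix=inputs=2[out]',
  '-map', '[out]',
  'track.mp3',
);

const track = ffmpeg.FS('readFile', 'track.mp3'); // Uint8Array, ready to mux with the video
```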



Conclusion

The journey of creating Montagix SDK has been filled with countless decisions, rigorous testing, and continuous optimization. As we've detailed, each component, from the rendering engine to storage, has been carefully chosen to provide a seamless video editing experience in the browser. We hope this look at the complexities involved, and at how we addressed them, helps you appreciate the power and potential of Montagix SDK in revolutionizing online video editing.

