Media sets can be read from and written to incrementally. For an overview of incremental transforms and when to use them, see the incremental overview and incremental reference.
To make your media transforms incremental, use the `incremental` decorator and set `v2_semantics=True`. If `v2_semantics` is not set, media sets cannot be used incrementally.
```python
from transforms.api import transform, incremental
from transforms.mediasets import MediaSetInput, MediaSetOutput


@incremental(v2_semantics=True)
@transform(
    input_PNGs=MediaSetInput('/examples/input_PNGs'),
    output_PNGs=MediaSetOutput('/examples/output_PNGs'),
)
def upload_pngs(input_PNGs, output_PNGs):
    # When running incrementally, returns a dataframe that only includes the media items added since the last build;
    # otherwise, lists all items in the input media set
    listed_pngs = input_PNGs.dataframe()

    def fast_copy_media_item(row):
        output_PNGs.fast_copy_media_item(input_PNGs, row.mediaItemRid, row.path)

    # Fast copies all of the items in `listed_pngs` into the output media set
    # These items will be appended to the output if this transform is running incrementally, or they will replace the
    # output if the transform is not running incrementally
    listed_pngs.foreach(fast_copy_media_item)
```
In the example above, the transform will write to `output_PNGs` using the `modify` write mode, and only the media items that have been added to the input media set since the last build will be processed. If the transform cannot run incrementally, the output will be written with the `replace` write mode and the entire input will be read. The requirements for running incrementally are described below.
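The difference between the two write modes can be modeled in plain Python. The sketch below is a simplified, hypothetical model: `SimulatedMediaSet` and `run_build` are illustrative names, not part of `transforms.mediasets`, and items are keyed by path for simplicity. A `modify` write keeps the existing output items and adds new ones, while a `replace` write discards everything that was there before.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedMediaSet:
    """Hypothetical stand-in for a media set; items are stored as path -> content."""
    items: dict = field(default_factory=dict)

    def write(self, new_items: dict, write_mode: str) -> None:
        if write_mode == "replace":
            # A replace write discards everything previously in the output
            self.items = dict(new_items)
        elif write_mode == "modify":
            # A modify write keeps the existing items and adds the new ones
            self.items.update(new_items)
        else:
            raise ValueError(f"unknown write mode: {write_mode}")


def run_build(input_items: dict, added_paths: set, output: SimulatedMediaSet,
              can_run_incrementally: bool) -> None:
    """Hypothetical build that copies input items the way the example transform does."""
    if can_run_incrementally:
        # Read only the items added since the last build and append them to the output
        output.write({path: input_items[path] for path in added_paths}, "modify")
    else:
        # Read the entire input and replace the output
        output.write(input_items, "replace")
```

In this model, an incremental run leaves items already in the output untouched, whereas a non-incremental run rebuilds the output from the full input.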
When `v2_semantics` is set to `True`, incremental media sets can be used in combination with any number of other incremental inputs and outputs, including datasets and virtual tables.
Every incremental input and output contributes to determining whether a transform can run incrementally. Refer to the incremental transforms reference for more information on when a dataset will prevent a transform from running incrementally.
A media set output can prevent a transform from running incrementally when:
A media set input can prevent a transform from running incrementally when:
`replace` write mode.
If the media set input is included as a `snapshot_input`, it will not prevent the build from running incrementally, even if its contents are replaced. See the documentation on snapshot inputs.
Unlike datasets, path overwrites will not prevent a transform from running incrementally.
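The media set input rules can be summarized as a simplified check. This is a hypothetical sketch: `MediaSetInputState` and `media_set_inputs_allow_incremental` are illustrative names, outputs and non-media-set inputs are not modeled, and it assumes (as the `replace` write mode and snapshot-input notes above suggest) that replacing a regular input's contents is one condition that blocks an incremental run.

```python
from dataclasses import dataclass


@dataclass
class MediaSetInputState:
    """Hypothetical summary of what happened to a media set input since the last build."""
    name: str
    replaced: bool = False           # assumed: contents were written with the replace write mode
    paths_overwritten: bool = False  # new items were written at existing paths
    is_snapshot_input: bool = False  # declared as a snapshot input


def media_set_inputs_allow_incremental(inputs: list) -> bool:
    for state in inputs:
        if state.is_snapshot_input:
            # Snapshot inputs never prevent an incremental run, even if replaced
            continue
        if state.replaced:
            return False
        # Unlike datasets, path overwrites alone do not block an incremental run,
        # so `paths_overwritten` is deliberately ignored here
    return True
```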
In an incremental transform, media set inputs can be listed using one of three modes:
- `added`: Only the items added to the branch since the last build will be included.
- `previous`: Only the items in the branch that existed when the last build ran will be included.
- `current`: All items in the media set branch will be included.

The union of `added` and `previous` is always equal to `current`.
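Treating each listing as a set of item identifiers makes the relationship between the three modes concrete. The sketch below is a hypothetical model, not the `MediaSetInput` API: `list_items` and the `item-N` identifiers are illustrative, and the branch's current items are assumed to include every item from the previous build.

```python
def list_items(mode: str, previous_build_items: set, branch_items: set) -> set:
    """Hypothetical model of an incremental listing, using item IDs in place of rows.

    previous_build_items: items on the branch when the last build ran
    branch_items: items on the branch now (assumed here to be a superset of
    previous_build_items)
    """
    if mode == "previous":
        return set(previous_build_items)
    if mode == "added":
        return set(branch_items) - set(previous_build_items)
    if mode == "current":
        return set(branch_items)
    raise ValueError(f"unknown mode: {mode}")


previous = {"item-1", "item-2"}
now = previous | {"item-3"}
added = list_items("added", previous, now)  # {"item-3"}
# The union of `added` and `previous` equals `current`
print(added | list_items("previous", previous, now) == list_items("current", previous, now))  # True
```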
If the transform is not running incrementally (for example, because the contents of the input were replaced since the last build), a listing using the `previous` mode will be empty: it will not include the items that were present in the previous build.
The default read mode is `added` when running incrementally, and `current` when not. However, the read mode can be specified using the `mode` parameter in any listing method:
```python
from transforms.api import transform, incremental
from transforms.mediasets import MediaSetInput, MediaSetOutput


@incremental(v2_semantics=True)
@transform(
    input_PNGs=MediaSetInput('/examples/input_PNGs'),
    output_PNGs=MediaSetOutput('/examples/output_PNGs'),
)
def upload_pngs(input_PNGs, output_PNGs):
    # Will use `added` if running incrementally, or `current` if not
    listed_pngs = input_PNGs.dataframe(deduplicate_by_path=False)

    # Will always read in `previous` mode
    previous_listed_pngs = input_PNGs.dataframe(deduplicate_by_path=False, mode="previous")
```
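The default read mode and the non-incremental behavior of `previous` can be combined into one small model. The names `resolve_mode` and `list_items_with_defaults` are hypothetical. The sketch also assumes that, when not running incrementally, an `added` listing returns every current item; this follows from `previous` being empty and the union of `added` and `previous` always equaling `current`, but it is an inference rather than a documented guarantee.

```python
from typing import Optional


def resolve_mode(mode: Optional[str], running_incrementally: bool) -> str:
    """Default to `added` when running incrementally and `current` otherwise."""
    if mode is not None:
        return mode
    return "added" if running_incrementally else "current"


def list_items_with_defaults(mode: Optional[str], previous_build_items: set,
                             branch_items: set, running_incrementally: bool) -> set:
    """Hypothetical listing that applies the default mode and the non-incremental rule."""
    if not running_incrementally:
        # Nothing carries over from the previous build, so `previous` is empty
        previous_build_items = set()
    mode = resolve_mode(mode, running_incrementally)
    if mode == "previous":
        return set(previous_build_items)
    if mode == "added":
        return set(branch_items) - set(previous_build_items)
    if mode == "current":
        return set(branch_items)
    raise ValueError(f"unknown mode: {mode}")
```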
If a path is overwritten and the listing deduplicates by path, only the most recent item at that path will be included. If you want to process all input items at a given path, you must always specify `deduplicate_by_path=False`.
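Path deduplication can be pictured as keeping only the newest item for each path. The sketch below is a hypothetical model: the `MediaItem` tuple, its `written_at` ordering field, and the `listing` function are illustrative assumptions, and a real listing returns a DataFrame rather than a Python list.

```python
from typing import NamedTuple


class MediaItem(NamedTuple):
    media_item_rid: str
    path: str
    written_at: int  # hypothetical write order; larger means more recent


def listing(items: list, deduplicate_by_path: bool) -> list:
    if not deduplicate_by_path:
        # Every item is kept, including older items at overwritten paths
        return list(items)
    latest = {}
    for item in items:
        seen = latest.get(item.path)
        # Keep only the most recently written item for each path
        if seen is None or item.written_at > seen.written_at:
            latest[item.path] = item
    return list(latest.values())
```

With two items written to `a.png`, a deduplicated listing returns only the newer one, while `deduplicate_by_path=False` returns both.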
When writing to an incremental media set output, the write mode can be set at runtime. This is useful if the transform contains custom logic that determines whether to run the build incrementally. In the example below, the build will not run incrementally if any paths were overwritten since the previous build:
```python
from transforms.api import transform, incremental
from transforms.mediasets import MediaSetInput, MediaSetOutput


@incremental(v2_semantics=True)
@transform(
    input_PNGs=MediaSetInput('/examples/input_PNGs'),
    output_PNGs=MediaSetOutput('/examples/output_PNGs'),
)
def upload_pngs(input_PNGs, output_PNGs):
    previous_dataframe = input_PNGs.dataframe(deduplicate_by_path=False, mode="previous")
    added_dataframe = input_PNGs.dataframe(deduplicate_by_path=False, mode="added")

    # Calculates if any paths have been overwritten in the `input_PNGs` media set since
    # the most recent run of this transform (PySpark's join takes the join type as `how`)
    paths_overwritten = previous_dataframe.join(added_dataframe, how="inner", on="path").count() > 0

    if paths_overwritten:
        # The full input media set will be read and the output media set will be replaced
        # with the items written in this transform
        read_mode = "current"
        output_PNGs.set_write_mode("replace")
    else:
        # Only the newly added items in the input media set will be read and the items written in this transform will
        # be appended to the output media set
        read_mode = "added"
        output_PNGs.set_write_mode("modify")

    # Lists the input using the chosen read mode and fast copies each item into the output
    listed_pngs = input_PNGs.dataframe(mode=read_mode)

    def fast_copy_media_item(row):
        output_PNGs.fast_copy_media_item(input_PNGs, row.mediaItemRid, row.path)

    listed_pngs.foreach(fast_copy_media_item)
```
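The overwrite check above relies on an inner join on `path` between the `previous` and `added` listings: any row in the result is a path that existed in the previous build and has received a new item since. The same decision logic can be sketched in plain Python as a set intersection, using hypothetical `(media_item_rid, path)` tuples in place of DataFrame rows; `paths_overwritten` and `choose_modes` are illustrative names.

```python
def paths_overwritten(previous_rows: list, added_rows: list) -> bool:
    """True if any path in the added listing already existed in the previous listing."""
    previous_paths = {path for _rid, path in previous_rows}
    added_paths = {path for _rid, path in added_rows}
    # A non-empty intersection corresponds to a non-zero count from the inner join
    return len(previous_paths & added_paths) > 0


def choose_modes(previous_rows: list, added_rows: list) -> tuple:
    """Returns (read_mode, write_mode), mirroring the branches of the example transform."""
    if paths_overwritten(previous_rows, added_rows):
        return "current", "replace"
    return "added", "modify"
```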
Media sets do not support incremental fallback branches. When an incremental transform runs on a new branch, its output media set on that branch is empty, so the incremental decorator will recommend a snapshot. Running the same build on the main branch, by contrast, will not necessarily result in a snapshot.
Transactionless media sets use the `modify` write mode and cannot use the `replace` write mode. This means that a transactionless media set cannot be a snapshot. If a transactionless media set is an output of an incremental transform but the transform cannot run incrementally, the build will fail. In this case, you should investigate why the build cannot run incrementally.
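The consequence for transactionless outputs can be expressed as a guard that decides the write mode before anything is written. This is a hypothetical sketch: `TransactionlessOutputError` and `plan_transactionless_write` are illustrative names and are not part of the transforms API.

```python
class TransactionlessOutputError(RuntimeError):
    """Raised when a transactionless media set would need a replace write."""


def plan_transactionless_write(can_run_incrementally: bool) -> str:
    if can_run_incrementally:
        # Transactionless media sets only support the modify write mode
        return "modify"
    # A snapshot would require the replace write mode, which transactionless media sets cannot use
    raise TransactionlessOutputError(
        "Build cannot run incrementally; investigate why before re-running."
    )
```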
It can be risky to abort outputs during an incremental build. For more information, see the documentation on aborted transactions.
Individual media set outputs cannot be aborted during a build. Instead, we recommend using the `.abort_job()` method on the `TransformContext` to abort the entire job. This allows subsequent runs to be incremental.