Viewing checkpoint information
Log in to the Stream Compute Service console, select Jobs on the left sidebar, and click the Checkpoints tab of a job. The tab displays the job's checkpoint list, which provides the following information:
Checkpoint ID/description: The ID uniquely identifies the checkpoint. The description is either text you entered or text generated automatically by the system.
Trigger time: The time when the checkpoint was triggered.
Completion time: The time when the checkpoint was completed.
Time: The time taken to complete the checkpoint, that is, the interval between the trigger time and the completion time.
Status: The status of the checkpoint. Possible values include Creating, Present, Cleared, Timeout, and Failed.
Source: How the checkpoint was created. Created during running means the checkpoint was triggered manually by a user while the job was running. Created when the job is stopped means the checkpoint was taken when the job was stopped with the Create a checkpoint when stopping the job option selected.
Job version: The job configuration version to which the checkpoint corresponds.
Location: The storage path of the checkpoint, which is currently a COS path.
Note
Cleared means the checkpoint has been deleted from its COS path, either manually or automatically, and can no longer be used to start the job.
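The fields in the checkpoint list can be modeled as a simple record to make their relationships concrete. The Python below is a minimal sketch under stated assumptions: the class, field, and constant names are hypothetical and are not part of any Stream Compute Service API; the Time column is assumed to equal completion time minus trigger time; and both Present and Completed are treated as usable statuses, because this page uses both names for a finished checkpoint.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

# Assumption: this page uses both "Present" and "Completed" for a finished
# checkpoint, so either is treated as usable here.
USABLE_STATUSES = {"Present", "Completed"}


@dataclass(frozen=True)
class CheckpointEntry:
    """Hypothetical model of one row in the checkpoint list (illustrative names)."""
    checkpoint_id: str
    description: str
    trigger_time: datetime
    completion_time: Optional[datetime]  # None while the checkpoint is still being created
    status: str          # e.g. "Creating", "Present", "Cleared", "Timeout", "Failed"
    source: str          # "Created during running" or "Created when the job is stopped"
    job_version: int
    location: str        # COS path where the checkpoint is stored

    def duration(self) -> Optional[timedelta]:
        """The "Time" column, assumed to be completion time minus trigger time."""
        if self.completion_time is None:
            return None
        return self.completion_time - self.trigger_time

    def usable_for_start(self) -> bool:
        """Cleared, failed, timed-out, or unfinished checkpoints cannot start a job."""
        return self.status in USABLE_STATUSES


entry = CheckpointEntry(
    checkpoint_id="chk-1",                               # hypothetical ID
    description="before upgrade",
    trigger_time=datetime(2024, 1, 1, 12, 0, 0),
    completion_time=datetime(2024, 1, 1, 12, 0, 42),
    status="Present",
    source="Created during running",
    job_version=3,
    location="cosn://example-bucket/checkpoints/chk-1",  # hypothetical COS path
)
cleared = replace(entry, status="Cleared")

print(entry.duration().total_seconds())  # 42.0
print(entry.usable_for_start())          # True
print(cleared.usable_for_start())        # False
```

The record is frozen so that a status change produces a new entry via `replace`, which keeps the example free of hidden mutation.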
Manually creating a checkpoint
You can manually create a checkpoint of a running job. The checkpoint contains all of the job's current state data and can be used for job upgrades and testing. To create one, go to the Checkpoints page of the job, click Trigger checkpoint, enter a description in the pop-up window, and click Confirm.
A checkpoint whose source is Created during running then appears in the checkpoint list. Wait until its status changes from Running to Completed. A Completed checkpoint can be used to recover the job state when the job is started.
Note
If the Checkpoints tab shows that the current cluster does not support checkpoints, submit a ticket to upgrade the cluster.
Recovering a job from a checkpoint
When you run a job, you can select Use a checkpoint to recover the job's state: select the desired checkpoint and click Confirm.
Setting a checkpoint storage policy
By default, the 5 most recent checkpoints of a job are retained. You can change this number by setting state.checkpoints.num-retained in the job's advanced parameters.
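The retention rule can be sketched as a bounded history of completed checkpoints: each new checkpoint is appended, and once more than N are kept, the oldest ones are dropped. The Python below is a minimal illustration, not the service's implementation. It assumes the limit counts completed checkpoints only (as the Flink option of the same name does) and that the oldest checkpoints are discarded first; the class name `RetainedCheckpoints` is hypothetical. In the advanced parameters, the setting would presumably be written in Flink's `key: value` form, for example `state.checkpoints.num-retained: 10`.

```python
from collections import deque
from typing import Deque, List


class RetainedCheckpoints:
    """Hypothetical sketch: keep only the `num_retained` most recent completed
    checkpoints, mimicking the effect of state.checkpoints.num-retained."""

    def __init__(self, num_retained: int = 5) -> None:  # 5 = documented default
        if num_retained < 1:
            raise ValueError("num_retained must be at least 1")
        self.num_retained = num_retained
        self._retained: Deque[str] = deque()
        self.cleared: List[str] = []  # checkpoints dropped by the policy, oldest first

    def add_completed(self, checkpoint_id: str) -> None:
        """Record a newly completed checkpoint and evict the oldest beyond the limit."""
        self._retained.append(checkpoint_id)
        while len(self._retained) > self.num_retained:
            self.cleared.append(self._retained.popleft())

    def retained(self) -> List[str]:
        """Checkpoints still kept, oldest first."""
        return list(self._retained)


store = RetainedCheckpoints()
for i in range(1, 8):
    store.add_completed(f"chk-{i}")

print(store.retained())  # ['chk-3', 'chk-4', 'chk-5', 'chk-6', 'chk-7']
print(store.cleared)     # ['chk-1', 'chk-2']
```

With the default limit of 5, adding seven checkpoints leaves the five newest; the two oldest are presumably the ones that would appear as Cleared in the checkpoint list.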