Video Content Recognition

Last updated: 2024-11-04 10:11:11
Since August 1, 2022, the audio/video content recognition feature of VOD has been a paid feature. For more information, see Video Recognition to Become Paid Feature.
Audio/Video content recognition is an offline task that uses AI to intelligently recognize audio/video content. It recognizes faces, text, opening and closing segments, and speech in videos, helping you manage your audio/video files accurately and efficiently. Specifically, it includes the following features:
| Feature | Description | Use Cases |
| --- | --- | --- |
| Face recognition | Recognizes faces in video images | Marks where celebrities appear in video images; checks for particular people in video images |
| Full speech recognition | Recognizes all words that occur in speech | Generates subtitles for speech content; performs data analysis on video speech content |
| Full text recognition | Recognizes all text that occurs in video images | Performs data analysis on text in video images |
| Speech keyword recognition | Recognizes keywords in speech | Checks for sensitive words in speech; retrieves specific keywords in speech |
| Text keyword recognition | Recognizes keywords in video images | Checks for sensitive words in video images; retrieves specific keywords in video images |
| Opening and closing segment recognition | Recognizes opening and closing segments in videos | Marks the positions of the opening segment, closing segment, and feature presentation in the progress bar; removes the opening and closing segments of multiple videos at a time |
| Speech translation recognition | Recognizes all words that occur in speech and translates them into the specified language | Generates translated subtitles for short dramas; generates multilingual subtitles for recorded files of cross-border audio/video conferences |
Some content recognition features depend on a material library. There are two types of libraries:
- Public library: VOD's preset material library.
- Custom library: a material library that you create and manage yourself.
| Recognition Type | Public Library | Custom Library |
| --- | --- | --- |
| Face recognition | Supported. The library includes celebrities in the sports and entertainment industries, as well as other people. | Supported. You can use a server API to manage the custom face library. |
| Speech recognition | Not supported yet | Supported. You can use a server API to manage the custom keyword library. |
| Text recognition | Not supported yet | Supported. You can use a server API to manage the custom keyword library. |
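For illustration, the sketch below (Python, using the tencentcloud-sdk-python package) adds a face sample to a custom library with the CreatePersonSample server API. The credential placeholders, image path, and the exact field values (such as the "Recognition" usage) are assumptions to verify against the API reference.

```python
import base64

from tencentcloud.common import credential
from tencentcloud.vod.v20180717 import vod_client, models

# Hypothetical placeholders: substitute your own credentials.
cred = credential.Credential("YOUR_SECRET_ID", "YOUR_SECRET_KEY")
client = vod_client.VodClient(cred, "")  # VOD APIs are region-independent

# The API expects base64-encoded image content.
with open("face.jpg", "rb") as f:  # hypothetical sample image
    face_b64 = base64.b64encode(f.read()).decode("utf-8")

req = models.CreatePersonSampleRequest()
req.Name = "John Smith"
req.Usages = ["Recognition"]   # use this sample for content recognition
req.FaceContents = [face_b64]

resp = client.CreatePersonSample(req)
print(resp.to_json_string())
```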

    Audio/Video Content Recognition Template

Audio/Video content recognition integrates a number of recognition features. You can use parameters to control the following:
- Which content recognition features to enable
- Whether to use the public library or custom library for face recognition
- The confidence score threshold for returning face recognition results
- The labels of the faces to return
    VOD provides preset video content recognition templates for common parameter combinations. You can also use a server API to create and manage custom templates.
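As a sketch of what creating a custom template might look like, the call below enables face recognition against a custom library with a confidence threshold of 90, plus full speech recognition. Field names follow the CreateAIRecognitionTemplate server API; treat them as assumptions to check against the API reference.

```python
from tencentcloud.common import credential
from tencentcloud.vod.v20180717 import vod_client, models

cred = credential.Credential("YOUR_SECRET_ID", "YOUR_SECRET_KEY")
client = vod_client.VodClient(cred, "")

req = models.CreateAIRecognitionTemplateRequest()
req.Name = "face-and-speech"  # hypothetical template name
req.FaceConfigure = models.FaceConfigureInfo()
req.FaceConfigure.Switch = "ON"
req.FaceConfigure.FaceLibrary = "UserDefine"  # use the custom library only
req.FaceConfigure.Score = 90  # confidence threshold for returned faces
req.AsrFullTextConfigure = models.AsrFullTextConfigureInfo()
req.AsrFullTextConfigure.Switch = "ON"

resp = client.CreateAIRecognitionTemplate(req)
print(resp.Definition)  # ID of the new template, used when initiating tasks
```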

    Initiating a Task

You can initiate an audio/video content recognition task by calling a server API, via the console, or by specifying the task when uploading videos. For details, see Task Initiation.
Below are the details:
- Initiate a task by calling a server API: Call ProcessMedia, setting Definition in the request parameter AiRecognitionTask to the ID of the audio/video content recognition template (see the sketch after this list).
- Initiate a task via the console: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask), and use it to process videos in the console.
- Specify a task when uploading videos from the server: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask). When calling ApplyUpload, set the parameter procedure to the task flow.
- Specify a task when uploading videos from a client: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask). When generating the upload signature, set the parameter procedure to the task flow.
- Specify a task when uploading videos via the console: Call the server API CreateProcedureTemplate to create an audio/video recognition task flow (MediaProcessTask.AiRecognitionTask). When uploading videos via the console, select Auto-processing after upload and choose the task flow.
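As an illustration of the first option, a minimal ProcessMedia call might look like the following Python sketch; the FileId and template ID are placeholders.

```python
from tencentcloud.common import credential
from tencentcloud.vod.v20180717 import vod_client, models

cred = credential.Credential("YOUR_SECRET_ID", "YOUR_SECRET_KEY")
client = vod_client.VodClient(cred, "")

req = models.ProcessMediaRequest()
req.FileId = "5285890784363430543"  # ID of an uploaded media file
req.AiRecognitionTask = models.AiRecognitionTaskInput()
req.AiRecognitionTask.Definition = 10  # content recognition template ID

resp = client.ProcessMedia(req)
print(resp.TaskId)  # keep the task ID if you plan to query the result later
```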

    Getting the Result

After initiating an audio/video content recognition task, you can either wait for the result notification asynchronously or query the task synchronously to get the execution result. Below is an example of the result notification in normal callback mode after a content recognition task is initiated (fields with null values are omitted):
{
  "EventType": "ProcedureStateChanged",
  "ProcedureStateChangeEvent": {
    "TaskId": "1400155958-Procedure-2e1af2456351812be963e309cc133403t0",
    "Status": "FINISH",
    "FileId": "5285890784363430543",
    "FileName": "Collection",
    "FileUrl": "http://1400155958.vod2.myqcloud.com/xxx/xxx/aHjWUx5Xo1EA.mp4",
    "MetaData": {
      "AudioDuration": 243,
      "AudioStreamSet": [
        {
          "Bitrate": 125599,
          "Codec": "aac",
          "SamplingRate": 48000
        }
      ],
      "Bitrate": 1459299,
      "Container": "mov,mp4,m4a,3gp,3g2,mj2",
      "Duration": 243,
      "Height": 1080,
      "Rotate": 0,
      "Size": 44583593,
      "VideoDuration": 243,
      "VideoStreamSet": [
        {
          "Bitrate": 1333700,
          "Codec": "h264",
          "Fps": 29,
          "Height": 1080,
          "Width": 1920
        }
      ],
      "Width": 1920
    },
    "AiRecognitionResultSet": [
      {
        "Type": "FaceRecognition",
        "FaceRecognitionTask": {
          "Status": "SUCCESS",
          "ErrCode": 0,
          "Message": "",
          "Input": {
            "Definition": 10
          },
          "Output": {
            "ResultSet": [
              {
                "Id": 183213,
                "Type": "Default",
                "Name": "John Smith",
                "SegmentSet": [
                  {
                    "StartTimeOffset": 10,
                    "EndTimeOffset": 12,
                    "Confidence": 97,
                    "AreaCoordSet": [830, 783, 1030, 599]
                  },
                  {
                    "StartTimeOffset": 12,
                    "EndTimeOffset": 14,
                    "Confidence": 97,
                    "AreaCoordSet": [844, 791, 1040, 614]
                  }
                ]
              },
              {
                "Id": 236099,
                "Type": "Default",
                "Name": "Jane Smith",
                "SegmentSet": [
                  {
                    "StartTimeOffset": 120,
                    "EndTimeOffset": 122,
                    "Confidence": 96,
                    "AreaCoordSet": [579, 903, 812, 730]
                  }
                ]
              }
            ]
          }
        }
      }
    ],
    "TasksPriority": 0,
    "TasksNotifyMode": ""
  }
}
    
In the callback result, ProcedureStateChangeEvent.AiRecognitionResultSet contains the face recognition result (Type is FaceRecognition).
According to Output.ResultSet, two people are recognized: John Smith and Jane Smith. SegmentSet indicates when (from StartTimeOffset to EndTimeOffset) and where (the coordinates in AreaCoordSet) each person appears in the video.
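A callback receiver might walk this structure as in the sketch below; it assumes the callback body has already been parsed into a Python dict named event (for example, from an HTTP request body), and callback.json is a hypothetical saved copy of the payload above. For the synchronous path, the DescribeTaskDetail server API returns a similar task structure that can be processed the same way.

```python
import json

def extract_face_segments(event: dict):
    """Yield (name, start, end, confidence) for each recognized face segment."""
    change = event.get("ProcedureStateChangeEvent", {})
    if change.get("Status") != "FINISH":
        return
    for result in change.get("AiRecognitionResultSet") or []:
        if result.get("Type") != "FaceRecognition":
            continue
        task = result.get("FaceRecognitionTask", {})
        if task.get("Status") != "SUCCESS":
            continue
        for person in task.get("Output", {}).get("ResultSet", []):
            for seg in person.get("SegmentSet", []):
                yield (person["Name"], seg["StartTimeOffset"],
                       seg["EndTimeOffset"], seg["Confidence"])

with open("callback.json") as f:  # hypothetical saved callback body
    event = json.load(f)
for name, start, end, conf in extract_face_segments(event):
    print(f"{name}: {start}s-{end}s (confidence {conf})")
```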