The REST API allows you to send images and receive an analysis of the people visible in them.

# Authentication

Requests to the API are authenticated and authorized with an API key. The key has to be sent along as an HTTP header:

    Authorization: Bearer <API_KEY>
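As a minimal sketch, the header can be constructed like this in Python (the key value is a placeholder; substitute your real API key):

```python
# Placeholder for illustration; substitute your real API key.
API_KEY = "<API_KEY>"

# Every request must carry the Authorization header with the Bearer scheme.
headers = {"Authorization": f"Bearer {API_KEY}"}
```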

# Requests

To analyze an image, you send a POST request to https://vision.tawnyapis.com/v1/images with a JSON payload like the following:

    "requests": [
            "image": "<BASE64_ENCODED_IMAGE>",
            "features": ["FACE_DETECTION", "FACE_EMOTION"],
            "maxResults": 1
- `requests` is a list of image annotation requests, i.e., you can send one or several images in a single HTTP request.
- Each image annotation request consists of:
  - `image`: The base64 representation of the actual image, which itself must be in a common image format (e.g., JPEG, PNG, or TIFF).
  - `features`: A list of analysis types to apply. Currently, the following are available:
    - `FACE_DETECTION`: Detects the faces within the image and returns their bounding boxes. If you omit this feature, the API assumes that the image is already cropped to a single face and applies the other analyses to the whole image.
    - `FACE_LANDMARKS`: Determines a set of 68 landmarks in each detected face.
    - `FACE_EMOTION`: Predicts a set of emotion values for each detected face.
    - `ATTENTION`: Estimates whether and to which object each person is paying attention. Currently, it judges whether the person is looking at the camera or not. For many remote use cases in which an integrated webcam is used, looking at the camera is a close approximation of looking at the screen (which is probably the more interesting information).
    - `HEAD_POSE`: Calculates each person's head pose.
    - `FACE_DESCRIPTOR`: Produces a vector representation of each face which can be used to identify the same person across multiple requests.
  - `maxResults`: Determines how many faces should be analyzed at most. The detected faces are ordered by the size of their bounding boxes, i.e., if you set `maxResults` to 1, you only get the analysis for the largest (usually primary) face.
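The request above can be sketched in Python using only the standard library. The `build_payload` and `analyze` helpers are illustrative names, not part of the API, and the feature list mirrors the example payload:

```python
import base64
import json
from urllib import request

API_URL = "https://vision.tawnyapis.com/v1/images"
API_KEY = "<API_KEY>"  # substitute your real API key


def build_payload(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in the request structure described above."""
    return {
        "requests": [
            {
                "image": base64.b64encode(image_bytes).decode("ascii"),
                "features": ["FACE_DETECTION", "FACE_EMOTION"],
                "maxResults": 1,
            }
        ]
    }


def analyze(image_bytes: bytes) -> dict:
    """POST a single image to the API and return the parsed JSON response."""
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(image_bytes)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

To send several images in one HTTP request, you would append further annotation request objects to the `requests` list.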

# Responses

A successful request will return a response similar to the following:

    "images": [
            "faces": [
                    "boundingBox": {
                        "x1": 326,
                        "y1": 145,
                        "x2": 387,
                        "y2": 231
                    "descriptor": [-0.11876228451728821, 0.09895002096891403, ...],
                    "landmarks": [[-3, 21], [-2, 28], ...], 
                    "emotionAnalysis": {
                        "emotionProbabilities": {
                            "neutral": 0.024,
                            "happy": 0.97,
                            "surprised": 0.006,
                            "angry": 0.0,
                            "sad": 0.0
                        "intensityProbabilities": {
                            "low": 0.128,
                            "medium": 0.387,
                            "high": 0.485
                        "affectLevels": {
                            "valence": 0.539,
                            "arousal": 0.246
                        "lookingAtCam": 1
                        "yaw": 2.187211763678912,
                        "pitch": -0.6323527259643501,
                        "roll": 4.446590981308942
- `images` is a list of the analysis results for each image you sent, in the same order as the image annotation requests of the original HTTP request.
- Each image result has a list of `faces` which were detected in the image. The faces are ordered by the size of their bounding boxes, from large to small. The list contains at most `maxResults` entries, as defined in the request.
- For each face, you get the following analysis results (depending on which features were specified in the request):
  - `boundingBox`: The coordinates of the bounding box of the detected face.
  - `descriptor`: A descriptor of the face, i.e., a vector of length 128 representing the identity of the face, which can be used to match faces across consecutive requests.
  - `landmarks`: A list of the coordinates of the 68 landmarks of the face.
  - `emotionAnalysis`: A set of analyses regarding the emotion shown by the detected face:
    - `emotionProbabilities`: The probability of each of the five emotions neutral, happy, surprised, angry, and sad being present (each from 0.0 to 1.0, all together summing to 1.0).
    - `intensityProbabilities`: The probabilities that the shown emotion is of low, medium, or high intensity (each from 0.0 to 1.0, all together summing to 1.0).
    - `affectLevels`: An estimation of the valence and arousal levels of the detected face (each from -1.0 to +1.0).
  - `attention`: A set of predictions about whether and to what the person is paying attention:
    - `lookingAtCam`: 1 if the person is looking at the camera, 0 otherwise.
  - `headPose`: A set of angles (in degrees) describing the head's offset from looking frontally into the camera:
    - `yaw`: The offset with regard to looking left or right.
    - `pitch`: The offset with regard to looking up or down.
    - `roll`: The offset with regard to tilting the head.
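As a sketch of how a client might consume the response, the helper below pulls out the most probable emotion of the largest face in the first image. `top_emotion` is an illustrative name, not part of the API, and it assumes the response contains at least one image result with at least one detected face:

```python
def top_emotion(response: dict) -> tuple[str, float]:
    """Return the label and probability of the most probable emotion
    for the largest (first-listed) face of the first image result."""
    face = response["images"][0]["faces"][0]
    probs = face["emotionAnalysis"]["emotionProbabilities"]
    label = max(probs, key=probs.get)
    return label, probs[label]


# A trimmed-down response in the shape documented above.
example = {
    "images": [
        {
            "faces": [
                {
                    "emotionAnalysis": {
                        "emotionProbabilities": {
                            "neutral": 0.024,
                            "happy": 0.97,
                            "surprised": 0.006,
                            "angry": 0.0,
                            "sad": 0.0,
                        }
                    }
                }
            ]
        }
    ]
}
```

Applied to the example response, `top_emotion(example)` yields `("happy", 0.97)`.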