
layoutlm-funsd

This model is a fine-tuned version of microsoft/layoutlm-base-uncased on the funsd dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0045
  • Answer: precision 0.7348, recall 0.8084, F1 0.7699 (support: 809)
  • Header: precision 0.4429, recall 0.5210, F1 0.4788 (support: 119)
  • Question: precision 0.8211, recall 0.8404, F1 0.8306 (support: 1065)
  • Overall Precision: 0.7599
  • Overall Recall: 0.8083
  • Overall F1: 0.7866
  • Overall Accuracy: 0.8106
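
The per-entity results above are entity-level precision/recall/F1 scores of the kind produced by seqeval (the support value is the number of gold entities). As a minimal sketch of how such scores are computed from aligned BIO label sequences (the toy data below is illustrative, not FUNSD):

    from seqeval.metrics import classification_report, f1_score

    # one list of BIO tags per document; references and predictions aligned token by token
    y_true = [["B-QUESTION", "I-QUESTION", "O", "B-ANSWER", "I-ANSWER"]]
    y_pred = [["B-QUESTION", "I-QUESTION", "O", "B-ANSWER", "O"]]

    print(classification_report(y_true, y_pred))  # per-entity precision / recall / F1
    print(f1_score(y_true, y_pred))               # overall (micro-averaged) F1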

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 15
  • mixed_precision_training: Native AMP
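
As a rough sketch, these settings map onto a standard transformers Trainer configuration like the one below; the model, datasets, and metrics function are placeholders rather than part of this card, and the batch size is assumed to be per device.

    from transformers import TrainingArguments, Trainer

    training_args = TrainingArguments(
        output_dir="layoutlm-funsd",
        learning_rate=3e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=8,
        seed=42,
        lr_scheduler_type="linear",
        num_train_epochs=15,
        fp16=True,  # Native AMP mixed-precision training
        # the Adam betas=(0.9, 0.999) and epsilon=1e-08 listed above are the Trainer defaults
    )

    trainer = Trainer(
        model=model,                      # placeholder: a LayoutLMForTokenClassification instance
        args=training_args,
        train_dataset=train_dataset,      # placeholder: processed FUNSD train split
        eval_dataset=eval_dataset,        # placeholder: processed FUNSD test split
        compute_metrics=compute_metrics,  # placeholder: seqeval-based metrics function
    )
    trainer.train()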

Deploy Model with Inference Endpoints

Before we can get started, make sure you meet all of the following requirements:

  • An Organization/User with an active plan and WRITE access to the model repository.
  • Access to the UI: https://ui.endpoints.huggingface.co

    1. Deploy LayoutLM and send requests

    In this tutorial, you will learn how to deploy a LayoutLM model to Hugging Face Inference Endpoints and how to integrate it into your products via an API.

    This tutorial does not cover how to create the custom handler for inference. If you want to learn how to create a custom handler for Inference Endpoints, you can either check out the documentation or go through “Custom Inference with Hugging Face Inference Endpoints”.

    We are going to deploy philschmid/layoutlm-funsd, which implements the following handler.py:

    from typing import Dict, List, Any
    from transformers import LayoutLMForTokenClassification, LayoutLMv2Processor
    import torch
    from subprocess import run
    
    # install tesseract-ocr and pytesseract
    run("apt install -y tesseract-ocr", shell=True, check=True)
    run("pip install pytesseract", shell=True, check=True)
    
    # helper function to unnormalize bboxes for drawing onto the image
    def unnormalize_box(bbox, width, height):
        return [
            width * (bbox[0] / 1000),
            height * (bbox[1] / 1000),
            width * (bbox[2] / 1000),
            height * (bbox[3] / 1000),
        ]
    
    # set device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    class EndpointHandler:
        def __init__(self, path=""):
            # load model and processor from path
            self.model = LayoutLMForTokenClassification.from_pretrained(path).to(device)
            self.processor = LayoutLMv2Processor.from_pretrained(path)
    
        def __call__(self, data: Dict[str, bytes]) -> Dict[str, List[Any]]:
            """
            Args:
                data (:obj:):
                    includes the deserialized image file as PIL.Image
            """
            # process input
            image = data.pop("inputs", data)
    
            # process image
            encoding = self.processor(image, return_tensors="pt")
    
            # run prediction
            with torch.inference_mode():
                outputs = self.model(
                    input_ids=encoding.input_ids.to(device),
                    bbox=encoding.bbox.to(device),
                    attention_mask=encoding.attention_mask.to(device),
                    token_type_ids=encoding.token_type_ids.to(device),
                )
                predictions = outputs.logits.softmax(-1)
    
            # post process output
            result = []
            for item, inp_ids, bbox in zip(
                predictions.squeeze(0).cpu(), encoding.input_ids.squeeze(0).cpu(), encoding.bbox.squeeze(0).cpu()
            ):
                label = self.model.config.id2label[int(item.argmax().cpu())]
                if label == "O":
                    continue
                score = item.max().item()
                text = self.processor.tokenizer.decode(inp_ids)
                bbox = unnormalize_box(bbox.tolist(), image.width, image.height)
                result.append({"label": label, "score": score, "text": text, "bbox": bbox})
            return {"predictions": result}
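
    Before creating the endpoint, you can test the handler locally. A minimal sketch, assuming the model repository is cloned and Python is run from inside it (the image file name is illustrative):

    from PIL import Image
    from handler import EndpointHandler

    # load model and processor from the local repository clone
    my_handler = EndpointHandler(path=".")

    # run a sample document image through the handler
    image = Image.open("sample_form.png")
    pred = my_handler({"inputs": image})
    print(pred["predictions"][:3])

    Note that importing handler.py executes the apt/pip install commands at the top of the file, so run this in an environment where that is acceptable.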
    

    2. Send HTTP request using Python

    Hugging Face Inference Endpoints can work directly with binary data, which means we can send the image of our document straight to the endpoint. We are going to use requests to send the request (make sure you have it installed: pip install requests).

    import requests as r
    import mimetypes

    ENDPOINT_URL = ""  # url of your endpoint
    HF_TOKEN = ""  # token of the organization where you deployed your endpoint

    def predict(path_to_image: str):
        with open(path_to_image, "rb") as i:
            b = i.read()
        headers = {
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": mimetypes.guess_type(path_to_image)[0],
        }
        response = r.post(ENDPOINT_URL, headers=headers, data=b)
        return response.json()
    
    prediction = predict(path_to_image="path_to_your_image.png")
    
    print(prediction)
    # {'predictions': [{'label': 'I-ANSWER', 'score': 0.4823932945728302, 'text': '[CLS]', 'bbox': [0.0, 0.0, 0.0, 0.0]}, {'label': 'B-HEADER', 'score': 0.992474377155304, 'text': 'your', 'bbox': [1712.529, 181.203, 1859.949, 228.88799999999998]},
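
    If the endpoint is still initializing or the URL/token is wrong, response.json() may fail or return an error payload instead of predictions. A sketch of the same function with an explicit status check (it reuses the imports and constants defined above; the function name is illustrative):

    def predict_checked(path_to_image: str):
        with open(path_to_image, "rb") as i:
            b = i.read()
        headers = {
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": mimetypes.guess_type(path_to_image)[0],
        }
        response = r.post(ENDPOINT_URL, headers=headers, data=b)
        if response.status_code != 200:
            raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")
        return response.json()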
    

    3. Draw result on image

    To get a better understanding of what the model predicted, you can also draw the predictions onto the provided image.

    from PIL import Image, ImageDraw, ImageFont
    
    # draw results on image
    def draw_result(path_to_image, result):
        image = Image.open(path_to_image)
        label2color = {
            "B-HEADER": "blue",
            "B-QUESTION": "red",
            "B-ANSWER": "green",
            "I-HEADER": "blue",
            "I-QUESTION": "red",
            "I-ANSWER": "green",
        }

        # draw predictions over the image
        draw = ImageDraw.Draw(image)
        font = ImageFont.load_default()
        for res in result:
            # outline the token bbox in the color of its label and write the label above it
            draw.rectangle(res["bbox"], outline=label2color[res["label"]])
            draw.text((res["bbox"][0] + 10, res["bbox"][1] - 10), text=res["label"], fill=label2color[res["label"]], font=font)
        return image
    
    draw_result("path_to_your_image.png", prediction["predictions"])