How to use ab-ai/pii_model with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="ab-ai/pii_model")

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("ab-ai/pii_model")
model = AutoModelForTokenClassification.from_pretrained("ab-ai/pii_model")
```
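To get entity-level spans rather than per-token labels, the pipeline can be created with `aggregation_strategy="simple"`. The following is a minimal sketch, assuming the pipeline returns dicts with `entity_group`, `score`, `start`, and `end` keys (the usual aggregated output of a Transformers token-classification pipeline). The names `find_pii`, `load_pipeline`, and `min_score` are hypothetical helpers introduced for illustration.

```python
from typing import Callable, Dict, List


def load_pipeline():
    """Build the aggregated pipeline (downloads the model from the Hub on first use)."""
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model="ab-ai/pii_model",
        aggregation_strategy="simple",  # merge sub-word tokens into entity spans
    )


def find_pii(text: str, pipe: Callable[[str], List[Dict]], min_score: float = 0.5) -> List[Dict]:
    """Run the pipeline on text and keep spans scoring at least min_score."""
    spans = []
    for ent in pipe(text):
        score = float(ent["score"])
        if score < min_score:
            continue  # drop low-confidence predictions
        spans.append({
            "entity_group": ent["entity_group"],
            "text": text[ent["start"]:ent["end"]],
            "start": ent["start"],
            "end": ent["end"],
            "score": score,
        })
    return spans
```

Typical usage would be `find_pii("some text", load_pipeline())`. The 0.5 threshold is an arbitrary default and may need tuning for a given use case.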
---
license: apache-2.0
base_model: bert-base-cased
tags:
- PII
- NER
- Bert
- Token Classification
datasets:
- generator
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: pii_model
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: generator
      type: generator
      config: default
      split: train
      args: default
    metrics:
    - name: Precision
      type: precision
      value: 0.954751
    - name: Recall
      type: recall
      value: 0.965233
    - name: F1
      type: f1
      value: 0.959964
    - name: Accuracy
      type: accuracy
      value: 0.991199
pipeline_tag: token-classification
language:
- en
---
# Personally Identifiable Information (PII) Model

This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the generator dataset.
It achieves the following results:
- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96%
- Accuracy: 99.11%
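
F1 is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. A short sketch using the precision and recall listed above; `f1_score` is a helper name chosen for this example and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (result is in the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in this card, in percent.
f1 = f1_score(95.53, 96.60)
print(f"F1 = {f1:.2f}%")
```

The result rounds to the 96% stated in the list.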
## Model description

This is a token classification model that detects personally identifiable information (PII) entities in English text. It is built on the BERT architecture, starting from bert-base-cased and fine-tuned on a custom dataset. The model labels each token it encounters, which lets it flag names, addresses, dates of birth, and other sensitive details so that they can be shielded from exposure.
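
One common way to act on detected entities is to mask them in the source text. The sketch below illustrates that idea and is not part of the model itself. It assumes entity dicts with `entity_group`, `start`, and `end` character offsets, as produced by a Transformers token-classification pipeline with an aggregation strategy, and it assumes the spans do not overlap. The `redact` helper and the `[LABEL]` placeholder format are hypothetical.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [ENTITY_GROUP] placeholder."""
    # Work right to left so earlier character offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hand-written entities standing in for real model output.
sample = "Call Jane at 555-0100."
entities = [
    {"entity_group": "FIRSTNAME", "start": 5, "end": 9},
    {"entity_group": "PHONENUMBER", "start": 13, "end": 21},
]
print(redact(sample, entities))  # Call [FIRSTNAME] at [PHONENUMBER].
```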
## Entity groups the model can detect

- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX
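
Token classification models commonly predict per-token tags in a scheme such as BIO (`B-` begins an entity, `I-` continues it, `O` marks tokens outside any entity), which the pipeline then merges into entity groups like those listed above. The sketch below builds such a label map from the list. The BIO scheme and the label ordering are assumptions made for illustration; the authoritative mapping is the `id2label` entry in the model's configuration.

```python
ENTITY_GROUPS = [
    "ACCOUNTNUMBER", "FIRSTNAME", "ACCOUNTNAME", "PHONENUMBER", "CREDITCARDCVV",
    "CREDITCARDISSUER", "PREFIX", "LASTNAME", "AMOUNT", "DATE",
    "DOB", "COMPANYNAME", "BUILDINGNUMBER", "STREET", "SECONDARYADDRESS",
    "STATE", "CITY", "CREDITCARDNUMBER", "SSN", "URL",
    "USERNAME", "PASSWORD", "COUNTY", "PIN", "MIDDLENAME",
    "IBAN", "GENDER", "AGE", "ZIPCODE", "SEX",
]


def build_bio_labels(groups):
    """Return the label list: "O" first, then a B-/I- pair per entity group."""
    labels = ["O"]
    for group in groups:
        labels += [f"B-{group}", f"I-{group}"]
    return labels


# Assumed ordering; the real model may number its labels differently.
labels = build_bio_labels(ENTITY_GROUPS)
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}
print(len(ENTITY_GROUPS), "entity groups ->", len(labels), "BIO labels")
```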
### Training hyperparameters

The following hyperparameters were used during training:

| Hyperparameter            | Value |
|---------------------------|-------|
| Learning Rate             | 5e-5  |
| Train Batch Size          | 16    |
| Eval Batch Size           | 16    |
| Number of Training Epochs | 7     |
| Weight Decay              | 0.01  |
| Save Strategy             | Epoch |
| Load Best Model at End    | True  |
| Metric for Best Model     | F1    |
| Push to Hub               | True  |
| Evaluation Strategy       | Epoch |
| Early Stopping Patience   | 3     |
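
Expressed as code, the table corresponds roughly to the following `TrainingArguments` keyword arguments. This is a sketch reconstructed from the table, not the author's training script. The parameter names are those of Transformers 4.38 (newer releases rename `evaluation_strategy` to `eval_strategy`), the batch sizes are assumed to be per device, and `output_dir` is a placeholder.

```python
# Keyword arguments mirroring the hyperparameter table (names as in Transformers 4.38).
TRAINING_KWARGS = {
    "output_dir": "pii_model",  # assumption: placeholder directory name
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: table value is per device
    "per_device_eval_batch_size": 16,   # assumption: table value is per device
    "num_train_epochs": 7,
    "weight_decay": 0.01,
    "save_strategy": "epoch",
    "evaluation_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1",
    "push_to_hub": True,
}
EARLY_STOPPING_PATIENCE = 3


def build_training_setup():
    """Create TrainingArguments and the early-stopping callback (needs transformers and torch)."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(**TRAINING_KWARGS)
    callbacks = [EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)]
    return args, callbacks
```

The returned `args` and `callbacks` would then be passed to a `Trainer` together with the model, datasets, and a metrics function.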
### Training results

| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|-------|---------------|-----------------|---------------|------------|--------------|--------------|
| 1     | 0.0443        | 0.038108        | 91.88         | 95.17      | 93.50        | 98.80        |
| 2     | 0.0318        | 0.035728        | 94.13         | 96.15      | 95.13        | 98.90        |
| 3     | 0.0209        | 0.032016        | 94.81         | 96.42      | 95.61        | 99.01        |
| 4     | 0.0154        | 0.040221        | 93.87         | 95.80      | 94.82        | 98.88        |
| 5     | 0.0084        | 0.048183        | 94.21         | 96.06      | 95.13        | 98.93        |
| 6     | 0.0037        | 0.052281        | 94.49         | 96.60      | 95.53        | 99.07        |
### Author

Abhijeet Santosh Lokhande

abhijeetlokhande1996@gmail.com | https://www.linkedin.com/in/ablds/ | https://github.com/abhijeetscode
### Framework versions

- Transformers 4.38.2
- PyTorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2