GLiNER2 Data Mention Extractor (v1-deval-synth-v2)

Fine-tuned GLiNER2 LoRA adapter for extracting structured data mentions from development economics and humanitarian research documents.

v4 key change: Trained with a patched GLiNER2 library (rafmacalaba/GLiNER2@feat/main-mirror) that feeds mean-pooled passage token embeddings into count_pred instead of the schema [P] token. This is the first adapter version where multi-mention recall is trained via a meaningful gradient.
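To make the pooling idea concrete, here is a minimal, library-free sketch of mean pooling over passage tokens. It is an illustration only, not the code in the patched branch: the function name `mean_pool`, the mask layout, and the toy embeddings are all hypothetical.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              passage_mask: Sequence[int]) -> List[float]:
    """Average the embeddings of the tokens whose mask entry is 1."""
    selected = [vec for vec, keep in zip(token_embeddings, passage_mask) if keep]
    if not selected:
        raise ValueError("passage_mask selects no tokens")
    dim = len(selected[0])
    return [sum(vec[i] for vec in selected) / len(selected) for i in range(dim)]


# Toy input (hypothetical): one schema token followed by three passage tokens,
# hidden size 2. Only the passage tokens contribute to the pooled vector.
embeddings = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [0, 1, 1, 1]
pooled = mean_pool(embeddings, mask)
print(pooled)
```

In this sketch the pooled vector depends on every passage token, which is the property the change above relies on: a count prediction computed from it can receive gradient from the passage content rather than from a single schema token.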

Task

Given a document passage, extracts structured information about each dataset mentioned:

  • Extractive field: mention_name (verbatim from text)
  • Classification fields (fixed choices; the schema under Usage additionally offers "na" for each):
    • specificity_tag: named / descriptive / vague
    • typology_tag: survey / census / database / administrative / indicator / geospatial / microdata / report / other
    • is_used: True / False
    • usage_context: primary / supporting / background
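The field list above can be checked mechanically after extraction. The following is a small sketch of such a check, assuming each mention arrives as a flat dict of strings. That shape, the helper name `invalid_fields`, and the `allow_na` flag are assumptions for illustration, not part of the GLiNER2 API.

```python
# Allowed labels, copied from the field list above.
ALLOWED = {
    "specificity_tag": {"named", "descriptive", "vague"},
    "typology_tag": {"survey", "census", "database", "administrative",
                     "indicator", "geospatial", "microdata", "report", "other"},
    "is_used": {"True", "False"},
    "usage_context": {"primary", "supporting", "background"},
}


def invalid_fields(mention: dict, allow_na: bool = True) -> list:
    """Return the names of classification fields holding an unexpected label."""
    bad = []
    for field, choices in ALLOWED.items():
        value = mention.get(field)
        if allow_na and value == "na":
            continue  # the Usage schema also offers "na" for every field
        if value not in choices:
            bad.append(field)
    return bad


# Hand-written example mention (not real model output).
example = {
    "mention_name": "2018 national labor force survey",
    "specificity_tag": "named",
    "typology_tag": "survey",
    "is_used": "True",
    "usage_context": "primary",
}
print(invalid_fields(example))
```

A missing field is reported as invalid, since `mention.get` returns `None`, which is never among the allowed labels.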

Training

  • Base model: fastino/gliner2-large-v1
  • Method: LoRA (r=16, alpha=32.0)
  • Target modules: ['encoder', 'span_rep', 'classifier', 'count_embed', 'count_pred']
  • Training examples: 8791
  • Validation examples: 651
  • Best validation loss: 439.4476
  • GLiNER2 branch: rafmacalaba/GLiNER2@feat/main-mirror
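For readers unfamiliar with the LoRA hyperparameters, the sketch below shows how r and alpha interact under the standard LoRA formulation, W' = W + (alpha / r) · B · A. Whether this adapter's training code applies exactly this scaling is an assumption. The helper names and the toy rank-1 matrices are hypothetical.

```python
def lora_scale(r: int, alpha: float) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def matmul(X, Y):
    """Plain-Python matrix product for small toy matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_update(W, A, B, r, alpha):
    """Return W + (alpha / r) * (B @ A)."""
    scale = lora_scale(r, alpha)
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# With the values listed above (r=16, alpha=32.0) the update is scaled by 2.
scale = lora_scale(16, 32.0)

# Toy rank-1 example with the same alpha/r ratio (r=1, alpha=2.0).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]       # shape (2, 1)
A = [[0.5, 0.25]]        # shape (1, 2)
W_adapted = lora_update(W, A, B, r=1, alpha=2.0)
print(scale, W_adapted)
```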

Usage

# Install the patched library first:
# pip install git+https://github.com/rafmacalaba/GLiNER2.git@feat/main-mirror

from gliner2 import GLiNER2

extractor = GLiNER2.from_pretrained("fastino/gliner2-large-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-large-v1-deval-synth-v2")

schema = (
    extractor.create_schema()
    .structure("data_mention")
        .field("mention_name",    dtype="str")
        .field("specificity_tag", dtype="str", choices=["named", "descriptive", "vague", "na"])
        .field("typology_tag",    dtype="str", choices=["survey", "census", "administrative",
                                                         "database", "indicator", "geospatial",
                                                         "microdata", "report", "other", "na"])
        .field("is_used",         dtype="str", choices=["True", "False", "na"])
        .field("usage_context",   dtype="str", choices=["primary", "supporting", "background", "na"])
)

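# Illustrative passage; replace with your own document text
text = "We use household data from the 2018 national labor force survey to estimate employment rates."

result = extractor.extract(text, schema)
print(result)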
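Once extraction has run, a common next step is to keep only the mentions the document actually uses. The sketch below assumes the result is a dict keyed by the structure name ("data_mention") holding a list of flat dicts of strings. That shape is an assumption, and field values may in practice be wrapped (for example with confidence scores), so inspect the real output before relying on it. The sample result is hand-written, not model output.

```python
def used_mentions(result: dict) -> list:
    """Return the mention_name of every mention labelled is_used == "True"."""
    mentions = result.get("data_mention", [])
    return [m.get("mention_name") for m in mentions if m.get("is_used") == "True"]


# Hand-written stand-in for `result` (assumed shape, not real model output).
sample_result = {
    "data_mention": [
        {"mention_name": "2018 national labor force survey",
         "specificity_tag": "named", "typology_tag": "survey",
         "is_used": "True", "usage_context": "primary"},
        {"mention_name": "administrative records",
         "specificity_tag": "vague", "typology_tag": "administrative",
         "is_used": "False", "usage_context": "background"},
    ]
}
names = used_mentions(sample_result)
print(names)
```

The comparison is against the string "True" because the schema declares is_used with string choices rather than booleans.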