GLiNER2 Data Mention Extractor (v1-deval-synth-v2)
Fine-tuned GLiNER2 LoRA adapter for extracting structured data mentions from development economics and humanitarian research documents.
Key change in v4: this adapter was trained with a patched GLiNER2 library (rafmacalaba/GLiNER2@feat/main-mirror) that feeds mean-pooled passage token embeddings into count_pred instead of the schema [P] token. This is the first adapter version in which multi-mention recall is trained with a meaningful gradient.
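To make the change concrete, here is a minimal pure-Python sketch of the idea: a linear count head that reads either a masked mean of passage token embeddings or a single token vector. Everything in it (`mean_pool`, `predict_count`, the toy weights, and treating count_pred as a plain linear classifier over count classes) is a hypothetical illustration, not the patched library's code.

```python
# Hypothetical sketch: the two kinds of input a mention-count head could read.
# mean_pool, predict_count and the toy weights are illustrative, not GLiNER2 internals.

def mean_pool(token_embeddings, mask):
    """Average only the passage tokens (mask == 1); skip padding and other tokens."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    n = 0
    for vec, keep in zip(token_embeddings, mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            n += 1
    if n == 0:
        raise ValueError("mask selects no tokens")
    return [t / n for t in total]


def predict_count(features, weights, bias):
    """Linear count classifier: index of the highest logit over count classes 0..K."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy passage: 4 token embeddings of size 2; the last one is padding.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
p_token = [0.0, 0.0]  # stand-in for a single schema-token embedding

# Toy head with 3 count classes (0, 1 or 2 mentions).
weights = [[-1.0, -1.0], [0.5, 0.5], [1.0, 1.0]]
bias = [0.5, 0.0, -0.5]

pooled = mean_pool(tokens, mask)             # roughly [0.667, 0.667]
print(predict_count(pooled, weights, bias))  # 2
print(predict_count(p_token, weights, bias)) # 0
```

With these toy numbers the pooled passage features select the 2-mention class and the single stand-in vector selects 0; in the real model both inputs are learned representations, so this only shows where the count head's input comes from.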
Task
Given a document passage, the model extracts structured information about each dataset mentioned:
- Extractive field:
  - mention_name (verbatim from text)
- Classification fields (fixed choices):
  - specificity_tag: named / descriptive / vague
  - typology_tag: survey / census / database / administrative / indicator / geospatial / microdata / report / other
  - is_used: True / False
  - usage_context: primary / supporting / background
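A small hypothetical helper can check that an extracted record uses only the labels listed above; `ALLOWED` and `validation_errors` are illustrative names, not part of GLiNER2. The usage schema also offers an "na" choice for each classification field; add "na" to the sets if you want to accept it.

```python
# Hypothetical validator for one extracted data_mention record.
ALLOWED = {
    "specificity_tag": {"named", "descriptive", "vague"},
    "typology_tag": {"survey", "census", "database", "administrative", "indicator",
                     "geospatial", "microdata", "report", "other"},
    "is_used": {"True", "False"},
    "usage_context": {"primary", "supporting", "background"},
}


def validation_errors(record):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    name = record.get("mention_name")
    if not isinstance(name, str) or not name.strip():
        errors.append("mention_name must be a non-empty string")
    for field, allowed in ALLOWED.items():
        value = record.get(field)
        if not isinstance(value, str) or value not in allowed:
            errors.append(f"{field}={value!r} not in {sorted(allowed)}")
    return errors


good = {"mention_name": "Demographic and Health Survey", "specificity_tag": "named",
        "typology_tag": "survey", "is_used": "True", "usage_context": "primary"}
print(validation_errors(good))  # []
print(len(validation_errors(dict(good, typology_tag="dataset"))))  # 1
```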
Training
- Base model: fastino/gliner2-large-v1
- Method: LoRA (r=16, alpha=32.0)
- Target modules: ['encoder', 'span_rep', 'classifier', 'count_embed', 'count_pred']
- Training examples: 8791
- Val examples: 651
- Best val loss: 439.4476
- GLiNER2 branch: rafmacalaba/GLiNER2@feat/main-mirror
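The card gives r and alpha but not the scaling convention. The sketch below assumes the common LoRA formulation, where a frozen weight W is augmented by a low-rank update scaled by alpha / r (2.0 here) and B is initialized to zero. The 1024 x 1024 layer size and all helper names are hypothetical.

```python
# Minimal LoRA sketch for r=16, alpha=32.0, assuming the common alpha / r scaling.
R = 16
ALPHA = 32.0
SCALING = ALPHA / R  # 2.0 under the assumed convention


def lora_param_count(d_in, d_out, r=R):
    """Extra trainable weights for one d_out x d_in linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, scaling=SCALING):
    """y = W x + scaling * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base, delta)]


# Hypothetical 1024 x 1024 projection: LoRA trains 32,768 weights instead of 1,048,576.
print(lora_param_count(1024, 1024))  # 32768

# Toy 2-d layer with a rank-16 adapter; B starts at zero, so output equals the base layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.1, 0.1] for _ in range(R)]
B_zero = [[0.0] * R for _ in range(2)]
x = [3.0, 4.0]
print(lora_forward(W, A, B_zero, x))  # [3.0, 4.0]
```

Because B starts at zero, training begins from the base model's behavior and the adapter only gradually learns a rank-16 correction.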
Usage
# Install the patched library first:
# pip install git+https://github.com/rafmacalaba/GLiNER2.git@feat/main-mirror
from gliner2 import GLiNER2

# Load the base model, then attach this LoRA adapter
extractor = GLiNER2.from_pretrained("fastino/gliner2-large-v1")
extractor.load_adapter("rafmacalaba/gliner2-datause-large-v1-deval-synth-v2")

# One "data_mention" structure: an extractive field plus four classification fields
schema = (
    extractor.create_schema()
    .structure("data_mention")
    .field("mention_name", dtype="str")
    .field("specificity_tag", dtype="str", choices=["named", "descriptive", "vague", "na"])
    .field("typology_tag", dtype="str", choices=["survey", "census", "administrative",
                                                 "database", "indicator", "geospatial",
                                                 "microdata", "report", "other", "na"])
    .field("is_used", dtype="str", choices=["True", "False", "na"])
    .field("usage_context", dtype="str", choices=["primary", "supporting", "background", "na"])
)

# Replace with your own document passage (this one is illustrative)
text = "We use the 2018 Demographic and Health Survey to estimate child stunting rates."
result = extractor.extract(text, schema)
print(result)
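The card does not document the shape of `result`. The sketch below assumes a dict that maps the structure name to a list of per-mention field dicts (check this against your own output) and keeps the names of mentions tagged as used. `used_mentions` and `sample_result` are hypothetical, and the sample values are not real model output.

```python
# Hypothetical post-processing; assumes result looks like
# {"data_mention": [{"mention_name": ..., "is_used": ..., ...}, ...]}.
def used_mentions(result, structure="data_mention"):
    """Names of mentions whose is_used field is the string "True", deduplicated in order."""
    names = []
    for mention in result.get(structure, []):
        name = mention.get("mention_name")
        if name and mention.get("is_used") == "True" and name not in names:
            names.append(name)
    return names


sample_result = {  # illustrative values only
    "data_mention": [
        {"mention_name": "Living Standards Measurement Study", "specificity_tag": "named",
         "typology_tag": "survey", "is_used": "True", "usage_context": "primary"},
        {"mention_name": "administrative records", "specificity_tag": "descriptive",
         "typology_tag": "administrative", "is_used": "False", "usage_context": "background"},
        {"mention_name": "Living Standards Measurement Study", "specificity_tag": "named",
         "typology_tag": "survey", "is_used": "True", "usage_context": "supporting"},
    ]
}
print(used_mentions(sample_result))  # ['Living Standards Measurement Study']
```

Mentions tagged "na" for is_used are dropped too, since only the exact string "True" passes the filter.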