For now, option 1 seems like the better choice:
Choose 1 first, but not as a thin analytics add-on.
The best path is:
1. Add intelligence and operational depth now.
2. Then build a selective “video-assisted entry” layer later.
Do not jump straight to full video as the main next move.
My actual ranking for your case is:
- Something else: turn the demo into an event decision system
- Option 1: add intelligence, event history, confidence, and operator workflow
- Option 2: move to video later, only for specific bottlenecks
That is the strongest path for three reasons: the market already has vendors doing face-based event check-in; the real differentiation is in workflow and trust; and video adds a large new layer of technical complexity. Wicket and InEvent already market facial event check-in and access control, while NVIDIA’s current multi-camera workflow shows that video systems quickly become detection + tracking + ID management + storage + analytics platforms, not just “more frames.” (Wicket)
The core judgment
Your demo has already solved the easiest part to explain:
- register a face
- detect a face
- match a face
- mark attendance
The next valuable step is not “increase input from image to video.”
It is to answer the real product questions:
- Who attended which events?
- When did they arrive?
- Which gate did they use?
- Was the match strong or weak?
- Did the system auto-approve or did a human review it?
- What happened when the face was low quality or consent was missing?
That matters because NIST evaluates face recognition as a thresholded tradeoff problem in both 1:1 verification and 1:N identification. In plain words, every real deployment must manage false accepts and false rejects, not just maximize demo accuracy. (NIST Pages)
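That thresholded tradeoff can be made concrete with a small sketch. This is not NIST's procedure, just an illustration of the idea: instead of one hard yes/no cutoff, use two thresholds so the uncertain middle goes to a human. The function name and threshold values are assumptions, not calibrated numbers.

```python
# Two-threshold match decision: illustrates why deployments manage
# false accepts and false rejects rather than maximizing demo accuracy.
# Threshold values are placeholders, not calibrated numbers.

def decide(similarity: float,
           accept_threshold: float = 0.60,
           review_threshold: float = 0.40) -> str:
    """Map a 1:N match similarity score to an operational decision.

    Raising accept_threshold lowers false accepts but raises false
    rejects; the review band catches the uncertain middle instead of
    forcing a silent wrong answer.
    """
    if similarity >= accept_threshold:
        return "auto_accept"
    if similarity >= review_threshold:
        return "manual_review"   # routed to an operator queue
    return "reject"

# A weak match is escalated to a human instead of silently failing.
print(decide(0.72))  # auto_accept
print(decide(0.51))  # manual_review
print(decide(0.20))  # reject
```

The point of the middle band is exactly the "was the match strong or weak" and "did a human review it" questions above: the score alone is not a decision.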
Why option 1 is better than option 2 right now
1. Option 1 creates product value faster
Adding event intelligence makes the system useful to organizers immediately.
Commercial event platforms already position facial check-in inside a broader operations workflow: access control, analytics, onboarding, and onsite execution. Wicket’s Dreamforce 2024 case study says Digital Pass was integrated into registration, opt-in only, and made check-in and badge printing 3x faster; InEvent positions facial recognition as part of check-in, access control, and performance analytics. (Wicket)
That tells you something important:
the buyer does not really want “face recognition.”
The buyer wants faster, safer, auditable event operations. That is where option 1 wins.
2. Option 2 is much harder than it looks
Moving from image to video is not a small upgrade. It changes the system class.
With image or guided kiosk capture, the flow is mostly:
detect → align → embed → compare → decide
With video, the flow becomes:
detect per frame → track across frames → select best frames → suppress duplicates → handle side views/occlusion → maintain persistent IDs → possibly re-identify across cameras → aggregate evidence over time
NVIDIA’s own multi-camera workflow lists object detection, feature embeddings, multi-camera tracking, global ID generation, storage, API outputs, and browser visualization. DeepStream’s tracker docs explicitly describe persistent IDs over time, re-identification features, and target re-association. ByteTrack’s README explains why this is hard: low-score detections often contain true objects under occlusion, so trackers need special association logic to recover them. (NVIDIA)
So if you choose option 2 now, you are not adding one feature.
You are starting a second system.
3. Public issues already show where video breaks
This is where GitHub issues are more useful than polished demos.
In one InsightFace issue, a developer trying to count unique people in video says there are really 4 people, but the system emits around 10 IDs because side-angled faces become hard to recognize and new IDs get created. In another issue, a developer trying real-time streaming reports that the setup is extremely slow and logs show CPUExecutionProvider even though onnxruntime-gpu was installed. (GitHub)
That means your likely next pain points in video are not abstract:
- side-angle identity fragmentation
- duplicate identities for the same person
- missed detections
- runtime/provider misconfiguration
- latency that destroys the user experience
Those are real engineering costs.
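The identity-fragmentation failure from that InsightFace issue is easy to reproduce in miniature. The toy sketch below (scores and threshold are made up) shows how a naive per-frame matcher mints a new ID every time a side-angle frame scores below threshold, so one real person becomes several IDs:

```python
# Toy illustration of identity fragmentation in naive video matching:
# when a frame's similarity to the person's existing track drops below
# threshold (side angle, occlusion), a new ID is created.

def assign_ids(similarities, threshold=0.5):
    """similarities: per-frame similarity of each detection to the
    person's current track. Returns the ID assigned to each frame."""
    ids = []
    current = None
    next_id = 0
    for s in similarities:
        if current is not None and s >= threshold:
            ids.append(current)          # track continues
        else:
            current = next_id            # low score -> new identity minted
            next_id += 1
            ids.append(current)
    return ids

# One person walks past the camera; the dips are side-angle frames.
frames = [0.9, 0.8, 0.3, 0.85, 0.2, 0.9]
print(assign_ids(frames))   # [0, 0, 1, 1, 2, 2] -> one person, three IDs
```

This is why real trackers need association logic across frames (as ByteTrack's README describes) rather than per-frame matching: the fix is a tracking problem, not a recognition-accuracy problem.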
Why “something else” should be your actual next move
The best move is not “just analytics.”
It is to build the layer that turns recognition into a trustworthy event system.
I would define that layer as:
Event intelligence + trust + operations
That means building these objects into the product:
- Person
- Enrollment
- Consent
- Event
- Gate / checkpoint
- Sighting
- Attendance decision
- Confidence score
- Manual review status
- Audit log
That gives you a much stronger product than “camera saw face.”
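One way to sketch those objects is as plain dataclasses. The names and fields below are illustrative, not a prescribed schema; the point is that a sighting and the decision made about it are separate records:

```python
# Illustrative domain objects: a raw Sighting is what the camera saw;
# an AttendanceDecision is what the system (or an operator) decided,
# with an audit trail attached.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Sighting:
    person_id: str
    event_id: str
    gate_id: str
    seen_at: datetime
    similarity: float              # raw match score from the recognizer

@dataclass
class AttendanceDecision:
    sighting: Sighting
    decision: str                  # "auto_accept" | "manual_review" | "reject"
    reviewed_by: Optional[str] = None
    audit_log: List[str] = field(default_factory=list)

s = Sighting("p-001", "ev-42", "north-gate", datetime(2025, 6, 1, 9, 2), 0.81)
d = AttendanceDecision(s, "auto_accept")
d.audit_log.append("auto-accepted at threshold 0.60")
```

Separating the sighting from the decision is what makes "what happened and why" answerable later, which is the whole auditability story.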
What this unlocks
It lets you answer:
- which users attended which events
- first seen time
- last seen time
- late arrivals
- no-shows
- duplicate entry attempts
- zone-level or session-level access
- uncertain matches needing review
- camera-specific failure patterns
- event-level match quality and review rate
This is where your system stops being a demo and starts becoming an operational product. That is also where you can differentiate from basic face-attendance clones, because the category already has vendors doing check-in, but fewer teams build strong confidence handling, review flow, and auditability. This is an inference from how current event vendors frame their products and from NIST’s thresholded evaluation model. (Wicket)
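Most of that list falls out of one sighting table once it exists. A minimal sketch with stdlib SQLite (table and column names are illustrative):

```python
# First-seen / last-seen per attendee from a raw sighting log.
# Late arrivals, no-shows, and duplicate entries are variations on
# the same query shape.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sightings (
    person_id TEXT, event_id TEXT, gate_id TEXT, seen_at TEXT)""")
conn.executemany(
    "INSERT INTO sightings VALUES (?, ?, ?, ?)",
    [
        ("alice", "ev1", "north", "2025-06-01T09:02"),
        ("alice", "ev1", "north", "2025-06-01T17:40"),
        ("bob",   "ev1", "south", "2025-06-01T11:15"),
    ],
)

for row in conn.execute("""
    SELECT person_id, MIN(seen_at) AS first_seen, MAX(seen_at) AS last_seen
    FROM sightings
    WHERE event_id = 'ev1'
    GROUP BY person_id
    ORDER BY person_id"""):
    print(row)
```

No-shows are the registered set minus this result; duplicate entry attempts are multiple sightings at entry gates within a short window. None of this requires video.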
The hidden reason not to rush into video: trust and compliance
There are two trust problems here.
1. Biometric processing is regulated and sensitive
ICO guidance says that when using biometric recognition systems, you must identify both a lawful basis and a separate condition for processing special-category biometric data. It says that in many cases explicit consent is likely to be the most appropriate condition, and that if you rely on consent you must offer a suitable alternative and allow refusal or withdrawal without detriment. (ICO)
That means the better next feature is not passive video.
It is:
- explicit opt-in
- fallback flow
- deletion / retention rules
- auditability
- transparency
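A consent gate that satisfies the "refusal or withdrawal without detriment" point can be sketched like this (field and function names are assumptions; this is not legal advice, just the control-flow shape):

```python
# Explicit opt-in check before any recognition attempt. Withdrawal
# does not block entry; it routes the attendee to the fallback flow.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ConsentRecord:
    person_id: str
    opted_in_at: Optional[datetime] = None
    withdrawn_at: Optional[datetime] = None

    def active(self) -> bool:
        return self.opted_in_at is not None and self.withdrawn_at is None

def entry_method(consent: ConsentRecord) -> str:
    """Recognition only runs on an active opt-in; everyone else gets a
    suitable alternative (QR, badge, manual desk) with no detriment."""
    return "face_recognition" if consent.active() else "fallback_qr_or_desk"
```

The key property is that `entry_method` never returns "denied": lack of consent changes the path, not the outcome.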
2. Some target markets are weak on necessity and proportionality
The ICO ordered Serco Leisure and associated trusts to stop using facial recognition and fingerprint scanning for employee attendance, saying more than 2,000 employees at 38 facilities had their biometric data processed unlawfully. (ICO)
So if your future thought is “maybe this should become employee attendance,” that is a warning sign.
For your case, controlled event access is stronger than generic workforce attendance.
Another practical issue: your current model path may be demo-only
InsightFace’s repo and PyPI page both say the code is MIT, but the provided pretrained models are for non-commercial research purposes only. (GitHub)
That matters because if you are serious about product direction, your next move should include:
- deciding whether the current stack stays demo-only
- licensing a commercial path
- or replacing the recognition component with a commercially usable stack
There is no point scaling product complexity on top of a model path that may not support commercialization.
So what should you build next, exactly?
Here is the path I would take.
Phase 1. Turn the demo into a real event system
Build:
- event history
- per-event attendance records
- first-seen / last-seen
- confidence score per decision
- uncertain-match queue
- admin review console
- explicit opt-in record
- fallback method like QR, badge, PIN, or manual desk review
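The uncertain-match queue from the list above is small to sketch. One reasonable design (names are illustrative) orders pending sightings most-uncertain first, so operators spend attention where the system is weakest:

```python
# Operator review queue: weakest matches surface first.
# heapq is a min-heap, so lower similarity pops first; the counter
# breaks ties in arrival order.
import heapq

class ReviewQueue:
    def __init__(self):
        self._heap = []      # (similarity, arrival_counter, sighting)
        self._n = 0

    def push(self, similarity: float, sighting) -> None:
        heapq.heappush(self._heap, (similarity, self._n, sighting))
        self._n += 1

    def next_for_review(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = ReviewQueue()
q.push(0.52, "sighting-A")
q.push(0.41, "sighting-B")   # weaker match, reviewed first
print(q.next_for_review())   # sighting-B
```

Whether you order by uncertainty, by arrival time, or by gate congestion is a product decision; the queue itself is the cheap part.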
This is your highest-return work. It increases product value, reduces risk, and gives you the data you need to justify later video work. It also aligns with the way real event deployments are framed today. (Wicket)
Phase 2. Add quality and anti-spoofing controls
NIST’s PAD material defines presentation attacks as attempts to interfere with biometric policy using artefacts or human characteristics, often for impersonation or evasion. (NIST Pages)
So before scaling input volume, add:
- enrollment quality checks
- best-photo guidance
- liveness / anti-spoofing
- threshold calibration by environment
- camera health monitoring
This improves trust more than jumping to video.
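Threshold calibration by environment can be as simple as setting each gate's threshold from impostor (known non-match) scores collected there, so the false accept rate stays under a target. A sketch under that assumption, with synthetic scores:

```python
# Per-environment threshold from an impostor score sample.
# With accept-if-score >= threshold, FAR is the fraction of impostor
# scores at or above the threshold. Small samples make this optimistic;
# real calibration needs enough impostor pairs per environment.

def calibrate_threshold(impostor_scores, target_far=0.01):
    """Return the threshold whose false accept rate over this impostor
    sample does not exceed target_far."""
    scores = sorted(impostor_scores)
    cutoff = int(len(scores) * (1 - target_far))
    cutoff = min(cutoff, len(scores) - 1)
    return scores[cutoff]

# A dim gate produces higher impostor scores, so it needs a higher
# threshold than a well-lit kiosk to hit the same FAR target.
dim_gate = [i / 100 for i in range(100)]            # synthetic
print(calibrate_threshold(dim_gate, target_far=0.01))  # 0.99
```

The takeaway is that one global threshold across gates means different effective error rates at each gate, which is exactly the "camera-specific failure patterns" problem from earlier.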
Phase 3. Add selective video only where it clearly pays off
After the system is trustworthy, add video-assisted entry at one controlled gate.
Use video for:
- smoother walk-up experience
- best-frame selection over 1–3 seconds
- fewer retries
- better throughput at peak entry windows
Do not start with venue-wide passive surveillance.
Start with one choke point where users currently pause too much or where image capture misses too many people.
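Best-frame selection over that 1–3 second window is the core of video-assisted entry, and its simplest form is just scoring frames and recognizing only the winner. The frame records and scoring formula below are illustrative; a real scorer would also weigh sharpness and pose:

```python
# Pick the single best frame from a short gate window instead of
# matching every frame. Combining detector confidence with face size
# is a crude but common first-pass quality score.

def best_frame(frames):
    """frames: dicts with 'det_conf' (detector confidence, 0-1) and
    'face_px' (face height in pixels). Returns the frame to recognize."""
    return max(frames, key=lambda f: f["det_conf"] * f["face_px"])

window = [
    {"t": 0.0, "det_conf": 0.62, "face_px": 80},   # still far away
    {"t": 0.8, "det_conf": 0.91, "face_px": 140},  # frontal, close
    {"t": 1.4, "det_conf": 0.55, "face_px": 150},  # motion blur
]
print(best_frame(window)["t"])   # 0.8
```

Note what this buys you: one recognition call per walk-up instead of per frame, which sidesteps most of the duplicate-ID and latency problems from the video section above.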
When option 2 should move earlier
There are cases where video deserves to move up.
Move to video sooner only if most of these are true:
- users will not stop and face a kiosk
- people arrive in bursts and queues matter a lot
- multiple people appear at once
- you need passive capture at a gate
- you have several entrances
- you need to track a person for a short period before deciding
- missed check-ins are caused by motion and pose, not by missing product features
If that is your environment, then the right move is not “full video everywhere.”
It is video-assisted gate recognition.
What I would avoid
I would avoid these as your next main direction:
1. Full passive multi-camera tracking everywhere
Too much complexity too early. NVIDIA’s own materials show this becomes a full analytics and tracking platform. (NVIDIA)
2. Generic employee attendance
Too much regulatory exposure for a first market. (ICO)
3. “Better recognition accuracy” as the strategy
Important, but not enough. The category already exists. Product trust and workflow matter more.
My final recommendation
If I compress everything into one sentence:
Do not choose 2 yet. Choose 1, but expand it into “event intelligence + trust + fallback + review,” then add selective video later.
So the answer is:
- 1 beats 2 right now
- But the best real move is “something else” built around 1
- 2 becomes valuable only after your event system is operationally strong
The simple decision rule
If your current system is best described as:
“It recognizes faces and marks attendance.”
Then your next step should be:
“It makes reliable, auditable event decisions.”
Only after that should it become:
“It does that from short video at selected gates.”
That is the cleanest path from demo to product.