Zhaoye Fei

ngc7293

https://ngc7292.github.io/

AI & ML interests

NLP & Ro.

Recent Activity

authored a paper about 18 hours ago

MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

Paper • 2601.01554 • Published 4 days ago • 47

upvoted a paper 1 day ago

MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

Paper • 2601.01554 • Published 4 days ago • 47

updated a collection 1 day ago

MOSS Transcribe Diarize

Collection

A unified multimodal large language model for end-to-end speaker-attributed, time-stamped transcription. • 2 items • Updated 1 day ago • 1

submitted a paper to Daily Papers 1 day ago

MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

Paper • 2601.01554 • Published 4 days ago • 47

upvoted 2 papers 9 days ago

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Paper • 2512.22234 • Published 16 days ago • 19

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published 10 days ago • 64

liked a Space 9 days ago

MOSS Transcribe Diarize

🏢

Transcribe audio/video files with speaker identification

upvoted a paper about 1 month ago

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

upvoted a paper about 2 months ago

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Paper • 2511.15605 • Published Nov 19, 2025 • 23

liked a model 2 months ago

OpenMOSS-Team/MOSS-TTSD-v0.7

Text-to-Speech • 2B • Updated Nov 11, 2025 • 983 • 15

upvoted 2 papers 2 months ago

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30, 2025 • 108

RoboOmni: Proactive Robot Manipulation in Omni-modal Context

Paper • 2510.23763 • Published Oct 27, 2025 • 53

liked 2 datasets 3 months ago

Sylvest/libero_plus_rlds

Updated Oct 17, 2025 • 420 • 5

Sylvest/LIBERO-plus

Updated Oct 17, 2025 • 443 • 15

upvoted 3 papers 3 months ago

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

Paper • 2510.13809 • Published Oct 15, 2025 • 37

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Paper • 2510.13626 • Published Oct 15, 2025 • 45

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Paper • 2510.00499 • Published Oct 1, 2025 • 19

liked a model 3 months ago

OpenMOSS-Team/MOSS-Speech

9B • Updated Sep 30, 2025 • 192 • 16

liked a Space 3 months ago

MOSS-Speech Demo

🚀

True Speech-to-Speech Language Model
