IP Composer

community
Activity Feed

AI & ML interests

None defined yet.

multimodalart
posted an update 5 months ago
Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert any entire HF repo (Model, Dataset or Space) into a text file and feed it to a language model!

multimodalart/repo2txt
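The idea of flattening a repo into one text file takes only a few lines of Python. This is not the code behind multimodalart/repo2txt. It is a minimal sketch that assumes the repo has already been downloaded to a local folder (for example with `huggingface_hub.snapshot_download`). The suffix list and size cap (`TEXT_SUFFIXES`, `max_bytes`) are hypothetical choices.

```python
from pathlib import Path

# Hypothetical set of suffixes treated as text; adjust for the repo at hand.
TEXT_SUFFIXES = {".py", ".md", ".txt", ".json", ".yaml", ".yml", ".toml", ".cfg"}


def repo_to_text(repo_dir: str, max_bytes: int = 200_000) -> str:
    """Concatenate the text files under repo_dir into one LLM-friendly string.

    Each file is preceded by a header line with its relative path. Files with
    non-text suffixes, files larger than max_bytes (likely weights or data),
    and hidden files or directories are skipped.
    """
    root = Path(repo_dir)
    parts = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root)
        # Skip hidden files and directories such as .git or .gitattributes.
        if any(part.startswith(".") for part in rel.parts):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES or path.stat().st_size > max_bytes:
            continue
        try:
            content = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary content behind a text-looking suffix
        parts.append(f"===== {rel.as_posix()} =====\n{content.rstrip()}\n")
    return "\n".join(parts)
```

For example, `repo_to_text(snapshot_download(repo_id="multimodalart/self-forcing", repo_type="space"))` would return a single string you can paste into an LLM prompt. Skipping large and binary files keeps model weights and datasets out of the context window.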
multimodalart
posted an update 9 months ago
Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it!

I've built a live, real-time demo on Spaces 📹💨

multimodalart/self-forcing
linoyts
posted an update 10 months ago
FramePack is hands down one of the best open-source releases in video generation 🙇🏻‍♀️🤯
✅ fully open-sourced + amazing quality + reduced memory + improved speed
but even more, it's gonna facilitate *soooo* many downstream applications
like this version adapted for landscape rotation 👇
https://huggingface.co/spaces/tori29umai/FramePack_rotate_landscape
linoyts
posted an update 11 months ago
linoyts
updated a Space 11 months ago
linoyts
published a Space 11 months ago
linoyts
in IP-composer/ip-composer 12 months ago

initial ui changes
#2 opened 12 months ago by linoyts

Create app.py
#1 opened 12 months ago by linoyts
multimodalart
posted an update over 1 year ago
multimodalart
posted an update almost 2 years ago
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B-parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multilingual CLIP + multilingual T5 text encoders for English 🤝 Chinese understanding

Try it out yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)

In the paper, they claim it is SOTA among open-source models based on human preference evaluation!
multimodalart
posted an update about 2 years ago
The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
๐Ÿ“ 2 base model variants mentioned: 2B and 8B sizes

๐Ÿ“ New architecture in all abstraction levels:
- ๐Ÿ”ฝ UNet; โฌ†๏ธ Multimodal Diffusion Transformer, bye cross attention ๐Ÿ‘‹
- ๐Ÿ†• Rectified flows for the diffusion process
- ๐Ÿงฉ Still a Latent Diffusion Model
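For reference on the rectified-flow bullet: as we read the SD3 paper, the forward process is a straight-line interpolation between the clean latent $x_0$ and Gaussian noise $\epsilon$. The network $v_\Theta$ is trained to regress the constant velocity along that line. The sketch below uses notation that approximately follows the paper and omits its timestep-sampling and loss-weighting schemes:

```latex
z_t = (1 - t)\,x_0 + t\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \quad t \in [0, 1]

\frac{\mathrm{d} z_t}{\mathrm{d} t} = \epsilon - x_0

\mathcal{L}(\Theta) = \mathbb{E}_{t,\, x_0,\, \epsilon} \left\| v_\Theta(z_t, t) - (\epsilon - x_0) \right\|_2^2
```

At sampling time, the learned velocity field is integrated from $t = 1$ (pure noise) back to $t = 0$, for example with a handful of Euler steps. The straight paths are what make few-step sampling attractive.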

📄 3 text encoders: 2 CLIPs and one T5-XXL; plug-and-play: removing the largest one (T5-XXL) maintains competitiveness

๐Ÿ—ƒ๏ธ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
๐Ÿ” A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
โœ๏ธ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (though details on the number of participants and the experimental design are missing)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf