AI Weekly 012

🆕 What's New?

Product Update:

  • Nodes and Group nodes support collapsing.
  • Group nodes can be dragged and dropped.
  • Improved the wiring experience with enhanced auto-snap capabilities for connections to node endpoints.
  • Added startup modes for manually controlling floating-point precision and VAE precision. If you get blurry images or black output, try setting the floating-point or VAE precision to FP32.
  • Fixed some known bugs:
    • Fixed compatibility issues with Windows system notifications.
    • Fixed some Terminal features not working on Windows.
    • Fixed errors when importing special Workflows.
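Why can switching to FP32 help with black output? The sketch below is a rough illustration only, not Comflowyspace or ComfyUI code. It uses Python's standard `struct` module to round values to half precision (FP16); the helper name `to_fp16` and the sample numbers are made up for the demo. The idea is that FP16 tops out at 65504, so a large intermediate value overflows to infinity, later arithmetic turns it into NaN, and NaN pixels are commonly displayed as black.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite half-precision (FP16) value


def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value; overflow becomes +/-inf (hypothetical helper)."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # "e" is the struct format code for IEEE 754 half precision.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the FP16 range; emulate hardware overflow.
        return math.copysign(math.inf, x)


activation = 300.0                            # made-up intermediate value
squared = to_fp16(activation * activation)    # 90000 does not fit in FP16
difference = squared - squared                # inf - inf gives NaN

print("fp16(90000) =", squared)
print("inf - inf   =", difference)
print("fp16(1e-8)  =", to_fp16(1e-8))         # tiny values underflow to zero
print("fp16(0.1)   =", to_fp16(0.1))          # only about 3 decimal digits survive
```

The same values are all representable in FP32, which is why raising the precision is a reasonable first thing to try, at the cost of more memory and slower generation.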

Download link: Comflowyspace

This week's AI highlights

🏗️ Plugins worth trying

Marigold is a diffusion-based monocular depth estimation model. It combines the capabilities of modern generative image models with a dedicated fine-tuning protocol to infer the depth of a scene from a single image. The model is trained on synthetic data, yet it transfers zero-shot to diverse real scenes, giving strong depth estimates on kinds of images it never saw during training.

The Comfy plugin is a simple ComfyUI plugin for creating an image grid (also known as an X/Y plot) inside the ComfyUI environment. Its main draw is a user-friendly interface with extra setting options, so you can arrange and display images more flexibly and efficiently.
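For readers new to X/Y plots, here is a minimal sketch of the idea in plain Python. It is not the plugin's code: the function name `xy_plot_layout`, the axis choices (CFG scale and step count), and the 512-pixel cell size are all assumptions for illustration. Each combination of an X value and a Y value gets one cell, and the cell's pixel offset says where that image would be pasted in the final grid.

```python
from itertools import product


def xy_plot_layout(x_values, y_values, cell_w, cell_h):
    """Return one entry per (x, y) combination with its pixel offset in the grid."""
    cells = []
    # Rows follow the Y axis, columns follow the X axis.
    for (row, y), (col, x) in product(enumerate(y_values), enumerate(x_values)):
        cells.append({"x": x, "y": y, "left": col * cell_w, "top": row * cell_h})
    return cells


cfg_scales = [4.0, 7.0, 10.0]  # hypothetical X axis
step_counts = [20, 30]         # hypothetical Y axis

layout = xy_plot_layout(cfg_scales, step_counts, cell_w=512, cell_h=512)
for cell in layout:
    print(cell)
```

In a real workflow each cell would hold one generated image, which makes it easy to compare how two settings interact at a glance.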

ComfyUI-Catcat is an extension that transforms waiting times into fascinating spectacles, infusing joy into ComfyUI rendering with random cat GIFs, making every loading moment an unexpected delight.

📄 Noteworthy papers and techniques

MindEye2 is a brain-decoding model that reconstructs the images a person was viewing from functional Magnetic Resonance Imaging (fMRI) data, using as little as 1 hour of recordings. By relying on pre-trained models and data shared across subjects, MindEye2 not only turns brain activity into images but also improves reconstruction quality. Its code is publicly available on GitHub, opening new possibilities for neuroscience, AI, and medical imaging.

Glyph-ByT5 is a text encoder built by fine-tuning the ByT5 encoder on selected datasets to improve character recognition and glyph alignment, which makes rendered text markedly more accurate. Integrated with SDXL, it raises text rendering accuracy on design images from roughly 20% to about 90% and improves the automatic layout of long texts.

🛠️ Products you should try

SystemAnimatorOnline is a powerful AI tool for full-body and facial motion tracking. It can track movements through a webcam or video, enabling users to control virtual characters with their own movements, introducing a new way of interaction for virtual live streaming and video production. Moreover, it supports the recording and exporting of 3D avatars and motions, and even the loading of 3D actions and custom scenes. With compatibility for the VMC protocol and transparent backgrounds, it opens up broader possibilities for virtual content creation.

Stable Video 3D (SV3D), developed by Stability AI, is a 3D content creation model that turns a single flat image into a video, or a 3D model, viewable from multiple angles. SV3D comes in two variants: SV3D_u, which generates an orbital video around the subject from the image alone, and SV3D_p, which generates a 3D video along a specified camera path.

Pipio is an AI video dubbing tool that translates the dialogue in a video into other languages and dubs it automatically in a voice imitating the original speaker. It also syncs the dubbed audio with the speakers' lip movements, so the result feels natural to watch. This greatly simplifies the creation of multilingual video content. Whether you are a content creator or work at a multinational company, Pipio is worth trying if you need video localization.

Suno AI is a music creation platform powered by deep learning that lets users quickly create professional-sounding music from text prompts. It was developed by a team based in Cambridge, Massachusetts, which has also released two of its models, Bark TTS and Chirp, to the public as open source. If you are interested in AI-generated music, it is worth a try.
