
AI News

What are Optical Character Recognition (OCR) Models? Top Open-Source OCR Models

Optical Character Recognition (OCR) is the process of turning images that contain text—such as scanned pages, receipts, or photographs—into machine-readable text. What began as brittle rule-based systems has evolved into a rich ecosystem of neural architectures and vision-language models capable of reading complex, multilingual, and handwritten documents. How OCR Works: Every OCR system tackles three…
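As a quick, hedged illustration of OCR in practice, the sketch below runs the open-source Tesseract engine through pytesseract; the image path and language code are placeholders.

```python
# Minimal OCR sketch using pytesseract (a Python wrapper around the open-source
# Tesseract engine); the image path and language code are placeholders.
from PIL import Image
import pytesseract

image = Image.open("scanned_receipt.png")               # scanned page, receipt, or photo
text = pytesseract.image_to_string(image, lang="eng")   # image in, machine-readable text out
print(text)
```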


AI and the Brain: How DINOv3 Models Reveal Insights into Human Visual Processing

Understanding how the brain builds internal representations of the visual world is one of the most fascinating challenges in neuroscience. Over the past decade, deep learning has reshaped computer vision, producing neural networks that not only perform at human-level accuracy on recognition tasks but also seem to process information in ways that resemble our…


Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing

In the domain of multimodal AI, instruction-based image editing models are transforming how users interact with visual content. Just released in August 2025 by Alibaba’s Qwen Team, Qwen-Image-Edit builds on the 20B-parameter Qwen-Image foundation to deliver advanced editing capabilities. This model excels in semantic editing (e.g., style transfer and novel view synthesis) and appearance editing…
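As a rough, hedged sketch of how such an instruction-based editor is typically driven, the snippet below assumes the model is published as a Hugging Face diffusers pipeline under the repo id Qwen/Qwen-Image-Edit; the call signature shown is an assumption, not confirmed by this excerpt.

```python
# Hypothetical instruction-based editing sketch via diffusers; the repo id and
# the image/prompt call signature below are assumptions, not a confirmed API.
import torch
from diffusers import DiffusionPipeline
from PIL import Image

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image-Edit",
                                          torch_dtype=torch.bfloat16).to("cuda")

source = Image.open("product_photo.png").convert("RGB")
edited = pipe(image=source,
              prompt="keep the product, replace the background with a beach at sunset").images[0]
edited.save("edited.png")
```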


VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning

Multimodal reasoning, where models integrate and interpret information from multiple sources such as text, images, and diagrams, is a frontier challenge in AI. VL-Cogito is a state-of-the-art Multimodal Large Language Model (MLLM) proposed by DAMO Academy (Alibaba Group) and partners. It introduces a robust reinforcement learning pipeline that fundamentally upgrades the reasoning skills of large models…
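The excerpt does not spell out the pipeline, but the core idea of a progressive curriculum (drawing training samples from progressively harder difficulty buckets as optimization proceeds) can be sketched generically; everything below is a hypothetical illustration, not VL-Cogito's actual code.

```python
# Generic progressive-curriculum schedule (hypothetical illustration, not
# VL-Cogito's actual pipeline): harder buckets are unlocked as training advances.
import random

def curriculum_batches(samples_by_difficulty, total_steps, batch_size=8):
    """samples_by_difficulty: list of sample lists, ordered easy -> hard."""
    stages = len(samples_by_difficulty)
    for step in range(total_steps):
        # unlock one additional difficulty bucket per equal slice of training
        unlocked = min(stages, 1 + step * stages // total_steps)
        pool = [s for bucket in samples_by_difficulty[:unlocked] for s in bucket]
        yield random.sample(pool, min(batch_size, len(pool)))

# each yielded batch would feed one RL update step (e.g., a PPO/GRPO-style trainer)
```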


NASA Releases Galileo: The Open-Source Multimodal Model Advancing Earth Observation and Remote Sensing

Galileo is an open-source, highly multimodal foundation model developed to process, analyze, and understand diverse Earth observation (EO) data streams—including optical, radar, elevation, climate, and auxiliary maps—at scale. Galileo was developed with support from researchers at McGill University, NASA Harvest, Ai2, Carleton University, the University of British Columbia, the Vector Institute, and Arizona State University…


VLM2Vec-V2: A Unified Computer Vision Framework for Multimodal Embedding Learning Across Images, Videos, and Visual Documents

Embedding models act as bridges between different data modalities by encoding diverse multimodal information into a shared dense representation space. These models have advanced considerably in recent years, driven by progress in large foundation models. However, existing multimodal embedding models are trained on datasets such as MMEB and M-BEIR, with most focus only…
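For a concrete (if simplified) picture of a shared dense representation space, the sketch below embeds an image and two captions with CLIP from Hugging Face transformers as a stand-in model; VLM2Vec-V2's own loading details are not covered in this excerpt.

```python
# Shared image-text embedding space sketch using CLIP as a stand-in model
# (VLM2Vec-V2's loading details are not given in the excerpt above).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("chart.png")
texts = ["a bar chart of quarterly revenue", "a photo of a cat"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# cosine similarity between modalities in the shared space
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
print(img_emb @ txt_emb.T)
```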


How Radial Attention Cuts Costs in Video Diffusion by 4.4× Without Sacrificing Quality

Introduction to Video Diffusion Models and Computational Challenges: Diffusion models have made impressive progress in generating high-quality, coherent videos, building on their success in image synthesis. However, handling the extra temporal dimension in videos significantly increases computational demands, especially since self-attention scales poorly with sequence length. This makes it difficult to train or run these…
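To make the scaling issue concrete: with frames flattened into one spatio-temporal token sequence of length n, full self-attention forms an n × n score matrix, so cost grows quadratically with the number of frames. A back-of-the-envelope sketch (generic, not Radial Attention's actual sparsity pattern):

```python
# Back-of-the-envelope cost of full self-attention over video tokens
# (generic illustration; not Radial Attention's actual mechanism).
def attention_pairs(frames, tokens_per_frame=1024):
    n = frames * tokens_per_frame   # flattened spatio-temporal sequence length
    return n * n                    # pairwise score matrix is O(n^2)

for frames in (16, 32, 64):
    print(f"{frames:>3} frames -> {attention_pairs(frames):.2e} attention pairs")
# doubling the frame count roughly quadruples attention cost, which is why
# sub-quadratic attention variants matter for video diffusion
```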


ByteDance Researchers Introduce VGR: A Novel Reasoning Multimodal Large Language Model (MLLM) with Enhanced Fine-Grained Visual Perception Capabilities

Why Multimodal Reasoning Matters for Vision-Language Tasks: Multimodal reasoning enables models to make informed decisions and answer questions by combining both visual and textual information. This type of reasoning plays a central role in interpreting charts, answering image-based questions, and understanding complex visual documents. The goal is to make machines capable of using vision as…


Multimodal LLMs Without Compromise: Researchers from UCLA, UW–Madison, and Adobe Introduce X-Fusion to Add Vision to Frozen Language Models Without Losing Language Capabilities

LLMs have made significant strides in language-related tasks such as conversational AI, reasoning, and code generation. However, human communication extends beyond text, often incorporating visual elements to enhance understanding. To create a truly versatile AI, models need the ability to process and generate text and visual information simultaneously. Training such unified vision-language models from scratch…
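A generic sketch of the "frozen language model plus trainable vision pathway" recipe the excerpt alludes to (module names and the fusion strategy below are hypothetical, not X-Fusion's actual architecture):

```python
# Generic frozen-LLM-plus-vision sketch (hypothetical; not X-Fusion's design):
# the language model stays frozen, only the vision projector trains.
import torch
import torch.nn as nn

class VisionAugmentedLM(nn.Module):
    def __init__(self, llm: nn.Module, vision_encoder: nn.Module,
                 vision_dim: int, llm_dim: int):
        super().__init__()
        self.llm = llm
        self.vision_encoder = vision_encoder
        self.projector = nn.Linear(vision_dim, llm_dim)  # trainable bridge
        for p in self.llm.parameters():                  # freeze text abilities
            p.requires_grad = False

    def forward(self, pixel_values, text_embeds):
        vision_tokens = self.projector(self.vision_encoder(pixel_values))
        fused = torch.cat([vision_tokens, text_embeds], dim=1)  # prepend vision
        return self.llm(inputs_embeds=fused)  # assumes an HF-style embeds input
```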
