Today’s News Snapshot
🤖 Google Jules: New AI Programming Assistant
✨ Google DeepMind Gemini Diffusion: New Paradigm for Text Generation
🌐 Microsoft Build 2025: Towards an Open Agentic Web
🧠 NVIDIA Cosmos-Reason1: New Breakthrough in AI Physical Reasoning
01 🤖 Google Jules: New AI Programming Assistant

At the I/O developer conference, Google announced that Jules, its AI programming assistant built on the Gemini 2.5 Pro model, has entered global public beta. Jules aims to significantly boost development efficiency by automating tedious coding tasks such as fixing bugs, writing unit tests, updating dependencies, and implementing new features. It operates in a unique “asynchronous” manner: it can handle multiple tasks simultaneously and, upon completion, provides detailed overviews of its code changes along with its reasoning. It is deeply integrated with GitHub and runs in an isolated cloud environment to protect code privacy.
Highlights:
- Asynchronous Processing: Jules can clone user codebases in Google Cloud virtual machines and process multiple coding tasks in parallel. Developers can submit tasks and immediately proceed with other work without waiting, significantly improving development efficiency, especially in large and complex projects.
- Transparent Control: Before executing tasks, Jules generates a detailed plan and reasoning for developers to preview and adjust. Upon completion, it provides a code diff view and optional audio change logs, ensuring developers have full control and understanding of the AI’s modifications.
- Deep Integration: Jules seamlessly integrates with GitHub, running within existing workflows without additional configuration. It can automatically generate multi-step development plans and submit code changes as GitHub Pull Requests (PRs), even automatically adjusting code style.
- Privacy Protection: Google commits that Jules operates in an isolated cloud environment, and users’ private code will not be used for model training, ensuring code confidentiality and security. This is crucial for enterprises handling sensitive or proprietary code.
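The “fire-and-forget” workflow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Jules’s actual API: `run_coding_task` is a hypothetical stand-in for an agent working in an isolated cloud VM, and the point is simply that tasks run in parallel while the developer moves on.

```python
import asyncio

async def run_coding_task(name: str, seconds: float) -> str:
    # Stand-in for an agent working in an isolated cloud VM:
    # clone the repo, apply changes, and return a diff summary.
    await asyncio.sleep(seconds)  # simulate long-running work
    return f"{name}: plan reviewed, diff ready for PR"

async def main() -> None:
    # Submit several tasks at once; the developer is free to do
    # other work while the agent processes them in parallel.
    tasks = [
        asyncio.create_task(run_coding_task("fix-bug-142", 0.2)),
        asyncio.create_task(run_coding_task("add-unit-tests", 0.1)),
        asyncio.create_task(run_coding_task("bump-deps", 0.15)),
    ]
    for result in await asyncio.gather(*tasks):
        print(result)

asyncio.run(main())
```

The three simulated tasks complete in roughly the time of the longest one rather than the sum, which is the efficiency argument the asynchronous design makes.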
Value Insights: For Professionals: Jules signifies AI programming’s shift from “code completion” to “full-process automation,” allowing developers to focus more on innovation and complex problem-solving. For enterprises, its privacy protection and efficiency gains make it an ideal choice. It currently supports Python and JavaScript, with future expansion to more languages. For the General Public: The AI Agent trend embodied by Jules will accelerate the innovation and iteration of software and applications, meaning we will experience higher-quality, AI-driven products faster. In the long term, it may lower the barrier to software development, enabling more non-professionals to build applications using natural language.
Recommended Reading: https://www.zdnet.com/article/google-releases-its-asynchronous-jules-ai-agent-for-coding-how-to-try-it-for-free/
02 ✨ Google DeepMind Gemini Diffusion: New Paradigm for Text Generation

Google DeepMind has launched an experimental language model, Gemini Diffusion, which applies the “diffusion” technique, well established in image generation, to text, replacing traditional word-by-word prediction. The system starts from random noise and, through multiple iterations, sculpts it into complete text segments, akin to “sculpting rather than writing.” Because the whole sequence is refined at once, the method allows mid-course corrections and finer control over the output, producing more coherent and logical text. It excels in tasks like code generation and text editing and achieves a generation speed of 1479 tokens per second.
Highlights:
- Diffusion Generation: This is among the first large-scale applications of diffusion models to text generation. It starts from random noise and, through multiple denoising iterations, transforms the noise into coherent text, enabling “sculptural” content creation that better maintains overall context and logic.
- Extreme Speed: The model can generate complete text segments in one go, with an average sampling speed of up to 1479 tokens per second, and up to 2000 tokens per second in programming tasks. Initial latency is as low as 0.84 seconds, hailed as “software innovation skipping several generations of hardware upgrades.”
- Global Coherence: The diffusion mechanism allows for mid-course corrections and consideration of global context during generation, significantly improving the logical coherence and consistency of long texts. This is expected to address issues like “hallucinations” and content drift often seen in traditional models with long texts.
- Programming Advantage: In programming benchmarks like HumanEval and MBPP, its performance is comparable to Gemini 2.0 Flash Lite, with a slight advantage in specific programming tasks, demonstrating immense potential in code generation.
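The denoising idea above can be made concrete with a toy sketch. This is purely illustrative and assumes a known target string: real text-diffusion models denoise in a learned embedding space with a trained model, not by copying characters. What the sketch does show is the key structural difference from left-to-right decoding: every position in the sequence can be revised at every iteration.

```python
import random

def toy_denoise(target: str, steps: int = 8, seed: int = 0) -> list[str]:
    # Toy illustration only: a real diffusion model predicts refinements
    # with a neural network; here we fake it by copying from the target.
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    # Start from pure noise: a random string the same length as the target.
    text = [rng.choice(alphabet) for _ in target]
    trajectory = []
    for step in range(steps):
        # Each iteration revises a growing fraction of ALL positions at
        # once -- the whole sequence is refined in parallel, not left to
        # right, which is what permits mid-course corrections.
        for i in range(len(text)):
            if rng.random() < (step + 1) / steps:
                text[i] = target[i]
        trajectory.append("".join(text))
    return trajectory

for snapshot in toy_denoise("diffusion writes all at once"):
    print(snapshot)
```

Printing the trajectory shows gibberish gradually resolving into the full sentence over eight iterations, mirroring the noise-to-text process the model performs.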
Value Insights: For Professionals: Gemini Diffusion represents a paradigm shift in text generation technology. AI-generated text will be more logical, coherent, and controllable, especially beneficial for code writing, technical reports, and script creation. Its high speed will greatly improve efficiency and offer new avenues for solving the AI hallucination problem. For the General Public: It portends a more fluid and natural interaction with AI in the future, with higher quality and more reliable AI-assisted writing, intelligent customer service, and other experiences. It may foster more intelligent text editing tools or AI assistants that “sculpt” content based on user intent, lowering the barrier to complex text creation.
Recommended Reading: https://the-decoder.com/gemini-diffusion-could-be-googles-most-important-i-o-news-that-slipped-under-the-radar/
03 🌐 Microsoft Build 2025: Towards an Open Agentic Web

At the Microsoft Build 2025 developer conference, CEO Satya Nadella announced a series of significant enhancements to the AI Agent ecosystem, emphasizing autonomy, interoperability, and openness. GitHub Copilot has been upgraded from “pair programming” to more autonomous “peer programming,” capable of independently handling complex development tasks. Microsoft introduced new open protocols to foster cross-platform AI Agent collaboration and integrated multi-agent orchestration capabilities into Copilot Studio. Simultaneously, the open-source project NLWeb was released, aiming to build an “Open Agentic Web” that allows users to interact deeply with any website using natural language.
Highlights:
- Copilot Upgrade: GitHub Copilot has evolved into an autonomous “peer programmer,” capable of independently executing complex tasks like bug fixes, new feature development, and code refactoring, and autonomously submitting pull requests. Developers primarily take on task assignment and review roles.
- Open Protocols: New open protocols have been introduced, aiming to provide a unified communication and collaboration standard for AI Agents across different platforms and organizations, laying the groundwork for a decentralized, highly collaborative AI Agent ecosystem.
- Multi-Agent Orchestration: Copilot Studio now features multi-agent orchestration capabilities, enabling multiple AI Agents to delegate tasks to each other and collaborate to complete complex, cross-functional workflows, such as automated employee onboarding processes.
- NLWeb Release: The open-source project NLWeb was launched to simplify AI interface and website integration, allowing developers to add chat interfaces to websites with minimal code, enabling deep interaction with website content via natural language. Nadella likened it to “the HTML of the Agentic Web.”
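The orchestration pattern described above, in which specialized agents each handle part of a cross-functional workflow, can be sketched minimally. These agent functions and the onboarding scenario are hypothetical stand-ins for illustration, not Copilot Studio or NLWeb APIs.

```python
from typing import Callable

# Hypothetical agents for an employee-onboarding workflow; each is a
# stand-in for a specialized AI Agent with its own tools and scope.
def hr_agent(task: str) -> str:
    return f"HR: created employee record for {task}"

def it_agent(task: str) -> str:
    return f"IT: provisioned laptop and accounts for {task}"

def facilities_agent(task: str) -> str:
    return f"Facilities: assigned desk for {task}"

def orchestrate(task: str, agents: list[Callable[[str], str]]) -> list[str]:
    # A minimal orchestrator: route one task through a pipeline of
    # specialized agents and collect each agent's report.
    return [agent(task) for agent in agents]

for report in orchestrate("new hire Ada", [hr_agent, it_agent, facilities_agent]):
    print(report)
```

A real orchestrator would also let agents delegate to each other and share a common protocol for passing context, which is exactly the gap the open protocols announced at Build aim to standardize.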
Value Insights: For Professionals: Copilot’s autonomy will liberate developers to focus on innovation. Open protocols and multi-agent orchestration will make enterprise-level applications more modular and intelligent, reducing integration costs. NLWeb provides web developers with a convenient way to integrate AI capabilities, expanding AI application scenarios. For the General Public: The “Open Agentic Web” will bring about a more intelligent, personalized, and seamless digital experience driven by AI Agents, such as automatic schedule management and information filtering. NLWeb may make websites “talk,” allowing users to interact via natural language, greatly enhancing the convenience and accessibility of digital life.
Recommended Reading: https://mlq.ai/news/microsoft-unveils-major-new-ai-agent-tools-and-open-protocols-at-build-2025/
04 🧠 NVIDIA Cosmos-Reason1: New Breakthrough in AI Physical Reasoning

NVIDIA has released Cosmos-Reason1, a suite of multimodal large language models (available in 7B and 56B versions) designed to significantly enhance AI’s physical common sense and embodied reasoning capabilities. Its goal is to bridge the gap between abstract AI reasoning and real-world applications, enabling AI systems to better perceive, understand, and act in dynamic physical environments. Cosmos-Reason1 employs a unique dual ontology system, a decoder-only LLM architecture combined with a visual encoder, and is trained on a massive dataset of annotated video-text pairs. It is optimized through supervised fine-tuning (SFT) and reinforcement learning (RL), excelling particularly in predicting physical consequences and evaluating action feasibility.
Highlights:
- Physical Common Sense: The core objective is to imbue AI with “physical common sense,” enabling it to understand and predict the behavior of objects and the outcomes of actions in the physical world, so that AI operates more reliably in real-world scenarios.
- Dual Ontology: It uses a unique dual ontology system: one for physical common sense subdivided into 16 subcategories like space and time; and another for reasoning capabilities mapped to five types of embodied agents (humans, robots, etc.), guiding training and providing evaluation benchmarks.
- Multimodal Fusion: Based on a decoder-only LLM and enhanced with a visual encoder, it can process and integrate both visual (video) and text data simultaneously, improving AI’s perception and reasoning abilities in the physical world.
- Reinforcement Learning: Trained in two phases: physical AI supervised fine-tuning (SFT) and physical AI reinforcement learning (RL). The RL phase utilizes rule-based and verifiable rewards derived from human annotations and video self-supervised tasks, enhancing causal reasoning and decision-making capabilities.
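The idea of “rule-based and verifiable rewards” in the RL phase can be sketched as a reward function that grades a model’s answer against a programmatically checkable ground truth. This is an assumption-labeled sketch using a multiple-choice check; the actual reward rules used for Cosmos-Reason1 are not specified in this summary.

```python
def verifiable_reward(model_answer: str, correct_choice: str) -> float:
    # Rule-based, verifiable reward: score 1.0 only when the model's
    # chosen option exactly matches the annotated ground truth.
    # (Illustrative only; the real reward rules are more elaborate.)
    if model_answer.strip().upper() == correct_choice.strip().upper():
        return 1.0
    return 0.0

# Example: grading answers to a physical-consequence multiple-choice
# question whose annotated correct option is "B".
rewards = [verifiable_reward(a, "B") for a in ["b", "B", "C"]]
print(rewards)  # [1.0, 1.0, 0.0]
```

Because the reward is computed by a rule rather than a learned judge, it cannot be gamed by plausible-sounding but wrong answers, which is what makes this style of reward attractive for training causal and physical reasoning.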
Value Insights: For Professionals: This is a crucial step for AI moving from the “digital world” to the “physical world.” For fields like robotics, autonomous driving, and industrial automation, it means AI systems will more robustly understand and cope with the complexities and uncertainties of the real world, accelerating the commercialization and practical application of cutting-edge technologies. For the General Public: As these models advance, we will see more intelligent robots and autonomous vehicles that can understand physical laws, predict consequences, and adapt to changing environments. This means safer autonomous driving, more efficient smart factory robots, and more intelligent home service robots, improving quality of life and safety.
Recommended Reading: https://ubos.tech/news/nvidias-Cosmos-Reason1-revolutionizing-ai-with-physical-reasoning/
Today’s Summary
The launch of Google Jules: New AI Programming Assistant signifies that AI programming tools are moving from assistance to full-process automation. Jules, based on the Gemini 2.5 Pro model, enhances developer efficiency through asynchronous, transparent, and deeply integrated GitHub workflows, while promising code privacy. This heralds a wave of autonomous AI Agents in software development.
Google DeepMind Gemini Diffusion: New Paradigm for Text Generation brings a revolution in text generation technology. This experimental model is among the first to apply the “diffusion” technique from image generation to text, achieving extremely fast, globally coherent generation and particularly excelling at code. It is expected to address the coherence and “hallucination” issues that traditional language models exhibit in long texts, opening new avenues for AI content creation.
Microsoft Build 2025: Towards an Open Agentic Web conference paints a future vision of AI Agent interoperability and openness. GitHub Copilot has been upgraded to an autonomous “peer programmer,” Microsoft has introduced open protocols and multi-agent orchestration capabilities, and released the NLWeb project, aiming to build an “Open Agentic Web” driven by AI Agents. This will profoundly change software development and how users interact with the digital world.
NVIDIA Cosmos-Reason1: New Breakthrough in AI Physical Reasoning fills a gap in AI’s physical common sense and embodied reasoning. This multimodal model suite, through its dual ontology system, fusion of vision and text, and reinforcement learning training, enables it to better understand and predict the physical world.