In this week’s real-time analytics news: Streaming data lake platforms get some love with the graduation of Apache Paimon from an Apache Software Foundation incubation project to a Top-Level Project (TLP).
Keeping pace with news and developments in the real-time analytics market can be a daunting task. We want to help by providing a summary of some of the important real-time analytics and AI news items our staff came across this week. Here is our list:
The Apache Software Foundation (ASF) announced that Apache Paimon has graduated from incubation and is now a Top-Level Project (TLP). Paimon is a data lake format that enables real-time lakehouse architectures built with Apache Flink and Apache Spark for streaming and batch operations. Paimon combines a lake format with log-structured merge-trees (LSM) to bring real-time streaming updates into the data lake.
As a streaming data lake platform, Paimon allows users to process data in both batch and streaming modes. Feature highlights and benefits include:
- High-speed Data Processing: Paimon’s append table (no primary key) provides large-scale batch and streaming processing capability.
- Flexible Updates: Paimon gives users a choice of merge engines for updating records: deduplication (keep the last row), partial updates, aggregation, and first-row (keep the first row).
- Fast Real-time Analytics: By leveraging Flink streaming, Paimon’s primary-key table supports real-time streaming updates of large amounts of data, with newly ingested data queryable within about a minute.
- Simplified Changelog Production: Paimon simplifies streaming analytics by having its merge engines produce accurate and complete changelogs for downstream consumers.
- Low-latency Data Queries: Paimon supports data compaction with z-order sorting to optimize file layout, and uses indexes such as min/max to enable fast queries through data skipping.
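As a sketch of how these pieces fit together, the following Flink SQL registers a Paimon catalog and creates a primary-key table with a deduplicating merge engine. The warehouse path, table, and column names here are illustrative, not taken from the announcement:

```sql
-- Register a Paimon catalog (warehouse path is illustrative)
CREATE CATALOG paimon_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'file:/tmp/paimon'
);
USE CATALOG paimon_catalog;

-- Primary-key table: streaming upserts are merged by
-- keeping the last row per key ('deduplicate' merge engine)
CREATE TABLE orders (
    order_id BIGINT,
    status   STRING,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'merge-engine' = 'deduplicate',
    'changelog-producer' = 'lookup'
);
```

Streaming jobs can then `INSERT INTO orders` continuously while batch queries read a consistent snapshot of the same table, which is the batch-and-streaming duality described above.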
The Linux Foundation announced the launch of Margo, a new open standard initiative for interoperability at the edge of industrial automation ecosystems. Drawing its name from the Latin word for edge, Margo defines the mechanisms for interoperability between edge applications, edge devices, and edge orchestration software. The open standard promises to bring much-needed flexibility, simplicity, and scalability, removing barriers to innovation in complex, multi-vendor environments and accelerating digital transformation for organizations of all sizes.
Hosted by the Joint Development Foundation, a part of the Linux Foundation family, the initiative is supported by some of the largest automation ecosystem providers globally, including founding members ABB (including B&R), Capgemini, Microsoft, Rockwell Automation, Schneider Electric (including AVEVA), and Siemens.
MLCommons, through the MLCommons AI Safety working group, released the MLCommons AI Safety v0.5 benchmark proof-of-concept (POC). The POC focuses on measuring the safety of large language models (LLMs) by assessing the models’ responses to prompts across multiple hazard categories. MLCommons is now sharing the POC with the community for experimentation and feedback and will incorporate improvements based on that feedback into a comprehensive v1.0 release later this year.
Real-time analytics news in brief
Appian announced that it has signed a Strategic Collaboration Agreement (SCA) with Amazon Web Services (AWS) to make generative artificial intelligence (AI) more accessible to enterprise business processes. To that end, Appian will invest resources to find ways to combine Appian’s native AI capabilities and the Appian data fabric with the large language models (LLMs) provided by Amazon Bedrock and machine learning (ML) capabilities from Amazon SageMaker.
AWS announced that Meta Llama 3 is available on Amazon SageMaker JumpStart. Llama 2 has been available on Amazon SageMaker JumpStart and on Amazon Bedrock since last year. Llama 3 comes in two parameter sizes, 8B and 70B, each with an 8k context length, supporting a broad range of use cases with improvements in reasoning, code generation, and instruction following. Llama 3 uses a decoder-only transformer architecture and a new tokenizer with a 128k-token vocabulary that improves model performance. In addition, Meta improved post-training procedures that substantially reduced false refusal rates, improved alignment, and increased diversity in model responses. Users can now combine the performance advantages of Llama 3 with the MLOps controls of Amazon SageMaker features such as SageMaker Pipelines, SageMaker Debugger, and container logs.
BMC announced the signing of a definitive agreement to acquire Netreo. With Netreo, the BMC Helix platform will provide customers with a full-stack, open observability and AIOps solution. Specifically, the acquisition of Netreo will strengthen BMC’s offerings in observability and AIOps and provide customers with visibility into performance across their networks, infrastructure, and applications from a modern, open observability platform.
Cribl announced the launch of Cribl Lake, a data lake solution designed to give IT and security teams complete control and flexibility over their data. Provisioned directly from Cribl.Cloud, Cribl Lake lets organizations collect, analyze, and route a complete view of all IT and security data across the enterprise. Cribl Lake’s unified management layer allows organizations to leverage low-cost object storage, either Cribl-managed or customer-owned, automate provisioning, unify security and retention policy, and use open formats to eliminate vendor lock-in.
FICO announced major new enhancements to FICO Platform that improve and expand enterprise collaboration. The updates unlock new methods for organizations to break down silos and get more value out of their data and analytics investments. The open and extensible FICO Platform empowers customers to rapidly onboard a wider range of use cases and tap into an ecosystem of data sources and powerful analytics.
Hazelcast announced the latest release of its unified real-time data platform. The new features in Hazelcast Platform 5.4 ensure data consistency, resilience, and high performance. Specifically, Hazelcast Platform 5.4 directly addresses common challenges with several new features, including an advanced CP Subsystem for strong consistency that retains a performance advantage over other comparable systems, thread-per-core (TPC) architecture that extends Hazelcast Platform’s performance, and access to larger data volumes with Tiered Storage.
Hitachi Vantara, a subsidiary of Hitachi, Ltd., announced the availability of Hitachi Virtual Storage Platform One. The hybrid cloud platform aims to transform how organizations manage and leverage their data in today’s rapidly evolving technological landscape. Virtual Storage Platform One simplifies infrastructure for mission-critical applications, with a focus on data availability and strong data resiliency and reliability measures, including mitigation of risks such as downtime, productivity losses, and security threats.
IBM announced the availability of Meta Llama 3 — the next generation of Meta’s open large language model — on its watsonx AI and data platform. The addition of Llama 3 builds on IBM’s collaboration with Meta to advance open innovation for AI. The two companies launched the AI Alliance (a group of leading organizations across industry, startup, academia, research, and government) late last year, and it has since grown to more than 80 members and collaborators.
Immuta announced the general availability of Domains policy enforcement, a new capability in the Immuta Data Security Platform that provides additional controls for data owners to implement a data mesh architecture with domain-specific data access policies. With Domains, data owners define both broad and domain-specific data controls by mirroring structures such as business units, geographic regions, or functions. As a result, data owners have increased visibility, insights, and control over data utilization, providing increased security and governance.
Intel announced that it has built the world’s largest neuromorphic system. Code-named Hala Point, the system, initially deployed at Sandia National Laboratories, uses Intel’s Loihi 2 processor. It tackles challenges related to the efficiency and sustainability of today’s AI. Hala Point is the first large-scale neuromorphic system to demonstrate state-of-the-art computational efficiencies on mainstream AI workloads.
Kong Inc. announced the commercial availability of Kong Konnect Dedicated Cloud Gateways on Amazon Web Services (AWS). Available exclusively via Kong Konnect, the company’s unified API platform, this release simplifies deployment and offers dedicated API management, combining zero downtime upgrades and elastic scalability.
Kore.ai announced the release of the Kore.ai Experience Optimization (XO) Platform Version 11.0. The release provides capabilities that support AI-driven business interactions across a wide range of use cases, including customer, agent, and search experiences. New capabilities in the release include a unified experience, multi-LLM generative AI, XO GPT models, advanced Retrieval Augmented Generation (RAG), and more.
Neo4j has touted the publication of ISO and IEC’s new international standard for the Graph Query Language (GQL), which defines the data structures and basic operations for working with property graphs. The GQL standard is conceived as an ISO sibling to SQL and a close reflection of Cypher, often considered the de facto graph language. The standard is comparable in scope to SQL-92, the standard under which SQL became the dominant language for accessing relational databases; GQL is expected to have a similar impact for graph databases.
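To give a flavor of the standard, a GQL pattern-matching query reads much like its Cypher antecedent. The graph schema and the name below are invented purely for illustration:

```sql
-- Find the people a given person knows, via a property-graph pattern match
MATCH (p:Person {name: 'Avery'})-[:KNOWS]->(friend:Person)
RETURN friend.name
```

The `MATCH`/`RETURN` shape is the property-graph analogue of SQL’s `SELECT`/`FROM`/`WHERE`, which is why the two standards are described as siblings.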
Neon.tech announced that its platform has transitioned from technical preview to general availability. The Postgres platform aims to improve the developer experience for modern applications. The platform’s serverless capabilities enable developer teams to work more efficiently by sharing the same schema and data through branching, while the scale-to-zero feature ensures cost-effectiveness by allowing individual developers to run separate databases without incurring high costs.
Pegasystems introduced Pega GenAI Coach, a generative AI-powered mentor for Pega solutions that proactively advises users to help them achieve optimal outcomes. Leveraging an organization’s own best practices for sales, service, and operations, Coach quickly analyzes a user’s work and relevant data in context to intelligently guide them toward better and faster results. Specifically, Pega GenAI Coach directly integrates into workflows and acts as an always-on mentor within Pega solutions. It analyzes work and guides users with salient advice to overcome roadblocks.
Qlik announced the AI Accelerator. This service is designed as an initial foray for companies looking to explore the possibilities AI can offer, acting as an entry point into the broader landscape of AI-driven analytics. Designed for swift deployment within existing Qlik applications, this service allows companies to practically apply and experiment with AI’s capabilities in a low-commitment manner.
SAS introduced lightweight, industry-specific AI models for individual licenses. With the offering, SAS is equipping organizations with readily deployable AI technology to productionize real-world use cases with efficiency. Additionally, the company announced it has expanded the generative AI capabilities of SAS Viya with a new data maker and industry-specific assistants.
ThoughtSpot announced a series of initiatives to help developers and product builders seamlessly integrate GenAI into their apps and services more efficiently and cost-effectively. Some of the initiatives include a new pricing edition, a Vercel Marketplace listing, support channels, and new courses and certifications. Specifically, the company announced new features and offerings, including a Developer Edition, Vercel Marketplace Integration, Discord Channel, and new ThoughtSpot embedded courses and certifications.
If your company has real-time analytics news, send your announcements to [email protected].
In case you missed it, here are our most recent weekly real-time analytics news roundups: