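Generative AI and operational machine learning play a crucial role in the modern data landscape by enabling organizations to use their data to power new products and increase customer satisfaction. These technologies are used for virtual assistants, recommendation systems, content generation, and more. They help organizations build a competitive advantage through data-driven decision making, automation, enhanced business processes, and better customer experiences.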
Apache Airflow is at the core of many teams’ ML operations, and with new integrations for large language models (LLMs), Airflow enables these teams to build production-quality applications with the latest advances in ML and AI.
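Machine learning models and predictive analytics are often created in isolation, far removed from production systems and applications. Organizations face a perpetual challenge in turning a lone data scientist’s notebook into a production-ready application with the required stability, scalability, and compliance.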
Organizations that standardize on one platform for orchestrating both their DataOps and MLOps workflows, however, are able to reduce not only the friction of end-to-end development but also infrastructure costs and IT sprawl. While it may seem counterintuitive, these teams also benefit from more choice. When the centralized orchestration platform, like Apache Airflow, is open source and includes integrations to nearly every data tool and platform, data and ML teams can pick the tools that work best for their needs while enjoying the benefits of standardization, governance, simplified troubleshooting, and reusability.
Apache Airflow and Astro (Astronomer’s fully managed Airflow orchestration platform) are where data engineers and ML engineers meet to create business value from operational ML. With a massive number of data engineering pipelines running on Airflow every day across every industry and sector, Airflow is the workhorse of modern data operations, and ML teams can build on this foundation not only for model inference but also for training, evaluation, and monitoring.
As organizations continue to find ways to leverage large language models, Airflow is increasingly front and center for operationalizing things like unstructured data processing, retrieval-augmented generation (RAG), feedback processing, and fine-tuning of foundation models. To support these new use cases and to provide a starting point for Airflow users, Astronomer has worked with the Airflow community to create Ask Astro, a public reference implementation of RAG with Airflow for conversational AI.
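More broadly, Astronomer has led the development of new integrations with vector databases and LLM providers to support this new breed of application and the pipelines needed to keep them safe, fresh, and manageable.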
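Apache Airflow, combined with some of the most widely used vector databases (Weaviate, Pinecone, OpenSearch, pgvector) and natural language processing (NLP) providers (OpenAI, Cohere), offers extensibility through the latest open-source developments. Together, they provide a best-in-class experience for RAG development in applications such as conversational AI, chatbots, fraud analysis, and more.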
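To make the RAG pattern concrete, here is a minimal, self-contained sketch of its retrieval step: embed the documents, rank them against a question, and assemble a prompt from the best matches. It is not how Ask Astro is implemented. The `embed` function is a toy bag-of-words stand-in (an assumption for illustration), where a real pipeline would call an embedding model and a vector database, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Assemble the text that would be sent to an LLM (the call itself is omitted)."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical sample corpus.
DOCS = [
    "Airflow schedules tasks as directed acyclic graphs",
    "Weaviate stores vector embeddings",
    "Bananas are rich in potassium",
]

question = "How does Airflow schedule tasks?"
top = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In an orchestrated setting, each of these functions would plausibly become its own task, so that ingestion and embedding can be rerun on a schedule to keep the retrieved context fresh.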
OpenAI
OpenAI is an AI research and deployment company that provides an API for accessing state-of-the-art models like GPT-4 and DALL·E 3. The OpenAI Airflow provider offers modules to easily integrate OpenAI with Airflow. Users can generate embeddings for their data, a foundational step in NLP for LLM-powered applications.
View tutorial → Orchestrate OpenAI operations with Apache Airflow
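As a rough illustration of the embedding step, the sketch below batches a list of texts and passes each batch to an embedding function. It does not use the OpenAI provider's actual modules: `fake_embed` is a hypothetical, offline stand-in that derives a few floats from a hash, and `embed_all`, `batched`, and the batch size are likewise assumptions. In a real pipeline you would swap in a call to an embeddings API.

```python
import hashlib


def fake_embed(texts, dim=4):
    """Hypothetical stand-in for an embeddings API: `dim` floats in [0, 1] per text."""
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([byte / 255 for byte in digest[:dim]])
    return vectors


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts, embed_fn=fake_embed, batch_size=2):
    """Embed every text, one batch at a time, preserving input order."""
    embeddings = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings


texts = ["first document", "second document", "third document"]
vectors = embed_all(texts)
print(len(vectors), "vectors of length", len(vectors[0]))
```

Batching keeps the number of API calls low, and passing the embedding function as a parameter makes the step easy to test offline.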
Cohere
Cohere is an NLP platform that provides an API to access cutting-edge LLMs. The Cohere Airflow provider offers modules to easily integrate Cohere with Airflow. Users can leverage these enterprise-focused LLMs to easily create NLP applications using their own data.
View tutorial → Orchestrate Cohere LLMs with Apache Airflow
Weaviate
Weaviate is an open-source vector database that stores high-dimensional embeddings of objects like text, images, audio, or video. The Weaviate Airflow provider offers modules to easily integrate Weaviate with Airflow. Users can process high-dimensional vector embeddings using an open-source vector database that provides a rich set of features, exceptional scalability, and reliability.
View tutorial → Orchestrate Weaviate operations with Apache Airflow
pgvector
pgvector is an open-source extension for PostgreSQL databases that adds the capability to store and query high-dimensional object embeddings. The pgvector Airflow provider offers modules to easily integrate pgvector with Airflow. With this extension, users can work with vectors in a high-dimensional space directly in their PostgreSQL database.
View tutorial → Orchestrate pgvector operations with Apache Airflow
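To give a sense of what storing and querying embeddings with pgvector looks like, the sketch below builds the relevant SQL strings in Python without connecting to a database. The `vector(n)` column type and the `<->` (L2 distance) operator come from pgvector. The helper names, the table layout, and the `%s` placeholder style (as used by drivers such as psycopg2) are assumptions for illustration, and the table name is interpolated directly, so it must come from trusted code.

```python
def to_vector_literal(embedding):
    """Format numbers as a pgvector text literal, e.g. '[1.0,2.5,0.0]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def create_table_sql(table, dim):
    """SQL that enables pgvector and creates a table with a vector(dim) column."""
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    id bigserial PRIMARY KEY,\n"
        "    content text,\n"
        f"    embedding vector({int(dim)})\n"
        ");"
    )


def nearest_neighbors_sql(table, k):
    """SQL returning the k rows closest (by L2 distance) to a query vector parameter."""
    return (
        f"SELECT id, content FROM {table} "
        f"ORDER BY embedding <-> %s::vector LIMIT {int(k)};"
    )


# Hypothetical table name and query vector.
print(create_table_sql("documents", 3))
print(nearest_neighbors_sql("documents", 5))
print(to_vector_literal([1, 2.5, 0]))
```

The query vector would be passed as the single parameter of the nearest-neighbor statement, formatted with `to_vector_literal`.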
Pinecone
Pinecone is a proprietary vector database platform designed for handling large-scale vector-based AI applications. The Pinecone Airflow provider offers modules to easily integrate Pinecone with Airflow.
View tutorial → Orchestrate Pinecone operations with Apache Airflow
OpenSearch
OpenSearch is an open-source distributed search and analytics engine based on Apache Lucene. It offers advanced search capabilities on large bodies of text alongside powerful machine learning plugins. The OpenSearch Airflow provider offers modules to easily integrate OpenSearch with Airflow.
View tutorial → Orchestrate OpenSearch operations with Apache Airflow
By enabling data-centric teams to more easily integrate data pipelines and data processing with ML workflows, organizations can streamline the development of operational AI and realize the potential of AI and natural language processing in an operational setting. Ready to dive deeper on your own? Discover available modules designed for easy integration: visit the Astro Registry to see the latest AI/ML sample DAGs.