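Generative AI and operational machine learning play key roles in the modern data landscape, enabling organizations to use their data to develop new products and improve customer satisfaction. These technologies power virtual assistants, recommendation systems, content generation, and more. They help organizations build a competitive advantage through data-driven decision-making, automation, and improved business processes and customer experiences.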
Apache Airflow is at the core of many teams’ ML operations, and with new integrations for Large Language Models (LLMs), Airflow enables these teams to build production-quality applications with the latest advances in machine learning and AI.
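Machine learning models and predictive analytics are often created in isolated environments, far from production systems and applications. Organizations face a persistent challenge in turning an individual data scientist’s notebook into a production-ready application with the stability, scalability, and compliance that production demands.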
Organizations that standardize on one platform for orchestrating their DataOps and MLOps workflows, however, can reduce not only the friction of end-to-end development but also infrastructure costs and IT sprawl. Counterintuitively, these teams also gain more choice. When the centralized orchestration platform, like Apache Airflow, is open source and integrates with nearly every data tool and platform, data and ML teams can pick the tools that best fit their needs while still enjoying the benefits of standardization, governance, simplified troubleshooting, and reusability.
Apache Airflow and Astro (Astronomer’s fully managed Airflow orchestration platform) are where data engineers and ML engineers meet to create business value from operational ML. With a massive number of data engineering pipelines running on Airflow every day across every industry and sector, Airflow is the workhorse of modern data operations, and ML teams can build on this foundation not only for model inference but also for training, evaluation, and monitoring.
As organizations continue to find ways to leverage large language models, Airflow is increasingly front and center for operationalizing tasks such as unstructured data processing, retrieval-augmented generation (RAG), feedback processing, and fine-tuning of foundation models. To support these new use cases and give Airflow users a starting point, Astronomer has worked with the Airflow community to create Ask Astro, a public reference implementation of RAG with Airflow for conversational AI.
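To make the retrieval step of RAG concrete, here is a minimal, dependency-free Python sketch. It is not Ask Astro’s implementation or any Airflow provider API. The toy corpus, the bag-of-words `embed` function (a stand-in for a real embedding model), and the helper names `retrieve` and `build_prompt` are all hypothetical. The sketch ranks documents by cosine similarity to the question and places the best matches into the prompt that would then be sent to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for documents an ingestion pipeline would load.
DOCUMENTS = [
    "Airflow schedules and monitors workflows defined as DAGs.",
    "A vector database stores embeddings and supports similarity search.",
    "Retrieval augmented generation adds retrieved context to an LLM prompt.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector.

    A real pipeline would call an embedding model instead; this stand-in
    keeps the sketch free of external dependencies.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (the 'retrieval' step)."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Augment the question with retrieved context (the 'augmented generation' input)."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What does a vector database store?", DOCUMENTS))
```

In a production setup, the ingestion side (loading, chunking, embedding, and writing to a vector database) is the kind of recurring pipeline work Airflow would orchestrate, while the query-time retrieval runs inside the application.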
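More broadly, Astronomer has led the development of new integrations with vector databases and LLM providers to support this new class of applications and the pipelines that keep them secure, up to date, and manageable.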
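Combining Apache Airflow with some of the most widely used vector databases (Weaviate, Pinecone, OpenSearch, pgvector) and natural language processing (NLP) providers (OpenAI, Cohere) delivers extensibility through the latest open-source development. Together, they provide a best-in-class experience for RAG development, for applications such as conversational AI, chatbots, and fraud analysis.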
OpenAI
OpenAI is an AI research and deployment company that provides an API for accessing state-of-the-art models like GPT-4 and DALL·E 3. The OpenAI Airflow provider offers modules to easily integrate OpenAI with Airflow. Users can generate embeddings for their data, a foundational step in building LLM-powered NLP applications.
View tutorial → Orchestrate OpenAI operations with Apache Airflow
Cohere
Cohere is an NLP platform that provides an API to access cutting-edge LLMs. The Cohere Airflow provider offers modules to easily integrate Cohere with Airflow. Users can leverage these enterprise-focused LLMs to create NLP applications using their own data.
View tutorial → Orchestrate Cohere LLM operations with Apache Airflow
Weaviate
Weaviate is an open-source vector database that stores high-dimensional embeddings of objects such as text, images, audio, or video. The Weaviate Airflow provider offers modules to easily integrate Weaviate with Airflow. Users can process high-dimensional vector embeddings with a database that offers a rich feature set, strong scalability, and reliability.
View tutorial → Orchestrate Weaviate operations with Apache Airflow
pgvector
pgvector is an open-source extension for PostgreSQL that adds the ability to store and query high-dimensional object embeddings. The pgvector Airflow provider offers modules to easily integrate pgvector with Airflow. With this extension, users get powerful functionality for working with vectors in high-dimensional space directly in their PostgreSQL database.
View tutorial → Orchestrate pgvector operations with Apache Airflow
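pgvector supports ordering query results by vector distance through operators such as `<->` (Euclidean, or L2, distance), `<#>` (negative inner product), and `<=>` (cosine distance). The Python sketch below mirrors what those three operators compute on small in-memory vectors, to illustrate how the choice of metric affects a nearest-neighbor query. It does not use pgvector or the Airflow provider. The item IDs, vectors, and the `nearest` helper are hypothetical, and `nearest` only imitates the ordering of a query of the form `ORDER BY embedding <=> query LIMIT k`.

```python
import math

# Hypothetical sample data: item ID -> 3-dimensional embedding.
ITEMS: dict[str, list[float]] = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.9, 0.1, 0.0],
    "d": [3.0, 3.0, 0.0],  # large magnitude, pointing between the x and y axes
}
QUERY: list[float] = [1.0, 0.05, 0.0]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the quantity pgvector's <-> operator computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Negated dot product, as returned by pgvector's <#> (smaller means more similar)."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """One minus cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(items: dict[str, list[float]], query: list[float], distance, k: int = 1) -> list[str]:
    """Return the k closest item IDs, imitating ORDER BY embedding <op> query LIMIT k."""
    ranked = sorted(items.items(), key=lambda item: distance(item[1], query))
    return [item_id for item_id, _ in ranked[:k]]


for metric in (l2_distance, negative_inner_product, cosine_distance):
    print(metric.__name__, nearest(ITEMS, QUERY, metric, k=2))
```

With these sample vectors, L2 and cosine distance both return `a` and `c`, while negative inner product ranks the large-magnitude vector `d` first. This is why cosine distance, or inner product on normalized embeddings, is often the safer choice when vector magnitudes vary.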
Pinecone
Pinecone is a proprietary vector database platform designed for large-scale, vector-based AI applications. The Pinecone Airflow provider offers modules to easily integrate Pinecone with Airflow.
View tutorial → Orchestrate Pinecone operations with Apache Airflow
OpenSearch
OpenSearch is an open-source distributed search and analytics engine based on Apache Lucene. It offers advanced search over large bodies of text alongside powerful machine learning plugins. The OpenSearch Airflow provider offers modules to easily integrate OpenSearch with Airflow.
View tutorial → Orchestrate OpenSearch operations with Apache Airflow
By making it easier for data-centric teams to integrate data pipelines and data processing with ML workflows, organizations can streamline the development of operational AI and realize the potential of AI and natural language processing in production. Ready to dive deeper on your own? Discover available modules designed for easy integration, and visit the Astro Registry to see the latest AI/ML sample DAGs.