Introducing Aana SDK

Open-Source SDK Empowering the Future of Multimodal AI Applications

Aleksandr Movchan, Hossein Rashidi, Evan de Riel, Ashwin Nair Anilil, Appu Shaji

Mobius Labs GmbH


Introduction

The landscape of artificial intelligence is evolving rapidly, and multimodal AI is at the forefront as we stand on the cusp of a new era in technology. It is becoming increasingly clear that multimodal AI will be a cornerstone of the Generative AI stack. The ability to process and understand multiple types of data (text, images, audio, and video) simultaneously is opening the door to a new class of applications that were once the stuff of science fiction. For example, we can now achieve a rich understanding of video content, enabling applications to analyze and interpret complex scenes, recognize objects and actions, transcribe speech, and even understand context and emotions.

Fig 1. An example of video understanding. The code implementing the backend with Aana SDK is available here, and an explanatory video is available here.

At Mobius Labs, we have years of experience in computer vision, audio recognition, and multimodal applications, delivering strong AI capabilities to our enterprise customers, so we understand the challenges that come with this new frontier. Managing diverse inputs, scaling Generative AI applications, and ensuring extensibility are major hurdles that developers face today. That's why we're thrilled to announce the release of Aana SDK, our open-source software development kit designed to address these challenges head-on.

Aana SDK, named after the Malayalam word for "elephant" ("ആന", pronounced "Aana"), is the core infrastructure layer that supports all our major applications and on which we've built our suite of AI-powered solutions. By open-sourcing Aana SDK, we're sharing the fruits of our labor and expertise with the wider developer community, enabling others to build powerful multimodal AI applications with greater ease and efficiency.

To get started with Aana today, visit our GitHub repository at https://github.com/mobiusml/aana_sdk or simply run "pip install aana". A tutorial is available at https://github.com/mobiusml/aana_sdk/blob/main/docs/tutorial.md.

From Prototype to Production: Aana SDK's Vision for Enterprise-Grade AI

With new multimodal models being released at an unprecedented pace, the ability to rapidly prototype and deploy new applications is not just an advantage but a necessity. Aana was born out of this urgent need: we built it to empower developers, data scientists, and ML engineers to keep pace with the rapidly evolving AI landscape.

Aana simplifies the complex process of integrating multiple AI models, managing various data types, and scaling applications efficiently. It is designed to bridge the gap between cutting-edge AI research and practical, deployable enterprise-grade applications.

Addressing Key Challenges

Managing Multimodal Inputs
Aana provides a unified framework for handling diverse data types, from text and images to audio and video, making it easier to build truly multimodal applications. See here for a simple tutorial on building a video summarization and chat application.
Scaling Generative AI
Built on top of Ray, a distributed computing framework, Aana lets your applications scale seamlessly from a single machine to a cluster, so that your Generative AI models can handle increasing loads. See here for how to scale in cloud environments.
Extensibility
We've designed Aana with the future in mind. Its modular architecture and extensive integration capabilities mean that, as new models and technologies emerge, you can easily incorporate them into your existing applications. It also comes with predefined integrations for popular machine learning frameworks such as Hugging Face and vLLM.

Design Philosophy

To address these challenges and create a truly useful tool for the AI community, we built Aana on the following core principles:

  1. Reliability: In the world of AI applications, robustness is key. Aana is designed to be fault-tolerant, gracefully handling the unexpected.
  2. Scalability: From prototype to production, Aana grows with your needs, leveraging distributed computing to scale across multiple servers effortlessly.
  3. Efficiency: We've optimized Aana for speed and resource utilization, ensuring that you get the most out of your hardware.
  4. Ease of Use: Complex doesn't have to mean complicated. Aana's modular design, with extensive automation and abstraction, makes it accessible to developers of all skill levels.

Why Open Source?

Open-source models increasingly dominate state-of-the-art multimodal benchmarks. We believe this trend will continue in enterprise AI, where open models offer greater transparency, privacy, and freedom from vendor lock-in. This shift mirrors the adoption of Linux and Android in their respective domains.

By open-sourcing Aana SDK, we're aligning with this trend, empowering businesses and developers to leverage cutting-edge AI while maintaining control over their technology stack. We believe this open-source approach will significantly simplify the process of bringing state-of-the-art machine learning models into production environments. Whether you're working on a small-scale project or developing enterprise-grade applications, Aana SDK provides the flexibility and scalability you need. If you are a developer, we are eager to learn how you are using it; if you are a company that wants GenAI in your stack, you can contact us at support@mobiuslabs.com.

Why use a Permissive License?

We believe in the power of collaboration and open innovation. That's why we're excited to announce that we are open-sourcing Aana SDK under the permissive Apache license. This decision reflects our commitment to advancing the field of AI and empowering developers worldwide.

The choice of a permissive license is crucial for fostering innovation, collaboration, and adoption:

Foster Innovation
With the Apache license, you can use, modify, and distribute Aana SDK with minimal legal red tape; its requirements are limited to things like retaining the license text and attribution notices. Want to experiment with a new feature or adapt the SDK for a unique use case? Go for it. Your innovations are yours to keep, with no obligation to disclose your source code.
Encourage Collaboration
We believe the best ideas come from collaboration. The permissive license means you can share your improvements, contribute to the core SDK, or build plugins without fear of IP conflicts. Let's solve complex AI challenges together and create something greater than the sum of its parts. We look forward to seeing your pull requests.
Promote Adoption
Whether you're a solo developer, a startup, or an enterprise, you can integrate Aana SDK into your projects worry-free. No hidden fees, no compulsory code sharing. Use it for personal projects, open-source work, or commercial applications; it's up to you.

By open-sourcing Aana SDK under these terms, we're inviting you to join us in pushing the boundaries of multimodal AI.

Thoughts for the Future

While GenAI models grow more complex, we focus on making them smaller, faster, and more scalable. Our notable projects include Extreme Quantization (see https://mobiusml.github.io/hqq_blog/ and https://mobiusml.github.io/1bit_blog/) and Fast Kernels (see https://mobiusml.github.io/whisper-static-cache-blog/; fast CUDA dequantization kernels are coming soon).

We envision highly capable, scalable AI applications with minimal computational overhead, and we anticipate growth in multimodal applications ranging from advanced search to personalized experiences and rich analytics. Emerging trends include enhanced multimodal capabilities, agentic workflows, and embodied intelligence. We also expect on-device AI to become a major trend, enabling real-time, privacy-preserving applications.

Visit our GitHub repository to get started with Aana today. Join us in shaping the future of machine learning deployment and application development!

Citation


@misc{movchan2024aanasdk,
  title = {Introducing Aana SDK: Open-Source SDK Empowering the Future of Multimodal AI Applications},
  url = {https://mobiusml.github.io/aana-sdk-introducing-blog},
  author = {Aleksandr Movchan and Hossein Rashidi and Evan de Riel and Ashwin Nair Anilil and Appu Shaji},
  month = {June},
  year = {2024}
}

Please feel free to contact us.