A practical 90-day guide to mastering AI using off-the-shelf models, frameworks, and platforms, designed for CTOs, developers, and IT decision-makers.

AI Roadmap: From Zero to Expert in 90 Days with Off-the-Shelf Tools

Introduction: AI Within Reach – Practical Acceleration Without Theory from Scratch

We are witnessing the dynamic development of artificial intelligence, opening doors for companies to optimize and innovate. However, many organizations face a dilemma: how to quickly leverage these opportunities without months, or even years, of investing in learning fundamental algorithms from scratch? Instead of delving into the academic-level implementation details of numpy or pytorch, which is undoubtedly valuable but time-consuming, I propose a pragmatic approach. Ignoring AI's potential, especially in current times, is a straight path to losing competitive advantage. Processes remain inefficient, and valuable business opportunities are missed because the tools or knowledge to capitalize on them are lacking. Imagine if time-consuming data analysis, offer personalization, or customer support could operate much more efficiently – this is precisely what AI offers. This roadmap is a practical guide that, over 90 days, will lead you through the world of ready-made models, frameworks, and platforms. It's aimed at CTOs, developers, product managers, entrepreneurs, and IT decision-makers in companies employing 50 to 500 people – those who want real, implementable solutions. Upon completing this path, you will be able to independently design, build, and deploy applications based on Large Language Models (LLMs), Retrieval Augmented Generation (RAG) systems, and simple yet effective AI agents.

Foundation (Days 1-10): Understanding the Modern AI Ecosystem and Getting Started Locally

Before we start building, we need to understand the basic rules of the game and prepare our workshops. The first 10 days will be dedicated to grasping the context and launching our initial tools.

The "AI from Ready-Made Blocks" Philosophy

The approach presented here relies on using existing, often highly advanced, components. Instead of reinventing the wheel, we leverage the work of thousands of researchers and engineers who have already created powerful models and tools. In my practice, I've seen how such an approach can shorten the time needed to deliver a working prototype or even a finished product from many months to just a few weeks. The time-to-market reduction can be as high as 80-90% compared to a scenario where we build everything from the ground up.

Key Concepts – A Quick Overview

We won't delve into the complex mathematics behind AI here. We'll focus on understanding what the key pieces of the puzzle are and how they work:

LLM (Large Language Models): These are the brains of our operations. Models like OpenAI's GPT-4/GPT-4o, Anthropic's Claude series, or open-source models like Llama 3, Mistral, or Phi-3, have been trained on vast amounts of text data and can generate text, answer questions, translate, summarize, and much more. We typically access them via API or, in the case of open-source models, we can run them locally.
RAG (Retrieval Augmented Generation): Standard LLMs have knowledge "frozen" at the time of their training. RAG is a technique that allows LLMs to access current, private databases or your company's documents. In short: the model first searches for relevant information in your knowledge base and then uses it to formulate an answer. This is key to personalization and fact-based responses.
AI Agents: These are more autonomous systems that not only respond to queries but can also plan and execute sequences of tasks, using various tools (e.g., web search, calculator, APIs of other systems) to achieve a specific goal.
Vector Databases (e.g., Milvus, Pinecone, Weaviate): Specialized databases optimized for storing and searching vector embeddings. Embeddings are numerical representations of data (e.g., text) that capture their semantic meaning. They are fundamental to RAG systems, enabling the rapid retrieval of text fragments semantically similar to the user's query.

Setting Up Your Environment – Your Local AI Lab

To work effectively, we need a properly configured environment:

Python: Version 3.10 or newer. For dependency management, I recommend Poetry or Conda – they help avoid version conflicts and make environment replication easier.
Docker: Essential for running many tools (e.g., Milvus, Ollama in some configurations) in isolated containers. It guarantees environment consistency between development and production.
Development Tools: VS Code with appropriate extensions like Python (from Microsoft), Docker, Pylance, and potentially Jupyter later on. A well-configured IDE significantly speeds up work.

First Steps with Ollama: Running Powerful LLMs Locally

Ollama is a fantastic tool that allows for incredibly simple downloading and running of popular LLMs (like Llama 3, Mistral, Phi-3, Gemma) on your own computer (Windows, macOS, Linux).

Installation and Configuration: The process is trivially simple. Just download the installer from the official Ollama website and follow the instructions. After installation, models are downloaded via a terminal command, e.g., ollama pull llama3.
Practical Example: After downloading a model, we can interact with it directly in the terminal: ollama run llama3 "Tell me briefly about the Renaissance". Ollama also provides a local API server (by default on port 11434), allowing integration with your Python scripts or other tools.
Benefits:
- Data Privacy: Your queries and data do not leave your computer. This is crucial for sensitive information.
- No API Costs: Experiment freely without worrying about bills for using cloud models.
- Rapid Testing: Ideal for prototyping and testing different models.
Micro-CTA: Check out the official Ollama documentation and download your first model today!

Phase 1 (Days 11-40): Building Intelligent Applications with Langchain and Commercial Models

With the basics and a local environment with Ollama, it's time to move on to building more complex applications. In this phase, we'll focus on Langchain as the main framework and on integrating with commercial API models, which often offer the highest performance.

Introduction to Langchain: The Backbone of Your AI Applications

Langchain is an open-source framework that significantly simplifies the creation of LLM-based applications. It provides modular components and ready-made "chains" for common tasks.

Key Langchain Components We'll Focus On:

Models: Abstractions for interacting with various LLMs (LLMs for text models, ChatModels for conversational models) and embedding models (Embeddings). Langchain supports integration with OpenAI, Claude, Hugging Face models, and local models via Ollama.
Prompts (Prompt Templates): Allow dynamic creation of queries to LLMs based on templates and input variables. PromptTemplate and ChatPromptTemplate are fundamental.
Chains: Sequences of component calls (e.g., prompt -> model -> output parser). Langchain promotes the use of LCEL (Langchain Expression Language) – a declarative way to compose chains that facilitates streaming, batch processing, and asynchronicity.
Output Parsers: Tools for structuring LLM responses, e.g., converting text to JSON format or Pydantic objects.

Architecture of a Typical Langchain Application:

Imagine a simple flow:

User inputs data (e.g., a question).
PromptTemplate formats this data into an appropriate query for the LLM.
Model (e.g., OpenAI) processes the query and generates a response.
OutputParser (optionally) transforms the raw model response into the desired structure.
The application presents the processed response to the user. (At this point in the target article, I would include a simple block diagram illustrating this flow).

Integrating with Model APIs: OpenAI (GPT-4/GPT-4o) and Claude AI

Commercial models like OpenAI's GPT-4o or Anthropic's Claude 3 Opus often set the standards for quality and capabilities. Langchain makes their integration easier.

Secure API Key Management: Never place API keys directly in your code! Use the python-dotenv library to load keys from environment variables (a .env file).
Model Comparison:
- OpenAI (GPT-4o, GPT-4 Turbo): Usually the leader in terms of general reasoning, coding, and creativity. GPT-4o is faster and cheaper than GPT-4 Turbo, while also offering multimodality.
- Claude AI (Claude 3 Opus, Sonnet, Haiku): Very strong in tasks requiring long context (e.g., analyzing large documents), generating long texts, and complex reasoning. Sonnet and Haiku models offer a great price-to-performance ratio for less demanding tasks.
Costs: Always check the current API pricing. Remember that cost depends on the number of tokens (input and output).

PydanticAI: Structuring and Validating AI Data in a Flash

A common problem when working with LLMs is that they generate text which then needs to be processed to extract specific information. PydanticAI (or using standard Pydantic models with appropriate parsers in Langchain, like PydanticOutputParser or JsonOutputParser) solves this problem.

Problem: An LLM returns a product description as continuous text, but we need the name, price, and list of features in a structured form.
Solution: We define a Pydantic model describing the desired data structure. Langchain, using the LLM's ability to format responses (e.g., as JSON) and a parser, automatically populates our Pydantic model.

Building Your First RAG System: Your LLM with Access to Your Own Knowledge

RAG systems allow LLMs to answer questions based on your specific data, not just the general knowledge they were trained on. This is a breakthrough in creating useful, contextual AI applications.

Introduction to Milvus:

Milvus is a popular, highly scalable open-source vector database. Ideal for storing and searching document embeddings.

Installation: The easiest way is to run Milvus Lite (for small projects and development) via pip: pip install pymilvus milvus or the full version via Docker Compose, using the official configuration files. For development, alternatives like FAISS, ChromaDB, or even simple files often suffice, but Milvus provides a solid foundation for future scaling.
Basic Collection Configuration: In Milvus, data is stored in "collections." We define a collection schema, specifying, among other things, the dimensionality of embedding vectors and the similarity metric (e.g., cosine).

RAG Process Step-by-Step with Langchain:

Document Preparation and Splitting (Document Loaders, Text Splitters): Langchain offers DocumentLoaders for loading data from various sources (PDF, TXT, CSV, web pages, Notion, etc.). Then TextSplitters (e.g., RecursiveCharacterTextSplitter) divide long texts into smaller chunks to fit into the LLM's context and be efficiently processed into embeddings.
Generating Embeddings: Each text chunk is converted into a numerical vector (embedding) using an embedding model. Langchain integrates with many, e.g., OpenAIEmbeddings (paid, high quality), HuggingFaceEmbeddings (free, many options, e.g., sentence-transformers/all-MiniLM-L6-v2), or embeddings available via Ollama.
Storing Embeddings in Milvus (or another vector database): Langchain provides the Milvus class (or Chroma, FAISS for other databases) as a VectorStore. Vectors, along with the original content of the chunks and metadata, are saved in the database.
Querying: When a user asks a question:
- The question is converted into an embedding using the same model.
- The vector database is searched for chunks with embeddings most similar to the question's embedding (e.g., top 3-5 chunks).
- These chunks (context) are appended to the original question and passed to the LLM, which generates an answer based on the provided context.

Mini-project: Q&A bot on your own documents. Implement the steps above using a few of your own PDF or TXT files. Test how the system answers questions about the content of these documents. This will give you a tangible sense of RAG's power. (At this point in the target article, I would include a diagram illustrating the data flow in a RAG system: Question -> Embedding -> Search in VectorDB -> Context + Question -> LLM -> Answer).

Phase 2 (Days 41-70): Advanced Flows, Agents, and Process Automation

Having mastered the basics of Langchain and RAG, we are ready to create more complex, intelligent systems. We will focus on agents capable of making decisions and automation that connects AI with other systems.

LangGraph: Building Cyclic and Stateful AI Agents

Standard chains in Langchain are typically sequential. LangGraph, an extension of Langchain, allows for the creation of more complex, cyclic flows where an agent can repeatedly use tools, make decisions, and modify its internal state – more closely simulating the human problem-solving process.

Limitations of Standard Chains: Difficulty in implementing loops, conditional execution of steps, or dynamic tool selection.
Concept of a State Graph: LangGraph models an application as a graph where:
- Nodes: Functions or Langchain chains that modify the state. Each node receives the current state and returns its update.
- Edges: Define the control flow between nodes. They can be unconditional (always proceed to the next node) or conditional (choose the next node based on the state).
- State: An object (often a dictionary or Pydantic instance) passed between nodes, aggregating all information needed by the agent.
Example Code (conceptual, simplified research agent): Imagine an agent tasked with writing a report on a given topic. (At this point in the target article, I would include a diagram illustrating such a cyclic agent with LangGraph, showing nodes and conditional transitions).
- Initial State: {"topic": "Impact of AI on the job market", "research_data": [], "report_draft": None, "iterations": 0}
- research Node: Uses a tool for web searching (e.g., Tavily Search API, integrated with Langchain), adds found information to research_data.
- draft_report Node: Based on research_data, generates an initial version of the report, saves it in report_draft.
- critique_report Node (conditional): If iterations < 3, an LLM evaluates the report, identifies shortcomings. If the report is OK or iterations >= 3, it proceeds to the end. Otherwise, it returns to research with improvement suggestions.

Smol Agent: Rapid Prototyping of Task-Oriented Agents

Smol Agent (or smol-dev as its main use case is often referred to) is an approach and set of tools for quickly scaffolding code for entire applications or more complex agents by an LLM. The philosophy suggests that instead of one large, monolithic agent, it's better to have many small, specialized agents (or code modules generated by an agent) that collaborate.

Philosophy: "Think small." Instead of trying to build an agent that does everything, Smol Agent helps an LLM generate a project structure and individual code files based on a high-level description. The user often iteratively refines and develops the generated code.
Architecture and How to Start: The original smol-dev is a Python script that can generate an entire project structure from a single prompt file. The key is to formulate a good initial prompt that describes what needs to be created, what technologies should be used, and what the main functionalities should be.
Practical Application:
- Code-Generating Agent: The most common example – you ask to create a simple web application in Flask with specific functionality, and the LLM (via the Smol Agent mechanism) generates app.py, templates/index.html files, etc.
- Task-Planning Agent: It can be adapted to generate action plans for more complex tasks, breaking them down into smaller, manageable steps.
Comparison with LangGraph:
- LangGraph: For building agents with a defined, often cyclic, operational logic, where the agent itself makes decisions and uses tools in real-time.
- Smol Agent: More for "one-shot" generation of artifacts (e.g., code, plan) based on a detailed description. It focuses less on real-time autonomy and more on supporting the developer. In my assessment, it's a great tool to kick-start a project, but the generated code often requires manual verification and refinement.

n8n: No-Code/Low-Code Workflow Automation with Integrated AI

AI applications rarely operate in a vacuum. They need to communicate with other company systems. n8n is a powerful open-source workflow automation platform that allows you to visually connect hundreds of applications and services, including your own AI solutions.

Problem: How to integrate a Langchain-based chatbot with a CRM system, customer database, Excel, or Slack without writing dozens of lines of code to handle each system's API?
Overview of n8n Capabilities:
- Visual Editor: You create workflows by dragging and connecting "nodes."
- Hundreds of Pre-built Nodes: For popular services (Google Sheets, Gmail, Slack, Discord, SQL databases, CRM systems like HubSpot, Salesforce) and generic nodes (HTTP Request, Function for writing custom JS/Python code).
- Hosting: You can use n8n Cloud (paid) or self-host (e.g., on Docker).
Integrating n8n with Langchain/Ollama/Model APIs:
- You can call your Langchain applications (e.g., exposed as an API via FastAPI) using the HTTP Request node in n8n.
- You can directly communicate with OpenAI/Claude APIs or a local Ollama API from n8n.
- n8n also has dedicated AI nodes, e.g., "OpenAI Node," "Hugging Face Node."
Example Workflow in n8n:
1. Trigger: New email in Gmail with a specific subject (e.g., "Quote request").
2. OpenAI Node / HTTP Request Node to your Langchain API: Pass the email content to an LLM for:
  - Query classification (e.g., "product A," "service B").
  - Extraction of key information (contact details, customer needs) – a Pydantic parser works great here.
3. Google Sheets Node: Save the extracted data in the appropriate spreadsheet.
4. Slack Node: Send a notification to a dedicated channel with a summary of the query and a link to the sheet. (At this point in the article, I would include a screenshot showing such an example workflow in the n8n interface).
Micro-CTA: Download n8n desktop and test integration with the OpenAI API in 15 minutes by automating a simple process.

Phase 3 (Days 71-90): Production, Optimization, and Continuous Skill Development

The final phase is moving from prototypes to more polished solutions, optimizing them, and ensuring our knowledge remains current.

Prompt Engineering Basics: How to Talk to AI to Get the Best Results

The quality of an LLM's response largely depends on the quality of the prompt. It's both an art and a science.

Key Techniques:
- Zero-shot prompting: You simply ask the question without examples.
- Few-shot prompting: You provide a few examples (input/output) in the prompt to show the LLM the kind of response you expect.
- Chain-of-Thought (CoT) prompting: You encourage the model to "think step by step" before giving the final answer, e.g., by adding the phrase "Let's think step by step" to the prompt. This often improves response quality in complex tasks.
- Role-playing: You instruct the LLM to adopt a specific role, e.g., "You are a marketing expert. Write an ad copy for...".
Practical Tips:
- Be precise and unambiguous: Avoid vagueness. The more accurately you describe what you expect, the better the result.
- Provide context: If necessary, give the model essential information in the prompt.
- Specify output format: If you need the response in a specific format (e.g., list, JSON), clearly state it (though Pydantic parsers often handle this automatically).
- Iterate: The first prompt is rarely perfect. Test different variations, analyze results, and refine. Tools like OpenAI's playgrounds or Chatbot UI-type interfaces for Ollama are very helpful here.

Monitoring, Evaluating, and Debugging AI Applications

Once an application is running, monitoring and evaluating its quality become crucial.

Tools:
- LangSmith: A product from the creators of Langchain, specifically designed for tracing, debugging, and evaluating LLM-based applications. It allows you to visualize chain/agent execution, analyze costs, log interactions, and assess response quality. In my opinion, it's an absolute must-have for serious Langchain projects.
- Simple Logging: For smaller projects, even standard Python logging can be sufficient for tracking key steps and errors.
Metrics: How to assess if an LLM is responding well?
- For generative tasks, this might be subjective human evaluation.
- For RAG systems, there are more advanced evaluation frameworks, e.g., RAGAS, which assesses aspects like faithfulness of the response to the context, answer relevancy, and the quality of the retrieved context (context precision/recall).
Logging Interactions and Feedback: Collecting feedback from users (e.g., "thumbs up/down" buttons next to chatbot responses) is invaluable for iterative system improvement.

AI Application Security: Protecting Against New Threats

LLM-based applications introduce new attack vectors and risks.

Basics:
- Prompt Injection: A malicious user tries to modify the original prompt to make the LLM perform unauthorized actions or reveal confidential information. Input sanitization techniques and separation of data from instructions should be applied.
- Data Leakage: If an LLM has access to sensitive data (e.g., in a RAG system), appropriate access control mechanisms and minimization of data passed to the model must be ensured.
- API Access Management: Secure storage of API keys, limiting permissions.
Best Practices: Regular security reviews, applying the principle of least privilege, updating dependencies. OWASP publishes a Top 10 list of threats for LLM applications, which is worth reviewing.

Scaling Solutions and Managing Costs

As an application gains popularity, challenges related to scalability and costs arise.

Choosing the Right Models: The latest and largest model isn't always necessary. For many tasks, smaller, cheaper models (e.g., GPT-3.5-turbo, Claude 3 Haiku, or open-source models via Ollama) may suffice. Test and measure!
Prompt Optimization: Shorter, more precise prompts mean fewer tokens, and thus lower cost.
Caching: If the same queries appear frequently, responses can be cached to avoid repeated, costly LLM calls. Langchain offers caching mechanisms.
Asynchronicity and Batch Processing: For applications serving many users, using asynchronous operations and batch processing can significantly improve performance and reduce latency.

Where to Look Next? Networking and Updating Your Knowledge

The field of AI is evolving at an express pace. Continuous learning is key.

Communities:
- Discord: Langchain, LlamaIndex, n8n, Hugging Face servers – goldmines of knowledge, quick help, discussions about new developments.
- Forums and Subreddits: E.g., r/LocalLLaMA, r/MachineLearning.
Documentation: Official documentation for tools (Langchain, Ollama, Milvus, n8n, OpenAI/Anthropic models) is always the most up-to-date source of information.
Newsletters and Blogs: Many valuable newsletters (e.g., The Batch, Deep Learning Weekly, Import AI) and technical blogs (e.g., from AI tool companies, researchers).
Trends to Watch:
- Multimodality: Models processing different types of data (text, image, audio – e.g., GPT-4o).
- Smaller and More Efficient Models: Advances in model distillation and quantization, enabling them to run on less powerful hardware.
- New Agent Architectures: Development of more autonomous systems capable of complex planning.
- AI On-Device: Running models directly on user devices (phones, laptops) for greater privacy and lower latency.

Architecture of an Example Comprehensive Solution: Intelligent Customer Service Assistant

To illustrate how all these "blocks" can work together, let's consider the architecture of an intelligent customer service assistant.

Description of Components and Data Flow:

User sends a query via a web or chat interface.
n8n receives this query (e.g., via a webhook) and passes it to an agent built in LangGraph.
The LangGraph agent analyzes the query. Depending on its content, it might:
- Use the RAG tool to search the knowledge base (e.g., if the user asks about product features). Context from Milvus is passed to the LLM.
- Use the company API tool to retrieve specific customer data (e.g., if the user asks about their order status).
- Directly generate a response using the LLM (e.g., for general questions).
The LLM (chosen according to needs and budget) generates a response or decision for the agent.
The LangGraph agent passes the formulated response back to n8n.
n8n formats the response and sends it back to the user via the interface.

This system allows for flexible conversation management, use of multiple data sources, and automation of many typical customer queries, significantly offloading human agents.

Potential Problems You'll Encounter (and How to Deal with Them)

The road to working AI solutions is rarely paved only with roses. Here are some common challenges:

Model Hallucinations: LLMs can generate responses that sound plausible but are untrue or not supported by the provided context.
- Solutions: Using RAG (to "ground" the model in facts), precise prompts instructing adherence to context, fact-checking mechanisms (even manual at first), choosing models less prone to hallucinations.
Vendor Lock-in (e.g., OpenAI): Relying on a single API provider can be risky (changes in pricing, policy, availability).
- Solutions: Using tools like Ollama and open-source models as alternatives or for less critical tasks. Designing applications for easy LLM component replacement (thanks to Langchain's abstractions, this is simpler).
API Costs: Popular models can be expensive, especially with high traffic.
- Solutions: Monitoring usage (e.g., via LangSmith), choosing cheaper models for appropriate tasks (e.g., Claude 3 Haiku instead of Opus for simple classifications), caching, prompt optimization.
Dependency and Version Management: The Python and AI ecosystem is dynamic; libraries change frequently.
- Solutions: Using dependency management tools (Poetry, conda), precise versioning, containerization (Docker) for environment consistency.
"In my project, I noticed that...": The biggest challenge is often not the AI technology itself, but precisely defining the business problem we want to solve, and preparing and cleaning the data that will feed our systems (especially in the context of RAG). Often, 80% of the time is spent on data, and 20% on modeling.

ROI and Next Steps: How to Turn Knowledge into Real Business Value

After 90 days, you will possess skills that can bring tangible benefits to your company.

Examples of ROI (Return on Investment):

Customer Service Automation: A RAG-based chatbot and agent can reduce response times for typical customer queries from several hours to a few seconds, potentially lowering service costs by 30-50% and increasing customer satisfaction.
Content Generation: Automating the creation of draft reports, product descriptions, or marketing emails can shorten the time needed for these tasks by 70-80%. If a team spent 20 hours a week on this, the saving is 14-16 hours.
New Products/Services: The ability to quickly prototype and implement new, intelligent features (e.g., personalized recommendations, an intelligent data analyst for clients) can open up new revenue streams. Sometimes creating an MVP for such a product is a matter of weeks, not quarters.

What specific projects can you undertake now?

An intelligent Q&A chatbot for the company's internal knowledge base.
A system for automatic tagging and categorization of incoming documents or emails.
A tool for generating personalized meeting summaries from transcripts.
A simple agent for online research on a given topic.
Micro-CTA: Calculate the potential ROI for automating one specific process in your company (e.g., answering FAQs) using an LLM. How many work hours can be saved per month?

AI Roadmap: From Zero to Expert in 90 Days with Off-the-Shelf Tools 🗺️