• This post aims to run the deepseek-r1 model in offline mode and connect an AI agent to it through VS Code with a WSL setup
  • I will be installing this VS Code extension as the AI chat agent that connects to my offline model
  • I am using Debian in WSL

Current System config

  • Here is my Windows(HOST) machine config
    • Windows(HOST) machine memory config
  • Here is my current allocation for WSL on the Windows machine

      # https://learn.microsoft.com/en-us/windows/wsl/wsl-config#wslconfig
      # sairaghava_k's custom WSL settings to increase memory
    
      [wsl2]
    
      # Limits VM memory to use no more than 17 GB, this can be set as whole numbers using GB or MB
      memory=17GB
    
      # Sets the VM to use ten virtual processors
      processors=10
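
  • To confirm the limits took effect inside the WSL distro, a quick sanity check (standard Linux commands, nothing ollama-specific):

    free -h     # total memory should be close to the configured 17GB limit
    nproc       # should report 10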
    

Ollama Installation Steps
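
  • On Debian in WSL, ollama can be installed with the official Linux install script from https://ollama.com/download:

    curl -fsSL https://ollama.com/install.sh | sh

  • Verify the installation with ollama --version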


Install deepseek-r1 in offline mode

  • Ollama CLI available commands for reference

         Available Commands:
          serve       Start ollama
          create      Create a model from a Modelfile
          show        Show information for a model
          run         Run a model
          stop        Stop a running model
          pull        Pull a model from a registry
          push        Push a model to a registry
          list        List models
          ps          List running models
          cp          Copy a model
          rm          Remove a model
          help        Help about any command
    
  • First, ensure that you have started the ollama server locally with the command

    ollama serve
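
  • You can verify the server is up; by default it listens on port 11434 and answers with a short status message:

    curl http://localhost:11434
    # expected response: Ollama is running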

  • Pull the deepseek-r1 model from the registry

    ollama pull deepseek-r1:8b

    • The download is about 4.9 GB
  • After pulling the model, my WSL instance crashed. I suspect it was due to the low memory allocation
  • I redefined my WSL config. Here is the updated config from C:\Users\<username>\.wslconfig

    # https://learn.microsoft.com/en-us/windows/wsl/wsl-config#wslconfig
    # sairaghava_k's custom WSL settings to increase memory
    # Actual HOST machine memory is 32GB; hardware reserved is 4.3GB; remaining is 27.7GB
    
    # Kept 21GB for WSL(Debian) having [Docker, VsCode, ollama(local models)]
    # And remaining 6.7 GB for HOST (Windows machine) having [Browser, Rancher Desktop, Notepad++, Office applications like Excel, Word, Outlook, PowerPoint, etc.]
    
    
    [wsl2]
    # Limits VM memory to use no more than 21 GB, this can be set as whole numbers using GB or MB
    memory=21GB
    
    
    ## Total 16 logical processors, dedicating 10 for WSL, remaining 6 for Windows (HOST) machine
    
    # Sets the VM to use ten virtual processors
    processors=10
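
  • Changes to .wslconfig only take effect after the WSL VM restarts; from PowerShell on the Windows host, run:

    wsl --shutdown

  • Then reopen the Debian terminal; free -h and nproc should reflect the new limits, and ollama list should still show the pulled deepseek-r1:8b model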
    

Chat/Query with the model from the CLI without the VS Code extension

  • We can chat through the CLI
    • Ensure you have started the ollama server with the command

      ollama serve

    • And run the model

      ollama run deepseek-r1:8b

    • This launches an interactive prompt where you can start chatting
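
  • The model can also be queried over ollama's HTTP API, which is what a chat extension does under the hood; a minimal sketch against the default endpoint:

    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:8b",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'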

Chat/Query with the model through a VS Code extension that acts as a lightweight interface

  • While trying to install the extension, I was prompted to upgrade my VS Code version

    I am currently on VS Code 1.95.3

  • After downloading the latest 1.97.0 (at the time of this setup), hop to the WSL prompt from Windows Terminal
    • And run
      • code --version
    • It said wget was not installed
      • To install it, run sudo apt-get install wget
    • Rerun code --version
      • This will remove the previous installation and download the latest VS Code server version
      • It gets installed to the folder ~/.vscode-server/bin/<hash>
      • The server version and hash displayed will match the client version and hash exactly
  • Install the VS Code extension; the commands below consolidate these WSL-side steps
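
  • For reference, the WSL-side commands in one place (the extension id below is a placeholder, since the extension is linked above rather than named here):

    sudo apt-get install wget                 # needed by the VS Code server bootstrap
    code --version                            # triggers the VS Code server download in WSL
    code --install-extension <extension-id>   # placeholder; use the id of your chosen chat extension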

That’s it: the extension now talks to the deepseek-r1 model through ollama in offline mode, without a network connection.
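
  • As a final sanity check that everything is local, the models available to the extension can be listed over the same API:

    curl http://localhost:11434/api/tags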