Chat with your Elasticsearch data using the OpenAI plugin (RAG)

About this blueprint


This flow demonstrates how to use the OpenAI plugin to chat with your Elasticsearch data. The flow does the following:

1. Searches an Elasticsearch index for documents relevant to a user's question.
2. Formats the Elasticsearch query results into a context template that is used in the prompt to the LLM.
3. Generates a response to the user's question from the retrieved documents using the OpenAI plugin.
4. Logs the response to the console.

The flow uses .

To set up the Elasticsearch database locally with Docker, use the following command:

bash

docker run -it \
    --rm \
    --name elasticsearch \
    -m 2G \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.14.3

Note: if you see an "Elasticsearch has quit unexpectedly" message, raise the container's memory limit with the -m flag (set to 2G in the command above).
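
Before creating the index, it can help to confirm that the node is accepting requests. The following is a minimal sketch, assuming the container started above is listening on http://localhost:9200 with security disabled. The `wait_for_es` function name, the retry count, and the 2-second delay are illustrative choices and not part of the blueprint:

```bash
# Hypothetical helper: poll the Elasticsearch root endpoint until it answers.
# Usage: wait_for_es [url] [max_attempts]
wait_for_es() {
  local url="${1:-http://localhost:9200}"
  local attempts="${2:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress output, -f makes curl return non-zero on HTTP errors
    if curl -sf "$url" > /dev/null; then
      echo "Elasticsearch is up at $url"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "Elasticsearch did not respond at $url after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_es http://localhost:9200 30` blocks until the node responds or the attempts run out, so it can be chained in front of the index creation command.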

To create a sample course_questions index, use the following command:

bash

curl -X PUT "http://localhost:9200/course_questions" -H "Content-Type: application/json" -d'
{
 "settings": {
   "number_of_shards": 1,
   "number_of_replicas": 0
 },
 "mappings": {
   "properties": {
     "text": { "type": "text" },
     "section": { "type": "text" },
     "question": { "type": "text" },
     "course": { "type": "keyword" }
   }
 }
}'

Once your sample index is created, load data into it. We'll use a publicly available DataTalksClub FAQ dataset:

bash

curl -X POST "http://localhost:9200/course_questions/_bulk" \
-H "Content-Type: application/json" \
--data-binary @<(curl -s https://huggingface.co/datasets/kestra/datasets/raw/main/json/zoomcamp_faq.json | jq -c '.[] | {"index":{}}, .')
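
To check that the bulk load worked, one option is to count the indexed documents and run a query shaped like the one the flow sends. This is a sketch under stated assumptions: the `count_docs` and `preview_search` names are hypothetical, the question and course values are example inputs, and Elasticsearch is assumed to be reachable at http://localhost:9200:

```bash
# Hypothetical helpers for inspecting the sample index; not part of the blueprint.

# Print how many documents the course_questions index holds.
count_docs() {
  local es_url="${1:-http://localhost:9200}"
  curl -s "$es_url/course_questions/_count?pretty"
}

# Run a bool query with a multi_match clause and a term filter on course,
# using fixed example values for the question and the course.
preview_search() {
  local es_url="${1:-http://localhost:9200}"
  curl -s -X POST "$es_url/course_questions/_search?pretty" \
    -H "Content-Type: application/json" \
    -d '{
  "size": 5,
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "How do I join the course after it has started?",
          "fields": ["question", "text", "section"],
          "type": "best_fields"
        }
      },
      "filter": {
        "term": { "course": "data-engineering-zoomcamp" }
      }
    }
  }
}'
}
```

Running `count_docs` followed by `preview_search` should show a non-zero count and up to five hits. An empty result usually points to a failed load or a `course` value that does not match the indexed data.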

To get an OpenAI API key:

1. Create an OpenAI account.
2. Go to the API keys section.
3. Create a new key and copy it.
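
Rather than pasting the key into the flow, it could be supplied as a Kestra secret. The sketch below assumes the open-source Kestra convention of reading secrets from environment variables prefixed with `SECRET_` whose values are base64-encoded; check the documentation for your Kestra version before relying on it. The `kestra_secret_line` helper name is hypothetical:

```bash
# Hypothetical helper: print an environment line for a Kestra secret.
# Usage: kestra_secret_line NAME VALUE
kestra_secret_line() {
  local name="$1"
  local value="$2"
  # printf '%s' avoids encoding a trailing newline; tr removes line wrapping
  # that some base64 implementations add to long values.
  printf 'SECRET_%s=%s\n' "$name" "$(printf '%s' "$value" | base64 | tr -d '\n')"
}
```

For example, `kestra_secret_line OPENAI_API_KEY "$OPENAI_API_KEY" >> .env` appends the encoded line to an environment file. Under the same assumption, the flow could then reference the key as `{{ secret('OPENAI_API_KEY') }}` instead of a literal value.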

Big thanks to Faithful Adeda for contributing this blueprint! 🫶

yaml

id: chat_with_your_data
namespace: company.team

inputs:
  - id: question
    type: STRING
    defaults: How do I join the course after it has started?

  - id: select_a_zoomcamp
    type: SELECT
    defaults: data-engineering-zoomcamp
    values:
      - data-engineering-zoomcamp
      - machine-learning-zoomcamp
      - data-science-zoomcamp

tasks:
  - id: search
    type: io.kestra.plugin.elasticsearch.Search
    connection:
      hosts: 
        - http://localhost:9200/
    indexes:
      - course_questions
    request:
      size: 5
      query: 
        bool:
          must:
            multi_match:
              query: "{{ inputs.question }}"
              fields: ["question", "text", "section"]
              type: best_fields
          filter:
            term:
              course: "{{ inputs.select_a_zoomcamp }}"

  - id: context_template
    type: io.kestra.plugin.core.debug.Return
    format: >
      {% for row in outputs.search.rows %}
        Section: {{ row.section }}
        Question: {{ row.question }}
        Text: {{ row.text }}
      {% endfor %}

  - id: generate_response
    type: io.kestra.plugin.openai.ChatCompletion
    apiKey: sk-proj-your-OpenAI-API-KEY # placeholder: replace with your own key
    model: gpt-4o
    maxTokens: 500
    prompt: |
      You're a course teaching assistant. 
      Answer the user QUESTION based on CONTEXT - the documents retrieved from our FAQ database. 
      Only use the facts from the CONTEXT. 
      If the CONTEXT doesn't contain the answer, return "NONE".
      QUESTION: {{ inputs.question }}
      CONTEXT: {{ outputs.context_template.value }}

  - id: log_output
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.generate_response.choices | jq('.[].message.content') | first }}"
