Blueprints

Chat with your ElasticSearch data using the OpenAI plugin (RAG)

About this blueprint

AI API Variables

This flow demonstrates how to use the OpenAI plugin to chat with your ElasticSearch data. The flow will do the following:

  1. Search an ElasticSearch index for relevant documents based on a user's question.
  2. Parse the Elasticsearch query results into a context template used in a prompt to the LLM.
  3. Generate a response to the user's question based on the retrieved documents using the OpenAI API plugin.
  4. Log the response to the console.

The flow uses .

To set up the Elasticsearch database locally with Docker, use the following command:

bash
docker run -it \
    --rm \
    --name elasticsearch \
    -m 2G \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.14.3

Note that if you see a "Elasticsearch has quit unexpectedly" message, you can use the -m field to increase the memory limit of your container.

To create a sample course_questions index, use the following command:

bash
curl -X PUT "http://localhost:9200/course_questions" -H "Content-Type: application/json" -d'
{
 "settings": {
   "number_of_shards": 1,
   "number_of_replicas": 0
 },
 "mappings": {
   "properties": {
     "text": { "type": "text" },
     "section": { "type": "text" },
     "question": { "type": "text" },
     "course": { "type": "keyword" }
   }
 }
}'

Once your sample index is created, load data into it. We'll use a publicly available DataTalksClub FAQ dataset:

bash
curl -X POST "http://localhost:9200/course_questions/_bulk" \
-H "Content-Type: application/json" \
--data-binary @<(curl -s https://huggingface.co/datasets/kestra/datasets/raw/main/json/zoomcamp_faq.json | jq -c '.[] | {"index":{}}, .')

To get access to the OpenAI API key:

  • Create an OpenAI account
  • Go to the API keys section
  • Create a new key and copy it.

Big thanks to Faithful Adeda for contributing this blueprint! 🫶

yaml
id: chat_with_your_data
namespace: company.team

inputs:
  - id: question
    type: STRING
    defaults: How do I join the course after it has started?

  - id: select_a_zoomcamp
    type: SELECT
    defaults: data-engineering-zoomcamp
    values:
      - data-engineering-zoomcamp
      - machine-learning-zoomcamp
      - data-science-zoomcamp

tasks:
  - id: search
    type: io.kestra.plugin.elasticsearch.Search
    connection:
      hosts: 
        - http://localhost:9200/
    indexes:
      - course_questions
    request:
      size: 5
      query: 
        bool:
          must:
            multi_match:
              query: "{{ inputs.question }}"
              fields: ["question", "text", "section"]
              type: best_fields
          filter:
            term:
              course: "{{ inputs.select_a_zoomcamp }}"

  - id: context_template
    type: io.kestra.plugin.core.debug.Return
    format: >
      {% for row in outputs.search.rows %}
        Section: {{ row.section }}
        Question: {{ row.question }}
        Text: {{ row.text }}
      {% endfor %}

  - id: generate_response
    type: io.kestra.plugin.openai.ChatCompletion
    apiKey: sk-proj-your-OpenAI-API-KEY
    model: gpt-4o
    maxTokens: 500
    prompt: |
      You're a course teaching assistant. 
      Answer the user QUESTION based on CONTEXT - the documents retrieved from our FAQ database. 
      Only use the facts from the CONTEXT. 
      If the CONTEXT doesn't contain the answer, return "NONE".
      QUESTION: {{ inputs.question }}
      CONTEXT: {{ outputs.context_template.value }}

  - id: log_output
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.generate_response.choices | jq('.[].message.content') | first }}"

Search

Return

Chat Completion

Log

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra