Go Concurrency Design Patterns - Generator

The Generator pattern is the simplest way to get started with concurrent programming in Go.

For me, the biggest challenge in concurrent programming is making sure that all the “jobs” have run and that all channels are closed cleanly. Design patterns help with this. I will be going through the ones most frequently used in Go, starting today with the simplest of the bunch: the Generator.

The Generator pattern works best when you have a pre-defined set of “jobs” that need to be run concurrently. Let’s get to work.

The Task

We will write a program that checks the status of a list of URLs by returning the status code for each one.

The fetch logic

The function that each goroutine will run is:

// A Result structure, this will hold the outcome for each request
type Result struct {
	Url        string
	StatusCode int
}

// Go get me that url!
// We receive two parameters: the url which we would like to check, and the channel to push the result into
func PingUrl(url string, ch chan Result) {

	// Get me the inputted url
	response, err := http.Get(url)

	// For simplicity let's panic on error
	if err != nil {
		panic(err)
	}
	// Close the response body so the connection is released
	defer response.Body.Close()

	// create a Result struct
	result := Result{
		Url:        url,
		StatusCode: response.StatusCode,
	}

	// push the Result struct into the channel
	ch <- result
}
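
Panicking keeps the example short, but a single unreachable host would crash the whole scan. One possible alternative, sketched below, is to carry the error inside the result. The `Err` field and the names `CheckResult` and `pingURL` are assumptions for illustration, not part of the code above, and the demo uses a local `httptest` server so it runs without touching the network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// CheckResult is a hypothetical variant of Result that can also carry an error.
type CheckResult struct {
	URL        string
	StatusCode int
	Err        error
}

// pingURL always sends exactly one CheckResult, even when the request fails,
// so a receiver that counts results never waits forever.
func pingURL(url string, ch chan<- CheckResult) {
	response, err := http.Get(url)
	if err != nil {
		ch <- CheckResult{URL: url, Err: err}
		return
	}
	// Close the body so the underlying connection is released.
	defer response.Body.Close()

	ch <- CheckResult{URL: url, StatusCode: response.StatusCode}
}

func main() {
	// A throwaway local server stands in for a real site.
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer server.Close()

	// The second entry is deliberately malformed so http.Get returns an error.
	urls := []string{server.URL, "://not-a-valid-url"}
	ch := make(chan CheckResult, len(urls))
	for _, url := range urls {
		go pingURL(url, ch)
	}

	for range urls {
		res := <-ch
		if res.Err != nil {
			fmt.Printf("url: %s - error: %v\n", res.URL, res.Err)
			continue
		}
		fmt.Printf("url: %s - status_code: %d\n", res.URL, res.StatusCode)
	}
}
```

Because every goroutine still sends exactly one value, the counting loop in the caller keeps working unchanged.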

The main function

Let’s start at the beginning. We take a pre-defined slice of URLs, pass it into the GetStatuses function (more on this next), and then loop once for each item in the slice.

Let’s say that again: we loop exactly as many times as there are items in our URL slice.

Each time we loop around, the main goroutine blocks on res := <-ch until a result is published to the channel. This ensures that our program waits for all of our HTTP requests to complete: three receives for three requests.

func main() {
	urlsToScan := []string{
		"https://www.google.com",
		"https://www.facebook.com",
		"http://johnmackenzie.co.uk",
	}

	// The generator function, returns the channel which the results of each url will be published to.
	ch := GetStatuses(urlsToScan)

	for i := 0; i < len(urlsToScan); i++ {
		res := <-ch
		fmt.Printf("url: %s - status_code: %d\n", res.Url, res.StatusCode)
	}
	fmt.Println("Scan complete.")
}

The Generator Function

Now for the good stuff! Here we have a function which takes the slice of URLs (which we instantiated above) and returns a receive-only channel. This means that outside of this function the channel can only be read from, not written to.

We loop over the slice and run the PingUrl function (defined above) for each URL in its own goroutine. The goroutines are fired off in the background and this function returns straight away with the channel. The channel is buffered to hold one result per URL, so no goroutine blocks on its send even if the caller stops reading early.

func GetStatuses(urls []string) <-chan Result {
	ch := make(chan Result, len(urls))
	for _, url := range urls {
		go PingUrl(url, ch)
	}
	return ch // Return the channel to the caller.
}

As you will remember from the main function, the program then blocks until all three results have been received before exiting.
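
The receive-only return type is enforced by the compiler. As a minimal sketch, using a hypothetical `countTo` generator that has nothing to do with the URL checker, the caller below can receive from the channel, but a send would not compile.

```go
package main

import "fmt"

// countTo is a hypothetical generator: it starts a goroutine that sends
// 1..n and hands the caller a receive-only view of the channel.
func countTo(n int) <-chan int {
	ch := make(chan int, n)
	go func() {
		for i := 1; i <= n; i++ {
			ch <- i
		}
	}()
	return ch // chan int converts implicitly to <-chan int
}

func main() {
	ch := countTo(3)

	sum := 0
	for i := 0; i < 3; i++ {
		sum += <-ch // blocks until the next value arrives
	}
	fmt.Println("sum:", sum)

	// ch <- 4 // would not compile: ch is a receive-only channel here
}
```

Inside the generator the channel is still bidirectional, which is why the goroutine can send on it; only the caller's view is restricted.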

All Together

package main

import (
	"fmt"
	"net/http"
)

// A Result structure, this will hold the outcome for each request
type Result struct {
	Url        string
	StatusCode int
}

func GetStatuses(urls []string) <-chan Result {
	// Buffered so that every goroutine can send without waiting for a reader.
	ch := make(chan Result, len(urls))
	for _, url := range urls {
		go PingUrl(url, ch)
	}
	return ch // Return the channel to the caller.
}

func PingUrl(url string, ch chan Result) {

	// Get me the inputted url
	response, err := http.Get(url)

	// For simplicity let's panic on error
	if err != nil {
		panic(err)
	}
	// Close the response body so the connection is released
	defer response.Body.Close()

	// create a result struct
	result := Result{
		Url:        url,
		StatusCode: response.StatusCode,
	}

	// push that result struct into the channel
	ch <- result
}

func main() {
	urlsToScan := []string{
		"https://www.google.com",
		"https://www.facebook.com",
		"http://johnmackenzie.co.uk",
	}
	ch := GetStatuses(urlsToScan)

	for i := 0; i < len(urlsToScan); i++ {
		res := <-ch
		fmt.Printf("url: %s - status_code: %d\n", res.Url, res.StatusCode)
	}
	fmt.Println("Scan complete.")
}

Same thing again, but this time with sync.WaitGroup

To make this function easier to consume, and to avoid relying on knowing exactly how many URLs will be pinged, one redditor pointed out a possible solution: WaitGroups.

To summarise, a WaitGroup keeps track of how many goroutines have been started and how many have finished. Once every goroutine has called Done, Wait returns and we know there are no more URLs to request.

package main

import (
	"fmt"
	"net/http"
	"sync"
)

// A Result structure, this will hold the outcome for each request
type Result struct {
	Url        string
	StatusCode int
}

func GetStatuses(urls []string) *sync.WaitGroup {
	var wg sync.WaitGroup
	for _, url := range urls {
		wg.Add(1)
		go PingUrl(url, &wg)
	}
	return &wg // Return the WaitGroup to the caller.
}

func PingUrl(url string, wg *sync.WaitGroup) {
	// Mark this job as finished when the function returns
	defer wg.Done()

	// Get me the inputted url
	response, err := http.Get(url)

	// For simplicity let's panic on error
	if err != nil {
		panic(err)
	}
	// Close the response body so the connection is released
	defer response.Body.Close()

	// create a result struct
	result := Result{
		Url:        url,
		StatusCode: response.StatusCode,
	}

	fmt.Printf("url: %s - status: %d\n", result.Url, result.StatusCode)
}

func main() {
	urlsToScan := []string{
		"https://www.google.com",
		"https://www.facebook.com",
		"http://johnmackenzie.co.uk",
	}

	wg := GetStatuses(urlsToScan)

	wg.Wait()

	fmt.Println("Scan complete.")
}
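
The WaitGroup version prints from inside each goroutine, so the caller no longer receives the results. One way to keep the generator's channel and still avoid counting, sketched below, is to have a helper goroutine close the channel once the WaitGroup reports that every job is done; the caller can then simply range over it. The `fetch` parameter and the name `generateStatuses` are assumptions introduced so the sketch runs without network access.

```go
package main

import (
	"fmt"
	"sync"
)

type Result struct {
	Url        string
	StatusCode int
}

// generateStatuses is a hypothetical variant of GetStatuses. fetch stands in
// for the HTTP call and returns the status code for a URL.
func generateStatuses(urls []string, fetch func(string) int) <-chan Result {
	ch := make(chan Result, len(urls))
	var wg sync.WaitGroup

	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			ch <- Result{Url: u, StatusCode: fetch(u)}
		}(url)
	}

	// Close the channel once every goroutine has sent its result,
	// which ends the caller's range loop.
	go func() {
		wg.Wait()
		close(ch)
	}()

	return ch
}

func main() {
	// fakeFetch is a stand-in that pretends every URL answers 200.
	fakeFetch := func(string) int { return 200 }

	urls := []string{"https://www.google.com", "https://www.facebook.com"}
	for res := range generateStatuses(urls, fakeFetch) {
		fmt.Printf("url: %s - status_code: %d\n", res.Url, res.StatusCode)
	}
	fmt.Println("Scan complete.")
}
```

The design choice here is that only the generator closes the channel, so the caller never needs to know how many URLs were submitted.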

Limitations and Use Cases

A few people have written back asking for more clarification on the use cases for this pattern:

  1. Would this pattern work if I want to check 1000 URLs? What about a million?

    Each goroutine starts with a stack of roughly 2 KB, so spinning up 1,000 goroutines costs about 2 MB, which is not too bad. Spinning up 1,000,000 goroutines means your program would need to find about 2 GB of memory for the stacks alone; depending on your setup this may not be a problem.

  2. Is it problematic to spawn an unknown, potentially huge number of goroutines, or will Go handle the “limiting” in this case by itself?

    I myself have never run up against a limit from spinning up too many goroutines. However, that doesn't mean such a problem doesn't exist, and it would be best practice to use some kind of worker pool to limit the number of HTTP requests that you fire at one time. I would strongly recommend reading https://brandur.org/go-worker-pool for a deeper dive into this. However, if you have a scenario where your program needs to make 10 to 100 requests at once, the above code should work just fine.
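
As a rough sketch of the worker pool idea, a fixed number of goroutines can pull URLs from a jobs channel, so at most `workers` requests are in flight at once. The names `checkAll` and `fetch`, and the worker count of 10, are assumptions for illustration; `fetch` stands in for the HTTP request so the sketch runs offline.

```go
package main

import (
	"fmt"
	"sync"
)

type Result struct {
	Url        string
	StatusCode int
}

// checkAll is a hypothetical bounded generator: only `workers` worker
// goroutines exist, no matter how many URLs are passed in.
func checkAll(urls []string, workers int, fetch func(string) int) <-chan Result {
	jobs := make(chan string)
	results := make(chan Result)
	var wg sync.WaitGroup

	// Start the fixed pool of workers.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range jobs {
				results <- Result{Url: url, StatusCode: fetch(url)}
			}
		}()
	}

	// Feed the jobs, then signal that no more are coming.
	go func() {
		for _, url := range urls {
			jobs <- url
		}
		close(jobs)
	}()

	// Close results once every worker has drained the jobs channel.
	go func() {
		wg.Wait()
		close(results)
	}()

	return results
}

func main() {
	urls := make([]string, 1000)
	for i := range urls {
		urls[i] = fmt.Sprintf("https://example.com/page/%d", i)
	}

	fakeFetch := func(string) int { return 200 } // stand-in for http.Get

	count := 0
	for range checkAll(urls, 10, fakeFetch) {
		count++
	}
	fmt.Println("checked:", count)
}
```

The caller still just receives from a channel, as with the generator; the difference is that memory use and open connections are bounded by the pool size rather than by the length of the URL list.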