Unraveling the AI Arms Race: Commercial vs. Open-Source – RomanTech – Software development, AI and blockchain blog

A leaked document from a Google researcher argues that neither Google nor OpenAI is currently positioned to win the ongoing AI arms race. It suggests that Meta could emerge as the winner, thanks to the rapid adoption of its LLaMA model by the open-source community. The document highlights the progress within this community, where innovations are emerging at a staggering pace. This open-source renaissance is fueled by active public involvement and by recent, cost-effective methods for fine-tuning models quickly, such as Low-Rank Adaptation (LoRA). The result is a rapid stream of ideas and iterative advances from individuals and institutions worldwide that is outpacing the rate of innovation at the larger industry players. Consequently, the document urges Google to protect its position by following Meta’s lead: relinquishing control over its models and collaborating openly with the open-source community.
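
To give a rough sense of why LoRA is cheap: it freezes a pretrained weight matrix W (d × k) and trains only two small matrices, B (d × r) and A (r × k), whose product is added to W. The sketch below simply counts trainable parameters for one such matrix. The 4096 × 4096 size and rank 8 are illustrative assumptions, not figures from the leaked document or from any specific model.

```go
package main

import "fmt"

// fullParams returns the number of trainable weights when a d×k matrix is
// fine-tuned directly.
func fullParams(d, k int) int { return d * k }

// loraParams returns the number of trainable weights when the d×k matrix is
// frozen and only the low-rank factors B (d×r) and A (r×k) are trained.
func loraParams(d, k, r int) int { return r * (d + k) }

func main() {
	// Illustrative sizes only (assumed, not taken from a real model).
	d, k, r := 4096, 4096, 8
	full, lora := fullParams(d, k), loraParams(d, k, r)
	fmt.Printf("full: %d, LoRA: %d, reduction: %dx\n", full, lora, full/lora)
}
```

With these assumed sizes the low-rank factors hold 65,536 trainable weights against 16,777,216 for the full matrix, a 256-fold reduction for that one layer.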

The Experiment

These remarks piqued my curiosity, driving me to investigate just how close open-source large language models are to matching their top commercial counterparts. Thus, I embarked on a project to compare the top three commercial models against the top five open-source models.

Commercial Models

OpenAI ChatGPT-4
Google Bard, based on PaLM 2
Anthropic Claude, available via its new app for Slack

Open-Source Models

LMSYS Vicuna-13b, fine-tuned from Meta’s LLaMA
Berkeley AI (BAIR) Koala-13b, fine-tuned from Meta’s LLaMA
Stanford Alpaca-13b, fine-tuned from Meta’s LLaMA
Raven RWKV-4-Raven-14B, an RNN with transformer-level LLM performance
Nomic AI GPT4All-J v1.3-groovy, fine-tuned from GPT-J

Before sharing the results, a few caveats. First, my selection of the top commercial and open-source models is based on my own research and is subjective. Moreover, there are several fine-tuned versions of the same open-source base model, and performance varies significantly depending on the fine-tuning. As you can see above, many of the top-performing open-source models are based on Meta’s LLaMA model. Second, my tests were very brief, unscientific, and driven more by personal curiosity than by any established evaluation metrics. They focused on each model’s ability to analyze, describe, and improve code. More specifically, I gave each model the Go code below, which I wrote for logging API requests, and asked three questions: first, to explain what the code does; then, a very specific question to test its understanding of a subtle performance optimization; and finally, to suggest improvements.

package logger

import (
	"net/http"
	"net/http/httptest"
	"strings"
	"time"

	"github.com/go-chi/chi/v5/middleware"
)

// Custom middleware logging class for Chi
func ChiLoggerMiddleware() func(next http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {

			if Log().IsEnabled(DebugLevel) {
				ww := middleware.NewWrapResponseWriter(w, r.ProtoMajor)
				start := time.Now()
				var rec *httptest.ResponseRecorder
				//log response body when tracing
				if Log().IsEnabled(TraceLevel) {
					rec = httptest.NewRecorder()
				}
				defer func() {
					reqHdrs := make(map[string]string)
					for key, values := range r.Header {
						reqHdrs[key] = strings.Join(values, ", ")
					}
					respHdrs := make(map[string]string)
					for key, values := range ww.Header() {
						respHdrs[key] = strings.Join(values, ", ")
					}

					if rec == nil {
						logWebServerResponse(r, start, ww.Status(), reqHdrs, respHdrs, "")
					} else {
						logWebServerResponse(r, start, ww.Status(), reqHdrs, respHdrs, rec.Body.String())
					}
				}()

				if rec == nil {
					next.ServeHTTP(ww, r)
				} else {
					next.ServeHTTP(rec, r)
					//copy the recorded response to the original response writer
					for k, v := range rec.Header() {
						w.Header()[k] = v
					}
					w.WriteHeader(rec.Code)
					rec.Body.WriteTo(ww)
				}

			} else {
				next.ServeHTTP(w, r)
			}
		})
	}
}

func logWebServerResponse(r *http.Request, start time.Time, status int,
	reqHdrs map[string]string, respHdrs map[string]string,
	respBody string,
) {
	logEvent := Log().ZDebug().
		Int64("msec", time.Since(start).Milliseconds()).
		Str("method", r.Method).
		Str("path", r.URL.Path).
		Str("qryStr", r.URL.RawQuery).
		Int("status", status).
		Interface("reqHdrs", reqHdrs).
		Interface("respHdrs", respHdrs)

	if len(respBody) > 0 {
		logEvent.Str("respBody", respBody)
	}
	logEvent.Msg("Request log")
}

The Results

There is no denying that the commercial models outperformed their open-source counterparts, yet some of the open-source models held their ground quite well. All commercial models were generally better at explaining the code, and all of them could give the reasoning behind the performance optimization. However, there was significant disparity among the open-source models. Vicuna performed best, being the only open-source model able to explain the reason for the performance optimization. In contrast, the Stanford Alpaca model fell short, providing almost useless answers. On the final question about suggesting improvements, there was little to separate the models, except for Alpaca, which offered no useful information. In general, the proposed improvements were a mixed bag: some useful, others incorrect or nonsensical.

You can find the actual chat dialogs with the responses each model provided in the Detailed Results section below the verdict. All questions are in bold, and the Go code was provided as part of the first question, but has been omitted from the dialogs.

The Verdict

I was pleasantly surprised at the state of the open-source models for this use case, as the top models are not far behind their commercial counterparts. While it is anyone’s guess whether open-source models will fully rival commercial models from the big players, their continued improvement is a certainty and the pace of improvement will undoubtedly accelerate, spurred by the fervor in the field. Open-source models are likely to become good enough for many applications in the very near future, broadening the scope of AI’s influence and accessibility. The democratization of AI through open-source models has the potential to drive a new wave of innovation, as diverse perspectives and creative problem-solving approaches are introduced. I eagerly anticipate their continued evolution and the transformative impact they could have across various industries and applications.

Detailed Results

OpenAI ChatGPT-4

You are an expert software developer. Tell me what the Go code below does.

This Go code provides a middleware function for the Chi router, a lightweight, idiomatic and composable router for building Go HTTP services. The middleware function is used for logging HTTP requests and responses.

The ChiLoggerMiddleware function is a middleware factory function that returns a middleware function. This middleware function wraps the HTTP handler with additional functionality, specifically logging of HTTP request and response information.

If the logging level is at least DebugLevel, the middleware wraps the http.ResponseWriter with middleware.NewWrapResponseWriter, which provides additional information such as the HTTP status code of the response. It also records the start time of the request.

If the logging level is at least TraceLevel, the middleware additionally records the entire response using httptest.ResponseRecorder. This is useful for debugging, but could be expensive in terms of memory usage for large responses.

After the request has been processed, the middleware logs various information about the request and response, such as the time it took to process the request, the HTTP method and path, the query string, the status code, and the request and response headers. If the response was recorded, it also logs the response body.

If the response was recorded, it's written to the original response writer after the request has been processed. This is done by copying the headers, setting the status code, and writing the body.

If the logging level is less than DebugLevel, the middleware simply passes the request to the next handler without any additional processing.

The logWebServerResponse function is a helper function that logs the information about the request and response using structured logging. Structured logging is a type of logging that logs events in a structured, often machine-readable format. It's commonly used in modern applications due to its advantages over traditional plain-text logging, such as better support for searching and analyzing logs.

Please note that the Log function, DebugLevel, TraceLevel, and ZDebug method are not defined in the provided code. I assume they're part of a custom logging system used in the rest of your application.

Why is the NewRecorder being used only if logging is enabled at the trace level?

httptest.NewRecorder() is used to create an http.ResponseRecorder, which is an implementation of http.ResponseWriter that records its mutations for later inspection in tests. In other words, it captures or "records" the response, including headers and body content.

In the context of this logger middleware, it's used to capture the response body so that it can be logged when logging is enabled at the TraceLevel.

However, recording the response body can potentially use a significant amount of memory, especially for large responses. For this reason, it's only done when the logging level is set to TraceLevel. This is typically a high level of logging verbosity used only for debugging or tracing execution, where the potential performance and memory usage impact is acceptable because detailed information is needed.

In production environments, the logging level is typically set lower (e.g., InfoLevel or WarnLevel) to avoid this impact while still logging important information. Thus, the use of httptest.NewRecorder() is conditional on the logging level to provide a balance between the need for detailed logging information and the need for efficient use of system resources.

How would you improve the code?

The code is well-structured and follows the common patterns for HTTP middleware in Go. However, there are always opportunities for improvements and optimizations. Here are a few suggestions:

Avoid creating maps for headers when not necessary: In the current code, maps for request and response headers are always created, regardless of whether they are logged. This could be optimized by only creating the maps if they are needed, i.e., if the logging level is at least DebugLevel.

Reduce redundancy in logging decision: The decision to log the response body is based on whether the logging level is TraceLevel. This is checked twice, once when deciding whether to create the ResponseRecorder and once when deciding what to log. It would be more efficient to make this decision only once.

Move response body logging into logWebServerResponse: Instead of deciding whether to log the response body in the middleware function, this decision could be moved into the logWebServerResponse function, which would make the middleware function simpler and more readable.

Add error handling for rec.Body.WriteTo(ww): In the current code, potential errors from the WriteTo function are ignored. It would be a good practice to handle these errors appropriately.

Google Bard

You are an expert software developer. Tell me what the Go code below does.

The Go code you provided is a middleware function that can be used to log HTTP requests and responses. The middleware function works by first checking if the current logging level is enabled. If it is, the middleware function then creates a new response writer and starts a timer. The middleware function then calls the next middleware function, which handles the actual request. After the request is handled, the middleware function stops the timer and logs the request and response information. The information logged includes the request method, path, query string, status code, request headers, response headers, and response body. If the logging level is set to trace, the middleware function will also log the response body.

Why is the NewRecorder being used only if logging is enabled at the trace level?

The NewRecorder is being used only if logging is enabled at the trace level because it can be expensive to create and use. The NewRecorder is a wrapper around an http.ResponseWriter that records all of the mutations made to it. This can be useful for debugging purposes, but it can also be a performance bottleneck. Therefore, it is only used when logging is enabled at the trace level, which is the most detailed logging level.

Here is a breakdown of the pros and cons of using the NewRecorder:

Pros:

- The NewRecorder can be useful for debugging purposes.
- It can be used to record all of the mutations made to an http.ResponseWriter.

Cons:

- The NewRecorder can be expensive to create and use.
- It can be a performance bottleneck.

Overall, the NewRecorder is a useful tool for debugging purposes, but it should be used sparingly to avoid performance bottlenecks.

How would you improve the code?

Here are some ways to improve the code:

- The middleware function could be made more efficient by only logging the request and response information if the logging level is enabled.
- The middleware function could be made more flexible by allowing the user to specify the logging level.
- The middleware function could be made more robust by adding error handling.

Here is an example of how the middleware function could be improved:

...(Bard provided example code)

Anthropic Claude

You are an expert software developer. Tell me what the Go code below does.

This Go code defines a middleware function for the Chi router library. The middleware does the following:

- Logs request details at the Debug log level, including method, path, query string, request headers, response headers, status code and response time.
- If the log level is Trace, it also records the full response body. It does this using a httptest.ResponseRecorder.
- It checks the log level before performing any logging, to avoid overhead if logging is disabled.
- It uses a middleware.WrapResponseWriter to get the response headers, status code, etc from the wrapped response writer.
- It logs the request details at the end of the request using a defer function.
- If response recording was enabled, it copies the details from the httptest.ResponseRecorder to the original response writer, so the response is unchanged.

So in summary, this is a logging middleware for the Chi router that can log request details, response details and full response bodies at the Debug and Trace log levels respectively.

Why is the NewRecorder being used only if logging is enabled at the trace level?

The httptest.NewRecorder() is only used if TraceLevel logging is enabled because it buffers the full response body in memory. If logging is disabled or only DebugLevel is enabled, there's no need to buffer the entire response body, since it won't be logged anyway.

Buffering the response body in memory could negatively impact performance and memory usage. So, the middleware only enables response body recording and buffering if the TraceLevel log level is explicitly enabled. At the DebugLevel, it just logs request and response headers, status code and timing information, without the full body.
So it avoids any unnecessary overhead and only logs response bodies when explicitly needed at the Trace log level.

How would you improve the code?

Here are some ways I would improve this code:

1. Add configuration for the log levels to enable. Currently it hard codes DebugLevel and TraceLevel. It would be better to allow the levels to be configured.
2. Add the option to log to files instead of just using the zap logging library. The user may want the logs written to disk.
3. Add the option to log request bodies as well, not just response bodies. This could be useful for debugging issues.
4. Add metrics collection in addition to just logging. Things like request counts, response times, status codes counts could be useful metrics to gather.
5. Allow the logger to be swapped out for a different logging library if needed. Currently it is tightly coupled to the zap library.
6. Add an option to compress or redact logged response bodies, in case they contain sensitive data.
7. Make the response recording size configurable instead of buffering the entire body in memory unconditionally. This could improve performance if large response bodies are not needed to be logged.
8. Return the http.Handler instead of an http.HandlerFunc from the middleware function. This would allow the middleware to be used with http.ListenAndServe() directly, instead of needing to wrap another http.Handler.
9. Use a sync.Pool to reuse *httptest.ResponseRecorder instances instead of creating a new one for each request. This could improve performance slightly.
10. Add server timing metrics to the response headers when response recording is enabled. This would provide insight into the overhead the logging is adding.

Those are a few improvements I would suggest to make this a more robust logging middleware. Let me know if you have any other questions!

LMSYS Vicuna-13b

You are an expert software developer. Tell me what the Go code below does.

This Go code defines a middleware function that logs web server responses. The middleware function is a wrapper around the next handler passed to it, which means that the response writer and request are passed to the next handler.

When the IsEnabled method of the Log struct returns true for the DebugLevel key, the middleware function starts a timer and creates a httptest.ResponseRecorder to record the response. The rec variable is used to store the response recorder.

The middleware function then sets the ww variable to a new http.ResponseWriter with the same protocol major version as the incoming request.

The middleware function then logs the response body and headers if the TraceLevel key is enabled in the Log struct. This is done by creating a new httptest.ResponseRecorder and writing the response to it. The WriteTo method of the response recorder is used to write the recorded response to the original response writer.

If the Recorder is not nil, the next handler is called with rec as the response writer.

The logWebServerResponse function is called to log the response.

The logWebServerResponse function logs the response headers, method, path, query string, status code, and response body. It uses the ZDebug method of the Log struct to log these events with a timestamp and debug message.

Finally, the logWebServerResponse function logs the response body as a string if it is not nil.

Why is the NewRecorder being used only if logging is enabled at the trace level?

The NewRecorder is used only when logging is enabled at the TraceLevel because it is used to log the response body. The NewRecorder is a helper function provided by the httptest package that creates a new http.ResponseRecorder with a default Body and Write method.

In the code, if Log().IsEnabled(TraceLevel) returns true, a new httptest.ResponseRecorder is created and assigned to the rec variable. This is used to log the response body. If Log().IsEnabled(TraceLevel) returns false, the Recorder is not created and assigned to rec, and the response body is not logged.

This is because logging the response body can be considered an expensive operation, as it involves writing the response to a new buffer. To minimize the performance impact, the code only logs the response body when it is necessary.

How would you improve the code?

Here are some possible ways to improve this code:

- Separate concerns: The code combines the middleware logic with the logging logic. It would be better to separate these concerns by defining separate functions for the middleware and logging logic. This would make the code easier to read and test.
- Use context: The code uses the r variable to access the incoming request, which is passed to the next handler. Instead of passing r to the next handler, it would be better to use context to provide the request to the next handler. This would make the code more modular and easier to test.
- Avoid unnecessary checks: The code checks if the Recorder is nil before logging the response body. Since the Recorder is always created before the logging logic, the check for nil is unnecessary. Removing the check would simplify the code.
- Use a deferred function: The logWebServerResponse function is called at the end of the middleware function. It would be better to use a deferred function to ensure that the response is logged before the next handler is called. This would ensure that the response is logged even if an error occurs in the next handler.
- Error handling: The code does not handle errors in the logging or middleware logic. It would be better to handle errors and return an appropriate HTTP status code to the client.
- Add comments: The code could benefit from additional comments explaining what each part of the code does and why it is used. This would make the code easier to understand and maintain.

BAIR Koala-13b

You are an expert software developer. Tell me what the Go code below does.

This code defines a middleware for logging web server responses. When a request is made to a web server, the middleware adds extra information to the request headers, including the request method, request path, query string, response status code, and response body. This information is used to generate a log of the response that includes the request context. The response is also copied to the original response writer to be sent back to the client. If the response is recorded using the httptest.NewRecorder function, it is also logged.

Why is the NewRecorder being used only if logging is enabled at the trace level?

The httptest.NewRecorder function is used to record the response that is sent back to the client. If logging is enabled at the trace level, the response is recorded and logged along with the request context. If logging is not enabled at the trace level, the response is not recorded and simply sent back to the client. By only recording responses when logging is enabled at the trace level, the logging mechanism can be triggered by the most important or informative events, which are those that occur at the trace level. This approach reduces the amount of unnecessary logging and improves the performance of the logging system.

How would you improve the code?

One potential improvement to the code could be to add more context to the logging message. For example, the Log().ZDebug(). method logs the response status code, but it does not provide a description of the status code. It would be more useful to include the status code description, such as "Ok", "Created", "Bad Request", etc. Additionally, the code currently uses the Str("method", r.Method), Str("path", r.URL.Path), and Str("qryStr", r.URL.RawQuery) methods to include the request method, request path, and query string in the logging message. These methods use the string representation of the values, which can make the logging message difficult to read and understand. It would be more useful to use the ,, or > operator to concatenate these values, like so: Log().ZDebug().Int64("method", r.Method).Str("path", r.URL.Path).Str("qryStr", r.URL.RawQuery).Msg("Request log").

Another potential improvement could be to add more context to the response body logging message. Currently, the Str("respBody", respBody) method is used to include the response body in the logging message. However, it would be more useful to include the response body in a way that is easy to understand, such as by including the first few characters or by providing a summary of the response. For example, the Stdout.Write() function could be used to write the response body to the console, like so: Stdout.Write(respBody[:100]).

Stanford Alpaca-13b

You are an expert software developer. Tell me what the Go code below does.

That code is for a custom middleware logging class for Chi. It enables logging of requests and responses for debugging and monitoring purposes. It logs the request headers, response headers, and body, as well as timing information. It also enables logging of the request and response bodies if the Log() object's tracing level is enabled.

Why is the NewRecorder being used only if logging is enabled at the trace level?

That is because it is more efficient to use the NewRecorder object when logging at the trace level, as it will create a new response object which will make it easier to inspect the response body for debugging purposes. Additionally, using the NewRecorder will also prevent the response from being written to the client, which could cause an error if the response is large.

How would you improve the code?

The code can be improved by making it more efficient. This can be done by using the NewRecorder object only when logging is enabled at the trace level and using the WrapResponseWriter object for all other levels. This will prevent the need to create a new response object each time a request is made. Additionally, the request and response headers can be merged into a single map which can be used to log the request and response headers together. This will reduce the number of maps that need to be created and also improve the logging efficiency.

Raven RWKV-4-Raven-14B

You are an expert software developer. Tell me what the Go code below does.

This is a piece of code that implements the log.ChiLoggerMiddleware interface in Go. It defines a function named Log that takes a request and a response and returns a response. The function takes a http.ResponseWriter and an http.Request as input parameters and it wraps them with a ChiLoggerMiddleware object. The ChiLoggerMiddleware object defines a wrap function that takes three parameters: start which is a timestamp, w which is the http.ResponseWriter and r which is the http.Request. The middleware also sets some global variables, Log() and Debug().

The Log() function logs the request and response details using the start timestamp and the response status code. It also logs the reqHdrs and respHdrs map that contains the request and response headers. The Log() function calls the Log().ZDebug().Int64("msec", time.Since(start).Milliseconds()).Int64("reqHdrsMap", map[string]string{"reqHdrs": reqHdrs}).Int64("respHdrsMap", map[string]string{"respHdrs": respHdrs}).Int64("respBody", respBody) function call to log the response header and body.

The ChiLoggerMiddleware() function is a middleware that logs the incoming requests with the Log() function. It takes two parameters: start which is a timestamp and w which is a http.ResponseWriter object.

The function first creates a new middleware with the ChiLoggerMiddleware() function. It then wraps the http.ResponseWriter and http.Request objects with the ChiLoggerMiddleware() and sets the "Debug" level for the logging. It also logs the incoming request details to the start timestamp.

Finally, the ChiLoggerMiddleware() function calls the Log().ZDebug().Int64("msec", time.Since(start).Milliseconds()).Int64("reqHdrsMap", map[string]string{"reqHdrs": reqHdrs}).Int64("respHdrsMap", map[string]string{"respHdrs": respHdrs}).Int64("respBody", respBody) function to log the incoming request with the (response got cut off here)


Why is the NewRecorder being used only if logging is enabled at the trace level?

The NewRecorder function is used only if the logging package is enabled at the "trace" level. The logging package defines a set of loggers that can be used to log different types of messages, such as debug, info, warning, error, fatal, and panic. The NewRecorder function is a helper function that creates a new zap.Logger object and returns it.
The zap.Logger object is used to log messages to a Zap logger. The Zap logger is a high-performance logger that is designed to be flexible and customizable. It can be used to log messages to different types of backends, such as files, databases, and sockets.
The NewRecorder function is used to create a new zap.Logger object and set some options, such as the log level, the log format, and the log file path. By default, the zap.New() function creates a new zap.Logger object with a "trace" log level and a zap.File backend that writes log messages to a file.

In summary, the NewRecorder function is used to create a new zap.Logger object that can be used to log messages. The zap.New() function is called with the trace log level and the zap.New() function creates a new zap.Logger object with the "trace" log level. The NewRecorder function is called with the zap.New object as an argument and it creates a new zap.Logger object with the "trace" level.


How would you improve the code?

There are several ways to improve the code, depending on the specific requirements and goals of the program. Here are a few suggestions:

1. Use consistent naming conventions: The code uses different naming conventions for variables, functions, and packages. It would be helpful to use consistent naming conventions to make the code more readable and easier to understand.
2. Use a testing framework: The code could benefit from using a testing framework, such as Go's built-in testing package, to ensure that the code works as expected. This would help to catch bugs and improve the reliability of the code.
3. Use error handling: The code currently does not use error handling to catch and handle errors. It would be helpful to add error handling to the code to make it more robust and reliable.
4. Use configuration files: The code currently uses hardcoded values for the log levels and other configuration options. It would be helpful to use a configuration file to allow users to easily configure the program without modifying the code.
5. Use a logger implementation: The code currently uses a hardcoded implementation of the zap.Logger interface. It would be helpful to use a logging package that supports multiple backends, such as the zap.New interface, to make the code more flexible.
6. Use a configuration file: The code currently hardcodes the log level values in the code. It would be helpful to use a configuration file to allow users to customize the log levels and other settings.
7. Reduce duplication: The code currently has multiple functions that share similar code, such as the Log().ZDebug().Int64("msec", time.Since(start).Milliseconds()).Int64("dur", start.Sub(start).Nanoseconds()) block. It would be helpful to reduce the duplication by factoring out the common code into a separate function or method.
8. Use a dependency injection framework: The code currently uses global variables to set the log level and other configuration options. It would be helpful to use a dependency injection framework, such as the "github.com/google/go-github/github" package, to inject these values into the code.
9. Test the code: The code currently uses a testing package to test the Log() function. It would be helpful to test the code using a testing framework to ensure that it works as expected and to catch bugs early in the development process.

Overall, these suggestions are just a starting point and the specific... (response got cut off)

Nomic AI GPT4All-J v1.3-groovy

You are an expert software developer. Tell me what the Go code below does.

This is a custom middleware function that logs HTTP requests and responses in a web server. The `Chiloggemiddleware` function returns another middleware function that wraps the existing handler function with additional functionality. This new middleware function logs certain information about each request and response, including the method used, the path of the requested resource, and any query parameters or headers present in the request or response. Additionally, it copies the recorded response to the original response writer so that subsequent requests can be logged as well.

Why is the NewRecorder being used only if logging is enabled at the trace level?

The reason why the `NewRecorder` is not used when logging is disabled at the trace level is because it requires access to the underlying `httptest` library, which may not be available during testing. Instead, the `Chiloggemiddleware` function uses the built-in `httptest` library's `ResponseRecorder` object to record the response from the wrapped handler function.

How would you improve the code?

Here are some suggestions to improve the code:

1. Use interfaces instead of directly passing struct fields between functions. This makes the code more modular and easier to maintain.
2. Add comments explaining the purpose of each section of the code. This will help others understand your intent and avoid potential issues down the line.
3. Consider adding error handling for cases where the log event cannot be created due to errors occurring within the middleware function.