Infrastructure
In a production environment, the models would be hosted on a server, with the API integrated into the electronic health record and other systems. Because that engineering work is beyond the primary scope of this project, we instead run the models locally in the browser.
The models, built primarily in PyTorch and scikit-learn, are converted to the ONNX format, which lets us run them anywhere the ONNX Runtime is available. Specifically, we use the ONNX Runtime for JavaScript: the models are loaded into the client's browser, where they are run by the WebAssembly build of the runtime.
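The load-and-run flow described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: `RuntimeLike`, `SessionLike`, `TensorLike`, and `predict` are hypothetical names declared locally so the example stays self-contained, and the model is assumed to take a single float32 input of shape [1, n]. The comment inside the block shows how the adapter might be backed by the `onnxruntime-web` package.

```typescript
// Hypothetical, trimmed-down shapes for the pieces of a runtime this sketch uses.
interface TensorLike {
  readonly data: ArrayLike<number>;
  readonly dims: readonly number[];
}

interface SessionLike {
  readonly inputNames: readonly string[];
  readonly outputNames: readonly string[];
  run(feeds: Record<string, TensorLike>): Promise<Record<string, TensorLike>>;
}

interface RuntimeLike {
  createSession(modelUrl: string): Promise<SessionLike>;
  makeTensor(data: Float32Array, dims: number[]): TensorLike;
}

// With the real package, the adapter might look like this (an assumption,
// not taken from the project):
//   import * as ort from "onnxruntime-web";
//   const runtime = {
//     createSession: (url) =>
//       ort.InferenceSession.create(url, { executionProviders: ["wasm"] }),
//     makeTensor: (data, dims) => new ort.Tensor("float32", data, dims),
//   };

// Load a model, feed it one row of features, and return the first output.
async function predict(
  runtime: RuntimeLike,
  modelUrl: string,
  features: number[],
): Promise<number[]> {
  const session = await runtime.createSession(modelUrl);
  // Assumed input layout: a batch of one row with features.length columns.
  const input = runtime.makeTensor(Float32Array.from(features), [1, features.length]);
  const outputs = await session.run({ [session.inputNames[0]]: input });
  return Array.from(outputs[session.outputNames[0]].data);
}
```

Passing the runtime in as a parameter is only a convenience for this sketch: it keeps the example free of external dependencies and makes it easy to exercise with a stand-in runtime.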
To accommodate users' hardware and network limitations, we have tried to keep the models as small as possible while still maintaining a high level of accuracy. The larger models were quantized from 32-bit to 16-bit floating-point numbers, which halves the storage per weight and so cuts the model size roughly in half. Further work could reduce the size of the models, for example by pruning non-essential weights or using a smaller model architecture; however, given the goal of this project, we felt it was more important to maintain a high level of accuracy. We also use Transformers.js to help with pre- and post-processing of the model inputs and outputs, such as tokenization, padding, and softmax.
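To make the padding and softmax steps concrete, here is a minimal hand-rolled sketch of what they compute. In the project these steps are delegated to the library; `padTokens` and `softmax` are hypothetical helper names, and the padding id of 0 is an assumption (the real value depends on the tokenizer).

```typescript
// Pad (or truncate) a token-id sequence to a fixed length and build the
// matching attention mask: 1 marks a real token, 0 marks padding.
function padTokens(
  ids: number[],
  length: number,
  padId = 0, // assumed padding id; tokenizer-specific in practice
): { inputIds: number[]; attentionMask: number[] } {
  const kept = ids.slice(0, length);
  const padCount = length - kept.length;
  return {
    inputIds: [...kept, ...new Array<number>(padCount).fill(padId)],
    attentionMask: [...kept.map(() => 1), ...new Array<number>(padCount).fill(0)],
  };
}

// Turn raw model scores (logits) into probabilities that sum to 1.
// Subtracting the maximum first avoids overflow in Math.exp.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

For example, `padTokens([5, 6, 7], 5)` yields the ids `[5, 6, 7, 0, 0]` with the mask `[1, 1, 1, 0, 0]`, and `softmax([0, 0])` yields `[0.5, 0.5]`.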