GitHub: scott-a-miller
I've been writing software professionally for over 20 years. For most of that time I've been working at a chem-informatics software company in Columbus (Leadscope), while consulting with a design group in New York (Tonic Group). I have been lead architect on dozens of projects, including scientific desktop applications, online commerce sites, B-to-B web services, and third-party software integrations. As a principal engineer working at a small company, I'm involved with all aspects of software development, including coding, feature definition, UI design, architecture, and technology evaluation.
Some things I've been using most frequently:
Other things I've used in production, but infrequently, or not recently:
Work history:
Formal education:
A sample of projects:
Leadscope's core product is a flexible chem-informatics decision support platform with data warehousing capabilities. I worked closely with toxicologists and other domain expert stakeholders to develop a series of novel tools for predictive toxicology and regulatory submissions. My main contribution was to refactor the client architecture to be more component-based. This facilitated development of new interfaces to support our pivot from high-throughput screening to in silico predictive toxicology.
In recent years, I've taken ownership of more aspects of the software: devops, maintenance, migrations, packaging, and new feature development. For example, I developed RESTful web services, implemented DAOs for PostgreSQL, and built all of the user interface extensions.
As part of their ChemBioTox database curation, the NIEHS asked us to run 300+ million algorithmically created chemical compounds through our full suite of toxicity models and expert alerts. I designed and implemented an architecture for orchestrating 100 AWS servers running in parallel to perform the 40+ billion predictions in a few months.
I extended the web services of the Leadscope Enterprise Server with RESTful resources, which provide access for a new React-based web client I developed for toxicity database searching and statistical model application.
KNIME is an open-source data analytics workflow platform. It allows analysts to configure chains of data sources, transformations, and visualizations in an intuitive graphical interface. A number of chem-informatics scientists have adopted KNIME as a way of creating transparent and reproducible analytic processes.
Our Leadscope Enterprise product provides web services for various chem-informatics tasks; e.g. text and chemistry-based searching, toxicity prediction. I created custom KNIME nodes allowing customers to integrate these services into their data workflows.
Leadscope's products generate a wide variety of reports, most of which are highly dynamic. For the first few years we worked with Sitraka JClass, and later JasperReports. Both required a lot of manual coding for our complex reports. To expedite this process, I created a reporting framework in JRuby using the iText library.
It includes an internal DSL for specifying templates, cascading styles, and hooks for testing, and it seamlessly supports both RTF and PDF output.
Later, as the number of reports continued to grow, maintenance and code reuse became more time-consuming, partly due to Ruby's dynamic typing. I then ported the framework to Kotlin, a statically typed language that, remarkably, also supports the DSL features we were using in Ruby.
Insilicofirst was a collaborative effort among several providers of chemistry-based predictive toxicity software. The main product of that effort was a web portal somewhat analogous to Expedia. The user provides a chemical structure of interest (by connection table, name, or id). The portal submits the structure to participating vendors whose web services search for exact and similar matches in toxicity databases, and perform in silico predictions. The portal collates and summarizes the available information. The user can then purchase the detailed results via credit card for download.
I was responsible for the overall architecture, implemented the portal portion, and led the discussions defining the vendor web services. Each vendor was then responsible for implementing and hosting their own web services.
A similar platform was later created as part of an SBIR grant for NIEHS. In that context, the portal and vendor services were installed at the customer site, and the retail portion was dropped.
ToxML is an open data exchange standard for a variety of toxicity data and structure-related information. Leadscope led the initial consortium defining the standard, funded in part by a grant from NIST. We adopted ToxML as the fundamental storage mechanism for toxicity data in our products. More recently, we've extended the reach of ToxML through the creation of The ToxML Standards Organisation. Contributing to that effort, we created a community website where people can review the current schema and directly recommend changes and additions. The site lets users view all recommended changes and lets curators manage releases, i.e. accept changes, tag releases, and export the related artifacts (an XSD schema and a reference parser library).
I designed and implemented the UI, led requirements discussions, and implemented the export functions for the schema and reference parser library.
A part of supporting ToxML is a tool for visualizing, editing, and creating ToxML documents. This was surprisingly challenging with many hundreds of unique fields across the dozen or so study types and structure-related information. A prior implementation based on JGoodies proved too complex to maintain, especially at that early, fluid point in ToxML's definition. When I took over the project, I shifted the approach to first model ToxML with abstractions for collections, primitive types, and composite objects, implement editor components by type, and move all of the form and vocabulary definitions to XML documents. This allowed ToxML to continue to change and expand without complicating the base code.
More recently, I extended the tool into a repository application: splitting it into client and server components, and adding version control (history, differencing, and tags).
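As a rough illustration of the type-driven editor approach described above (not the actual tool's code), the sketch below models a document with primitive, collection, and composite abstractions, loads a form definition from XML, and picks an editor by node type. The XML element names, the `Study`/`Treatment` fields, and the `describe_editor` helper are all hypothetical and are not taken from the ToxML schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical form definition; these names are NOT from the real ToxML schema.
FORM_XML = """
<composite name="Study">
  <primitive name="species" type="string"/>
  <primitive name="durationDays" type="int"/>
  <collection name="treatments">
    <composite name="Treatment">
      <primitive name="dose" type="float"/>
    </composite>
  </collection>
</composite>
"""

@dataclass
class Primitive:
    name: str
    type: str

@dataclass
class Collection:
    name: str
    item: object  # the model node describing each element of the collection

@dataclass
class Composite:
    name: str
    fields: list

def parse(elem):
    """Build the abstract model from an XML form definition."""
    if elem.tag == "primitive":
        return Primitive(elem.get("name"), elem.get("type"))
    if elem.tag == "collection":
        return Collection(elem.get("name"), parse(elem[0]))
    if elem.tag == "composite":
        return Composite(elem.get("name"), [parse(child) for child in elem])
    raise ValueError(f"unknown element: {elem.tag}")

def describe_editor(node, depth=0):
    """Choose an editor component by node type (here, just a text description)."""
    pad = "  " * depth
    if isinstance(node, Primitive):
        return [f"{pad}{node.type} field: {node.name}"]
    if isinstance(node, Collection):
        return [f"{pad}list editor: {node.name}"] + describe_editor(node.item, depth + 1)
    lines = [f"{pad}form: {node.name}"]
    for child in node.fields:
        lines += describe_editor(child, depth + 1)
    return lines

model = parse(ET.fromstring(FORM_XML.strip()))
print("\n".join(describe_editor(model)))
```

Because the editors are keyed to the three abstractions rather than to individual fields, adding or renaming a field in this sketch only means editing the XML definition.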
In 2004, the EU banned animal testing for cosmetics. The SEURAT-1 research initiative was then created to advance alternative methods for assessing safety. Leadscope joined the subproject, Toxbank, responsible for warehousing the newly generated toxicity data. The project has some interesting challenges in that the data being collected comes from new techniques and study types, whose varied schemas are still in flux. To address this, flexible data representations were adopted (e.g. ISA-Tab and RDF). My personal role has been the user interface: a Play! application that accesses several repositories through RESTful interfaces.
For a few of the older web applications, I used a basic framework that I created. It's a handful of classes built on top of raw Java servlets that handle routing, serialization, session management, and template rendering with StringTemplate. These days, however, I would reach for Jersey under any of the same circumstances.
This is an internal business application we made for a photography representation company. It handles common business tasks like scheduling and invoicing, as well as a workflow for usage rights negotiation. My role was entirely on the server side: web services, business logic, persistence, and some basic devops work. The services were designed to work with SmartClient JavaScript components (the components expect a specific message format provided by Isomorphic's commercial server, which we did not adopt). I also integrated the system with the Google API for user authentication and contact management.
This was a WordPress site for an American Express periodical. The project involved extending WordPress into a full content management system.
For several years now, we've been putting together seasonal fashion trend forecast sites for Cotton, Inc. The sites are completely redesigned on a regular basis. Initially, this was implemented in Nanoc (a static-site compiler in Ruby). I later ported it to React, allowing the designers to define the site with JSON configuration files.
I've been playing board games forever, and have been following boardgamegeek.com for a while. At one point, they released an API to access ratings and other information, and coincidentally I was looking at Pearson correlation coefficients at work. It seemed like they might be useful for automatic recommendations. So, I wrote a scraper to collect users, games, and ratings, calculated correlations between games and then between players, and made the results searchable through a simple web interface. The user-based recommendations didn't work out that well, but the correlations between games were effective.
Back in college I wrote a couple of games in C for the Macintosh (System 7), using the Sprite Animation Toolkit. One was a Spacewar/Scorched Earth mashup. The other was a remake of Labyrinth, an old Apple II game. They were both public domain, and the latter got a nice write-up in Inside Mac Games.
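For illustration only (this is not the original code), the game-to-game correlation idea can be sketched as a Pearson correlation computed over the users who rated both games. The ratings below are made up, and the `min_common` threshold is an assumed safeguard against correlations built on too few shared raters.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # one side has no variation, so the correlation is undefined
    return cov / sqrt(var_x * var_y)

def game_correlation(ratings, game_a, game_b, min_common=3):
    """Correlate two games using only the users who rated both of them."""
    common = [u for u, r in ratings.items() if game_a in r and game_b in r]
    if len(common) < min_common:
        return None
    return pearson([ratings[u][game_a] for u in common],
                   [ratings[u][game_b] for u in common])

# Hypothetical ratings: {user: {game: rating on a 1-10 scale}}
ratings = {
    "ann": {"Catan": 8, "Carcassonne": 7, "Chess": 3},
    "bob": {"Catan": 6, "Carcassonne": 6, "Chess": 5},
    "cat": {"Catan": 9, "Carcassonne": 8},
    "dan": {"Catan": 4, "Carcassonne": 5, "Chess": 9},
}

print(game_correlation(ratings, "Catan", "Carcassonne"))  # strongly positive
print(game_correlation(ratings, "Catan", "Chess"))        # strongly negative
```

A recommendation list for a game would then be the other games sorted by this score, highest first.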
Publications: