Summary
A programmer passionate about leveraging automation, data, and technology to promote open government and address socioeconomic and environmental issues. I have 11 years of experience as a programmer using data and technology to inform research, policy, and regulation in banking and biopharmaceuticals.
Projects
Predicting Building Site Energy Usage
- Predicted building site energy usage from building performance data.
Linear regression and random forest models were trained; the random
forest achieved the lowest mean absolute error (TensorFlow, sklearn).
A minimal sketch of the comparison is shown below.
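A minimal sketch of the model comparison described above, assuming a tabular CSV of building performance features with a numeric energy-usage target; the file path and column names are hypothetical.

```python
# Hypothetical sketch: compare linear regression and random forest by MAE.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("building_performance.csv")   # hypothetical input file
X = df.drop(columns=["site_energy_usage"])     # hypothetical target column
y = df["site_energy_usage"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:.2f}")
```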
Exploratory Data Analysis of Los Angeles Traffic Collisions
- Course final project demonstrating exploratory data analysis and visualization
in Python (pandas, matplotlib, seaborn); a brief sketch is shown below.
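A brief sketch in the spirit of the EDA project above; the dataset path and column names ("date_occurred", "area_name") are hypothetical.

```python
# Hypothetical sketch: load LA collision records and plot simple summaries.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

collisions = pd.read_csv(
    "la_traffic_collisions.csv", parse_dates=["date_occurred"]
)
print(collisions.describe(include="all"))

# Collisions per month
collisions.set_index("date_occurred").resample("M").size().plot(
    title="Collisions per month"
)
plt.show()

# Top 10 reporting areas by collision count
top_areas = collisions["area_name"].value_counts().index[:10]
sns.countplot(data=collisions, y="area_name", order=top_areas)
plt.title("Top 10 areas by collision count")
plt.tight_layout()
plt.show()
```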
Experience
- Developed a custom Python data pipeline to orchestrate and transform source data into RDF (triple store) for entity reconciliation
- Orchestrated data pipelines and implemented a data testing framework using Airflow (see the sketch after this list)
- Reconfigured three knowledge extraction pipelines into normalization, extraction, and graph representation stages
- Implemented new code deployment and promotion process with external contractors
- Developed a Python module to standardize and extend PySpark functionality across the team
- Developed custom Python code to parse corrupted XML files
- Documented new features and processes in Confluence and created training videos to help cross-train and coach new employees and contractors
- Created new Development and QA environments to ensure seamless code handoffs to Production operations team
- Developed four extract-transform-notify Python automation jobs to eliminate over 700 hours of staff work per year
- Designed and implemented new ETL process to transition survey collection platform to PaaS
- Performed data migration and database design to onboard two new survey collections
- Developed a dashboard for preliminary survey data to enable real-time data access for researchers, eliminating a 1-day delay
- Developed K-means clustering and K-nearest neighbors notebooks as a proof of concept to pilot an AWS SageMaker environment
- Led requirements, development, testing, and deployment of five minor version software releases
- Developed Python scripts to migrate survey data to a new database environment
- Developed R script to automate the retrieval, cleaning, and validation of data for economic modeling
- Developed an Excel VBA workbook to automatically update FOMC regional data, eliminating 4 days of rework for every 6-week cycle
- Provided research support and analysis for real-time economic updates in the 11th District
- Improved data collection and analysis of district cafeteria IRS de minimis calculation
- Analyzed and summarized regional economic conditions for the Board of Directors
- Supported executive leadership research requests during and after Hurricane Harvey
- Represented Houston Branch on the President’s Sustainability Initiative Council
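A minimal sketch of the kind of Airflow orchestration and data testing referenced in the experience list above, assuming Airflow 2.x; the DAG id, task bodies, and row-count check are hypothetical placeholders.

```python
# Hypothetical sketch: a small extract -> transform -> data-test DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull source data (placeholder for the real extraction step)."""


def transform():
    """Clean and reshape the extracted data (placeholder)."""


def run_data_tests():
    """Fail the run if the load produced fewer rows than expected."""
    row_count = 100  # placeholder; replace with a count query on the target table
    if row_count < 1:
        raise ValueError("Data test failed: no rows loaded")


with DAG(
    dag_id="survey_etl",             # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    test_task = PythonOperator(task_id="data_tests", python_callable=run_data_tests)

    extract_task >> transform_task >> test_task
```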
Technical Skills
- Python, R, Scala, SAS
- PySpark, Airflow, Hadoop
- Neo4j
- PostgreSQL (psycopg2)
- SQL Server (SSIS)
- Mongo, Redis
- MarkLogic (RDF/triple store), SPARQL
- Spark
- Pandas, numpy
- Hive, Impala, T-SQL, pgSQL
- Linux, Docker, AWS (S3, EC2, SageMaker, boto3)
- Jira, Agile, Git, GitHub/GitLab, CML, Anaconda, Jupyter
- Keras, TensorFlow
- Scikit-learn
- Tableau, Power BI, Spotfire
- Python (Matplotlib, Seaborn, Plotly)
- R (ggplot)
- HTML, CSS
- Machine learning: clustering, K-means, K-nearest neighbors, decision trees, random forests, naive Bayes, neural networks, CNNs
- Statistics: linear regression, gradient descent
Publications
Iowa State University, 2018
Federal Reserve Bank of Dallas, 2018