Open to Full Stack Data Analytics, Data Science, Applied ML/NLP, and LLM roles

Victor Owino

Full Stack Data Analyst | ML + NLP Specialist

I build machine learning, NLP, generative AI, and full-stack data analytics systems that turn complex data into actionable insight for healthcare, finance, language technology, and decision-making.

📍 New York City, NY
📍 Kirkland, WA
🎓 MS Computational Linguistics, 2026
🎓 BSc Computer Science, 2020
🧠 Full Stack Data Analyst • Business Intelligence (BI) Developer
🧠 ML • NLP • LLMs • Data Science + AI

// 01_about

Research-minded data scientist with over 8 years of full-stack production analytics experience.

I am a Full Stack Data Analyst and ML + NLP Specialist focused on the full data and AI pipeline, from raw data extraction to analytics delivery, machine learning modeling, and NLP research. My strength is connecting data engineering, business intelligence, machine learning, and language AI into practical systems that support decision-making and real-world applications. My work combines model development, data pipelines, dashboards, and stakeholder-facing insight delivery.

I have worked across research, healthcare and finance analytics, data engineering, and Business Intelligence (BI) development and reporting. I enjoy building systems that connect raw data, machine learning models, and clear decision-making outputs.

Full Stack Data Analytics • Open Source NLP + Transformers • LLM Evaluation • LLM Fine-tuning & Deployment • Power BI + Tableau • MS Fabric Data Engineering • Python + SQL + SparkSQL, PySpark • Healthcare + Analytics + AI

profile.json

{
  "name": "Victor Owino",
  "role": "Data Scientist & Full Stack Data Analyst",
  "location": ["New York City, NY", "Kirkland, WA"],
  "focus": [
    "NLP and LLMs", "Generative AI", "Healthcare Data Science",
    "Machine Learning", "Full Stack Data Analytics"
  ],
  "tools": ["Python", "SQL", "PyTorch", "Power BI", "Tableau", "Hugging Face Transformers (open source)"],
  "availability": "Open to full-time/part-time roles"
}

// 02_focus

Where I create value.

DS

Data Science & ML

Predictive modeling, feature engineering, classification, regression, and performance evaluation.

scikit-learn • XGBoost • PyTorch
NLP

NLP & Computational Linguistics

Text classification, NER, corpus annotation, multilingual analysis, and language technology evaluation.

BERT • spaCy • Hugging Face
AI

Generative AI & LLM Evaluation

Prompt engineering, structured reasoning, LLM benchmarking, and semantic similarity scoring.

LLMs • PIC • SBERT
BI

Data Analytics & BI

Executive dashboards, KPI reporting, data storytelling, and operational decision support.

Power BI • Tableau • SQL
HX

Biomedical & Healthcare AI

Clinical NLP, adverse event detection, EHR modeling, and healthcare analytics workflows.

ClinicalBERT • MIMIC-IV • EHR
APP

End-to-End AI Apps

Data pipelines, model workflows, API-driven prototypes, dashboards, and deployment-ready interfaces.

Python • REST APIs • Git

// 03_research

Research highlights.

Biomedical NLP · LLM Reasoning

Pragmatic Inference Chain Reasoning for ADE Detection

Clinical adaptation of structured prompting to help LLMs identify medication-harm links in adverse drug event detection from clinical notes.

LLMs • PIC • Clinical NLP • ADE
$ run_experiment --task ade_detection --method PIC
status: conference submission
focus: structured reasoning + biomedical NLP
Healthcare AI · EHR

Deep Learning for Blood Transfusion Adverse Events

Deep learning approaches for predicting blood transfusion adverse events using patient-record data and healthcare modeling workflows.

Deep Learning • EHR • MIMIC-IV
$ train_model --domain healthcare --risk adverse_events
output: patient-level prediction workflow
Multilingual NLP · MT Evaluation

Multilingual Euphemism & Machine Translation Evaluation

Swahili annotation and cross-lingual benchmark development for evaluating LLM understanding of euphemisms, pragmatic meaning, and translation quality.

Swahili • LLM Eval • MT • Annotation
$ evaluate --language swahili --task euphemism_translation
metric: exact match + semantic similarity

// 04_projects

Featured projects.

BERT NER • Clinical NLP

BERT NER for Clinical Trials Eligibility

Fine-tuned BERT, BioBERT, and ClinicalBERT models to extract Conditions, Drugs, and Procedures from clinical text using BIO tagging.

  • Implemented subword-token alignment and exact-boundary evaluation.
  • Achieved an entity-level F1 of 0.85 and carried out error analysis.
Python • PyTorch • Hugging Face • BioBERT
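
The two techniques named in this project, subword-token alignment and exact-boundary evaluation, can be illustrated with a minimal sketch. This is not the project code: the function names are hypothetical, `word_ids` is assumed to have the shape returned by a Hugging Face fast tokenizer's `word_ids()` call (one entry per subword, `None` for special tokens), and labeling only the first subword of each word is one common convention among several.

```python
from typing import List, Optional, Set, Tuple

IGNORE_INDEX = -100  # label value commonly used to mask positions in the loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Give each word's label to its first subword and mask everything else."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:          # special tokens such as [CLS] / [SEP]
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:    # first subword of a new word
            aligned.append(word_labels[word_id])
        else:                        # continuation subword of the same word
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


def bio_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Collect (entity_type, start, end_exclusive) spans from a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel flushes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-boundary F1: a prediction counts only if type, start, and end all match."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)
```

For example, `align_labels([None, 0, 0, 1, None], [1, 0])` keeps the label of word 0 on its first subword only and masks the special tokens, while `entity_f1` gives no partial credit when a predicted span is off by one token.
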
Credit Risk • ML + BI

Credit Risk Analytics & Default Prediction

Built analytical datasets and engineered financial risk features for loan default prediction and portfolio monitoring.

  • Trained Logistic Regression, Random Forest, and XGBoost models.
  • Created dashboards for risk tracking and decision support.
Python • SQL • XGBoost • Power BI
Fraud Detection • Dashboard

Healthcare Insurance Fraud Detection Dashboard

Cleaned, transformed, and enriched claims data to monitor suspicious patterns in procedures, diagnoses, and billing amounts.

  • Built a Power BI dashboard for healthcare claims analytics.
  • Supported data-driven fraud monitoring workflows.
Power BI • Power Query • SQL • Python
PIC Reasoning • LLM Eval

PIC Reasoning for ADE Detection with LLMs

Designed structured prompting workflows for adverse drug event classification, comparing PIC strategies with zero-shot and few-shot baselines.

  • Focused on clinical inference beyond surface drug-symptom overlap.
  • Evaluated LLM outputs with strict binary classification rules.
LLMs • Prompt Engineering • ADE • NLP
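
One way a strict binary rule for LLM outputs could be enforced is sketched below. The rule itself is an assumption made for illustration (the response must be exactly "yes" or "no", ignoring case, surrounding whitespace, and a trailing period), and the function names are hypothetical; the project's actual rules may differ.

```python
from typing import List, Optional


def parse_binary_label(output: str) -> Optional[int]:
    """Map a response to 1 (ADE), 0 (no ADE), or None if it breaks the rule."""
    answer = output.strip().lower().rstrip(".")
    if answer == "yes":
        return 1
    if answer == "no":
        return 0
    return None  # anything else, including explanations, is rejected


def strict_accuracy(outputs: List[str], gold: List[int]) -> float:
    """Accuracy where an unparseable response always counts as an error."""
    correct = sum(parse_binary_label(o) == g for o, g in zip(outputs, gold))
    return correct / len(gold)
```

Treating unparseable responses as errors, rather than dropping them, keeps the comparison between prompting strategies on the same set of examples.
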
Translation Eval • Multilingual NLP

Multilingual Euphemism Translation Evaluation

Built evaluation workflows for testing whether LLMs preserve implicit and euphemistic meaning across languages and contexts.

  • Led Swahili euphemism annotation for experiment design.
  • Used exact match and semantic similarity evaluation methods.
Swahili • SBERT • LLM Eval • MT
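
The two evaluation methods named above, exact match and semantic similarity, can be sketched as follows. This is an illustrative example only: the normalization in `exact_match` (lowercasing and collapsing whitespace) is an assumption, and `cosine_similarity` expects sentence embeddings that have already been computed elsewhere, for instance by an SBERT model, which is not shown here.

```python
import math
from typing import Sequence


def exact_match(prediction: str, reference: str) -> bool:
    """Compare two strings after lowercasing and collapsing whitespace."""
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())
    return normalize(prediction) == normalize(reference)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Exact match is strict and easy to interpret, while a similarity score can give credit to a translation that preserves the euphemistic meaning with different wording.
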
Public Health • Analytics

Infectious Disease Analytics & Risk Modeling

Supported public health research by cleaning data, engineering features, creating analysis-ready datasets, and co-developing statistical/ML models.

  • Contributed to zoonotic MERS-CoV transmission risk analysis.
  • Created visualizations for researchers and stakeholders.
Python • R • SQL • Power BI

// 05_experience

Experience timeline.

09/2025 — 05/2026

Research Assistant – NLP and Deep Learning

Montclair State University · Montclair, NJ

  • Conducted NLP and biomedical AI research on PIC reasoning for adverse drug event detection and deep learning for transfusion adverse event prediction.
  • Collaborated on multilingual euphemism research, leading Swahili annotation and contributing to LLM translation evaluation.
LLMs • Biomedical NLP • Deep Learning

09/2023 — 12/2024

Analytical Data Engineer

Kidogo Innovations Limited · Nairobi, Kenya

  • Designed dimensional and relational models in BigQuery for dashboards, KPI reporting, and self-service analytics.
  • Built Python and SQL ETL/ELT workflows to improve consistency, refresh reliability, and reporting turnaround.
BigQuery • ETL • SQL • Power BI

10/2021 — 08/2023

Health Data Analyst

Kenyatta National Hospital – CONNECT Program · Nairobi, Kenya

  • Built ETL pipelines integrating facility-level data into centralized analytical datasets across 22 facilities.
  • Developed dashboards and executive reports for leadership, CDC teams, and Ministry of Health stakeholders.
Healthcare Analytics • Dashboards • Reporting

01/2021 — 09/2021

Data Science Associate

University of Nairobi UNITID · Nairobi, Kenya

  • Conducted statistical analysis, data cleaning, and feature engineering for infectious disease surveillance research.
  • Co-developed ML and statistical models to identify key drivers of zoonotic MERS-CoV transmission in Kenya.
Python • R • ML • Public Health

// 06_skills

Technical toolkit.

programming_querying.py

Python · R · SQL · Bash

ml_deep_learning.py

scikit-learn · XGBoost · TensorFlow · PyTorch · feature engineering · classification · regression

nlp_llms.py

Hugging Face · BERT · BioBERT · ClinicalBERT · spaCy · NLTK · prompt engineering · LLM evaluation

data_engineering.sql

ETL/ELT · workflow orchestration · data lakes · data warehousing · REST APIs · JSON · CSV · Parquet

cloud_databases.sh

PostgreSQL · MySQL · Google BigQuery · Neo4j · AWS · Azure · GCP

analytics_bi.pbix

Power BI · Tableau · Excel · dashboards · KPI reporting · statistical analysis

// 07_education_certifications

Education & certifications.

Expected 05/2026

Master of Science, Computational Linguistics

Montclair State University

Relevant coursework: Machine Learning, Deep Learning, Natural Language Processing, Special Topics in Generative AI.

2021

Bachelor of Science, Computer Science

University of Eldoret

Relevant coursework: Database Management Systems, Object-Oriented Programming, Data Structures and Algorithms.

Certifications

Professional credentials

  • Accredited Python Data Engineer
  • IBM Certified Specialist: Data Analysis with Python
  • Microsoft Certified: Data Analyst Associate

// 08_contact

Let’s build something data-driven.

Interested in discussing a data science, NLP, analytics, machine learning, or applied AI role? Let’s connect.

$ contact --email
victorowinoke@gmail.com

$ location
New York City, NY • Kirkland, WA