
Designer Vulnerabilities

A (mostly chronological) list of vulnerabilities that have “designer” names.

*Disclaimer: This is a best effort at getting all “named” vulnerabilities. It’s very likely I have missed a few.

Feel free to let me know if there are any vulns I am missing! This list will continue to be updated as new named vulnerabilities are announced.

Shout-Outs

- These names are not to be confused with those produced by vulnonym.
- Shout out to the folks at 0day.marketing - we have them to thank for some of this madness =).
- I’d like to call out this site which has provided a useful timeline for low-level attacks.
- Cheers to nate at GreyNoise for crediting my list as data-inspiration for this cool infographic and timeline.
- LOL at NOSHIT (CVE-2023-39848), CuppaJoe & Ass Bleed (CVE-2024-3094)


Infosec Tools

A list of information security tools I use for assessments, investigations and other cybersecurity tasks.

Also worth checking out is CISA’s list of free cybersecurity services and tools.

OSINT / Reconnaissance

Network Tools (IP, DNS, WHOIS)

Breaches, Incidents & Leaks

FININT (Financial Intelligence)

  • GSA eLibrary - Source for the latest GSA contract award information

GEOINT (Geographical Intelligence)

HUMINT (Human & Corporate Intelligence)

  • No-Nonsense Intel - List of keywords which you can use to screen for adverse media, military links, political connections, sources of wealth, asset tracing, etc.
  • CheckUser - Check desired usernames across social network sites
  • CorporationWiki - Find and explore relationships between people and companies
  • Crunchbase - Discover innovative companies and the people behind them
  • Find Email - Find email addresses from any company
  • Info Sniper - Search property owners, deeds & more
  • Library of Leaks - Search documents, companies and people
  • LittleSis - Who-knows-who at the heights of business and government
  • NAMINT - Shows possible name and login search patterns
  • OpenCorporates - Legal-entity database
  • That’s Them - Find addresses, phones, emails and much more
  • TruePeopleSearch - People search service
  • WhatsMyName - Enumerate usernames across many websites
  • Whitepages - Find people, contact info & background checks

IMINT (Imagery/Maps Intelligence)

MASINT (Measurement and Signature Intelligence)

SOCMINT (Social Media Intelligence)

Email

Code Search

  • grep.app - Search across a half million git repos
  • PublicWWW - Find any alphanumeric snippet, signature or keyword in web pages’ HTML, JS and CSS code
  • searchcode - Search 75 billion lines of code from 40 million projects

Scanning / Enumeration / Attack Surface


Offensive Security

Exploits

  • Bug Bounty Hunting Search Engine - Search for writeups, payloads, bug bounty tips, and more…
  • BugBounty.zip - Your all-in-one solution for domain operations
  • CP-R Evasion Techniques
  • CVExploits - Comprehensive database for CVE exploits
  • DROPS - Dynamic CheatSheet/Command Generator
  • Exploit Notes - Hacking techniques and tools for penetration testing, bug bounty, CTFs
  • ExploitDB - Huge repository of exploits from Offensive Security
  • files.ninja - Upload any file and find similar files
  • Google Hacking Database (GHDB) - A list of Google search queries used in the OSINT phase of penetration testing
  • GTFOArgs - Curated list of Unix binaries that can be manipulated for argument injection
  • GTFOBins - Curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems
  • Hijack Libs - Curated list of DLL Hijacking candidates
  • Living Off the Living Off the Land - A great collection of resources to thrive off the land
  • Living Off the Pipeline - CI/CD lolbins
  • Living Off Trusted Sites (LOTS) Project - Repository of popular, legitimate domains that can be used to conduct phishing, C2, exfiltration & tool downloading while evading detection
  • LOFLCAB - Living off the Foreign Land Cmdlets and Binaries
  • LoFP - Living off the False Positive
  • LOLBAS - Curated list of Windows binaries that can be used to bypass local security restrictions in misconfigured systems
  • LOLC2 - Collection of C2 frameworks that leverage legitimate services to evade detection
  • LOLESXi - Living Off The Land ESXi
  • LOLOL - A great collection of resources to thrive off the land
  • LOLRMM - Remote Monitoring and Management (RMM) tools that could potentially be abused by threat actors
  • LOOBins - Living Off the Orchard: macOS Binaries (LOOBins) is designed to provide detailed information on various built-in macOS binaries and how they can be used by threat actors for malicious purposes
  • LOTTunnels - Living Off The Tunnels
  • Microsoft Patch Tuesday Countdown
  • offsec.tools - A vast collection of security tools
  • Shodan Exploits
  • SPLOITUS - Exploit search database
  • VulnCheck XDB - An index of exploit proof of concept code in git repositories
  • XSSed - Information on and an archive of Cross-Site-Scripting (XSS) attacks

Red Team

  • ArgFuscator - Generates obfuscated command lines for common system tools
  • ARTToolkit - Interactive cheat sheet, containing a useful list of offensive security tools and their respective commands/payloads, to be used in red teaming exercises
  • Atomic Red Team - A library of simple, focused tests mapped to the MITRE ATT&CK matrix
  • C2 Matrix - Select the best C2 framework for your needs based on your adversary emulation plan and the target environment
  • ExpiredDomains.net - Expired domain name search engine
  • Living Off The Land Drivers - Curated list of Windows drivers used by adversaries to bypass security controls and carry out attacks
  • Unprotect Project - Search Evasion Techniques
  • WADComs - Curated list of offensive security tools and their respective commands, to be used against Windows/AD environments

Web Security

  • Invisible JavaScript - Execute invisible JavaScript by abusing Hangul filler characters
  • INVISIBLE.js - A super compact (116-byte) bootstrap that hides JavaScript using a Proxy trap to run code

Security Advisories

  • CISA Alerts - Providing information on current security issues, vulnerabilities and exploits
  • ICS Advisory Project - DHS CISA ICS Advisories data visualized as a Dashboard and in Comma Separated Value (CSV) format to support vulnerability analysis for the OT/ICS community

Attack Libraries

A more comprehensive list of Attack Libraries can be found here.

  • ATLAS - Adversarial Threat Landscape for Artificial-Intelligence Systems is a knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from AI red teams and security groups
  • ATT&CK
  • Risk Explorer for Software Supply Chains - A taxonomy of known attacks and techniques to inject malicious code into open-source software projects.

Vulnerability Catalogs & Tools

Risk Assessment Models

A more comprehensive list of Risk Assessment Models and tools can be found here.


Blue Team

CTI & IoCs

  • Alien Vault OTX - Open threat intelligence community
  • BAD GUIDs EXPLORER
  • Binary Edge - Real-time threat intelligence streams
  • CLOAK - Concealment Layers for Online Anonymity and Knowledge
  • Cloud Threat Landscape - A comprehensive threat intelligence database of cloud security incidents, actors, tools and techniques. Powered by Wiz Research
  • CTI AI Toolbox - AI-assisted CTI tooling
  • CTI.fyi - Content shamelessly scraped from ransomwatch
  • CyberOwl - Stay informed on the latest cyber threats
  • Dangerous Domains - Curated list of malicious domains
  • HudsonRock Threat Intelligence Tools - Cybercrime intelligence tools
  • InQuest Labs - Indicator Lookup
  • IOCParser - Extract Indicators of Compromise (IOCs) from different data sources
  • Malpuse - Scan, Track, Secure: Proactive C&C Infrastructure Monitoring Across the Web
  • ORKL - Library of collective past achievements in the realm of CTI reporting.
  • Pivot Atlas - Educational pivoting handbook for cyber threat intelligence analysts
  • Pulsedive - Threat intelligence
  • ThreatBook TI - Search for IP address, domain
  • threatfeeds.io - Free and open-source threat intelligence feeds
  • ThreatMiner - Data mining for threat intelligence
  • TrailDiscover - Repository of CloudTrail events with detailed descriptions, MITRE ATT&CK insights, real-world incidents references, other research references and security implications
  • URLAbuse - Open URL abuse blacklist feed
  • urlquery.net - Free URL scanner that performs analysis for web-based malware

URL Analysis

Static / File Analysis

  • badfiles - Enumerate bad, malicious, or potentially dangerous file extensions
  • CyberChef - The cyber swiss army knife
  • DocGuard - Static scanner that brings a unique perspective to static and structural analysis
  • dogbolt.org - Decompiler Explorer
  • EchoTrail - Threat hunting resource used to search for a Windows filename or hash
  • filescan.io - File and URL scanning to identify IOCs
  • filesec.io - Latest file extensions being used by attackers
  • Kaspersky TIP
  • Manalyzer - Static analysis on PE executables to detect undesirable behavior
  • PolySwarm - Scan Files or URLs for threats
  • VirusTotal - Analyze suspicious files and URLs to detect malware

Dynamic / Malware Analysis

Forensics

  • DFIQ - Digital Forensics Investigative Questions and the approaches to answering them

Phishing / Email Security


Assembly / Reverse Engineering


OS / Scripting / Programming

Regex


Password


AI

  • OWASP AI Exchange - Comprehensive guidance and alignment on how to protect AI against security threats

Assorted

OpSec / Privacy

  • Awesome Privacy - Find and compare privacy-respecting alternatives to popular software and services
  • Device Info - A web browser security testing, privacy testing, and troubleshooting tool
  • Digital Defense (Security List) - Your guide to securing your digital life and protecting your privacy
  • DNS Leak Test
  • EFF | Tools from EFF’s Tech Team - Solutions to the problems of sneaky tracking, inconsistent encryption, and more
  • Privacy Guides - Non-profit, socially motivated website that provides information for protecting your data security and privacy
  • Privacy.Sexy - Privacy related configurations, scripts, improvements for your device
  • PrivacyTests.org - Open-source tests of web browser privacy
  • switching.software - Ethical, easy-to-use and privacy-conscious alternatives to well-known software
  • What’s My IP Address? - A number of interesting tools including port scanners, traceroute, ping, whois, DNS, IP identification and more
  • WHOER - Get your IP

Jobs

  • infosec-jobs - Find awesome jobs and talents in InfoSec / Cybersecurity

Conferences / Meetups

Infosec / Cybersecurity Research & Blogs

Funny

Walls of Shame

  • Audit Logs Wall of Shame - A list of vendors that don’t prioritize high-quality, widely-available audit logs for security and operations teams
  • Dumb Password Rules - A compilation of sites with dumb password rules
  • The SSO Wall of Shame - A list of vendors that treat single sign-on as a luxury feature, not a core security requirement
  • ssotax.org - A list of vendors that have SSO locked up in a subscription tier that is more than 10% more expensive than the standard price
  • Why No IPv6? - Wall of shame for IPv6 support

Other

Dynamization of Jekyll

Jekyll is a framework for creating websites/blogs using static plain-text files. Jekyll is used by GitHub Pages, which is also the current hosting provider for Shellsharks.com. I’ve been using GitHub Pages since the inception of my site and for the most part have no complaints. With that said, a purely static site has some limitations in terms of the types of content one can publish/expose.

I recently got the idea to create a dashboard-like page which could display interesting quantitative data points (and other information) related to the site. Examples of these statistics include the total number of posts, the age of my site, when my blog was last updated, overall word count across all posts, etc. Out of the box, Jekyll is limited in its ability to generate this information in a dynamic fashion. The Jekyll-infused GitHub Pages engine generates the site via an inherent pages-build-deployment GitHub Action (more on this later) upon commit. The site will then stay static until the next build. As such, it has limited native ability to update content in-between builds/manual-commits.

To solve this, I’ve started using a variety of techniques/technologies (listed below) to introduce more dynamic functionality to my site (and more specifically, the aforementioned statboard).

Jekyll Liquid

Though not truly “dynamic”, the Liquid* templating language is an easy, Jekyll-native way to generate static content in a quasi-dynamic way at site build time. As an example, if I wanted to denote the exact date and time that a blog post was published I might first try to use the Liquid template {{ site.time }}. What this actually ends up giving me is a timestamp for when the site was built (e.g. 2025-05-07 17:03:43 -0400), rather than the last-updated date of the post itself. So instead, I can harness the post’s custom front matter, such as “updated:”, and access that value using the tag {{ page.updated }} (so we get, __).
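For illustration, here’s a minimal sketch of how the front matter and tag pair up (the updated key is just a custom name, and the date value is arbitrary):

---
title: Example post
updated: 2025-05-07
---
This post was last updated on {{ page.updated }}.

At build time, Liquid swaps the tag for the front matter value, so the rendered HTML reads “This post was last updated on 2025-05-07.”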

One component on the (existing) Shellsharks statboard calculates the age of the site using the last-updated date of the site (maintained in the change log), minus the publish date of the first-ever Shellsharks post. Since a static, Jekyll-based, GitHub Pages site is only built (and thus only updated) when I actually physically commit an update, this component will be out of date if I do not commit at least daily. So how did I solve for this? Enter GitHub Actions.

* Learn more about the tags, filters and other capabilities of Liquid here.

JavaScript & jQuery

Before we dive into the power of GitHub Actions, it’s worth mentioning the ability to add dynamism by simply dropping straight-up, in-line JavaScript directly into the page/post Markdown (.md) files. Remember, Jekyll produces .html files directly from static, text-based files (like Markdown). So the inclusion of raw JS syntax will translate into embedded, executable JS code in the final, generated HTML files. The usual rules for in-page JS apply here.

One component idea I had for the statboard was a counter of named vulnerabilities. So how could I grab that value from the page? At first, I tried fetching the DOM element with the id in which the count was exposed. However, this failed because fetching that element alone meant not fetching the JS and other HTML content that was used to actually generate that count. To solve this, I used jQuery to load the entire page into a temporary <div> tag, then iterated through the list (<li>) elements within that div (similar to how I calculate it on the origin page), and finally set the dashboard component to the calculated count!

$('<div></div>').load('/infosec-blogs', function () {
  // count the list items (<li>) in the page loaded into this temporary div
  var blogs = $("li", this).length;
  // write the count into the dashboard component
  $("#iblogs").html(blogs);
});
Additional notes on the use of JS and jQuery
  • I used Google’s Hosted Libraries to reference jQuery <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>.
  • Be wary of adding JS comments // in Markdown files, as I noticed the Jekyll parsing engine doesn’t do a great job of new-lining, and thus everything after a comment will end up being commented out.
  • When using Liquid tags in in-line JS, ensure quotes (‘’,””) are added around the templates so that the JS code will recognize those values as strings (where applicable); see the example after this list.
  • The ability to add raw, arbitrary JS means there is a lot of untapped capability to add dynamic content to an otherwise static page. Keep in mind though that JS code is client-side, so you are still limited in that typical server-side functionality is not available in this context.
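To illustrate the quoting point above, here’s a minimal sketch (reusing the page.updated front matter value from the Liquid section; the variable names are arbitrary):

var updated = '{{ page.updated }}'; // quotes make the rendered date a JS string
var daysSince = (Date.now() - Date.parse(updated)) / 86400000; // 86400000 ms per day

Without the quotes, the rendered date would land in the JS source as bare tokens and throw a syntax error.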

GitHub Actions

Thanks to the scenario I detailed in the Jekyll Liquid section, I was introduced to the world of GitHub Actions. Essentially, I needed a way to force an update/regeneration of my site such that one of my statically generated Liquid tags would update at some minimum frequency (in this case, at least daily). After some Googling, I came across this action which allowed me to do just that! In short, it forces a blank build using a user-defined schedule as the trigger.

# File: .github/workflows/refresh.yml
name: Refresh

on:
  schedule:
    - cron:  '0 3 * * *' # Runs every day at 3am UTC

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger GitHub pages rebuild
        run: |
          curl --fail --request POST \
            --url https://api.github.com/repos/${{ github.repository }}/pages/builds \
            --header "Authorization: Bearer $USER_TOKEN"
        env:
          # You must create a personal token with repo access as GitHub does
          # not yet support server-to-server page builds.
          USER_TOKEN: ${{ secrets.USER_TOKEN }}

In order to get this Action going, follow these steps…

  1. Log into your GitHub account and go to Settings (in the top right) –> Developer settings –> Personal access tokens.
  2. Generate new token and give it full repo access scope (More on OAuth scopes). I set mine to never expire, but you can choose what works best for you.
  3. Navigate to your GitHub Pages site repo, ***.github.io –> Settings –> Secrets –> Actions section. Here you can add a New repository secret where you give it a unique name and set the value to the personal access token generated earlier.
  4. In the root of your local site repository, create a .github/workflows/ folder (if one doesn’t already exist).
  5. Create a <name of your choice>.yml file where you will have the actual Action code (like what was provided above).
  6. Commit this Action file and you should be able to see run details in your repo –> Actions section within GitHub.
Additional Considerations for GitHub Actions
  • When using the Liquid tag {{ site.time }} with a Git Action triggered build, understand that it will use the time of the server which is generating the HTML, in this case the GitHub servers themselves, which means the date will be in UTC (Conversion help).
  • Check out this reference for information on how to specify the time zone in the front matter of a page or within the Jekyll config file.
  • GitHub Actions are awesome and powerful, but there are limitations to be aware of. Notably, it is important to understand the billing considerations. Free-tier accounts get 2,000 minutes/month while Pro-tier accounts (priced at about $44/user/year) get 3,000.
  • For reference, the refresh action (provided above) was running (for me) at about 13 seconds per trigger. This means you could run that action over 9,000 times without exceeding the minute cap for a Free-tier account.
  • With the above said, also consider that the default pages-build-deployment Action used by GitHub Pages to actually generate and deploy your site upon commit will also consume those allocated minutes. Upon looking at my Actions pane, I am seeing about 1m run-times for each build-and-deploy action trigger.

What’s Next

I’ve only just started to scratch the surface of how I can further extend and dynamize my Jekyll-based site. In future updates to this guide (or in future posts), I plan to cover more advanced GitHub Action capabilities as well as how else to add server-side functionality (maybe through serverless!) to the site. Stay tuned!

Databending Part 5 — Listening to Telephone Codecs

Today we'll be talking about the VOX or Dialogic ADPCM format — a lossy algorithm from Oki Electric for digital voice telephony — and using it to translate raw data (e.g., program files) into audio. As I mentioned in my first post on the topic, at a certain point,

once you listen to the “sonified” data from enough files, commonalities start to become apparent. Many programs use some of the same [library] files, and…[even] differently-named library files sometimes contain similar elements — likely re-used code patterns, or further library code compiled in.

One way I've found of getting more variety from the data is to change the sample format in which I import it. If I divide or group the raw bytes in different ways, or treat them as coming from audio files with different encodings, I can get different sound results from the exact same data. For example, here is the same data (libicudata.73.1.dylib from the Calibre e-book manager macOS app) imported into Audacity first as 16-bit integer, then as VOX ADPCM:

Both very cool, and both very different, despite coming from the same data! Today, let's talk about how ADPCM and VOX formats work; how to do this yourself in Audacity; and how I incorporated these formats into the Rust tool I made in my last post to automate the process of converting data to audio. Also check out the end of today's post for some fun stuff to look forward to!

Following along in Audacity

While I will be doing this in Rust, you can make the exact same sounds (minus the convenience of automation) in Audacity. See my first post on databending for a discussion of how to find the best files to use. Once you have the file you want to convert to audio,

  • in Audacity, go to File > Import > Raw Data…, choose your file, and click “Open”
  • in the settings menu that pops up, set encoding to “VOX ADPCM,” byte order to “default endianness,” channels to “1 channel (mono),” and sample rate to 44100 (or change sample rate to taste)

What is ADPCM?

Tan and Jiang [1] have a helpful discussion of the basics of ADPCM, or “adaptive differential pulse-code modulation.” First, with differential pulse-code modulation (the “non-adaptive” flavor),

[the] general idea is to use past recovered values as the basis to predict the current input data and then encode the difference between the current input and the predicted input. (486)

In other words, if we can predict the output to within a decent approximation, all we need to store is the difference between our rough prediction and the actual output. This difference uses less data to store than the original signal, saving bandwidth. The following diagram illustrates the signal flow for the encoder (A) and decoder (B):

Differential pulse code modulation (DPCM) block diagram. A quantizer feeds back into a prediction of the output; the prediction is compared to the actual next sample; and the difference is used for the next prediction.

DPCM block diagram from Tan and Jiang

Note that while we describe a “predictor,” there isn't anything fancy here — we simply “predict” that the current sample will equal the previous one and take the (quantized) difference between that and the actual current sample.
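To make this concrete, here’s a minimal, illustrative DPCM sketch in Rust (my own toy code, not the VOX implementation below); the quantizer is omitted, so this round-trips losslessly, whereas real DPCM quantizes the difference:

fn dpcm_encode(samples: &[i16]) -> Vec<i16> {
    let mut prev: i16 = 0;
    samples.iter().map(|&s| {
        // "predict" the current sample as the previous one,
        // and store only the difference from that prediction
        let diff = s.wrapping_sub(prev);
        prev = s;
        diff
    }).collect()
}

fn dpcm_decode(diffs: &[i16]) -> Vec<i16> {
    let mut prev: i16 = 0;
    diffs.iter().map(|&d| {
        // add each stored difference back onto the running prediction
        prev = prev.wrapping_add(d);
        prev
    }).collect()
}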

The next diagram shows the adaptive version of the decoder as shown in the original VOX ADPCM paper from the Dialogic Corporation. [2] The primary difference here is the addition of an adaptive scaling factor for the difference between prediction and actual value. This scaling factor is based on the amplitude of the incoming difference, and we will discuss the specifics of the scaling in the next section.

VOX ADPCM (adaptive differential pulse code modulation) decoder block diagram.

ADPCM decoder block diagram from the Dialogic Corporation

VOX

There are a number of ADPCM algorithms — many different ways to adapt our step size based on the amplitude of the difference and/or prediction — and after testing some out while importing data as audio in Audacity, I decided VOX was by far my favorite. Unfortunately I wasn't able to find anything pre-existing in Rust for VOX — the symphonia crate that was recommended to me only has Microsoft and IMA flavors of ADPCM. Looks like I need to code it myself! You can find the resulting code here.

Here's a snippet of audio databent through my resulting VOX ADPCM implementation:

The file is libQt5Core.5.dylib which I believe I pulled from DaVinci Resolve a week or two ago. Also, just as a check, here's a voice file (8 kHz sample rate) I encoded as VOX ADPCM with Audacity [3] and decoded with this Rust tool:

Sounds just as expected — a bit crunchy and lo-fi like a telephone, but clear and comprehensible.

Reading the VOX Spec

First, we need to calculate the step size ss(n) and use that and the 4-bit input sample L(n) to calculate the difference d(n). That difference plus the previous output X(n-1) will give our 12-bit output value. Below is the pseudocode from the Dialogic paper for calculating d(n) given a value of ss(n) and an incoming sample. Note the values B3–B0 — these refer to the 4 bits in the incoming sample, with B3 as the sign and the rest as the magnitude.

d(n) = (ss(n)*B2)+(ss(n)/2*B1)+(ss(n)/4*B0)+(ss(n)/8)
if (B3 = 1)
    then d(n) = d(n) * (-1)
X(n) = X(n-1) + d(n)

To make this calculation, we need to get the step size ss(n). The pseudocode for that is shown below:

ss(n+1) = ss(n) * 1.1^M(L(n))

The paper includes a pair of lookup tables to efficiently calculate this value. Here they are as I use them in my Rust code. We use the 4-bit incoming value to look up an “adjustment factor” in the ADPCM_INDEX_TABLE, and we move an index into the VOX_STEP_TABLE table by that adjustment factor. This index is initialized to zero, giving the first value in that table — 16.

vox.rs
// duplicate values from spec; can index w/ whole nibble, incl sign bit (4th)
// increment up/down thru this table...
const ADPCM_INDEX_TABLE: [i16; 16] = [
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8,
];
// ...use (clamped) index table to index this array for step size
const VOX_STEP_TABLE: [i16; 49] = [
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 
    494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552,
];

Note that incoming magnitudes (the low 3 bits) below 4 cause the step size to decrease, and values 4 or greater cause it to increase. The values in ADPCM_INDEX_TABLE are duplicated so I can use the whole 4-bit value (including bit 4, the sign bit) to index the table.
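As a quick illustration of the lookup (hypothetical nibble values, not from the spec):

// nibble 0b1010: sign bit set, magnitude 2 (below 4), so the index moves down by 1
assert_eq!(ADPCM_INDEX_TABLE[0b1010], -1);
// nibble 0b0111: magnitude 7 (4 or greater), so the index jumps up by 8
assert_eq!(ADPCM_INDEX_TABLE[0b0111], 8);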

Implementing VOX in Rust

To start, I have a struct called VoxState that stores the predictor and step index. Note in the diagram above that these two values are fed into single-sample delays (the blocks labeled “Z-1”), [4] so having them stored in a struct allows us to maintain state between calls to the decoder function.

vox.rs
pub struct VoxState {
    predictor: i16,
    step_index: i16,
}

I implement a vox_decode() function for the VoxState struct, as shown below. We get the step size from last time around, then update the step size for next time. The sign is the 4th bit of the incoming nibble, and the magnitude is the lower 3 bits. We get the difference between the current value and the prediction from last time with the line let mut delta = ((2 * (magnitude as i16) + 1) * step_size) >> 3; — we will come back to how this relates to the pseudocode in a bit.

We either add delta to or subtract it from the predictor, depending on the sign bit, and clamp the result to the range of a 12-bit signed integer. When we return this value from the function, we multiply it by 16, scaling it into the range of the 16-bit integer format of the .WAV file we’ll write later. Before returning, we’ll also update the struct’s step index for next time around.

vox.rs
impl VoxState {
    // ...
    pub fn vox_decode(&mut self, in_nibble: &u8) -> i16 {
        // get step size from last time's index before updating
        let step_size = VOX_STEP_TABLE[self.step_index as usize];
        // use in_nibble to index into the ADPCM index table; adjust the step index
        let mut step_index = self.step_index + ADPCM_INDEX_TABLE[*in_nibble as usize];
        // clamp index to size of step table — for next time
        step_index = i16::clamp(step_index, 0, 48);
        
        // sign is 4th bit; magnitude is 3 LSBs
        let sign = in_nibble & 0b1000;
        let magnitude = in_nibble & 0b0111;
        // magnitude; after * 2 and >> 3, equivalent to scale of 3 bits in (ss(n)*B2)+(ss(n)/2*B1)+(ss(n)/4*B0) from pseudocode
        // + 1; after >> 3, corresponds to ss(n)/8 from pseudocode — bit always multiplies step, regardless of 3 magnitude bits on/off
        let mut delta = ((2 * (magnitude as i16) + 1) * step_size) >> 3;
        // last time's value
        let mut predictor = self.predictor;
        // if sign bit (4th one) is set, value is negative
        if sign != 0 { delta *= -1; }
        predictor += delta;
        
        // clamp output between 12-bit signed min/max value
        self.predictor = i16::clamp(predictor, -i16::pow(2, 11), i16::pow(2, 11) - 1);
        // update for next time through; ss(n+1) into z-1 from block diagram
        self.step_index = step_index;
        // return updated predictor, which is also saved for next time; X(n) into z-1
        // scale from 12-bit to 16-bit; 16 = 2^4, or 4 extra bits
        self.predictor * 16
    }
}

Returning to the main code file and picking up from last time, here is how we use our new code. We've opened a file as a Vec<u8>, and we're storing the results of a match expression in a Vec<f64> (since the filtering will work better with floats). In the “arm” of the match expression for the VOX format, we iterate over the imported Vec<u8>, and for each byte, we split the byte into two 4-bit “nibbles,” iterating over [chunk >> 4, chunk & 0b1111].iter() and running vox_state.vox_decode() for each nibble.

In the diagram below from the spec, note that the highest 4 bits in a byte come first, so our first nibble is chunk >> 4, which brings those bits down into the lowest 4 positions. chunk & 0b1111 keeps only the 4 lowest bits, giving us the second nibble in the byte.

A diagram of a byte, showing “sample N” as the highest 4 bits and ”sample N+1” as the lower 4 bits

VOX byte layout from the Dialogic Corporation

We push the decoded values into our output Vec<f64>, ready for the next stage, which is filtering and writing to .WAV (see previous post for a discussion of that code).

main.rs
// import file as Vec<u8>
let data: Vec<u8> = fs::read(entry.path()).expect("Error reading file");
// need to filter as f64 anyway, so best to do in match arms here for consistency
let converted_data: Vec<f64> = match args.format {
    // ...
    SampleFormat::Vox => {
        let mut output: Vec<f64> = Vec::new();
        let mut vox_state = vox::VoxState::new();
        data
            .iter()
            // using for_each and...
            .for_each(|chunk| {
                // start with highest 4 bits (by right-shifting)
                // then & 0b1111 selects the lowest 4
                for nibble in [chunk >> 4, chunk & 0b1111].iter() {
                    output.push(vox_state.vox_decode(nibble) as f64);
                }
            });
        // ...returning outside of pipeline since we need to handle *two* nibbles per element in iter()
        output
    }
};

Before we discuss the challenges, just for funsies I put the compiled binary for my databending tool back through the tool (using our new VOX codec). Here's a segment of the result:

Challenges

At this point, our code works! There were a few things in the VOX spec that tripped me up though, so let's talk about how I got my code working. First, when my attempt at implementing the spec gave me trouble, I looked at the source for FFmpeg, which Audacity uses — specifically the function adpcm_ima_oki_expand_nibble() in libavcodec/adpcm.c, line 553. [5] This is where I got the line let mut delta = ((2 * (magnitude as i16) + 1) * step_size) >> 3; from vox.rs above.

Let's consider the line of pseudocode d(n) = (ss(n)*B2)+(ss(n)/2*B1)+(ss(n)/4*BO)+(ss(n)/8) — this is how we combine the incoming magnitude and the step size to get the difference between the current and previous samples. B2, B1, and B0 are the three magnitude bits from the incoming nibble. If, for example, B1 is zero, ss(n)/2*B1 will divide by zero. Not only will we need to check whether each bit is zero or not, but division is more costly than the other arithmetic operations. However, we can think about this another way.

With (ss(n)*B2)+(ss(n)/2*B1)+(ss(n)/4*B0)+(ss(n)/8), if we leave out the multiplication by ss(n) for the time being, we have 1 or 0 times 1; 1 or 0 times 1/2; 1 or 0 times 1/4; and 1 times 1/8. That’s just the ones place and the first 3 binary fractional places. If we shift those values 3 places left, we have no more fractions/division, and if we shift the incoming 3 magnitude bits 1 place left (i.e., multiply by 2) and add one, our magnitude and the previous values we shifted line up the same way as before. We can multiply what we have now by the step size, and >> 3 “undoes” the left shift we did to get rid of the fraction. Thus ((2 * (magnitude as i16) + 1) * step_size) >> 3 is equivalent to (ss(n)*B2)+(ss(n)/2*B1)+(ss(n)/4*B0)+(ss(n)/8), but we don’t need per-bit checks or division, and things are a bit faster to boot.
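As a concrete check, take ss(n) = 16 (the first step-table entry) and magnitude bits B2 B1 B0 = 011 (magnitude 3):

(ss(n)*B2) + (ss(n)/2*B1) + (ss(n)/4*B0) + (ss(n)/8) = 0 + 8 + 4 + 2 = 14
((2 * magnitude + 1) * ss(n)) >> 3 = ((2*3 + 1) * 16) >> 3 = 112 >> 3 = 14

The two agree exactly here because 16 divides evenly by 8; for step sizes that don’t, the single shift truncates once at the end rather than per term, so it can differ from the per-term integer divisions by a small rounding amount.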

Looking Forward

Lately I've been enjoying windytan (Oona Räisänen)'s blog — a “blog about sound & signals” where she discusses a variety of telecommunications encoding formats, both in terms of their sound and decoding them. I got an RTL-SDR software-defined radio dongle back in 2020, and greatly enjoyed tracking down and decoding interesting signals. Now that I have more programming skills, I think I'll do more discussion of and coding with different telecommunications formats — both for radio, and for telephony, as I did today.

One thing that @EveHasWords mentioned recently and that I also saw on windytan's blog is using cassette tapes to store digital data such as software or games. The general idea is that you modulate a tone to encode digital data, and then record that as audio on a regular cassette tape — this person did it with an Arduino and Python, so that could be a good starting point for a fun project.

You can follow the RSS feeds for this blog to see any future updates on such projects — hope to see you then!


  1. L. Tan and J. Jiang, Digital Signal Processing: Fundamentals and Applications. Academic Press, 2018, pp. 486–496. ↩︎

  2. Dialogic Corporation, Dialogic ADPCM Algorithm, 1988. [Online]. Available: https://people.cs.ksu.edu/~tim/vox/dialogic_adpcm.pdf. ↩︎

  3. Here's a link to the Audacity forum explaining where to find the settings to do this. ↩︎

  4. This notation comes from the idea of the Z-transform. ↩︎

  5. FFmpeg, libavcodec/adpcm.c. [Online]. Available: https://ffmpeg.org/doxygen/7.0/adpcm_8c_source.html#l00553. ↩︎


Running 11ty and PHP concurrently

Now that I have several sections of this site served dynamically using PHP I've finally put together a single command to work on the site locally.

This command depends on the concurrently package to run multiple commands at the same time (hence the name). I’ve added two npm scripts to accomplish this:

"watch": "eleventy --watch",

Which watches and updates files without running 11ty's dev server and:

"dev": "concurrently -k -n 11ty,PHP -c cyan,magenta \"npm run watch\" \"npm run php\"",

Which — you guessed it — runs both commands together (and also allows them to be quit together). 11ty output is displayed alongside [11ty] in cyan and PHP output is displayed alongside [PHP] in magenta. One consistent command for local dev, with a single server (PHP's) pointed at the built output.
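For reference, the relevant scripts block in package.json ends up looking something like this (the php script body is my guess at a sensible default: PHP’s built-in dev server pointed at 11ty’s _site output folder):

"scripts": {
  "watch": "eleventy --watch",
  "php": "php -S localhost:8080 -t _site",
  "dev": "concurrently -k -n 11ty,PHP -c cyan,magenta \"npm run watch\" \"npm run php\""
}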
