Joe Zack was on a brief holiday so Allen and Michael took over the helm for an episode. What would a new episode be without a little something regarding AI, some more love for Kotlin, and a number of excellent tips throughout (as well as at the end of) the episode?
Reviews
* iTunes: ivan.kuchin
News
* Atlanta Dev Con
September 7th, 2024
https://www.atldevcon.com/
Topics
* People are trying to remove their answers from StackOverflow so that OpenAI can’t use them without permission/recognition
https://www.tomshardware.com/tech-industry/artificial-intelligence/stack-overflow-bans-users-en-masse-for-rebelling-against-openai-partnership-users-banned-for-deleting-answers-to-prevent-them-being-used-to-train-chatgpt
* Obfuscate data dumps with PostgreSQL
https://github.com/GreenmaskIO/greenmask/
* Kotlin Coroutines
https://kotlinlang.org/docs/coroutines-overview.html
https://kotlinlang.org/docs/coroutine-context-and-dispatchers.html#dispatchers-and-threads
* Reminded Outlaw of the Cloudflare Workers we mentioned a while back
https://developers.cloudflare.com/workers/
Please leave us a review!
https://www.codingblocks.net/review
Random Bits
* Tesla Las Vegas Loop
https://www.lvcva.com/vegas-loop/
* What actually happens when you overfill the oil in a vehicle?
https://www.youtube.com/watch?v=VaTbfvzNbxQ
* Fisker Ocean totalled after a $900 door ding…really
https://jalopnik.com/fisker-ocean-totaled-over-910-door-ding-after-insurer-1851451187
* A Ford Mustang painted with the blackest black paint available
https://youtu.be/Ll27OkWuE1g
Tip of the Week
Docker Blog is pretty excellent
Car Research
Utilizing wood sheet goods by utilizing cut lists
Docker’s chicken-n-egg problem
Download the file using the server-suggested name
With wget …
--content-disposition
https://man7.org/linux/man-pages/man1/wget.1.html
With curl …
-JO
-J, --remote-header-name
-O, --remote-name
https://curl.se/docs/manpage.html#-J
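As a rough illustration of what those flags ask the tool to do, here is a minimal Python sketch that picks the server-suggested name from a Content-Disposition header and falls back to the last segment of the URL otherwise. The function name `suggested_filename`, the `index.html` fallback, and the example URLs are assumptions made for illustration; this is not how wget or curl are actually implemented.

```python
from email.message import Message
from posixpath import basename
from urllib.parse import unquote, urlsplit


def suggested_filename(url, content_disposition=None):
    """Choose a local filename, preferring the name the server suggests."""
    if content_disposition:
        # Let the standard library parse the header parameters for us.
        header = Message()
        header["Content-Disposition"] = content_disposition
        name = header.get_filename()
        if name:
            # Keep only the final path component so a hostile server
            # cannot steer the download outside the current directory.
            return basename(name.replace("\\", "/"))
    # No usable header: fall back to the last segment of the URL path.
    return basename(unquote(urlsplit(url).path)) or "index.html"


if __name__ == "__main__":
    print(suggested_filename(
        "https://example.com/download?id=42",
        'attachment; filename="report-2024.pdf"',
    ))
```

Without the header, the download would be named after the URL, which is why a link like `/download?id=42` tends to produce an unhelpful filename.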
In this episode Joe introduces us to more security items you should be aware of in the world of CWE’s, Michael bends to the will of Joe and Allen in his favorite portion of the show, and Allen pontificates on the time spent setting up IDE’s and environments.
Reviews – Thank You!
* iTunes: Vlad Bezden, Mom in VA, Make1977
* Spotify: chutney3000, Xuraith
Upcoming Events
* Atlanta Dev Con
September 7th, 2024
https://www.atldevcon.com/
Topics
OpenTelemetry
* The backend matters
https://opentelemetry.io/ecosystem/integrations/
+ Some backends are more fully featured than others
- Splunk Trace Analyzer
https://docs.splunk.com/observability/en/apm/apm-spans-traces/trace-analyzer.html
- Google Trace Explorer
https://cloud.google.com/trace/docs/finding-traces
- Azure OTel Guide
https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable?tabs=aspnetcore
- AWS OTel Information
https://aws.amazon.com/otel/
* The processor can decouple you
https://opentelemetry.io/docs/collector/configuration/#processors
CNCF – Cloud Native Computing Foundation
* If you’re working in a cloud environment, you should know the projects here
https://www.cncf.io/projects/
* Super cool visualization tool for the projects
https://landscape.cncf.io/
Llama 3 – the next version of Meta’s AI engine
* “Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications”
https://llama.meta.com/llama3/
Environmental concerns over the processing required for AI
* Power requirements for processing some of the LLM’s
https://www.nnlabs.org/power-requirements-of-large-language-models/
* The Microsoft underwater datacenter
https://news.microsoft.com/source/features/sustainability/project-natick-underwater-datacenter/
Setting up IDE’s and environments
* IDE vs old school debugging
* Setup can require a significant amount of time
+ Is it worth it?
+ What if you’re just working on a bug?
Security Resources
* What’s the difference between CWE and OWASP?
* CWE (Common Weakness Enumeration) is a community-developed list of common software and hardware weaknesses.
+ It’s similar to OWASP, but older (1999 vs 2001) and more general – including non web apps and (more recently) hardware
* The infamous “NVD” database links CVE (Common Vulnerabilities and Exposures) to CWE
https://nvd.nist.gov/vuln/detail/CVE-2021-44228
https://cwe.mitre.org/top25/archive/2023/2023_trends.html
Tips
Pre-warning – probably wouldn’t recommend installing this!
Saw a cool Windows utility called “Windrecorder” that records video and text from your desktop, and lets you rewind and search.
MacOS’s Spotlight is more powerful than you maybe knew
https://www.intego.com/mac-security-blog/spotlight-secrets-15-ways-to-use-spotlight-on-your-mac/
https://beebom.com/spotlight-tips-tricks/
If your grep command isn’t working like you thought it should, you might be a victim of output getting held up in a buffer
grep --line-buffered
iOS – get text from images
https://support.apple.com/guide/iphone/use-live-text-iphcf0b71b0e/ios
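On the grep tip above: when grep writes to a pipe rather than a terminal, its output is typically collected in a block buffer, and `--line-buffered` asks it to flush after every line instead. Here is a minimal Python sketch of that behaviour; the helper name `grep_line_buffered` and the sample log lines are made up for illustration.

```python
import io
import re


def grep_line_buffered(pattern, source, sink):
    """Copy lines matching `pattern` from `source` to `sink`, flushing after each match."""
    regex = re.compile(pattern)
    matches = 0
    for line in source:
        if regex.search(line):
            sink.write(line)
            # Hand the line downstream right away instead of waiting
            # for a larger buffer to fill up.
            sink.flush()
            matches += 1
    return matches


if __name__ == "__main__":
    log = io.StringIO("ok start\nERROR disk full\nok done\nERROR timeout\n")
    out = io.StringIO()
    print(grep_line_buffered(r"ERROR", log, out), "matching lines")
```

The per-line flush costs some throughput, so it is mostly worth it when something downstream is waiting on a slow, streaming input.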
Picture, if you will, a nondescript office space, where time seems to stand still as programmers gather around a water cooler. Here, in the twilight of the workday, they exchange eerie tales of programming glitches, security breaches, and asynchronous calls. Welcome to the Programming Zone, where reality blurs and (silent) keystrokes echo in the depths of the unknown. Also, Allen is ready to boom, Outlaw is not happy about these category choices, and Joe takes the easy (but not longest) road.
The full show notes are available on the website at https://www.codingblocks.net/episode232
News
* Thanks for the reviews! Want to help us out? Leave a review! (/reviews)
+ ivan.kuchin, Nick Brooker, Szymon, JT, Scott Harden
* Text replacements are tricky: replacing links to “twitter.com” with “x.com” enabled a wave of domain spoofing attacks. (arstechnica.com)
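To see why the text replacement mentioned above is tricky, here is a small Python sketch. A plain substring replace rewrites any domain that merely ends in “twitter.com”, while a match anchored to the host boundary leaves lookalike domains alone. The function names and the regular expression are illustrative assumptions, not the logic the site actually used.

```python
import re


def naive_rewrite(text):
    """Blindly swap the substring, wherever it appears."""
    return text.replace("twitter.com", "x.com")


# Only match when "twitter.com" is a whole host or a parent domain:
# not preceded by a letter, digit, or hyphen, and not followed by
# further hostname characters.
_TWITTER_HOST = re.compile(
    r"(?<![A-Za-z0-9-])twitter\.com(?![A-Za-z0-9.-])", re.IGNORECASE
)


def safer_rewrite(text):
    """Swap the domain only when it stands on its own or has a subdomain."""
    return _TWITTER_HOST.sub("x.com", text)


if __name__ == "__main__":
    lookalike = "https://fedetwitter.com/track"
    print(naive_rewrite(lookalike))   # the link now appears to point at fedex.com
    print(safer_rewrite(lookalike))   # left unchanged
```

The anchored version is deliberately conservative (it skips a domain followed by a trailing period, for example); parsing each URL and comparing hostnames would be a sturdier approach.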
Around the Water Cooler
* Ktor is an asynchronous web framework based on Kotlin, but can it compete with Spring? (ktor.io)
* docker init is a great tool for getting started, but how much can you expect from a scaffolding tool? (docs.docker.com)
* Logging, how much is too much? What if we could go back in time?
* Boomer Hour: Let’s talk about GChat UX
* What do you know about browser extensions?
+ ViolentMonkey is a modern remake of the infamous GreaseMonkey, but can you trust it? (chromewebstore.google.com)
* Can you trust any extensions?
+ XZ Utils backdoor timeline, wow (arstechnica.com)
* Bookmarklets still rock! (freecodecamp.org)
* Silent Key Tester for mechanical keyboards, you can specify a wide variety of switches (thockking.com)
+ Joe’s preferences: - Durock Shrimp Silent T1 - Tactile Gazzew Boba U4 Silent - Linear Kailh Silent Brown - Linear Lichicx Lucy Silent - Linear WS Wuque Studio Gray Silent - Tactile WS Wuque Studio - White Silent – Linear - Tactile Kailh Silent Pink - Linear Cherry MX Silent Red
Tip of the Week
* Feeling nostalgic for the original GameBoy or GameBoy Color? GBStudio is a one-stop shop for making games; it’s open-source and fully featured. You can do the art, music, and programming all in one tool and it’s thoughtfully laid out and well-documented. Bonus…your games will work in GameBoy emulators AND you can even produce your own working physical copies. (If you don’t want the high-level tools you can go old skool with “GBDK” too) (gbstudio.dev)
* If you’re going to do something, why not script it? If you’re going to script it, save it for next time!
* Dave’s Garage is a YouTube channel that does deep dives into Windows internals, cool electronics projects, and everything in between! (YouTube)
In this episode, Allen, Joe and Michael finally make it back to record together! Allen revisits the basics, Michael kicks off boomer hour nicely, and JZ lets us know that the dream of an 8-bit looking keyboard is not dead.
News
* An update on the networking redo at Allen’s house
+ The access panel that was mentioned
https://amzn.to/49lAXOq
Topics
* Data structures are still incredibly important in your day to day software development
* Changing “lookup table” type of data when your data stores are document databases or search engine type of storage
* A newly found 8-bit looking keyboard that may just be the ticket to Joe’s happiness
https://amzn.to/3J15ir2
* Code comments that are…not…great
https://www.reddit.com/r/ProgrammerHumor/comments/15qskcc/juniordevs/
* Frustrating code documentation that doesn’t really tell you anything
https://cloud.google.com/nodejs/docs/reference/container/latest/container/protos.google.container.v1.getoperationrequest
https://cloud.google.com/nodejs/docs/reference/container/latest/container/v1.clustermanagerclient#google_cloud_container_v1_ClusterManagerClient_getOperation_member_1*
* A resource from the past has come back to our attention – thanks Mikerg
https://devhints.io/
* What determines how much a data scientist earns?
https://jobs-in-data.com/salary/data-scientist-salary
+ Based on a 2022 Kaggle Machine Learning and Data Science survey
+ Country
+ Industry
+ Job title
+ Years coding experience
+ Years ML experience
Tips
* Remember Carl Schweitzer from MS Dev Show? He’s got a new podcast, “Cloud Chat”, talking about cloud everything…like episode 1 about the aas’ of cloud computing!
https://podcasts.apple.com/us/podcast/cloudchat/id1734938265
* Joe has another music suggestion for you, this time it’s a new album by Four Tet. If you’re not familiar with Four Tet, it’s often described as “IDM” or intelligent dance music. It’s slower and more experimental than what you’d hear in a club though it still has those steady beats to help you get in the zone.
https://open.spotify.com/album/7mpTSR6E855VhdCeoPgpCF
https://music.apple.com/us/album/three/1729585296
* Sometimes Google’s GCP API’s don’t seem to tell the truth
* See what your helm-templates will render using this online tool
https://helm-playground.com
* Some useful Java JVM settings when working with containers
+ -XX:+UseContainerSupport tells the JVM to detect the container’s limits and size itself from them – this way the JVM benefits from the CPU / Memory allocated to the container
+ -XX:InitialRAMPercentage=80.0 tells the JVM to use 80% of the RAM for the initial heap size – this is based off the container memory LIMIT
+ -XX:MaxRAMPercentage=80.0 tells the JVM to use 80% of the RAM for the MAX heap size – this is based off the container memory LIMIT
+ -XX:MaxDirectMemorySize – based off reading, if NOT SET, this should default to the same as the Max Heap Size – which is better than what we were doing previously: we had this set to 256m, which is smaller than some of the larger files we get from the CDS and was causing OOM issues.
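To make the percentage settings concrete, here is the arithmetic as a small Python sketch. The 2 GiB container limit is an assumed example value, and a real JVM may round or align the heap slightly differently, so treat the numbers as approximate.

```python
def heap_mib(container_limit_mib, ram_percentage):
    """Approximate heap size, in MiB, for a given RAM percentage setting."""
    return container_limit_mib * ram_percentage / 100.0


limit_mib = 2048                        # assumed container memory limit (2 GiB)
max_heap = heap_mib(limit_mib, 80.0)    # corresponds to MaxRAMPercentage=80.0
non_heap = limit_mib - max_heap         # what is left for everything outside the heap

if __name__ == "__main__":
    print(f"max heap ~ {max_heap:.1f} MiB, remaining ~ {non_heap:.1f} MiB")
```

The remaining 20% or so has to cover everything that is not heap, such as thread stacks, metaspace, and direct buffers, which is worth keeping in mind when direct memory is allowed to grow as large as the heap.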
This time we are missing the “ocks”, but we hope you enjoy this off…ice topic chat about personalizing our workspaces. Also, Joe had to put a quarter in the jar, and Outlaw needs a cookie.
The full show notes are available on the website at https://www.codingblocks.net/episode230
News
* Thank you for the review Szymon! Want to leave us a review?
Decorating your Home Office
* Joe’s Uplift Desk Review
* Mounting monitors, is there any other way?
* To grommet or not to grommet?
* How many keys do you want on your keyboard?
* Wired vs Wireless
* About that “fn” key…
* Reddit for inspiration?
* Office-Appropriate Art
+ Paintings
+ Prints / Silk Screens / Photography
+ Sculptures
+ Book Cases
+ There’s a story for Outlaw about this print: https://www.johndyerbaizley.com/product/four-horsemen-full-color-ap
Tip of the Week
* If you have a car, you should consider getting a Mirror Dash Cam. It’s a front and rear camera system that replaces your rearview mirror with a touchscreen. Impress all your friends with your recording, zoom, night vision, parking assistance, GPS, and 24/7 recording and monitoring. (Amazon)
* Be careful about exercising after you give blood, else you might end up needing it back! (redcrossblood.org)
The Cloud Nine Ergonomics Keyboard looks pretty nice…
John Dyer Baizley does some really cool stuff, including artwork for some of our favorite bands
We are mixing it up on you again, no Outlaw this week, but we can offer you some talk of exotic databases. Also, Joe pronounces everything correctly and Allen leaves you with a riddle.
The full show notes are available on the website at https://www.codingblocks.net/episode229
News
* Thanks for the reviews!
+ ivan.kuchin (has taken the lead!), Yoondoggy, cykoduck, nehoraigold
+ Want to help us out? Leave a review! (reviews)
Multivalue DBMS
* Popular: 86. Adabas, 87. UniData/UniVerse, 147. JBase
* Similar to RDBMS – store data in tables
+ Store multiple values to a particular record’s attribute
- Some RDBMS’s can do this as well, BUT it’s typically an exception to the rule when you’d store an array on an attribute
- In a MultiValue DBMS – that’s how you SHOULD do it
- Part of the reason it’s done this way is these database systems are not optimized for JOINS
+ Looked at the Adabas and UniData sites – the primary selling points seem to be rapid application development / ease of learning and getting up to speed as well as data modeling that closely mirrors your application data structures
* I BELIEVE it’s a schema on write (docs.rocketsoftware.com)
* Supposed to be very performant as you access the data the way your application expects it
* Per the docs, it’s easy to maintain (Wikipedia)
Spatial DBMS
* Popular: 29. PostGIS, 59. Aerospike, 136. SpatiaLite
* Provides the ability to efficiently store, modify, and query spatial data – data that appears in a geometrical space (maps, polygons, etc)
* Generally have custom data types for storing the spatial data
* Indices that allow for quick retrieval of spatial data about other spatial data
* Also allow for performing spatial-specific operations on data, such as computing distances, merging or intersecting objects or even calculating areas
* Geospatial data is a subset of spatial data – they represent places / spatial data on the Earth’s surface
* Spatio-temporal data is another variation – spatial data combined with timestamps
* PostGIS – basically a plugin for PostgreSQL that allows for storing of spatial data
+ Additionally supports raster data – data for things like weather and elevation
+ If you want to learn how to use it and understand the data and what’s stored (postgis.net)
- Spatial data types are: point, line, polygon, and more…basically shapes
- Rather than using b-tree indexes for sorting data for fast retrieval, spatial indexes use bounding boxes – rectangles that identify what is contained within them
* Typically accomplished with R-Tree and Quadtree implementations
* RedFin – a real estate competitor to realtor.com and others, uses PostgreSQL / PostGIS
* Quite a bit of software that supports OpenGIS so may be a good place to start if you’re interested in storing/querying spatial data
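The bounding-box idea above can be sketched in a few lines of Python. This is only the rectangle test that a spatial index is built around, applied with a plain linear scan; it is not an R-Tree or Quadtree, and the `BBox` class, the `candidates` helper, and the sample shapes are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @classmethod
    def of(cls, points):
        """Smallest axis-aligned rectangle containing all the points."""
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        return cls(min(xs), min(ys), max(xs), max(ys))

    def intersects(self, other):
        """True when the two rectangles overlap or touch."""
        return (
            self.min_x <= other.max_x and other.min_x <= self.max_x
            and self.min_y <= other.max_y and other.min_y <= self.max_y
        )


def candidates(shapes, query_box):
    """Names of shapes whose bounding box overlaps the query box."""
    return [name for name, pts in shapes.items() if BBox.of(pts).intersects(query_box)]


if __name__ == "__main__":
    shapes = {
        "park": [(0, 0), (4, 0), (4, 3)],
        "lake": [(10, 10), (12, 10), (11, 13)],
    }
    print(candidates(shapes, BBox(3, 2, 5, 5)))
```

The cheap rectangle check only narrows things down to candidates; an exact geometry test would still be needed on whatever survives it.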
Event Stores
* Popular: 178. EventStoreDB, 336. IBM DB2 Event Store, 338. NEventStore
* Used for implementing the concept of Event Sourcing
+ Event Sourcing – an application/data store where the current state of an object is obtained by “replaying” all the events that got it to its current state
- This contrasts with RDBMS’s in that relational typically store the current state of an object – historical state CAN be stored, but that’s an implementation detail that has to be implemented, such as temporal tables in SQL Server or “history tables”
+ Only support adding new events and querying the order of events
- Not allowed to update or delete an event
- For performance reasons, many Event Store databases support snapshots for holding materialized states at points in time
* EventStoreDB – https://www.eventstore.com/eventstoredb
+ Defined as an “immutable log”
+ Features: guaranteed writes, concurrency model, granulated stream and stream APIs
+ Many client interfaces: .NET, Java, Go, Node, Rust, and Python
+ Runs on just about all OSes – Windows, Mac, Linux
+ Highly available – can run in a cluster
+ Optimistic concurrency checks that will return an error if a check fails
+ “Projections” allow you to generate new events based off “interesting” occurrences in your existing data
+ For example, you are looking for how many Twitter users said “happy” within 5 minutes of the word “foo coffee shop” and within 2 minutes of saying “London”.
+ Highly performant – 15k writes and 50k reads per second
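Here is a minimal Python sketch of the event sourcing idea described above: an append-only list of events, current state obtained by replaying them, and an optional snapshot so the replay does not have to start from the beginning. The bank-account example and every name in it (`Event`, `AccountStream`, and so on) are hypothetical and have nothing to do with the EventStoreDB client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str      # "deposited" or "withdrawn"
    amount: int


class AccountStream:
    """Append-only stream of events for one account."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Events are only ever added, never updated or deleted.
        self._events.append(event)

    def balance(self, snapshot=None):
        """Replay events to get the current balance.

        `snapshot` is an optional (version, balance) pair; replay then
        starts after that version instead of from the first event.
        """
        version, total = snapshot or (0, 0)
        for event in self._events[version:]:
            total += event.amount if event.kind == "deposited" else -event.amount
        return total

    def snapshot(self):
        """Materialize the state as of the latest event."""
        return (len(self._events), self.balance())


if __name__ == "__main__":
    stream = AccountStream()
    stream.append(Event("deposited", 100))
    stream.append(Event("withdrawn", 30))
    print(stream.balance())
```

Because the history is never rewritten, the same stream can answer “what is the balance now?” and “what was it two events ago?” without any extra tables.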
Resources we like
* Database Rankings (db-engines.com)
Tip of the Week
* If your internet connection is good, but your cell phone service is bad then you might want to consider Ooma. Ooma sells devices that plug into your network or connect wirelessly and provide a phone number, and a phone jack so you can hook up an old school home telephone. We’ve been using it for about a week now with no problems and it’s been a breeze to set up. The devices range from $99 to $129 and there’s a monthly “premier” plan you can buy with nifty features like a secondary phone line, advanced call blocking, and call forwarding. (ooma.com)
* Why use “git reset --hard” when you can “git stash -u” instead? Reset is destructive, but stashing keeps your changes just in case you need them. Because sometimes, your “sometimes” is now!
+ “git reset --hard”
+ “git stash -u”
We have a different combination of the hosts for this episode where we continue the series on the types of database systems available and why you might choose one over another. Michael continues impressing by recalling everything we’ve ever said on our 500+ hours of podcasts, Allen enjoys learning about a database system he’d never come across, and Joe is loaded up and ready for his trek to Georgia, USA.
Reviews
* iTunes: Calum55555
* Spotify: Ian Neethling, Ghostmerc, Xuraith
* Audible: Wood2prog
News
* Orlando Code Camp
https://orlandocodecamp.com/
Object Oriented DBMS
* Popular: InterSystems Cache, 92. InterSystems IRIS, 161. DB4o, 154. ObjectStore, 159. Actian NoSQL Database
* The idea was to store data in the database the way that it’s modeled in the application
https://stackoverflow.com/questions/9884407/what-is-the-difference-between-object-oriented-and-document-databases
* Relationships and inheritance would also be modeled in the database
* Would be more performant because the data would be stored in the way the application would expect without using complex joins
* Fallen out of popularity with the availability of ORM’s for RDBMS
https://www.ionos.com/digitalguide/hosting/technical-matters/object-oriented-databases/
* From InterSystems IRIS info
+ Based on the ODMG (Object Database Management Group) standard with advanced features like multiple inheritance
+ ObjectScript and Python directly manipulate and read from the storage – objects can also be exposed in other languages like .NET, JavaScript, Java and C++
+ Can also be queried with SQL syntax
Wide Column Stores
* Popular: 12. Cassandra, 26. HBase, 27. Azure Cosmos DB
* Also known as extensible record stores
https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
* Can hold extremely large numbers of dynamic columns
+ How much is a large number – “a record can have billions of columns” – which is why they’re also described as two-dimensional key/value stores
* Schema on read
* Wide column stores should not be confused with columnar storage in RDBMS – the latter is an implementation detail inside a relational database system that improves OLAP type of performance by storing data column by column rather than record by record
* Using Cassandra as the information – https://cassandra.apache.org/_/cassandra-basics.html
+ Hyper-horizontally scalable
- Prevents data loss due to hardware failures (if scaled)
+ Ability to tweak throughput of reads or writes in isolation
https://www.codingblocks.net/podcast/search-driven-apps/
+ Its “distributed” nature means it runs on many nodes but it looks like a single point of entry
+ No real point of running a single node of Cassandra
+ “Masterless” architecture – every node in a cluster acts like every other node
https://www.codingblocks.net/podcast/designing-data-intensive-applications-secondary-indexes-rebalancing-routing/
+ In contrast with traditional RDBMS – can be scaled on low-cost, commodity hardware – don’t need super-high-end motherboards that support terabytes of RAM to scale
+ Linear scalability – every node you add gives you + n throughput
https://www.datastax.com/products/datastax-astra
+ Replication is handled by tweaking replication factors – ie how many times you want the data replicated in order to stay in a good state
+ Per query configurable consistency – how many nodes must acknowledge the read/write query before returning a success
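The replication factor and per-query consistency settings above interact through some simple arithmetic, sketched here in Python. The rule of thumb shown (a read is guaranteed to overlap the latest acknowledged write when read acks plus write acks exceed the replication factor) is the commonly cited one for quorum-style systems; the function names are made up for illustration and are not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_acks, read_acks):
    """True when every read set must overlap every acknowledged write set."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(f"RF={rf}: quorum is {q} nodes")
    print("quorum writes + quorum reads overlap:", reads_see_latest_write(rf, q, q))
    print("ONE write + ONE read overlap:", reads_see_latest_write(rf, 1, 1))
```

Lower ack counts buy latency and availability at the cost of possibly reading stale data, which is the trade-off being tuned per query.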
Vector DBMS
* Popular: 52. Kdb, 103. Pinecone, 139. Chroma
* A database system that specializes in storing vector embeddings and being able to retrieve them quickly
+ What is a vector embedding?
- https://www.pinecone.io/learn/vector-embeddings-for-developers/
- What is a vector? A mathematical structure with a size and a direction
* Think of it as a point in space (on a graph) with the direction being the arrow from (0,0,0) to the vector point
* They say for developers, it’s easier to think of vectors as an array of numbers
* When you look at the vectors in space, some will be floating by themselves while others might be clustered closely to each other
- Vectors are very useful in Machine Learning algorithms because CPUs and GPUs are very good at doing math
- Vector embedding is the process of converting virtually any data structure into vectors
- It’s not as simple as just a straight conversion
* You don’t want to lose the original data’s “meaning”
+ An example they used was comparing two sentences – you wouldn’t just compare the words, you want to compare if the two sentences had the same meaning
+ To keep the meaning and produce vectors with relationships that make sense, that requires embedding models
* Nowadays, many embedding models are created by passing large sets of “labeled” data to neural networks
https://en.wikipedia.org/wiki/Neural_network
+ Neural networks are trained using supervised learning (usually), they can also be self-supervised or unsupervised learning
- Using a supervised model, you pass in large sets of data as pairs of inputs and labeled outputs
- The values are transformed in each layer of the neural network
- With each training of the neural network, the activations at each layer are modified
- The goal is that eventually the neural network will be able to provide an output for any given input, even if it hasn’t seen that specific input before
+ The embedding model is essentially those layers of the neural network minus the last one that was labeling data – rather than getting labeled data you get a vector embedding
* They have a great visualization on the Pinecone page showing the output of a word2vec embedding model that shows how words would appear in this 3D vector space
* This is what an embedding model does – it can take inputs and know where to place them in “vector space”
+ Items placed closer together are more related, and further apart, less related
* Ok, so now we know what vector embeddings are, what can we do with them?
+ Semantic search – rather than having search engines be able to search for words that are similar to what you entered, they can now search for content with meaning similar to what you searched for
+ Question answering applications
+ Audio search
* Check out the page of sample applications – https://docs.pinecone.io/page/examples
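To make “closer together means more related” concrete, here is a tiny Python sketch of similarity search over embeddings using cosine similarity. The three-dimensional vectors are hand-made toy values, not the output of a real embedding model, and the linear scan stands in for the indexing a real vector database would do.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query, index, k=1):
    """Names of the k stored vectors most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings": invented numbers chosen so related items point the same way.
INDEX = {
    "kitten": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.3, 0.0],
    "spreadsheet": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    cat = [1.0, 0.1, 0.0]   # pretend this is the embedding of "cat"
    print(nearest(cat, INDEX, k=2))
```

The query never has to share a word with the stored items; it only has to land near them in the vector space, which is what makes semantic search possible.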
Resources
* Primary resource we used for these database rankings
https://db-engines.com/en/ranking
* Some nice ways to learn about Machine Learning in an approachable way
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Tips of the Week
* docker init – let AI help you generate a better Dockerfile
https://medium.com/@akhilesh-mishra/you-should-stop-writing-dockerfiles-today-do-this-instead-3cd8a44cb8b0
* epoch converter has code samples!!!
https://www.epochconverter.com/
* Add someone you trust as an Account Recovery contact
https://support.apple.com/en-us/HT212513
https://support.apple.com/en-us/HT204921
* Lastpass’s Emergency Access
https://www.lastpass.com/features/emergency-access
You asked, we listened! A request from one of our Slack channels was to go over the various types of databases and why you might choose one over another. Join us in another information filled episode where Joe won’t be attending the event he’s been promoting and Allen tries to keep his voice together for the entirety of the episode, and almost succeeds.
News
Reviews
* iTunes: ivan.kuchin, MikeW717
* Spotify: Darren Pruitt, chutney3000
Upcoming Events
* Orlando Code Camp – Conference is February 24th
https://orlandocodecamp.com
Miscellaneous
* Kudos to Dell Support on their monitors
* The Cat 8 journey will be beginning soon
* Home offices – random desires
Database Types
Primary resource we used
Some terminology we’ll be using
Relational DBMS
* Popular – 1. Oracle, 2. MySQL, 3. Microsoft SQL Server, 4. PostgreSQL, 8. IBM DB2, 9. Snowflake, 11. Microsoft Access
* Schema on write
* Primary language / form of access is SQL
* Schema is defined by named tables with named columns and specific data types
* Data exists as rows in the table that conform to the columns/types that are defined in the schema
* Scalability – typically vertical scaling (increasing available CPU/RAM) is the preferred way
+ Horizontal scaling with most RDBMS’s is generally complex and requires a lot of thought and effort
- https://www.designgurus.io/blog/scaling-sql-databases
* Can be very performant but requires knowledge on how to index and store data properly
+ Even with excellent design and indexing, performance can suffer as size of data grows
* Some fun Instagram posts on scaling their databases
+ https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c
+ https://earthweb.com/how-many-pictures-are-on-instagram/
Key-value stores
* Popular: 6. Redis, 15. Amazon DynamoDB, 27. Azure Cosmos DB, 35. Memcached, 54. etcd
* Schema on read
* No real language – usually an API to put and get documents
* Depending on the key value store, complex data structures may be stored and ability to query in various ways
* Scalability – horizontally scalable – massively
* Very performant
* Many have built in extended functionality beyond looking up by a single key – for instance, Redis allows search engine type of filtering
* Why’s Hadoop not on the list?
https://db-engines.com/en/blog_post/16
Document Stores
* Popular: 5. MongoDB, 15. Amazon DynamoDB, 17. Databricks, 27. Azure Cosmos DB, 34. Couchbase
* Schema on read
* DBMS specific querying – usually offer a SQL capability but often times is not the most powerful way to query the data
* Documents do not need to conform to any schema
+ Multiple documents in the same collection can have completely different fields/properties, OR they can have the same properties with different data types
+ Documents can contain collections in fields or even nest other documents
+ Typically stores data in JSON like documents
* Can be very performant but may require care to create proper indexes, manage connections, etc
Time Series DBMS
* Popular: 28. InfluxDB, 50. Prometheus, 52. Kdb, 79. Graphite, 73. TimescaleDB
* Schema on read
* Has special features specifically tailored to time series data that aren’t quite as easy / performant in a regular RDBMS or Key/Value store
+ Things like querying instants, range vectors, complex joins on ranges, etc
- https://prometheus.io/docs/prometheus/latest/querying/basics/
+ Also have built in functions specific to the needs of time series data – things like rates, deltas, histograms, quantiles, etc
- https://prometheus.io/docs/prometheus/latest/querying/functions/
* Scalability seems to vary – InfluxDB is set up for scaling via clusters with meta and data nodes, whereas Prometheus has a different federated approach
+ Scaling Prometheus – https://logz.io/blog/prometheus-architecture-at-scale/
+ Scaling InfluxDB – https://www.influxdata.com/blog/influxdb-clustering/
* Very performant for querying time series related data
+ Obviously there’s always things to consider – such as histograms vs quantiles in Prometheus – client vs server side
- https://prometheus.io/docs/practices/histograms/
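As a rough idea of what a “rate over a range” looks like, here is a deliberately simplified Python sketch: pick the samples inside a time window, then divide the increase by the elapsed seconds. The sample data and function names are invented, and this ignores counter resets and the extrapolation that a real time series database such as Prometheus applies, so it is not a reimplementation of its `rate()` function.

```python
def window(samples, start, end):
    """Samples whose timestamp falls inside [start, end]."""
    return [(t, v) for t, v in samples if start <= t <= end]


def simple_rate(samples):
    """Per-second increase between the first and last sample.

    Simplified on purpose: assumes at least two samples and a counter
    that never resets.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# (timestamp in seconds, counter value), e.g. total requests served so far
REQUESTS = [(0, 100), (15, 130), (30, 190), (45, 220), (60, 400)]

if __name__ == "__main__":
    print(simple_rate(window(REQUESTS, 0, 30)))   # first 30 seconds
    print(simple_rate(window(REQUESTS, 0, 60)))   # whole minute
```

Having this kind of windowed math built into the query language is a large part of what separates a time series database from a general-purpose store.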
Graph DBMS
* Popular: 22. Neo4j, 27. Azure Cosmos DB, 59. Aerospike, 75. Virtuoso, 85. ArangoDB
* Schema on write (mostly) – not sure if all graph databases force labels and attributes to be consistent
+ https://neo4j.com/docs/getting-started/data-modeling/guide-data-modeling/
* Different in terms of functionality than other databases – graph databases store data in terms of nodes and edges
+ Edges are the relationships between the nodes
* Great explanation on the Neo4j website – https://neo4j.com/docs/getting-started/data-modeling/guide-data-modeling/
* Use cases – https://neo4j.com/use-cases/
+ Fraud detection and analysis
- Financial Fraud Detection with Graph Data Science
- Money Laundering Prevention with Neo4j
- Why Intelligent Applications Need a Graph Database with Granular Security
- Fraud Detection with Neo4j
+ Identity and access management
+ Network and IT operations
+ Real time recommendations
* So why a graph database? Can’t you do this with an RDBMS and joins?
+ The friend of a friend scenario – a graph database can easily and performantly return relationships with 20 degrees of separation or more – try that in a SQL query and watch your mind and database engine melt
- https://neo4j.com/videos/why-neo4j-3/
* Neo4j has built in scalability via sharding – https://neo4j.com/product/neo4j-graph-database/scalability/
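The friend-of-a-friend scenario above boils down to walking edges outward from a starting node. Here is a minimal Python sketch using a breadth-first search over an in-memory adjacency list; the graph, the names in it, and the `within_degrees` helper are invented for illustration and say nothing about how Neo4j stores or traverses data.

```python
from collections import deque


def within_degrees(graph, start, max_degrees):
    """Map each person reachable within `max_degrees` hops to their distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_degrees:
            continue  # do not expand past the requested depth
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]
    return distance


# Undirected friendships written as an adjacency list.
FRIENDS = {
    "allen": ["joe"],
    "joe": ["allen", "michael"],
    "michael": ["joe", "ivan"],
    "ivan": ["michael"],
}

if __name__ == "__main__":
    print(within_degrees(FRIENDS, "allen", 2))
```

Each extra degree is just another round of following edges, whereas in SQL each extra degree is typically another self-join.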
Search engine
* Popular: 7. Elasticsearch, 14. Splunk, 24. Solr, 40. OpenSearch, 58. MarkLogic
* Extensions of NoSQL databases
* Schema on read
* Complex search expressions
* Full text search
* Stemming – reducing words to their root forms so that searches can be more accurate with similar word searches
* Ranking and grouping of search results
* Built for scalability
* Incredibly performant for the use case
* Not great with relationship data
* Why choose over something like a relational or document database?
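Stemming plus an inverted index is the core trick behind the full text search described above. The Python sketch below uses a deliberately crude suffix-stripping stemmer (an assumption for illustration, far simpler than the stemmers real search engines ship with) so that “indexing”, “indexed”, and “indexes” all land on the same term.

```python
from collections import defaultdict

SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common English suffix; a toy, not a real stemming algorithm."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Inverted index: stemmed term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[toy_stem(word)].add(doc_id)
    return index


def search(index, term):
    """Sorted ids of documents containing any form of the term."""
    return sorted(index.get(toy_stem(term), ()))


DOCS = {
    1: "Indexing logs",
    2: "indexed documents",
    3: "the index",
    4: "podcast tips",
}

if __name__ == "__main__":
    print(search(build_index(DOCS), "indexes"))
```

Because lookups hit the term dictionary rather than scanning every document, this structure stays fast as the corpus grows, which is one answer to the “why choose this over a relational or document database?” question.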
Resources
* https://db-engines.com/en/ranking
* https://db-engines.com/en/articles
* All the DB vendor websites – so much good information
* Designing Data Intensive Applications
Tips of the Episode
* Hot tip for a multi-user document oriented distributed database that’s free, open source and you probably know how to use it already …
+ Bonus points for supporting history
+ The downsides…
- It’s slow at writing, and reading, and querying, and the syntax isn’t easy to learn…but other than that it’s great!
https://gitrows.com/
https://github.com/DavidBruant/github-as-a-database
* kubectl cp
https://kubernetes.io/docs/reference/kubectl/generated/kubectl_cp/
* Hardware – Navepoint Rack chassis hinge
https://navepoint.com/cabinet-accessories/wall-mount-hinge-adapter/
* Bonus: ksync – a kubernetes tool for syncing files across clusters or local environments but it does require setting up an agent in the cluster
https://github.com/ksync/ksync
* 14u DIY Desk
https://www.reddit.com/r/homelab/comments/rouh7m/my_14u_diy_desk_integrated_server_rack_is_finally/
This episode we are talking about keeping the internet interesting and making cool things by looking at PagedOut and Itch.io. Also, Allen won’t ever mark you down, Outlaw won’t ever give you up, and Joe took a note to say something about Barbie here but he can’t remember what it was.
The full show notes are available on the website at https://www.codingblocks.net/episode226
Reviews* Thanks for the reviews! + ineverwritereviews1337, ivan.kuchin * Want to leave us a review? https://www.codingblocks.net/review .
News* Orlando Code Camp Conference is February 24th (orlandocodecamp.com) * Wireless mic kit mentioned by Outlaw regarding the Shure system (shure.com) * New video from Allen: JZ’s tip from last episode – Obsidian Tips for Staying Organized (youtube)
Is Cat 8 Overkill?* No way! * Check out AliExpress to save some money (aliexpress.com) * Note for NAS building / Plex – 11th gen and newer Intel CPUs are your friend for transcoding (intel.com)
Merge commits* Thanks for the tip mikerg! * Some orgs are banning merge commits on larger repositories * Should you? (graphite.dev) * Git Rebase Visualized (atlassian.com) * Merge Commit Visualized (atlassian.com)
Paged Out – E-Zine* Paged Out is a free e-zine of interesting and important articles (pagedout.institute) * Thanks for the tip mikerg! * Some samples + AIleister Cryptley, a GPT-fueled sock puppeteer - A fake online persona that will generate content for you using ChatGPT * Beyond The Illusion – Breaking RSA Encryption + Encryption is basically just math – it’s not some magical black box + “Never roll your own crypto – it’s a recipe for problems!” * Keyboard hacking with QMK * Hardware Serial Cheat Sheet * BSOD colour change trick * Cold boot attack on Raspberry Pi * Can we get some love for the demoscene? * Best part…each issue comes with a wallpaper!
Fun Project Ideas* Want to get into gamedev or 3d modeling, or just like making cool stuff with your skills? * Why not use itch.io as inspiration? * See other cool games and tools that people make: https://itch.io/tools * A couple noteworthy tools + Kenney shape (itch.io) - Turn 2d images into 3d by adding depth - Export to several different formats - $3.99 + Asset Forge (itch.io) - Assemble simple shapes into more complex ones - Stretch and rotate - $19.95 US ($39.95 deluxe) + Tiled Sprite Map Editor (itch.io) - Rich feature set, nice integration with Game Dev Tools + Bfxr is a popular tool (which was an elaboration of another tool Sfxr) for generating sound effects (itch.io) - Somebody made a js version too, if you can believe that! (jsfxr.me) - Beeps, boops, blorps, flames + Rexpaint (itch.io) - An ASCII Art Editor…you just have to see it - Layers, Copy/Paste, Undo/Redo, Palette swaps, Zoom - Who needs pixels!?
Resources We Like* Kenney’s Game Dev Resources (kenney.nl) * What is the demoscene? (YouTube)
Tip of the Week* If you subscribe to Audible, don’t forget that they have a lot of “free” content available, such as dramatic space operas and the “Great Courses”
For example, “How to Listen to and Understand Great Music” is similar to a “Music Appreciation Course” you might take at uni. The author works through history, talking about the evolution of music and culture. It’s 36 hours, and that’s just ONE of the music courses available to you for “free” (once you subscribe) (audible.com)
* Visualize Git is an excellent tool for seeing what really happens when you run git commands (git-school.github.io)
* It’s easy to work with checkboxes in Markdown and Obsidian: it’s just - [ ] followed by the task text. Don’t forget the dash or the spaces!
* Did you know there is a Visual Studio Code plugin for converting Markdown to Jira markup syntax? (Code)
* Apple, Google, and the major password manager vendors have ways to set up emergency contacts. It’s very important that you have this set up for yourself and your loved ones. When you need it, you really need it. (google.com)
It’s that time of year…the time we (reluctantly) look back at what we said we were going to do this past year and see if we actually did it. Then, we repeat history and set some goals we’ll likely look back and wish we’d accomplished this time next year. In addition, we continue with the antics we’re known for, Joe gets a little aggressive in Mental Blocks, Outlaw has finally nailed nouns (or so we thought), and Allen tries not to look back at 2023’s plans.
The full show notes are available on the website at https://www.codingblocks.net/episode225
Reviews
Again, thank you so much for the heartfelt and funny reviews! And if you’re reading this and have thought, “I really should leave them a review”, we’ll make it easy! Just click https://www.codingblocks.net/review for some helpful links.
Upcoming Events* Orlando Code Camp Conference is February 24th
https://orlandocodecamp.com
Random Thoughts
Contemplating replacing consumer mesh network with one of the following
This first one I found while editing the notes for the podcast – looks super promising
Alta Labs AP6 Pro – https://amzn.to/3HurKYZ
TP-Link Omada equipment – https://amzn.to/41Rxk0S
Ubiquiti Unifi – https://amzn.to/48LvkJN
Why? Better control over which devices can talk to other devices on the network (VLANs, separate SSIDs, etc) – security and performance focused
Looking Back and Looking Forward* Allen
+ What was actually accomplished in 2023
- Fully embraced DevOps as a culture
- Kubernetes all the things
- Duplicate data…intentionally
+ Looking forward in 2024
- Way deeper into data streaming (maybe doing a talk on it…maybe making videos about preparing)
- More usage of AI’s – images, coding, questions in general
- More automation, less manual intervention
- Hopefully more YouTubing
The microphones Allen bought that will force his creative hand
https://amzn.to/48zCrVw
An alternative wireless setup for guitars:
https://amzn.to/3NRTcTQ
- Maybe attending more events, like MVP Summit
Please leave us a review! https://www.codingblocks.net/review
Resources we Like* The “I Workout” song: LMFAO – Sexy and I Know It (Lyrics) YouTube https://www.codingblocks.net/podcast/2023-resolutions/
* Minikube with Multi-Node setup
https://minikube.sigs.k8s.io/docs/tutorials/multi_node/
Tip of the Week* Tony Anderson is a music producer that specializes in minimalist ambient piano music. It’s really lush and inspiring, check it out!
https://open.spotify.com/artist/3aRscMJRah0QrvGE5rkvZl
https://music.apple.com/us/artist/tony-anderson/19063662
Tony Anderson’s studio walkthrough that Joe mentioned as well
https://www.youtube.com/watch?v=n13IqwJlYgg
* Using Podman + Kind = Lower CPU overhead than Docker
Podman (Docker replacement) – https://podman.io
Kind (Run Kubernetes Nodes as Pods with Docker or Podman) – https://kind.sigs.k8s.io
Want to run Kubernetes as close to a cloud implementation as possible on your mac? https://opencredo.com/blogs/building-the-best-kubernetes-test-cluster-on-macos/
mirrord – https://mirrord.dev/
* Be careful. But it’s so cool.
git pull --rebase=interactive origin trunk
https://git-scm.com/docs/git-rebase
This episode we are talking about the future of tech with the Gartner Top Strategic Technology Trends 2024. Also, Allen is looking into the crystal ball, Joe is getting lo, and Outlaw is getting into curling.
The full show notes for this episode are available at https://www.codingblocks.net/episode224.
News* Thank you for the reviews! justsomedudewritingareview, Stephan + You can find links to leave us reviews on the website (/reviews) * Orlando Code Camp is coming up February 24th, woo! (orlandocodecamp.com) * Make sure you read up on your next MacBook Pro; if you want to maximize the performance, you are going to need to pay for it! * Reminder: Don’t install packages from the internet in your CI/CD pipeline!
Gartner Top Strategic Technology Trends 2024
No surprise, AI is a big topic – it looks like Gartner is suggesting the technologies and processes companies must follow to be successful using and incorporating AI
In this overview, Gartner has grouped these technologies into three different sections
Protect Your Investment* Be deliberate * Ensure that you’ve secured appropriate rights for deploying AI driven solutions
AI TRiSM – AI Trust, Risk and Security Management* AI model governance + Trustworthiness + Fairness + Reliability + Robustness + Transparency + Data protection * Gartner Prediction – By 2026, companies that incorporate AI TRiSM controls will improve decision-making by reducing faulty and invalid information by 80%
Why is AI TRiSM Trending?* Largely, those who have AI TRiSM controls in place move more to production, achieve more value, and have higher precision in their modeling * Enhance bias control decisions * Model explainability
How to get started with AI TRiSM?* Set up a task force to manage the efforts * Work across the organization to share tools and best practices * Define acceptable use policies and set up a system to review and approve access to AI models
Continuous Threat Exposure Management – CTEM* Systemic approach to continuously adjust cybersecurity priorities
* Gartner prediction – By 2026, companies invested in CTEM will reduce security breaches by 2/3 (statista.com)
* Aligns exposure assessment with specific projects or critical threat vectors (fortinet.com)
* Both patchable and unpatchable exposures will be addressed
* Business can test the effectiveness of their security controls against the attacker’s view
+ “Expected outcomes from tactical and technical response are shifted to evidence-based security optimizations supported by improved cross-team mobilization.”
How to get started?* Integrate CTEM with risk awareness and management programs * Improve the prioritization of finding vulnerabilities through validation techniques * Embrace cybersecurity validation technologies (cybersecurityvalidation.com) + “security validation is a process or a technology that validates assumptions made about the actual security posture of a given environment, structure, or infrastructure”
Sustainable Technology Framework* Solutions for enabling social, environmental and governance outcomes for long term ecological balance and human rights * Gartner prediction – by 2027, 25% of CIO’s will have compensation that’s linked to their sustainable technology impact
Why trending?* Environmental technologies help deal with risks in the natural world * Social technologies help with human rights * Governance technologies strengthen business conduct * Sustainable technologies provide insights for improving overall performance
How to get started?* Select technologies that help drive sustainability * Have an ethics board involved when developing the roadmap (gartner.com) * Use the Gartner “Hype Cycle for Sustainability 2023” – helps identify well-established vs leading-edge technologies for enterprise sustainability (gartner.com)
Resources We Like* Gartner Top 10 Strategic Technology Trends for 2024 (gartner.com) * “Where Online Returns Really End Up And What Amazon Is Doing About It” (YouTube)
Tip of the Week* Lofi Girl is a YouTube channel that plays lo-fi hip-hop beats, with relaxing minimalistic animations. The people behind Lo-Fi Girl also released a new channel featuring a Synthwave (80’s influenced mid-tempo electro music) Boy. Same type of thing, but Synthwave music. (youtube.com) * If you are interested in streaming technologies and/or Apache Pinot then you should check out the Real-Time Analytics podcast by Tim Berglund (rta.buzzsprout.com) * Are you having runtime issues with your Docker container? Why not run it, and poke around? (curl.se)
It’s that time of year again when the three of us reflect on the things we’ve bought and loved, or the things we want to get…and want to love…So, come join us in this episode for our usual amount of fun while seeing if there’s anything that might make your shopping list! A small note – we forgot to get this episode out before Black Friday but we’re releasing a day early so you can at least make Cyber Monday! And who are we joking nowadays? Black Friday seems to run from Nov 1 to Nov 30. Remember, if you’re going to do some shopping, please do use our links as they help the show out – you’ll pay the same as if you went directly to the sites but we’ll make a few pennies for showing you the way! Happy Holidays and shopping to all!
The full show notes for this episode are available at https://www.codingblocks.net/episode223.
News
Thank you for the reviews!* iTunes: TUXCoon * Spotify: Frederik Laursen, Volkmar Rigo, OrbWizard
Upcoming Events
Orlando Code Camp call for speakers still open! Event is February 24th, 2024. https://orlandocodecamp.com/
Time to Shop
For anyone new to our shopping lists: there are some things that are absolutely every-day developer focused, but then we throw in things that bring us joy regardless of the relationship to life as a developer. Hopefully you enjoy what we’ve shared this year and as always, if you use the links below it’s greatly appreciated as it helps the show out with no cost to you!
Joe’s List
| Price | Description |
| Biohacking…kinda |
| $97.46 | Withings BPM |
| $53.00 | (Alternate) OMRON Silver Blood Pressure Monitor |
| Unavailable | (Alternate) Greater Goods Bluetooth Connected Bathroom Smart Scale |
| $199.95 | Withings Smart Scale |
| $79.95 | Withings Smart Contactless Thermometer |
| $117.99 | Withings Sleep Tracking pad |
| $99.95 | Fitbit Watch Charge 6 |
| $269.00 | Oura Ring |
| Tests |
| $129.35 | Everlywell Food Sensitivity Test |
| $198 | Gut Health Test w/ Microbiome Wipe |
| $129 | 23 and Me Health and Ancestry Service |
| Subscriptions |
| $230 / month | Soylent (yes, really) |
| $135 / month | Signos (continuous glucose monitoring for non-diabetics) |
Allen’s List
| Price | Description |
| Quality of Life |
| $449 | Kinesis Advantage360 |
| $75 + $75 | ACM Membership + Skills Bundle Add On |
| $30 / month | Linked In Premium |
| $120 to $180 | Logitech Combo Touch Keyboard / Case (iPad 11 Pro Linked – make sure you pick the right one for your device) |
| $13.98 / year | Vanity Domain Name w/ Namecheap |
| free! | Discord servers on investing |
| $60 to $120 | Capital Audio Fest |
| $400 | Elac Debut Connex |
| $23 | Apple Air Pod Pro 2 Comply Foam Tips |
| $119 | Wiim Pro |
| $175 | Wiim Pro Plus |
| $549 to 649 | Steam Deck OLED |
| $300 | Logitech GCloud Portable Gaming |
| $750 | Lenovo Legion Go Z1 Extreme |
Michael’s List
| Price | Description |
| For the home… |
| $119.95 | Moen Flo Smart Water Leak Detector, Water Sensor Alarm for Home, 3-Pack |
| $22.28 | Moen Flo Smart Detector 6-Foot Sensing, Leak-Sensor Cable Only, White |
| $299.31 | Schlage Encode Plus WiFi Deadbolt Smart Lock |
| For the computer… |
| $240.99 | SAMSUNG 990 PRO SSD 4TB PCIe 4.0 M.2 2280 Internal Solid State Hard Drive |
| $160.99 | Corsair SF Series, SF750, 750 Watt, SFX, 80+ Platinum Certified, Fully Modular Power Supply |
| $99.90 | CORSAIR Premium Individually Sleeved PSU Cables Pro Kit for Corsair PSUs |
| $59.99 | Lian Li Strimer Plus V2 24 Pin |
| For the bling… |
| $15.99 | upHere 5V 3PIN Addressable RGB Graphics Card GPU Brace Support Video Card Sag Holder |
| $29.69 | Cooler Master MasterAccessory ARGB GPU Support Bracket |
| $17.99 | ARGB GPU Support Bracket |
| For your health… |
| $79.99 | LifePro Hand Massager |
| $89.99 | VIVO Universal Treadmill Desk Riser, Height Adjustable Platform |
| $26.79 | BalanceFrom All Purpose 1/2-Inch Extra Thick High Density Anti-Tear Exercise Yoga Mat |
| $30.00 | The Tightwad Money Clip – Minimalist Slim Wallet |
| For your ears… |
| $13.99 | Devinal Guitar to USB C Record Cable, Gold Plated 6.6 Feet |
| For use with… |
| $9.99 | iStroboSoft |
| $49.99 to $99.99 | StroboSoft 2.0 |
| $113.60 | JIM DUNLOP Cry Baby Junior Wah Special Edition White |
| $12.95 | HexHider Magnetic 3mm Allen Wrench |
| $126.42 | Temple Audio DUO 24 Templeboard |
| $34.99 | Schaller Security Ruthenium Guitar Strap Locks |
| For the ride… |
| $9.99 | Muc-Off No Puncture Hassle Tubeless Sealant |
| $12.99 | Vansky UV Flashlight Black Light |
| $39.95 | Wolf Front Axle for RockShox Suspension Forks and Fat Forks |
| $29.95 | Axle Handle Multi-tool |
Tips of the Week Ever thought about getting into synthesizers? Synthesizers are a really cool way of making sounds and music that you can do right from your computer, with no accessories needed, right now!
Vital is a spectral warping semi-modular wavetable synth. The functionality is almost* 100% free (aside from text-to-wavetable) or you can pay up to a max of $80 to unlock discord access, support, unlimited text-to-wavetable, and surprisingly important…Presets!
Why are presets so important? Because working with synths is not easy, it’s really technical and there are a lot of ways to sound really, really bad.
Guess what else is cool? You can do a no-upcharge “rent to own” subscription for $5 a month where you can use the money you put in to buy preset packs…or eventually just save up the $80 to buy the pro version. Cool, right?
One last perk: Vital will totally run as a VST/VST3 plugin like most other digital synthesizers…but it also has a standalone version, which makes it really easy to get started without learning a whole bunch of other stuff before you get to what you want.
https://vital.audio/#getvital
* Send yourself a reminder in slack…
https://slack.com/help/articles/208423427-Set-a-reminder
* SOMETIMES debugging is just too much
What do we mean? Well, there was a situation where a debug.log() statement was wrapped in an if statement: if the logging level is debug, then debug.log(). That had Outlaw scratching his head – like this is totally unnecessary!!! There happened to be a good reason for it: the logging statement was doing some heavy string interpolation, so without the check the code would run all the necessary string formatting, call the log method, and then, once inside that log method, throw the result away because debug output was turned off. That is potentially expensive work done for nothing. Relying on your log output level alone may not be enough to save you from some expensive tasks, even though you might think otherwise.
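Here is a minimal sketch of that situation using Python’s standard logging module. The episode’s code was not Python; the logger name, the order dictionary and the expensive_summary helper are all invented for illustration, and the call counter exists only to make the wasted work visible.

```python
import logging

logger = logging.getLogger("debug_guard_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)  # DEBUG output is switched off
logger.addHandler(logging.NullHandler())

calls = {"expensive": 0}


def expensive_summary(order: dict) -> str:
    """Stand-in for heavy string formatting; counts how often it runs."""
    calls["expensive"] += 1
    return ", ".join(f"{key}={value}" for key, value in sorted(order.items()))


def log_unguarded(order: dict) -> None:
    # The f-string is built before debug() is called, so the work happens
    # even though the message is then discarded.
    logger.debug(f"order details: {expensive_summary(order)}")


def log_guarded(order: dict) -> None:
    # The level check skips the formatting entirely when DEBUG is off.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("order details: %s", expensive_summary(order))
```

Calling log_unguarded runs the expensive helper even with DEBUG disabled, while log_guarded never touches it. Passing arguments lazily ("%s" plus the value) avoids the final string formatting, but a function call in the argument list still runs, which is why the explicit guard can be worth the extra line.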
Get a behind the scenes intro to some of the interesting conversations we have before we even get into the content. We’ll be jumping into the meat of this episode and looking at the specifics of tracing using OpenTelemetry. Before we do that though, we should probably find out what special 2-liter containers Outlaw uses that can somehow trap the bubbles for more than 24 hours after opening, find out if Joe is alone in liking flat carbonated drinks, and maybe Allen has fallen off his rocker for suggesting that the ONE THING that is metric in the USA should be converted to imperial measurements. Maybe leave a comment on the episode to join in the fun. To see the full show notes and/or leave a comment, head over to…
https://www.codingblocks.net/episode217
OpenTelemetry Diving In
TracerProvider
This is a factory for Tracers
Tracer
Tracers are created by a TracerProvider and create spans with more information about what’s happening with the request
Trace Exporter
These send traces to a consumer
Context Propagation* The heart of distributed tracing as it takes and correlates multiple spans
Context* The context contains information that allows spans to be correlated + Example is Service A calling Service B + Service A will have a trace id as well as its own span id + Service B will reuse that same trace id so the entire trace can be correlated, and Service B will also have its own unique span id but the parent id will point to the span id from Service A, so again this correlates the parent / child hierarchy of the spans
Propagation* This is what moves the context between services and processes
* Serializes / deserializes the context object and provides the relevant trace information to be carried from one service or process to the next
+ This is usually handled by instrumentation libraries but can be done manually via propagation APIs
* There are a number of formats that OpenTelemetry supports, but the default is the W3C’s TraceContext
https://www.w3.org/TR/trace-context/
+ The context objects are stored within a span
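The Service A / Service B example and the propagation steps above can be sketched in a few lines of plain Python. This does not use the OpenTelemetry SDK: the class and function names (SpanContext, Span, start_span, inject, extract) are made up for the sketch, and only the header layout (version, trace id, parent span id, flags, separated by dashes) follows the W3C traceparent format.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str            # 32 lowercase hex characters
    span_id: str             # 16 lowercase hex characters
    trace_flags: str = "01"  # "01" means sampled


@dataclass(frozen=True)
class Span:
    name: str
    context: SpanContext
    parent_span_id: Optional[str] = None  # None for a root span


def start_span(name: str, parent: Optional[SpanContext] = None) -> Span:
    """Reuse the parent's trace id when there is one; always mint a new span id."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    flags = parent.trace_flags if parent else "01"
    context = SpanContext(trace_id, secrets.token_hex(8), flags)
    return Span(name, context, parent.span_id if parent else None)


def inject(context: SpanContext) -> dict:
    """Serialize the context into a W3C traceparent header."""
    value = f"00-{context.trace_id}-{context.span_id}-{context.trace_flags}"
    return {"traceparent": value}


def extract(headers: dict) -> SpanContext:
    """Rebuild the caller's context from a traceparent header."""
    version, trace_id, span_id, flags = headers["traceparent"].split("-")
    if version != "00" or len(trace_id) != 32 or len(span_id) != 16 or len(flags) != 2:
        raise ValueError("malformed traceparent header")
    return SpanContext(trace_id, span_id, flags)
```

Service A would call start_span, send inject(span.context) along with its request, and Service B would call start_span with extract(headers) as the parent. Both spans then share a trace id, and B’s parent id points at A’s span id.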
Spans* These represent a unit of work or an operation * Spans are the building blocks of traces * Information included in a span + Name + Parent span id (empty for a root span) + Span context + Attributes + Span events + Span links + Span status * Spans can be nested (parent/child) – any child span should be a sub-operation
Span Context* IMMUTABLE * Contains the following + Trace id + Unique span id + Trace flags – binary encoding containing information about the trace + Trace state – list of key-value pairs that can carry vendor-specific trace information + This data sits alongside distributed context and baggage
Attributes* Key/value pairs used to carry information about the operation it’s tracking
+ Example used – shopping cart may store the user id, item id added to cart and a cart id
* Keys must be non-null
* Values must be non-null strings, a boolean, floating point value, integer or an array of any of those
* Semantic attribute conventions exist for well-known attributes and should be followed to help standardize across systems
https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/
+ Some examples of “general attributes”
- server.address
- server.port
+ Some “database” examples
- db.system
- db.connection_string
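The key and value rules above are easy to check mechanically. The helper names below are hypothetical and this is not the OpenTelemetry API; it is only a sketch of the rules as the notes state them, and it does not check whether all items in an array share one type.

```python
from typing import Any

PRIMITIVES = (str, bool, int, float)


def is_valid_attribute(key: Any, value: Any) -> bool:
    """Check one key/value pair against the rules listed above."""
    if key is None:
        return False
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        return all(isinstance(item, PRIMITIVES) for item in value)
    return False  # None, dicts and other objects are rejected


def clean_attributes(attributes: dict) -> dict:
    """Keep only the pairs that pass is_valid_attribute."""
    return {k: v for k, v in attributes.items() if is_valid_attribute(k, v)}
```

For the shopping cart example, a user id string, an item id and a cart id would all pass, while a None value or a nested dictionary would be dropped.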
Span Events* A structured log message or annotation on a span, usually used for a meaningful point in time in the span’s duration + Example – two web browser scenarios - Tracking a page load – what you’d use a span for because it has a start and end time - Denoting when a page becomes interactive – this is a singular point in time in the life of that span above
Span Links* Allows you to associate a span with one or more other spans – indicating a causal relationship * An example is when you have a system that queues actions based off other actions in an asynchronous manner – you don’t know when the queued action will start so you create a span link that can correlate these spans over time when the asynchronous event occurs * Span links are optional but can be a good way to associate trace spans with one another
Span Status* Status is attached to a span – one of three values + Unset – the default, and what you usually want; leaving it unset lets the back-end that processes the spans assign a final status + Ok – explicitly marks the span as successful; rarely needed + Error – usually set whenever you handle an exception/error
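As a rough sketch of how the three status values tend to be used, here is a plain-Python model (not the OpenTelemetry SDK; SpanRecord and run_in_span are invented names). A span starts out Unset and only flips to Error when an exception is handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class StatusCode(Enum):
    UNSET = "unset"
    OK = "ok"
    ERROR = "error"


@dataclass
class SpanRecord:
    name: str
    status: StatusCode = StatusCode.UNSET  # spans start out Unset
    description: str = ""


def run_in_span(name: str, operation: Callable[[], object]) -> SpanRecord:
    """Run operation; mark the span Error only when an exception is handled."""
    span = SpanRecord(name)
    try:
        operation()
    except Exception as error:
        span.status = StatusCode.ERROR
        span.description = str(error)
    return span
```

Note that the happy path never sets Ok; the span is simply left Unset.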
Span Kind* When a span is created, it is created as one of the following types + Client – represents synchronous, outgoing remote calls – this doesn’t mean that it’s not asynchronous technically, it just means it’s not being queued for later processing + Server – represents an incoming synchronous call such as an HTTP request + Internal – operations that do not cross a process boundary – things like instrumenting a function + Producer – represents the creation of a job that may be asynchronously processed later – things like messages sent to queues or handling of events + Consumer – represents the processing of a job that was created by a producer and potentially starts well after the producer span has ended * These types provide a hint to the backend span processor as to how these spans should be assembled * Based on the OpenTelemetry specification + The parent span of a server span is usually a remote client span + The parent of a consumer span is always a producer span * If no type is provided, it is assumed to be an internal span
Resources we Like
https://opentelemetry.io
https://opentelemetry.io/docs/demo/architecture/
https://opentelemetry.io/docs/demo/screenshots/
Tip of the Week* Interested in tldr.sh but want a faster, single binary version? Check out “tealdeer” – same info, faster implementation in Rust. Thanks for the tip, Aleksander Andrzejewski!
https://github.com/dbrgn/tealdeer
* Don’t waste your time hitting ctrl-alt-delete and then selecting the task manager. You gotta go fast, just use ctrl-shift-escape next time and it’ll pop right up! Thanks for the tip in the comments Mark Crowley!
* Sea of Stars is a new, retro video game that hearkens back to the golden era of ol’ Super Nintendo/Famicom games like Chrono Trigger and Secret of Mana. No grindy battlepasses, weird mobile ads, or zany gpu requirements…just a good game. Bonus, it features new songs from Yasunori Mitsuda, who is well known for their music work with a bunch of classic RPG series including Final Fantasy, Chrono Trigger, Xenogears. If any of those names make your heart jump out of your chest then you owe it to yourself to give it a look.
https://seaofstarsgame.co/
* Fuzzing tool for a kubernetes cluster
https://google.github.io/clusterfuzz/
* Foundational C# Certification – Partnership with freeCodeCamp / Microsoft
https://devblogs.microsoft.com/dotnet/announcing-foundational-csharp-certification/
* Leader board for Chat Bots
https://chat.lmsys.org/
* There’s an AI for that
https://theresanaiforthat.com/
* Use kubectl get deployments -o wide to see which image tag your containers are using.
https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#get
* Did you know you can exec into a pod without looking up the generated pod name? Yep! You can just exec into a deployment and it will pick a pod for you, works for logs too!
+ kubectl exec -it deploy/your-deployment-name -- bash
+ kubectl logs deploy/your-deployment-name
In this episode, we’re talking all about OpenTelemetry. Also, Allen lays down some knowledge, Joe plays director and Outlaw stumps the chumps. See the full show notes at https://www.codingblocks.net/episode216 News What is OpenTelemetry? It’s all about Observability Reliability and Metrics Distributed Tracing To truly understand what distributed tracing is, there’s a few parts we have to […]
In this episode, Allen, Michael and Joe discuss the latest update with the Reddit saga, software for designing audio and reproducing analog sounds, an open-ended interview question and tips on how to be a great leader. Reviews Huge thank you for that! News Episode If you were going to create a web service / api […]
In this episode, we’re talking about the history of “man” pages, console apps, team leadership, and Artificial Intelligence liability. Also, Allen’s downloading the internet, Outlaw has fallen in love with the sound of a morrvair, and Joe says TUI like two hundred times as if it were a real word. See all the show notes […]
Last episode, it might have been said that you can become a senior engineer in just one short year. Our amazing slack community spoke up and had some thoughts on that as well…we revisit that, and what does senior even mean?! Join us for that and much more as Allen plays more with ChatGPT, Michael […]
In this episode, we’re talking about lessons learned and the lessons we still need to learn. Also, Michael shares some anti-monetization strategies, Allen wins by default, and Joe keeps it real 59/60 days a year! The full show notes for this episode are available at https://www.codingblocks.net/episode212. News Exceptions vs Errors in Java Question from Twitter: (thanks […]
We’re back after a brief break for a busy month of May, and we’re here to talk about some pretty cool stuff happening in the developer world. Outlaw took vacation and can remember nothing, Joe introduces us to Sherlocking, and Allen discovered what all the fuss was about with Chat GPT as a software developer. […]
In this sequence of sound, we compute Joe’s unexpected pleasure in commercial-viewing algorithms, Michael’s intricate process of slicing up the pizza, and Allen’s persistent request for more cheese data augmentation. Will you engage in this data streaming session?
The full show notes for this episode are available at https://www.codingblocks.net/episode210.
Resources we like* Stack Overflow is ChatGPT Casualty: Traffic Down 14% in March similarweb.com * Github Copilot Chat Leak Prompt: (news.ycombinator.com) * We’ve been talking about Co-Pilot for 2 years now? (episode 163) * Github vs Gitlab Rankings + Github Trending Repositories (github.com) + Gitlab Trending Repositories (gitlab.com) + Gitlab Number of Stars (gitlab.com) + Github ranking: gitstar-ranking.com * The 3 laws of Robotics (or is it 4!?) (wikipedia.org) * ML in Postgres with PostgresML (postgresml.org) * Must See Videos + Family Auto-Mart: I’ll see you there! (youtube) + AI-Generated Commercial: Pepperoni Hug Spot – Like family, but with more cheese (youtube) * How many services per team? (microservices.io) * AWS’s take on services per team (docs.aws.com) * SQL Server Machine Learning Service (learn.microsoft.com)
Tip of the week* MusicLM lets you create music from descriptive text, similar to what DALL-E 2 does for images. The output is a little strange, but could still potentially be really useful and inspiring with a little bit of effort. It’s in private beta now, as part of the “AI Test Kitchen” but you can sign up to join the waitlist today.
+ Sign up for the waitlist: (aitestkitchen.withgoogle.com)
+ Samples (google-research.github.io)
* You can easily compare query results in DataGrip, using the “Compare Data” button (it’s the button with two blue arrows) (jetbrains.com)
* IntelliJ now supports the entire IDE Zoom, great for…well…Zoom! View --> Appearance --> Zoom IDE (blog.jetbrains.com)
* Visual Studio Code Bookmarks (marketplace.visualstudio.com)
* Warped Kart Racers is a fun mobile game, kinda like Mario Kart but featuring characters from 20th Century Studios (apps.apple.com)
In this episode we talk about several things that have been on our mind. We find that Joe has been taken over by AI’s, Michael now understands our love of Kotlin, and Allen wants to know how to escape supporting code you wrote forever.
News
Visited with Jamie Taylor from the .NET Core Podcast, Tabs N Spaces and Waffling Taylors
Topics
Should you own the work you created forever?
Wiki vs Readme
Should you take on the work that nobody else wants and “take one for the team”?
Test coverage
What’s a technology that’s reignited excitement in you?
javadoc != documentation
Resources we Like
Kotlin documentation is excellent
Microsoft still doing excellent documentation as well
Tips of the Week
Warp AI is a (currently free) terminal for macOS that integrates an AI. It has several nice features such as those listed below, but one killer feature is that it has support for either a local or a cloud-based AI which helps navigate sticky legal, security, or company policies.
Thanks for the tip Dave Follett!
Recently mikerg suggested a really cool book in our #gamedev channel on https://codingblocks.slack.com. It has chapters on things like vectors, fractals, cellular automata, and other cool topics for game or other graphical programming. It’s available free online or you can order a physical print-on-demand copy. https://natureofcode.com/book/chapter-1-vectors/
Gson().toJson(mapOf("key" to "value")) https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/map-of.html
IntelliJ – Kotlin Bytecode
We’re doing a water cooler talk today. Also, Allen can tell you how not to leak secrets, Michael knows how to work a spreadsheet, and Joe has been replaced by an AGI.
The full show notes for this episode are available at https://www.codingblocks.net/episode208.
Topics* Want to score Vue.js London tickets? Tweet using both @CodingBlocks and #vuejs for a chance to win! (vue.js) * How do you decide which projects are worth trying to convert into a money-making endeavor? * Samsung ChatGPT sensitive information leaks (mashable.com) * U.S Military Documents Leaked To Minecraft Discord Server (kotaku.com) * Real-Time Analytics Podcast with Tim Berglund (podcasts.apple.com) * CodeWhisperer from Amazon (aws.amazon.com) * How much did GPT 3 cost? (pcguide.com) * How much did GPT 4 cost? (medium.com) * How much did Alpaca cost to train? (newatlas.com) * Have any experience with Twilio? It’s work! (twilio.com)
Resources we like* docker init is a tool (in beta) built into the latest Docker Desktop that you can use to get a leg up on your next project. It makes it easy to create docker files with best practices, as well as a docker-compose file to get you up and running. (docker.com)
* screen is an open-source powerful terminal multiplexer that allows users to create, manage, and switch between multiple terminal sessions, enabling seamless multitasking and persistent remote connections in a single window.
+ How to use gnu screen (linuxize.com)
+ tmux is a similar utility that some people prefer (github.com)
+ tmux vs screen (stackoverflow.com)
* The VIVO Universal Treadmill Desk Riser is an adjustable, ergonomic workspace solution designed to fit most treadmills, allowing users to seamlessly combine their work and exercise routines for a healthy, productive lifestyle. (amazon.com)
* The LifeSpan Fitness Under Desk Walking Treadmill is a compact, low-profile treadmill designed to fit under standing desks, enabling remote workers to maintain an active lifestyle by seamlessly integrating walking or light jogging into their daily work routine, promoting better health and increased productivity. (amazon.com)
* Kubernetes Network Policies are a set of rules that define how pods within a cluster can communicate with each other and with external resources, allowing administrators to enforce fine-grained access control and enhance the security of their containerized applications. (kubernetes.io)
We’ve got a new / old opening…Allen goes off / on script? Michael denies Joe the “swing” vote, and Joe is all in on AI assistance
Testing for concurrency issues is hard because it’s non-deterministic – basically you get unlucky due to the timing of things
Serializability
* The problems we’ve been discussing the past few episodes have been around since the 1970s
* The answer is always – just use serializable isolation!
* Serializable isolation is the strongest isolation level
+ The database prevents ALL race conditions
- Even if transactions run in parallel, they’re guaranteed to produce the same result as if they had run one at a time, one after another
* If it’s so much better, why have/use weaker isolation levels?
Common Implementations
* Actually executing the transactions serially
* Two-phase locking – one of the only real available solutions for several decades
* Optimistic concurrency control – things like serializable snapshot isolation
* We’ll be talking about these in terms of a single-node database
Actual Serial Execution
* The easiest way to get rid of race conditions is to really just run things one after another – no concurrency
* This was only implemented for the first time around 2007 – prior to that, the performance was too poor
+ This is truly a loop over transactions submitted to the db engine
* What changed to make it possible?
+ RAM became cheap enough to store entire active datasets in memory – when this is done transactions can execute much faster as you don’t have to wait to load the data from disk
+ DB designers concluded that most OLTP transactions are usually short-lived and make a small number of reads and writes – long-running analytic queries are typically read-only, so they can be run on a consistent snapshot using snapshot isolation outside of the serial execution loop
* Used by VoltDB/H-Store, Redis, and Datomic
+ Sometimes single-threaded systems can perform better than concurrent ones simply because there’s no locking
- However, you’re bound by a single CPU core
- Transactions will need to be set up differently than in typical concurrent systems
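To make the "loop over transactions" idea a bit more concrete, here is a minimal Python sketch. It is not how VoltDB, Redis, or Datomic are actually implemented; the names SerialExecutor, submit, run_pending, and increment are made up for illustration, and the "database" is just an in-memory dict.

```python
from collections import deque


class SerialExecutor:
    """Toy single-threaded engine (hypothetical): transactions are plain functions."""

    def __init__(self):
        self.data = {}          # the whole "database" lives in memory
        self.pending = deque()  # transactions waiting for their turn

    def submit(self, txn, *args):
        # Queue a transaction; nothing runs until the loop picks it up.
        self.pending.append((txn, args))

    def run_pending(self):
        # The serial loop: one transaction at a time, so no locks are needed.
        results = []
        while self.pending:
            txn, args = self.pending.popleft()
            results.append(txn(self.data, *args))
        return results


def increment(data, key):
    # A whole read-modify-write that cannot interleave with another transaction.
    data[key] = data.get(key, 0) + 1
    return data[key]


engine = SerialExecutor()
engine.submit(increment, "page_views")
engine.submit(increment, "page_views")
results = engine.run_pending()
```

Because each transaction runs to completion before the next one starts, the two increments cannot step on each other. The trade-off is the one called out above: everything is bound to a single thread.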
Encapsulating transactions in stored procedures
* They talked about how the early implementations in db’s had the intention of making the entire flow part of the transaction – to book a flight, a person would be shown a list of flights, they’d choose the one they want, and it’d be stored
+ The problem with that approach is it can take a long time for that flow to be completed
+ For that reason, web applications limit transactions to a single web / HTTP request
* There can still be situations where a transaction involves multiple interactions between the application and the database
+ From the application, query to see if the seat is still available on the flight…OK, it is…now send another query to the db to assign the seat to the customer…now query the db again to get any additional information
- Doing it this way in a serial transaction db would be too slow because there’s too much network latency / waiting
* In a single-threaded serial transaction system, everything must be done all at once in a stored procedure
+ Keeping everything in memory and providing the stored proc everything it needs ensures the transaction is fast without waiting for any network or disk IO
- Figure 7-9 in the book is a great picture of this
Pros and Cons of Stored Procedures
* They’ve been part of the SQL standard since…1999
+ Sometimes get a bad rap
- Each vendor’s implementation has its own language
- The book mentions that these SQL-based procedure languages haven’t kept up with other programming languages and look archaic in comparison
- It’s hard to manage code stored on the database server
* Harder to debug
* More difficult to keep in source control
* More difficult to test
* More difficult to gather metrics for monitoring
- Because db’s are typically shared by many applications or a LOT of application code, non-performant stored procedure code can cause massive problems – usually worse problems than poorly written application code
* These issues have been and can be remedied
+ Modern serializable databases use regular programming languages
- VoltDB – Java/Groovy
- Datomic – Java/Clojure
- Redis – Lua
* When the database is in memory and the transactions are single threaded, stored procedures can actually be quite good
+ Because there’s no IO / networking overhead, transactions can occur quickly on a single thread
* VoltDB also executes stored procedures for replication!
+ This means the stored procedures have to be deterministic – datetimes have to use deterministic APIs
Partitioning
* As mentioned before, doing serial transactions means you are limited to a single core of a single CPU
+ Read-only transactions could occur on a separate thread using snapshot isolation
+ If you need high write throughput, the single thread on a single core could be a problematic bottleneck
* This is where partitioning comes into play – if you can divvy your data up in a way that would allow transactions to stay within a single partition, then you’ll have the ability to linearly scale your CPU cores/threads to the number of partitions you have
+ If your transaction has to go across multiple partitions, then the stored procedure must ensure that each partition is handled appropriately to keep everything serialized properly
+ VoltDB can handle multiple partitions
- Doing cross-partition writes is much slower than single partition writes – VoltDB reports 1k cross-partition writes per second
* Determining if transactions can occur on a single partition takes a bit of planning
+ Key-value data is likely a single partition transaction
+ Data with multiple secondary indexes will likely require cross-partition transactions
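A rough sketch of the routing side of this, in Python. It assumes a simple hash-by-key scheme; the partition count and the names partition_for, partitions_touched, and is_single_partition are hypothetical and not taken from VoltDB.

```python
import zlib

NUM_PARTITIONS = 4  # assumption: one partition (and one serial loop) per CPU core


def partition_for(key: str) -> int:
    # Use a stable hash so the same key always lands on the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partitions_touched(keys):
    return {partition_for(k) for k in keys}


def is_single_partition(keys) -> bool:
    # Single-partition transactions can run on that partition's own thread;
    # anything else needs slower cross-partition coordination.
    return len(partitions_touched(keys)) == 1


# Each partition holds its own slice of the data.
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def put(key, value):
    partitions[partition_for(key)][key] = value


def get(key):
    return partitions[partition_for(key)].get(key)


put("user:42", {"name": "Ada"})
```

A key-value lookup like get("user:42") only ever touches one partition, which is why key-value workloads partition so well. A transaction that needs several keys has to check partitions_touched first.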
Resources We Like
* Designing Data-Intensive Applications
https://www.codingblocks.net/get/designing-data-intensive-applications
Tips of the Episode
* Copilot Labs is an optional extension for GitHub Copilot that adds some nifty new features to VSCode with Copilot. It installs as a new sidebar icon and has 4 major features:
+ Code explanation – What does this block of code do? Does the code I wrote do what I think it does?
+ Code translation – Not familiar with a language you’re reading? Convert it to one that you do!
+ IDE brushes – Modify existing code using a variety of brushes like you would in an art program – add types, fix bugs, improve readability and resilience, add documentation, and it looks like there’s a way to add custom brushes!
+ Test generation – JS and TS only right now
https://githubnext.com/projects/copilot-labs/
* From Andrew: Ever wanted a Windows-like ALT+TAB experience on your Mac? Introducing AltTab for Mac
https://alt-tab-macos.netlify.app/
* A reason to use the terminal in Visual Studio Code
+ Any operations like a git status that show a list of files are easy to ctrl / cmd click to open directly in the editor
* Bitwarden as a LastPass replacement – Less than 1/3 the price
https://bitwarden.com/pricing/
https://www.lastpass.com/pricing
What are lost updates, and what can we do about them? Maybe we don’t do anything and accept the write skew? Also, Allen has sharp ears, Outlaw’s gort blah spotterfiles, and Joe is just thinking about breakfast.
The full show notes for this episode are available at https://www.codingblocks.net/episode206.
News
* Thank you for the amazing reviews!
+ iTunes: JomilyAnv
* Want to help us out? Leave us a review.
Great book!
Preventing Lost Updates
* Last episode we talked about weak isolation, read committed, and snapshot isolation
* There is one major problem we didn’t discuss called “The Lost Update Problem”
* Consider a read-modify-write transaction, now imagine two of them happening at the same time
* Even with snapshot isolation, both transactions can read the same value before either one writes, so the second write overwrites the first and that update is lost. Common cases:
+ Incrementing/Decrementing values (counters, bank accounts)
+ Updating complex values (JSON for example)
+ CMS updates that send the full page as an update
* Solutions:
+ Atomic Writes – Some databases support atomic updates that effectively combine the read and write
- Cursor Stability – locking the read object until the update is performed
- Single Threading – Force all atomic operations to happen serially through a single thread
+ Explicit Locking
- The application can be responsible for explicitly locking objects, placing responsibility in the devs’ hands
- This makes sense in certain situations – imagine a multiplayer game where multiple players can move a shared object. It’s not enough to lock the data and then apply both updates in order since the shared game world can react. (ie: showing that the item is in use)
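For the "atomic write" option, the shape of the statement is the important part. Here is a small Python sketch using the standard library's sqlite3 module as a stand-in database; the counters table and the atomic_increment helper are made up for illustration, and a single in-memory connection obviously does not exercise real concurrency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES ('likes', 1)")
conn.commit()


def atomic_increment(conn, name):
    # The database does the read and the write in one statement, so the
    # application never holds a stale value between the two.
    with conn:
        conn.execute("UPDATE counters SET value = value + 1 WHERE name = ?", (name,))


atomic_increment(conn, "likes")
atomic_increment(conn, "likes")
value = conn.execute("SELECT value FROM counters WHERE name = 'likes'").fetchone()[0]
```

The thing to avoid is the application-side version: SELECT the value, add one in code, then UPDATE with the new number. That is the read-modify-write cycle that loses updates.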
Detecting Lost Updates
* Locks can be tricky, what if we reused the snapshot mechanism we discussed before?
* We’re already keeping a record of the last transactionId to modify our data, and we know our current transactionId. What if we just failed any updates where our current transaction id was less than the transactionId of the last write to our data?
* This allows for naive application code, but also gives you fewer options…retry or give up
* Note: MySQL’s InnoDB’s Repeatable Read feature does not support this, so some argue it doesn’t qualify as snapshot isolation
What if you didn’t have transactions?
* If you didn’t have transactions, let alone a snapshot number, you could get similar behavior by doing a compare-and-set
* Example: update account set balance = 10 where balance = 9 and id = 'ABC'
* This works best in simple databases that support atomic updates, but not great with snapshot isolation
* Note: it’s up to the application code to check that updates were successful – Updating 0 records is not an error
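Here is what that compare-and-set plus the "check your row count" note might look like in Python, again using sqlite3 purely as a stand-in. The account table layout and the compare_and_set helper are assumptions for the sake of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('ABC', 9)")
conn.commit()


def compare_and_set(conn, account_id, expected, new):
    # Only write if the balance still matches what we read earlier.
    with conn:
        cur = conn.execute(
            "UPDATE account SET balance = ? WHERE id = ? AND balance = ?",
            (new, account_id, expected),
        )
    # Zero affected rows is not an error, so the caller has to check.
    return cur.rowcount == 1


first = compare_and_set(conn, "ABC", expected=9, new=10)
second = compare_and_set(conn, "ABC", expected=9, new=10)  # stale expectation
balance = conn.execute("SELECT balance FROM account WHERE id = 'ABC'").fetchone()[0]
```

The second call quietly matches no rows. If the application ignores the return value, it will believe the write happened.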
Conflict resolution and replication
* We haven’t talked much about replicas lately, how do we handle lost updates when we have multiple copies of data on multiple nodes?
* Compare-and-Set strategies and locking strategies assume a single up-to-date copy of the data….uh oh
* The options are limited here, so the strategy is to accept the writes and have an application process to decide what to do
+ Merge: Some operations, like incrementing a counter, can be safely merged. Riak has special datatypes for these
+ Last Write Wins: This is the most common solution. It’s simple, but it’s inaccurate because concurrent updates get dropped.
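To see why merging beats last write wins for counters, here is a toy Python sketch. It uses a grow-only counter with one slot per node, which is one well-known way to make increments mergeable; it is not Riak's actual datatype API, and the node names and timestamps are invented.

```python
def merge_counters(a, b):
    # Each node only ever increases its own slot, so taking the per-node
    # maximum never loses an increment.
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def counter_value(counter):
    return sum(counter.values())


# Two replicas each accept increments while out of contact with each other.
replica_1 = {"node1": 3}  # node1 accepted 3 increments
replica_2 = {"node2": 2}  # node2 accepted 2 increments
merged = merge_counters(replica_1, replica_2)

# Last write wins on a plain integer: keep whichever write has the newer timestamp.
write_1 = {"value": 3, "timestamp": 100}
write_2 = {"value": 2, "timestamp": 101}
lww = max(write_1, write_2, key=lambda w: w["timestamp"])
```

The merged counter accounts for all five increments, while last write wins keeps only the newer write and silently throws the other three away.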
Write Skew and Phantoms
* Write skew – a race condition where two transactions read the same data and then write to different records at the same time, and the combined result violates a state constraint
+ The example given in the book is the on-call doctor rotation
+ If the two transactions had run one after another instead of at the same time, the race condition would not have taken place
+ Write skew is a generalization of the lost update problem
* Preventing write skew
+ Atomic single-object locks won’t work because there’s more than one object being updated
+ Snapshot isolation also doesn’t work in many implementations – SQL Server, PostgreSQL, Oracle, and MySQL won’t prevent write skew
- Requires true serializable isolation
+ Most databases don’t allow you to create constraints on multiple objects but you may be able to work around this using triggers or materialized views as your constraint
+ They mention if you can’t use serializable isolation, your next best option may be to lock the rows for an update in a transaction meaning nothing else can access them while the transaction is open
* Phantoms causing write skew
+ Pattern
- The query for some business requirement – ie there’s more than one doctor on call
- The application decides what to do with the results from the query
- If the application decides to go forward with the change, then an INSERT, UPDATE, or DELETE operation will occur that would change the outcome of the previous step’s application decision
* They mention the steps could occur in different orders, for instance, you could do the write operation first and then check to make sure it didn’t violate the business constraint
- In the case of checking for records that meet some condition, you could do a SELECT FOR UPDATE and lock those rows
- In the case that you’re checking whether records exist, if they don’t exist there’s nothing to lock, so the SELECT FOR UPDATE won’t work and you get a phantom – a write in one transaction changes the search result of a query in another transaction
* Snapshot isolation avoids phantoms in read-only queries, but can’t stop them in read-write transactions
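The on-call doctor example is easier to follow with the steps written out. This Python sketch fakes two concurrent transactions by giving each its own copy of the data as a "snapshot"; the doctor names and the helper function are hypothetical, and no real database is involved.

```python
import copy

doctors = {"alice": {"on_call": True}, "bob": {"on_call": True}}


def on_call_count(snapshot):
    return sum(1 for d in snapshot.values() if d["on_call"])


# Both transactions start from the same consistent snapshot.
snapshot_a = copy.deepcopy(doctors)
snapshot_b = copy.deepcopy(doctors)

# Each checks the rule "at least one doctor stays on call" against its own snapshot.
a_may_leave = on_call_count(snapshot_a) >= 2
b_may_leave = on_call_count(snapshot_b) >= 2

# Each updates a different row, so there is no write-write conflict to detect.
if a_may_leave:
    doctors["alice"]["on_call"] = False
if b_may_leave:
    doctors["bob"]["on_call"] = False

remaining = on_call_count(doctors)
```

Both checks pass because both saw two doctors on call, and both writes succeed because they touch different records. Nobody is left on call, which is exactly the constraint violation that write skew describes.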
Materializing conflicts
* The problem we mentioned with phantoms is there’s no record/object to lock because it doesn’t exist
* What if you were to have a set of records that could be used for locking to alleviate the phantom writes?
+ Create records for every possible combination of conflicting events and only use those to lock when doing a write
- “materializing conflicts” because you’re taking the phantom writes and turning them into lock records that will prevent those conflicts
* This can be difficult and prone to errors trying to create all the combinations of locks AND this is a nasty leakage of your storage into your application
+ Should be a last resort
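As a sketch of the idea, assume a meeting-room booking scenario where the "possible combinations" are rooms and time slots. In this Python example, in-process locks stand in for the lock rows you would SELECT FOR UPDATE in a real database; the room names, slots, and book function are all invented for illustration.

```python
import threading
from itertools import product

ROOMS = ["room-a", "room-b"]         # hypothetical rooms
SLOTS = ["09:00", "09:15", "09:30"]  # hypothetical 15-minute slots

# Materialize every (room, slot) combination up front so there is always
# something to lock, even when no booking exists yet.
slot_locks = {pair: threading.Lock() for pair in product(ROOMS, SLOTS)}
bookings = {}


def book(room, slot, who):
    # Lock the materialized record first, then check and write under the lock.
    with slot_locks[(room, slot)]:
        if (room, slot) in bookings:
            return False
        bookings[(room, slot)] = who
        return True


first = book("room-a", "09:00", "allen")
second = book("room-a", "09:00", "joe")
```

Even this tiny example shows the downside mentioned above: the number of lock records grows with every combination, and the application now has to know about them.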
Resources We Like
* The 12 Factor App and Google Cloud (cloud.google.com)
Tip of the Week
* Docker’s BuildKit is their backend builder that replaces the “legacy” builder by adding new non-backward compatible functionality. The way you enable BuildKit is a little awkward, either passing flags or setting variables as well as enabling the features per Dockerfile, but it’s worth it! One of the cool features is the “mount” flag that you can pass as part of a RUN statement to bring in files that are not persisted past that layer. This is great for efficiency and security. The “cache” type is great for utilizing Docker’s cache to save time in future builds. The “bind” type is nice for mounting files you only need temporarily, like source code for a compiled language. The “secret” type is great for temporarily bringing in environment variables without persisting them. Type “ssh” is similar to “secret”, but for sharing ssh keys. Finally, “tmpfs” is similar to swap memory, using an in-memory file system that’s nice for temporarily storing data in primary memory as a file that doesn’t need to be persisted. (github.com)
* Did you know Google has a Google Cloud Architecture diagramming tool? It’s free and easy to use so give it a shot! (cloud.google.com)
* ChatGPT has an app for Slack. It’s designed to deliver instant conversation summaries, research tools, and writing assistance. Is this the end of scrolling through hundreds of messages to catch up on whatever is happening? /chatgpt summarize (salesforce.com)
* Have you heard about ephemeral containers? It’s a convenient way to spin up temporary containers that let you inspect files in a pod and do other debugging activities. Great for, well, debugging! (kubernetes.io)
There’s this thing called ChatGPT you may have heard of. Is it the end for all software developers? Have we reached the epitome of mankind? Also, should you write your own or find a FOSS solution? That and much more as Allen gets redemption, Joe has a beautiful monologue, and Outlaw debates a monitor that is a thumb size larger than his current setup.
If you’re in a podcast player and would prefer to read it on the web, follow this link:
https://www.codingblocks.net/episode205
News
* Thank you for the amazing reviews!
+ iTunes: MalTheWarlock, Abdullah Nafees, BarnabusNutslap
* Orlando Code Camp coming up Saturday March 25th
+ https://orlandocodecamp.com/
ChatGPT
* Is this the beginning or the end of software development as we know it?
* Are you using it for work? Does your work have an AI policy?
* OpenAI has recently announced a whopping 90% price reduction on their ChatGPT and Whisper API calls
+ $0.002 per 1000 ChatGPT tokens
+ $0.006 per minute to Whisper
* You also get $5 in free credit in your first 3 months, so give it a shot!
* https://openai.com/pricing
Roll Your Own vs FOSS
* This probably isn’t the first time and it won’t be the last we ask the question – should you write your own version of something if there’s a good Free Open Source Software alternative out there?
Typed vs Untyped Languages
* Another topic that we’ve touched on over the years – which is better and why?
* Any considerations when working with teams of developers?
* What are the pros and cons of each?
Cloud Pricing
* If you’re spending a good amount of money in the cloud, you should probably talk to a sales rep for your given cloud and try to negotiate rates. You may be surprised how much you can save. And…you never know until you ask!
Outlaw has the Itch to get a new Monitor
* Is it worth upgrading from a 34″ ultrawide to a 38″ ultrawide?
* What’s a good size for a 4k monitor?
+ Should you even get a 4k monitor?
* Should you go curved?
* Some references mentioned during the show
+ NVidia monitor search page:
https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/
+ LG 38″ ultrawide:
https://amzn.to/3SLeqUO
+ Rtings recommended gaming monitors:
https://www.rtings.com/monitor/reviews/best/by-usage/gaming
+ Games Radar best G-Sync monitors:
https://www.gamesradar.com/best-g-sync-monitors/
+ Acer Predator 38″ ultrawide:
https://amzn.to/3ZBDb80
+ Samsung Odyssey Neo G9 49″ Ultrawide:
https://amzn.to/3ZGMTpx
+ LG 49WQ95C-W 49″ Ultrawide:
https://amzn.to/3mk0TY5
Resources from this episode
* How to jailbreak ChatGPT – List of Prompts: https://www.mlyearning.org/how-to-jailbreak-chatgpt/
* Magazine stops accepting submissions due to bots: https://nypost.com/2023/02/22/sci-fi-magazine-not-accepting-submissions-due-to-bots/
* Stack Overflow bans ChatGPT answers: https://www.theverge.com/2022/12/5/23493932/chatgpt-ai-generated-answers-temporarily-banned-stack-overflow-llms-dangers
* ChatGPT detection tool already out: https://www.ctvnews.ca/sci-tech/cheaters-beware-chatgpt-maker-releases-ai-detection-tool-1.6253847
Tips of the Week
* Did you know that the handy, dandy application jq is great for formatting JSON AND it’s also Turing complete? You can do full on programming inside jq to make changes – conditionals, variables, math, filtering, mapping…it’s Turing complete!
https://stedolan.github.io/jq/
* Want to freshen up your space, but you just don’t have the vision? Give interiorai.com a chance, upload a picture of your room and give it a description. It works better than it should.
* You can sort your command line output when doing something like an ls
sort -k2 -b
* On macOS you can drag a non-fullscreen window to a fullscreen desktop
* When using the ls -l command in a terminal, that first numeric column shows the number of hard links to a file – meaning the number of names an inode has for that file
* Argument parser for Python 3 – makes parsing command line arguments a breeze and creates beautiful --help documentation to boot!
https://docs.python.org/3/library/argparse.html
* .NET has an equivalent parser we’ve mentioned in the past
https://www.nuget.org/packages/NuGet.CommandLine
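Two of the tips above fit nicely into one small Python sketch: argparse for the command line, and os.stat for the hard link count that ls -l shows in its first numeric column. The script's argument names and the link_count helper are made up for the example.

```python
import argparse
import os
import tempfile


def build_parser():
    parser = argparse.ArgumentParser(description="Show the hard link count for a file.")
    parser.add_argument("path", help="file to inspect")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="also print the inode number")
    return parser


def link_count(argv):
    args = build_parser().parse_args(argv)
    info = os.stat(args.path)
    if args.verbose:
        print(f"inode {info.st_ino}")
    # st_nlink is the number of names (hard links) that point at this inode.
    return info.st_nlink


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    alias = os.path.join(tmp, "alias.txt")
    with open(original, "w") as handle:
        handle.write("hello")
    before = link_count([original])
    os.link(original, alias)  # add a second name for the same inode
    after = link_count([original])
```

In a real script you would call build_parser().parse_args() with no arguments so it reads sys.argv, and argparse would generate the --help text from the help strings above.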
Ever wonder how database backups work if new data is coming in while the backup is running? Hang with us while we talk about that, while Allen doesn’t stand a chance, Outlaw is in love, and Joe forgets his radio voice.
The full show notes for this episode are available at https://www.codingblocks.net/episode204.
News
* Thanks for the great reviews!
+ Audible: Allison Williams
* Orlando Code Camp 2023 is coming up on March 25th 2023 (orlandocodecamp.com)
The big, beautiful, boar book: Designing Data-Intensive Applications
Weak Isolation levels
* If two transactions don’t touch the same data, they can be run in parallel.
* Race conditions occur when two different processes are trying to access or modify the same data at the same time.
* Concurrency bugs are hard to find and test for – it usually comes down to unlucky timing.
* Concurrency bugs can also be very difficult to understand because multiple parts of an application can be interacting with the database simultaneously and in unexpected ways.
* Single-user interactions with a database are hard enough, and when you have multiple interactions happening simultaneously, it makes it all much more difficult.
* Databases try to make it look like interactions happen one at a time for that very reason – to simplify the work for a developer.
+ Serializable isolation is a database guarantee that makes transactions look as if they happened serially – one after another.
* Isolation is not that simple in reality
+ Serializable isolation comes at a performance cost
- For this reason, most databases choose not to use it
+ Most databases use weaker isolation levels to protect against some concurrency issues but not all of them
- These aren’t just theoretical bugs
* Have resulted in large financial losses
* Investigations by financial auditors
* Customer data corruption
* It’s been a common theme that “use a relational db if you’re doing financial transactions” – however, being that most db’s use weak isolation, that doesn’t guarantee things would have been perfect
+ For this reason – you should understand the various weak (non-serializable) isolation levels
Read Committed
* Two guarantees
+ When reading from the database, you will only see data that has been committed (no dirty reads)
+ When writing to the database, you will only overwrite data that has been committed (no dirty writes)
- A second write is delayed until the first write’s transaction has been committed or aborted
- This does not protect against the incrementing race condition – ie. two processes read a value at the same time, value = 1, then process one increments that and saves it, so the value is 2. Now, process two (which had 1 in memory from the read) does its increment, and stores the value as 2 as well – the value should have been 3, but because process two was working from an old value, one increment was lost
* Avoiding this is discussed later in “Preventing Lost Updates”
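The increment race is short enough to walk through step by step. This Python sketch just replays the interleaving by hand with a dict standing in for the committed data; the variable names are invented and nothing here is actually concurrent.

```python
committed = {"counter": 1}

# Both processes read the committed value before either one writes.
read_by_one = committed["counter"]
read_by_two = committed["counter"]

# Process one increments its copy and commits.
committed["counter"] = read_by_one + 1

# Process two increments its stale copy and commits, overwriting process one's work.
committed["counter"] = read_by_two + 1

final_value = committed["counter"]
expected_if_serial = 3
```

Neither write was dirty, since each one overwrote committed data, which is why read committed is perfectly happy with this outcome.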
Snapshot Isolation and Repeatable Read
* Addresses read skew, an example of a non-repeatable read
+ The example given was a customer with two bank accounts who gets her balance for account 1 and then, some moments after a transfer of $100 from account 2 to account 1, gets the balance for account 2…the customer has an old value from account 1 and a new value from account 2, so it looks like she is missing $100.
- This is acceptable in the read committed isolation level as both account values were committed at the time of the reads.
- How could this happen? Here’s a quick example…
* Multiple queries were issued to get the different account values – get balance for account 1, get balance for account 2…behind the scenes, someone did a transfer from one account to the other
+ This is a very temporary state
* There are situations where this temporary inconsistency can’t be tolerated
+ Backups, analytics queries and integrity checks
* Snapshot isolation is a typical solution to the problem
+ Transactions read from a consistent snapshot – meaning that a transaction reads all of its values from a snapshot of the database as it was when the transaction started
- Very popular feature – supported by PostgreSQL, Oracle, SQL Server, MySQL with InnoDB
How is snapshot isolation accomplished?
* Usually use write locks to stop dirty writes
+ Reads never block writes, and writes never block reads
* Because there may be multiple transactions taking place at once, there may need to be multiple copies of database objects in play at once – this is referred to as multi-version concurrency control (MVCC)
* The difference between read committed and snapshot isolation is read committed will use a different snapshot for each read whereas snapshot isolation will use the same snapshot for multiple reads within the transaction
+ They show an example of PostgreSQL’s implementation
- Found this README in Postgres – https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/README-SSI
- The implementation basically uses some metadata fields on a row – created_by and deleted_by fields which contain transaction id’s
* If you were to delete a row, that deleted_by field is updated, the row isn’t actually deleted at that point in time, but garbage collection will pick it up later and remove it physically from the table – at a time when it’s deemed that it will no longer be accessed
* Updates are converted to creates and deletes (similar to what you’d see if you’re familiar with triggers in something like SQL Server)
Visibility for seeing a consistent snapshot
* Consistent snapshots work by following these rules:
+ At the start of a transaction, a list of all transactions in progress is made, and any writes by those transactions are ignored
+ Any writes made by transactions that were aborted are ignored
+ Any writes made by a newer transaction id are ignored
+ All other writes are available to read
* Another way of thinking about it – an object is visible if
+ The transaction that created the object had already committed BEFORE the reader transaction started
+ The object is either not marked for deletion OR, if it is marked for deletion, the deleting transaction had not yet committed at the time the reader transaction started
* Because the database is never truly updating/deleting values in place, a number of running transactions can continue to function from snapshots of those objects with very small overhead
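The visibility rules translate fairly directly into code. This Python sketch borrows the created_by / deleted_by field names from the discussion above, but everything else (RowVersion, is_visible, read, the transaction ids) is made up for illustration, and it does not model a transaction seeing its own writes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: int
    created_by: int                   # id of the transaction that created this version
    deleted_by: Optional[int] = None  # id of the transaction that deleted it, if any


def committed_before(txid, reader_txid, in_progress, aborted):
    # Ignore writes from transactions that were still running when the reader
    # started, that aborted, or that have a newer transaction id.
    return txid < reader_txid and txid not in in_progress and txid not in aborted


def is_visible(version, reader_txid, in_progress=frozenset(), aborted=frozenset()):
    if not committed_before(version.created_by, reader_txid, in_progress, aborted):
        return False
    if version.deleted_by is None:
        return True
    # A deletion only hides the version if the deleter committed before the reader began.
    return not committed_before(version.deleted_by, reader_txid, in_progress, aborted)


def read(versions, reader_txid, **kwargs):
    visible = [v.value for v in versions if is_visible(v, reader_txid, **kwargs)]
    return visible[0] if visible else None


# Transaction 13 updated a balance from 500 to 400, stored as a delete plus a create.
balance_versions = [
    RowVersion(value=500, created_by=3, deleted_by=13),
    RowVersion(value=400, created_by=13),
]
seen_by_12 = read(balance_versions, reader_txid=12)
seen_by_14 = read(balance_versions, reader_txid=14)
seen_by_14_while_13_runs = read(balance_versions, reader_txid=14, in_progress={13})
```

Transaction 12 keeps seeing the old balance no matter when it reads, and transaction 14 only sees the new one if transaction 13 had already committed when 14 started.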
Snapshot isolation and indexes
* Considering what we mentioned about the database storing multiple snapshots of state, how does this work with indexes?
+ One way would be to have the index point to all versions of an object and have the query filter out the versions that aren’t visible to the current transaction, and when garbage collection happens, remove those entries from the index as well
* Turns out, there are a lot of implementation details and performance thoughts to take into consideration depending on the database implementation
+ They gave an example of how things are done in PostgreSQL – if multiple versions of the same object can fit on the same page, nothing is done to the index
+ Another approach used by CouchDB and others is to use an append-only / copy-on-write method that does not overwrite the existing page in the B-tree but rather creates a copy of the modified page. Then, a copy of each parent is made all the way up to the root page to point to the new pages. Any pages not impacted by the write operation don’t need to be touched
- If that sounded like it was creating a new tree for every write in the append-only B-tree, you’d be correct. By taking this approach, every individual root is a consistent snapshot of the database at that point in time
* The benefit of this approach is you don’t have to filter anything because every root node has only the transactions that belong in that snapshot
* You do need a background process to garbage collect and compact
Repeatable read and naming confusion
* Unfortunately, snapshot isolation is known by many names
+ Oracle calls it serializable
+ PostgreSQL and MySQL call it repeatable read
* Why is the naming not consistent? Because snapshot isolation isn’t part of the SQL standard – the standard’s isolation levels are based on System R’s 1975 definitions, and snapshot isolation hadn’t been invented yet!
+ They HAD defined repeatable read which is very similar to snapshot isolation
* Unfortunately, in relational databases the name “repeatable read” doesn’t tell you what guarantees a given database really provides
* It was called out that there is a formal definition of repeatable read but most implementations don’t meet the definition
* “Nobody really knows what repeatable read means”
Resources We Like
* Orlando Code Camp 2023 (orlandocodecamp.com)
* The 12 Factor App and Google Cloud (cloud.google.com)
* Martin Kleppmann’s website (martin.kleppmann.com)
* David Foster Wallace – This is Water (youtube)
Tip of the Week* “Infinite Jest” is an interesting book, but it’s not a good audio experience. Get the physical book this time. (amazon.com)
* Tamara Makes Games is a game dev on YouTube that has a lot of videos oriented around isometric, city-builder, and strategy games similar to Factorio. It’s a cool niche, and it’s a nice balance of code and visuals that are a delight to watch. (youtube)
* There’s a lot you can do with iTerm2’s status bar, it’s highly configurable making it easy to show system resource monitors, shell information, and other miscellaneous items. (iterm2.com)
* Google has a tool named “container-diff” for analyzing and comparing container images. It can examine images along several different criteria, great for tracking down issues..like knowing why Docker isn’t caching a layer. (github.com)
* Xeol is a great utility for checking for end-of-life packages that you should get rid of. Thanks, gaprogman! (github)
* Using minikube? You can manage the space for the vm by using minikube ssh to shell into the machine and then prune your images with docker builder prune. Alternatively, you can use eval $(minikube docker-env) to proxy docker to your local machine so you can just docker builder prune (and any other docker commands) without the shell. (minikube.sigs.k8s.io)
It’s time we learn about multi-object transactions as we continue our journey into Designing Data-Intensive Applications, while Allen didn’t specifically have that thought, Joe took a marketing class, and Michael promised he wouldn’t cry.
The full show notes for this episode are available at https://www.codingblocks.net/episode203.
News
* Thanks for the reviews!
+ iTunes: Dom Bell 30, Tontonton2
* Want some swag? We got swag! (/swag)
* Orlando Codecamp 2023 is coming up on March 25th 2023 (orlandocodecamp.com)
Single Object and Multi-Object Operations
Best book evarr!
* Multi-object transactions need to know which reads and writes are part of the same transaction.
+ In an RDBMS, this is typically handled by a unique transaction identifier managed by a transaction manager.
+ All statements between the BEGIN TRANSACTION and COMMIT TRANSACTION are part of that transaction.
* Many non-relational databases don’t have a way of grouping those statements together.
* Single object transactions must also be atomic and isolated.
* Reading values while in the process of writing updated values would yield really weird results.
+ It’s for this reason that nearly all databases must support single object atomicity and isolation.
+ Atomicity is achievable with a log for crash recovery.
+ Isolation is achieved by locking the object to be written.
* Some databases use a more complex atomic setup, such as an incrementer, eliminating the need for a read, modify, write cycle.
* Another operation used is a compare and set.
* These types of operations are useful for ensuring good writes when multiple clients are attempting to write the same object concurrently.
* Transactions are more typically known for grouping multiple object writes into a single operational unit
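Here is a small Python sketch of grouping two writes into one unit, using sqlite3 as a stand-in for an RDBMS. The account table, the CHECK constraint, and the transfer helper are assumptions made for the example; the point is only that both updates commit together or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, source, target, amount):
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 60)
failed = transfer(conn, "a", "b", 60)  # would overdraw account a
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

In the second transfer the credit to account b succeeds first, but it is rolled back when the debit violates the constraint, so no money appears out of thin air.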
Need for multi-object transactions
* Many distributed databases / datastores don’t have transactions because they are difficult to implement across partitions.
+ This can also cause problems for high performance or availability needs.
+ But there is no technical reason distributed transactions are not possible.
* The author poses the question in the book: “Do we even need transactions?”
+ The short answer is, yes sometimes, such as:
- Relational database systems where rows in tables link to rows in other tables,
- In non-relational systems when data is denormalized for “object” reasons, those records need to be updated in a single shot, or
- Indexes against tables in relational databases need to be updated at the same time as the underlying records in the tables.
* These can be handled without database transactions, but error handling on the application side becomes much more difficult.
+ Lack of isolation can cause concurrency problems.
Handling errors and aborts
* ACID transactions that fail are easily retry-able.
* Some systems with leaderless replication follow the “best effort” basis. The database will do what it can, and if something fails in the middle, it’ll leave anything that was written, meaning it won’t undo anything it already finished.
+ This puts all the burden on the application to recover from an error or failure.
* The book calls out developers saying that we only like to think about the happy path and not worry about what happens when something goes wrong.
* The author also mentioned there are a number of ORMs that don’t do transactions proud: rather than building in some retry functionality, if something goes wrong, they just bubble an error up the stack. Rails ActiveRecord and Django were specifically called out.
* Even ACID transactions aren’t necessarily perfect.
+ What if a transaction actually succeeded but the notification to the client got interrupted, and now the application thinks it needs to try again and MIGHT actually write a duplicate?
+ If an error is due to “overload”, basically a condition that will continue to error constantly, this could cause an unnecessary load of retries against the database.
+ Retrying may be pointless if there are network errors occurring.
+ Retrying something that will always yield an error is also pointless, such as a constraint violation.
+ There may be situations where your transactions trigger other actions, such as emails, SMS messages, etc. and in those situations you wouldn’t want to send new notifications every time you retry a transaction as it might generate a lot of noise.
- When dealing with multiple systems such as the previous example, you may want to use something called a two-phase commit.
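Here is a rough sketch of what application-side retry handling could look like given the caveats above. Everything in it is an assumption for illustration: the `TransientError` / `PermanentError` split, the exponential backoff (one common way to avoid piling onto an overloaded database, not something the book prescribes here), and the `apply_once` request-id check used to guard against the duplicate-write case.

```python
import time


class TransientError(Exception):
    """Hypothetical: a failure that may succeed on retry (deadlock, failover)."""


class PermanentError(Exception):
    """Hypothetical: a failure that will never succeed (constraint violation)."""


def run_with_retries(operation, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Retry only transient failures, waiting longer after each attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except PermanentError:
            raise  # retrying something that always fails is pointless
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # back off to ease overload


processed = {}  # request id -> result, so a retried request is not applied twice


def apply_once(request_id, action):
    """Skip the work if this request id was already handled successfully."""
    if request_id in processed:
        return processed[request_id]
    result = action()
    processed[request_id] = result
    return result


attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "committed"


delays = []  # capture the sleeps instead of actually waiting
result = run_with_retries(lambda: apply_once("req-1", flaky_write), sleep=delays.append)
print(result, attempts["count"])  # committed 3
```

Calling `apply_once("req-1", ...)` again after this returns the stored result without running the write a second time, which is the behavior you want when the client never heard that the first attempt succeeded.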
Tip of the Week
* Manything is an app that lets you use your old devices as security cameras. You install the app on your old phone or tablet, hit record, and configure motion detection. A much easier and cheaper option than ordering a camera! (apps.apple.com, play.google.com)
* The Linux Foundation offers training and certifications. Many great training courses, some free, some paid. There’s a nice Introduction to Kubernetes course you can try, and any money you do spend is going to a good place! (training.linuxfoundation.org)
* Kubernetes has recommendations for common-labels. The labels are helpful and standardization makes it easier to write tooling and queries around them. (kubernetes.io)
* Markdown Presentation for Visual Studio Code, thanks for the tip Nathan V! Marp lets you create slideshows from markdown in Visual Studio Code and helps you separate your content from the format. It looks great and it’s easy to version and re-use the data! (marketplace.visualstudio.com)
We decided to knock the dust off our copies of Designing Data-Intensive Applications to learn about transactions while Michael is full of solutions, Allen isn’t deterred by Cheater McCheaterton, and Joe realizes wurds iz hard.
The full show notes for this episode are available at https://www.codingblocks.net/episode202.
News
* Thanks for the reviews!
+ iTunes: Jla115, Cuttin’ Corner Barbershop, mirgeee, JackUnver
+ Audible: Mr. William M. Davies
* Want some swag? We got swag! (/swag)
It’s baaaaack!
Chapter 7: Transactions
* Great statement from one of the creators of Google’s Spanner where the general idea is that it’s better to have transactions as an available feature even if it has performance issues and let developers decide if the performance is worth the tradeoff, rather than not having transactions and putting all that complexity on the developer.
* Number of things that can go wrong during database interactions:
+ DB software or underlying hardware could fail during a write,
+ An application that uses the DB might crash in the middle of a series of operations,
+ Network problems could arise,
+ Multiple writes to the same records from multiple places causing race conditions,
+ Reads could happen to partially updated data which may not make sense, and/or
+ Race conditions between clients could cause weird problems.
* “Reliable” systems can handle those situations and ensure they don’t cause catastrophic failures, but making a system “reliable” is a lot of work.
* Transactions are what have been used for decades to address those issues.
+ A transaction is a way to group all related reads and writes into a single operation.
+ Either a transaction as a whole completes successfully as a “commit” or fails as an “abort, rollback”.
- If the transaction fails, the application can choose what to do, like retry for example.
* In general, transactions make error handling much simpler for an application.
+ That was their purpose, to make developing against a database much simpler.
* Not all applications need transactions.
* In some cases, it makes sense not to use transactions for performance and/or availability reasons.
How do you know if you need a transaction?
* What are the safety guarantees?
* What are the costs of using them?
Concepts of a transaction
* Most relational DBs support transactions and some non-relational DBs support transactions.
* The general idea of a transaction has been around mostly unchanged for over 40 years, originally introduced in IBM System R, the first relational database.
* With the introduction of a lot of the NoSQL (non-relational) databases, transactions were left out.
+ In some NoSQL implementations, they redefined what a transaction meant with a weaker set of guarantees.
- A popular belief was put out there that transactions meant anti-scalable.
- Another popular belief was that to have a “serious” database, it had to have transactions.
* The book calls out both as hyperbole.
* The reality is there are tradeoffs for both having or not having transactions.
* ACID is the acronym to describe the safety guarantees of databases and stands for Atomicity, Consistency, Isolation, and Durability.
+ Coined in 1983 by Theo Härder and Andreas Reuter.
+ The reality is that each database’s implementation of ACID may be very different.
- Lots of ambiguity for what Isolation means.
- Because ACID doesn’t specify the actual guarantees, it’s basically a marketing term.
* Systems that don’t support ACID are often referred to as BASE: Basically Available, Soft state, and Eventual consistency.
+ Even more vague than ACID! BASE, more or less, just means anything but ACID.
Atomicity
* Atomicity refers to something that can not be broken into smaller parts.
+ In terms of multi-threaded programming, this means you can only see the state of something before or after a complete operation and nothing in-between.
+ In the world of databases and ACID, atomicity has nothing to do with concurrency. For instance, if multiple actions are trying to process the same data, that’s covered under Isolation.
- Instead, atomicity in ACID describes what should happen if there is a fault while performing multiple related writes.
* For example, if a group of related writes are to be performed in an operation and there is some underlying error that occurs before the transaction of writes can be committed, then the operation is aborted and any writes that occurred during that operation must be undone, i.e. rolled back.
* Without atomicity, it is difficult to know what part of the operation completed and what failed.
* The benefit of the rollback is you don’t have to have any special logic in your application to figure out how to get back to the original state. You can just simply try again because the transaction took care of the cleanup for you.
+ This ability to get rid of any writes after an abort is basically what the atomicity is all about.
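As a small illustration of the rollback behavior described above, here is a sketch using Python’s built-in `sqlite3` module. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are all invented for the example; the point is only that the first write disappears when the second one fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Two related writes that must both happen or not happen at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was undone along with the failed debit


print(transfer(conn, "bob", "alice", 500))  # False
print(balances(conn))  # {'alice': 100, 'bob': 50}
print(transfer(conn, "alice", "bob", 30))  # True
print(balances(conn))  # {'alice': 70, 'bob': 80}
```

The failed transfer credits alice first and only then hits the constraint on bob’s balance, yet alice is back at 100 afterwards. The application did not have to work out which half had already been written.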
Consistency
* In ACID, consistency just means the database is in a good state.
* But consistency is a property of the application as it’s what defines the invariants for its operations.
+ This means that you must write your application transactions properly to satisfy the invariants that have been defined.
+ The database can take care of certain invariants, such as foreign key constraints and uniqueness constraints, but otherwise it’s left up to the application to set up the transactions properly.
+ The book suggests that because the consistency is on the application’s shoulders, the C shouldn’t be part of ACID.
Isolation
* Isolation is all about handling concurrency problems and race conditions.
+ The author provided an example of two clients trying to increment a single database counter concurrently, the value should have gone from 3 to 5, but only went to 4 because there was a race condition.
* Isolation means that the transactions are isolated from each other so the previous example cannot happen.
+ The book doesn’t dive deep on various forms of isolation implementations here as they go deeper in later sections, however one that was brought up was treating every transaction as if it was a serial transaction. The problem with this is there is a rather severe performance hit for forcing everything serially.
- The section that describes the additional isolation levels is “Weak Isolation Levels”.
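The counter example is easy to reproduce in a few lines. This is only an in-memory sketch: the `Counter` class is made up, the bad interleaving is replayed by hand so the result is deterministic, and a `threading.Lock` stands in for whatever mechanism a real database uses to isolate the two clients.

```python
import threading


class Counter:
    """Hypothetical stand-in for the single database counter."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()


def lost_update(counter):
    """Replay the race by hand: both clients read before either writes."""
    seen_by_a = counter.value      # client A reads 3
    seen_by_b = counter.value      # client B reads 3 before A writes
    counter.value = seen_by_a + 1  # A writes 4
    counter.value = seen_by_b + 1  # B also writes 4, so A's update is lost


def isolated_increment(counter):
    """Read and write happen as one unit, so clients take turns."""
    with counter.lock:
        counter.value += 1


racy = Counter(3)
lost_update(racy)
print(racy.value)  # 4

safe = Counter(3)
clients = [threading.Thread(target=isolated_increment, args=(safe,)) for _ in range(2)]
for client in clients:
    client.start()
for client in clients:
    client.join()
print(safe.value)  # 5
```

Taking the lock around every increment is effectively the “run everything serially” approach, which is exactly where the performance cost mentioned above comes from.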
Durability
* Durability just means that once the database has committed a write, the data will not be forgotten, even if a database failure or hardware failure occurs.
+ This notion of durability typically means, in a single node database, that the data has been written to the drive, typically to a write-ahead log or similar implementation.
- The write-ahead log ensures if there is any data corruption in the database, that it can be rebuilt, if necessary.
* In a replicated database, durability means that the data has been written to the other nodes successfully.
+ The performance implication here is that for the database to guarantee that it’s durable, it must wait for those distributed writes to complete before committing the transaction.
* PERFECT DURABILITY DOES NOT EXIST.
+ If all your databases and backups somehow got destroyed at the same time, there’s absolutely nothing you could do.
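For a feel of what “written to the drive” means in practice, here is a heavily simplified append-only log sketch. The file name, the JSON record format, and both helper functions are assumptions for illustration; a real write-ahead log also deals with checksums, partial writes, and checkpointing.

```python
import json
import os
import tempfile


def append_committed(log_path, record):
    """Append one record and force it to disk before reporting success."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()             # push Python's buffer to the operating system
        os.fsync(log.fileno())  # ask the OS to push its cache to the drive


def replay(log_path):
    """Rebuild the in-memory state by replaying the log from the start."""
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            state[entry["key"]] = entry["value"]
    return state


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "wal.log")
append_committed(log_path, {"key": "counter", "value": 3})
append_committed(log_path, {"key": "counter", "value": 5})

recovered = replay(log_path)  # what a restart after a crash would do
print(recovered)  # {'counter': 5}
```

The `os.fsync` call is the expensive part, which is the single-node version of the same tradeoff as waiting on replicas: the commit isn’t acknowledged until the data is somewhere that survives a crash.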
Resources we Like
* Coding Blocks Jam ’23 (itch.io)
* NewSQL (Wikipedia)
* Visual Studio (Wikipedia)
* Chrissy’s Court (IMDb)
* Tracy Morgan gets in a crash right after buying a $2 million Bugatti (CNN)
* IBM System R (Wikipedia)
* Database Schema for Multiple Types of Products (Coding Blocks)
* Uber’s Big Data Platform: 100+ Petabytes with Minute Latency (Uber)
* How to store data for 1,000 years (BBC)
* Longevity of Recordable CDs, DVDs and Blu-rays – Canadian Conservation Institute (CCI) Notes 19/1 (canada.ca)
Tip of the Week
* The Bad Plus is an instrumental band that makes amazing music that’s perfect for programming. It’s a little wild, and a little strange. Maybe like Radiohead, but a saxophone instead of Thom Yorke? Maybe? (YouTube)
+ Correction, Piano Rock will quickly become your new favorite channel. (YouTube)
* docker builder is a command prefix that you can use that specifically operates against the builder. For example you can prune the builder’s cache without wiping out your local cache. It can really save your bacon if you’re working with a lot of images. (docs.docker.com)
* Ever want to convert YAML to JSON so you can see nesting issues easier? There’s a VSCode plugin for that! Search for hilleer.yaml-plus-json or find it on GitHub. (GitHub)
* Spotify has a great interface, but Apple Music has lossless audio, sounds great, and pays artists more. Give it a shot! If you sign up for Apple One you can get Apple Music, Apple TV+, Apple Arcade, Apple News+ and a lot more for one unified price. (Apple)
The top 5 themes will carry on to the final round. Help us out by choosing your favorite five from the list below!
Don’t forget to sign up for the jam!
What are your five favorite themes?
* Assimilate * Strike the right balance * Magic Numbers * Do more with less * 3’s a crowd * What's that smell? * Why are you running? * Eye in the sky * Oddly satisfying * The Gravity of it all! * QWERTY * Smaller than expected * The Fog of war * Unblocking * They’re out to get me * Life is good * Bottom of the ocean * Concat * Hey, you! Scram!
Final theme will be chosen randomly from the winners of these polls!
The top 5 themes will carry on to the final round. Help us out by choosing your favorite five from the list below!
Don’t forget to sign up for the jam!
What are your five favorite themes?
* Everything is backwards * Stop hitting yourself * Building blocks * Sticky situations * Powerball * Shooting blocks * Can't go back * Puzzle games * The 3 amigos * Rock 'N Roll * You shouldn’t mix those * The run around * Jumping forwards in time * No visuals * Infinite loops * Trial by error * Global Wormin' * Pick 3 * Command Line Heroes * Constant surprise
Final poll opens up on January 14th.
The top 5 themes will carry on to the final round. Help us out by choosing your favorite five from the list below!
Don’t forget to sign up for the jam!
What are your five favorite themes?
* Spaced out * Stronger with friends * Bronze * The Fast and Furry-ious * Beyond Thunderdome * Glitch * The 80's * Wild, Wild, West * One way out * YOU CAN’T TELL ME WHAT TO DO DAVE!!! * Dance with my hands * Revenge of the (Java)script * I hate you so much * Block all the things * Software Deployment Nightmare * Bad news * The Perfect meal * Expect the Unexpected * New Years Resolutions * It keeps getting bigger! * Where am I?
Next poll opens up on January 14th.
Michael spends the holidays changing his passwords, Joe forgot to cancel his subscriptions, and Allen’s busy playing Call of Duty: Modern Healthcare as we discuss our 2023 resolutions.
The full show notes for this episode are available at https://www.codingblocks.net/episode201.
News
* Thanks for the reviews CourageousPotato, Billlhead, [JD]Milo!
+ Want to help us out? Leave us a review.
* Game Jam is coming up, January 20-23! (itch.io)
* Thoughts on LastPass?
+ Check out the encrypted fields, as figured out by a developer. (GitHub)
+ LastPass users: Your info and password vault data are now in hackers’ hands (Ars Technica)
Our 2023 Resolutions
Michael’s
* Learn Kotlin,
* Go deeper on streaming technologies, such as Kafka, Flink, and/or Kafka Connect, and
* Learn more music theory and techniques.
Drink!
JZ’s
* Of course Joe has categorized his resolutions into the following areas: finances, health, personal development, and career management,
* Go deeper on Spring and streaming technologies, and
* Do more game dev and LeetCode.
Q&A Round 1
* What skills are opposite and which are adjacent that can be picked up this year?
+ Angular unit testing,
+ Front end development,
+ Spring,
+ Big data concepts and technologies
* Any books, courses, or certifications?
+ Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
+ Certified Kubernetes Application Developer (CKAD) (cncf.io)
Allen’s
* Spend more time focusing on health and fun,
* Updating the About Us page with recent info,
* Go deeper on streaming technologies and concepts,
* Go deeper on big data concepts such as data lakes, and best practices, etc.,
* Get back into making content again, such as YouTube, and/or maybe presenting.
Q&A Round 2
* What do you want to avoid in 2023?
+ Less Jenkins,
+ Avoid piecemeal Spring upgrades,
2023 Predictions
* Data, privacy … do we need it?,
* New languages, frameworks,
* Generated content (DALL-E 2, ChatGPT, Copilot), and
* AI ethics
+ ChatGPT Wrote My AP English Essay—and I Passed (WSJ)
Resources
* Designing Data Intensive Applications (Amazon)
* Orlando Code Camp (OrlandoCodeCamp.com)
* Atlanta Dev Con (AtlDevCon.com)
Tip of the Week
* You can pipe directly to Visual Studio Code (in bash anyway), much easier than outputting to a file and opening it in Code … especially if you end up accidentally checking it in!
+ Example: curl https://www.codingblocks.net | code -
* Is your trackpad not responding on your new(-ish) MacBook? Run a piece of paper around the edge to clean out any gunk. Also maybe avoid dripping BBQ sauce on it.
* How do the iOS MFA / Verification Code settings work? We want MFA, but we’re tired of the runaround!
* Jump around – nope, not Kris Kross, great tip from Thiyagarajan – keeps track of your most “frecent” directories to make navigation easier (GitHub)
+ There’s a version for PowerShell too – thank you Brad Knowles! (GitHub)
We step back and look at how things have changed since we first started the show while Outlaw is dancing on tables, Allen really knows his movie monsters, and Joe’s math is on point.
The full show notes for this episode are available at https://www.codingblocks.net/episode200.
News
* Thanks for the review nickname222Apple<3!
+ Want to help us out? Leave us a review.
* Want Free stickers? Send us a SASE, instructions over at (/swag)
* Game Jam is coming up, January 20-23 (itch.io)
Favorite Episodes
* We Still Don’t Understand Open Source Licensing (#5)
* Comparing Git Workflows (#90)
* Git from the Bottom Up series (#195)
* Designing Data-Intensive Applications series (series)
* The DevOps Handbook series (series)
* The Imposters Handbook series (series)
* Boxing and Unboxing in .NET (#2)
* Docker for Developers (#80)
* Elasticsearch (#83)
* Show Recursion Show (#154)
* Why is Python Popular? (#152)
* Hierarchical database patterns (series)
Favorite Events
* NDC 2020 (#126)
* Atlanta Dev Con (atldevcon.com)
* Orlando Code Camp (orlandocodecamp.com)
* South Florida Code Camp (#SoFloDevCon)
* Tampa Code Camp (facebook.com)
* Game Jams
How have things changed since we started?
* Social media
* The technologies we use
* Our careers
* Show format
* Media consumption habits
* Any viewpoints that have changed?
* Technology
* We’ve wrapped up 9 years…how have we changed the most…why?
* Bonus: Buying a window with 3 huge tvs (youtube.com)
Top 3 things you’ve gotten out of it …
* Alphabetize all the things in your class
* A better understanding of DB technologies and the impact of their underlying data structures
* It’s forced us to study various topics …
* Amazing friends, community
* The application tier can / should be your most powerful
* Don’t make your tech-du-jour a hammer
Tip of the Week
* If you want to enable Markdown support, open a document in Google Docs, head over to the top of the screen, go to “Tools” then “Preferences” and enable “Automatically detect Markdown.” After that, you’re good to go ... except this only works for the current doc. (techcrunch.com)
* Markdown Viewer is also a plugin for Chrome that lets you support .md files in Google Drive (workspace.google.com)
* DataGrip’s useless “error at position” messages are frustrating, but the IDE actually does give you the info you need. Check your cursor!
* Minikube’s “profile” feature makes it easy to swap between clusters. No more tearing down and rebuilding if you need to switch to a new task! (minikube.sigs.k8s.io)
* SQLforDevs.com has a free ebook: Next-Level Database Techniques for Developers. (sqlfordevs.com)
+ Thanks for the tip Mikerg!
We talk about career management and interview tips, pushing data contracts “left”, and our favorite dev books while Outlaw is [redacted], Joe’s trying to figure out how to hire junior devs, and Allen’s trying to screw some nails in.
The full show notes for this episode are available at https://www.codingblocks.net/episode199.
News
* Thanks for the reviews Ryan Barger and Amazon Customer!
+ Want to help us out? Leave us a review.
* The sign-up form for The 3rd Coding Blocks Game Jam is live! #cbjam
+ Check out videos from past years:
- CBJAM ’22 (youtube)
- CBJAM ’21 (youtube)
* Interesting article about AI potentially replacing recruiters at Amazon (vox.com)
From ‘Round the Water-Cooler
Why don’t companies want junior developers?
How long do you need to stay at a job?
Data Contracts..moving left?
Most impactful books we’ve covered on the show
How do you prepare to interview for a company?
How do you decide when to bring in new tech?
Tip of the Week
* Did you know Obsidian has a command palette similar to Code? Same short-cut (Cmd/Ctrl-P) as VS Code and it makes for a great learning curve! Don’t know how to make something italic? Cmd-P. Insert a template? Cmd-P. Pretty much anything you want to do, but don’t know how to do. Cmd-P! (help.obsidian.md)
* Ghostery plugin for Firefox cuts down on ads and protects your privacy. Thanks for the tip Aaron Jeskie! (addons.mozilla.org)
* Amazing prank to play on a Windows user: hit F11 to full-screen this website the next time your co-worker or family member leaves their computer unlocked. Thanks Scott Harden! (fakeupdate.net)
We take a peek into some of the challenges Twitter has faced while solving data problems at large scale, while Michael challenges the audience, Joe speaks from experience, and Allen blindsides them both.
It's that time of year where we've got money burning a hole in our pockets. That's right, it's time for the annual shopping spree. Meanwhile, Fiona Allen is being gross, Joe throws shade at Burger King, and Michael has a new character encoding method.
We gather around the watercooler to discuss the latest gossip and shenanigans have been called while Coach Allen is not wrong, Michael gets called out, and Joe gets it right the first time.
We wrap up Git from the Bottom Up by John Wiegley while Joe has a convenient excuse, Allen gets thrown under the bus, and Michael somehow made it worse.
This episode, we learn more about Git's Index and compare it to other version control systems while Joe is throwing shade, Michael learns a new command, and Allen makes it gross.
It's time to understand the full power of Git's rebase capabilities while Allen takes a call from Doc Brown, Michael is breaking stuff all day long, and Joe must be punished.
We are committed to continuing our deep dive into Git from the Bottom Up by John Wiegley, while Allen puts too much thought into onions, Michael still doesn't understand proper nouns, and Joe is out hat shopping.
It's surprising how little we know about Git as we continue to dive into Git from the Bottom Up, while Michael confuses himself, Joe has low standards, and Allen tells a joke.
After working with Git for over a decade, we decide to take a deep dive into how it works, while Michael, Allen, and Joe apparently still don't understand Git.
Once again, Stack Overflow takes the pulse of the developer community where we have all collectively decided to switch to Clojure, while Michael is changing things up, Joe is a future predicting trailblazer, and Allen is "up in the books"
We're going back in time, or is it forward?, as we continue learning about Google's automation evolution, while Allen doesn't like certain beers, Joe is a Zacker™, and Michael poorly assumes that UPSes work best when plugged in.
We explore the evolution of automation as we continue studying Google's Site Reliability Engineering, while Michael, ah, forget it, Joe almost said it correctly, and Allen fell for it.
We finished. A chapter, that is, of the Site Reliability Engineering book as Allen asks to make it weird, Joe has his own pronunciation, and Michael follows through on his promise.
We haven't finished the Site Reliability Engineering book yet as we learn how to monitor our system while the deals at Costco are so good, Allen thinks they're fake, Joe hasn't attended a math class in a while, and Michael never had AOL.
We say "toil" a lot this episode while Joe saw a movie, Michael says something controversial, and Allen's tip is to figure it out yourself, all while learning how to eliminate toil.
Welcome to the morning edition of Coding Blocks as we dive into what service level indicators, objectives, and agreements are while Michael clearly needs more sleep, Allen doesn't know how web pages work anymore, and Joe isn't allowed to beg.
We learn how to embrace risk as we continue our learning about Site Reliability Engineering while Johnny Underwood talked too much, Joe shares a (scary) journey through his mind, and Michael, Reader of Names, ends the show on a dark note.
It's finally time to learn what Site Reliability Engineering is all about, while Jer can't speak nor type, Merkle got one (!!!), and Mr. Wunderwood is wrong.
We're living through the tail end, maybe?, of the Great Resignation, so we dig into how that might impact software engineering careers while Allen is very somber, Joe's years are ... different, and Michael compares Apples to Apples.
We dive into what it takes to adhere to minimum viable continuous delivery while Michael isn't going to quit his day job, Allen catches the earworm, and Joe is experiencing full-on Stockholm syndrome.
We have a retrospective about our recent Game Ja Ja Ja Jam, while Michael doesn't know his A from his CNAME, Allen could be a nun, and Joe still wants to be a game developer.
We wrap up our discussion of PagerDuty's Security Training, while Joe declares this year is already a loss, Michael can't even, and Allen says doody, err, duty.
We're pretty sure we're almost done and we're definitely all present for the recording as we continue discussing PagerDuty's Security Training, while Allen won't fall for it, Joe takes the show to a dark place, and Michael knows obscure, um, stuff.
The 2022 Coding Blocks Game Jam is almost here. We need your help with one more round of voting. VOTING IS CLOSED! Choose your 3 favorite themes and we will […]
We continue our discussion of PagerDuty’s Security Training presentation while Michael buys a vowel, Joe has some buffer, and Allen hits everything he doesn’t aim for.
The full show notes for this episode are available at https://www.codingblocks.net/episode175.
Sponsors * Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. * Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines. * Shortcut – Project management has never been easier. Check out how Shortcut is project management without all the management.
Survey Says
Do you stick with your New Year's resolutions?
* For the first couple weeks.
* I'm pretty good until Spring. -Ish.
* I'm like a machine. Resolutions are rules that are not meant to be broken.
* Wait, those things are to be taken seriously? They're broken by noon New Years Day.
* What are resolutions?
News * Thanks for the reviews! + iTunes: aodiogo * Game Ja-Ja-Ja-Jamuary is coming up, sign up is open now! (itch.io)
Encryption * OWASP has the more generic “Cryptographic Failures” at #2, up from #3 in 2017. * PagerDuty defines encryption as encoding information in such a way that only authorized readers can access it. + Note that this is an informal definition that speaks to the most common use of the word. * Encryption is really, really difficult to get right. There are people that spend their whole lives thinking about encryption, and breaking encryption. You may think you’re a genius by coming up with a non-standard implementation, but unfortunately the attackers are really sophisticated and this strategy has been shown to fail over and over. * There are different types of encryption: + Symmetric/Asymmetric – refers to whether the keys for reading and writing the encrypted data are the same. + Block Cipher – Lets you encrypt and decrypt the data in whole chunks. You need to have an entire block to encrypt or decrypt the whole block at once. + Public/Private Key – A kind of asymmetric encryption intended for situations where you want groups to be able to share one of the keys. For example, you can publish a public PGP key and then people can use that to send you a message. You keep the private key private, so you’re the only entity that can read the message. + Stream Cipher – Encode “on the fly”, think about HTTPS, great for streaming. You can start reading before you have the entire message. Great for situations where performance is important, or you might miss data.
Encryption in Transit * Also known by other names such as data in motion. * Designed to protect against entities that can snoop (or manipulate!) our communications. * You can do this with HTTPS, TLS, IPsec. * Perfect Forward Secrecy is the key to protecting past communications, by generating a new key for a single session so that compromised keys only affect the specific session they were used for. * From Wikipedia “In cryptography, forward secrecy (FS), also known as perfect forward secrecy (PFS), is a feature of specific key agreement protocols that gives assurances that session keys will not be compromised even if long-term secrets used in the session key exchange are compromised.” (Wikipedia)
Encryption at Rest * Simply means that data is encrypted where it’s stored. + An example of this is full disk encryption on laptops and desktops. The entire drive is encrypted so if someone were to steal the drive, it’d essentially be useless without the keys to decrypt the data on the drive. * For PagerDuty, and many other companies, the most important information to protect is customer data, just as important as your own passwords. * PagerDuty’s data classifications: + General data – This is anything available to the public. + Business data – Includes operating data for the business, such as payroll, employee info, etc. This type of data is expected to be encrypted in transit and at rest. + Customer data – This is data provided to the company by the customer and is expected to be encrypted in transit and at rest. - Customer data includes controls such as authentication, access control, storage, auditing, encryption, and destruction. - Business data has similar controls except without the auditing. * PagerDuty called out when using cloud systems, make sure you’re enabling the encryption on the various services, like S3, GCS, Blob storage, etc. + They mentioned it’s just a checkbox, but in reality you’re probably using scripts, templates, etc. So make sure you know the configurations to include to enable encryption. * Another interesting thing they do at PagerDuty: they get alerted when a resource is created without encryption enabled. * What about third parties you use? Should they encrypt as well? YES!!! + Perform vendor risk assessments prior to using the vendor. If they don’t pass the security assessment, use a different vendor.
Secret Management * Q. What is it? A. Protecting and auditing access to secrets. + Auditing so that you can see when someone is using your secrets that shouldn’t, as well as keep track of systems that should and are using secrets. * Hashicorp Vault has a great video to learn about the challenges of managing secrets. (YouTube) * What are secrets? + Secrets are sensitive things such as tokens, keys, passwords, user names, many others. * Secrets should NOT be stored in source control. + Although it seems to happen all the time, be it on purpose, by accident, etc. + Anyone with access to the code can now access the secrets. * PagerDuty uses Vault. Vault: + Securely stores secrets, + Provides audit access to those secrets, and + Provides mechanisms to rotate the secrets if/when necessary. * Don’t hardcode or come up with crazy ways to get secrets into your applications. * Secrets should never be shared, i.e. if two people need access to a system, they should have their own secrets to access that system. + Or maybe you have a “jump” server that has access to an external system, and users have access to the jump server. * NEVER share passwords over insecure channels. This can include channels such as: + Slack, + Email, + SMS, + But this is not an exhaustive list. * If you do accidentally post a secret in a chat or an insecure channel, you should: + Let the security team know immediately (you have a security team right?!), and + Find out how to rotate the secret and do it. * Never allow a secret to be logged! + This can be especially egregious if you’re logging customer credentials you don’t control. + Be sure you are sanitizing your log data before you log.
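As one way to approach the “sanitize your log data before you log” advice, here is a small sketch using Python’s standard `logging` module. The regex, the list of key names it looks for, and the `RedactingFilter` / `ListHandler` classes are all assumptions for illustration; a pattern like this only catches `key=value` style secrets and is no substitute for keeping secrets out of log calls in the first place.

```python
import logging
import re

# Hypothetical pattern: key=value pairs whose key looks like a secret.
SECRET_PATTERN = re.compile(r"\b(password|token|api_key|secret)=([^\s&]+)", re.IGNORECASE)


def redact(message):
    """Replace the value of any secret-looking key with a placeholder."""
    return SECRET_PATTERN.sub(lambda match: f"{match.group(1)}=[REDACTED]", message)


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before any handler emits it."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


class ListHandler(logging.Handler):
    """Collects formatted lines in memory so the demo can inspect them."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


logger = logging.getLogger("redaction_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)

logger.info("connecting with password=%s", "hunter2")
print(handler.lines[0])  # connecting with password=[REDACTED]
```

Attaching the filter to the handler means it runs on the final message, after any `%s` arguments have been filled in, so secrets passed as log arguments get scrubbed too.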
Resources we Like * For Engineers – PagerDuty Security Training (sudo.PagerDuty.com) * For Everyone – PagerDuty Security Training (sudo.PagerDuty.com) * Security Now (TWiT.tv) * Have I Been Pwned (HaveIBeenPwned.com) * Forward secrecy (Wikipedia) * What is Sign in with Apple? (support.apple.com) * What is Hide My Email? (support.apple.com) * Introduction to HashiCorp Vault with Armon Dadgar (YouTube) * Encryption (NetworkSorcery.com) * OWASP Guide to Cryptography (OWASP.org) * Infrastructure Secret Management Software Overview (GitHub)
Tip of the Week
* HashiCorp Vault is a tool for managing secrets, but did you know they have a ton of plugins? Take a look! (VaultProject.io)
* Unity has tools built in for common game functionality, so it’s worth taking a few minutes to google for something before you start typing. Don’t worry, there is still plenty of code to write, but these tools improve the quality and consistency of your game.
* You can use animation clips to create advanced character animations, but they’re also good for simple tweens and motions that need to happen once, or in a loop. No need for the “Rotator.cs” type classes that you see in a lot of Unity tutorials. (docs.unity3d.com)
* NavMeshes are an efficient way of handling pathfinding, which is an important piece of many games. You can learn the basics in just a few minutes and accomplish some amazing things. (docs.unity3d.com)
* GoFullPage lets you take a screenshot of a whole webpage, bada bing, bada boom. (chrome.google.com, GoFullPage.com)
Register for the Game Jam. Make cool stuff. The 2022 Coding Blocks Game Jam is coming up January 21st – 24th ~~and we need your help selecting a theme! Pick your 3 – 5 favorites in the list below, and we’ll carry the top half forward to round 2.~~
Voting has ended, these topics are coming over to round 2:
* No time to lose
* Spaced out
* Can't stop
* All your base are belong to us
* It doesn't mix
* Go to Hell
* Magic Bug
* Work Life Balance
* There is a ticket for that
* Trust nothing
* So the customer is king, but I'm the empire
* Squid Game
* Hook me up
* Origami
* Fast Forward
* It's following me
* A Link to the somewhere
* Failure is the option
* Find the light
* Tethered but alone
* A11y
* Branching off
* It's alive
* Snake! Snake! Snaaake!
* All for one
* Fresh starts
* Everything is fixed
* Game Jam 2 Electric Boogalo
* Up and down
* The rising tide
What’s a Game Jam?
We have an episode all about Game Jams, but the short version is that a Game Jam is a chance to try something new by using your programming and creative skills to build a game. This jam runs from January 21st – 24th; afterwards, anybody who submitted a game can play and give feedback on the games. There are no strict rules or prizes, it’s all about the fun. Check out the sign-up page for more information: itch.io
We’re taking our time as we discuss PagerDuty’s Security Training presentations and what it means to “roll the pepper” while Michael is embarrassed in front of the whole Internet, Franklin Allen Underwood is on a full name basis, and don’t talk to Joe about corn.
The full show notes for this episode are available at https://www.codingblocks.net/episode174.
Sponsors * Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines.
Survey Says
How much personal time off do you take on average each year?
* A week or less. It's awful, but without me the company won't survive.
* Two weeks. It's like I just joined the company. Amurica
* Three weeks. Itsa nice.
* Four weeks!!! An entire month. This must be what it's like to be European.
* More than four weeks!!! I wouldn't say I was missing work.
News * Thanks for the reviews! + iTunes: Goofiw, totalwhine, Kpbmx, Viv-or-vyv * Game Ja-Ja-Ja-Jamuary is coming up, sign up is open now! (itch.io) * Question about unit tests, is extra code that’s only used by unit tests acceptable? * Huge congrats to Jamie Taylor for making Microsoft MVP! Check out some of his podcasts: + The .NET Core Podcast (DotNetCore.show) + Tabs and Spaces Podcast (TabsAndSpaces.io) + RJJ Software Ltd (RJJ-Software.co.uk)
Why this topic? * It’s good to learn about the common security vulnerabilities when developing software! What they are, how they are exploited, and how they are prevented. * WebGoat is a website you can run w/ known vulnerabilities. It is designed for you to poke at and find problems with to help you learn how attackers can take advantage of problems. (OWASP.org) * “But the framework takes care of that for me” + Don’t be that person! + Recent vulnerability with Grafana, CVE-2021-43798. (SOCPrime.com) + The Log4j fiasco begins. (CNN) * You can’t always wait for a vulnerability patch to be released. You may need to patch one yourself. * Basically, even if you’re using a framework, it doesn’t mean you can be naïve to everything about it. * You shouldn’t use the excuse “It’s just for a hackathon” or “It’s a proof of concept.” + This can include things like disabling firewalls, etc. + Don’t put things on a public repo, as you might accidentally share company secrets, intellectual property, etc. - Open sourcing may be an option later, but it should be looked through first. + NEVER use customer data when doing hackathons or proofs of concepts. Too many things can go wrong if it leaks out. - Maybe a better rule of thumb would be to never use customer data for any type of development. Instead, always use fake data. * The slides had an interesting story that was redacted: there was a software vulnerability that was discovered that existed due to a missing check-in of code, i.e. everything was functioning perfectly fine, and there was an effort already to plug a hole in the code, but it just never made it into the repo. Nearly impossible to detect by automated tools.
Vulnerability #1 – SQL Injection
* OWASP has the more generic “Injection” in the #3 position, down from #1 in 2017.
* An example is manipulating a query at runtime with user provided input.
+ This typically implies that strings are patched into a query directly, i.e. WHERE password = '$providedPassword'.
+ Can be attacked by doing something like providedPassword = ' OR 1=1 --.
+ Which effectively turns into WHERE password = '' OR 1=1 --.
+ This is the basis for the tale of little Bobby Tables (xkcd).
* Users should NEVER be able to directly impact the runnable query.
+ They can provide values, and those should be parameterized, or validated first.
* The real problem is that people with SQL knowledge can string multiple lines of SQL together to manipulate the original query in some scary ways.
Blind Injection
Boolean
* Boolean-based attacks take time, but the injected script throws an error whenever its condition is true.
+ The example they provided is “If the first database starts with an A, throw”, “If the first database starts with a B, throw”, etc.
Time Based
* Uses the Boolean-based attack, but puts the checks on a delay so they won’t be as easily detected.
* So you can just use regular expressions to catch keywords and escape quotes, right?! Ummm … no!
+ There are just too many combinations of things you’d need to know, as well as weird characters and tricks you might not even be aware of: double or triple encoding, exceptions, etc.
+ It’s surprisingly tricky. For example, how would you allow single quotes? Replace them all with \'? Unless there’s already a \ in front of it, but what if it’s \?
+ You can theoretically overcome all of these problems … but … why? Why not just do it the right way?
* The answer is to use prepared statements and/or parameterized queries.
+ The difference between a prepared statement and what was mentioned above is the user’s input doesn’t directly modify a query, rather the input is substituted in the appropriate place.
- Side benefit is prepared statements often execute quicker than manually constructed SQL queries.
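To make the difference concrete, here is a small Python sketch using the standard library’s sqlite3 module and an in-memory database. The table, the `login_unsafe` / `login_safe` helpers, and the sample user are all made up for illustration, and the password is stored in plain text only to mirror the WHERE password = ... example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_unsafe(name: str, provided_password: str) -> bool:
    # Vulnerable: user input is pasted straight into the SQL text.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{provided_password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(name: str, provided_password: str) -> bool:
    # Parameterized: the driver binds the values, so input can never
    # change the shape of the query.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, provided_password)).fetchone()[0] > 0


attack = "' OR 1=1 --"
print("unsafe login with attack string:", login_unsafe("alice", attack))
print("safe login with attack string:  ", login_safe("alice", attack))
```

With the attack string, the unsafe version ends up evaluating ... OR 1=1 and lets the attacker in, while the parameterized version simply compares the password against the literal text ' OR 1=1 -- and finds no match.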
Vulnerability #2 – Storing Passwords
* OWASP has the more generic “Cryptographic Failures” at #2, up from #3 in 2017.
* Never store passwords in plain text!
* I’ve heard hashing is good, right?
+ Kind of, until you hear that there’s this thing called rainbow tables.
- Rainbow tables are basically dictionaries of passwords that have been hashed using various algorithms. This allows you to quickly look up a previously known password with a common hashing algorithm.
* Using a salt:
+ This is essentially appending a random string of data to the end of a password before hashing it.
- This salt must NEVER be reused, and it should be changed every time a password is created or changes.
- The sole purpose of a salt is to ensure rainbow tables will be ineffective. The salts can be stored as plain text right next to the password; they are not a secret, they just ensure the hash will be different even if the same passwords are used multiple times.
* Using “a” pepper:
+ They referred to it as a site-wide salt, which is pretty accurate.
+ The pepper does the same thing as the salt: it’s appended to every password before hashing.
- The biggest difference is that the pepper is not stored alongside the data; rather, it’s stored in a file on a server separate from the data.
- Essentially you’re double-salting your password before hashing.
- Password + Salt (stored next to the password with the data) + Pepper (stored on a separate server), then hash.
* Pepper can make it more difficult for hackers: if they steal the database, they still don’t have the pepper.
* Pepper can also make it more difficult for the owners of the system, as “rolling a pepper” can be difficult, and you potentially have to keep track of all historical peppers.
* Even with the salts and peppers, this still doesn’t fully solve the problem. Why?
+ Hackers can’t use a rainbow table, but … if a hacker has the salt and pepper, they can try to brute force the password hashes.
- They can do this because, depending on the hashing algorithm chosen, the hashing is just too fast: MD5, SHA-1, etc.
- Those algorithms weren’t designed for security, they were designed for speed.
+ Solution: Key-stretching
- This is running the password through a hash algorithm a large number of times: the output of the first hash is the input for the second hash, and so on.
- The whole point is to make it take longer to hash. If you were to hash a password 100k times, it might take a second. For a legit user, that means it takes a second to hash and compare a login, but a hacker trying to crack passwords can do at MOST one attempt per second. Following the math here, with a single MD5 or similar hash the hacker could previously attempt 100k password cracks per second, versus one per second now.
- It’s still not perfect. Hardware is constantly getting better, so what’s good and slow today may not be in a year.
* Adaptive Hashing:
+ Same concepts as above, except you can increase the number of hashing rounds as time goes on.
+ Really what you want to know is the cost to hack a password for a given algorithm. PagerDuty had a nice slide on this that estimated the cost of hardware to crack a password in one year.
+ Good algorithms for increasing the cost to hackers are bcrypt, scrypt and PBKDF2.
- These were designed for hashing passwords specifically.
- Salting and key stretching are also built into the algorithms so you don’t have to go do it on your own.
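As a rough sketch of salting plus key stretching, the snippet below uses PBKDF2 from Python’s standard library (hashlib.pbkdf2_hmac). The iteration count, the optional pepper parameter, and the function names are assumptions for illustration, not numbers or code from the PagerDuty training; pick a work factor based on current guidance and your own hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; raise it as hardware gets faster


def hash_password(password: str, pepper: bytes = b"") -> tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated on every call."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode() + pepper, salt, ITERATIONS
    )
    return salt, digest


def verify_password(
    password: str, salt: bytes, expected: bytes, pepper: bytes = b""
) -> bool:
    """Re-run the same stretch with the stored salt and compare the results."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode() + pepper, salt, ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("password123", salt, digest))
```

The salt is stored next to the digest, while a pepper (if you use one) would be loaded from somewhere other than the database. Storing the iteration count alongside each hash is a common way to support raising it later.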
Resources we Like * For Engineers – PagerDuty Security Training (sudo.PagerDuty.com) * For Everyone – PagerDuty Security Training (sudo.PagerDuty.com) * OWASP Top 10 (OWASP.org) * SQL Injection References + MySQL SQL Injection Cheat Sheet (PenTestMonkey.net) + SQL Injection Prevention Cheat Sheet (OWASP.org) + Time-Based Blind SQL Injection Attacks (SQLInjection.net) + Blind SQL Injection (OWASP.org) + SQL and NoSQL Injection (ckarande.GitBooks.io) * Security Now episode 843, Trojan Source (TWiT.tv) * Trojan Source Bug Threatens the Security of all Code (KrebsOnSecurity) * Check if your accounts have been compromised (HaveIBeenPwned) * How big is your haystack and how well hidden is your needle? Calculator for figuring out how good your password is. (GRC.com) * bcrypt (Wikipedia) * scrypt (Wikipedia) * PBKDF2 (Wikipedia)
Tip of the Week
* Did you know you can mail merge in Gmail? It works well! (developers.google.com)
* Tip from Jamie Taylor: DockerSlim is a tool for slimming down your Docker images to reduce your image sizes and security footprint. You can minify an image by up to 30x. Free and open-source. (GitHub)
* Game Jam is coming up, so check out the free assets provided by Unity in the asset store. The quality is incredible and inspiring, and the items range from artwork to controllers (think FPS, 3P) to full “microgames” that you can take and build with to your heart’s content. Most are free and the ones that aren’t are cheap and interesting. (assetstore.unity.com)
* while True: learn() is a puzzle video game that can help teach you machine learning techniques. Thanks to Alex from GamingFyx for sharing this!
+ while True: learn() (SteamPowered.com)
+ Gaming Fyx (fyx.space)
* Now that Zsh is the default shell in macOS, it’s time to get comfy and set up tab completion (ScriptingOSX.com)
* GiTerm is a command line tool for visualizing Git information. (GitHub)
With Game Ja-Ja-Ja-Jamuary coming up, we discuss what makes a game engine, while Michael’s impersonation is spot-on, Allen may really just be Michael, and Joe already has the title of his next podcast show at the ready.
The full show notes for this episode are available at https://www.codingblocks.net/episode173.
Sponsors * Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines.
Survey Says
What's your container management of choice?
* Good ol' reliable Docker Desktop.
* Rancher Desktop, I like my container management free and open like the wild west.
* Podman, because the little otters logo is so cute.
Game Jam ’22 is coming up in Ja-Ja-Ja-Jamuary
News
* Thanks for the reviews!
+ Podchaser: Jamie Introcaso
* Game Ja-Ja-Ja-Jamuary is coming up, sign up is open now! (itch.io)
What is a Game Engine? * What’s a… + Library, + Framework, + Toolkit, + … Engine? * Want to see terrible explanations of a thing? Google “framework vs engine”. * Other types of engines: storage engine, rendering engine, for example.
Q: Why do people use game engines? Well, they reduce costs, complexities, and time-to-market. Consistency!
Q: Why do so many AAA games create their own custom engines?
Common Features of Game Engines
* 2D/3D rendering engine
+ Basic shapes (planes, spheres, lines),
+ Particles, Shaders,
+ Masking/Culling,
+ Progressive enhancement (either by distance or by some other means)
* Physics engine
+ Collision detection,
+ Mass,
+ Gravity,
+ Torque,
+ Force,
+ Friction,
+ Springiness,
+ Fluid Dynamics,
+ Wind
* Sound
+ Multiple sounds at once, looping, spatial settings, etc.
* Scripting
* AI
* Networking
+ Ever thought about how this works? Peer to peer, dedicated servers?
* Streaming
+ Streaming assets, as in, the player hasn’t installed your game.
* Scene Management
* Cinematics
* UI
* Often engines also include development tools to make working with these various systems easier … like an IDE.
Some Really Cool Things About Unity * Asset Store and Package Management, * ProBuilder (Unity), * Terrain, * Animation Manager, * Ad Systems and Analytics, * Target multiple platforms: Xbox, Windows, Linux, Android, MacOS, iOS, PSX, Switch, etc.
About the Industry * How big is the industry? + $150B in 2019, estimated $250B for 2025 (TechJury.net) + How does it compare to other industries? - Movies are $41B, - Books are $25B, - Netflix is $7B … that’s about half of Nintendo, - HBO is $2B * How many companies and employees? + 2,457 companies and 220k jobs … in 2015! (Quora) * What’s the breakdown on sales? + Mobile $90B, PC $35B, console $49B (NewZoo.com) * How many games released in a year? + About 10k a year, on Steam (NewZoo.com) * How long does it take? 1 – 10 years? * The 10 Best Games Made By Just One Person (TheGamer.com)
Commentary on Popular Game Engines
Unity
* Publish for 20+ platforms
* 50% of games are made with Unity (GameDeveloper.com)
* List of Unity games (Wikipedia)
* Pricing range: Free to $2,400. You can use the free plan if revenue or funding is less than $100k!
* Program in C#
* Great learning resources (learn.unity.com)
Unreal
* Many AAA games are built with Unreal. Basically think of the top 10 biggest, most beautiful, AAA games; those are probably all Unreal or custom (RAGE, Frostbite, Last of Us)
* Pricing: from free to “call for pricing”, 5% royalty after $1M
* List of Unreal Engine games (Wikipedia)
* Originally came out of the Unreal series of games, and a new one is coming out soon! (Epic Games)
* Program in C++
Godot * Open Source * Growing in popularity * You can program in a variety of languages, officially C/C++ and GDScript but there are other bindings (Wikipedia)
Custom Game Engines:
* GameMaker
* RPG Maker
* Specialized: Frostbite, CryEngine, etc.
* Korge
* libGDX
Final Question Game Jam sign-up is live … what are you thinking for technology and mechanics?
Resources We Like * Splitgate, where Halo meets Portal (Splitgate.com) * Mythic Quest (Apple) * What’s wrong with TikTok? (Washington Post) * Blood, Sweat, and Pixels by Jason Schreier (Amazon)
Tip of the Week
* ProBuilder is a free tool available in Unity that is great for making polygons and great for mocking out levels or building ramps. The coolest part is the way it works, giving you a bunch of tools that let you do things like create vertices, edges, and surfaces, extrude, intrude, mirror, etc. You have to add it via the package manager but it’s worth it for simple games and prototypes. (Unity)
* Great blog on processing billions of events in real time at Twitter, thanks Mikerg! (blog.twitter.com)
* forEachIndexed is a nice Kotlin method for iterating through items in a collection, with an index for positional based computations (ozenero.com)
* How can you log out of Netflix on Samsung Smart TVs? Ever heard of the Konami code? Press Up Up Down Down Left Right Left Right Up Up Up Up (help.netflix.com)
We wrap up the discussion on partitioning from our collective favorite book, Designing Data-Intensive Applications, while Allen is properly substituted, Michael can’t stop thinking about Kafka, and Joe doesn’t live in the real sunshine state.
The full show notes for this episode are available at https://www.codingblocks.net/episode172.
Sponsors * Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. * Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines.
Survey Says
How many different data storage technologies do you use for your day job?
* Just the one. And it is our hammer.
* Two to three. It's a quaint little data pipeline.
* Four or more. OMG why do we have so many.
* None. Keep your data crap out of my CSS.
News * Game Ja Ja Ja Jam is coming up, sign up is open now! (itch.io) * Joe finished the Create With Code Unity Course (learn.unity.com) * New MacBook Pro Review, notch be darned!
Last Episode …
Best book evar!
In our previous episode, we talked about data partitioning, which refers to how you can split up data sets. That’s great when you have data that’s too big to fit on a single machine, or you have special performance requirements. We talked about two different partitioning strategies: key ranges, which work best with homogeneous, well-balanced keys, and hashing, which provides a much more even distribution that helps avoid hot-spotting.
This episode we’re continuing the discussion, talking about secondary indexes, rebalancing, and routing.
Partitioning, Part Deux
Partitioning and Secondary Indexes
* Last episode we talked about key range partitioning and key hashing to deterministically figure out where data should land based on a key that we chose to represent our data.
+ But what happens if you need to look up data by something other than the key?
+ For example, imagine you are partitioning credit card transactions by a hash of the date. If I tell you I need the data for last week, then it’s easy, we hash the date for each day in the week.
+ But what happens if I ask you to count all the transactions for a particular credit card?
- You have to look at every single record in every single partition!
* Secondary indexes refer to metadata about our data that helps keep track of where our data is.
* In our example about counting a user’s transactions in a data set that is partitioned by date, we could keep a separate data structure that keeps track of which partitions each user has data in.
* We could even easily keep a count of those transactions so that you could return the count of a user’s transactions solely from the information in the secondary index.
* Secondary indexes are complicated. HBase and Voldemort avoid them, while search engines like Elasticsearch specialize in them.
* There are two main strategies for secondary indexes:
+ Document based partitioning, and
+ Term based partitioning.
Document Based Partitioning
* Remember our example dataset of transactions partitioned by date? Imagine now that each partition keeps a list of each user it holds, as well as the key for the transaction.
* When you query for users, you simply ask each partition for the keys for that user.
* Counting is easy and if you need the full record, then you know where the key is in the partition. Assuming you store the data in the partition ordered by key, it’s a quick lookup.
* Remember Big O? Finding an item in an ordered list is O(log n). Which is much, much, much faster than looking at every row in every partition, which is O(n).
* We have to take a small performance hit when we insert (i.e. write) new items to the index, but if it’s something you query often it’s worth it.
* Note that each partition only cares about the data they store, they don’t know anything about what the other partitions have. Because of that, we call it a local index.
* Another name for this type of approach is “scatter/gather”: the data is scattered as you write it and gathered up again when you need it.
* This is especially nice when you have data retention rules. If you partition by date and only keep 90 days worth of data, you can simply drop old partitions and the secondary index data goes with them.
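The local-index idea can be sketched in a few lines of Python. Everything here is a toy model with made-up names: partitions are keyed directly by date (rather than by a hash of the date) to keep it short, and each one keeps its own card-number index over only the rows it holds.

```python
from collections import defaultdict


class Partition:
    """One date partition holding its rows plus a local secondary index."""

    def __init__(self):
        self.rows = {}                     # transaction id -> record
        self.by_card = defaultdict(list)   # local index: card -> transaction ids

    def insert(self, txn_id, record):
        self.rows[txn_id] = record
        # The extra write that keeps the local index current.
        self.by_card[record["card"]].append(txn_id)

    def keys_for_card(self, card):
        # .get avoids creating empty index entries on reads.
        return self.by_card.get(card, [])


partitions = defaultdict(Partition)  # partition key (the date) -> Partition


def write(txn_id, date, card, amount):
    partitions[date].insert(txn_id, {"date": date, "card": card, "amount": amount})


def count_for_card(card):
    # Scatter: ask every partition. Gather: add up the answers.
    return sum(len(p.keys_for_card(card)) for p in partitions.values())


def drop_partition(date):
    # Retention: dropping a partition takes its index entries with it.
    partitions.pop(date, None)


write("t1", "2021-11-01", "1111", 20)
write("t2", "2021-11-01", "2222", 5)
write("t3", "2021-11-02", "1111", 30)
print(count_for_card("1111"))
```

Note that count_for_card still has to visit every partition, but each visit is a cheap index lookup instead of a scan over all the rows.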
Term Based Partitioning
* If we are willing to make our writes a little more complicated in exchange for more efficient reads, we can step up to term based partitioning.
* One problem with having each partition keep track of its own local data is that you have to query all the partitions. What if the data’s only on one partition? Our client still needs to wait to hear back from all partitions before returning the result.
* What if we pulled the index data away from the partitions to a separate system?
* Now we check this secondary index to figure out the keys, which we can then go look up on the appropriate partitions.
* We can go one step further and partition this secondary index so it scales better. For example, userId 1-100 might be on one, 101-200 on another, etc.
* The benefit of term based partitioning is you get more efficient reads. The downside is that you are now writing to multiple spots: the node the data lives on, plus any partitions in our indexing system that hold the affected secondary index entries. And this is multiplied by replication.
* This is usually handled by asynchronous writes that are eventually consistent. Amazon’s DynamoDB states its global secondary indexes are normally updated within a fraction of a second.
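Here is a comparable toy sketch of a term-partitioned (global) index in Python. The names are invented for illustration, the index is partitioned by userId range as in the 1-100 / 101-200 example, and the index write happens synchronously for simplicity, whereas the notes above point out that real systems usually do it asynchronously.

```python
from collections import defaultdict

USERS_PER_INDEX_PARTITION = 100  # mirrors the "1-100, 101-200" example

# Primary data, partitioned by date: date -> {txn_id: record}
data_partitions = defaultdict(dict)

# Global index, partitioned by term (user id range):
# index partition number -> {user_id: [(date, txn_id), ...]}
index_partitions = defaultdict(lambda: defaultdict(list))


def index_partition_for(user_id: int) -> int:
    return (user_id - 1) // USERS_PER_INDEX_PARTITION


def write(txn_id, date, user_id, amount):
    # Write 1: the record itself, on its data partition.
    data_partitions[date][txn_id] = {"user_id": user_id, "amount": amount}
    # Write 2: the index entry, possibly on a completely different node.
    index_partitions[index_partition_for(user_id)][user_id].append((date, txn_id))


def transactions_for_user(user_id):
    # One index partition tells us exactly where the rows live,
    # so there is no need to ask every data partition.
    index = index_partitions[index_partition_for(user_id)]
    return [data_partitions[date][txn_id] for date, txn_id in index.get(user_id, [])]


write("t1", "2021-11-01", 42, 20)
write("t2", "2021-11-02", 42, 30)
write("t3", "2021-11-02", 150, 5)
print(transactions_for_user(42))
```

The read path touches one index partition and only the data partitions that actually contain matches, which is the trade the extra write buys you.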
Rebalancing Partitions
* What do you do if you need to repartition your data, maybe because you’re adding nodes for more CPU or RAM, or because you’re losing nodes?
* Then it’s time to rebalance your partitions, with the goals being to …
+ Distribute the load equally-ish (notice we didn’t say data: some data could be more important than other data, or the nodes could be mismatched),
+ Keep the database operational during the rebalance procedure, and
+ Minimize data transfer to keep things fast and reduce strain on the system.
* Here’s how not to do it: hash % (number of nodes)
+ Imagine you have 100 nodes and a key whose hash is 1000, so it lands on node 0 (1000 % 100). Going to 99 nodes, that same key now lands on node 10; at 102 nodes it lands on node 82 … it’s a lot of change for a lot of keys.
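To get a feel for how much data hash % (number of nodes) shuffles around, here is a tiny Python experiment. It is a sketch under a simplifying assumption: the integers 0 through 9,999 stand in for evenly spread key hashes, and the function names are made up.

```python
def node_for(key_hash: int, num_nodes: int) -> int:
    """The naive scheme: the node is just the hash modulo the node count."""
    return key_hash % num_nodes


def fraction_moved(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Share of keys whose node changes when the cluster is resized."""
    moved = sum(
        1
        for key_hash in range(num_keys)
        if node_for(key_hash, old_nodes) != node_for(key_hash, new_nodes)
    )
    return moved / num_keys


share = fraction_moved(10_000, 100, 101)
print(f"{share:.0%} of keys move when going from 100 to 101 nodes")
```

Adding a single node relocates the vast majority of keys in this model, which is exactly the data transfer a rebalance is supposed to minimize.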
Partitions > Nodes
* You can mitigate this problem by fixing the number of partitions to a value higher than the number of nodes.
* This means you move where the partitions go, not the individual keys.
+ Same recommendation applies to Kafka: keep the number of partitions high and you can change nodes.
+ In our example of partitioning data by date, with a 7-year retention period, rebalancing from 10 nodes to 11 is easy.
* What if you have more nodes than partitions, like if you had so much data that a single day was too big for a node given the previous example?
+ It’s possible, but most vendors don’t support it. You’ll probably want to choose a different partitioning strategy.
* Can you have too many partitions? Yes!
+ If partitions are large, rebalancing and recovering from node failures is expensive.
+ On the other hand, there is overhead for each partition, so having many, small partitions is also expensive.
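A minimal Python sketch of the fixed-partition approach follows. The partition count, the round-robin starting layout, and the “steal from the fullest nodes” policy are all assumptions chosen to keep the example small; real databases have their own assignment strategies.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 256  # hypothetical; fixed up front, well above the node count


def partition_for(key: str) -> int:
    # Any hash that is stable across processes works here;
    # cryptographic strength is not the point.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def assign_round_robin(num_nodes: int) -> dict[int, int]:
    """Initial layout: partition -> node."""
    return {p: p % num_nodes for p in range(NUM_PARTITIONS)}


def add_node(assignment: dict[int, int], new_node: int) -> dict[int, int]:
    """The new node takes over whole partitions until it holds a fair share."""
    target = NUM_PARTITIONS // (new_node + 1)
    counts = Counter(assignment.values())
    updated = dict(assignment)
    moved = 0
    for partition, node in assignment.items():
        if moved >= target:
            break
        if counts[node] > target:
            updated[partition] = new_node
            counts[node] -= 1
            moved += 1
    return updated


before = assign_round_robin(10)
after = add_node(before, new_node=10)
moved = [p for p in before if before[p] != after[p]]
print(f"{len(moved)} of {NUM_PARTITIONS} partitions moved")
```

The key-to-partition mapping never changes, so going from 10 to 11 nodes only relocates the partitions handed to the new node instead of reshuffling nearly every key.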
Other methods of partitioning
* Dynamic partitioning:
+ It’s hard to get the number of partitions right, especially with data that changes its behavior over time.
- There is no magic algorithm here. The database just handles repartitioning for you by splitting large partitions.
- Databases like HBase and RethinkDB create partitions dynamically, while Mongo has an option for it.
* Partitioning proportionally to nodes:
+ Cassandra and Ketama can handle partitioning for you, based on the number of nodes. When you add a new node it randomly chooses some partitions to take ownership of.
- This is really nice if you expect a lot of fluctuation in the number of nodes.
Automated vs Manual Rebalancing
* We talked about systems that automatically rebalance, which is nice for systems that need to scale fast or have workloads that are homogenized.
* You might be able to do better if you are aware of the patterns of your data or want to control when these expensive operations happen.
* Some systems like Couchbase, Riak, and Voldemort will suggest partition assignment, but require an administrator to kick it off.
* But why? Imagine launching a large online video game and taking on tons of data into an empty system … there could be a lot of rebalancing going on at a terrible time. It would have been much better if you could have pre-provisioned ahead of time … but that doesn’t work with dynamic scaling!
Request Routing
* One last thing … if we’re dynamically adding nodes and partitions, how does a client know who to talk to?
* This is an instance of a more general problem called “service discovery”.
* There are a couple ways to solve this:
+ The nodes keep track of each other. A client can talk to any node and that node will route them anywhere else they need to go.
+ Or a centralized routing service that the clients know about, and it knows about the partitions and nodes, and routes as necessary.
+ Or require that clients be aware of the partitioning and node data.
* No matter which way you go, partitioning and node changes need to be applied. This is notoriously difficult to get right and REALLY bad to get wrong. (Imagine querying the wrong partitions …)
* Apache ZooKeeper is a common coordination service used for keeping track of partition/node mapping. Systems check in or out with ZooKeeper and ZooKeeper notifies the routing tier.
* Kafka (although not for much longer), Solr, HBase, and Druid all use ZooKeeper. MongoDB uses a custom ConfigServer that is similar.
* Cassandra and Riak use a “gossip protocol” that spreads the work out across the nodes.
* Elasticsearch has different roles that nodes can have, including data, ingestion and … you guessed it, routing.
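The centralized routing tier option can be pictured with a small Python sketch. This is not ZooKeeper’s API or any real client library; `RoutingTier`, `on_partition_moved`, and the node names are hypothetical stand-ins for “something that holds the partition-to-node map and gets told when it changes”.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical fixed partition count


def partition_for(key: str) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class RoutingTier:
    """Knows which node currently owns each partition."""

    def __init__(self, partition_to_node: dict[int, str]):
        self._partition_to_node = dict(partition_to_node)

    def on_partition_moved(self, partition: int, new_node: str) -> None:
        # In a real system a coordination service would trigger this
        # notification after a rebalance.
        self._partition_to_node[partition] = new_node

    def node_for(self, key: str) -> str:
        return self._partition_to_node[partition_for(key)]


router = RoutingTier({p: f"node-{p % 2}" for p in range(NUM_PARTITIONS)})
print(router.node_for("user:42"))

# After a rebalance moves that key's partition, the same key routes elsewhere.
router.on_partition_moved(partition_for("user:42"), "node-2")
print(router.node_for("user:42"))
```

Clients only ever ask the router where a key lives, so a rebalance becomes a map update rather than a change to every client.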
Parallel Query Execution
* So far we’ve mostly talked about simple queries, i.e. searching by key or by secondary index … the kinds of queries you would be running in NoSQL type situations.
* What about Massively Parallel Processing (MPP) relational databases, which are known for complex joins, filtering, and aggregations?
* The query optimizer is responsible for breaking down these queries into stages which target primary/secondary indexes when possible and run these stages in parallel, effectively breaking down the query into subqueries which are then joined together.
* That’s a whole other topic, but based on the way we talked about primary/secondary indexes today you can hopefully have a better understanding of how the query optimizer does that work. It splits up the query you give it into distinct tasks, each of which could run across multiple partitions/nodes, runs them in parallel, and then aggregates the results.
+ Designing Data-Intensive Applications goes into it in more depth in future chapters while discussing batch processing.
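The “split into tasks, run in parallel, aggregate” shape can be shown with a toy Python example. It is nowhere near a real query optimizer: the partitions are plain in-memory lists, the sample data is made up, and a thread pool stands in for nodes working at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Three hypothetical partitions of transaction rows.
partitions = [
    [{"card": "1111", "amount": 20}, {"card": "2222", "amount": 5}],
    [{"card": "1111", "amount": 30}],
    [{"card": "3333", "amount": 7}, {"card": "1111", "amount": 1}],
]


def partial_sum(rows, card):
    """The per-partition stage: filter and aggregate locally."""
    return sum(row["amount"] for row in rows if row["card"] == card)


def total_for_card(card):
    # Run the same stage against every partition in parallel,
    # then combine the partial results into the final answer.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: partial_sum(rows, card), partitions)
        return sum(partials)


print(total_for_card("1111"))
```

Each partition only ships back a single number, so the final aggregation step stays cheap no matter how many rows each partition had to look at.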
Resources We Like * Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
Tip of the Week
* PowerLevel10k is a Zsh “theme” that adds some really nice features and visual candy. It’s highly customizable and works great with Kubernetes and Git. (GitHub)
* If for some reason VS Code isn’t in your path, you can add it easily within VS Code. Open up the command palette (CTRL+SHIFT+P / COMMAND+SHIFT+P) and search for “path”. Easy peasy!
* Gently Down the Stream is a guidebook to Apache Kafka written and illustrated in the style of a children’s book. Really neat way to learn! (GentlyDownThe.Stream)
* PostgreSQL is one of the most powerful and versatile databases. Here is a list of really cool things you can do with it that you may not expect. (HakiBenita.com)
Check out PowerLevel10k
We crack open our favorite book again, Designing Data-Intensive Applications by Martin Kleppmann, while Joe sounds different, Michael comes to a sad realization, and Allen also engages “no take backs”.
The full show notes for this episode are available at https://www.codingblocks.net/episode171.
Sponsors * Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard. * Linode – Sign up for $100 in free credit and simplify your infrastructure with Linode’s Linux virtual machines.
Survey Says
Have you ever had to partition your data?
* Ever? More like always.
* On occasion. It's just another tool in my toolbox.
* Once. I don't want to talk about it.
* Nope. Does that mean my dataset is small?
* Nope, not my job.
News * Thank you for the review! + iTunes: Wohim321
Best book evar!
The Whys and Hows of Partitioning Data
* Partitioning is known by different names in different databases:
+ Shard in MongoDB, Elasticsearch, SolrCloud,
+ Region in HBase,
+ Tablet in BigTable,
+ vNode in Cassandra and Riak,
+ vBucket in Couchbase.
* What are they?
* In contrast to the replication we discussed, partitioning is spreading the data out over multiple storage sections, either because all the data won’t fit on a single storage mechanism or because you need faster read capabilities.
* Typically each piece of data (record, row, or document) is stored on exactly one partition.
* Each partition is a mini database of its own.
Why partition? Scalability
Figure 6-1 in the book shows this leader / follower scheme for partitioning among multiple nodes.
* The goal in partitioning is to try and spread the data around as evenly as possible.
* If data is unevenly spread, it is called skewed.
* Skewed partitioning is less effective as some nodes work harder while others are sitting more idle.
* Partitions with higher than normal loads are called hot spots.
* One way to avoid hot-spotting is putting data on random nodes.
+ Problem with this is you won’t know where the data lives when running queries, so you have to query every node, which is not good.
Partitioning by Key Range
* Assign a continuous range of keys to a particular partition.
+ Just like old encyclopedias or even the rows of shelves in a library.
+ By doing this type of partitioning, your database can know which node to query for a specific key.
+ Partition boundaries can be determined manually or they can be determined by the database system.
+ Automatic partitioning is done by BigTable, HBase, RethinkDB, and MongoDB.
+ The partitions can keep the keys sorted, which allows for fast lookups. Think back to the SSTables and LSM Trees.
* They used the example of using timestamps as the key for sensor data, i.e. YY-MM-DD-HH-MM.
* The problem with this is that it can lead to hot-spotting on writes. All other nodes sit around doing nothing while the node with today’s partition is busy.
+ One way they mentioned you could avoid this hot-spotting is maybe you prefix the timestamp with the name of the sensor, which could balance writing to different nodes.
+ The downside to this is now if you wanted the data for all the sensors you’d have to issue separate range queries for each sensor to get that time range of data.
+ Some databases attempt to mitigate the downsides of hot-spotting. For example, Elastic has the ability to specify an index lifecycle that can move data around based on the key. Take the sensor example: new data keeps coming in, but writes rarely touch old data. Depending on the query patterns, it may make sense to move older data to slower machines to save money as time marches on. Elastic uses a temperature analogy, allowing you to specify policies for data that is hot, warm, cold, or frozen.
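The key range lookup described above can be sketched in a few lines of Python. This is only an illustration, not how any particular database does it: the `BOUNDARIES` list, the four-partition layout, and the helper names are all made up for the example. It also shows why bare timestamps pile onto one partition while sensor-prefixed keys spread out.

```python
from bisect import bisect_right

# Hypothetical boundaries, like encyclopedia volumes:
# partition 0 = keys before "g", 1 = "g".."m", 2 = "n".."s", 3 = "t" and up.
BOUNDARIES = ["g", "n", "t"]


def partition_for(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect_right(BOUNDARIES, key.lower())


def partitions_for_range(start: str, end: str) -> list:
    """Partitions a range query over [start, end] has to touch."""
    return list(range(partition_for(start), partition_for(end) + 1))


def sensor_key(sensor: str, timestamp: str) -> str:
    """Prefix the timestamp with the sensor name to spread writes out."""
    return f"{sensor}:{timestamp}"


if __name__ == "__main__":
    # Bare timestamps for the same day all land on the same partition.
    stamps = ["21-11-08-10-15", "21-11-08-10-16", "21-11-08-10-17"]
    print({partition_for(ts) for ts in stamps})

    # Prefixing with a (made up) sensor name spreads the same writes around.
    sensors = ["alpha", "kitchen", "roof", "zeta"]
    print([partition_for(sensor_key(s, stamps[0])) for s in sensors])

    # A range query only needs the partitions that overlap the range.
    print(partitions_for_range("h", "p"))
```

The trade-off from the notes shows up directly: with sensor-prefixed keys, reading one time window for every sensor means one range query per sensor name.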
Partitioning by Hash of the Key
* To avoid the skew and hot-spot issues, many data stores hash the key to distribute the data.
* A good hashing function takes skewed data and distributes it evenly.
* Hashing algorithms for the sake of distribution do not need to be cryptographically strong.
+ Mongo uses MD5.
+ Cassandra uses Murmur3.
+ Voldemort uses Fowler-Noll-Vo.
+ Another interesting thing is not all programming languages have suitable built-in hashing algorithms. Why? Because the same key can hash to a different value in different processes. Java’s Object.hashCode() and Ruby’s Object#hash were called out.
+ Partition boundaries can be set evenly or done pseudo-randomly, aka consistent hashing.
* Consistent hashing doesn’t work well for databases.
* While the hashing of keys buys you good distribution, you lose the ability to do range queries on known nodes, so now those range queries are run against all nodes.
* Some databases don’t even allow range queries on the primary keys, such as Riak, Couchbase, and Voldemort.
* Cassandra actually does a combination of keying strategies.
+ They use the first column of a compound key for hashing.
+ The other columns in the compound key are used for sorting the data.
- This means you can’t do a range query over the first portion of a key, but if you specify a fixed key for the first column you can do a range query over the other columns in the compound key.
- An example usage would be storing social media posts with the user id as the hashing column and the updated date as the additional column in the compound key. You can then quickly retrieve all posts by a user from a single partition.
* Hashing is used to help prevent hot-spots but there are situations where they can still occur.
+ A popular social media personality with millions of followers may cause unusual activity on a partition.
+ Most systems cannot automatically handle that type of skew.
+ In the case that something like this happens, it’s up to the application to try and “fix” the skew. One example provided in the book is appending a random two-digit number to the key, which spreads the writes for that key across 100 different keys that can land on different partitions.
+ Again, this is great for spreading out the writes, but now your reads will have to issue queries to 100 different partitions.
* A couple of examples:
+ Sensor data: as new readings come in, users can view real-time data and pull reports of historical data,
+ Multi-tenant / SaaS platforms,
+ Giant e-commerce product catalog,
+ Social media platform users, such as Twitter and Facebook.
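The hash partitioning ideas from this section can be sketched in Python. Treat it as a rough illustration under stated assumptions: `NUM_PARTITIONS`, the helper names, and the `_NN` suffix format are invented for the example. MD5 is used only because the notes mention it as a non-cryptographic-strength choice for distribution. Python's built-in `hash()` on strings is randomized per process by default, which is the same pitfall called out for Java and Ruby, so the sketch avoids it.

```python
import hashlib
import random

NUM_PARTITIONS = 8  # hypothetical cluster size


def stable_hash(key: str) -> int:
    """Process-independent 128-bit hash: the MD5 digest read as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def partition_for(key: str) -> int:
    """Each partition owns an equal, contiguous slice of the hash space."""
    return (stable_hash(key) * NUM_PARTITIONS) >> 128


def salted_key(hot_key: str) -> str:
    """Append a random two-digit number so writes spread over 100 keys."""
    return f"{hot_key}_{random.randrange(100):02d}"


def all_salted_keys(hot_key: str) -> list:
    """Every key a reader must fetch and merge for a salted hot key."""
    return [f"{hot_key}_{i:02d}" for i in range(100)]


if __name__ == "__main__":
    # The same key always maps to the same partition, in any process.
    print(partition_for("user:42"), partition_for("user:42"))

    # Writes for one hot key now fan out over many partitions...
    print(sorted({partition_for(salted_key("celebrity")) for _ in range(1000)}))

    # ...but a read has to visit all 100 salted keys.
    print(len(all_salted_keys("celebrity")))
```

Salting only makes sense for the handful of keys known to be hot. Applying it to every key would turn every read into 100 lookups for no benefit.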
The first Google computer at Stanford was housed in custom-made enclosures constructed from Mega Blocks. (Wikipedia)
Resources We Like
* Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (Amazon)
* History of Google (Wikipedia)
Tip of the Week
* VS Code lets you open the search results in an editor instead of the side bar, making it easier to share your results or further refine them with something like regular expressions.
* Apple Magic Keyboard (for iPad Pro 12.9-inch – 5th Generation) is on sale on Amazon. Normally $349, now $242.99 on Amazon, and Best Buy usually matches Amazon. (Amazon)
+ Compatible Devices:
- iPad Pro 12.9-inch (5th generation),
- iPad Pro 12.9-inch (4th generation),
- iPad Pro 12.9-inch (3rd generation)
* Room EQ Wizard is free software for room acoustic, loudspeaker, and audio device measurements. (RoomEQWizard.com)
The Mathemachicken strikes again for this year’s shopping spree, while Allen just realized he was under a rock, Joe engages “no take backs”, and Michael ups his decor game.
The full show notes for this episode are available at https://www.codingblocks.net/episode170.
Sponsors
* Datadog – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.
Survey Says
What's your favorite feature on the new MacBook Pro?
* The return of the function keys!!! Bye Touch Bar ...
* The MagSafe charger! I love proprietary cables!
* I need an SD card slot. Not a USB card reader. That requires a USB A to C dongle.
* That shape that hearkens back to those early 2000s MacBooks.
* Obviously, it's all about that M1 Max!!! 10 CPU cores, 32 GPU cores. THIS.
* That I don't need to buy or enable a TPM 2.0 module to run the latest OS.
* Wait, Apple had another announcement?
* The NOTCH!!!!
News
* Thank you to everyone that left a review!
+ iTunes: BoldAsLove88
+ Audible: Tammy
Joe’s List
| Price | Description |
| --- | --- |
| “Fun” Answers | |
| $3,499.95 | Jura x8 (Williams and Sonoma) |
| $3,499.00 | 2021 Macbook Pro – 16″ Screen, M1 Max, 32GB RAM, 1TB drive (Apple) |
| Robotics | |
| $359.99 | Lego Mindstorms (Amazon) |
| $149.99 | Sphero BOLT (Amazon) |
| Entertainment | |
| $499.99 | Xbox Series X (Microsoft) |
| $180 | Game Pass (Microsoft) |
| $179.00 | Play Date (Website) |
| Health | |
| $929.99 | Trek Dual Sport 3 (Trek) |
| $179.95 | Fitbit Charge 5 (Amazon) |
| $921.22 | Dorito Dust Supplies (Recipe) |
| Levelling Up | |
| $199 / year | Educative.io (Website) |
| $159 / year | LeetCode Subscription (Website) |
| $99 | ACM Subscription (Sign Up) |
Allen’s List
| Description | Price |
| --- | --- |
| Honorable mention: Steam Deck (Steam) | $399.00 |
| Honorable mention: Microsoft Surface Laptop Studio 14.4″ (Amazon) | $2,700.00 |
| LG 48″ C1 OLED TV (Amazon) | $1,297.00 |
| Honorable mention: Aorus 48″ OLED Gaming Monitor (Newegg) | $1,500.00 |
| HTC Vive Pro 2 (Amazon) | $799.00 |
| Valve Index Controllers (Steam/Valve) | $279.00 |
| Kinesis Advantage 2 (Amazon) | $339.00 |
| Corsair MP600 NVME PCIE x4 2TB (Amazon) | $240.00 |
| Arduino Ultimate Starter Kit (Amazon) | $63.00 |
Michael’s List
| Price | Description |
| --- | --- |
| My smart home can beat up your smart home | |
| $14.99 | Kasa Smart Light Switch HS200 (Amazon) |
| $16.99 | Kasa Smart Dimmer Switch HS220 (Amazon) |
| $26.99 | Kasa Smart Plug Mini 15A 4-Pack EP10P4 (Amazon) |
| $17.99 | Kasa Outdoor Smart Plug with 2 Sockets EP40 (Amazon) |
| For my health | |
| $529.00 | Apple Watch Series 7 GPS + Cellular (Amazon) |
| Need moar power! | |
| $34.00 | Apple MagSafe Charger (Amazon) |
| $12.99 | elago W6 Apple Watch Stand (Amazon) |
| $10.99 | Honorable mention: elago W3 Apple Watch Stand (Amazon) |
| $29.00 | Honorable mention: Apple Watch Magnetic Charging Cable (0.3m) (Amazon) |
| When I lose my stuff | |
| $98.99 | Apple AirTag 4 Pack (Amazon) |
| $10.99 | Protective Case for Airtags (Amazon) |
| $14.88 | Honorable mention: Air Tags Airtag Holder for Dogs/Cat Pet Collar (Amazon) |
| I need to get some work done | |
| $180.00 | Code V3 104-Key Illuminated Mechanical Keyboard (Amazon) |
| $169.00 | Honorable mention: Das Keyboard 4 Professional Wired Mechanical Keyboard (Amazon) |
| $280.00 | Honorable mention: Drop SHIFT Mechanical Keyboard (Amazon) |
| $240.00 | Honorable mention: Drop CTRL Mechanical Keyboard (Amazon) |
| If you insist on an ergo keyboard | |
| $199.00 | Honorable mention: KINESIS GAMING Freestyle Edge RGB Split Mechanical Keyboard (Amazon) |
| Turns out, keycaps matter | |
| $29.99 | Honorable mention: Razer Doubleshot PBT Keycap Upgrade Set (Amazon) |
| $24.99 | Honorable mention: HyperX Pudding Keycaps (Amazon) |
| Things I need to buy again | |
| $19.99 | HyperX Wrist Rest (Amazon) |
| $28.99 | Honorable mention: Glorious Gaming Wrist Pad/Rest (Amazon) |
| $34.99 | Honorable mention: Razer Ergonomic Wrist Rest Pro (Amazon) |
| When things go wrong | |
| $69.99 | iFixit Pro Tech Toolkit (Amazon) |
| $64.99 | Honorable mention: iFixit Manta Driver Kit (Amazon) |
| For all your calling needs | |
| $599.00 | Rode RODECaster Pro Podcast Production Studio (Amazon) |
| $549.99 | Honorable mention: Zoom PodTrak P8 Podcast Recorder (Amazon) |
| $12.95 | On-Stage DS7100B Desktop Microphone Stand (Amazon) |
| $199.99 | Elgato Ring Light (Amazon) |
| $159.99 | Elgato HD60 S+ Capture Card (Amazon) |
| Music to your ears | |
| $148.49 | Kali Audio LP-6 Studio Monitor (Amazon) |
| $189.00 | Honorable mention: KRK RP5 Rokit G4 Studio Monitor (Amazon) |
| $379.99 | Honorable mention: Yamaha HS7I Studio Monitor (Amazon) |
| $199.99 | Honorable mention: ADAM Audio T5V Two-Way Active Nearfield Monitor (Amazon) |
| $155.00 | Honorable mention: JBL Professional Studio Monitor (305PMKII) (Amazon) |
| $599.00 | Kali Audio WS-12 12 inch Powered Subwoofer (Sweetwater) |
| $65.00 | Palmer Audio Interface (PMONICON) (Amazon) |
| $169.99 | Honorable mention: Focusrite Scarlett 2i2 (3rd Gen) USB Audio Interface (Amazon) |
| For the decor | |
| $34.99 | Dumb and Dumber Canvas (Amazon) |
| $34.99 | Honorable mention: The Big Lebowski Canvas (Amazon) |
| $34.99 | Honorable mention: Pulp Fiction Canvas (Amazon) |
| $34.99 | Honorable mention: Friday Canvas (Amazon) |
| $34.99 | Honorable mention: Jurassic Park (Amazon) |
| $34.99 | Honorable mention: Bridesmaids Canvas (Amazon) |
| $34.99 | Honorable mention: There’s Something About Mary (Amazon) |
Resources We Like
* Security Now 834, Life: Hanging By A Pin (Twit.tv)
* Buyer Beware: Crucial Swaps P2 SSD’s TLC NAND for Slower Chips (ExtremeTech.com)
* Samsung Is the Latest SSD Manufacturer Caught Cheating Its Customers (ExtremeTech.com)
Tip of the Week
* VS Code … in the browser … just … there? Not all extensions work, but a lot do! (VSCode.dev)
* Skaffold is a tool you can use to build and maintain Kubernetes environments that we’ve mentioned on the show several times and guess what!? You can make your life even easier with Skaffold with environment variables. It’s another great way to maintain flexibility for your environments … both local and CI/CD. (Skaffold.dev)
* K9s is a Kubernetes terminal UI that makes it easy to quickly search, browse, filter, and edit your clusters and it also has skins! The Solarized Light theme is particularly awesome for customizing your experience, especially for presenting. (GitHub)