2.3 Summarize secure application development, deployment, and automation concepts
- Environment
- Development
- Test
- Staging
- Production
- Quality Assurance (QA)
- Provisioning and Deprovisioning
- Integrity Measurement
- Secure Coding Techniques
- Normalization
- Stored Procedure
- Obfuscation/Camouflage
- Code Reuse / Dead Code
- Server-Side vs Client-Side Execution and Validation
- Memory Management
- Use of Third-Party Libraries and Software Development Kits (SDKs)
- Data Exposure
- Open Web Application Security Project (OWASP)
- Software Diversity
- Compiler
- Binary
- Automation/Scripting
- Automated Courses of Action
- Continuous Monitoring
- Continuous Validation
- Continuous Integration
- Continuous Delivery
- Continuous Deployment
- Elasticity
- Scalability
- Version Control
Why is secure application development so important? It is so that bad people can’t sneak bad code into your software. When you release a new application, or an update to an existing application, you are expected to digitally sign it. A digital signature tells people that the software you are providing them is authentic, that nobody tampered with it.
But what if somebody writes malicious code and sneaks it into your application before you sign it? That is what happened to SolarWinds in 2020. Attackers inserted malicious code into a software update, SolarWinds signed the update, and about 18,000 customers installed it, compromising many computers. Those customers trusted the update because it was digitally signed by SolarWinds.
How do you keep this from happening to you?
How are applications developed? The Development, Test, Staging, Production hierarchy is common in software architecture. All hierarchies start at development and end at production but may include other stages.
- Development. The development environment is where the program is written. The development environment may be different from the target machine. For example, a developer may use a Windows machine to develop an application for a smartphone running an Android operating system.
The development environment may be limited to a single machine or could include many machines and servers. In software development, it is common for changes to be logged incrementally (so that a developer can revert to a previous revision). Each time a significant change occurs, the developer stops and records it. A git repository is well suited to recording changes to source code.
- Test. The test environment is where the program is tested. Ideally, the program should be tested on the same types of machines that it will be deployed on. This is not always possible (for example, when developing an application for a smartphone that has not yet been released). For a developer writing a program for a mass market, there may be millions of combinations of hardware and operating systems, and it is not possible to test on all of them.
No test is perfect. Testing will catch the most common errors, but it may be necessary to release the program to a limited audience and allow users to report their feedback. This is known as a beta test (an alpha test is a similar round of early testing performed internally, before outside users are involved).
The testing may be automated and/or may use human subjects. What are some items that should be tested?
- User interface (is the application user friendly?)
- Does the application perform as intended?
- Does the application consume excessive system resources?
- Does the application crash or halt unexpectedly?
- Will entering invalid data (for example, negative numbers for a measurement, text in a numeric field, etc.) damage the program?
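A simple automated check along these lines can be sketched in Python. The `parse_measurement` function below is a hypothetical example target, not part of any real application; the point is that invalid data (text in a numeric field, negative measurements) should be rejected cleanly rather than crashing the program.

```python
def parse_measurement(raw):
    """Parse a user-supplied measurement, rejecting invalid input
    instead of letting it propagate into the program."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError("measurement must be a number")
    if value < 0:
        raise ValueError("measurement cannot be negative")
    return value

def run_input_tests():
    """Feed the kinds of invalid data mentioned above and confirm the
    program rejects each one gracefully rather than crashing."""
    results = {}
    for bad in ["abc", "-5", None, ""]:
        try:
            parse_measurement(bad)
            results[repr(bad)] = "accepted"   # reaching here would be a test failure
        except ValueError:
            results[repr(bad)] = "rejected"
    return results
```

In a real project these checks would live in an automated test suite so they run on every build.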
- Staging. The staging environment is an environment that is exactly like the production environment. It is used to install, test, and configure the program. In some environments, the staging environment is where the program is set up. For example, a retail store may set up (stage) cash registers at a warehouse and then ship them to a store for final installation.
- Production. The production environment is where the program is installed and in use by end users. A live application is said to be “in production”.
- Quality Assurance. After we have deployed the program, we perform some checks to make sure that it is performing in accordance with our standards. This is known as quality assurance. It may take place on the production devices and/or on the development devices.
There are two main development life cycles.
In the Waterfall life cycle, development proceeds in one long sequential pass. The developer releases a software version, comes up with a list of new features, spends months or years updating the software, and then releases a new version. Each release contains many new features. This was a bad approach because it took a long time for user feedback to result in a change to the application. By the time a new version was released, users were frustrated and many of the new features were no longer relevant.
In the Agile life cycle, changes are implemented every two weeks. Each iteration works as follows
- At the beginning of the development iteration, the developer comes up with a list of new features that have been demanded by the users
- The new features are ranked by priority
- The development team attempts to complete as many of the new features as possible in the short timeframe (two weeks)
- The software update containing the new features is released
- The next iteration begins. Features that were not completed in the previous iteration return to the new list. Some are no longer relevant. New ideas are added to the list. The features are ranked, and the iteration begins again.
It is easier to implement new security features in an Agile lifecycle because changes can be made quickly. In the Waterfall lifecycle, adding security features requires the developer to return to the planning phase of the project.
Provisioning and Deprovisioning
Provisioning is the process of assigning a user permission to access an object. Deprovisioning is the process of removing that permission.
A process in a software application can be provisioned when it requires a higher level of permission (for example, when the application needs to access a system file). It should be quickly deprovisioned when it no longer requires that level of permission. This way, there is little time for the application to be compromised.
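This grant-then-quickly-revoke pattern can be sketched in Python with a context manager. Everything here is a simplified, hypothetical illustration: `GRANTED`, `grant`, and `revoke` stand in for whatever permission store or operating-system API a real application would call.

```python
from contextlib import contextmanager

# Hypothetical permission store; a real system would call an
# operating-system or identity-provider API instead.
GRANTED = set()

def grant(subject, permission):
    GRANTED.add((subject, permission))

def revoke(subject, permission):
    GRANTED.discard((subject, permission))

@contextmanager
def provisioned(subject, permission):
    """Provision an elevated permission only for the duration of the
    task, then deprovision it immediately -- even if the task fails."""
    grant(subject, permission)
    try:
        yield
    finally:
        revoke(subject, permission)

# The permission exists only inside the block.
with provisioned("updater", "write:system_file"):
    holds = ("updater", "write:system_file") in GRANTED      # True while provisioned
released = ("updater", "write:system_file") not in GRANTED   # True again afterwards
```

The `finally` clause is the key design choice: deprovisioning happens even when the elevated task raises an error, so the window of exposure stays as small as possible.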
Integrity Measurement & Management
Integrity Measurement guarantees that the software on a machine has not been tampered with. It is important to ensure that malware has not infiltrated a machine.
Integrity Management uses a TPM and hashes of the software or data to determine whether changes have taken place. In simple terms, we take a fingerprint of the software when we install it. The fingerprint is stored in a secure location. Each time we run the application, the operating system takes a new fingerprint and compares it to the original. If the fingerprints match, then we know that the software hasn’t changed. If the fingerprints do not match, then we know that the software was tampered with.
Each operating system type has its own form of Integrity Measurement.
For example, Linux contains Integrity Measurement Architecture (IMA), a system that supports additional integrity measurement. It is up to each software developer to write code that works with IMA.
IMA has two subsystems: measure and appraise. The measure subsystem records hashes of files as they are accessed, while the appraise subsystem compares each file’s hash against a known-good value and can block files that fail the check.
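The fingerprint-and-compare idea behind integrity measurement can be sketched in a few lines of Python using a SHA-256 hash. This is a simplified illustration of the concept, not how IMA or a TPM is actually driven; in a real system the stored fingerprint would be anchored in tamper-resistant hardware.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Take a 'fingerprint' (SHA-256 hash) of a piece of software."""
    return hashlib.sha256(data).hexdigest()

# At install time we record the fingerprint in a secure location.
original = b"binary contents of the application"
stored_fingerprint = fingerprint(original)

def verify(current: bytes) -> bool:
    """At each run, re-measure and compare against the stored value.
    A mismatch means the software changed after installation."""
    return fingerprint(current) == stored_fingerprint

untampered = verify(b"binary contents of the application")   # matches: unchanged
tampered = verify(b"binary contents + malicious patch")      # mismatch: tampered
```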
Secure Coding Techniques
The Software Development Lifecycle Methodology provides many best practices for secure coding. At the time of writing, the top 25 coding errors are cataloged in the CWE Top 25 Most Dangerous Software Weaknesses list.
Many of these errors are easily protected against, and I discussed them in the first chapter.
There are two types of programming languages: compiled languages and runtime (interpreted) languages.
Compiled code is code that is written in a language and converted to machine code. This code is difficult to reverse engineer. Compiled languages include C++, C, and Visual Basic. Once compiled, the machine code runs on the target machine regardless of the language it was originally written in.
Runtime code is code that stays in a code file and is executed by an interpreter (or emulator) at runtime. Runtime code is not secured: the end user could potentially modify the code and run it again, if it is not server-side. Runtime code can only be executed by a machine or device that has the interpreter installed on it. As we will see, there are ways to potentially hide the source code.
Runtime code includes HTML, Java, and JavaScript (run on the client side), and ASP and PHP (run on the server side).
Good coding practices include
- The developer must predict all possible errors, including errors caused by user-generated content, and have a method of handling each one. When an application is unable to handle an error, it could crash and may leak sensitive data. The error should be recorded in a log file. The user should be provided with general information about the error so that he can correct his input, but critical details such as SQL queries, filenames, variable names, etc., should never be displayed.
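This error-handling practice can be sketched in Python: full details go to the log for the developer, while the user sees only a generic message. The `divide` function is a hypothetical example; a real application would route `logging` output to a protected log file.

```python
import logging

# In production, add filename="app.log" so details land in a log file.
logging.basicConfig(level=logging.ERROR)

def divide(a, b):
    """Handle a predictable error: log full details for the developer,
    but return only a generic message to the user -- never internal
    details such as queries, filenames, or variable names."""
    try:
        return {"ok": True, "result": a / b}
    except ZeroDivisionError:
        logging.exception("division failed: a=%r b=%r", a, b)  # details go to the log only
        return {"ok": False, "message": "Invalid input. Please check your values."}

good = divide(10, 2)   # normal result
bad = divide(10, 0)    # user sees a generic message; specifics stay in the log
```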
- The developer must sanitize all inputs. For example, a number field should contain only a number and a text field only text. The system should sanitize the input on the client side and again on the server side, because it is always possible for a malicious end user to bypass the sanitization on the client side.
- Normalization is the process of converting an input into a standard format so that it can be compared and processed. Different users and systems use different text encodings, such as UTF-8 or UTF-16. An encoding is a set of characters available for input. For example, a Chinese keyboard will have a different encoding than an English keyboard. A developer must choose an encoding format that is suitable for his program and then convert all inputs into that format.
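Python's standard library shows why normalization matters: the same visible character can arrive encoded two different ways, and the strings only compare as equal after converting both to one standard form (NFC here).

```python
import unicodedata

def normalize_input(text: str) -> str:
    """Convert input to one standard Unicode form (NFC) so that
    visually identical strings compare as equal."""
    return unicodedata.normalize("NFC", text)

# "é" can arrive as one code point or as "e" plus a combining accent.
composed = "caf\u00e9"       # é as a single code point
decomposed = "cafe\u0301"    # e followed by a combining acute accent

raw_equal = composed == decomposed                                           # different byte sequences
normalized_equal = normalize_input(composed) == normalize_input(decomposed)  # equal after normalization
```

Without this step, a security check such as a username comparison could be bypassed by submitting the decomposed form of a name.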
- Each common procedure should be stored as a script that can be called when required. In SQL, stored procedures serve a purpose similar to prepared statements. The prepared statement prevents a user from injecting SQL code into an input that could harm the underlying database. Essentially, the stored procedure is a command, and the program fills in the blanks with the correct variables. Because the procedure escapes special characters, malicious users can’t put harmful code inside it (the program understands that the blanks never contain code).
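The fill-in-the-blanks idea can be demonstrated with Python's built-in `sqlite3` module, which supports parameterized (prepared) statements. The table and data here are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user(name):
    """The '?' placeholder is filled in by the database driver, which
    escapes special characters -- the input can never become SQL code."""
    cur = conn.execute("SELECT name, role FROM users WHERE name = ?", (name,))
    return cur.fetchall()

safe = find_user("alice")               # matches the row normally
# A classic injection string is treated as a literal name, not as SQL:
attack = find_user("alice' OR '1'='1")  # finds nothing, instead of dumping the table
```

Had the query been built by string concatenation, the second call would have returned every row in the table.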
- The developer should digitally sign the application with a certificate. By default, Windows warns users before running an unsigned application. Signed code ensures that the application has not been modified after it was released by the developer.
- There are two types of code – code that is compiled, and code that is interpreted. Code that is compiled is converted to machine code. It includes languages like C++ and C#. Once compiled, it is generally not possible to recover the original source code from the machine code.
Code that is interpreted runs inside another program (an interpreter). It includes many web languages like HTML, PHP, Java, and JavaScript. The original source code remains, and users who have possession of the application can also read it.
Code obfuscation is a method of making the code difficult for humans to read. It will be difficult, but not impossible, for somebody clever to make sense of it after a while. Some obfuscation tools also minify the code, making it smaller.
We can obfuscate the code manually, or we can use a software application to do it. The software application is better because we can always go back to the original code when we want to make changes.
- The developer should be careful not to reuse code from unverified sources. We must carefully examine and test any external code that we use in our application. We must also verify that we have the legal right to use the code.
The developer should delete code that is not in use (dead code). This code may be executed by an end user inadvertently, introduce vulnerabilities into the application, or behave unpredictably. It also increases the size of the application. A compiler may automatically remove dead code.
- Validating inputs on the client side reduces load on the server and improves the user experience. We should always validate inputs on the client side first, and then validate them again at the server. The reason we must validate at the server is that a malicious user can bypass client-side validation and send invalid data (after all, the software is running on the user’s computer).
Code execution can take place on the client or on the server, depending on the type of code used. Code such as JavaScript typically runs on the client side, while code such as PHP runs on the server side. The developer should be careful to ensure that critical code runs only on the server. When code runs on the client, we must have a way of validating the output and ensuring that the software wasn’t tampered with.
- It is important to minimize the amount of memory used by a program. As mentioned previously, an application will store variables in memory and then call them by their addresses. When the program stops using a variable, it should release it from memory. When a program closes, it should release all the memory locations that it occupied. If not, eventually the memory will be full of space occupied by applications that aren’t running.
Some programming languages have an automatic garbage collection feature to release memory locations that are no longer in use, but some languages (such as C) do not. When we write a program, we should make sure that we account for each memory location.
- Third-party libraries are convenient. A library is like a giant code toolbox. Each library has a different set of functions, which we can reference from our application. Libraries allow a developer to introduce new functions without having to write additional code.
The developer must take care to ensure that any library referenced is from a trusted source. That means not just that we trust the developer who wrote the library, but also that the exact library file we are using is authentic and not one that has been compromised. It would not be practical (or even possible) to thoroughly test an entire library and ensure it is secure.
- We must protect all data that enters and resides inside the application. We must also protect the data when it is stored.
Open Web Application Security Project (OWASP)
The Open Web Application Security Project is an online community that shares best practices for creating secure web applications.
The OWASP Security Knowledge Framework is an open-source application that allows developers to understand common security principles. It is a framework for integrating security into a web software application. It includes
- Example projects
- Code examples
- Security checklists
- User management
- A knowledge base
- Interactive labs
Software Diversity
Looking back, even if we have written a lot of great code and followed all the best practices, there is still a risk that a vulnerability can be present.
A computer only understands one language: machine code. Remember that we write our code in a language such as C or C++ or some other language, and then we compile it into a binary. The compiler converts the language into machine code, which is what the users run on their computers.
Hackers will look for vulnerabilities in our binary. Once they find one, they can exploit every computer running that binary.
Compiler diversity is an idea where we use the compiler to make slight changes to each binary. The source code stays the same, and all the functions of each binary stay the same, but each binary is slightly different. So, your copy of Microsoft Word looks, feels, and works the same as your neighbor’s, but inside, a few variables might have different names or memory layouts. The point is that a vulnerability in one binary can’t be exploited in another, slightly different binary. A hacker trying to exploit many computers would need to discover a separate vulnerability in the software on each of them. It wouldn’t be worth his time.
Software diversity is a concept that is not well implemented yet because it is time consuming. A developer must manually compile the program for each user, and then digitally sign it.
Code Quality and Testing
Code Testing can use the following techniques
- Static Code Analysis. Static analysis reviews the actual source code without running the program.
- Code can be reviewed by a human or through an automated tool such as a static code analyzer
- Tools can look for errors in syntax, logic, use of unapproved libraries, etc.
- Dynamic Analysis. Dynamic analysis reviews the program as it is running.
- The program is fed a set of pre-defined test inputs and the outputs (or resulting errors) are verified
- Thousands or hundreds of thousands of inputs can be entered using automated tools. This is known as fuzzing. Everything can be fuzzed including network protocols, files, and web protocols. Fuzzing allows for a wide range of errors to be detected.
- Stress Testing. A stress test is where the system is overloaded to examine its response under pressure. A developer will want to examine how the program responds when it is overloaded, whether issues such as buffer overflows occur, and how much the program slows down.
- Sandboxing. A sandbox is an environment that isolates untrusted code from an external environment. When testing a new application, a developer must be careful not to introduce it into an environment that is in use. The application could open security vulnerabilities or damage existing systems.
- Model Verification. The ‘model’ is a concept that the software is supposed to look and act like. Model verification is a method for verifying that the software we wrote looks like the model.
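The fuzzing technique mentioned under Dynamic Analysis can be sketched in Python. `parse_age` is a hypothetical target function invented for this example; the fuzzer throws hundreds of random strings at it and records any exception type the developer did not plan for.

```python
import random
import string

def parse_age(raw: str) -> int:
    """Target under test: should reject bad input with ValueError,
    never crash with an unexpected exception."""
    value = int(raw)          # raises ValueError on non-numeric input
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value

def fuzz(target, rounds=1000, seed=42):
    """Feed the target many random inputs; expected rejections pass,
    but any other exception type is recorded as a potential bug."""
    rng = random.Random(seed)
    unexpected = []
    for _ in range(rounds):
        raw = "".join(rng.choices(string.printable, k=rng.randint(0, 12)))
        try:
            target(raw)
        except ValueError:
            pass                      # expected rejection of bad input
        except Exception as exc:      # anything else is a surprise worth reporting
            unexpected.append((raw, type(exc).__name__))
    return unexpected

bugs = fuzz(parse_age)   # an empty list means no surprise crashes were found
```

Real fuzzers such as those used against network protocols and file parsers are far more sophisticated, but the principle is the same: volume and randomness expose error paths that hand-written tests miss.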
Secure Baseline
The Secure Baseline defines how a software application will operate, including
- Firewall rules/ports that must allow application traffic
- Patches for the operating system and other applications
- Operating system settings
The Secure Baseline can be created through an analysis tool such as the Microsoft Baseline Security Analyzer. The baseline can be adjusted from time to time (as new security vulnerabilities are discovered, or adjustments are required).
Master Image
The master image is a premade image of an operating system and/or software configuration. It is typically a full disk image of a standard issue computer, server, or network device used by the organization. Each time an organization deploys a computer, it loads a copy of the master image onto the computer.
The master image can be updated from time to time. There can be multiple master images (for different types of devices and scenarios). The master image must be carefully protected. The master image can be deployed over a network boot protocol (PXE) or via a bootable USB drive.
The master image saves an administrator from having to manually install software applications and configurations on each computer that he deploys. It also ensures that critical security protocols are always followed, since they do not have to be implemented manually.
In the event of a software corruption or viral infection, the master image can be used to reload the operating system and base software on a computer.
Automation/Scripting
Automation can reduce risks due to human error. We can create a custom, pre-defined response for each common scenario. Some benefits of automation
- Saves the administrator time in not having to manually act on each situation
- Reduces the risk of typos in common configurations, which are now implemented through automated scripts
- Each time a script is run, its use can be logged and audited. A query can determine how many systems were affected by a script.
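The benefits above can be sketched in a short Python script. Everything here is hypothetical for illustration (`apply_firewall_rule` stands in for a real device-management API call); the point is that one pre-defined action is applied identically to every host, with an audit entry logged for each.

```python
import datetime

AUDIT_LOG = []

def apply_firewall_rule(host, rule):
    """Hypothetical configuration step; a real script would call the
    device's management API here."""
    return f"{host}: applied {rule}"

def automate(hosts, rule):
    """Apply the same pre-defined configuration to every host and
    record an audit entry for each action: no typos, and a query can
    later show exactly which systems were affected."""
    for host in hosts:
        result = apply_firewall_rule(host, rule)
        AUDIT_LOG.append({
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "host": host,
            "action": rule,
            "result": result,
        })
    return len(AUDIT_LOG)

affected = automate(["web01", "web02", "db01"], "allow tcp/443")
```

A query over `AUDIT_LOG` (or its real-world equivalent, a logging database) answers the question "how many systems did this script touch?" immediately.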
Some Automation Frameworks
- NIST Security Content Automation Protocol (SCAP)
- Common Vulnerabilities and Exposures
The organization implements the following:
- Automated Courses of Action. NIST Special Publication 800-53 provides a set of standards for automating courses of action, suitable for US Federal Government computer systems.
Examples of things that we can automate
- Installation of a software application
- Configuration of a network device
- Creation of a new user account
- Continuous Monitoring. A system must be continually monitored in real time, so that the automated response can react immediately to any new scenario.
Without automation, monitoring either happens by a human at a control panel, or doesn’t happen at all. The administrator might only check on the status of the system once something fails.
Status data from each device must be aggregated into a database, and unusual or out-of-bound scenarios can be highlighted so that an administrator does not have to sort through thousands or hundreds of thousands of entries.
- Configuration Validation. Configuration Validation is a method for determining whether a system is operating exactly the way that it is supposed to. A system must do exactly what it was designed to do and must not do anything that it was not designed to do. Systems change, software is updated automatically, applications crash, and users make modifications. After each change, the administrator must again verify that the system continues to operate exactly as it is supposed to.
A monitoring system can continually check each device to ensure that it continues to maintain the correct configuration and automatically make corrections as required.
- Continuous Integration. Continuous Integration is the process of combining code from multiple sources into one software application several times per day. Why do we do that? When multiple programmers are working on changes to the same source code, each one might take it in a different direction. If they wait a long time to merge their changes, their versions might not fit together well. But if they automatically merge their changes several times per day, each developer can see what the other developers are doing and ensure that they continue to write compatible code.
- Continuous Delivery. We take Continuous Integration one step further: every change is kept in a releasable state, and the release to users is triggered manually but frequently, perhaps even once per day. The benefit of Continuous Delivery is that features requested by users can be quickly inserted into the software, and that user feedback can be obtained quickly and used to further refine the software.
- Continuous Deployment. Continuous Deployment is like Continuous Delivery except that software updates happen automatically. That means that as developers write code, the system automatically pushes it out to end users.
When implementing continuous deployment, we must be careful to ensure that the software we release is secure.
Elasticity & Scalability
Scalability is the ability of a system to increase its resources to accommodate greater demand. Elasticity is the ability of the system to adjust the quantity of its resources, up or down, to match the current demand. Elasticity is more flexible because it allows us to scale down when fewer resources are required.
Elasticity can be implemented in most cloud computing services. We can write a script that deploys additional servers when the load on our existing servers is too high and shuts down servers when the demand is low. If we use serverless architecture, then elasticity can happen automatically.
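Such a scaling script can be sketched in Python. The server names, load numbers, and thresholds below are all invented for illustration; a real script would query a cloud provider's monitoring API and call its deployment API instead of manipulating a list.

```python
def scale(servers, load_per_server, target=0.6, low=0.3):
    """Deploy another server when average load is too high, and shut
    one down (but never the last one) when demand drops."""
    avg = sum(load_per_server) / len(load_per_server)
    if avg > target:
        # Scale up: average load exceeds the target threshold.
        servers = servers + [f"app-{len(servers) + 1}"]
    elif avg < low and len(servers) > 1:
        # Scale down: demand is low, so release one server to save cost.
        servers = servers[:-1]
    return servers

scaled_up = scale(["app-1", "app-2"], [0.9, 0.8])    # high load adds a server
scaled_down = scale(["app-1", "app-2"], [0.1, 0.2])  # low load removes one
steady = scale(["app-1"], [0.5])                      # moderate load: no change
```

Running a check like this on a schedule gives the scale-down half of elasticity that a purely manual scaling process usually neglects.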
Version Control
Each time a new version of an application is released, it should contain a version number. The current stable version is the latest version of the program that can be used in a production environment. An administrator should ensure that all users are updated to the latest stable version, especially when the latest stable version corrects security risks.
There may be additional, later versions that are still in development or testing. These are known as beta versions. They are versions that may be functional but could have bugs, including bugs that could be harmful or introduce security risks.
Each new version should document the changes that have been made. If that version is installed and causes problems, then it can be rolled back to the previous version. When we document changes well and use a continuous delivery model, then each new change is small. That means that switching back to a previous version won’t deprive the users of many features.
A git repository is a tool that can store multiple versions of a software application and allow us to compare changes. GitHub is a popular hosting service for git repositories.