Data Types
Lesson 1

Storing numbers and text

In this module we will discuss how a computer stores the numbers and text that your programs work with.
After completing the module you will have the skills and knowledge necessary to:
  1. Explain how numbers are stored in binary and hexadecimal form
  2. Convert numbers between binary and decimal form
  3. Convert numbers between binary and hexadecimal form
  4. Explain how signed integers are stored in two's complement form
  5. Convert signed integers between decimal and two's complement form
  6. Explain how real numbers are stored in floating-point form
  7. Explain how characters are stored using ASCII and Unicode
When do we use bits, bytes, and words in computer science?
In computer science, bits, bytes, and words are commonly used units of information, each serving different purposes based on the context:
  1. Bit:
    • A bit is the smallest unit of data in computing and digital communications, representing a binary digit, either 0 or 1. Bits are fundamental in computer processing, as they are the basis of all computer data.
    • Bits are used extensively in areas such as digital signal processing, data transmission, and storage where data needs to be represented efficiently at the most granular level. Encryption and error detection algorithms also often work directly with bits.
  2. Byte:
    • A byte is a unit of digital information that commonly consists of eight bits. A byte can represent a single character, such as a letter, digit, or punctuation mark in text data.
    • Bytes are the standard unit for measuring file sizes and data storage in computers. Operating systems and applications typically refer to file sizes and space in bytes, with larger units like kilobytes (KB), megabytes (MB), gigabytes (GB), and terabytes (TB) being multiples of bytes.
  3. Word:
    • In computer architecture, a word is a group of bytes that a processor is designed to handle as a unit. The size of a word typically depends on the architecture, commonly being 16, 32, or 64 bits.
    • Words are used in the context of processor operations and memory management. Instructions, addressing, and the organization of memory are often word-oriented. This means that a processor retrieves and processes data in word-sized chunks, which impacts performance and efficiency.

Each of these units plays a crucial role in defining how data is processed, stored, and transmitted in computer systems.
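The relationship between the three units can be sketched in a few lines of Python. This is only an illustration: the helper names `bit_string` and `word_size_in_bytes` are made up for this example, and the word sizes are the common 16, 32, and 64 bits mentioned above rather than those of any particular processor.

```python
import struct

def bit_string(value, width=8):
    """Return the bits of a non-negative integer as a zero-padded string."""
    return format(value, "0{}b".format(width))

def word_size_in_bytes(bits):
    """Return how many 8-bit bytes a word of 16, 32, or 64 bits occupies."""
    formats = {16: "<H", 32: "<I", 64: "<Q"}  # standard-size unsigned integers
    return struct.calcsize(formats[bits])

letter = ord("A")                 # the character 'A' is stored as the number 65
print(bit_string(letter))         # 01000001 (eight bits)
print(len(bytes([letter])))       # 1 (those eight bits fit in one byte)

for bits in (16, 32, 64):
    print(bits, "bit word =", word_size_in_bytes(bits), "bytes")
```

The same eight bits can be read as the number 65 or as the letter 'A'; which one they mean depends on how the program chooses to interpret them.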

How does a Computer store Numbers and Text?

An early convention for representing text was ASCII, the American Standard Code for Information Interchange, which assigned each character on a standard typewriter keyboard a number that fits in 7 bits (that is, a value from 0 to 127). Capital 'A' is 65, capital 'B' is 66, lowercase 'a' is 97, lowercase 'b' is 98, and so on. This system works well for storing English text, but it does not include the accented characters needed in other European languages, and it certainly does not include the thousands of characters and symbols found in Chinese, Korean, and many other languages. To cover those, Unicode was created in the late 1980s as a much larger character set. Unicode still includes the original 128 ASCII codes as a very small subset, so a capital 'A' is still represented as the number 65, just as it has been for more than 50 years.

In a more technical sense, computers do not really store numbers either. A computer is a collection of billions of transistors, each at either a high or a low voltage. We group these into units of 8, 16, 32, 64, or more bits and treat each group as a binary number with that many bits. A single group has a fixed range, but by chaining as many groups together as memory allows, a program can represent values and sequences of practically any size. That is why computers can store text of any length, sequences of color values (pictures and videos), sequences of audio samples (music), and so on.
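The character codes quoted above can be checked directly in Python, where `ord` returns the Unicode code point of a character. The use of UTF-8 below is an assumption made for illustration; it is one of several ways to encode Unicode text as bytes and is not discussed in the lesson itself.

```python
# Code points of the ASCII letters mentioned in the text.
for ch in ("A", "B", "a", "b"):
    print(ch, ord(ch))            # A 65, B 66, a 97, b 98

# ASCII characters keep the same number in Unicode and fit in one byte.
print("A".encode("ascii"))        # b'A'
print(len("A".encode("utf-8")))   # 1

# Characters outside ASCII have larger code points and, in UTF-8,
# take more than one byte each.
print(ord("é"), len("é".encode("utf-8")))    # 233 2
print(ord("中"), len("中".encode("utf-8")))  # 20013 3

# chr goes the other way: from a number back to a character.
print(chr(65))                    # A
```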

Data Storage in Computers: Numbers and Text in the Context of the PDP-11 Mini-computer

Computers fundamentally operate on binary data, using electrical switches that are either on (1) or off (0). The PDP-11 minicomputer follows the same principle. When programming the PDP-11 in assembly language, it helps to understand how the machine stores numbers and text.
  1. Binary Representation: At the heart of any computer, including the PDP-11, lies the binary system. All data, whether numbers, text, or anything else, is represented in binary form using bits (binary digits). A bit can hold one of two values: 0 or 1.
  2. Data Word and Byte:
    1. Word: The PDP-11 primarily operates on 16-bit words. That means each word consists of 16 individual bits.
    2. Byte: A byte consists of 8 bits. Given the PDP-11's 16-bit word design, each word is made up of two adjacent bytes: the low-order byte at an even address and the high-order byte at the following odd address.
  3. Storing Numbers:
    1. Integers: On the PDP-11, integers are stored using a binary representation. A 16-bit word can represent integers ranging from 0 to 65,535 in unsigned form. For signed integers, the PDP-11 uses Two's Complement notation, allowing representation of numbers from -32,768 to 32,767.
    2. Floating-Point Numbers: The PDP-11 can also work with floating-point values, but support varies by model. Some configurations provide floating-point hardware with dedicated instructions, while others rely on software routines.
  4. Storing Text:
    1. ASCII Encoding: Text is typically stored using the ASCII (American Standard Code for Information Interchange) encoding on the PDP-11. Each character is represented by a unique 7-bit binary number. Because a byte on the PDP-11 holds 8 bits, each character occupies one byte, with the 8th (high-order) bit often set to 0.
    2. Strings of text are generally sequences of these ASCII-encoded characters stored in contiguous memory locations.
  5. Addressing Modes:
    1. The PDP-11 boasts a rich set of addressing modes, enabling flexible ways to reference memory locations and registers. When your assembly program operates on numbers or text, understanding these modes is crucial as they determine how the operand's address is computed.
    2. For instance, the "Register" mode uses the contents of a register as the operand, while the "Autoincrement" mode uses the contents of a register as a pointer, fetches the operand from that memory address, and then increments the register (normally by 1 for byte instructions and by 2 for word instructions).
  6. Memory and Registers:
    1. Main Memory: The PDP-11's main memory is where your program's data resides during execution. Depending on the specific PDP-11 model and configuration, the amount of available memory can vary.
    2. Registers: The PDP-11 features eight general-purpose registers (R0 to R7). R0 to R5 are free to hold data or addresses, while R6 serves as the stack pointer (SP) and R7 as the program counter (PC). Registers play a pivotal role in the execution of assembly programs because they offer much faster access than main memory.
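The 16-bit integer ranges described in the list above can be sketched in Python. This is a minimal model, not PDP-11 code: the function names are hypothetical, and the byte order shown (low-order byte first) is stated here as an assumption about how the PDP-11 lays out a word in memory. Values are printed in octal because that is the customary notation in PDP-11 documentation.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1      # 0xFFFF, the largest unsigned word (65,535)

def to_word(value):
    """Encode a signed integer as a 16-bit two's complement bit pattern."""
    if not -(1 << 15) <= value < (1 << 15):
        raise ValueError("value does not fit in a signed 16-bit word")
    return value & WORD_MASK

def from_word(word):
    """Interpret a 16-bit pattern as a signed two's complement integer."""
    return word - (1 << WORD_BITS) if word & 0x8000 else word

def split_word(word):
    """Return (low_byte, high_byte) of a 16-bit word."""
    return word & 0xFF, (word >> 8) & 0xFF

print(WORD_MASK)                          # 65535
print(format(to_word(-1), "06o"))         # 177777
print(format(to_word(-32768), "06o"))     # 100000
print(from_word(0o177777))                # -1
print(from_word(0o077777))                # 32767
print(split_word(0x4241))                 # (65, 66)
```

Note that the same pattern, octal 177777, reads as 65,535 when treated as unsigned and as -1 when treated as signed; the hardware stores only the bits.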
In essence, the PDP-11, like all computers, uses a binary system to represent and store data. Its specific architecture, with 16-bit words, versatile addressing modes, and a combination of memory and registers, determines the details of how data is stored and manipulated when programming in PDP-11 assembly language. A solid grasp of these fundamentals is the basis for programming this minicomputer effectively.
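To tie text storage and addressing together, here is a toy Python model of a string held in consecutive bytes of memory and read back through a pointer that advances after each fetch, in the spirit of autoincrement addressing. Everything in it is an assumption made for illustration: the 64-byte memory, the function names, the variable `r0` standing in for a register, and the zero byte used to mark the end of the string.

```python
MEMORY_SIZE = 64                       # hypothetical size, chosen for the example
memory = bytearray(MEMORY_SIZE)        # toy byte-addressable memory, all zeros

def store_string(memory, address, text):
    """Store text as ASCII bytes followed by a zero byte; return the next free address."""
    data = text.encode("ascii") + b"\x00"
    if address + len(data) > len(memory):
        raise ValueError("string does not fit in memory")
    memory[address:address + len(data)] = data
    return address + len(data)

def read_string(memory, address):
    """Read bytes through a pointer until a zero byte is found."""
    r0 = address                       # plays the role of a register holding an address
    chars = []
    while True:
        byte = memory[r0]              # fetch the operand the pointer refers to
        r0 += 1                        # then advance the pointer by one byte
        if byte == 0:
            break
        chars.append(chr(byte))
    return "".join(chars)

next_free = store_string(memory, 8, "PDP-11")
print(list(memory[8:next_free]))       # [80, 68, 80, 45, 49, 49, 0]
print(read_string(memory, 8))          # PDP-11
```

Each character occupies one byte, and the string is simply the sequence of those bytes at increasing addresses.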
In the next lesson we will examine how a computer stores non-negative integers such as 0, 23, and 318.
