Word Bullets: Autonagar Surya

Samantha, to Naga Chaitanya:
In our families, love itself is an obscenity.  In the name of family honour, caste, and custom, torturing ourselves is what passes for culture and tradition!


Mangala Harathi Song: Come to Serve Sri Satyanarayana

Come to the service of Sri Satyanarayana; worship the Lord with all your heart and offer harathi || 2 ||

To those who kept the vow, the boon of their vow; to those who beheld him, the fruit of the sight || Sri ||

The hands that worship the Lord are the true hands; the eyes that behold that form are the true eyes || 2 ||

Whoever listens to his story, their birth is redeemed, they say || Sri ||

At any hour, for any auspicious occasion, the god we worship is this god, that … || Sri ||

The god who manifested at Annavaram is the god of every home….. || Sri ||

Shall we perform archana, shall we offer our hearts; shall we build a temple for the Lord within our minds,

and ask him to grant turmeric and kumkuma for ten long ages || Sri ||

Say mangalam, say jaya mangalam; with palms joined and hearts blossoming in joy,

say vandanam to the auspicious, beautiful Lord || Sri ||

Word Bullets

Perhaps it is because I am fond of expressing feelings; that liking grew into a love (or madness) for language. That madness is what set me loose upon words and the media of words.

This collection of word bullets is made of the words that confronted me, my feelings, and the environment I grew up in – questioned them and scrubbed them clean.

 

Mangala Harathi Song: To the Handsome Sambasiva

Pallavi: Jaya mangalam to the handsome Sambasiva, shubha mangalam to our father Someswara,
nitya mangalam to the beautiful Parvatamma ||

Charanam: As Parameswara takes up and wears the slithering serpents,
they hiss, raise their hoods, and gleam as a garland around his neck ||

Charanam: To the Lord who smears himself all over with sacred ash, who neatly drapes himself in tiger skin,
and who lovingly wears fine rudraksha garlands ||

Charanam: The three-eyed great god, Murahara, Parameswara;
to Shiva who has no beginning and no end, to the dear father of Achyuta ||

Charanam: To the one who joyfully hid the beloved Ganga within his matted locks,
to the one who rides the beloved bull, to the Lord who rules the three worlds ||

Charanam: To Shiva who has no house to live in, no cooking pot, no clothes to wear,
who begs for alms with a skull-bowl, to the Lord who rules the three worlds ||

Bhajan Song: Sri Parvati Devi (Kalahasti Mahatyam)

Pallavi: Sri Parvati Devi, accept, O daughter of the mountain || 2 ||
our worship, O Mother, Gauri Shankari, Gauri Shankari || 2 ||

Charanam: You alone are our refuge, remover of sins, O lotus-petal-eyed one || 2 ||
Come and protect us, Katyayani || 2 ||

Charanam: I have placed my trust in you, Mother, Annapurna Devi || 2 ||
Come and watch over us, Parameswari || 2 ||

Mangala Harathi Song: O One with Gem-Studded Hands

Pallavi: O one with gem-studded hands, O one wearing necklaces of gems, did you leave the golden hill and come away,

O dweller of Vaikuntha, are you hiding in this hill of rocks? ||

Charanam: Crossing the Krishna, crossing the Penna, crossing so many hills and hillocks, have you climbed the seven hills,
O Venkataramana, did you hear a word and vanish? ||

Charanam: Is Mother cross with you, or are you cross with Mother; is that why you will not look our way,
O god of gods, you alone are the refuge, protector of the meek ||

Charanam: Closing our eyes we think of you, raising our hands we bow to you; will you kick us aside and knock us down,
O Venkataramana, what is to become of all your children? ||

Charanam: Leaving rank and honour, leaving your city, you settled upon the hill; this is the Venkateswarla bhajan samajam,
O Venkataramana, will you not draw us close and protect us? ||

Distributed File System – Part 1 (Retake)

This blog post series tries to take the reader from a traditional one-node file system to a distributed file system. This is the first post of the series.

File System (of the scale-up world, or on a single node):

The basic responsibilities of a file system are:

  1. Support user objects and operations (outwardly)
    1. Directories – Create, Delete, Rename
    2. Files – Create, Rename, Read, Write, Expand, Shrink, Truncate, Delete
  2. Manage Disk Space (inwardly)
    1. Structure and organization of data on disk
    2. Allocation and de-allocation of disk space

* There are many more responsibilities that file systems own, such as file permissions, symbolic links, etc.  They are all deliberately excluded to keep things short (need I say sweet? :-)).
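
To make the ‘outward’ operations listed above concrete, here is a minimal sketch using the POSIX calls that map to them; error handling is mostly trimmed and the path names are made up purely for illustration:

#include <fcntl.h>     /* open, O_CREAT, O_WRONLY */
#include <stdio.h>     /* rename, perror */
#include <sys/stat.h>  /* mkdir */
#include <unistd.h>    /* write, ftruncate, close, unlink, rmdir */

int main(void)
{
    /* Directory operations: create, rename, delete */
    mkdir("demo_dir", 0755);
    rename("demo_dir", "demo_dir_renamed");
    rmdir("demo_dir_renamed");

    /* File operations: create, write (expand), truncate (shrink), delete */
    int fd = open("demo.txt", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "hello", 5);   /* file grows to 5 bytes */
    ftruncate(fd, 2);        /* file shrinks to 2 bytes */
    close(fd);
    unlink("demo.txt");      /* delete */
    return 0;
}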

File System metadata typically contains:

  • Which data blocks constitute a file and their order
  • Which directories contain which files, and the directory hierarchy

Let us take an example for clarity.  In the case of Unix-like OS file systems:

  1. The very first block on the disk is usually the Master Boot Record (MBR)
    1. LILO and GRUB are popular boot loaders
  2. The MBR contains the details of the disk partitions (namely, drives)
  3. Disk Layout: MBR | Partition 1 | Partition 2 | …

Figure 1: Disk Layout

At the disk level, the VFS (virtual file system) layer comes into play.  It mounts the drives (based on the file system each drive is set up with).

The file system divides the whole drive space into blocks. The block size is usually configurable and is part of the OS installation configuration. The typical block size has been 4 KB for a long time; recently it has moved up to 8 KB. Of these blocks, some are used for the file system’s own metadata and the rest are left for use by applications (that is, for data).
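
To see the block notion from user space, here is a small sketch using the POSIX stat() call; st_blksize reports the preferred I/O block size and st_blocks counts allocated 512-byte units.  The file name is hypothetical – any existing file will do:

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    if (stat("demo.txt", &st) != 0) {   /* point this at any existing file */
        perror("stat");
        return 1;
    }
    printf("size            : %lld bytes\n", (long long) st.st_size);
    printf("preferred block : %ld bytes\n", (long) st.st_blksize);
    printf("allocated       : %lld x 512-byte blocks\n", (long long) st.st_blocks);
    return 0;
}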

Figure 2: ext2 File System – Disk Partition (or Drive) Layout

Figure 3: ext2 File System – Block Group Layout

  1. Drive Layout: Super Block | Block Group 1 | Block Group 2 | …
    1. SuperBlock: Contains the file system and drive info as a whole (like what type of file system it is, how many blocks are in use, how many blocks are free)
  2. Block Group Layout: Group Descriptors | Block Bitmap | iNode Bitmap | iNode Table | Data Blocks
    1. Group Descriptors: the number of free and used blocks in this block group, the number of free and used iNodes, and the block numbers of the Block Bitmap and iNode Bitmap
    2. Block Bitmap: Each bit represents whether a particular block is free or used
    3. iNode Bitmap: Each bit represents whether a particular iNode is free or used
    4. Data Blocks: the remaining blocks, available for user data (a simplified sketch of the super block and group descriptor structures follows this list)
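
A simplified, illustrative sketch of these on-disk structures; the field names and widths here are my own shorthand, not the actual ext2 definitions:

#include <stdint.h>

/* Per-drive summary (the logical super block) */
struct super_block {
    uint32_t fs_type;         /* which file system formatted this drive */
    uint32_t block_size;      /* e.g., 4096 bytes                       */
    uint32_t total_blocks;    /* blocks in the whole drive              */
    uint32_t free_blocks;     /* blocks not yet allocated               */
    uint32_t total_inodes;
    uint32_t free_inodes;
};

/* Per-block-group summary */
struct group_descriptor {
    uint32_t block_bitmap_block;  /* block number holding the block bitmap */
    uint32_t inode_bitmap_block;  /* block number holding the iNode bitmap */
    uint32_t inode_table_block;   /* first block of the iNode table        */
    uint16_t free_block_count;    /* free blocks in this group             */
    uint16_t free_inode_count;    /* free iNodes in this group             */
};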

  1. Each file system object (directory, file, symbolic link) is represented by a metadata structure named an iNode.
  2. Internally, all iNodes are addressed by numbers (namely, iNode Numbers) – starting with 1
  3. The iNode structure typically contains:
    1. Block Pointers – 12 at Level 0, 1 at Level 1, 1 at Level 2, 1 at Level 3
      1. Level 0 – 12 Pointers to Data Blocks
      2. Level 1 – Pointer to a Block of Pointers to Data Blocks
      3. Level 2 – Pointer to a Block of Pointers to Blocks of Pointers to Data Blocks
      4. Level 3 – Pointer to a Block of Pointers to Blocks of Pointers to Blocks of Pointers to Data Blocks
    2. In the case of a Directory, the Data Blocks contain the details of the immediate sub-directories and files.  For each item, there is an iNode Number
    3. In the case of Files, the Data Blocks contain the actual user data (a minimal sketch of such an iNode structure follows this list)
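
Here is a minimal sketch of such an iNode structure – again illustrative, not the real ext2 layout:

#include <stdint.h>

#define DIRECT_POINTERS 12

/* Simplified iNode: just enough to show the block-pointer levels */
struct inode {
    uint32_t size_in_bytes;
    uint32_t direct[DIRECT_POINTERS]; /* Level 0: point straight at data blocks       */
    uint32_t single_indirect;         /* Level 1: block full of pointers to data      */
    uint32_t double_indirect;         /* Level 2: block of pointers to Level-1 blocks */
    uint32_t triple_indirect;         /* Level 3: block of pointers to Level-2 blocks */
};

/* Example capacity with 4 KB blocks and 4-byte block numbers (1024 pointers per block):
 * Level 0:                 12 * 4 KB  =  48 KB
 * Level 1:               1024 * 4 KB  =   4 MB
 * Level 2:        1024 * 1024 * 4 KB  =   4 GB
 * Level 3: 1024 * 1024 * 1024 * 4 KB  =   4 TB
 */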

Since the first directory created in the system is the root “/”, its iNode usually comes at the very beginning (I think the iNode Number is 2).

The workflow to open the file “/usr/laxminro/olnrao.txt” is as follows (a code sketch of this lookup loop follows the list):

  • Get the iNode for “/”
  • Find the Data Block details from this iNode
  • From “/” Data Block, find the iNode Number of sub-directory “usr”
  • Get the iNode for “usr” (with iNode Number found above)
  • Find the Data Block details from this iNode
  • From “usr” Data Block, find the iNode Number of sub-directory “laxminro”
  • Get the iNode for “laxminro” (with iNode Number found above)
  • Find the Data Block details from this iNode
  • From “laxminro” Data Block, find the iNode Number of file “olnrao.txt”
  • Get the iNode for “olnrao.txt” (with iNode Number found above)
  • Find the Data Block details from this iNode
  • These data blocks contain the actual data (i.e., the contents of the “olnrao.txt” file)
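
A sketch of that lookup loop in C; read_inode() and find_entry() are hypothetical helpers that load an iNode by number and search a directory’s data blocks for a name, respectively:

#include <stdint.h>
#include <string.h>

#define ROOT_INODE_NUMBER 2   /* "/" in ext2-like file systems */

struct inode;                                              /* as sketched above                  */
struct inode *read_inode(uint32_t inode_number);           /* hypothetical: load iNode           */
uint32_t find_entry(struct inode *dir, const char *name);  /* hypothetical: scan directory blocks */

/* Resolve "/usr/laxminro/olnrao.txt" one component at a time.
 * Note: strtok() modifies the path string, so pass a writable copy. */
uint32_t resolve_path(char *path)
{
    uint32_t current = ROOT_INODE_NUMBER;            /* start at "/"                        */
    for (char *name = strtok(path, "/"); name != NULL; name = strtok(NULL, "/")) {
        struct inode *dir = read_inode(current);     /* get iNode, then its data blocks     */
        current = find_entry(dir, name);             /* iNode number of the next component  */
    }
    return current;  /* iNode of "olnrao.txt"; its data blocks hold the file contents */
}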

Now that we have talked at a very high level about what a typical File System does and how it is implemented, let us talk about file system resources and their limitations (especially hard disks), usage patterns, and the design choices made accordingly.

  • Hard disks are mechanical devices
    • Disk space was a scarce resource
      • Disk fragmentation was a serious concern
    • Disk bandwidth was and is a scarce resource
    • Seek time is of the order of milliseconds
      • Random reads and random writes incur seeks
    • Small writes and small reads were the norm

Applications tried to make sure they used as little disk space and bandwidth as possible, so the amount of data read from and written to disk was small.  Note that each read or write could potentially incur a seek.  As a result, file systems introduced buffering: read buffering to serve the next possible read from memory, and write buffering to accumulate enough data before writing to disk.  Buffering also helped order the writes so that seeks move in one direction rather than to and fro.  Database systems (especially RDBMS) are among the heaviest users of file systems, and if one delves into the design choices made in an RDBMS, one can easily see how much file system design choices and hardware limitations have to be dealt with.  I will take a short detour to give a sense of this:

  • B+ Trees were the common form of on-disk storage for tables and indexes
    • For the record, heap-based tables and hash-based indexes do exist
  • Within a DB file, space is allocated in terms of extents.  An extent is about 32/64/128 KB.
    • The extent model assumes that the disk space within an extent is contiguous (e.g., on the same platter)
    • Each extent is usually exclusive to one B+ Tree, so that
    • reads and writes of that B+ Tree’s data are collocated on the same platter; otherwise one page of a B+ Tree could be on one platter and another page on a different platter
  • A typical transaction adds/updates one or two rows in different tables
    • Ex: An order made by a customer would result in one Order Table row, a few Order Details Table rows, a Payments Table row, etc.
    • Even with the extent model, this typical usage pattern demands that we touch multiple B+ Trees, which means small writes in different locations
    • Solution: Write-ahead logging – Log files serve multiple purposes
      • Bringing ‘append’ and ‘chunk’ semantics to file system usage
      • An atomic way of committing or aborting a transaction
        • Either Commit Transaction or Abort Transaction
        • Atomicity is still possible with other techniques such as ‘Shadow Paging’ – but they did not succeed because of the limitations of hard disks
      • The actual on-disk B+ Tree structures are updated as part of a Checkpoint (a minimal sketch of this append-only logging idea follows this list)
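
A minimal sketch of the write-ahead-log idea described above, assuming a single append-only log file; the record format and function names are made up for illustration:

#include <stdio.h>
#include <string.h>

/* Append one log record describing a row change.  The on-disk B+ Trees are
 * only touched later, at checkpoint time.  Appends keep the disk head moving
 * in one direction instead of seeking to many small locations. */
static int log_append(FILE *log, const char *table, const char *row_change)
{
    char record[256];
    snprintf(record, sizeof record, "CHANGE %s %s\n", table, row_change);
    if (fwrite(record, 1, strlen(record), log) != strlen(record))
        return -1;
    return fflush(log);            /* a real WAL would also fsync() here */
}

int main(void)
{
    FILE *log = fopen("txn.log", "a");
    if (!log) return 1;

    /* One ‘order’ transaction touches several B+ Trees, but only appends here */
    log_append(log, "Orders",       "order_id=42 customer=7");
    log_append(log, "OrderDetails", "order_id=42 item=3 qty=2");
    log_append(log, "Payments",     "order_id=42 amount=100");
    fputs("COMMIT 42\n", log);     /* atomic point: either present or not */
    fflush(log);

    fclose(log);
    return 0;
}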

Pardon me for the detour; coming back to disks.  We have discussed mostly the optimizations on the ‘write’ path.  At scale, and for reads, the introduction of a ‘cache tier’ came into play.

Apart from heavy users like RDBMS, many applications use the file system in different ways, and most of the time write buffering and read buffering helped contain the problem.  It is also important to note that general-purpose file systems serve a wide variety of applications and workloads, and hence their design choices were limited.  That may be the reason there are many file systems that have been designed for tailored workloads.

Newer persistent storage technologies such as ‘Solid State Drives’ (SSDs), which are electronic devices, did not make things any better.  Why, you may ask?  While random reads are not a concern, random writes are.  Small random writes run into the ‘erase block’ semantics of SSDs.  An erase is costly because it requires a charge pump, and a small write inside an already-used erase block can force the device to rewrite the whole block (for example, rewriting 4 KB inside a 256 KB erase block can, in the worst case, amplify that one write by a factor of 64).  There are other problems such as wear leveling, etc.  This whole story is famously known as write amplification.

Interested readers may read about the following to get a pulse of the hot trends in the persistent memory world: Phase-change memory, Memristor.

File systems continue to be an area of research, with newer storage devices coming up.

Hopefully that has given you the gist of a File System in the one-node world.  That’s all for now; I shall come back soon with more on the same topic.  Thanks for reading.

 

Thanks,

Laxmi Narsimha Rao Oruganti (alias: LaxmiNRO, alias: OLNRao)

Distributed and Consistent Hashing – Part 3

Windows Azure Cache (WA Cache) is a distributed in-memory cache. WA Cache provides a simple <Key, Value> based API, such as Cache.Put(key, value) and Cache.Get(key). You can correlate the WA Cache API with that of a Hash. This blog post series tries to take the reader from a traditional one-node Hash to a distributed Hash. This is the third post of the series (and assumes you have read the first and second posts).

Distributed systems are gaining momentum because of their promise of economies of scale.  The economy is possible due to the use of ‘commodity’ hardware.  Commodity hardware is cheap, but it poses a higher degree of failure when compared to reliable (and costly) hardware.  As a result, distributed systems have been designed to work with different types of failures.

Hardware Failures: Electrical Power Supply outages (#1),  Power Socket Failures, Network Switch Failures, Network Router Failures, Network Card Failures, Hard Disk Failures, Memory Chip Failures.

Hardware Failures in Windows Azure are discussed in Inside Windows Azure – Build 2011 Talk

Software Failures: Crashes due to bugs

Failures due to misconfiguration: Simple network misconfigurations can put node/rack/datacenter network communication in trouble.

#1 Electrical Power Supply Outages:

While an Uninterruptible Power Supply (UPS) helps in case of power failures, it can only serve as an alternative power source for a short time (~1 hour).  For any long power outage, or when a UPS is not available, this problem needs to be dealt with at upper layers.

At the hardware level: Redundancy is the mantra for many of the problems – redundancy in power supply, redundancy in communication paths, etc.

At the software level: Replication of data, replication of responsibility, and the right integration with hardware redundancy.

Solutions must be well integrated across layers to get the best results (or sometimes even the desired results).  The importance of integration becomes evident with an example.

Let us say a web server is deployed on two nodes, so that in case one node faults there is another node to serve the requests.  If both of these nodes are connected to the same power socket, any fault in the power socket would result in both nodes going down at the same time.  This essentially means that in spite of having two web server nodes, we landed in trouble.  If we had a way to make sure that the two nodes are always connected to distinct sets of resources, it would be a much better model.

A system of categorizing resources into groups/sets is thus necessary.  These groups are called Fault Domains (FDs).  No two FDs share the same power source, network switch, etc.  With FDs in the picture, at the software level any redundancy scheme just has to make sure to place its redundancy across FDs (a minimal placement sketch follows).
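
A minimal sketch of fault-domain-aware placement, assuming each node is tagged with a fault domain id (the structures and numbers are made up for illustration): pick replicas so that no two of them share an FD.

#include <stdbool.h>
#include <stdio.h>

struct node { int id; int fault_domain; };

/* Pick up to 'want' replicas such that no two chosen nodes share a fault domain. */
static int place_replicas(const struct node *nodes, int node_count,
                          int want, int *chosen)
{
    bool fd_used[64] = { false };   /* assumption: fault domain ids < 64 */
    int placed = 0;

    for (int i = 0; i < node_count && placed < want; i++) {
        int fd = nodes[i].fault_domain;
        if (!fd_used[fd]) {         /* skip nodes in an already-used FD */
            fd_used[fd] = true;
            chosen[placed++] = nodes[i].id;
        }
    }
    return placed;                  /* may be < want if we run out of FDs */
}

int main(void)
{
    struct node cluster[] = { { 1, 0 }, { 2, 0 }, { 3, 1 }, { 4, 1 }, { 5, 2 } };
    int chosen[3];
    int n = place_replicas(cluster, 5, 3, chosen);
    for (int i = 0; i < n; i++)
        printf("replica %d -> node %d\n", i, chosen[i]);  /* nodes 1, 3, 5 */
    return 0;
}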

We have discussed redundancy as a solution at the software layer to deal with faults.  In the case of stateless software programs, just having another node would be sufficient.  Whereas in the case of stateful software programs, for example database systems, there is much more to be done.  Traditionally, in the scale-up world, RAID systems were used to protect against bad sectors (checksums), hard disk crashes (multiple copies on different disks), etc.  RAID storage is costly, so it cannot be the choice for distributed systems.  The other scale-up world technique has been data replication.  Replication is typically the place where FD knowledge is required, so that copies of the data land in different FDs.

The moment replication comes into discussion it is important to call out the terminology:

Primary – The node responsible for ‘owning’ replication and interfacing with the client.  There is usually only one primary.

Secondary – A node responsible for ‘cooperating’ in replication.  There can be multiple secondary nodes.

Replication is a vast subject, but I will keep it short.  Depending on how the primary and the secondaries agree on replication, there are multiple methods.

Asynchronous Replication – In this model, changes on the primary are committed and the client is acknowledged without waiting for the secondary nodes to be updated.  Replication to the secondaries is either triggered later or a background task takes up the responsibility of bringing the secondary nodes up to date.  In short, replication happens asynchronously.  In this mode, if the primary node fails, there could be data loss, as the secondary nodes are not up to date.  If the data is not super critical and is OK to lose, this model is apt.

Synchronous Replication – In this model, every change on the primary is synchronously replicated to the secondaries. Till the secondaries respond, changes on the primary are not ‘committed’ (and so the client is not acknowledged). In short, replication happens synchronously. If a secondary node is down, writes are blocked till the node is brought back up.

The number of copies to be maintained is referred to as the ‘Replication Factor’.

If the data is important, an admin would opt for synchronous replication.  The higher the replication factor, the better the fault tolerance.  But keeping the client on hold till all secondary copies are updated pushes an admin to choose a lower replication factor.  It can be argued that the admin is forced to think deeply about the replication factor with this model.  There are cases where one might need a higher replication factor, but not at the cost of increased operation times.  There comes the need for a mixture of both the synchronous and asynchronous replication models, namely Hybrid Replication.  In this model, one can choose the number of secondary nodes in synchronous mode and the number of secondary nodes in asynchronous mode.  Here again, there are two choices: one can designate a ‘fixed set’ of secondary nodes for synchronous replication, or live with ‘any N’ secondary nodes acknowledging (a toy sketch of the ‘any N’ flavour follows).
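
A toy sketch of the ‘any N’ flavour of hybrid replication described above: the primary counts acknowledgements and considers the write committed once N secondaries (out of the full replica set) have responded.  Everything here – structures and counts – is illustrative only.

#include <stdbool.h>
#include <stdio.h>

struct write_state {
    int replication_factor;   /* total number of secondary copies       */
    int required_acks;        /* N: sync acks needed before commit      */
    int acks_received;
    bool committed;
};

/* Called whenever any secondary acknowledges the replicated change. */
static void on_secondary_ack(struct write_state *w)
{
    w->acks_received++;
    if (!w->committed && w->acks_received >= w->required_acks) {
        w->committed = true;                      /* acknowledge the client now */
        printf("committed after %d acks\n", w->acks_received);
    }
    /* the remaining secondaries catch up asynchronously */
}

int main(void)
{
    struct write_state w = { .replication_factor = 4, .required_acks = 2 };
    on_secondary_ack(&w);   /* 1st ack: not yet committed */
    on_secondary_ack(&w);   /* 2nd ack: committed         */
    on_secondary_ack(&w);   /* late ack: async catch-up   */
    return 0;
}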

It is also important to note that a node can act as both primary and secondary for different replication units.  In the case of database systems, each database is a replication unit.  So, a node can act as the primary for database 1 and as a secondary for database 2.

In the case of a Routing Client, the client would typically know the primary to reach out to for each replication unit.  Some systems allow clients to read from secondaries.  Which secondary nodes a client is allowed to read from results in different levels of consistency.  Werner Vogels wrote a great blog post about different consistency models (Blog Post, Wikipedia Link).

The primary and secondaries communicate as part of replication, and there are multiple models.

Pipeline model – The replication data flow is like a chain: the primary sends the data to the first secondary node, the first secondary then sends the data to the next secondary, and so on and so forth.  Windows Azure Storage and the Hadoop Distributed File System use this model.

Multi-Unicast Model – The primary sends replication data to all ‘synchronous secondary’ nodes separately and waits for an acknowledgement from every one of them.

Multicast Model – In the case of hybrid replication with ‘any N’ acknowledgements, the primary sends replication data to all ‘secondary’ nodes (both synchronous and asynchronous) and waits for any N secondary nodes to acknowledge.  The set of N nodes that acknowledge varies for every data block or chunk or packet.

One major advantage of Windows Azure Cache over MemCacheD is that it supports High Availability (the public name for replication).  Windows Azure Cache supports the synchronous replication model (the number of secondary nodes is fixed to 1, and the communication model is multi-unicast), and each partition is a replication unit.  The cache is an in-memory system, so the replication is limited to in-memory replication.  And that is the catch!  In any stateful system, a node reboot does not lose the state, as the state is persisted on disk (either locally or remotely).  However, in an in-memory system like Windows Azure Cache, a node reboot results in state loss.  Synchronous replication plus node-reboot-leads-to-state-loss made us (Windows Azure Cache) let clients commit even when all secondary nodes are down, since those secondaries hold no data that could go out of date by allowing writes.  Windows Azure Cache (as of the 1.8 SDK release) does not allow clients to read from secondary nodes.

Many a time, the cache is used for data that needs to be processed.  Processing involves code to be run.  It is a well-known and established fact that keeping code and data close together lets systems complete tasks faster.

In the case of database systems, stored procedures help bring the code (or business logic) near the data.

In Windows Azure Cache, we have the Append, Prepend, Increment, and Decrement APIs to help process values.  It would have been lovely if we had a ‘stored procedure’ model instead of these individual APIs.  That way, any processing could be pushed into such ‘stored procedures’, and we could have simply shipped Append, Prepend, Increment, and Decrement as ‘Microsoft-owned’ stored procedures (a toy sketch of the idea follows).
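
A toy sketch of the difference, with a function pointer standing in for a ‘stored procedure’; none of this is the actual Windows Azure Cache API, it is purely illustrative.  Shipping the operation to the data avoids moving the value to the client and back.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A fake single-entry cache living on the 'server' side */
static char server_value[64] = "10";

/* Plain Get/Put: the value travels to the client and back (two round trips) */
static void cache_get(char *out, size_t out_size) { snprintf(out, out_size, "%s", server_value); }
static void cache_put(const char *in)             { snprintf(server_value, sizeof server_value, "%s", in); }

/* 'Stored procedure' style: ship the operation, run it next to the data */
static void cache_apply(void (*proc)(char *value, size_t size))
{
    proc(server_value, sizeof server_value);      /* one round trip, no value shipping */
}

/* Increment written as a 'stored procedure' */
static void increment_proc(char *value, size_t size)
{
    snprintf(value, size, "%d", atoi(value) + 1);
}

int main(void)
{
    /* Client-side processing: Get, modify locally, Put */
    char copy[64];
    cache_get(copy, sizeof copy);
    snprintf(copy, sizeof copy, "%d", atoi(copy) + 1);
    cache_put(copy);

    /* Server-side processing: push the code to the data */
    cache_apply(increment_proc);

    printf("value = %s\n", server_value);         /* prints 12 */
    return 0;
}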

This is the last post of the series “Distributed and Consistent Hashing”.  Here are the important lessons/techniques to remember (deliberately explained in generic terms so that they can be carried forward into other discussions).

– Data Nodes – The cache server nodes that actually store data

– Control Nodes or Metadata Nodes – Partition Manager is one such control node we have discussed that helps manage the data nodes and their partition ownership. 

– Routing Clients – Make the client intelligent enough to talk to data nodes directly, without the need to go through a control node

– Read from Secondary – Load balancing, and allowing different levels of consistency

– Code and Data Distance – If code and data are near, tasks can be completed faster 

 

That is all for now; I will come back with another series on taking a system from the scale-up world to the scale-out (or distributed) world.

  

Thanks,

Laxmi Narsimha Rao Oruganti (alias: LaxmiNRO, alias: OLNRao)

Integration – [Processor, Memory] Vs. [Visiting Places, People]

Today I want to try explaining the integration between the processor and memory, using visiting places and people as the reference point.

Have you ever observed the entry and exit queues at various visiting places, like a large zoo or a famous temple?  Can you reason out why they are the way they are?

Zoo – Multiple entry gates and multiple exit gates

Temple – One entry gate and one exit gate

Why does a temple not have multiple gates?  Why does a zoo not have only one gate?

Zoo – After the entry gate, the possible ‘views’ are many.  One person can go to the animals view, another to the birds view, yet another to the trees view.  The more views possible, the better the crowd is absorbed into views.  So, allowing more people in at a stretch does not hurt but makes the system better.  Fewer entry gates only increase the queue lengths and result in insufficient usage of the zoo.

Temple – The only ‘view’ is the holy deity.  There is just one ‘view’.  So, allowing in more people is going to make the situation worse.  You know how good we humans are at self-discipline :-) (of course, there are exceptions).

What does that observation tell us?  The in-flow and out-flow must be designed with the actual ‘view’ or ‘consumption’ system in mind.  A superior in-flow (many entry gates) designed without thinking of the main consumer (the temple) is going to create a mess.  An inferior in-flow (few entry gates) when the main system (the zoo) is ready for heavy consumption reduces usage efficiency.

When it comes to computers, processor and memory are designed the same way. 

Processors are designed around a ‘word’ pattern rather than a ‘byte’ pattern.  For example, you hear of 16-bit processors, 32-bit processors, 64-bit processors.  So 16 bits, 32 bits, 64 bits are word sizes.  Processors process a word at a time.  The registers, arithmetic logic unit, accumulator, etc. are all in sync with the ‘word’ pattern.

Let us come to the memory and see a bit more into it. 

Byte-addressable memory is a memory technology where every byte can be read/written individually, without requiring other bytes to be touched.  This technology is better for software, as multiple types can be supported with ease.  For example: extra small int (1 byte), small int (2 bytes), int (4 bytes), long int (8 bytes), extra long int (16 bytes) can all be supported with just ‘length of the type in memory’ as the design point.  There are no alignment issues, like small int must be on a 2-byte address boundary, int must be on a 4-byte address boundary, and so on and so forth.  Surely, from the software point of view, byte-addressable memory is the right technology.  But this memory is a bad choice for processor integration.

Word-addressable memory is a memory technology where one can read/write only a word at a time.  This is better for processor integration, as processors are designed around ‘word’ consumption.  But it surfaces memory alignment issues to software, and they have to be dealt with at those layers.  It also brings challenges like the endianness problem across different ‘word’ patterns (in processors).

For processor-memory integration, ‘word addressing’ wins.  For software-memory integration, ‘byte addressing’ wins.

Hardware is manufactured in factories (and is hard to change after fabrication).  Whereas software is more tunable/changeable/adaptable – change one line of code and recompile, and the change is ready in hand (deployment is a separate issue though).  So the choice stares us in the face.  That is, choose the memory that is right for processors and let the problems be solved at the upper layers, like software and compilers.

So, compilers came up with techniques like padding.  Compilers also support packing, to help developers make their own choice and override the compiler’s inherent padding behavior (a small example follows).
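
A small sketch of padding and packing in action.  The exact sizes depend on the compiler and target, but on a typical 64-bit compiler the padded struct below is 12 bytes while the packed one is 7; the packed-attribute syntax shown is the GCC/Clang one.

#include <stdio.h>
#include <stdint.h>

/* Default: the compiler pads so that 'b' and 'c' sit on their natural boundaries */
struct padded {
    uint8_t  a;   /* 1 byte + 3 bytes of padding       */
    uint32_t b;   /* 4 bytes, aligned to 4             */
    uint16_t c;   /* 2 bytes + 2 bytes of tail padding */
};

/* Packed: no padding, smaller but fields may be misaligned (slower/unsafe loads) */
struct __attribute__((packed)) packed {
    uint8_t  a;
    uint32_t b;
    uint16_t c;
};

int main(void)
{
    printf("sizeof(struct padded) = %zu\n", sizeof(struct padded)); /* typically 12 */
    printf("sizeof(struct packed) = %zu\n", sizeof(struct packed)); /* 7            */
    return 0;
}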

With all that understanding, let us take a simple primitive as an example and reason through all these design choices.

Memory Copy:  Copy byte values from one memory location to another memory location

Signature: memcpy(source, sourceOffset, target, targetOffset, count)

It is very common for a program to require copying bytes from one location to another (the network stack is a famous example).  In simplistic code, the memory copy primitive would look like this (data types, bounds checking, etc. are excluded for brevity):

for (int offset = 0; offset < count; offset++)
    target[targetOffset + offset] = source[sourceOffset + offset];

As a software programmer, without knowing the underlying design details, this looks like correct and performant code.  Well, software engineers are smart :-) and would love to learn.  We know that SDRAM is the memory technology, and the hardware is ‘word’ based.  That means, even if I were to read a byte at address ‘x’, the underlying hardware is going to fetch a ‘word’ at a time into the processor.  The processor then extracts the required byte (typically using ALU registers) from that word and passes the byte to the software program.

What does this mean to the above code?

Assume the source and target offsets are aligned on a word boundary.  Let us say the word is 64 bits.

When the for loop has offset = 0, the source memory bytes from sourceOffset + 0 to sourceOffset + 7 are read (that is, one word).  Because the software asks for only the first byte, that byte is extracted and the other bytes are thrown away.  Again, when offset = 1, the same word is read again from RAM, but a different byte (the second byte) is extracted and given to the software.  So on and so forth, till offset = 7.

So, for offset = 0 to offset = 7, the code is inherently reading the same word from RAM 8 times.  So why not fetch it only once and use it in a single shot?  Well, that is what real memcpy primitive code does (a learned programmer’s code).  Here is a modified version:

// Copy as many ‘words’ as possible (assuming a 64-bit word, i.e., 8 bytes, and a uint64_t type)

int offset = 0;
for (; offset + 8 <= count; offset += 8)
    *(uint64_t *) (target + targetOffset + offset) = *(const uint64_t *) (source + sourceOffset + offset);

// Copy the remaining bytes that do not complete a ‘word’

for (; offset < count; offset++)
    target[targetOffset + offset] = source[sourceOffset + offset];

 

Well, in reality the memcpy code is not as simple as the above, because the target and source offsets might not be word aligned.  If I am not wrong, memcpy could actually have assembly code directly (and some implementations do have assembly code).  After all, it is all about mov (to move a word) and add (to increment the offset) instructions (I remember my 8086 assembly programming lab sessions!).

Padding and packing are also super important when one is worried about performance.  Padding helps keep content/data/variables aligned.  Otherwise, efficient code like the above will not be useful at all and will result in performance issues.

That is all for now, thanks for reading.  If you liked it, let me know through your comments on the blog.  Encouragement is the secret of my energy :-).

Thanks,

Laxmi Narsimha Rao Oruganti (alias: OLNRao, alias: LaxmiNRO)